Goals
RADAR4KIT - Research Data Repository for KIT - is an interdisciplinary research data repository for archiving and publishing research data from scientific studies and projects of KIT researchers. In RADAR4KIT, research data is understood as digital data generated in the research process.
RADAR4KIT stores research data in the form of data sets, which may consist of one or more files. These contain the actual research data as well as descriptive metadata. In the following, such a compilation is referred to as a "data package".
Operator
RADAR4KIT is offered by the Karlsruhe Institute of Technology ("Operator") and is based on the RADAR service offered by FIZ Karlsruhe. Data is stored exclusively on the IT infrastructure at the Scientific Computing Center (SCC) of KIT. The service is primarily aimed at KIT researchers ("data providers") who want to archive or publish their data. In RADAR4KIT, data can be made accessible to third parties ("data users") or published on the Internet.
Registration and User Profiles
RADAR4KIT is an online service and can only be used via the Internet. All KIT researchers can register with RADAR4KIT via their KIT account (Shibboleth). In special cases, additional accounts can be created for users external to KIT.
Roles and Rights
Before research data can be uploaded, the KIT Library (administrator) must set up a workspace. To request one, please contact the KIT Library directly (see below for contact details).
The administrator can then set up separate workspaces for different user groups (e.g. research groups, projects, institutes). Subsequently, the administrator can use the online service to grant additional users registered with RADAR4KIT rights as data providers ("curators") for one or more of these workspaces. Usually the administrator will designate staff members of their own institution as curators; if necessary, however, persons from outside the institution can also be authorized. In RADAR4KIT, curators can upload, edit, archive and, if necessary, publish research data exclusively in the workspaces designated for them by the administrator. Curators designated by the administrator can authorize other users registered with RADAR4KIT as subcurators for their workspace. Subcurators are also data providers, but cannot archive or publish data packages or appoint other users as subcurators.
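The division of rights described above can be summarized as a small lookup table. The following is only an illustrative sketch: the action names and the `is_allowed` helper are hypothetical and do not reflect how RADAR4KIT implements its permissions. Rights of the administrator beyond workspace setup and curator appointment are not modeled.

```python
# Hypothetical permission table derived from the role description above.
# Action names are illustrative; they are not RADAR4KIT identifiers.
ROLE_PERMISSIONS = {
    "administrator": {"create_workspace", "appoint_curator"},
    "curator": {"upload", "edit", "archive", "publish", "appoint_subcurator"},
    "subcurator": {"upload", "edit"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the given role may perform the action in its workspace."""
    return action in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    for role in ROLE_PERMISSIONS:
        print(role, "may publish:", is_allowed(role, "publish"))
```

In this sketch, the only difference between curators and subcurators is the missing `archive`, `publish` and `appoint_subcurator` entries for subcurators.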
Services
RADAR4KIT can be used via a web-based user interface with current web browsers or via a REST-based programming interface (API). The data provider can create data packages within the assigned workspace and add individual files, or ZIP files containing several files, which are transferred to RADAR4KIT via the Internet. Individual data can be added or deleted via the user interface.
The data provider can describe data packages with metadata. The service provides a form for this purpose on the RADAR4KIT platform. Alternatively, the metadata can be created offline as an XML file and then uploaded to the RADAR4KIT platform. The RADAR4KIT platform allows data providers to download a template for a corresponding XML file as well as an XML schema for validating the metadata in the latest version. The administrator also has the possibility to define default values for descriptive metadata for workspaces. These default values are then suggested to data providers when the descriptive metadata is created.
Once the compilation of a data package and its description with metadata is complete, the curator can choose between two options: archiving or publication of the research data.
Archiving allows the optional description of the data package with descriptive metadata. Normally, neither data nor metadata are made publicly accessible, but the curator can change this independently by granting appropriate rights via the online system. In this case, the data is not assigned a Persistent Identifier. The curator must determine the desired retention period.
For the publication of a data package, its valid description in the form of descriptive metadata and the granting of a license by the data provider are required. In principle, metadata and data are publicly searchable and accessible. For the data, the data provider can optionally set an embargo period during which only the metadata is publicly searchable and accessible. After the embargo period has expired, the data automatically becomes publicly accessible. RADAR4KIT assigns a Persistent Identifier (here: Digital Object Identifier, DOI for short) to each published data package and registers it with DataCite. With this DOI, the published data package is persistently identifiable and citable, and can be linked to a conventional scientific publication, e.g. at KITopen. At the same time, the descriptive metadata is transferred to DataCite; the data provider must license this metadata under Creative Commons Zero (CC0) 1.0 Universal. For the transfer, the descriptive metadata is automatically converted from RADAR format to DataCite format. Furthermore, the descriptive metadata in both RADAR and DublinCore format is publicly offered for harvesting via an OAI provider.
For published and archived data packages, a license must be selected from a predefined list of recommended licenses. The list includes all Creative Commons 4.0 International licenses (including CC0 1.0 Universal), an "All rights reserved" license and the option to specify a proprietary license. The operator reserves the right to add further licenses to the list.
Temporary storage
For the compilation and description of data packages, the operator provides so-called temporary storage. If the total available temporary storage is full, the authorized data providers cannot add further data until files are deleted, data packages are archived or published, or the temporary storage is increased. When data packages are archived or published, they move from temporary storage to permanent storage and no longer occupy any space in temporary storage. RADAR4KIT limits how long temporary storage may be used. The maximum usage time is checked per data package. Data packages can remain in temporary storage for a maximum of six months. After this period, they must be archived, published or deleted. One month before the end of the storage period, RADAR4KIT sends an e-mail to the data provider, informing them that the storage period is about to end. RADAR4KIT then sends the data provider a weekly e-mail reminder that the storage period is about to end. If the data provider has not deleted, archived or published the data package after six months, RADAR4KIT will delete it.
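The timeline for a single data package in temporary storage can be worked through with a few lines of date arithmetic. This is a minimal sketch, not RADAR4KIT code: the function names are hypothetical, and it assumes that the weekly reminders start one week after the first notice and that "six months" and "one month" mean calendar months.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def temporary_storage_schedule(created: date) -> dict:
    """Compute notice, reminder and deletion dates for one data package."""
    deadline = add_months(created, 6)        # maximum of six months
    first_notice = add_months(deadline, -1)  # e-mail one month before the end
    reminders = []
    day = first_notice + timedelta(weeks=1)  # assumption: weekly after first notice
    while day < deadline:
        reminders.append(day)
        day += timedelta(weeks=1)
    return {
        "first_notice": first_notice,
        "weekly_reminders": reminders,
        "deletion_deadline": deadline,
    }


if __name__ == "__main__":
    schedule = temporary_storage_schedule(date(2024, 1, 15))
    for key, value in schedule.items():
        print(key, value)
```

For a data package created on 15 January 2024, this sketch yields a first notice on 15 June 2024 and a deletion deadline of 15 July 2024.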
Quotas
In RADAR4KIT, the maximum usable size of temporary storage, of permanent storage used for archived data packages and of permanent storage used for published data packages is limited. The amount of storage available per employee for the individual categories is regulated in the basic IT contingent.
Holding periods and immutability of data packages
RADAR4KIT enables the permanent and unaltered storage of data packages for a defined period of time ("retention period"). For archived data packages, the data provider specifies a retention period. The actual retention period for archived data packages may be shorter if the RADAR4KIT service is discontinued before the end of the retention period. No retention period needs to be selected for published data packages; it is in principle unlimited. KIT guarantees an actual retention period of at least 10 years.
During the retention period, RADAR4KIT does not modify the stored data packages, but only ensures their physical preservation ("bitstream preservation"). Accordingly, RADAR4KIT does not guarantee the long-term usability or interpretability of the data contained in a data package, as this depends on the availability of the data formats selected by the data provider and the corresponding programs for interpreting them.
Data packages in permanent storage cannot be changed. Deletions can be carried out by the administrator in justified exceptional cases after consultation with the operator. Justified exceptional cases include, for example, legal violations or incorrect data. In case of a deletion, only the data is deleted, not the metadata. The metadata then contains an indication that the data has been deleted.
Review of research data
RADAR4KIT supports a review process before data publication. For this purpose, a data package can be set to the status "in review" before publication. In this state, the data package is no longer editable. RADAR4KIT generates a unique link that the data provider can pass on to the responsible publisher or reviewers. This link allows access to the not yet published data package without prior authentication. After completion of the review process, the data provider can either return the data package to edit mode or publish it. In both cases, the generated unique link becomes invalid, so that reviewers can no longer access the data package. The data provider can set a data package to the status "in review" several times in succession.
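The review workflow described above behaves like a small state machine. The sketch below is an illustration under stated assumptions: the class, the status strings "editing" and "published", and the use of a random token as the review link are hypothetical and not taken from RADAR4KIT.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPackage:
    """Hypothetical model of a data package's review life cycle."""
    status: str = "editing"
    review_link: Optional[str] = None

    def is_editable(self) -> bool:
        return self.status == "editing"

    def start_review(self) -> str:
        """Freeze the package and create a fresh unique link for reviewers."""
        if self.status != "editing":
            raise ValueError("review can only start from edit mode")
        self.status = "in review"
        self.review_link = secrets.token_urlsafe(16)  # stand-in for the unique link
        return self.review_link

    def return_to_editing(self) -> None:
        """Leave review; the link becomes invalid."""
        if self.status != "in review":
            raise ValueError("package is not in review")
        self.status = "editing"
        self.review_link = None

    def publish(self) -> None:
        """Publish from edit mode or review; any review link becomes invalid."""
        if self.status == "published":
            raise ValueError("package is already published")
        self.status = "published"
        self.review_link = None


if __name__ == "__main__":
    package = DataPackage()
    print("review link token:", package.start_review())
    package.return_to_editing()
    package.start_review()  # review can be repeated
    package.publish()
    print(package.status, package.review_link)
```

Each call to `start_review` produces a new token, which mirrors the statement that the previous link is invalidated whenever a review ends.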
Beyond that, the operator does not perform any further quality assurance of the content of the posted research data. The data providers themselves are responsible for this.
Technical and organizational measures for data security
Data in the temporary storage is located on magnetic disks, which are protected against data loss due to failure of individual disks via a RAID6 array. In addition, the stored data is incrementally written to tape once a day as a backup. The backup is done at file level and is kept for two generations.
Archived and published data packages are stored on magnetic tapes in so-called tape libraries. RADAR4KIT stores all data packages moved to permanent storage in two copies on different tapes at two different locations of the Scientific Computing Center (SCC) of the Karlsruhe Institute of Technology (KIT). Data packages are provided with a checksum before storage, which is automatically checked after each copying process. This enables errors during data transmission to be detected and eliminated ("end-to-end checksum"). When a data package is accessed, the checksum is calculated again and compared with the stored value to identify any data consistency errors. If, exceptionally, an error is detected, RADAR4KIT accesses the second copy of the data package. Regular checks of the copies for possible bit errors ("fixity checks") are currently not performed. The SCC always keeps its storage infrastructures up to date. As a result, all data is migrated to new data carriers within a period of about five to eight years. With each read operation, and at the latest during this data migration, a check for bit errors is performed.
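The read path with checksum verification and fallback to the second copy can be sketched as follows. This is an illustration only: the source does not name the checksum algorithm, so SHA-256 is an assumption, and the function names and in-memory byte strings stand in for the actual tape copies.

```python
import hashlib
from typing import Sequence


def checksum(data: bytes) -> str:
    """Checksum of a data package (SHA-256 is an assumption, not from the source)."""
    return hashlib.sha256(data).hexdigest()


def read_verified(copies: Sequence[bytes], stored_checksum: str) -> bytes:
    """Return the first copy whose recomputed checksum matches the stored value."""
    for copy in copies:
        if checksum(copy) == stored_checksum:
            return copy
    raise IOError("no copy matches the stored checksum")


if __name__ == "__main__":
    original = b"research data"
    stored = checksum(original)        # computed once, before storage
    damaged = b"research dat\x00"      # simulated bit error in the first copy
    recovered = read_verified([damaged, original], stored)
    print(recovered == original)
```

Because the checksum is computed before storage and recomputed on every read, a damaged first copy is detected and the intact second copy is delivered instead.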
Before being transferred to permanent storage, the data packages intended for archiving or publication are converted into a structure corresponding to the BagIt specification. This structure contains not only the actual research data in its original arrangement with all files and directories, but also technical and descriptive metadata as well as a manifest corresponding to the specification. The BagIt structure is packed into a TAR file and stored as an Archival Information Package (AIP) according to the OAIS standard.
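To make the packaging step more concrete, the following sketch builds a minimal BagIt-style bag in memory and wraps it in a TAR file. It is not the RADAR4KIT implementation: the function name, the choice of SHA-256 for the manifest and the omission of the technical and descriptive metadata files are assumptions made for brevity.

```python
import hashlib
import io
import tarfile
from typing import Dict


def build_bag_tar(bag_name: str, payload: Dict[str, bytes]) -> bytes:
    """Pack payload files into a minimal BagIt structure inside a TAR archive."""
    # Bag declaration required by the BagIt specification.
    files = {
        "bagit.txt": b"BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
    }
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        bag_path = f"data/{rel_path}"  # payload keeps its original layout under data/
        files[bag_path] = content
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  {bag_path}\n")
    files["manifest-sha256.txt"] = "".join(manifest_lines).encode("utf-8")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for path, content in files.items():
            info = tarfile.TarInfo(name=f"{bag_name}/{path}")
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


if __name__ == "__main__":
    archive = build_bag_tar("example-bag", {"results/run1.csv": b"a,b\n1,2\n"})
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        print(tar.getnames())
```

The manifest lists one checksum per payload file, which is what later allows the content of the bag to be verified file by file.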
Access to temporary storage is synchronous, i.e. the delivery of the requested data starts without any noticeable delay. Access to permanent tape storage is asynchronous, i.e. in some cases several minutes may pass between request and delivery. In times of high access rates, the waiting time can exceptionally be in the range of hours. Frequently accessed data packages, even if they are already in permanent storage, are usually delivered quickly (i.e. synchronously) via a cache. Fast delivery from permanent storage cannot be guaranteed.
Assignment of access rights and embargoes
Data packages that have not yet been archived or published, i.e. are in the processing state, can only be viewed by the data providers and administrators. There is an exception for reviewers (see section "Review of research data"). A curator can grant the right to act as a data provider (curator or subcurator) to other users registered with RADAR4KIT at any time in their own workspace.
Archived data packages are normally only accessible to data providers and the administrator. The data provider can grant other users registered with RADAR4KIT the right to view the descriptive metadata and retrieve the archived data packages. If desired, the data provider can also make the archived data package publicly accessible. These rights assignments can be changed by the data provider at any time.
Archived data (unless the data provider has made it publicly accessible in whole or in part) cannot be found either via search or via OAI. Third parties can neither view nor search the data or the metadata.
Published data packages, as well as archived data packages for which the data provider has made access unrestricted, can be retrieved both by data users registered with RADAR4KIT and by anonymous (not registered) data users. The descriptive metadata is searchable in the web interface and is additionally offered for harvesting via an OAI provider. Furthermore, it is publicly accessible at www.datacite.org. This also applies if the actual research data is still under embargo. The data provider can set up an embargo period of up to 12 months after publication for the actual research data, during which only the metadata is searchable and retrievable, but not the research data. After the embargo period has expired, the research data will also be generally accessible.
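The embargo rule for published data packages can be expressed as a short check. This is a hedged sketch with hypothetical function names; it assumes that the embargo end date is the first day on which the research data is public and that "12 months" means the same calendar day one year after publication.

```python
from datetime import date
from typing import Optional


def latest_allowed_embargo_end(published_on: date) -> date:
    """Twelve months after publication (29 February is clamped to 28 February)."""
    try:
        return published_on.replace(year=published_on.year + 1)
    except ValueError:
        return published_on.replace(year=published_on.year + 1, day=28)


def public_visibility(published_on: date, embargo_end: Optional[date], today: date) -> dict:
    """Report what anonymous users can access for a published data package."""
    if embargo_end is not None and embargo_end > latest_allowed_embargo_end(published_on):
        raise ValueError("embargo period may not exceed 12 months")
    # Metadata is always public; data only once the embargo (if any) has expired.
    data_public = embargo_end is None or today >= embargo_end
    return {"metadata": True, "data": data_public}


if __name__ == "__main__":
    published = date(2024, 3, 1)
    embargo = date(2024, 9, 1)
    print(public_visibility(published, embargo, date(2024, 6, 1)))
    print(public_visibility(published, embargo, date(2024, 9, 1)))
```

During the embargo the sketch reports the metadata as public and the data as not public; from the embargo end date onward both are public.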