Documentation and metadata

Metadata is crucial for making shared information discoverable and accessible to others. In short, metadata is structured information that describes a resource, such as research data or other publications. Metadata also clarifies and contextualizes documents and collections, making them searchable. It is therefore imperative to set aside time for filling in metadata fields when research data is published, so that the information can be located and reused. The purpose of metadata is to facilitate automatic management and categorization of information. For this to work, metadata must follow existing standards. Unlike documentation, which is written for human readers, metadata must be readable by computer software.

What are the different kinds of metadata?

  • Descriptive: Information concerning contents and context. This is used to enable others to cite the information using scholarly notation. Examples of descriptive metadata include: titles, authors, subject, keywords, abstracts, methodology, etc.
  • Administrative: Information that allows the data to be categorized and correctly managed. Examples of administrative metadata are: file format, rights/licenses/copyright, preservation, etc.
  • Structural: Structural metadata is necessary to organize the previous two categories. Examples of structural metadata are: persistent links (e.g. DOI or URN), relational data as to how separate files are associated with one another, etc.
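
As a rough illustration of the three categories above, the following sketch groups the fields of a single record by kind and serializes it to JSON so that software can read it. All field names, values and the placeholder identifier are assumptions made for this example; a real record should follow the metadata standard required by the chosen repository.

```python
import json

# Hypothetical metadata record; every field name and value is illustrative only.
record = {
    # Descriptive: contents and context, used for citation and searching.
    "descriptive": {
        "title": "Example survey dataset",
        "authors": ["A. Researcher"],
        "keywords": ["survey", "example"],
        "abstract": "Placeholder abstract describing the dataset.",
    },
    # Administrative: how the data is categorized and managed.
    "administrative": {
        "file_format": "text/csv",
        "license": "CC-BY-4.0",
    },
    # Structural: persistent links and relations between files.
    "structural": {
        "persistent_identifier": "doi:10.xxxx/placeholder",
        "files": [
            {"name": "responses.csv", "described_by": "codebook.txt"},
            {"name": "codebook.txt", "described_by": None},
        ],
    },
}


def to_json(rec):
    """Serialize the record so that it is legible for computer software."""
    return json.dumps(rec, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_json(record))
```

Keeping the three kinds in separate groups is only one possible layout; many standards mix them in a single flat list of fields.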

Metadata can be used for several purposes, such as:

  • Citations: Creates the possibility for rewarding and recognizing those who have created the content.
  • Reusability: To build upon research, others need to be able to easily understand how information has been structured. There must be enough metadata for another researcher to understand, for example, how data collection was performed and what the different variables mean.
  • Searching/Finding: So that others are able to find the information and verify that it is correct. Metadata needs to answer the questions: Who? What? Where? When? Why? How?
  • Interpretation of data: By making it possible to understand how information was structured and collected, the data can be interpreted through different perspectives, thus more thoroughly evaluating the results. It is often also a great help for an author who wants to reuse their own data months or years later.
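
The six questions mentioned under Searching/Finding can be turned into a simple completeness check before publication. The sketch below is only an illustration: the mapping from each question to a field name is hypothetical and would need to be adapted to the metadata schema actually in use.

```python
# Hypothetical mapping from each question to a metadata field name.
QUESTION_FIELDS = {
    "Who": "creator",
    "What": "title",
    "Where": "location",
    "When": "date_collected",
    "Why": "purpose",
    "How": "methodology",
}


def unanswered_questions(metadata):
    """Return the questions whose field is missing or contains only whitespace."""
    return [
        question
        for question, field in QUESTION_FIELDS.items()
        if not str(metadata.get(field) or "").strip()
    ]


if __name__ == "__main__":
    draft = {"creator": "A. Researcher", "title": "Example survey dataset"}
    # Lists the questions this draft record does not yet answer.
    print(unanswered_questions(draft))
```

A check like this only confirms that a field has been filled in; whether the answer is informative enough for another researcher still has to be judged by a person.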

Data that cannot be shared

For legal, ethical or commercial reasons, it is not always possible to share data openly. It may, however, be possible to make the information searchable without granting access to the raw data.

When can research data not be shared openly?

  • If research data contain sensitive personal details or sensitive information. Bear in mind that non-sensitive personal details within the material can be published if made anonymous.
  • If there is no written consent from participants in a study whereby they agree to open publication of results (documentation of this is required).
  • If it includes materials to which someone else owns the copyright.
  • If the material contains information which reveals proprietary or financial information.
  • If the material has not undergone ethical vetting when such vetting is necessary.

Even if the data to be shared falls into one of the categories above, it is still possible to publish information stating that the research data has been collected. Many data repositories offer the option to register only information about data using keywords and a description. It is recommended that contact information is included when registering the metadata so that users can send inquiries concerning the data made searchable.

Information security

All employees of Stockholm University must work actively, efficiently and continuously with information security – that is to say, how different types of information are handled in different contexts.

The University’s information security procedures are coordinated by IT Services. The work is organized around four aspects: confidentiality, accuracy, traceability and accessibility.

  • Confidentiality - means that no unauthorized party will have access to the information. This is of particular importance for researchers who manage research data containing personal details.
  • Accuracy - means that the information cannot be changed by unauthorized persons, whether intentionally or unintentionally. It is important for researchers to be able to guarantee the accuracy of their data.
  • Traceability - means that it is possible to trace who did what within a system. This can be particularly important if a researcher handles sensitive personal details or other confidential information.
  • Accessibility - means that the information exists and can be reached by users when needed, either via the internet or by request. This is necessary in order to fulfill the ideal of open research data.
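
One common way to detect unintended changes to a data file, and thereby support the accuracy aspect described above, is to record a checksum when the file is finalized and compare it later. The sketch below uses SHA-256 from the Python standard library. It is an illustration only and not a description of the University's own procedures; the function names are made up for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals the end of the file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of_file(path) == recorded_checksum
```

A checksum shows that a file has changed, but not who changed it or why; that question belongs to traceability.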

Accessibility and long-term storage

When research data is made digitally accessible, it is important to consider the file format in which the information has been saved, so that others can reuse the material. Every digital file format runs the risk of becoming obsolete and thereby unreadable in the future. If this occurs, valuable research data may be lost.

The most important points concerning file formats to be considered by researchers are:

  • To use a preservation format right from the beginning if it is possible.
  • To use a file format which is not proprietary (e.g. .csv for tabular (spreadsheet) data and .txt or .odt/.odf for text).
  • To use a file format which follows an open standard, such as those developed by OASIS.
  • To use a file format which is commonly used.
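
The points above can be supported by a small check that flags files whose extension is not on a list of preferred formats. In the sketch below, the list contains only the example extensions mentioned in the text; it is not an official list of accepted formats, and the function name is hypothetical.

```python
from pathlib import PurePath

# Example extensions taken from the text above; not an official list.
PREFERRED_EXTENSIONS = {".csv", ".txt", ".odt", ".odf"}


def files_to_review(filenames):
    """Return the files whose extension is not in the preferred set."""
    return [
        name
        for name in filenames
        if PurePath(name).suffix.lower() not in PREFERRED_EXTENSIONS
    ]


if __name__ == "__main__":
    project_files = ["data.csv", "notes.TXT", "results.xlsx", "report.docx"]
    # Files listed here may need to be converted before long-term storage.
    print(files_to_review(project_files))
```

A file extension is only a hint about the actual format, so flagged files still need to be inspected before conversion.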

The Swedish National Data Service (SND) has evaluated a number of file formats that it considers suitable for the handling, long-term storage and availability of research data. However, these may change over time according to technical developments.

Regulation RA-FS 2009:2 issued by the Swedish National Archives explains which types of file format have been accepted for preservation. The type of file format to be used depends on the original usage of the data. For example, databases and registers could have been stored as sequential files or XML files, while office documents may have been stored as PDF/A files. Unfortunately, there are currently no recommended preservation formats for digital sound or image files.

Archiving research information

As a state body, Stockholm University is responsible for preserving and archiving its official documents. Research information created in connection with research projects is a meaningful part of the university archives and must be archived. It is important to preserve information that allows others to understand what occurred during the research project, and thus how the material should be interpreted.

Delete – Purge (gallra) – Preserve/Archive

After a research project is completed, it is important to delete and purge documents, and to collect and arrange what is to be preserved and archived. Working documents (arbetshandlingar) can be deleted. Public documents (allmänna handlingar) of incidental or limited importance can be purged (gallras), but only if the regulations provide support for it. If you are unsure which types of documents you can delete or purge, please contact the central archive function at Stockholm University at arkivet@su.se.

Research material to Preserve/Archive

Administrative documents, e.g.

Applications and funding decisions, including rejections.
Disclosure of information subject to reservation, OSL Chapter 10 Section 14.
Transfer of confidentiality, OSL Chapter 11 Section 3.
Ethical vetting, permit and disclosure of information subject to reservation.
Data Management Plan.

Economic documents, e.g.

Financial interim and final reports to the financier.

Research data, e.g.

Research data (raw and processed).
Consent forms and information about processing of personal data.
Code keys, variable list or codebook explaining the variables in your data.
Methodology descriptions.
Templates and forms.
Software, computer code used to perform analyses.
In addition, all information needed to understand the material.

One advantage of early upload (“posting”) to a repository, e.g. Figshare, is that you prepare for automatic archiving (= long-term preservation), meaning you will not have to upload the same metadata and/or data files again for local archiving.

Publications and other research outputs, e.g.

Theses, interim and final reports, papers, presentations.

Research material should be submitted to the Archives and Records Office.