
Publish in a data repository

There are major advantages to publishing research data formally in a data repository rather than sharing it informally on a web page, between colleagues, or as an appendix to a published article. Research data published in a data repository is a publication in its own right. It is:

  • described and documented in an interpretable and standardized way, so that it can be correctly understood and reused.
  • given a persistent identifier (PID, usually a digital object identifier, DOI), so that it can be persistently found, retrieved and linked to other publications.
  • given the license you deem appropriate (for instance CC BY), to let others know what they are allowed to do with your research data and how they should cite it.

In all, a data repository helps you make your research data more FAIR: Findable, Accessible, Interoperable and Reusable. This makes the data easier to find, download, understand, handle and reuse, and later makes it easier to archive your entire research project.

Publish open access and in line with the FAIR principles

Select data repository

Investigate what data repositories are commonly used in your field and are appropriate for your research data. The metadata fields in domain-specific data repositories can be more detailed and use domain-specific vocabularies that improve the description of the material. General-purpose data repositories can have greater cross-disciplinary reach.

The registry Re3data lists data repositories and can help you choose a suitable one.

Re3data, a registry of data repositories

Data repositories curated by Stockholm University

Currently, Stockholm University offers curation and support when you publish in the following data repositories:

During curation, the Research Data Management Team reviews your metadata and suggests improvements, to make your research data more FAIR and to enable automatic archiving of the published material. By following the checklist below, you anticipate the team's suggestions and make your data publishing process smoother.

Checklist for publishing

Once you have chosen a data repository for your research data (or for the metadata only, if the research data cannot be published open access), follow these steps to make the research data as open and FAIR as possible upon publication.

  1. Metadata. Fill out all relevant metadata fields in the web form as completely as possible. Adding a well-structured README text file (.txt) helps future users understand the research data.
  2. File names. You improve the accessibility and sustainability of your research data if you name your files wisely before publishing and preservation. Decide on a consistent structure for naming your files. File names should be informative and descriptive, so that they are findable and understandable in a cross-disciplinary setting, and ought to include a date stamp. File names must not contain forbidden characters or white space: the only permitted character set is A-Za-z0-9_-. Preferably use a dot (.) only once, to separate the file extension. These recommendations improve machine-readability and findability. DataCarpentry, DataOne, Dryad and Stanford offer guides to best practices in file naming.
  3. File formats. You improve the accessibility and sustainability of your research data if you save it in common, open file formats before publishing and preservation. This makes the research data accessible to more users and for longer. The Swedish National Data Service offers more information about the file formats best suited for long-term preservation and accessibility. When proprietary formats offer important functionality and layout options (e.g. an Excel workbook with several sheets, embedded diagrams or images), you should of course publish and preserve the research data in that format, but please consider also adding a version in an open, non-proprietary file format. Describe the file formats used as accurately as possible, including the software (if possible with version) used to produce them and the preferred software needed to open them. This is particularly important when the dataset contains .zip or .tar archives holding several different file formats.
  4. Variables (column headings). Are your variables understandable and possible to interpret correctly, even by yourself in 5-10 years' time, or by someone from another discipline? Is the unit of measurement clearly noted for every variable? Is additional documentation needed, such as a README text file (.txt) or a separate codebook, to provide these details?
  5. Standards and authorities. If there are standards or authorities (vocabularies, ontologies or other) that help describe and interpret your research data, link to these in both metadata and the actual data files. Authorities improve machine-readability and help make your research data more FAIR. Learn more about standards that enrich cultural heritage research from the Swedish National Heritage Board.
  6. References. Please check that all links work properly.
  7. Publication. Give the full reference, including DOI(s), to the publication(s) based on the research data you are about to publish. If the DOI is unknown, e.g. because the article has not yet been accepted for publication, a “dummy” entry can be made and amended later. Metadata can always be amended, even after a dataset has been published. Changes to data files or file names, however, result in a new version of the dataset (with a new DOI).
  8. ORCID. Connect your ORCID iD to your personal account in the data repository. If you do not have an ORCID iD, you can register one and associate it with your university account.
  9. Affiliation. State your affiliation correctly in the metadata. If you have to type it, copy and paste the name of your department/institution from these lists: English/Swedish.
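The file-naming rules in item 2 can be checked automatically before upload. The Python sketch below is only an illustration, not a tool provided by the university. It assumes that the date stamp is written as YYYY-MM-DD or YYYYMMDD, and it flags any name with more than one dot. The function names `check_filename` and `has_valid_date` are hypothetical.

```python
"""Check file names against the naming recommendations in the checklist.

Sketch only: assumes the date stamp is written as YYYY-MM-DD or YYYYMMDD.
Adapt DATE_STAMP to the convention your project has chosen.
"""
import re
from datetime import datetime

# Stem may contain only A-Za-z0-9, '_' and '-'; one optional dot before the extension.
NAME = re.compile(r"(?P<stem>[A-Za-z0-9_-]+)(?:\.(?P<ext>[A-Za-z0-9]+))?")
# Candidate date stamps; each candidate is validated as a real calendar date below.
DATE_STAMP = re.compile(r"\d{4}-\d{2}-\d{2}|\d{8}")


def has_valid_date(stem):
    """Return True if the stem contains at least one real calendar date."""
    for match in DATE_STAMP.finditer(stem):
        try:
            datetime.strptime(match.group().replace("-", ""), "%Y%m%d")
            return True
        except ValueError:
            continue  # e.g. 20231131 is not a real date
    return False


def check_filename(filename):
    """Return a list of problems; an empty list means the name passes."""
    problems = []
    match = NAME.fullmatch(filename)
    if match is None:
        problems.append("use only A-Za-z0-9, '_' and '-', plus one dot before the extension")
        stem = filename.rsplit(".", 1)[0]
    else:
        stem = match.group("stem")
    if not has_valid_date(stem):
        problems.append("no valid date stamp (YYYY-MM-DD or YYYYMMDD)")
    return problems


if __name__ == "__main__":
    for name in ["survey_2024-03-01_responses.csv", "survey responses.v2.csv"]:
        print(name, check_filename(name) or "OK")
```

The first example name passes. The second is flagged twice, for its white space and extra dot and for its missing date stamp.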

Recommended file formats, SND
Best practices for folder structure and file naming, SND
Webinar 'Enriching Metadata - Enriching Research', the Swedish National Heritage Board
Register an ORCID
How to state affiliation (copy & paste): English/Swedish
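It is easy to mistype an ORCID iD when you enter it in a metadata form (checklist item 8). The last character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm, so a quick local check can catch most typing errors. The Python sketch below assumes the iD is given in the hyphenated form 0000-0000-0000-000X, without the https://orcid.org/ prefix. The function names are hypothetical.

```python
"""Check the format and check character of an ORCID iD (sketch)."""
import re


def orcid_check_char(base_digits):
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if orcid looks like 0000-0000-0000-000X and its check character matches."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_char(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # True
    print(is_valid_orcid("0000-0002-1825-0098"))  # False: wrong check character
```

A passing check only shows that the iD is well formed. It does not show that the iD belongs to you, so still copy it from your ORCID record.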

Advice for publishing software

When publishing software, it is useful to follow the advice below to make the software as FAIR as possible.

  1. Clearly describe, in the metadata and the README file, the programming language(s) of your scripts (e.g. C#, Go, JavaScript, Python, R), including the version where applicable.
  2. Do not bundle the README file with the scripts (or data files) in a zip file. Keep it separate (as .txt or Markdown .md) so that it is displayed directly in the repository interface, allowing (re)users to evaluate the content without first downloading the whole package.
  3. Place a brief explanatory comment at the start of every program [and possibly its version history], including a good example of how the program is used. [1]
  4. Decompose programs into functions, i.e. small, reusable sections of code. Name functions, list their input parameters, and describe what information they produce. Functions make it easier to test and troubleshoot when things go wrong. [1]
  5. Avoid duplication. Write and re-use functions instead of copying and pasting code, and use data structures like lists instead of creating many closely-related variables, e.g. create "score = (1, 2, 3)" rather than "score1", "score2", and "score3". [1]
  6. Document software dependencies and requirements explicitly (for example, required libraries and their versions), so that others can obtain and install them. [1,2]
  7. Provide a simple example or test data set that users (including yourself) can run to determine whether the program is working and whether it gives a known correct output for a simple known input. [1]
  8. Submit code/scripts to a reputable DOI-issuing repository, just as you do with data. Your software is as much a product of your research as your papers, and should be as easy for people to credit. DOIs for software are provided e.g. by Figshare and Zenodo, both of which integrate with GitHub. [1] For software code/scripts specifically related to climate research, the Bolin Centre at Stockholm University has a local GitLab instance that issues DOIs on demand for fixed releases of submitted software. See the Bolin Centre support site for information and help.
  9. We encourage all software produced in research projects to be published under an open source license. Examples are listed at https://spdx.org/licenses/ [3]
  10. To benefit fully from tab completion, give all variables, directories and files unique names with distinct beginnings, so that no name forms the beginning of another name in the same context. For directory and file names, use only the restricted character set [A-Za-z0-9_.-], with no white space.
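Several of the points above can be combined in one short script: a header comment with a usage example and version history (3), small named functions (4), a list instead of numbered variables (5), stated dependencies (6), a built-in test with known input and output (7) and names with distinct beginnings (10). The Python script below is a hypothetical example written for illustration. The file name check_names.py and all function names are assumptions.

```python
"""check_names.py: flag file and directory names that hinder tab completion.

Hypothetical example script illustrating the advice above.
A name passes if it uses only A-Za-z0-9, '-', '_' and '.', and no other
name in the same list begins with it.

Requires: Python 3, standard library only.

Usage:
    python check_names.py results score score_2020
Running without arguments executes the built-in self-test.

Version history:
    0.1  first version
"""
import re
import sys

ALLOWED = re.compile(r"[A-Za-z0-9_.-]+")


def invalid_names(names):
    """Return the names that contain characters outside the allowed set."""
    return [name for name in names if not ALLOWED.fullmatch(name)]


def prefix_clashes(names):
    """Return (shorter, longer) pairs where one name is the beginning of another."""
    ordered = sorted(set(names))  # a prefix always sorts before its extensions
    clashes = []
    for i, shorter in enumerate(ordered):
        for longer in ordered[i + 1:]:
            if longer.startswith(shorter):
                clashes.append((shorter, longer))
    return clashes


def self_test():
    """Known input with known correct output."""
    names = ["score", "score_2020", "results", "bad name.txt"]
    assert invalid_names(names) == ["bad name.txt"]
    assert prefix_clashes(names) == [("score", "score_2020")]


def main(names):
    """Print problems for the given names and return how many were found.

    Runs the self-test instead if no names are given.
    """
    if not names:
        self_test()
        print("self-test passed")
        return 0
    problems = [f"disallowed characters: {n!r}" for n in invalid_names(names)]
    problems += [f"{a!r} is the beginning of {b!r}" for a, b in prefix_clashes(names)]
    for problem in problems:
        print(problem)
    return len(problems)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running `python check_names.py results score score_2020` reports that 'score' is the beginning of 'score_2020'. Renaming the shorter file to, for example, score_all would restore unambiguous tab completion.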

[1] Wilson et al. (2017): Good enough practices in scientific computing. 
[2] Lamprecht et al. (2020): Towards FAIR principles for research software.
[3] Akhmerov et al. (2019): Raising the Profile of Research Software.

Certain data cannot be published open access

Research data containing personal data or sensitive personal data, data protected by secrecy under the Public Access to Information and Secrecy Act (2009:400), or data limited by proprietary rights or copyright must not be published open access.

However, research data that cannot be published open access should be made as open as possible and as closed as necessary. You publish the metadata, a description of the research data, in a data repository with open access, but keep the actual data files in secure data storage (for example SunetDrive/NextCloud). The data files are made accessible only upon request, to applicants who, where applicable, have gone through an ethical review and a secrecy assessment. The process for publishing metadata only is identical to that for publishing research data; the only difference is that you do not upload your data files to the data repository. Truly anonymized personal data, i.e. data that can no longer be linked to an individual, can be published open access.

Contact the Research Data Management Team for advice on how to handle your research data and make it as open as possible and as closed as necessary.

How to manage research material with personal data, SND
Stockholm University's secure data storage, SunetDrive/NextCloud
Documents, public documents (only in Swedish)
Secrecy regulations at Stockholm University (only in Swedish)

Contact

Research Data Management Team
For questions on research data management, publication and preservation.
E-mail: opendata@su.se