Research data and data management plan (DMP)


What is research data?

Research data are data collected and compiled as part of the research process that the scientific community considers necessary to evaluate the results of scientific research.

 Research data can be:

  •  numerical data (e.g., measurements)
  • text documents, notes
  • questionnaires, surveys, survey results
  • audio and video recordings, images
  • contents of databases
  • mathematical models, algorithms
  • software (scripts, input files)
  • results of computer simulations
  • laboratory protocols
  • methodological descriptions
  • samples, artefacts, objects

 

The data should be as open as possible and as closed as necessary.

Sharing data allows other scientists to reuse research results and supports the creation of new science that builds on previous findings, increasing the efficiency of the research process. Data sharing also promotes transparency and reproducibility, building trust in science.

Data should be made available as soon as possible. The National Science Centre (NCN) requires data to be made available no later than when the research results are published.

 

Data Management Plan

A Data Management Plan (DMP) is a document that describes the data a researcher intends to acquire or generate during a research project, how those data will be managed, described, analyzed, and stored, and the mechanisms that will be used at the end of the project to share and archive the data.

 

What should the plan contain?

  • What data will be produced or collected? (types of data, file formats, estimated volume of data)
  • How will the data be acquired or produced? (standards, methods, software, tools)
  • How will they be structured and described? (metadata, documentation)
  • Ethical and legal issues (data protection issues, classified data, etc.)
  • How will the data be shared? (how, when, to whom)
  • Which data will be stored on a long-term basis? (where, for how long)
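
The questions above can also be kept as a simple checklist while a plan is being drafted. The sketch below is only an illustration: the section keys and the sample answers are hypothetical and do not come from any funder's template.

```python
# Hypothetical section keys; the questions are the ones listed above.
DMP_SECTIONS = {
    "data_description": "What data will be produced or collected?",
    "data_acquisition": "How will the data be acquired or produced?",
    "documentation": "How will the data be structured and described?",
    "ethical_legal": "Which ethical and legal issues apply?",
    "sharing": "How will the data be shared?",
    "long_term_storage": "Which data will be stored on a long-term basis?",
}


def unanswered_sections(draft: dict) -> list:
    """Return the keys of sections that are missing or blank in the draft."""
    return [key for key in DMP_SECTIONS if not draft.get(key, "").strip()]


# Made-up draft answers, used only to show the check.
draft = {
    "data_description": "Sensor measurements in CSV, roughly 2 GB.",
    "sharing": "Deposited in a repository when the results are published.",
}

for key in unanswered_sections(draft):
    print("Still to answer:", DMP_SECTIONS[key])
```

A check like this only shows which sections are still empty; it says nothing about whether the answers satisfy a funder's requirements.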

 

Examples of repository locations

  • The Zenodo Repository (https://zenodo.org/) is a multidisciplinary repository developed under the European OpenAIRE programme and operated by CERN. It allows the deposit of scientific articles, datasets, research software, reports and any other digital artefacts related to research.

 

  • RepOD Open Research Data Repository (https://repod.icm.edu.pl/) is a Polish repository of open research data, created for researchers and institutions interested in sharing their resources. Each deposited dataset is assigned a DOI. The repository is run by the Open Science Platform team of the Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw.

 

Search engine for trusted repositories

Trustworthy repositories are listed in the Registry of Research Data Repositories (re3data) at https://www.re3data.org/.

Repositories meeting the highest criteria can also apply for CoreTrustSeal certification; certified repositories can be found at https://amt.coretrustseal.org/certificates.

 

 Recommended file formats

When choosing a storage format for research data, it is advisable to limit yourself to so-called open formats, i.e. formats that can be opened using free software.

Selection of file formats for archiving:

Data type       | Recommended             | Not recommended
Tables          | CSV, TSV, SPSS portable | XLS, XLSX
Texts           | HTML, RTF, PDF          | DOC, DOCX
Media           | MP4, FLAC               | QTFF
Images          | TIFF, JPEG2000, PNG     | GIF, JPG
Structured data | XML, RDF                | RDBMS
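
Before depositing, the table above can be applied to a list of files automatically. The following is a minimal sketch; the mapping from format names to file extensions (for example .por for SPSS portable, .jp2 for JPEG2000, .mov for QTFF) is an assumption made for this example, and RDBMS is left out because it is not a single file format.

```python
from pathlib import Path

# Extensions assumed to correspond to the formats in the table above.
RECOMMENDED = {
    ".csv", ".tsv", ".por",            # tables
    ".html", ".rtf", ".pdf",           # texts
    ".mp4", ".flac",                   # media
    ".tif", ".tiff", ".jp2", ".png",   # images
    ".xml", ".rdf",                    # structured data
}
NOT_RECOMMENDED = {".xls", ".xlsx", ".doc", ".docx", ".mov", ".gif", ".jpg", ".jpeg"}


def archiving_status(filename: str) -> str:
    """Classify a file by its extension, ignoring letter case."""
    suffix = Path(filename).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in NOT_RECOMMENDED:
        return "not recommended"
    return "unknown"


for name in ["results.CSV", "survey.xlsx", "notes.txt"]:
    print(name, "->", archiving_status(name))
```

The check looks only at the file extension, so it cannot tell whether the content of a file really is in the stated format.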

 

File Naming

File names should be short but descriptive, without special characters or spaces.

The name should include the project name, the date the data were collected or processed (a good date format is YYYYMMDD), and the file identifier or version.
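
The naming rule above can be turned into a small helper. This is a sketch under stated assumptions: the function name, the underscore separator, and the two-digit "v" version suffix are choices made for this example, not requirements from the guidelines.

```python
import re
from datetime import date


def build_file_name(project: str, collected: date, version: int, extension: str) -> str:
    """Build a name of the form Project_YYYYMMDD_vNN.ext."""
    # Keep only ASCII letters and digits, which removes spaces and special characters.
    clean_project = re.sub(r"[^A-Za-z0-9]", "", project)
    return f"{clean_project}_{collected:%Y%m%d}_v{version:02d}.{extension.lstrip('.')}"


print(build_file_name("Soil Moisture", date(2024, 3, 5), 2, "csv"))
# SoilMoisture_20240305_v02.csv
```

Writing the date as YYYYMMDD means that files sorted by name are also sorted chronologically.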


What are metadata?

Metadata are data about data. They summarise basic information about a particular research dataset. Typical metadata elements include:

  •  Title
  • Source of data
  • Description of the data
  • Creator or author of the data
  • Format
  • Language
  • Methods of data production
  • Purpose of data creation
  • Date of creation
  • License
  • Funding statement
  • File size
  • Quality of data
  • Process used to create the data
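
A metadata record built from the elements above can be stored next to the data as a small machine-readable file. The sketch below is illustrative only: the snake_case field names are an assumption of this example rather than a formal metadata standard, and all values are made-up placeholders.

```python
import json

# Field names mirror the list above; the exact keys are hypothetical.
METADATA_FIELDS = [
    "title", "source", "description", "creator", "format", "language",
    "production_methods", "purpose", "date_created", "license",
    "funding_statement", "file_size", "data_quality", "creation_process",
]


def missing_fields(record: dict) -> list:
    """Return the listed fields that are absent or empty in the record."""
    return [field for field in METADATA_FIELDS if not record.get(field)]


# Placeholder values, not a real dataset.
record = {
    "title": "Example dataset",
    "creator": "Example Author",
    "format": "CSV",
    "language": "en",
    "date_created": "20240305",
    "license": "CC BY 4.0",
}

print(json.dumps(record, indent=2))
print("Not yet filled in:", missing_fields(record))
```

In practice, a repository or discipline usually prescribes its own metadata schema, which should take precedence over an ad hoc structure like this one.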

 

FAIR principles - essential guidelines for shared data

  • Findable - so that the data are easy to find
    • The dataset is provided with metadata that make it possible to find it.
    • A unique identifier (e.g., DOI) is assigned to the dataset.
    • The metadata are indexed in publicly accessible databases that make them searchable.

 

  • Accessible - to be available to all
    •  Metadata are always available, even if the dataset itself has already been deleted or moved.
    • Access to the dataset, or at least to the metadata, is possible directly via a unique identifier and does not require any additional tools or software.

 

 

  • Interoperable - so that it can be linked to other data
    •  Datasets and the metadata describing them contain links to other related datasets.
    • The data and metadata are provided in a format that ensures easy reading and processing by both humans and computers.

 

  • Reusable - so that they can be reused
    • Metadata contains numerous properties that accurately describe the dataset and enable users to determine its usefulness for their own research.
    • The dataset contains a license which clearly defines the conditions for reuse and processing of the data.
    • Metadata clearly identifies the author and place of creation of the data.
    • Metadata is structured according to generally accepted standards that are specific to the discipline and type of data.

 

 Sources:

 Dane badawcze w pigułce. Poradnik

Szuflita-Żurawska M., Plan Zarządzania Danymi

Milewska A., Zarządzanie danymi badawczymi

 

  Contact for depositing research data and preparing Data Management Plans

 

Lukasz Jeszke, MA

Library

Scientific Information Department

phone 61 665 3521, e-mail: lukasz.jeszke@put.poznan.pl