Version: 1.0.0

FAIR Principles

The FAIR Principles, published in 2016, are widely considered the gold standard for making data optimally reusable by humans and by machines such as AI/ML models. They have been widely promoted and adopted across research fields worldwide by stakeholders including research communities, scientific publishers, and funders. We believe that for data to be AI-ready, it needs to align with the FAIR Principles. Below we assess how this version of the AI-READI dataset is FAIR, i.e., how it complies with each of the FAIR Principles.

Each entry below gives, in order: the FAIR Principle, our interpretation of it, and the compliance of the AI-READI dataset with it.

F1. (Meta)data are assigned a globally unique and persistent identifier.

This principle can be fulfilled by sharing data and metadata on a repository that provides a digital object identifier (DOI) or another similar globally unique and persistent identifier for your dataset. Globally unique means that the identifier is guaranteed to refer unambiguously to exactly one resource in the world. Persistent means that the identifier is never reused in another context and continues to identify the same resource, even if that resource moves or no longer exists.

The AI-READI dataset is shared on fairhub.io, which provides a DOI for the dataset (https://doi.org/10.60775/fairhub.1).
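As an illustration of how a DOI maps to a resolvable link, the following minimal Python sketch normalizes a DOI given in common forms and builds its https://doi.org/ URL. The function names are hypothetical, and the regular expression is a common heuristic for DOI-like strings rather than the formal DOI syntax.

```python
import re

# Heuristic pattern for DOI-like strings (an assumption, not the formal DOI syntax).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(value: str) -> str:
    """Return the bare DOI, stripping a resolver URL or 'doi:' prefix if present."""
    doi = value.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI-like string: {value!r}")
    return doi


def doi_url(value: str) -> str:
    """Build the resolver URL for a DOI given in any of the accepted forms."""
    return "https://doi.org/" + normalize_doi(value)


if __name__ == "__main__":
    print(doi_url("doi:10.60775/fairhub.1"))
```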

F2. Data are described with rich metadata (defined by R1 below).

While other principles speak to the specific kinds of metadata that should be included, principle F2 simply says that a digital resource that is not well-described cannot be accurately discovered. Thus, this principle encourages data providers to consider the various facets of search that might be employed by a user of their data, and to support those users in their discovery of the resource. This principle can be fulfilled by sharing data in a repository that requires metadata. Including metadata directly in the data files (automatically or manually) can also help fulfill this principle. Additionally, including standalone metadata files (along with your data files) can help with fulfilling this principle as well.

Data files in the AI-READI dataset contain embedded metadata (e.g., headers in the DICOM files). This is described for each data type in the dataset documentation at docs.aireadi.org.

Additionally, the AI-READI dataset is structured according to the Clinical Dataset Structure (CDS), which prescribes several metadata files (README.md, Healthsheet.md, dataset_description.json, etc.). More details are available in the CDS specification documentation.

Finally, the AI-READI dataset is shared on fairhub.io, where metadata from the CDS-prescribed metadata files is embedded in the dataset's landing page following the schema.org schema to enable discovery through search engines.
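To illustrate how schema.org metadata embedded in a landing page can be read by a machine, here is a minimal Python sketch that extracts JSON-LD blocks from HTML using only the standard library. It assumes the metadata is embedded as JSON-LD in a script tag, which is a common convention; the sample HTML and its values are made up for illustration and are not FAIRhub's actual markup.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD block found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical landing-page fragment, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example dataset",
 "identifier": "https://doi.org/10.60775/fairhub.1"}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    for block in extract_jsonld(SAMPLE_HTML):
        print(block["@type"], block["identifier"])
```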

F3. Metadata clearly and explicitly include the identifier of the data they describe.

Sharing data on a suitable repository that issues a DOI (F1) will typically fulfill this principle, since the repository usually embeds the DOI in the metadata it stores. Additionally, the DOI can be included in any of the metadata files that accompany the data files (F2).

The DOI of the AI-READI dataset is included in the various metadata files prescribed by the CDS. The DOI is also included in the metadata embedded in the landing page of the dataset.
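A simple way to check this principle for a machine-readable metadata file is to search the parsed content for the DOI. The sketch below walks a nested JSON-like structure and reports whether any string value contains the DOI. The keys in the sample dictionary are hypothetical and do not reflect the actual CDS field names.

```python
def contains_value(node, target: str) -> bool:
    """Return True if any string in a nested dict/list structure contains target.

    The comparison is case-insensitive because DOIs are case-insensitive.
    """
    if isinstance(node, dict):
        return any(contains_value(value, target) for value in node.values())
    if isinstance(node, list):
        return any(contains_value(item, target) for item in node)
    return isinstance(node, str) and target.lower() in node.lower()


# Hypothetical metadata content, for illustration only.
sample_metadata = {
    "title": "Example dataset",
    "identifiers": [{"type": "DOI", "value": "10.60775/fairhub.1"}],
}

if __name__ == "__main__":
    print(contains_value(sample_metadata, "10.60775/fairhub.1"))
```

In practice the dictionary would come from loading a file such as dataset_description.json with `json.load`.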

F4. (Meta)data are registered or indexed in a searchable resource.

This principle can be fulfilled by sharing data on a repository that requires metadata (F2) and indexes that metadata so that it is searchable. It is advisable to check which schema or format the repository follows to make the data discoverable locally (through the repository's search feature) and globally (e.g., in Google Search).

The AI-READI dataset is shared through FAIRhub, which embeds the dataset metadata in the dataset's landing page using the schema.org schema, so the dataset is searchable globally through search engines. The dataset is also searchable within FAIRhub itself.

A1. (Meta)data are retrievable by their identifier using a standardised communications protocol.

Sharing data and metadata on a repository that issues a DOI (resolvable over HTTP) or another similar identifier (F1) will typically fulfill this principle.

The AI-READI dataset can be retrieved by its DOI using HTTP, which is a standardized protocol.
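As a sketch of what retrieval by identifier over HTTP can look like, the Python example below builds a request to the DOI resolver and asks for machine-readable metadata through HTTP content negotiation. The function names are hypothetical. The media type shown is the one DataCite documents for its JSON format; whether a particular DOI returns it depends on the registration agency, so treat that as an assumption to verify.

```python
import json
import urllib.request

# Media type for DataCite JSON via DOI content negotiation (assumed to be supported).
DATACITE_JSON = "application/vnd.datacite.datacite+json"


def build_metadata_request(doi: str, media_type: str = DATACITE_JSON) -> urllib.request.Request:
    """Build (but do not send) a GET request for a DOI's metadata."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": media_type},
        method="GET",
    )


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Send the request and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_metadata_request("10.60775/fairhub.1")
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Without the Accept header, the same URL simply redirects a browser to the dataset's landing page.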

A1.1. The protocol is open, free, and universally implementable.

Sharing data and metadata on a repository that issues a DOI (resolvable over HTTP) or another similar identifier (F1) will typically fulfill this principle, since HTTP is an open, free, and universally implementable protocol.

The AI-READI dataset can be retrieved by its DOI using HTTP, which is an open, free, and universally implementable protocol.

A1.2. The protocol allows for an authentication and authorisation procedure, where necessary.

Sharing data and metadata on a repository that handles any required authentication and authorization procedure can fulfill this principle.

The AI-READI dataset is shared on the FAIRhub data repository, which includes a protocol for accessing the dataset.

A2. Metadata are accessible, even when the data are no longer available.

This principle is fulfilled if the dataset is shared on a repository that issues a DOI, since obtaining a DOI requires registering the metadata with a DOI-issuing organization (e.g., DataCite, Crossref), and the metadata in principle remains accessible through that organization.

Alternatively, sharing data and metadata on a repository that keeps the metadata accessible even if the data is no longer available (for any reason) can also help fulfill this principle.

The AI-READI dataset is shared on FAIRhub, which registers a DOI for the dataset by sending key metadata elements of the dataset to DataCite. This metadata will remain accessible through DataCite's registry even if the dataset itself is no longer available on FAIRhub.

I1. (Meta)data use a formal, accessible, shared and broadly applicable language for knowledge representation.

To fulfill this principle, make data available in file formats and/or schemas that are readable by both humans and machines and are standard for the corresponding data type.

Similarly, if metadata is provided in standalone files, ensure that they are in file formats and/or schemas readable by both humans and machines and are standard for the corresponding metadata.

The data files in the AI-READI dataset use a formal, accessible, shared, and broadly applicable language for knowledge representation appropriate to each data type (e.g., DICOM for images). This is described in the AI-READI dataset documentation at docs.aireadi.org.

The AI-READI dataset includes the metadata files prescribed by the CDS. Some of these, such as README.md and Healthsheet.md, are readable by humans and follow the standard format for such files. The same information is also included in machine-friendly form in the dataset_description.json and study_description.json metadata files. All of these follow the accepted format for such knowledge representation. More details are provided in the CDS specification documentation.

I2. (Meta)data use vocabularies that follow FAIR principles.

To fulfill this principle, ensure that the vocabularies used in the metadata are themselves FAIR, i.e. the vocabulary is controlled, well-documented in a suitable format, and resolvable through a standard protocol using a globally unique and persistent identifier. Ontologies defined in the “Web Ontology Language” (OWL) and shared via a publicly accessible registry (e.g. BioPortal for life science ontologies) are examples of formally represented, accessible, mapped, and shared knowledge representations in a broadly applicable language for knowledge representation, that are also compliant with the Findability requirements of FAIR, since BioPortal provides a machine-accessible search interface.

The AI-READI dataset includes machine-friendly metadata files prescribed by the CDS. These files use vocabularies that follow the FAIR principles, as they build on widely used schemas such as the DataCite schema and the ClinicalTrials.gov schema.

I3. (Meta)data include qualified references to other (meta)data.

To fulfill this principle, include in the metadata qualified references (i.e., references that explain the nature of the relationship) to external resources (datasets, software, documentation, etc.) that are needed to use, understand, or reproduce the data.

The AI-READI dataset includes a dataset_description.json metadata file, as per the CDS, which has a field for qualified references to other (meta)data. More details are provided in the CDS specification documentation.
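To make the idea of a qualified reference concrete, the sketch below models one in Python as an identifier plus a relation type that states how the referenced resource relates to the dataset. The relation types listed are a small subset taken from the DataCite Metadata Schema. The class and attribute names are hypothetical and do not reflect the actual field names in dataset_description.json, and the https:// prefix on the documentation address is assumed.

```python
from dataclasses import asdict, dataclass

# A few relation types from the DataCite Metadata Schema (not the full list).
KNOWN_RELATION_TYPES = {
    "IsDocumentedBy",
    "IsSupplementTo",
    "IsDerivedFrom",
    "References",
    "Cites",
}


@dataclass(frozen=True)
class QualifiedReference:
    """A reference that states how the target relates to the dataset (hypothetical model)."""

    identifier: str
    identifier_type: str  # e.g. "DOI" or "URL"
    relation_type: str

    def __post_init__(self):
        # An unqualified or unknown relation defeats the purpose of I3.
        if self.relation_type not in KNOWN_RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.relation_type!r}")


if __name__ == "__main__":
    ref = QualifiedReference("https://docs.aireadi.org", "URL", "IsDocumentedBy")
    print(asdict(ref))
```

A bare link would say only that two resources are connected; the relation type says that one documents the other.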

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes.

To fulfill this principle, provide not only metadata required for discovery of the data but also for reusing the data and for understanding the context associated with the data. Some points to take into consideration (non-exhaustive list):

  • Describe the scope of your data: for what purpose was it generated or collected?
  • Mention any particularities or limitations about the data that other users should be aware of.
  • Specify the date of generation/collection of the data, the lab conditions, who prepared the data, the parameter settings, the name and version of the software used.
  • Is it raw or processed data?
  • Ensure that all variable names are explained or self-explanatory (i.e., defined in the research field's controlled vocabulary).
  • Clearly specify and document the version of the archived and/or reused data.

The AI-READI dataset is richly described with a plurality of accurate and relevant attributes through the various metadata provided. This is detailed in the description provided for Principle F2.

R1.1. (Meta)data are released with a clear and accessible data usage license.

To fulfill this principle, clearly provide the license terms, i.e., the conditions under which the data can be used, in human- and machine-readable form. Typically, sharing the data on a suitable repository (F1) should fulfill this principle, as most repositories include the license as part of the repository metadata.

The AI-READI dataset is released under a custom license which is clearly described in a LICENSE.txt file included in the dataset. Additionally, the license is also mentioned in various metadata associated with the dataset, including its documentation available at docs.aireadi.org and the dataset's landing page on FAIRhub.

R1.2. (Meta)data are associated with detailed provenance.

To fulfill this principle, provide provenance information in your metadata, including where the data came from (i.e., a clear story of origin and history, see R1), who to cite, and how you wish to be acknowledged. Include a description of the workflow that led to your data: Who generated or collected it? How has it been processed? Has it been published before? Does it contain data from someone else that you may have transformed or completed? What funding or resources supported it? Who owns the data?

The AI-READI dataset includes various metadata files prescribed by the CDS, such as README.md, Healthsheet.md, dataset_description.json, and study_description.json, all of which include detailed provenance information. The same provenance metadata is also included in the dataset's landing page and in the dataset documentation available at docs.aireadi.org.

R1.3. (Meta)data meet domain-relevant community standards.

To fulfill this principle do the following as applicable:

  • Organize data files in a standard directory structure
  • Name files and directories following a consistent naming convention
  • Provide data and metadata in domain-relevant formats (even if it means duplicating information provided for F2 and R1.2). For the metadata, follow any community-agreed minimal information requirements for your data type. For a list of such standards, consult FAIRsharing

The AI-READI dataset is structured according to the CDS, which imposes a standard folder structure, naming convention for directories, and metadata files that follow community standards. More details are provided in the CDS specification documentation.
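As a rough illustration of how a standard structure makes a dataset easy to check automatically, the Python sketch below reports which expected metadata files are missing from a dataset folder. The file names are those mentioned on this page; the assumption that they sit at the top level of the dataset, and the function name, are ours. The CDS specification remains the authoritative list of required files and their locations.

```python
from pathlib import Path

# Metadata files named on this page; their top-level location is an assumption.
EXPECTED_METADATA_FILES = (
    "README.md",
    "Healthsheet.md",
    "dataset_description.json",
    "study_description.json",
    "LICENSE.txt",
)


def missing_metadata_files(dataset_root, expected=EXPECTED_METADATA_FILES) -> list[str]:
    """Return the expected file names that are not present under dataset_root."""
    root = Path(dataset_root)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Checks the current directory; pass the dataset folder in real use.
    missing = missing_metadata_files(".")
    print("missing:", missing if missing else "none")
```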

All the data files in the AI-READI dataset are in formats that meet domain-relevant community standards (e.g., DICOM for images). This is described in detail in the dataset documentation available at docs.aireadi.org.
