Schemas, Ontologies & Vocabularies
- Pachl, C.; Frank, N.; Breitbart, J.; Bräse, S. Overview of Chemical Ontologies. arXiv:2002.03842 [cs] 2020.
- The Trouble with Ontologies, or, How to Build an Ontology
- Cox SJD, Gonzalez-Beltran AN, Magagna B, Marinescu MC (2021) Ten simple rules for making a vocabulary FAIR. PLOS Computational Biology 17(6)
- Linked Data Modeling Language (LinkML): Define schemas as YAML files and generate JSON-Schema, RDF, OWL, GraphQL, and Python dataclasses
- Versioning of data schemas: Translate your data with lenses: Blog post that discusses the challenges of evolving data schemas and potential solutions for dealing with them.
- Basic tabular data annotation from Frictionless Data
- rdf-tabular. Also see this talk
- Research Object Crates, a schema.org-based container specification for the serialization of research data.
See also the Semantic Python Overview list.
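
To make the tabular data annotation entry above more concrete, here is a minimal sketch of a Table Schema-style descriptor (the `fields` / `name` / `type` / `description` / `primaryKey` layout used by Frictionless Data) together with a small hand-written row check. The column names, the example rows, and the `check_row` helper are hypothetical and exist only for illustration; the sketch does not use the Frictionless library and covers only two field types.

```python
# Table Schema-style descriptor. Column names are hypothetical examples.
table_schema = {
    "fields": [
        {"name": "sample_id", "type": "string",
         "description": "Internal sample identifier"},
        {"name": "temperature", "type": "number",
         "description": "Measurement temperature in kelvin"},
        {"name": "yield_percent", "type": "number",
         "description": "Isolated yield in percent"},
    ],
    "primaryKey": "sample_id",
}

# Mapping from the two descriptor types used here to Python types.
PYTHON_TYPES = {"string": str, "number": (int, float)}


def check_row(row, schema):
    """Return a list of problems found in one row (empty if none).

    Hypothetical helper: it only checks that each declared column is
    present and has the declared basic type.
    """
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        if name not in row:
            problems.append(f"missing column: {name}")
        elif not isinstance(row[name], PYTHON_TYPES[field["type"]]):
            problems.append(f"wrong type for {name}: expected {field['type']}")
    return problems


good_row = {"sample_id": "S-001", "temperature": 298.15, "yield_percent": 87}
bad_row = {"sample_id": "S-002", "temperature": "298 K"}

print(check_row(good_row, table_schema))
print(check_row(bad_row, table_schema))
```

Keeping the column descriptions and units in the descriptor, rather than in a separate README, means the annotation travels with the data file.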
GO FAIR Chemistry Implementation Network: Goals are "to enhance the open, FAIR and effective communication of chemical knowledge within the chemical sciences and between chemistry and other disciplines" and "to enable chemists and chemistry to contribute to the achievement of the UN Global Sustainable Development goals" (direct quotes from the website).
Chemistry Research Data IG: Interest Group of the Research Data Alliance (RDA) that aims to foster exchange on chemical data.
RDA/CODATA Materials Data, Infrastructure & Interoperability IG: Interest Group of the Research Data Alliance (RDA) that aims to foster exchange on materials data.
Materials Research Data Alliance (MaRDA): a community network focused on developing the open, accessible, and interoperable materials data that fuels the Materials Genome Initiative (MGI).
FAIRsFAIR ACME-FAIR Guide: Defining Data Interoperability Frameworks - one section in a 7-part set of recommendations for ensuring FAIR practice across domains.
Recommended by participants
Data formats and models
- JCAMP - IUPAC recommended data exchange format with standardized vocabularies for some fields
- FAIRSpec - standard for sharing spectroscopic data
- DataCite metadata schema - A set of mandatory metadata that must be registered with the DataCite Metadata Store when minting a DOI persistent identifier for a dataset
- AnIML - Analytical Information Markup Language, an open ASTM XML standard for storing and sharing analytical chemistry data.
- Chemical Markup Language (CML) - an approach to managing molecular information using XML
- NeXus format - common data format for neutron, x-ray, and muon science
- JSON-LD - publishing linked data as JSON files on the Web
- GEMD data model - Graphical Expression of Materials Data that links together materials, the processes that produced them, and the measurements that characterize them
- chemrof - data model for managing information about chemical entities, ranging from atoms through molecules to complex mixtures.
- MOP - Molecular process ontology
- RXNO - name reaction ontology
- OWL - Web Ontology Language
- ChEBI - Chemical Entities of Biological Interest is a chemical database and ontology of molecular entities
- PyStow - for reproducible downloading of data in various computational workflows in Python
- Chemotion - Electronic lab notebook, focussed on organic chemistry, developed at KIT
- rdflib - Python library for working with RDF
- LinkML - see Linked Data Modeling Language above
- Ontology Development Kit - a toolbox of ontology-related tools such as ROBOT, owltools, dosdp-tools and many more, bundled as a Docker image, together with a set of executable workflows for managing your ontology's continuous integration, quality control, releases and dynamic imports
- iochem-bd - The Computational Chemistry Results Repository, using CML as internal data format
- Wikidata - a collaboratively edited multilingual knowledge graph
- Pipeline Pilot - Proprietary software to build ETL, data science/analytics workflows in a graphical user interface
- OPTIMADE - consortium that aims to make materials databases interoperable by developing a specification for a common REST API
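
As a rough illustration of what a common REST API means in practice for the OPTIMADE entry above, the sketch below builds a query URL for the `structures` endpoint using the `filter` and `page_limit` query parameters. The base URL `https://optimade.example.org` and the helper `build_structures_url` are hypothetical placeholders; a real provider's base URL would be needed to send the request, and no network call is made here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_structures_url(base_url, filter_expr, page_limit=10):
    """Build a URL for the versioned structures endpoint.

    Hypothetical helper for illustration: it only assembles the URL and
    percent-encodes the filter string.
    """
    query = urlencode({"filter": filter_expr, "page_limit": page_limit})
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


# Binary compounds that contain both silicon and oxygen.
url = build_structures_url(
    "https://optimade.example.org",  # placeholder, not a real provider
    'elements HAS ALL "Si","O" AND nelements=2',
    page_limit=5,
)
print(url)

# Decoding the query string recovers the original filter expression.
decoded = parse_qs(urlsplit(url).query)
print(decoded["filter"][0])
```

Because every compliant database accepts the same filter syntax, the same function could in principle be pointed at different providers by changing only the base URL.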