CECAM workshop, February 7-9, 2022

Machine actionable data for chemical sciences: Bridging experiments, simulations, and machine learning for spectral data

All the recorded talks are now available in this playlist on the CECAM YouTube Channel

Recent advances in the computational sciences allow us to simulate many spectra (e.g., X-ray absorption, infrared/Raman, NMR) in silico. In principle, this could open up unprecedented possibilities for the interpretation of experimental data.

Experimental data, however, comes in various, often undocumented or proprietary, formats. Recent efforts record this data in electronic lab notebooks and archive it in open data formats, which aids and automates the capture of crucial metadata. However, most of these lab notebooks have no mechanism to exchange data with each other, let alone with our simulation tools, and exporting data from such notebooks typically still requires lossy conversion to a chosen file format.

Standardization is an arduous process, and for a wide enough domain, it is infeasible. Nevertheless, without significant effort, there is a danger that we will not escape the local minimum of “★★★/★★★★★” linked open data (as defined by Tim Berners-Lee), that is, data published in open, non-proprietary formats but not yet identified by URIs or linked to other data.

In the case of interoperability between experimental and computational data, there is the additional difficulty that computational systems are completely described, idealized systems with implicit assumptions, whereas the parameters of experimental systems are often ill-defined, unknown, or uncertain. Moreover, we often lack a link between spectral data and the (meta)data contextualising the sample and its history.

How and where can we be interoperable in this setting? How can we make sure that experimental data can readily be consumed by computational tools, and vice versa, from the bottom up? How can we share, contextualise, and disseminate analyses (e.g., post-processing, peak assignment) in a reproducible way (on platforms such as MaterialsCloud or the Chemotion repository)? What new paradigms could such interoperability enable?

At the CECAM MADICES workshop, we will bring together developers, scientists, and data specialists to discuss the hurdles and opportunities of data interoperability in the context of the chemical and materials sciences. We will strive for general technical recommendations, with X-ray absorption spectroscopy as the first prototype use case.