Speaker
Description
Molecular dynamics (MD) simulations generate vast amounts of data that are foundational to structural biology, yet the value of these data is often limited by inconsistent metadata and software-specific formats that create isolated data silos. To close this "semantic gap," we introduce MOLSIM, an interoperable ontology designed to formalize the description of atomistic biomolecular simulations and to support the implementation of the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Developed in adherence to the Open Biological and Biomedical Ontologies (OBO) Foundry principles, MOLSIM avoids redundancy by systematically reusing terms from established ontologies such as ChEBI and the Unit Ontology. A core feature of MOLSIM is its software-agnostic labeling, which resolves ambiguities in simulation metadata: for instance, disparate settings such as ‘ntt=1’ in AMBER and ‘tcoupl = berendsen’ in GROMACS are both mapped to a single Berendsen Thermostat class. The ontology was built with a Large Language Model (LLM)-assisted workflow, in which an LLM extracted technical terms from software manuals and the results then underwent rigorous expert curation. Currently comprising approximately 2,000 terms, MOLSIM enables simulation data to be structured as Knowledge Graphs. This allows MD metadata to be integrated seamlessly with external open knowledge bases such as Wikidata, UniProt, and the PDB, providing the semantic granularity needed to support next-generation community repositories.
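To illustrate the idea of software-agnostic labeling, the following minimal Python sketch maps engine-specific thermostat settings onto one shared class and emits the result as RDF triples in N-Triples syntax. It is not MOLSIM's actual implementation: the namespace `http://example.org/molsim#`, the class name `BerendsenThermostat`, the property `usesThermostat`, and the helpers `normalize` and `to_ntriples` are all hypothetical placeholders chosen for this example.

```python
from typing import List, Optional

# Hypothetical namespace; the real MOLSIM IRIs are not given in the abstract.
MOLSIM = "http://example.org/molsim#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Engine-specific settings, keyed by (SOFTWARE, parameter, value) in normalized
# case, all pointing at one software-agnostic (hypothetical) ontology class.
THERMOSTAT_MAP = {
    ("AMBER", "ntt", "1"): MOLSIM + "BerendsenThermostat",
    ("GROMACS", "tcoupl", "berendsen"): MOLSIM + "BerendsenThermostat",
}


def normalize(software: str, key: str, value: str) -> Optional[str]:
    """Return the ontology class IRI for an engine setting, or None if unmapped."""
    lookup = (software.strip().upper(), key.strip().lower(), value.strip().lower())
    return THERMOSTAT_MAP.get(lookup)


def to_ntriples(sim_id: str, software: str, key: str, value: str) -> List[str]:
    """Describe one simulation's thermostat as N-Triples lines (empty if unmapped)."""
    cls = normalize(software, key, value)
    if cls is None:
        return []
    sim = f"<{MOLSIM}{sim_id}>"
    thermostat = f"<{MOLSIM}{sim_id}_thermostat>"
    return [
        # 'usesThermostat' is a hypothetical property used only for illustration.
        f"{sim} <{MOLSIM}usesThermostat> {thermostat} .",
        f"{thermostat} <{RDF_TYPE}> <{cls}> .",
    ]


if __name__ == "__main__":
    # An AMBER run and a GROMACS run resolve to the same class.
    print(normalize("amber", "ntt", "1") == normalize("GROMACS", "tcoupl", "Berendsen"))
    for line in to_ntriples("sim1", "AMBER", "ntt", "1"):
        print(line)
```

Because both engines resolve to the same class IRI, a query for Berendsen-thermostatted simulations in the resulting graph no longer needs to know which MD package produced each record.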
| Alternative Track | 6. Harmonisation of Metadata: Closing Semantic Gaps |
|---|---|