28–30 April 2026
DKFZ, Heidelberg
Europe/Berlin time zone

Metadata Extraction with LLMs for DCAT-AP+ based Research Data Management in Process Engineering and Catalysis

P04
Not scheduled
3 min
Communication Center (DKFZ, Heidelberg)

Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
Poster
5. Advancing FAIR Metadata with AI: Methods, Challenges, and Synergies
POSTERS & DEMOS - with Coffee

Speakers

Dr. Alexander S. Sommer-Behr (TU Dortmund), Marc Völkenrath (TU Dortmund)

Description

Efficient and FAIR (Findable, Accessible, Interoperable, Reusable) research data
management is essential for sustainable data reuse and reproducibility in catalysis
research and chemical engineering. Heterogeneous data sources, inconsistent
documentation practices, and insufficiently standardized metadata continue to
complicate semantic interoperability and long-term accessibility of experimental data.
Within the German National Research Data Infrastructure (“Nationale
Forschungsdateninfrastruktur”, NFDI), this work presents a workflow, assisted by
Large Language Models (LLMs), that automates the extraction, validation, and
semantic enrichment of metadata from scientific datasets in various file formats.
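As a rough illustration of the extraction step, the following sketch collects raw field names from two example formats, CSV headers and nested JSON keys. It assumes that such field names are the candidate terms later mapped to vocabulary concepts; the function names are hypothetical, only the Python standard library is used, and the LLM-based interpretation described in this abstract is not shown.

```python
import csv
import io
import json
from pathlib import Path


def _json_keys(obj, prefix=""):
    """Recursively collect dotted key paths from nested JSON objects and arrays."""
    keys = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            keys.append(path)
            keys.extend(_json_keys(value, path))
    elif isinstance(obj, list):
        for item in obj:
            keys.extend(_json_keys(item, prefix))
    return keys


def candidate_terms(text: str, suffix: str) -> list[str]:
    """Return raw field names (hypothetical 'candidate terms') from CSV or JSON content."""
    if suffix == ".csv":
        # Assumption: the first CSV row is the header
        header = next(csv.reader(io.StringIO(text)), [])
        terms = [name.strip() for name in header]
    elif suffix == ".json":
        terms = _json_keys(json.loads(text))
    else:
        raise ValueError(f"unsupported format: {suffix}")
    # Drop empty names and duplicates while keeping first-seen order
    return list(dict.fromkeys(t for t in terms if t))


def candidate_terms_from_file(path: Path) -> list[str]:
    """Dispatch on the file extension of a dataset on disk (hypothetical helper)."""
    return candidate_terms(path.read_text(encoding="utf-8"), path.suffix.lower())


print(candidate_terms("time_s,T_reactor_K,GHSV_1_per_h\n0,523,12000\n", ".csv"))
# → ['time_s', 'T_reactor_K', 'GHSV_1_per_h']
print(candidate_terms('{"reactor": {"temperature": 523, "pressure": 1.0}}', ".json"))
# → ['reactor', 'reactor.temperature', 'reactor.pressure']
```

Further formats would be added as extra branches; unsupported files fail loudly instead of silently yielding no metadata.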
The workflow combines a customized LLM served via Ollama [1] with the DCAT-AP+ [2]
metadata schemas and the Voc4Cat [3] domain vocabulary. Relevant domain
concepts are identified through lexical and semantic matching and mapped onto
schema-compliant metadata structures. Missing concepts are detected using
definition-based reasoning over existing vocabularies and ontologies [4]. If no suitable
match is found, the LLM proposes new, standard-compliant candidate concepts,
which domain experts then review and validate to ensure
semantic correctness and consistency.
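The matching cascade described above (lexical match, then definition-based match, then a new-concept proposal for expert review) can be sketched as follows. This is a simplified stand-in, not the authors' implementation: `difflib` string similarity replaces the lexical matcher, a bag-of-words cosine similarity over definitions replaces LLM- or embedding-based semantic matching, and the thresholds, the `Concept` class, and the example IRIs under `https://example.org/voc4cat/` are hypothetical.

```python
import difflib
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Concept:
    iri: str          # hypothetical concept IRI
    pref_label: str
    definition: str


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_score(term: str, concept: Concept) -> float:
    """Character-level similarity between a term and a concept's preferred label."""
    return difflib.SequenceMatcher(None, term.lower(), concept.pref_label.lower()).ratio()


def definition_score(context: str, concept: Concept) -> float:
    """Bag-of-words cosine similarity; a crude stand-in for semantic matching."""
    a, b = Counter(_tokens(context)), Counter(_tokens(concept.definition))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_term(term, context, vocab, lexical_cutoff=0.85, definition_cutoff=0.3):
    """Return (concept, method), or (None, "propose_new") when no concept fits."""
    if vocab:
        best = max(vocab, key=lambda c: lexical_score(term, c))
        if lexical_score(term, best) >= lexical_cutoff:
            return best, "lexical"
        best = max(vocab, key=lambda c: definition_score(context, c))
        if definition_score(context, best) >= definition_cutoff:
            return best, "definition"
    # No suitable match: the term goes to the LLM to draft a candidate concept for expert review
    return None, "propose_new"


# Hypothetical two-concept vocabulary for illustration only
vocab = [
    Concept("https://example.org/voc4cat/0001", "space velocity",
            "volumetric flow rate of the feed divided by the catalyst volume"),
    Concept("https://example.org/voc4cat/0002", "reactor temperature",
            "temperature measured inside the reactor during the catalytic reaction"),
]

print(match_term("Space Velocity", "", vocab)[1])  # → lexical
print(match_term("GHSV", "feed flow rate divided by catalyst volume", vocab)[1])  # → definition
print(match_term("turnover frequency", "number of catalytic cycles per active site per unit time", vocab)[1])  # → propose_new
```

Ordering the checks from cheap and strict to expensive and fuzzy keeps exact label matches deterministic and reserves LLM calls for the terms that actually need them.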
The user-validated metadata are exported in a standardized, machine-readable
representation compatible with existing research data infrastructures. Feedback
from domain experts is used to extend the metadata schemas, controlled
vocabularies, and ontologies. The resulting enriched metadata and semantic
resources enable the construction of interoperable knowledge graphs that can be
validated using SHACL and queried via SPARQL to support enhanced literature
search, improved research planning, and knowledge discovery.
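To illustrate the export step, the following sketch serializes one validated record as a JSON-LD `dcat:Dataset` node using the W3C DCAT and DCMI Terms namespaces, then checks that required properties are present. The required-property list only mimics a SHACL `sh:minCount 1` constraint and is an assumption, not the DCAT-AP+ shapes; the record and all dataset and concept IRIs are made up. A real pipeline would validate the resulting RDF graph with a SHACL processor against the actual DCAT-AP+ shapes and query it with SPARQL.

```python
import json

# W3C DCAT and DCMI Terms namespaces (real); all dataset/concept IRIs below are hypothetical.
CONTEXT = {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"}

# Assumed required properties, mimicking SHACL sh:minCount 1 (not the DCAT-AP+ shapes).
REQUIRED = ["dct:title", "dct:description"]


def to_jsonld(record: dict) -> dict:
    """Serialize one expert-validated record as a JSON-LD dcat:Dataset node."""
    node = {"@context": CONTEXT, "@id": record["id"], "@type": "dcat:Dataset"}
    for key, prop in (("title", "dct:title"), ("description", "dct:description")):
        if record.get(key):
            node[prop] = record[key]
    # Link matched vocabulary concepts via dcat:theme (range skos:Concept in DCAT)
    node["dcat:theme"] = [{"@id": iri} for iri in record.get("concepts", [])]
    return node


def violations(node: dict) -> list[str]:
    """Return required properties that are missing (a minCount-1 check only)."""
    return [prop for prop in REQUIRED if not node.get(prop)]


# Made-up example record
record = {
    "id": "https://example.org/dataset/42",
    "title": "Fixed-bed reactor screening run",
    "description": "Conversion measured at varying space velocity.",
    "concepts": ["https://example.org/voc4cat/0001"],
}
node = to_jsonld(record)
print(json.dumps(node, indent=2))
print(violations(node))  # → []
```

JSON-LD is chosen here because it is plain JSON for existing tooling while still expanding to RDF triples for knowledge-graph construction.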

Author

Marc Völkenrath (TU Dortmund)

Co-authors

Dr. Alexander S. Sommer-Behr (TU Dortmund), Mr. Hendrik Borgelt (TU Dortmund), Prof. Norbert Kockmann (TU Dortmund), Mr. Simon Clemens (TU Dortmund)
