HMC Conference 2026

Name: HMC Conference 2026
Start: 2026-04-28T09:00:00+02:00
End: 2026-04-30T13:30:00+02:00
Location: DKFZ, Heidelberg

28–30 April 2026

DKFZ, Heidelberg

Time zone: Europe/Berlin

Contact - HMC Events

event@helmholtz-metadaten.de

Metadata Extraction with LLMs for DCAT-AP+ based Research Data Management in Process Engineering and Catalysis

P04

Not scheduled

Communication Center (DKFZ, Heidelberg)


Im Neuenheimer Feld 280, 69120 Heidelberg, Germany

Poster

Track: 5. Advancing FAIR Metadata with AI: Methods, Challenges, and Synergies

Session: POSTERS & DEMOS - with Coffee

Speakers: Dr. Alexander S. Sommer-Behr (TU Dortmund), Marc Völkenrath (TU Dortmund)

Efficient and FAIR (Findable, Accessible, Interoperable, Reusable) research data
management is essential for sustainable data reuse and reproducibility in catalysis
research and chemical engineering. Heterogeneous data sources, inconsistent
documentation practices, and insufficiently standardized metadata continue to
complicate semantic interoperability and long-term accessibility of experimental data.
Within the German National Research Data Infrastructure (Nationale
Forschungsdateninfrastruktur, NFDI), this work presents a workflow, assisted by Large
Language Models (LLMs), that automates the extraction, validation, and semantic
enrichment of metadata from scientific datasets in various file formats.
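
The following is a minimal sketch of a single LLM extraction call. It assumes the model is served locally through Ollama's REST endpoint `/api/generate` on its default port 11434. The model name `custom-metadata-model`, the prompt wording, and the requested JSON keys are hypothetical placeholders, not the authors' configuration. Converting each file format into a plain-text excerpt is assumed to happen beforehand.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "custom-metadata-model"  # hypothetical name of the customized model

# Hypothetical prompt; the requested keys are illustrative only.
PROMPT_TEMPLATE = (
    "Extract metadata from the following dataset excerpt. "
    "Answer only with JSON containing the keys "
    '"title", "keywords", and "measured_quantities".\n\n{excerpt}'
)


def build_request(excerpt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request that asks for JSON output."""
    payload = {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(excerpt=excerpt),
        "format": "json",  # constrain the model's reply to valid JSON
        "stream": False,   # return one JSON object instead of streamed chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(body: bytes) -> dict:
    """The generated text sits in the 'response' field; decode it as JSON."""
    return json.loads(json.loads(body)["response"])


def extract_metadata(excerpt: str) -> dict:
    """Send one excerpt to a running Ollama server (the model must be pulled)."""
    with urllib.request.urlopen(build_request(excerpt), timeout=120) as resp:
        return parse_response(resp.read())
```

Requesting `"format": "json"` keeps the reply machine-parseable, so the returned dictionary can be passed directly to concept matching and validation.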

The workflow applies a customized LLM run with Ollama [1], combined with the DCAT-AP+ [2]
metadata schemas and the Voc4Cat [3] domain vocabulary. Relevant domain
concepts are identified through lexical and semantic matching and assigned to
schema-compliant metadata structures. Missing concepts are detected using
definition-based reasoning over existing vocabularies and ontologies [4]. If no suitable
matches are found, the LLM proposes new, standard-compliant candidate concepts,
which domain experts then review and validate to ensure semantic correctness
and consistency.
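
The matching cascade described above can be sketched as follows, under stated assumptions. The toy vocabulary uses hypothetical `example.org` IRIs rather than real Voc4Cat identifiers. `difflib` string similarity stands in for the semantic and definition-based matching of the workflow, which this sketch does not reproduce. Terms below the (arbitrary) threshold are returned as candidates for LLM-proposed concepts and expert review.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Concept:
    iri: str
    pref_label: str
    alt_labels: List[str] = field(default_factory=list)

    def labels(self) -> List[str]:
        return [self.pref_label, *self.alt_labels]


def normalize(text: str) -> str:
    """Lower-case, treat '-' and '_' as spaces, and collapse whitespace."""
    for sep in "-_":
        text = text.replace(sep, " ")
    return " ".join(text.lower().split())


def match_term(term: str, vocabulary: List[Concept], threshold: float = 0.85
               ) -> Tuple[Optional[Concept], str, float]:
    norm = normalize(term)
    # 1) Lexical step: exact match on a normalized preferred or alternative label.
    for concept in vocabulary:
        if any(norm == normalize(label) for label in concept.labels()):
            return concept, "lexical", 1.0
    # 2) Similarity step: difflib ratio as a stand-in for semantic matching.
    best, best_score = None, 0.0
    for concept in vocabulary:
        for label in concept.labels():
            score = SequenceMatcher(None, norm, normalize(label)).ratio()
            if score > best_score:
                best, best_score = concept, score
    if best is not None and best_score >= threshold:
        return best, "similarity", best_score
    # 3) No suitable match: hand the term on as a candidate for expert review.
    return None, "candidate", best_score


# Toy vocabulary with hypothetical IRIs (not real Voc4Cat identifiers).
VOCABULARY = [
    Concept("https://example.org/voc4cat/reactor", "reactor", ["reaction vessel"]),
    Concept("https://example.org/voc4cat/reaction-temperature",
            "reaction temperature", ["temperature"]),
]

for term in ["Reaction-Vessel", "reaction temperatur", "space velocity"]:
    concept, method, score = match_term(term, VOCABULARY)
    print(f"{term!r}: {method} -> {concept.iri if concept else None} ({score:.2f})")
```

Running the lexical step first keeps exact label hits cheap and deterministic. The similarity step only handles spelling or formatting variants.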

The user-validated metadata are exported in a standardized, machine-readable
representation compatible with existing research data infrastructures. Feedback
from domain experts is used to extend the metadata schemas, controlled
vocabularies, and ontologies. The resulting enriched metadata and semantic
resources enable the construction of interoperable knowledge graphs that can be
validated using SHACL and queried via SPARQL to support enhanced literature
search, improved research planning, and knowledge discovery.
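
One possible shape of the export step is sketched below using only the standard library. Validated metadata are serialized as N-Triples with core W3C DCAT and Dublin Core terms (`dcat:Dataset`, `dct:title`, `dcat:keyword`, `dcat:theme`). The sketch does not model DCAT-AP+ extensions, and the `ValidatedMetadata` record and all `example.org` IRIs are hypothetical. The required-title check only mimics a SHACL `sh:minCount` constraint; it is not a SHACL validator.

```python
from dataclasses import dataclass, field
from typing import List

# Standard namespace IRIs: RDF, W3C DCAT, and Dublin Core Terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


@dataclass
class ValidatedMetadata:
    """Hypothetical record of expert-approved metadata for one dataset."""
    dataset_iri: str
    title: str
    keywords: List[str] = field(default_factory=list)
    theme_iris: List[str] = field(default_factory=list)  # matched vocabulary concepts


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(meta: ValidatedMetadata) -> str:
    # Stand-in for a SHACL sh:minCount 1 constraint on dct:title.
    if not meta.title.strip():
        raise ValueError("dct:title is required")
    s = iri(meta.dataset_iri)
    triples = [(s, iri(RDF_TYPE), iri(DCAT + "Dataset")),
               (s, iri(DCT + "title"), literal(meta.title))]
    triples += [(s, iri(DCAT + "keyword"), literal(k)) for k in meta.keywords]
    triples += [(s, iri(DCAT + "theme"), iri(t)) for t in meta.theme_iris]
    return "".join(f"{a} {b} {c} .\n" for a, b, c in triples)


print(to_ntriples(ValidatedMetadata(
    "https://example.org/dataset/42", "Ethanol dehydration, run 3",
    keywords=["catalysis"], theme_iris=["https://example.org/voc4cat/reactor"])))
```

For the SPARQL and SHACL steps, the output could be loaded into a library such as rdflib with `Graph().parse(data=..., format="nt")` and validated against real shapes, for example with pySHACL.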

Author: Marc Völkenrath (TU Dortmund)

Co-authors: Dr. Alexander S. Sommer-Behr (TU Dortmund), Mr. Hendrik Borgelt (TU Dortmund), Prof. Norbert Kockmann (TU Dortmund), Mr. Simon Clemens (TU Dortmund)

Presentation materials: Abstract_HMC.pdf
