HMC Conference 2026 – Metadata in Action
Communication Center, DKFZ, Heidelberg
How does better metadata drive better science? The Helmholtz Metadata Collaboration (HMC) Conference 2026, held 28–30 April 2026 in Heidelberg, invites you to explore how metadata creates scientific impact across disciplines and infrastructures.
Under the theme “Metadata in Action,” this conference is where diverse perspectives meet. Infrastructure providers, data stewards, researchers, and centre operators come together to exchange ideas, showcase solutions, and discuss how metadata shapes data quality, interoperability, and innovation in research.
Whether you design infrastructures, manage data, or conduct research, this is your opportunity to see how improved metadata practices lead to better science. Join us to explore how metadata is not just managed, but makes a real difference.
The full conference programme, including session schedules and workshop details, is now available. Plan your visit and explore the wide range of contributions across metadata, FAIR data, and research infrastructures.
Our Keynote Speakers
Marta Teperek is an open science and data stewardship expert with a background in molecular biology (University of Cambridge). She is widely recognised for advancing FAIR data practices, metadata quality, and sustainable research data strategies across international research institutions.
Jan Portisch is a data science and knowledge graph expert at SAP SE. As Head of Content Infrastructure, he leads the development of enterprise-scale knowledge graphs and AI-driven content infrastructures, bridging semantic technologies and machine learning in industrial contexts.
Oliver Stegle is a computational genomics and research infrastructure expert at DKFZ Heidelberg. As Director and Co-Spokesperson of the German Human Genome-Phenome Archive (GHGA), he advances secure, FAIR-aligned access to human genomic and phenomic data. |
Beyond the Programme
In addition to the scientific sessions, the HMC Conference 2026 offers opportunities to connect and engage beyond the lecture halls.
- Social Event on Day 2 – Neckar River Cruise: Participants who have booked the social event can enjoy an evening boat trip on the Neckar – a relaxed setting for networking and exchange.
- Morning Run (optional): Start your day with a relaxed run through Heidelberg. We’ll follow a ~4.2 km route along the Neckar, through the Neuenheimer Feld, and back via the Altstadt. More info in our timetable.
Stay Connected
Be part of the conversation! Follow the latest updates, share your insights, and connect with fellow participants using our official conference hashtag: #HMCConference2026
And follow us on:
- Bluesky: @helmholtzhmc.bsky.social
- LinkedIn: @helmholtz-metadata-collaboration-hmc
- Mastodon: @helmholtz_hmc
Explore the HMC Conference 2025
Curious about what you can expect this year? Explore highlights from our last in-person event, the HMC Conference 2025 – because context matters, held in Cologne, including the full Book of Abstracts, the conference website, and a news post from the event.
With the kind support of the German Cancer Research Center (DKFZ)
09:00 → 12:30   WORKSHOPS: Extended Workshops
09:00   From Ontology to ELN – Create your Made-to-Measure Semantic Metadata Platform (3h 30m, H1.00.028, DKFZ)
This workshop will teach you how to use Herbie for setting up a bespoke semantic electronic laboratory notebook or research metadata platform which is customized to your concrete scientific needs.
We will start with an ontology of your scientific domain, pick a typical metadata record you might want to collect, and end with a set of (re)usable web forms for entering such a record in a fully semantically annotated way.
A typical and cumbersome approach would be creating spreadsheets and a set of transformation scripts to facilitate easy data entry for non-technical users. In the workshop you will get to know an alternative approach using Herbie: You will learn to write validation schemas in the standardized SHACL Shapes Constraint Language, upload these alongside your ontology to Herbie, and obtain a platform with easily usable web forms which automatically persist all entered data into a semantically annotated RDF knowledge graph.
After entering a few exemplary records, we will explore how you can query the created RDF knowledge graph using SPARQL to extract the data you need in downstream projects.
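As a rough illustration of the two building blocks used in this workshop, the sketch below validates a toy metadata record against a SHACL shape with pyshacl; the ex: vocabulary and property names are invented for the example and are not Herbie's actual data model.

```python
# Minimal sketch (not Herbie's data model): validate a metadata record against
# a SHACL shape before uploading shapes to a platform.
# Requires: pip install rdflib pyshacl
from rdflib import Graph
from pyshacl import validate

# Hypothetical shape: every ex:Measurement needs a label and a numeric value.
shapes_ttl = """
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/lab#> .

ex:MeasurementShape a sh:NodeShape ;
    sh:targetClass ex:Measurement ;
    sh:property [ sh:path rdfs:label ; sh:minCount 1 ; sh:datatype xsd:string ] ;
    sh:property [ sh:path ex:value   ; sh:minCount 1 ; sh:datatype xsd:double ] .
"""

# An example record, as it might be entered through a generated web form.
data_ttl = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/lab#> .

ex:run42 a ex:Measurement ;
    rdfs:label "Viscosity run 42" ;
    ex:value "1.23"^^xsd:double .
"""

shapes = Graph().parse(data=shapes_ttl, format="turtle")
data = Graph().parse(data=data_ttl, format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)     # True if the record satisfies the shape
print(report_text)  # human-readable validation report otherwise
```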
This workshop is intended for those who have an application ontology and want to start collecting (small) (meta)data that is properly semantically annotated. There are no restrictions on the domain. Herbie works best for data entered manually in an append-only approach, as is typically done in laboratory notebooks.
We assume a basic understanding of RDF and OWL; in particular, you should be able to understand RDF graphs in the Turtle format. You should bring your own computer and have Python and Node.js installed to be able to run some development tools.
Speaker: Fabian Kirchner (Helmholtz-Zentrum Hereon)
09:00   Semantic x-Lab: Bridging Laboratory Metadata and Semantic Knowledge Discovery (3h 30m, A0.225, DKFZ)
The Semantic x-Lab project addresses a fundamental challenge in modern research data ecosystems: the fragmentation of laboratory metadata across heterogeneous systems and disciplinary silos. Funded within the Helmholtz Metadata Collaboration (HMC) and co-led by HZDR, GFZ, and GSI, the project aims to interlink ontology-based descriptions of workflows, instruments, resources, and experimental data to make them discoverable, interoperable, and semantically rich. By building a distributed knowledge graph through a user-centered co-design process with laboratory partners and large-scale facility stakeholders, Semantic x-Lab fosters cross-domain insights that were previously inaccessible due to isolated metadata landscapes.
Building on our 2025 Kick-Off Workshop where we introduced the project scope, collaborative use cases, and the foundational vision for FAIR semantic integration of lab information, this workshop will advance hands-on discussions on concrete integration strategies and community engagement practices. Participants will explore how semantic search interfaces, ontology alignment, and co-design methodologies can support FAIR metadata workflows across research domains.
The workshop aligns with key HMC Conference 2026 track topics by showcasing ontology-based harmonisation efforts, addressing Human-Machine Collaboration in (Meta)data Acquisition through discussions on digital tools and workflows, and covering Domain and Application-specific Ontologies via real use cases from laboratory contexts. Based on this, we will develop and discuss exemplary knowledge graphs in groups during the workshop, in order to introduce researchers to the field, but also to show infrastructure providers and central data stewards how knowledge graphs can support their work and the scientists they serve. We will also take these insights into account as our project progresses and incorporate them into further work.
This workshop invites researchers, data stewards, and infrastructure developers to contribute to shaping Semantic x-Lab’s next phases and to collectively envision semantic metadata as a cornerstone for future-ready, cross-disciplinary research discovery.
Speakers: David Pape (Helmholtz-Zentrum Dresden - Rossendorf (HZDR)), Dr. Felix Mühlbauer (GFZ), Dr. Manja Luzi-Helbing (GFZ), Martin Voigt (HZDR), Oliver Knodel
09:00 → 10:30   WORKSHOPS: Short Workshops
09:00   Building Connected Data Ecosystems – How to Facilitate FAIR Data Workflows across Tools and Services (1h 30m, Lecture Hall, Marsilius Kolleg)
Audience: Research Data Managers, Researchers, Research Infrastructure/service providers, Core facility providers
Most research centers maintain dedicated infrastructures to capture, curate, and store research data produced by their personnel. The employed solutions, however, are often run independently of one another and therefore lack connectivity, creating gaps in the data workflows. An integrated data ecosystem, by contrast, would manage information, provide workflows, and support data documentation as data is produced.
Lab and field notebooks are essential tools for documenting structured information during measurement campaigns or field and laboratory work. Modern Electronic Lab Notebooks (ELNs) and data collection tools offer advanced features to support this documentation process and can enrich records with additional metadata, such as instrumentation details, personnel involved, sample registration, and more. They are often positioned in sections of the data workflow where critical information is generated and possibly merged, and could thus operate as data workflow orchestrators. On the other hand, this task could also be assumed by other tools, depending on the architecture of the envisioned data ecosystem. In practice, however, many centers and laboratories face significant barriers: ELNs and other services are not readily available, may require costly licenses, and don’t integrate sufficiently into existing workflows and infrastructure. They also often lack institutional support or training opportunities. As a result, their use is not yet widespread.
This workshop builds on the results of a workshop held in summer 2025. We would like to discuss potential architecture models within research centers and invite participants to explore the potential of ELNs and other tools within scientific workflows. Together, we’ll discuss desirable features, briefly review a few existing solutions and adoption challenges, and consider whether centrally provided ELN services across Helmholtz could be a sustainable way forward. The aim is to form a working group that develops interoperability standards for data ecosystems.
Speakers: Emanuel Söding (ORTC), Rory Macneil, Tilo Mathes (ResearchSpace)
09:00   Creating and Inspecting Research Object Crates – The Interactive Way (1h 30m, ATV 106, DKFZ)
NovaCrate is a web-based interactive editor for creating, editing, and visualizing Research Object Crates (RO-Crates). Built for inspecting, validating, and manipulating RO-Crates, it enables getting a deeper understanding of an RO-Crate's content and structure.
In our workshop, we aim to provide training in NovaCrate and RO-Crate. We also hope to deepen our understanding of the requirements of researchers, data stewards, and any other roles that may come into contact with RO-Crates or NovaCrate, in order to identify potential improvements.
During the hands-on session, we will guide and encourage participants to work together in small groups and package some prepared research data as an RO-Crate with the help of NovaCrate. To do so, participants will describe the research data with metadata created through NovaCrate. Here we see a close connection to track topic No. 4, "From Harmonisation to Action(ability)".
In this process, teams are encouraged to take notes on challenges, blockers, and ideas for improvement. At the end of the workshop, we will discuss the experience with the participants, guided by the notes the participants have taken.
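For orientation, the sketch below shows the kind of ro-crate-metadata.json file such packaging produces, written out with plain Python; the file names and descriptions are invented examples, not the workshop's prepared data.

```python
# Minimal sketch of the ro-crate-metadata.json file that packaging a dataset
# as an RO-Crate produces (file names and descriptions here are invented).
import json

crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # metadata file descriptor required by the RO-Crate specification
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the root dataset: the crate itself
            "@id": "./",
            "@type": "Dataset",
            "name": "Example measurement campaign",
            "description": "Prepared research data packaged during the workshop.",
            "hasPart": [{"@id": "results.csv"}],
        },
        {   # one data entity described with metadata
            "@id": "results.csv",
            "@type": "File",
            "name": "Tabular measurement results",
            "encodingFormat": "text/csv",
        },
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(crate, f, indent=2)
```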
The discussion will be centered around these questions:
- In which scenarios are RO-Crates useful?
- How to approach reuse or consumption of RO-Crates?
- How can you incorporate RO-Crates into your research?
We hope to have an interesting discussion, not only providing us with crucial input for the development of our services, but also offering participants the opportunity to discuss applications of RO-Crates in their research areas.
Speaker: Christopher Raquet (KIT)
09:00   Creating RDF-compliant Metadata Templates with the AIMS Metadata Profile Service (1h 30m, Seminar Room 2, Marsilius Kolleg)
Generating FAIR research data and enabling its reuse is the overall goal of research data management. However, establishing machine-readable knowledge representation - the “I” in FAIR - as the foundation for FAIR data and metadata remains a major challenge for many research communities. We have developed an approach to create subject-specific, RDF-compliant metadata profiles (i.e., SHACL shapes) that enable precise and flexible documentation of research processes and data. Our modelling approach supports inheritance between profiles: communities can create and share modular profiles as building blocks, which others can adopt and extend, so that metadata remains community-specific and interoperable at the same time.
To facilitate the modelling process and make it accessible to users with limited ontology expertise, we have developed a web service that provides a graphical user interface for creating metadata profiles [1]. It allows users to add suitable terms from existing terminologies together with constraints on permitted value nodes (e.g. expected data types, classes, or node shapes) and attribute cardinalities. Based on those profiles, metadata forms can be automatically generated for entering profile-compliant metadata [2] as well as search interfaces to explore profile-based metadata via faceted search [3].
In this workshop, participants will learn how to use the AIMS editor to create and extend metadata profiles and discuss the challenges of creating RDF-compliant metadata for research data. We will also present the new user interface prototype and conduct a hands-on user test. By gathering feedback from metadata experts, data stewards, and domain experts, we aim to improve the current user interface and discuss how RDF-based metadata can be embedded into everyday research workflows.
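As a rough illustration (not the actual AIMS profile model), the sketch below shows how such a profile could be expressed as a SHACL shape that reuses a Dublin Core term, constrains permitted value nodes and cardinalities, and extends a base profile via sh:node; all namespaces and property choices are invented.

```python
# Illustrative only: a SHACL-based metadata profile with value-node and
# cardinality constraints, building on a more generic base profile.
# Requires: pip install rdflib
from rdflib import Graph

profile_ttl = """
@prefix sh:      <http://www.w3.org/ns/shacl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/profiles#> .

# Generic base profile: every described resource needs exactly one title.
ex:BaseProfile a sh:NodeShape ;
    sh:property [ sh:path dcterms:title ;
                  sh:datatype xsd:string ;
                  sh:minCount 1 ; sh:maxCount 1 ] .

# Community profile: adopts the base profile and adds a domain constraint.
ex:ExperimentProfile a sh:NodeShape ;
    sh:node ex:BaseProfile ;
    sh:property [ sh:path ex:usedInstrument ;
                  sh:class ex:Instrument ;   # value node must be an Instrument
                  sh:minCount 1 ] .
"""

g = Graph().parse(data=profile_ttl, format="turtle")
print(len(g), "triples in the profile graph")
```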
[1] NFDI4ING Metadata Profile Service. https://profiles.nfdi4ing.de
[2] Shacl-form. https://github.com/ULB-Darmstadt/shacl-form
[3] RDF-Store. https://github.com/ULB-Darmstadt/rdf-store
Speakers: Jürgen Windeck (Technical University Darmstadt), Kseniia Dukkart (RWTH Aachen University), Marc Fuhrmans (Technical University Darmstadt), Moritz Kern (RWTH Aachen University)
09:00   Making Helmholtz Data Assets Visible via the Helmholtz Knowledge Graph (1h 30m, Seminar Room 1, Marsilius Kolleg)
This workshop aims to advance the Helmholtz Knowledge Graph (HKG) as a shared metadata backbone by identifying new data providers and sources, extending and refining the HKG data model, and jointly evaluating practical onboarding processes for data providers across Helmholtz.
The Helmholtz Knowledge Graph is a federated metadata infrastructure that makes digital assets—such as datasets, publications, software, and instruments—discoverable, comparable, and queryable across the Helmholtz Association. While the HKG already integrates metadata from multiple infrastructures, its continued value depends on active collaboration with data providers, domain experts, and metadata professionals.
The workshop provides a structured, interactive setting to work on three closely connected themes. First, participants will identify novel data providers and metadata sources, including domain-specific repositories, institutional services, and emerging infrastructures, that could meaningfully extend the coverage of the HKG. This includes discussing when data sources can be considered authoritative and how they may be used to validate, enrich, or contextualize other metadata in the graph.
Second, the workshop will explore metadata schemas and domain-specific structures that are currently not, or only partially, represented. Participants will review limitations of the existing HKG data model and discuss extensions that improve expressiveness for search, discovery, and cross-domain analysis.
Finally, participants will discuss how a structured onboarding process for data providers can be established, identifying challenges, best practices, and opportunities to better align technical pipelines with real-world metadata creation and maintenance.
Outcomes include a curated list of candidate data providers, shared criteria for authoritative metadata, concrete proposals for extending the HKG data model, and initial milestones for onboarding new data providers.
The workshop will be organized into parallel and successive discussion tables, followed by joint synthesis sessions to consolidate results across perspectives.
Target audience: data stewards, infrastructure providers, metadata specialists, and developers working with metadata infrastructures within Helmholtz and beyond.
Speaker: Volker Hofmann (Forschungszentrum Jülich)
10:30 → 11:00   COFFEE (30m, Communication Center, Foyer, DKFZ)
11:00 → 12:30   WORKSHOPS: Short Workshops
11:00   From Chaos to Clarity: Smart Sample Management with LinkAhead & O2A SAMPLES (1h 30m, Seminar Room 2, Marsilius Kolleg)
LinkAhead is a flexible open source toolbox for research data that adapts easily when workflows or requirements change. It offers a clear web interface, programmatic access and a semantic structure that can be extended for many different research contexts.
In this workshop we will introduce LinkAhead and demonstrate how it supports O2A SAMPLES, a sustainable and interoperable platform for transparent, FAIR-compliant and AI-ready sample metadata. O2A SAMPLES enables reliable sample registration, storage tracking and Nagoya documentation, and connects smoothly with Helmholtz infrastructures. With well-defined workflows, QR-based tracking and fully documented procedures, it provides an efficient and collaborative approach to managing samples from field collection to digital archive. This unified framework strengthens reproducibility, accessibility and discoverability, enabling efficient digitization and collaboration across the entire sample lifecycle.
In this workshop, participants will first get to know the open-source research data management software LinkAhead [1, 2], which is the basis of the O2A SAMPLES platform at AWI. We will introduce LinkAhead's data model and web interface, including hands-on examples of how to query for, insert, and edit data entries in LinkAhead. We will then continue with an introduction to the O2A SAMPLES platform with its sample and storage management workflows. Participants will learn how samples are registered, and how to export and update their metadata. An outlook will be given on configuring and adapting the sample management module [3] to the participants' (or their institutions') needs.
[1] https://doi.org/10.3390/data4020083
[2] https://gitlab.com/linkahead/
[3] https://gitlab.com/linkahead/linkahead-sample-management
Speakers: Maren Rebke (AWI), Florian Spreckelsen (IndiScale)
11:00   From Shared Challenges to Shared Action: Metadata Harmonization in Practice (1h 30m, Seminar Room 1, Marsilius Kolleg)
Metadata harmonisation is a collective action problem. In this workshop our goal is to bring together data stewards, infrastructure providers, and researchers to share practical experiences in improving metadata quality, and co-identify actionable next steps toward harmonized metadata practices.
The workshop builds on our analysis of provided metadata, previous workshops, and one-on-one counselling sessions.
Intended Outcomes:
The workshop will:
- present a summary of insights gathered from community workshops and one-on-one provider counseling,
- provide short provider case reflections illustrating practical harmonization efforts (successes and challenges),
- facilitate an interactive group exchange on lessons learned, remaining obstacles, and community-identified priorities,
- synthesize outcomes into a joint set of next steps for HMC and providers, and shared recommendations.
Expected results:
- Shared understanding of practical paths to improve metadata in provider contexts,
- A curated list of next steps and recommendations for provider networks and HMC,
- Strengthened network of practitioners engaged in metadata harmonization.
Speaker: Oonagh Brendike-Mannix (HMC/HZB)
11:00   Make your own FAIR Digital Objects – The Graphical Way (1h 30m, ATV 106, DKFZ)
To accelerate the adoption of FAIR Digital Objects (FDOs), their creation and usage needs to be implemented in software. Our work targets the task of creating and maintaining FDO records. We introduce an application to build designs for FDO records in an intuitive and visual way, targeting non-experts and experts in the field alike. From a design, code and FDOs can be generated to automatically create FDO records from given information.
In this workshop, we aim to provide the skills to create FAIR Digital Objects at smaller and larger scales with minimal resources. We encourage participants to bring JSON-encoded metadata of the objects they would like to publish as FAIR DOs. For those who do not, we will provide examples to work with. We also hope to get some feedback for the further development of the FAIR DO Designer and insights into deeper requirements of the target group. The workshop will have the following structure:
- Introduction to the FAIR DO Designer (10 min)
- Demonstration and guidance through the basic concepts (interactive, 20 min)
- Working session, so participants can build their own FDOs (guided, 45 min)
- Discussion and Feedback (15 min)
References
- FAIR DO Designer Code Repository: https://github.com/kit-data-manager/fair-do-designer
- FAIR DO Designer Online Demonstrator: https://kit-data-manager.github.io/fair-do-designer/
Speaker: Andreas Pfeil (HMC FAIR Data Commons)
11:00   Semantics Hidden in the Dark – Make Datasets Shine (Practical Integration of Terminology Services for FAIR Data) (1h 30m, Lecture Hall, Marsilius Kolleg)
Semantic technologies and terminology services are a cornerstone for implementing the FAIR principles, as they make the meaning of data explicit, machine-actionable, and reusable beyond their original context. While data may be technically accessible, a lack of shared semantics often limits interoperability and hinders reuse across disciplines, infrastructures, and research communities. Terminology services address this challenge by providing controlled concepts, semantic relationships, and persistent identifiers that enable consistent interpretation and integration of data.
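As a hedged sketch of what such an API integration can look like (the /api/search path assumes an OLS-style interface; the actual ESS TS endpoint may differ), a repository could look up candidate concepts like this:

```python
# Hedged sketch of looking up a concept in a terminology service from a
# repository interface. The search path is an assumption (OLS-style API),
# not a documented endpoint of the ESS TS. Requires: pip install requests
import requests

BASE = "https://terminology.nfdi4earth.de"   # service mentioned above

def lookup(term: str) -> list[dict]:
    """Search the terminology service and return candidate concepts."""
    resp = requests.get(f"{BASE}/api/search", params={"q": term}, timeout=10)
    resp.raise_for_status()
    docs = resp.json().get("response", {}).get("docs", [])
    return [{"label": d.get("label"), "iri": d.get("iri")} for d in docs]

# A repository could use the returned IRIs to annotate dataset keywords.
for concept in lookup("permafrost")[:5]:
    print(concept["label"], concept["iri"])
```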
This workshop focuses on the practical adoption of terminology services in research data infrastructures, moving beyond conceptual discussions toward concrete, transferable implementations. It presents key results of the BITS project (Blueprint for the Integration of Terminology Services in Earth System Science) and demonstrates, using the example of the ESS TS (https://terminology.nfdi4earth.de), how terminology services can be embedded into research data infrastructures to improve discovery, interoperability, and semantic enrichment of datasets.
Designed as an interactive forum, the workshop combines short inputs, live demonstrations, and participatory elements to make the added value of semantics tangible for different stakeholder groups. Participants will engage with real-world implementations of terminology services integrated into repository interfaces (via API usage) and metadata pipelines, supported by interactive elements such as live polling, search challenges, and guided discussion. By bringing together repository managers, researchers, and data stewards, the workshop fosters exchange between technical and conceptual perspectives and supports community-driven learning. Overall, the workshop aims to lower barriers to adopting terminology services, strengthen awareness of their strategic importance for FAIR data, and stimulate discussion on scalable and sustainable implementations across research infrastructures.
Speakers: Claudia Martens (DKRZ), Dr. Anette Ganske (TIB)
12:30 → 13:30   LUNCH (1h, Communication Center, Foyer, DKFZ)
13:30 → 14:00   WELCOME: Official Opening of the HMC Conference 2026 (Communication Center, DKFZ, Heidelberg, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany)
Sören Lorenz – Speaker HMC
Klaus Maier-Hein, Marco Nolden – Local Hosts
Volker Hofmann – Head of Scientific Programming Committee
14:00 → 15:00   KEYNOTE – Making FAIR Happen: Culture Change Across the Research Ecosystem: Dr. Marta Teperek with Dr. Dani Metilli (Communication Center, DKFZ, Heidelberg)
In her keynote, Marta will address how research cultures, infrastructures, and policy frameworks can evolve to strengthen metadata quality and enable FAIR data. Drawing on her international work in professionalising data stewardship and coordinating community-driven initiatives, she will discuss how funders, research organisations, and multidisciplinary teams can collaborate to embed FAIR workflows, improve interoperability, and foster more transparent and trustworthy research. This will be complemented by Dr. Dani Metilli, ontology engineer at the Thematic Digital Competence Centre for Natural and Engineering Sciences, showing examples of concrete research data interoperability support. Altogether, the talk will offer a forward-looking perspective on how the research ecosystem can harness metadata as a driver for both scientific excellence and cultural change.
Chair: Dr. Volker Hofmann (FZJ)
15:00 → 15:30   COFFEE (30m, Communication Center, Foyer, DKFZ)
15:30 → 17:00   TALK SESSION: Metadata in Action: Embedding Quality and Context into Research Infrastructures – Chair: Christine Lemster (Communication Center, DKFZ, Heidelberg)
15:30   Data Quality Assessment Results as Decision-Relevant Enriched Metadata for Decision Support Systems (10m)
Trustworthy and impactful decision support systems (DSS) critically depend on data that are fit for their intended use. Here, data quality assessments should be understood not merely as a validation step, but as a systematic metadata enrichment process that produces structured, machine-actionable descriptors of data plausibility and correctness.
Building on established data quality concepts in medical research, this talk generically conceptualises automated data quality assessment as a crucial pre-assessment layer that is analytically distinct from decision logic. Results from such assessments inform decision logic on appropriate data use.
To implement such assessment in an efficient and transparent manner, we have implemented a layered metadata model comprising: (i) metadata describing core data quality dimensions such as missing data, inadmissible data, contradictions, or unexpected associations; (ii) metadata capturing the formal rules and expectations used to detect data quality issues; (iii) metadata that classifies the severity and decision relevance of the detected issues within a specific application context; this classification is inherently normative and context-dependent, reflecting explicit assumptions about how particular data limitations may affect decisions; and, derived from the first three layers, (iv) the fitness-for-decision metadata that explicitly characterises the suitability of a dataset for downstream decision processes.
Using the tools dataquieR (R) and dqrep (Stata) as concrete implementations, based on an example from the medical domain, we demonstrate how this separation of assessment and decision layers enables context-specific, automated generation of interoperable data quality metadata with minimal programming effort. We illustrate how such metadata can shape DSS behaviour—for example by suppressing unreliable alerts, or down-weighting model inputs.
We conclude by discussing implications for metadata standards and the interoperability of data quality assessment results. Treating such results as decision-relevant metadata provides a structured mechanism to potentially increase the value of DSS.
Speaker: Carsten Oliver Schmidt (Universitätsmedizin Greifswald)
15:45   Self-Assessment for FAIR Data Publication: Empowering Researchers to Improve Dataset Quality before Submission (10m)
The CoreTrustSeal certified institutional data repository RDR, built on Dataverse, is central to KU Leuven’s efforts to support FAIR data publication. Since its launch in 2022, the growing number of dataset submissions has highlighted the need for an efficient, transparent, and consistent curation workflow. To address this, the RDR team developed an open-source review dashboard that integrates with Dataverse and streamlines the curation process.
Initially designed to optimize the review workflow, the dashboard’s second iteration introduced Python-based automated checks for systematic quality assessment. These checks validate metadata completeness and consistency while flagging issues such as missing PIDs, unclear licensing, insufficient metadata, or absent README files. Crucially, automation complements rather than replaces human judgement: curators can override or contextualize outcomes, ensuring nuanced interpretation remains part of the process.
Recurring metadata issues often surface only during curation, causing delays and additional review rounds. Building on the insights from the automated checks in the review dashboard, the RDR team is developing a self-assessment tool for researchers. This tool enables more complex pre-submission validation of draft datasets than is possible in the Dataverse UI and embeds FAIR-oriented guidance, including PID requirements, licensing clarity, consistent metadata, and documentation completeness. By providing concrete and actionable feedback, it helps prevent common issues before formal review and supports the creation of more complete datasets.
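As a rough, generic illustration (not KU Leuven's actual implementation), the sketch below shows the kind of pre-submission checks described above: flagging a missing README, an unspecified licence, and related identifiers that are not resolvable PIDs.

```python
# Generic sketch of automated pre-submission checks for a draft dataset.
# The field names and thresholds are invented for the example.
import re

def check_dataset(metadata: dict, file_names: list[str]) -> list[str]:
    """Return a list of human-readable issues found in a draft dataset."""
    issues = []

    if not any(name.lower().startswith("readme") for name in file_names):
        issues.append("No README file found - add documentation for reuse.")

    licence = (metadata.get("license") or "").strip()
    if not licence:
        issues.append("No licence specified - choose an explicit licence.")

    for related in metadata.get("related_publications", []):
        if not re.match(r"^(https?://)?(doi\.org/|10\.\d{4,})", related):
            issues.append(f"Related publication '{related}' is not a resolvable DOI/PID.")

    if len(metadata.get("description") or "") < 100:
        issues.append("Description is very short - consider expanding it.")

    return issues

# Example draft dataset, as a researcher might submit it
draft = {
    "license": "",
    "description": "Survey data.",
    "related_publications": ["internal report 2024"],
}
for issue in check_dataset(draft, ["data.csv"]):
    print("-", issue)
```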
The presentation will introduce the design principles and implementation of this self-assessment tool, highlighting the metadata checks and how feedback is presented to users. We will discuss how automated assessment assists researchers in fulfilling essential requirements while encouraging more complete metadata. Furthermore, we will reflect on key insights and challenges, offering guidance for institutions aiming to strengthen research support and enhance metadata quality for FAIR-aligned data publication.
Speaker: Marleen Marynissen (KU Leuven)
16:00   Ontology-Driven FAIR Sensor Maintenance Metadata for Environmental Observations (10m)
Environmental research relies on a wide variety of sensors deployed across oceanic, terrestrial, and atmospheric platforms, yet maintenance metadata remains poorly standardized and difficult to integrate into FAIR-compliant workflows. Metadata is crucial for interpreting sensor data, particularly on research vessels equipped with diverse instruments operating globally. Contextual details including sensor specifications, timestamp, location, etc. are essential for ensuring interoperability and reproducibility of the data.
The MOIN4Herbie project, funded by HMC, addresses this challenge by integrating maintenance-specific semantic models into Herbie, an electronic laboratory notebook built on an RDF-based collaborative knowledge graph. To support the recording of maintenance activities, we developed the MOIN and MOIN4BoknisEck ontologies. The MOIN ontology captures maintenance tasks such as calibration and cleaning, along with the instruments, platforms, and personnel involved, building on established ontologies including SSN/SOSA and PROV-O. Because maintenance procedures vary substantially across sensor types, manufacturers, environmental conditions, and operational protocols, the MOIN4BoknisEck ontology captures use-case-specific requirements.
Using these ontologies in combination with SHACL, we implemented backend and frontend protocols that generate ontology-driven web forms in Herbie. Users enter maintenance information through a user-friendly interface, after which Herbie validates the input, applies semantic annotations, and stores the enriched metadata directly in the knowledge graph.
Core device metadata about sensors and platforms is retrieved from the O2A Registry and ingested into MOIN4Herbie via the Registry2RDF module, where it is stored as structured RDF records capturing key details such as the instruments involved and the responsible personnel. Correspondingly, maintenance metadata recorded in MOIN4Herbie, including calibration and cleaning records, is queried via SPARQL and automatically exported back to the O2A Registry through its API, reducing manual metadata handling and improving interoperability. Additionally, integration with the OGC SensorThings API enables linkage between maintenance metadata and sensor data streams, further enhancing accessibility across systems.
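To illustrate the export step, the sketch below runs the kind of SPARQL query that could extract maintenance records from an RDF graph before pushing them to a registry; the ex: vocabulary is an invented placeholder, not the actual MOIN ontology.

```python
# Sketch of querying maintenance records from an RDF knowledge graph.
# The vocabulary (ex: namespace) is invented. Requires: pip install rdflib
from rdflib import Graph

maintenance_ttl = """
@prefix ex:  <http://example.org/moin#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:event1 a ex:CalibrationActivity ;
    ex:onInstrument ex:ctdSensor7 ;
    ex:performedBy  "J. Doe" ;
    ex:date         "2026-03-02"^^xsd:date .
"""

g = Graph().parse(data=maintenance_ttl, format="turtle")

query = """
PREFIX ex: <http://example.org/moin#>
SELECT ?event ?instrument ?person ?date WHERE {
    ?event a ex:CalibrationActivity ;
           ex:onInstrument ?instrument ;
           ex:performedBy  ?person ;
           ex:date         ?date .
}
"""

for row in g.query(query):
    # each result row could then be mapped to a registry API payload
    print(row.event, row.instrument, row.person, row.date)
```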
Speaker: Smruthishree Srinivasa
16:15   FAIR AIMS – Bringing Rich Metadata for Physical Samples into the Digital World (10m)
Persistent identifiers (PID) are critical elements of digital research data infrastructures, enabling the unambiguous identification, location and citation of digital representations of a growing range of entities, such as publications and data. Physical samples form the basis for many research results and data. The International Generic Sample Number (IGSN) provides a globally unique, persistent, and web-resolvable identifier for physical objects, allowing them to be found, cited and reanalysed. IGSNs facilitate direct links between data, publications, originating samples, and records of their creation. This closes one of the final gaps in the provenance of research results.
FAIR AIMS builds upon the successful HMC project FAIR WISH. The main outcome of FAIR WISH was the FAIR SAMPLES Template – a modular template developed for EaE researchers that allows users to select sample-type-specific metadata properties and create customized, rich sample descriptions that comply with the IGSN schema, regardless of the level of digitisation of sample metadata and the individual researcher's metadata training. The template includes a number of linked-data vocabularies and forms the basis for the semi-automated generation of IGSN metadata XMLs and subsequent IGSN registration. IGSN metadata submitted via the FAIR SAMPLES Template have already proven to be much richer and more complete than previous submissions.
FAIR AIMS will develop an online version of the FAIR SAMPLES Template with automated workflows for IGSN registration and the integration of linked-data vocabularies as dropdown lists, as well as automated metadata quality checks during uploads. FAIR AIMS is also the first IGSN-related project to actively reach beyond the geosciences. Our partner HZB will contribute to FAIR AIMS by developing IGSN metadata profiles for materials science samples and integrating the online template into their sample database SEPIA.
Speaker: Kirsten Elger (GFZ Helmholtz Centre for Geosciences)
16:30   From Siloed Experiments to Collective Intelligence: Operationalizing the Post-FAIR Laboratory (10m)
While the theoretical benefits of FAIR data are well-established, the operational reality of integrating these principles into active R&D environments reveals a distinct set of challenges and opportunities. This presentation moves beyond the "why" of digitalization to the "how," based on observations from scaling semantic architectures in tribology and materials engineering.
We identify three critical pillars for the "Lab of the Future." First, the Democratization of Context: enabling PhD students to leverage advanced data structures without requiring extensive training in data science. Second, the Contextualization of Automation: ensuring that data streams from continuous robotic testing are automatically enriched with metadata to prevent high-speed resource waste. Third, the Realization of AI Utility: moving past vague promises to a concrete understanding of AI’s role: from automating routine analysis to powering high-level predictive models. By addressing these pillars, we show how isolated data points can be woven into a unified, queryable asset, transforming a collection of individual projects into a robust, self-improving research ecosystem.
Speaker: Nick Garabedian (datin GmbH)
16:45   Open Data, Open Access, Open Source: Liberating Metadata for OA Books and Chapters (10m)
As open access (OA) becomes the dominant model for scholarly book publishing, the integration of open, standard-compliant metadata into publishing workflows, library systems, and preservation infrastructures has become increasingly urgent. This presentation reports on key findings from a recent metadata study (Steiner et al. 2026) that reviewed international standards and requirements for OA books and chapters, with a particular focus on the needs of small-to-medium-sized, scholar-led and institutional Diamond OA publishers. The study identifies persistent challenges in discoverability, interoperability, and sustainability, and outlines practical approaches to improving open metadata management across the long-form publishing lifecycle.
Building on this analysis, the talk introduces an extended, format-agnostic metadata framework aligned with established regional and national recommendations for OA publishing (e.g. NISO, NAG, AG Univerlage Quality Criteria, Diamond OA Standard). The framework is also compatible with open data principles such as that of the Barcelona Declaration on Open Research Information, and incorporates the requirements of major metadata aggregators in the scholarly book supply chain. Designed to be both robust and adaptable, it supports wide dissemination while remaining responsive to future policy and infrastructure developments.
The presentation then demonstrates how Thoth Open Metadata operationalises this framework through a freely available, open-source metadata management platform. Drawing on examples from independent, library-based, and university presses across Europe, North America, Latin America, and Africa, we show how publishers retain control over fully open (CC0) and FAIR metadata, manage it centrally, and automatically export it in multiple industry-standard formats (including MARC, ONIX, KBART, and Crossref XML) via open APIs. Finally, we illustrate how Thoth enables seamless dissemination to major OA platforms, automated DOI registration, library integration, and transparent open archiving, while maintaining a format-agnostic upstream source for high-quality open metadata for OA books and chapters.
Speaker: Tobias Steiner (Thoth Open Metadata)
17:00 → 19:00   POSTERS & DEMOS – with Drinks (Communication Center, Foyer & 1st Floor, DKFZ)
Join us for the first poster and demo session. A selection of authors will be present at their posters and demos (see overview here & printed overviews), offering insights into their work and inviting you to engage in conversation.
Demo contributions are located on the 1st floor, while posters are displayed both there and throughout the foyer.
Take this opportunity to explore a wide range of topics, ask questions, and connect with fellow participants in an open and interactive setting. Drinks will be available.
From 18:00, the Welcome Reception begins, offering light finger food – a perfect opportunity to continue discussions and network in a relaxed atmosphere.
18:00 → 21:00   WELCOME RECEPTION – with Finger Food (Communication Center, DKFZ, Heidelberg)
Join us for the Welcome Reception.
Enjoy light finger food and drinks in a relaxed setting while connecting with fellow participants, speakers, and contributors. The reception takes place alongside the poster and demo session, offering a great opportunity to combine informal networking with exploring the presented work.
We look forward to welcoming you and kicking off the conference together.
07:00 → 07:45   Optional – MORNING RUN (Communication Center, DKFZ, Heidelberg)
We’ll follow a ~4.2 km route along the Neckar, through the Neuenheimer Feld, and back via the Altstadt. The pace is relaxed and inclusive – everyone is welcome. If you prefer to go faster, feel free to run ahead; the route is available here: https://www.strava.com/routes/3476881065633586266
Meeting Point: Im Neuenheimer Feld 223, 69120 Heidelberg (conference venue)
Duration: ~30 minutes
09:00 → 10:30   TALK SESSION: Software Interoperability for (Meta)data Acquisition – Chair: Martin Held (Hereon) (Communication Center, DKFZ, Heidelberg)
09:00   FDO-Ops Prototype for Machine-Actionable FDOs (10m)
This talk presents the FDO-Ops model as a prototype framework that makes FAIR Digital Objects (FDOs) machine-actionable by discovering, assessing, and executing operations across heterogeneous data resources in an interoperable way.
The prototype builds on a DOIP/HTTP client interface where every management function and every Operation FDO is invoked with a uniform request pattern, providing a stable, client-agnostic interaction layer independent of underlying systems. It integrates an identifier and FDO type system supporting base infrastructure, i.e., the Handle Registry, a Data Type Registry, and a Typed PID Maker (TPM) instance. An extended service component called TPM Adapter is used that (1) supports Elasticsearch-based full-text search over information records; (2) ingests associated FDO–Operation and FDO–FDO relationships into a Neo4j graph for efficient traversal and rule-enforced consistency; (3) uses a mapping component that translates technology-dependent execution protocols of operations into a JSON-based execution map, run by an Executor module (e.g., for Web API calls or script executions).
Conceptually, FDO-Ops advances several interoperability layers defined in different interoperability models, in particular the technical and syntactic layers, by:
- treating operations themselves as reusable Operation FDOs
- separating workflows into phases (discovery, typed metadata assessment, and bit-sequence processing)
- integrating existing standards (e.g., APIs, SKOS/RDF) without requiring changes to established (meta)data systems.
Applicability is exemplified with cross-domain use cases: discovering relevant FDOs, listing associated operations, interpreting SKOS vocabularies via a SPARQL-based endpoint, and executing data-level preprocessing/validation for NumPy and SKOS RDF/XML files.
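Purely as an illustration of the idea of a technology-independent execution map (the actual FDO-Ops format is not shown here), the sketch below encodes one Web API operation as JSON and dispatches it with a minimal executor; all PIDs and endpoints are invented.

```python
# Illustrative only: a JSON-based execution map for one operation on an FDO's
# bit sequence, plus a minimal executor for the HTTP case. Not the FDO-Ops format.
import json
import urllib.parse
import urllib.request

execution_map = {
    "operation_pid": "21.T11148/example-operation",   # invented PID
    "protocol": "http",
    "request": {
        "method": "GET",
        "url": "https://example.org/validate",         # invented endpoint
        "params": {"target": "21.T11148/example-fdo"},
    },
}

def execute(exec_map: dict) -> bytes:
    """Run an execution map; only the HTTP GET case is sketched here."""
    if exec_map["protocol"] != "http":
        raise NotImplementedError(exec_map["protocol"])
    req = exec_map["request"]
    query = urllib.parse.urlencode(req["params"])
    with urllib.request.urlopen(f"{req['url']}?{query}") as response:
        return response.read()

print(json.dumps(execution_map, indent=2))
# execute(execution_map)  # would call the invented endpoint above
```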
Overall, this work shows how the latest advances in research on machine-actionable FDOs, turning them from passive containers into reusable computational instruments, pave the way towards truly interoperable data ecosystems.
Speaker: Nicolas Blumenröhr
09:15   O2A SAMPLES: From Expedition to Digital Database – A Unified Framework for Sample Digitization and Interoperability (10m)
Comprehensive and standardized sample management is crucial for advancing field research under challenging conditions. We present the “O2A SAMPLES” prototype: a generic Sample Management System designed as a hosted service that unifies diverse sample metadata in one interoperable framework (samples.o2a-data.de). Developed collaboratively by AWI and GEOMAR, O2A SAMPLES not only facilitates automated, digitized workflows for capturing metadata (such as unique sample identification via DataCite, and Nagoya compliance) but also offers seamless integration with well-established Helmholtz systems. The main contributions are (1) a sample management service for the Helmholtz community and beyond, enabling institutions, universities, and research organizations to view and loan samples, identify them by QR code, conduct experiments on actual materials, and benefit from standardized workflows without managing the underlying infrastructure, and (2) rigorously documented standardized workflows and standard operating procedures (SOPs) for sample management, ensuring that every aspect of the process is fully described. This integrated approach promises to set a new benchmark for digital sample management, enhancing interdisciplinary collaboration and resource utilization across the research community.
Speaker: Maren Rebke (AWI)
09:30   LabID-Prov: Unifying Experimental and Computational Data Provenance with LabID (10m)
LabID is an open-source platform designed to streamline research data management for scientists, research groups, and core facilities of life-science institutes. By integrating sample and dataset management, inventory tracking, and an electronic lab notebook, LabID enables users to organize, annotate, and share experimental data in compliance with FAIR principles. At its core, LabID uses a relational database to document entities—such as instruments, samples, and resulting datasets—and their relationships, constructing a comprehensive knowledge graph that traces data provenance from biological specimens to raw outputs.
We introduce LabID-Prov, an extension that enhances this knowledge graph by capturing post-acquisition data processing steps—typically executed via computational workflows. While a variety of platforms and software exist to process raw data with workflows and scripts, there is no central solution to document these processes while preserving data provenance.
With LabID-Prov, the knowledge graph of data provenance has been expanded to incorporate derived datasets produced by computational workflows. These datasets are directly linked to their source raw data, while the software, parameters, and processing methods applied are systematically documented. This extension is supported by newly implemented data models for workflow versions and runs, ensuring standardized and comprehensive metadata capture.
To facilitate integration with existing platforms, LabID-Prov supports importing from Git repositories, Galaxy instances, and WorkflowHub, while leveraging Workflow and Workflow Run RO-Crate specifications to simplify sharing and deposition in scientific repositories (WorkflowHub...). Interoperability is further strengthened through an API, a command-line utility, and a Python library, enabling automation and customization. For example, users can import a workflow from Git, enrich its metadata (e.g. license, authors) via LabID’s interface, and export it to WorkflowHub, all within a unified ecosystem.
By unifying experimental and computational provenance, LabID-Prov guarantees that derived datasets retain the same level of contextual information as raw data, fostering reproducible computational research.
Speaker: Laurent Thomas (EMBL Heidelberg)
09:45   Deploying Kadi4Mat Workflows in Laboratory Environments for Reproducible and Guided Experimental Research (10m)
Within the Kadi4Mat ecosystem, Kadi4Mat Workflows support the systematic deployment of laboratory workflows by transitioning execution from previously exclusively local desktop systems (KadiStudio) to remote, containerized infrastructures, such as Docker and Kubernetes. This approach ensures controlled, scalable, and reproducible execution conditions across experimental runs. Interaction via a web interface allows workflows to be controlled and monitored on mobile devices at the point of experimentation. For procedural steps that require human intervention, the workflow provides predefined, structured input fields for capturing user inputs and observations during execution. QR code-based identification of samples and laboratory equipment further improves traceability and reduces manual transcription errors. Experimental data and associated metadata can be persistently stored in the Kadi4Mat repository, enabling comprehensive provenance tracking as well as reproducibility, reuse, and long-term preservation of laboratory research data.
As part of this framework, electrochemical corrosion measurements (e.g., OCP, polarization curves, EIS) are implemented as Kadi4Mat Workflows, enabling guided execution with structured capture of test parameters, sample history and electrolyte composition directly at the point of experimentation. The resulting raw data and metadata are stored persistently in the Kadi4Mat repository, providing end-to-end provenance and ensuring reproducible, comparable corrosion metrics across experimental runs and infrastructures.
Overall, Kadi4Mat Workflows extend electronic lab notebook (ELN) concepts from passive documentation toward guided experiment execution. By integrating structured data capture with workflow-driven experimental guidance, experiments are documented directly during execution rather than retrospectively. This approach embeds ELN functionality into everyday laboratory practice and supports reproducibility by design through the seamless coupling of experimental procedures, data, and metadata.
Speaker: Johannes Steinhülb (Karlsruhe Institute of Technology)
10:00   Using Persistent Identifiers (PIDs) as a Vehicle for Achieving Interoperability (10m)
Most research workflows involve use of multiple research tools, services, and IT infrastructure, each addressing one phase of the research life cycle. The lack of interoperability between resources hinders research productivity and prevents streamlined passage of data between tools and sustainable data FAIRification. This presentation discusses implementation of PIDs in RSpace to enhance interoperability between research tools and services used in different phases of the research lifecycle.
RSpace, which has evolved from an ELN into a research orchestration platform, has integrations with 20+ research tools and services, including domain specific research tools, data management planning tools, data repositories, equipment scheduling and colony management tools, computational resources including R, Jupyter Notebooks and Galaxy, protocols.io, and (institutional) data storage solutions. This has resulted in an ecosystem of connected research tools through which data can pass readily and seamlessly.
We will discuss the PIDification of RSpace and the extension of the PID overlay to other tools and services in RSpace’s ecosystem. Starting with ORCIDs and RORs, we added support for associating IGSN IDs with physical samples and their metadata, and the ability to pass sample data and associated IGSN IDs from a field data capture notebook (Fieldmark) to RSpace. We then added support for PIDs for projects, RAiDs, and are now incorporating support for instruments, using PIDINST. With IGSNs, RAiD and PIDINST, we describe how product design is driven by consideration of research workflows involving other tools. This includes discussions with developers of other tools and open-source contributors to ensure that information about the PIDs can be shared through RSpace in a streamlined and effective fashion. Finally, we discuss how support for PIDs in a research hub like RSpace enhances an entire ecosystem of tools, services, and research infrastructure using open APIs, SDKs, and MCP tools, making it accessible to both humans and machines.
Speaker: Rory Macneil (Research Space)
10:15   TeSSHub – A Federated Training Catalogue Infrastructure (10m)
The mTeSS-X project (“Multi-space Training e-Support System with eXchange”) aims to address one of the central challenges in modern research infrastructures: how to provide coordinated, yet domain-specific training resources across diverse scientific communities. Within the framework of ELIXIR and PaNOSC, the project develops a federated training catalog infrastructure called TeSSHub that connects communities from Photon and Neutron (PaN) Science, the Life Sciences (LS) and beyond, enabling them to share, discover, and reuse training materials and event information across institutional and disciplinary boundaries.
Scientific domains such as LS and PaN share common challenges in training data stewardship, reproducible research, and the application of computational methods. However, their training ecosystems have traditionally evolved independently, often leading to fragmentation and duplication of effort. mTeSS-X directly addresses this by developing a modular, multi-space platform architecture that supports community autonomy while enabling interoperability and content exchange between training catalog instances. The software framework builds upon the ELIXIR Training eSupport System (TeSS) and introduces extensions that facilitate federated content discovery, metadata harmonization, and cross-domain search through standardized APIs and metadata schemas aligned with FAIR principles.
From a technical perspective, mTeSS-X combines robust software engineering with semantic technologies to support training resources that are FAIR (Findable, Accessible, Interoperable, and Reusable) across infrastructures. It introduces an exchange mechanism that allows participating communities to selectively publish, synchronize, and enrich content, while preserving local governance and editorial control. The well-established OAI-PMH 2.0 protocol is supported for import as well as export of content and is used in combination with RDF data utilizing schemas.science, which is based on schema.org properties, and ontologies, such as EDAM, for semantic interoperability.
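As a rough sketch of the harvesting side of this exchange mechanism, the example below pulls Dublin Core records over OAI-PMH with the Sickle Python client; the endpoint URL is a placeholder, not a real TeSSHub address.

```python
# Sketch of harvesting training-resource metadata over OAI-PMH.
# The endpoint URL is a placeholder. Requires: pip install sickle
from sickle import Sickle

harvester = Sickle("https://tess.example.org/oai")   # placeholder endpoint

# Dublin Core is the minimal metadata format every OAI-PMH provider supports.
for record in harvester.ListRecords(metadataPrefix="oai_dc"):
    meta = record.metadata            # dict mapping Dublin Core fields to value lists
    print(meta.get("title"), meta.get("identifier"))
    break  # just the first record for this sketch
```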
In this contribution, we will present the conceptual and technical foundations of mTeSS-X. The presentation will also highlight how the exchange feature relates to FAIR training materials.
Speaker: Martin Voigt (Helmholtz-Zentrum Dresden-Rossendorf)
10:30 → 11:00   COFFEE (30m, Communication Center, Foyer, DKFZ)
11:00 → 12:30   TALK SESSION: Ontology-Driven Metadata Harmonization: Closing Semantic Gaps – Chairs: Gerrit Günther (HZB), Said Fathalla (FZJ) (Communication Center, DKFZ, Heidelberg)
11:00   Local Semantics with Global Meaning (10m)
FAIRification is a huge global innovation process. A major challenge is the steep learning curve for every new user, especially when it comes to actionable semantics: Which global vocabulary or ontology do I use? How can I use it for my data? Essentially, every new adopter, who has never collected a single semantic dataset before, is asked to do everything right from the first moment and select the right vocabulary. Strictly speaking, however, this would already require an overview of the existing vocabularies.
This talk tries to address this and other problems connected to the introduction of globally harmonized semantics with a seemingly counterintuitive suggestion, built into the concept of ontology: split the problem of selecting the vocabulary or ontology for collecting and communicating metadata into two parts: allow users to collect metadata using fine-granular, locally defined terms, and give those terms a meaning that can be translated into global and harmonized vocabularies later. Both tasks can be performed by creating simple RDFS and OWL statements. Semantic reasoners can do the actual translation.
The short talk will provide several arguments for this proposal: it minimizes the hurdles to getting started with metadata, because only minimal prior knowledge or decisions are required. It allows adopters to learn about the semantics in a safe local setting before they need to select the best-fitting ontology. It allows concepts not yet available in harmonized ontologies to be expressed and new concepts to be tested early. It allows the collected metadata to be represented in several, even contradictory or not yet developed, ontologies. It allows terms to be defined much more specifically and concretely than global terms, and thus to be more concise and better tailored to the actual data needs. And it makes the job of data stewards easier, due to a separation of problems and tasks.
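As a rough sketch of the kind of simple RDFS and OWL statements meant here (all IRIs are invented placeholders), a locally defined term can later be linked to a harmonized vocabulary so that a reasoner can translate local records into the global terms:

```python
# Sketch only: a local term collected first, mapped to a global vocabulary later.
# All IRIs are invented placeholders. Requires: pip install rdflib
from rdflib import Graph

local_ttl = """
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix local:  <http://my-lab.example.org/terms#> .
@prefix global: <http://harmonized-vocab.example.org/terms#> .

# Locally defined, fine-granular term used during data collection
local:FurnaceTemperature a owl:DatatypeProperty ;
    rdfs:label "furnace temperature" .

# Added later: the local term is a specialisation of a harmonized term,
# so a reasoner can translate local records into the global vocabulary.
local:FurnaceTemperature rdfs:subPropertyOf global:temperature .
"""

g = Graph().parse(data=local_ttl, format="turtle")
print(g.serialize(format="turtle"))
```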
Speaker: Prof. K. Gerald van den Boogaart (HZDR)
11:15   Interoperable Metadata For Describing Health Studies – The NFDI4Health Metadata Schema (10m)
To structure metadata of health-specific research studies, a tailored metadata schema (MDS) has been developed in the context of the German National Research Data Infrastructure for Personal Health Data (NFDI4Health) [1,2]. The MDS supports metadata publication from clinical, epidemiological, and public health studies, while maintaining interoperability with other resources. Designed in a modular fashion, it combines metadata for multiple purposes. Examples are health study design and use case-specific metadata for nutritional epidemiology, chronic diseases, record linkage, and medical imaging/radiomics. It also comprises bibliographic information and metadata about data sharing and access.
The MDS is based on DataCite for domain-independent metadata, CT.gov for metadata specific to clinical trials, and other domain-specific schemas. For compatibility with clinical trial registries, the schema was mapped to DRKS and ICTRP. To support metadata exchange with other platforms, mappings to the European Clinical Research Infrastructure Network metadata repository, the German Human Genome-Phenome Archive and the Directory of Registries of the European Rare Disease Registry Infrastructure were performed. To further promote semantic and syntactic interoperability of metadata across health research infrastructures, we aligned the MDS with HL7® FHIR® (Fast Healthcare Interoperability Resources) and included standard terminologies in the value sets, such as DICOM for the imaging module. The emerging European Health Data Space (EHDS) will play a major role across the entire medical domain in Europe; to prepare for interoperability with the EHDS, we have initiated an alignment of the MDS with the HealthDCAT-AP metadata standard underlying the EHDS systems.
Overall, the MDS can be applied across a wide range of use scenarios, while maintaining interoperability. NFDI4Health services based on the MDS, such as the German Health Study Hub (https://health-study-hub.de/) and the Local Data Hub software (https://www.nfdi4health.de/en/service/local-data-hub.html) can easily interface with external resources.
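To convey the flavour of such a modular, DataCite-based record, here is an illustrative sketch only; it is not the actual NFDI4Health MDS, and the field names outside the DataCite kernel (the "studyDesign" and "access" blocks) are hypothetical placeholders.

```python
# Illustrative sketch only: a minimal, DataCite-flavoured study record.
# The real NFDI4Health MDS defines its own modules and value sets; the
# "studyDesign" and "access" blocks below are hypothetical placeholders.
import json

study_record = {
    # Domain-independent part, modelled on DataCite kernel properties
    "titles": [{"title": "Example cohort study on dietary patterns"}],
    "creators": [{"name": "Doe, Jane", "affiliation": "Example University"}],
    "publisher": "Example Study Registry",
    "publicationYear": 2026,
    "resourceType": {"resourceTypeGeneral": "Dataset"},
    # Domain-specific module (hypothetical field names)
    "studyDesign": {
        "studyType": "cohort",
        "healthDomain": "nutritional epidemiology",
        "recordLinkagePlanned": True,
    },
    # Data sharing and access metadata (hypothetical field names)
    "access": {"accessPolicy": "restricted", "contact": "data-access@example.org"},
}

print(json.dumps(study_record, indent=2))
```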
[1] https://doi.org/10.2196/63906
[2] https://www.nfdi4health.de/en/service/metadata-schema.html
Speaker: Martin Golebiewski (Heidelberg Institute for Theoretical Studies (HITS), Heidelberg) -
11:30
Integrating Domain Ontologies and Workflow Metadata for Interoperable Computational Experiments 10m
Computational Science and Engineering relies on complex, multi-step workflows that combine simulations, data processing, and parameter-driven analyses across heterogeneous environments. Ensuring reproducibility in such settings requires not only abstract workflow descriptions but also semantically rich metadata that is interoperable across domains.
In this work, we present MaRDIFlow, a lightweight, metadata-driven workflow framework developed within the MaRDI consortium for research data management in the mathematical sciences. MaRDIFlow executes workflows through explicit input–output relationships between components, enabling structured metadata descriptions at different abstraction levels. Redundant representations of models, code, and data are supported to strengthen reproducibility and reuse.
To address semantic interoperability, MaRDIFlow integrates domain-specific ontologies via RESTful APIs and SPARQL endpoints. This allows workflow components and their metadata to be dynamically aligned with standardized vocabularies during both construction and execution. As a concrete example, we integrate Voc4Cat, a domain-specific ontology and SKOS vocabulary from the NFDI4Cat consortium, which serves as a semantic backbone for annotating workflow components and metadata dependencies. Through this integration, knowledge graphs are used to represent and query relationships across workflow layers, supporting automated discovery, validation, and consistent interpretation of data.
The presented use cases demonstrate how combining workflow descriptions with domain ontologies enhances semantic consistency, interoperability, and reproducibility. This work highlights the practical role of domain and application ontologies in building reusable data infrastructures for computational workflows and lays the groundwork for extending MaRDIFlow with additional NFDI ontologies across disciplines.
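As a rough illustration of this pattern (not the MaRDIFlow API itself), the sketch below annotates a workflow component with a SKOS concept and retrieves the annotation via SPARQL; the tiny in-memory vocabulary stands in for a Voc4Cat-style SKOS file, and all IRIs are hypothetical.

```python
# Illustrative sketch (not the MaRDIFlow API): annotate a workflow component with
# a SKOS concept and look the annotation up with a SPARQL query.
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import SKOS

VOC = Namespace("https://example.org/voc4cat-like/")   # stand-in vocabulary
WF = Namespace("https://example.org/workflow/")        # stand-in workflow terms

g = Graph()

# Stand-in vocabulary entry
g.add((VOC.Electrolysis, RDF.type, SKOS.Concept))
g.add((VOC.Electrolysis, SKOS.prefLabel, Literal("electrolysis", lang="en")))

# Workflow component described via explicit input/output metadata and annotation
g.add((WF.step_01, WF.hasInput, WF.parameter_set))
g.add((WF.step_01, WF.annotatedWith, VOC.Electrolysis))

# SPARQL lookup: which vocabulary labels describe the workflow components?
q = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?label WHERE {
  ?step <https://example.org/workflow/annotatedWith> ?concept .
  ?concept skos:prefLabel ?label .
}
"""
for row in g.query(q):
    print(row.label)
```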
Speaker: Pavan L. Veluvali (Max Planck Institute for Dynamics of Complex Technical Systems) -
11:45
MOLSIM: An Interoperable Ontology for Representing Biomolecular Simulation 10m
Molecular dynamics (MD) simulations generate vast amounts of data foundational to structural biology, yet their value is often limited by inconsistent metadata and software-specific formats that create isolated data silos. To address this "semantic gap," we introduce MOLSIM, an interoperable ontology designed to formalize the description of atomistic biomolecular simulations and enhance the implementation of FAIR (Findable, Accessible, Interoperable, Reusable) principles. Developed in adherence to Open Biological and Biomedical Ontologies (OBO) Foundry principles, MOLSIM prevents redundancy by systematically reusing terms from established ontologies such as ChEBI and the Unit Ontology. A core feature of MOLSIM is its software-agnostic labeling, which resolves ambiguities in simulation metadata; for instance, mapping disparate keywords like ‘ntt=1’ in AMBER and ‘tcoupl’ in GROMACS to a unified Berendsen Thermostat class. The ontology was constructed using a Large Language Model (LLM)-assisted workflow, employing LLMs to extract technical terms from software manuals, followed by rigorous expert curation. Currently comprising approximately 2,000 terms, MOLSIM enables simulation data to be structured as Knowledge Graphs. This allows for the seamless integration of MD metadata with external open knowledge bases such as Wikidata, UniProt, and the PDB, providing the necessary semantic granularity to support next-generation community repositories.
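To illustrate the software-agnostic labeling idea with the example given in the abstract, here is a minimal sketch (not MOLSIM code; the namespace and property names are hypothetical placeholders): engine-specific thermostat keywords are mapped to one unified class.

```python
# Minimal sketch (illustrative, not the MOLSIM release): map engine-specific
# thermostat keywords to one software-agnostic class. IRIs are placeholders.
from rdflib import Graph, Namespace, Literal

MOLSIM = Namespace("https://example.org/molsim/")  # placeholder namespace

# (engine, keyword, value) -> unified ontology class
KEYWORD_TO_CLASS = {
    ("AMBER", "ntt", "1"): MOLSIM.BerendsenThermostat,
    ("GROMACS", "tcoupl", "berendsen"): MOLSIM.BerendsenThermostat,
}

def annotate(graph, run_iri, engine, keyword, value):
    """Attach the unified class for an engine-specific setting, if known."""
    cls = KEYWORD_TO_CLASS.get((engine, keyword, value))
    if cls is not None:
        graph.add((run_iri, MOLSIM.usesThermostat, cls))
    # Keep the raw, software-specific setting for provenance
    graph.add((run_iri, MOLSIM.rawSetting, Literal(f"{engine}:{keyword}={value}")))

g = Graph()
annotate(g, MOLSIM.run_001, "AMBER", "ntt", "1")
annotate(g, MOLSIM.run_002, "GROMACS", "tcoupl", "berendsen")
print(g.serialize(format="turtle"))
```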
Speaker: Fathoni Musyaffa (FZ-Jülich) -
12:00
Toward FAIR and Reproducible Data Quality Control: A Use Case–Driven Data-Quality Processing Metadata Schema for Time Series Data 10m
High-quality environmental time series data require transparent, reproducible, and well-documented quality control (QC) workflows that integrate automated procedures and expert judgment. While many QC frameworks offer algorithmic methods, the processing information explaining how data quality decisions are made — including parameterization, flag semantics, and manual interventions — is often not formalized enough to be easily reused, reproduced, or exchanged across infrastructures.
In this talk, we present a metadata schema for time series data that enables FAIR and reproducible data quality processing. The schema is designed to describe QC methods, execution contexts, and resulting quality flags in a machine-actionable and interoperable manner. It employs the OGC SensorThings API data model enhanced by the STAMPLATE schema and the concepts established in the SaQC framework. The schema follows the linked-data approach and aligns with standards such as the W3C Data Quality Vocabulary.
The design of the proposed schema is motivated by concrete use cases for QC of time series data from the TERENO and ACTRIS observation networks. These use cases include detailed analyses of existing automated and manual QC workflows. By comparing and abstracting these practices, we derive common requirements and design patterns for representing QC processing information in a FAIR and reproducible manner. The resulting schema can be used straightforwardly with SensorThings API services and mapped into NetCDF files that align with the Helmholtz Metadata Guidelines for NetCDF. It can also be used with RO-Crates, embedding files in CSV format, for example.
Our metadata schema lays the foundation for a community-driven, FAIR, and reproducible quality control solution. Our goal is to integrate the requirements of other communities and develop a web application that allows users to visually inspect and flag time series data in a manner consistent with our schema.
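For orientation, the sketch below shows one way such QC processing information can be made machine-actionable with the W3C Data Quality Vocabulary (DQV) and PROV-O. It is an illustrative sketch only, not the schema presented in the talk; the flag value, parameter properties, and observation IRIs are hypothetical.

```python
# Illustrative sketch (not the presented schema): describe one automated QC step
# and the resulting flag for a time-series observation using DQV and PROV-O.
from rdflib import Graph, Namespace, RDF, Literal
from rdflib.namespace import PROV, XSD

DQV = Namespace("http://www.w3.org/ns/dqv#")
EX = Namespace("https://example.org/qc/")   # hypothetical identifiers

g = Graph()
g.bind("dqv", DQV)
g.bind("prov", PROV)

# The QC activity: an automated range test with explicit parameterization
g.add((EX.rangeTestRun1, RDF.type, PROV.Activity))
g.add((EX.rangeTestRun1, EX.parameterMin, Literal(-40.0, datatype=XSD.double)))
g.add((EX.rangeTestRun1, EX.parameterMax, Literal(60.0, datatype=XSD.double)))

# The resulting quality flag, attached to one observation with its provenance
g.add((EX.obs_20260115T1200, DQV.hasQualityMeasurement, EX.flag_1))
g.add((EX.flag_1, RDF.type, DQV.QualityMeasurement))
g.add((EX.flag_1, DQV.value, Literal("BAD", datatype=XSD.string)))
g.add((EX.flag_1, PROV.wasGeneratedBy, EX.rangeTestRun1))

print(g.serialize(format="turtle"))
```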
Speaker: Dr. Ulrich Loup (Forschungszentrum Jülich GmbH)
-
11:00
-
12:30
→
13:30
LUNCH 1 h Communication Center, Foyer (DKFZ)
Communication Center, Foyer
DKFZ
-
13:30
→
14:30
KEYNOTE - Constructing Enterprise RDF Knowledge Graphs: Foundations for Neuro-Symbolic AI: Dr. Jan Portisch Communication Center
Communication Center
DKFZ, Heidelberg
Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
In his keynote, Jan Portisch will explore the construction and operationalisation of RDF knowledge graphs in complex corporate environments. Drawing on extensive experience in integrating heterogeneous and evolving data sources, he will discuss architectural choices, practical strategies, and common challenges in building enterprise-scale knowledge graphs. The talk will also highlight how such knowledge graphs can support AI applications by improving semantic grounding, robustness, and explainability, and will provide an outlook on neuro-symbolic AI as a bridge between statistical learning and symbolic reasoning.
Chair: Dr. Wolfgang Süß (KIT)
-
14:30
→
14:40
GROUP PICTURE 10m Communication Center
Communication Center
DKFZ, Heidelberg
Im Neuenheimer Feld 280, 69120 Heidelberg, Germany -
14:40
→
17:00
POSTERS & DEMOS - with Coffee Communication Center, Foyer & 1st Floor (DKFZ)
Communication Center, Foyer & 1st Floor
DKFZ
The second poster and demo session features the remaining contributions (see overview here & printed overviews). Authors will again be present to share their work and engage in discussion.
Use this opportunity to discover further topics, exchange ideas, and connect with presenters and participants.
Coffee will be served during the session.
-
18:00
→
22:00
SOCIAL EVENT – Neckar Cruise
Neckar river cruise including conference dinner (registration required)
-
07:00
→
07:45
-
-
09:00
→
10:00
KEYNOTE - TBA: Prof. Dr. Oliver Stegle Communication Center
Communication Center
DKFZ, Heidelberg
Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
In his keynote, Prof. Stegle will bring the perspective of scientific infrastructure to the theme of “Metadata in Action,” reflecting on how robust infrastructure — from secure data archives to scalable computational ecosystems — underpins effective metadata practices and enables cutting-edge science. Drawing on his experience building and coordinating GHGA and related initiatives that integrate data workflows, governance, and interoperability across institutions, he will discuss how infrastructures can make metadata more actionable, drive reproducible research, and support interdisciplinary collaboration in the era of large-scale genomics and biomedicine.
Chair: Dr. Marco Nolden (DKFZ)
-
10:00
→
10:30
COFFEE 30m Communication Center, Foyer (DKFZ)
Communication Center, Foyer
DKFZ
-
10:30
→
12:15
TALK SESSION: Human-Machine Collaboration in (Meta)data Acquisition - Chair: Marta Dembska (DLR) Communication Center
Communication Center
DKFZ, Heidelberg
Im Neuenheimer Feld 280, 69120 Heidelberg, Germany -
10:30
A Semantically Integrated Framework for Robotic Data Acquisition in Mechanical Testing 20m
Traditional mechanical testing often relies on manual observation and fragmented data storage, creating bottlenecks in scientific progress. To reduce development times and make mechanical testing more sustainable, we must transition from manual logging to high-throughput, standardized data acquisition. This presentation demonstrates a paradigm shift in human-machine collaboration within fatigue crack growth experiments.
We present an autonomous data acquisition framework for fatigue crack growth experiments in which intelligent robotics, Digital Image Correlation (DIC), and machine learning (ML) operate as closed-loop sensing agents. High-resolution DIC continuously tracks crack tip position and deformation fields, while ML models extract higher-order descriptors, including plastic zone evolution and fracture-relevant damage features, in real time.
Central to the framework is a semantic orchestration layer based on graph databases, domain ontologies, and explicit provenance models. Experimental parameters, sensor states, derived features, and processing steps are represented as first-class entities in a unified knowledge graph. This enables automated metadata capture, cross-modal data alignment, and machine-driven reasoning over experimental context, rather than post-hoc annotation.
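The spirit of this automated, in-loop metadata capture can be conveyed by a conceptual sketch (below). It is illustrative only: all functions are hypothetical stand-ins, whereas the actual framework uses DIC hardware, ML models, and a graph database.

```python
# Conceptual sketch only: metadata and provenance are recorded at acquisition time
# (closed loop), not annotated after the experiment. All functions are stand-ins.
import datetime

def acquire_dic_frame():
    """Placeholder for the DIC acquisition step."""
    return {"frame_id": 1, "pixels": "..."}

def extract_descriptors(frame):
    """Placeholder for the ML-based extraction of higher-order descriptors."""
    return {"crack_tip_mm": 12.7, "plastic_zone_mm2": 0.42}

def record(entity, properties, derived_from=None):
    """Stub: write an entity plus its provenance into the experiment knowledge graph."""
    print(entity, properties, "wasDerivedFrom:", derived_from)

# One iteration of the closed loop
frame = acquire_dic_frame()
record("dic:frame/1", {"captured": datetime.datetime.now().isoformat()})
features = extract_descriptors(frame)
record("analysis:features/1", features, derived_from="dic:frame/1")
```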
By decoupling experimental execution from semantic interpretation, the framework transforms mechanical testing into a self-describing, machine-navigable process. The result is an autonomous experimental pipeline that supports scalable data generation, reproducible analysis, and seamless integration into self-driving laboratory workflows.
Speaker: Eric Breitbarth (Germany) -
10:55
Automated Metadata Acquisition in Energy Research using BPMN-driven Workflows at the Energy Lab at KIT 10m
In energy research facilities, such as the Energy Lab at the Karlsruhe Institute of Technology, the systematic acquisition of high-quality metadata remains a significant challenge because manual documentation can be error-prone and time-consuming. To ensure data findability and reproducibility according to the FAIR principles, we present an innovative approach that utilizes Business Process Model and Notation (BPMN) to orchestrate research workflows and automate metadata capture. By using Operaton as a lightweight, open-source BPMN engine, metadata acquisition can be transformed into an inherent component of the experimental lifecycle. Within the Energy Lab infrastructure, experimental sequences, ranging from sensor calibration to data storage, are modeled as executable BPMN diagrams. These models serve a dual purpose: they provide a visual documentation layer for researchers and act as technical instructions for the Operaton engine. By integrating specialized metadata tasks directly into the automated workflow, the engine extracts technical parameters and provenance information in real-time and maps them to standardized schemas without requiring manual intervention. The initial results demonstrate that this orchestration significantly increases metadata completeness and consistency while reducing the administrative burden on researchers. Furthermore, the graphical nature of BPMN facilitates a crucial bridge between domain-specific research and data engineering. This integration provides a scalable framework for "Metadata-by-Design," ensuring that complex datasets generated within the Helmholtz Association are accompanied by high-quality, machine-readable documentation. Ultimately, the use of Operaton for process-driven metadata acquisition represents a robust solution for the long-term usability of energy-research data.
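For readers unfamiliar with process-driven metadata capture, the sketch below shows the general shape of an external task worker that a BPMN engine can delegate metadata tasks to. It assumes a Camunda-7-style external task REST API (Operaton originates from that code base), but the endpoint paths, topic name, and variable names here are illustrative assumptions, not the Energy Lab configuration.

```python
# Sketch of a metadata-capture worker, assuming a Camunda-7-style external task
# REST API. URLs, topic, and variable names below are illustrative assumptions.
import requests

ENGINE = "http://localhost:8080/engine-rest"   # hypothetical engine base URL
WORKER = "metadata-capture-worker"

# 1) Fetch and lock tasks published by the BPMN model under a dedicated topic
tasks = requests.post(f"{ENGINE}/external-task/fetchAndLock", json={
    "workerId": WORKER,
    "maxTasks": 1,
    "topics": [{"topicName": "capture-sensor-metadata", "lockDuration": 10000}],
}).json()

for task in tasks:
    # 2) Extract technical parameters / provenance (placeholder values here)
    metadata = {"sensorId": "TC-17", "calibrationDate": "2026-04-12"}

    # 3) Complete the task, handing the captured metadata back to the process
    requests.post(f"{ENGINE}/external-task/{task['id']}/complete", json={
        "workerId": WORKER,
        "variables": {"capturedMetadata": {"value": str(metadata), "type": "String"}},
    })
```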
Speaker: Jan Martin Reckel (KIT) -
11:10
A Semantic Laboratory Assistant for Metadata Acquisition in Electronic Lab Notebooks 10m
Laboratory data reuse and reproducibility depend on rapid, accurate, and complete capture of experimental (meta)data. In practice, metadata creation in electronic lab notebooks (ELNs) remains a bottleneck because form-based entry interrupts workflows, free-text input is time-consuming and error-prone, and heterogeneous terminology complicates harmonisation across projects and infrastructures. These limitations reduce metadata quality and quantity and impede Findable, Accessible, Interoperable, and Reusable (FAIR) dissemination and integration.
LabFriend is an open, ELN-agnostic laboratory assistant under development to mitigate these issues through semantically structured, context-aware metadata acquisition. Intended functionality includes real-time suggestions, validation of field values against controlled semantics, and optional speech-based capture. Methods combine association-rule mining from historical form instances with ontology- and knowledge-graph-based semantic relatedness, aiming to improve completeness and terminology consistency while keeping interaction lightweight.
A central prerequisite is robust data preparation that converts heterogeneous ELN exports into validation-ready material for semantic methods and evaluation. This contribution focuses on a preparation workflow for transforming records collected in the Chemotion ELN into knowledge-graph-ready representations. The workflow addresses common obstacles in exported records, including a mixture of structured key-value fields and unstructured free text, missing or implicit units, inconsistent naming, and ambiguous identifiers. Preparation steps include structure extraction from Chemotion objects, normalisation of datatypes and units, entity resolution across samples, processes, and instruments, and semantic anchoring to domain vocabularies while preserving provenance of each transformation decision. The resulting material is annotated manually against a closed, schema-driven target model and can be mapped to Resource Description Framework (RDF) statements for knowledge-graph construction and downstream reuse.
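The flavour of such a preparation step can be sketched as follows. This is an illustrative sketch only, not the LabFriend workflow; the raw record, naming table, and unit table are hypothetical examples of the kind of heterogeneity described above.

```python
# Illustrative sketch only: normalise one exported key-value record into a
# unit-explicit, consistently named structure before semantic anchoring.
RAW_RECORD = {              # as it might appear in an ELN export (hypothetical)
    "Temp.": "80 C",
    "reaction time": "2h",
    "solvent": "EtOH",
}

UNIT_TABLE = {"C": "degree Celsius", "h": "hour"}                 # unit normalisation
NAME_TABLE = {"Temp.": "temperature", "reaction time": "duration",
              "solvent": "solvent"}                               # naming harmonisation

def normalise(record):
    out = {}
    for key, value in record.items():
        field = NAME_TABLE.get(key, key)
        value = value.strip()
        # Split a trailing unit symbol off numeric values, if one is present
        for symbol, unit in UNIT_TABLE.items():
            number = value[: -len(symbol)].strip()
            if value.endswith(symbol) and number.replace(".", "", 1).isdigit():
                out[field] = {"value": float(number), "unit": unit}
                break
        else:
            out[field] = {"value": value, "unit": None}
    return out

print(normalise(RAW_RECORD))
```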
Speaker: Tina Boroukhian (Helmholtz-Zentrum Hereon, Institute of Membrane Research, Germany) -
11:25
A Collaborative Approach to Metadata Interoperability: PID4NFDI, TS4NFDI, and RSpace 10m
This contribution presents a joint collaboration between PID4NFDI, TS4NFDI, and the electronic lab notebook provider ResearchSpace to support interoperable research workflows within the National Research Data Infrastructure (NFDI). It demonstrates how early, structured capture of high-quality metadata and persistent identifiers (PIDs) in ELNs, combined with shared reference schemas and centrally governed terminology services, can reduce redundant effort and improve metadata consistency and data lineage across the research lifecycle [1].
The presentation outlines the complementary roles of the partners. PID4NFDI coordinates PID integration and metadata alignment. TS4NFDI provides centralized terminology services via an API Gateway, ensuring consistent, machine-actionable metadata. RSpace integrates these components into everyday research workflows, enabling structured metadata and PID capture at the point of data creation.
Entity mappings curating DataCite schema alignments with schema.org and DCAT are maintained in Cocoda [2] with versioning and provenance tracking. DataCite properties and vocabularies are available via the TIB Terminology Service, providing canonical, machine-actionable terms accessible through TSS widgets [3] and the API Gateway [4]. RSpace integrates these services via embedded widgets, enabling structured metadata capture at data creation and supporting export of NFDI-compliant ELN records.
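To illustrate what a cross-schema mapping used as a single source of truth can look like, here is a deliberately simplified sketch; the crosswalk entries and record fields are hypothetical examples, not the Cocoda-maintained mappings themselves.

```python
# Minimal sketch (hypothetical, simplified): a versioned crosswalk between a few
# DataCite properties and schema.org / DCAT terms, used to translate an ELN record.
CROSSWALK = {
    "version": "0.1-example",
    "mappings": {
        "datacite:title":   {"schema.org": "schema:name",     "dcat": "dct:title"},
        "datacite:creator": {"schema.org": "schema:creator",  "dcat": "dct:creator"},
        "datacite:subject": {"schema.org": "schema:keywords", "dcat": "dcat:keyword"},
    },
}

def translate(record, target):
    """Map a DataCite-keyed record into the target vocabulary via the crosswalk."""
    out = {}
    for key, value in record.items():
        mapping = CROSSWALK["mappings"].get(key)
        if mapping and target in mapping:
            out[mapping[target]] = value
    return out

eln_record = {"datacite:title": "Sample batch 2026-04", "datacite:creator": "J. Doe"}
print(translate(eln_record, "dcat"))   # {'dct:title': ..., 'dct:creator': ...}
```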
Overall, the collaboration establishes a reusable, machine-actionable metadata layer based on shared terminology lookup, cross-schema mappings as a single source of truth, and clear service integration patterns. The proof of concept illustrates how PID4NFDI and TS4NFDI can work with ELN and DMP providers to enable interoperable research workflows and inform future NFDI-wide implementations.
[1] El-Gebali, S. (2024). Concepts for metadata interoperability, harmonization and technical integration of PID infrastructure (1.0). Zenodo. https://doi.org/10.5281/zenodo.14506138
[2] coli-conc – Cocoda. (2025). Coli-Conc.gbv.de. https://coli-conc.gbv.de/cocoda
[3] Terminology Service Suite. (2025). Base4nfdi.de. https://terminology.services.base4nfdi.de/tss/comp/latest/
[4] TS4NFDI API Gateway. (2025). Base4nfdi.de. https://terminology.services.base4nfdi.de/api-gateway/swagger-ui/index.html
Speaker: Tilo Mathes (ResearchSpace) -
11:40
The Agentic Automation Canvas: A Metadata Framework for Human-AI Task Delegation 10m
Agentic AI systems—autonomous software driven by large language models (LLMs)—promise significant efficiency gains by performing tasks that traditionally required human judgment. However, their deployment fundamentally involves control inversion: humans must step back and allow the system to take command. The ease of building impressive prototypes with current LLMs creates a dangerous mismatch: stakeholders see quick demos and assume production-ready solutions are within reach, while the bulk of actual work—handling edge cases, ensuring reliability, integrating governance, and validating real-world performance—lies beyond the prototype. Without an explicit contract defining expectations before control inversion occurs, organizations face disillusionment when promised benefits fail to materialize.
We present the Agentic Automation Canvas (AAC), a structured metadata framework that captures the essential agreement between human stakeholders and agentic AI systems. By formalizing this as machine-readable metadata rather than traditional requirements documents, the AAC enables automated validation of stakeholder agreements, cross-system interoperability, and integration with institutional governance workflows. The canvas formalizes user requirements with quantified benefit expectations and balances them with developer feasibility assessments including model baseline capabilities, governance stages with assigned accountability, data access rights, and evaluation metrics for comparing outcomes against expectations. The AAC is implemented as an interactive web application (https://aac.slolab.ai) exporting versioned RO-Crate packages. Where possible, the schema maps to established vocabularies (Schema.org, PROV-O, DCAT, P-Plan, FRAPO, DUO); for agentic-specific concepts such as benefit metrics, baseline capabilities, and control inversion agreements, we introduce new terms under a registered https://w3id.org/aac/ namespace.
By requiring this contract before control inversion, the AAC bridges the gap between prototype enthusiasm and production reality. The resulting RO-Crate travels alongside the project as a machine-readable artifact designed to support governance, auditable decision-making, and benefit tracking throughout the collaboration lifecycle.
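For orientation, the sketch below shows the general shape of an RO-Crate metadata file carrying AAC-style agreement metadata. It is illustrative only: the property and class names under the aac: prefix are hypothetical placeholders, not the registered AAC terms.

```python
# Illustrative sketch only: an RO-Crate-style metadata document carrying
# AAC-flavoured agreement metadata. Terms under "aac:" are hypothetical.
import json

crate = {
    "@context": ["https://w3id.org/ro/crate/1.1/context",
                 {"aac": "https://w3id.org/aac/"}],
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "name": "Invoice-triage agent agreement",
         "hasPart": [{"@id": "#agreement"}]},
        {"@id": "#agreement", "@type": "aac:ControlInversionAgreement",
         "aac:expectedBenefit": "reduce manual triage time by 30%",
         "aac:baselineCapability": "general-purpose LLM, no fine-tuning",
         "aac:evaluationMetric": "percentage of correctly routed invoices"},
    ],
}

print(json.dumps(crate, indent=2))
```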
Speaker: Sebastian Lobentanzer (Helmholtz Munich) -
11:55
AIMWORKS: Template‑Driven, Agentic Framework for FAIR Knowledge Graph Construction in Hydrogen Technologies 10m
Hydrogen and electrochemical energy research produces rapidly evolving, heterogeneous outputs - protocols, instrument settings, conditioned performance metrics, and multi-scale materials descriptors - that are difficult to curate into FAIR, machine-actionable metadata [1]. Many “LLM-first” knowledge-graph pipelines rely on monolithic prompts and ad-hoc post-processing, which can lead to inconsistent terminology, unreliable unit handling, and missing provenance and dataset details [2]. We present AIMWORKS [3], a template-driven, agentic framework that improves FAIR metadata by design. AIMWORKS uses a stable core vocabulary and a curated library of reusable templates for common experimental patterns (measurements, processes, experimental context, instruments, metrics, and dataset/provenance blocks) [4]. Given a user’s natural-language research question, the system selects the most relevant templates, assembles them into a structured knowledge graph, and exports it in standard formats (RDF/JSON-LD). To ensure reliability, each template includes micro-level validation rules (SHACL) and the system applies deterministic checks and repairs to enforce consistent typing, represent conditioned metrics via a DataPoint pattern, normalise quantities and units using QUDT, and generate a complete dataset description (including title, license, and access information). Outputs integrate cleanly into downstream platforms such as Neo4j and institutional knowledge-graph infrastructures. Case studies from hydrogen technologies (polarization curves, impedance spectroscopy, durability protocols, and ionomer–catalyst-layer questions) show that the template-first approach improves metadata completeness and interoperability while reducing manual curation and providing a transparent trace from query to graph.
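To illustrate the kind of micro-level validation described above, here is a small sketch using pySHACL: a single DataPoint with a QUDT unit is checked against a SHACL shape requiring both a numeric value and a unit. The ex: IRIs are hypothetical placeholders, and the QUDT property names are used here only as an illustration of the quantity/unit pattern, not as AIMWORKS template code.

```python
# Illustrative sketch only (not AIMWORKS code): validate one DataPoint-style node
# against a small SHACL shape with pySHACL. The ex: namespace is a placeholder.
from rdflib import Graph
from pyshacl import validate

data_ttl = """
@prefix ex: <https://example.org/aimworks-like/> .
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:dp1 a ex:DataPoint ;
    qudt:numericValue "0.68"^^xsd:double ;
    qudt:hasUnit unit:V .
"""

shapes_ttl = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <https://example.org/aimworks-like/> .
@prefix qudt: <http://qudt.org/schema/qudt/> .

ex:DataPointShape a sh:NodeShape ;
    sh:targetClass ex:DataPoint ;
    sh:property [ sh:path qudt:numericValue ; sh:minCount 1 ] ;
    sh:property [ sh:path qudt:hasUnit ; sh:minCount 1 ] .
"""

data = Graph().parse(data=data_ttl, format="turtle")
shapes = Graph().parse(data=shapes_ttl, format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # True if the DataPoint carries both a value and a unit
print(report)
```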
References
[1] Dreger, M., Eslamibidgoli, M. J., Eikerling, M. H., & Malek, K. (2023). Synergizing ontologies and graph databases for highly flexible materials-to-device workflow representations. Journal of Materials Informatics, 3(1), N-A
[2] Dreger, M., Malek, K., & Eikerling, M. (2025). Large language models for knowledge graph extraction from tables in materials science. Digital Discovery.
[3] https://meslamib3-aimworks4-streamlit-app7-2kyctx.streamlit.app/
[4] https://github.com/meslamib3/aimworks4/tree/main/templates
Speaker: Mohammad J. Eslamibidgoli (Theory and Computation of Energy Materials (IET-3), Institute of Energy Technologies, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany)
-
10:30
-
12:15
→
12:30
WRAP UP: Official Closing of the HMC Conference 2026 & Awards Communication Center
Communication Center
DKFZ, Heidelberg
Im Neuenheimer Feld 280, 69120 Heidelberg, Germany -
12:30
→
13:30
LIGHT LUNCH 1 h Communication Center, Foyer (DKFZ)
Communication Center, Foyer
DKFZ
-
09:00
→
10:00


