28–30 April 2026
DKFZ, Heidelberg
Europe/Berlin time zone

Large-scale Extraction and Annotation of Quantitative Information on Energy Technologies from Scientific Literature

D07
Not scheduled
3 min
Communication Center (DKFZ, Heidelberg)
Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
Demo Poster
4. Human-machine collaboration in (meta)data acquisition
POSTERS & DEMOS - with Drinks

Speaker

Maxime Gorres (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems – Jülich Systems Analysis, 52425 Jülich, Germany)

Description

Systematic literature reviews are fundamental to energy system analysis, yet they are often time-consuming, incomplete, and inconsistent. While manually curated datasets provide valuable structured information for specific subdomains of energy research, extending such efforts to the entire field remains challenging. At the same time, easily accessible and extensible quantitative evidence would substantially benefit the research community.
In this contribution, we present a large-scale, automatically compiled dataset of quantitative information extracted from 15 years of energy systems literature using Quinex, an LLM-based information extraction tool. Quinex identifies quantitative statements and transforms them into structured data containing numerical values, units, quantified properties, entities, and contextual metadata such as spatial and temporal scope. The literature corpus was compiled using advanced searches in Scopus and Web of Science covering a broad range of keywords. It comprises approximately 76,000 abstracts, around 31,000 of which are accompanied by full texts.
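To make the structure of an extracted record concrete, here is a minimal Python sketch of what a single datapoint might hold. The class name `QuantitativeDatapoint` and its field names are hypothetical and are not Quinex's actual output schema; the example statement is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantitativeDatapoint:
    """One extracted quantitative statement (hypothetical schema, not Quinex's actual format)."""
    value: float                          # numerical value, e.g. 25.0
    unit: Optional[str]                   # unit as written, e.g. "%" or "USD/kWh"
    quantified_property: str              # e.g. "efficiency"
    entity: str                           # e.g. "perovskite solar cell"
    spatial_scope: Optional[str] = None   # e.g. "Germany"
    temporal_scope: Optional[str] = None  # e.g. "2023"
    source_doi: Optional[str] = None      # provenance of the statement, if known


# Invented statement: "Perovskite solar cells in Germany reached an efficiency of 25 % in 2023."
dp = QuantitativeDatapoint(
    value=25.0,
    unit="%",
    quantified_property="efficiency",
    entity="perovskite solar cell",
    spatial_scope="Germany",
    temporal_scope="2023",
)
print(dp.quantified_property, dp.value, dp.unit)  # efficiency 25.0 %
```

Keeping the spatial and temporal scope as separate optional fields lets later filtering and trend analyses ignore datapoints whose context was not stated.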
Applying Quinex to this corpus yielded roughly three million quantitative datapoints. As the tool is domain-agnostic, the extracted information includes values unrelated to energy systems. To enable meaningful analysis, we implemented a filtering and normalization workflow based on regular expressions, resulting in a dataset tailored to energy system research.
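The regular-expression filtering and normalization step can be sketched as follows, assuming each datapoint carries an entity string, a value, and a unit. The keyword patterns, canonical technology names, power-unit table, and the helper names `normalize_technology` and `normalize_power` are hypothetical illustrations, not the rules used to build the actual dataset.

```python
import re
from typing import Optional

# Hypothetical keyword patterns mapping surface forms to canonical technology names.
TECHNOLOGY_PATTERNS = {
    "photovoltaics": re.compile(r"\b(?:photovoltaic\w*|PV|solar cells?|solar panels?)\b", re.IGNORECASE),
    "wind": re.compile(r"\b(?:wind turbines?|wind power|wind farms?|offshore wind|onshore wind)\b", re.IGNORECASE),
    "battery storage": re.compile(r"\b(?:batter(?:y|ies)|Li-ion|lithium-ion)\b", re.IGNORECASE),
}

# Hypothetical scale factors to a common power unit (kW); keys are case-sensitive
# because SI prefixes are (MW = megawatt, mW = milliwatt).
POWER_TO_KW = {"W": 1e-3, "kW": 1.0, "MW": 1e3, "GW": 1e6}


def normalize_technology(entity: str) -> Optional[str]:
    """Return the first canonical technology whose pattern matches, else None."""
    for name, pattern in TECHNOLOGY_PATTERNS.items():
        if pattern.search(entity):
            return name
    return None


def normalize_power(value: float, unit: str) -> Optional[float]:
    """Convert a power value to kW; return None if the unit is not in the table."""
    factor = POWER_TO_KW.get(unit.strip())
    return value * factor if factor is not None else None


# Toy (entity, value, unit) triples standing in for extracted datapoints.
records = [
    ("offshore wind farm", 400.0, "MW"),
    ("PV system", 5.0, "kW"),
    ("patient cohort", 120.0, "persons"),
]

filtered = []
for entity, value, unit in records:
    tech = normalize_technology(entity)
    if tech is None:
        continue  # drop datapoints unrelated to energy technologies
    filtered.append((tech, normalize_power(value, unit)))

print(filtered)  # [('wind', 400000.0), ('photovoltaics', 5.0)]
```

Keeping the unit lookup case-sensitive matters here: lowercasing the unit would conflate `MW` (megawatt) with `mW` (milliwatt).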
A preliminary analysis illustrates potential applications of the dataset. Photovoltaic and wind technologies constitute the largest share, with cost and efficiency being the most frequently reported properties. The distribution of technologies exhibits strong regional patterns, reflecting differences in research focus across countries. Normalized data and metadata further enable temporal analyses, revealing trends in key techno-economic parameters such as efficiency, lifetime, and capacity factor.
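Analyses of this kind reduce to grouping and aggregating the normalized records. The sketch below counts datapoints per technology and computes the median value per technology, property, and year as a simple trend signal. The records are toy values invented for illustration, not results from the dataset, and using the publication year as the time axis is an assumption.

```python
from collections import Counter, defaultdict
from statistics import median

# Toy normalized records (invented): (technology, property, value, publication year).
records = [
    ("photovoltaics", "efficiency", 18.0, 2012),
    ("photovoltaics", "efficiency", 20.0, 2012),
    ("photovoltaics", "efficiency", 22.0, 2020),
    ("wind", "capacity factor", 35.0, 2012),
    ("wind", "capacity factor", 41.0, 2020),
    ("photovoltaics", "cost", 1.2, 2020),
]

# Number of datapoints per technology (its share of the dataset).
tech_counts = Counter(tech for tech, _, _, _ in records)

# Median value per (technology, property, year) as a simple trend signal.
groups = defaultdict(list)
for tech, prop, value, year in records:
    groups[(tech, prop, year)].append(value)
trend = {key: median(values) for key, values in sorted(groups.items())}

print(tech_counts.most_common())  # [('photovoltaics', 4), ('wind', 2)]
for (tech, prop, year), value in trend.items():
    print(tech, prop, year, value)
```

The median is used rather than the mean because extracted values can include outliers from misparsed units or unusual study contexts.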
The processed data are made available through an interactive dashboard that allows users to filter, visualize, and download customized subsets. Future work will map extracted metadata to the Open Energy Ontology and integrate the dataset into a collaborative infrastructure to support community-driven data sharing.

Alternative Track
1. Advancing FAIR Metadata with AI: Methods, Challenges, and Synergies

Author

Maxime Gorres (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems – Jülich Systems Analysis, 52425 Jülich, Germany)

Co-authors

Jan Göpfert (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems – Jülich Systems Analysis, 52425 Jülich, Germany)
Jann M. Weinand (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems – Jülich Systems Analysis, 52425 Jülich, Germany)
Patrick Kuckertz (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems – Jülich Systems Analysis, 52425 Jülich, Germany)
Titan Hartono (Forschungszentrum Jülich GmbH, Institute of Climate and Energy Systems – Jülich Systems Analysis, 52425 Jülich, Germany)