28–30 Apr 2026
DKFZ, Heidelberg
Europe/Berlin timezone

HVRLocator: A Computationally Efficient Tool for Identifying Hypervariable Regions in Large 16S rRNA Datasets

P46
Not scheduled
3m
Communication Center (DKFZ, Heidelberg)

Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
Poster
5. Advancing FAIR Metadata with AI: Methods, Challenges, and Synergies
POSTERS & DEMOS - with Coffee

Speaker

Clara Arboleda Baena (Department of Applied Microbial Ecology, Helmholtz Centre for Environmental Research - UFZ)

Description

Metabarcoding of the 16S rRNA gene is widely used to assess microbial diversity due to its cost-effectiveness and efficiency. However, publicly available 16S rRNA metabarcoding datasets often lack standardized metadata, particularly information on the sequenced hypervariable regions or primers used, which are critical to their accurate reuse. To address this, we present HVRLocator, a computational tool that (1) identifies the start and end positions of 16S rRNA amplicons, (2) determines their corresponding hypervariable regions, and (3) detects the presence of primer sequences. This tool was validated on four datasets comprising 41,513 samples generated with different primers and sequencing platforms.
HVRLocator can process archived 16S rRNA sequences from NCBI SRA at an average rate of 6.5 samples per minute. Validation showed it reliably detects amplicon start and end positions across datasets sequenced with different primers and platforms, achieving 100% accuracy within single-platform studies and correctly revealing length heterogeneity across platforms. It also flagged misannotated metadata and problematic sequences, underscoring its value as a sequence data curation tool. Finally, HVRLocator can select comparable sequences to build large 16S rRNA amplicon databases spanning the same hypervariable region, facilitating cross-study comparisons.
In conclusion, this tool overcomes unreliable metadata by accurately identifying 16S rRNA amplicon start and end positions, determining hypervariable regions, and detecting primer sequences, thereby enabling accurate curation and large-scale processing of 16S rRNA data for reliable and reproducible microbial studies, syntheses, and meta-analyses.
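
The two ideas in the abstract, mapping amplicon start and end positions to hypervariable regions and checking for primer sequences, can be illustrated with a minimal Python sketch. This is not HVRLocator's implementation. It assumes the amplicon coordinates have already been obtained by aligning reads to an E. coli 16S rRNA reference, and it uses commonly cited approximate V1-V9 boundaries, which differ between publications. The names `HVR_BOUNDS`, `regions_spanned`, `has_primer`, and the parameters `min_cover` and `max_offset` are hypothetical and chosen only for illustration.

```python
import re

# ASSUMPTION: approximate V1-V9 boundaries in E. coli 16S rRNA coordinates.
# Published boundaries vary; these are illustrative, not HVRLocator's values.
HVR_BOUNDS = {
    "V1": (69, 99),
    "V2": (137, 242),
    "V3": (433, 497),
    "V4": (576, 682),
    "V5": (822, 879),
    "V6": (986, 1043),
    "V7": (1117, 1173),
    "V8": (1243, 1294),
    "V9": (1435, 1465),
}

# IUPAC nucleotide codes expanded to regex character classes,
# so degenerate primer bases match every nucleotide they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def regions_spanned(start, end, min_cover=0.5):
    """Return the regions of which at least `min_cover` lies in [start, end]."""
    spanned = []
    for name, (lo, hi) in HVR_BOUNDS.items():
        overlap = min(end, hi) - max(start, lo) + 1
        if overlap > 0 and overlap / (hi - lo + 1) >= min_cover:
            spanned.append(name)
    return spanned


def has_primer(read, primer, max_offset=5):
    """Check whether `primer` occurs within the first few bases of `read`.

    Exact matching only (no mismatches or indels); `max_offset` is the
    largest allowed start position of the primer within the read.
    """
    pattern = re.compile("".join(IUPAC[base] for base in primer.upper()))
    window = read.upper()[: len(primer) + max_offset]
    return pattern.search(window) is not None


if __name__ == "__main__":
    # Amplicon spanning E. coli positions 515-806 covers V4 only.
    print(regions_spanned(515, 806))   # ['V4']
    # Amplicon spanning positions 341-805 covers V3 and V4.
    print(regions_spanned(341, 805))   # ['V3', 'V4']
    # 515F primer with degenerate bases Y and M, still present in the read.
    print(has_primer("GTGCCAGCAGCCGCGGTAATACGGAGGGT", "GTGYCAGCMGCCGCGGTAA"))  # True
```

In a sketch like this, samples from different studies would be considered comparable when `regions_spanned` returns the same region list for them. A read in which `has_primer` returns True would need primer trimming before downstream analysis.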

Authors

Antonis Chatzinotas (Helmholtz Centre for Environmental Research - UFZ, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig University)
Clara Arboleda Baena (Department of Applied Microbial Ecology, Helmholtz Centre for Environmental Research - UFZ)
Felipe Borim Correa (Helmholtz Centre for Environmental Research - UFZ)
Joao Pedro Saraiva (Helmholtz Centre for Environmental Research - UFZ)
Jonas Coelho Kasmanas (Helmholtz Centre for Environmental Research - UFZ)
Santiago Castillo-Rivadeneira (German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig)
Stephanie Jurburg (Helmholtz Centre for Environmental Research - UFZ, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig)

Presentation materials

There are currently no materials.