Session A: July 21, 2025, 10:00-11:20 and 16:00-16:40
Session B: July 22, 2025, 10:00-11:20 and 16:00-16:40
Sessions A & B: posters set up Monday, July 21, 08:00-08:40; dismantled Tuesday, July 22 at 18:00
Session C: July 23, 2025, 10:00-11:20 and 16:00-16:40
Session D: July 24, 2025, 10:00-11:20 and 13:00-14:00
Sessions C & D: posters set up Wednesday, July 23, 08:00-08:40; dismantled Thursday, July 24 at 16:00
A-086: Statistical end-to-end analysis of large-scale microbial growth data with DGrowthR
Track: BOSC: Bioinformatics Open Source Conference
- Medina Feldl, Institute of Computational Biology, Helmholtz Munich, Germany
- Roberto Olayo-Alarcon, Department of Statistics, Ludwig-Maximilians-Universität München, Germany
- Martin K. Amstalden, Department of Microbiology, Biocenter, Julius-Maximilians-Universität Würzburg, Germany
- Annamaria Zannoni, Department of Molecular Infection Biology II, Julius-Maximilians-Universität Würzburg, Germany
- Stefanie Peschel, Department of Statistics, Ludwig-Maximilians-Universität München, Germany
- Cynthia M. Sharma, Department of Molecular Infection Biology II, Julius-Maximilians-Universität Würzburg, Germany
- Ana Rita Brochado, Department of Microbiology, Biocenter, Julius-Maximilians-Universität Würzburg, Germany
- Christian L. Müller, Institute of Computational Biology, Helmholtz Munich, Germany
Presentation Overview:
Quantitative analysis of microbial growth curves is essential for understanding how bacterial populations respond to environmental cues. Traditional analysis approaches make parametric assumptions about the functional form of these curves, limiting their usefulness for studying conditions that distort standard growth curves. In addition, modern robotics platforms enable the high-throughput collection of large volumes of growth data, requiring strategies that can analyze large-scale growth data flexibly and efficiently. Here, we introduce DGrowthR, a statistical framework, available as an R package and standalone app, for the integrative analysis of large growth experiments. DGrowthR comprises methods for data pre-processing and standardization, exploratory functional data analysis, and nonparametric modeling of growth curves using Gaussian Process regression. Importantly, DGrowthR includes a rigorous statistical testing framework for differential growth analysis. To illustrate the range of application scenarios of DGrowthR, we analyzed three large-scale bacterial growth datasets that tackle distinct scientific questions. On an in-house large-scale growth dataset comprising two pathogens subjected to a large chemical perturbation screen, DGrowthR enabled the discovery of compounds with significant growth-inhibitory effects as well as compounds that induce non-canonical growth dynamics. We also re-analyzed two publicly available datasets and recovered reported adjuvants and antagonists of antibiotic activity, as well as bacterial genetic factors that determine susceptibility to specific antibiotic treatments. We anticipate that DGrowthR will streamline the analysis of modern high-volume growth experiments, enabling researchers to gain novel biological insights in a standardized and reproducible manner.
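As a conceptual illustration of the nonparametric modeling step (not DGrowthR's actual API), a Gaussian Process can be fit to a single growth curve roughly as follows, sketched here in Python with scikit-learn on invented data:

```python
# Conceptual sketch, not DGrowthR's interface: fit a growth curve with
# Gaussian Process regression instead of a parametric growth law.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t = np.linspace(0, 24, 49).reshape(-1, 1)  # time points in hours
# Illustrative log(OD600)-like logistic curve plus measurement noise.
od = np.log(0.05 + 0.9 / (1 + np.exp(-(t - 10) / 2))).ravel()
od += np.random.default_rng(0).normal(0, 0.02, od.size)

# RBF kernel: smooth curves without assuming a functional form;
# WhiteKernel absorbs measurement noise.
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=5.0) + WhiteKernel(0.01), normalize_y=True
)
gp.fit(t, od)
mean, sd = gp.predict(t, return_std=True)  # posterior mean and uncertainty
```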
A-088: Full-spectrum pipelines for single-cell and spatial transcriptomics analysis
Track: BOSC: Bioinformatics Open Source Conference
- Huihai Wu, Earlham Institute, United Kingdom
- Ashleigh Lister, Earlham Institute, United Kingdom
- Iain Macaulay, Earlham Institute, United Kingdom
- Katie Long, John Innes Centre, United Kingdom
- Cristobal Uauy, John Innes Centre, United Kingdom
- Yuxuan Lan, Earlham Institute, United Kingdom
- David Swarbreck, Earlham Institute, United Kingdom
- Irene Papatheodorou, Earlham Institute, United Kingdom
Presentation Overview:
This poster presents two full-spectrum pipelines developed at the Earlham Institute: the EISCA pipeline for scRNA-seq data analysis and the EISTA pipeline for spatial transcriptomics data analysis. Both pipelines cover all phases of analysis, from primary and secondary to tertiary. They are built on the Nextflow nf-core framework, aiming to provide generalized, flexible, and scalable workflows.
EISCA (github.com/TGAC/eisca) is designed for droplet-based (10x) and plate-based (Smart-seq2) data. Its analysis stages include quality control, alignment, quantification, count conversion, cell filtering, clustering, cell-type annotation, and differential expression analysis (DEA). EISTA (github.com/TGAC/eista) is designed for Vizgen MERFISH data (support for 10x Xenium planned). It includes Vizgen post-processing, count conversion, cell filtering, clustering, spatial statistical analysis, and will also support cell-type annotation and DEA in the tertiary analysis phase.
The pipelines offer several key benefits. 1) They are easy to use, allowing users to launch them directly without the need to pre-download tools, and they can be run on local machines, HPC clusters, or cloud platforms. 2) They are flexible, supporting both end-to-end execution and specific analysis phases. Users can also tweak analysis to gain better insights. 3) The pipelines follow standardized practices, relying on widely used Python packages to efficiently process large data in a consistent manner. 4) They are extensible, serving as a foundation for basic analysis while allowing users to integrate task-specific analyses. As the first full-spectrum pipelines for single-cell analysis, they enable quick preliminary analyses and an out-of-the-box report, ensuring smooth data assessment and transition to advanced analyses.
A-092: SingleRust: A Foundation for Scalable High-Throughput Single-Cell Analysis in Rust
Track: BOSC: Bioinformatics Open Source Conference
- Ian Ferenc Diks, Chair for Clinical Bioinformatics, Saarland University, Germany
- Matthias Flotho, Chair for Clinical Bioinformatics, Saarland University, Germany
- Andreas Keller, Chair for Clinical Bioinformatics, Saarland University, Germany
Presentation Overview:
SingleRust is an emerging library that brings single-cell data analysis capabilities to the Rust ecosystem. Our project focuses on establishing a solid foundation for production-grade, high-throughput analysis pipelines by leveraging Rust's core strengths: memory safety and efficiency without garbage collection, fearless concurrency, and predictable performance characteristics.
While Python remains an excellent tool for exploratory analysis and prototyping, SingleRust aims to complement the ecosystem by providing a pathway to build robust, maintainable pipelines for pre-processing large single-cell datasets. The library implements common single-cell workflows, including data loading, quality control, normalization, feature selection, dimensionality reduction and differential expression using Rust's type system to enhance code reliability.
By focusing on the translation from prototyping to production, SingleRust enables researchers and computational biologists to create scalable analysis pipelines that can handle growing dataset sizes and complexity. Currently in alpha development, our open-source approach encourages collaboration between computational biologists and the Rust community to build a sustainable foundation for the next generation of single-cell analysis tools. All code is freely available at https://github.com/SingleRust and documented at https://singlerust.com.
A-094: Managing Workflow Executions with WESkit for FAIR Data Management
Track: BOSC: Bioinformatics Open Source Conference
- Valentin Schneider-Lunitz, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Germany
- Philip Kensche, German Cancer Research Center (DKFZ), Germany
- Landfried Kraatz, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Germany
- Philipp Strubel, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Germany
- Ivo Buchhalter, German Cancer Research Center (DKFZ), Germany
- Sven Twardziok, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Germany
Presentation Overview:
The rapid advancement of biological and medical research technologies is generating vast amounts of data, necessitating effective research data management to ensure high quality and reproducibility. Compliance with FAIR (Findable, Accessible, Interoperable, Reusable) principles is crucial for maintaining data integrity throughout the data life cycle. WESkit, a tool implementing the GA4GH Workflow Execution Service (WES) interface, facilitates the execution, monitoring, and documentation of workflows, addressing the challenges in managing numerous workflow executions with different parameters across various research projects. Developed through collaboration between Charité Universitätsmedizin Berlin and DKFZ, WESkit supports the execution of Snakemake and Nextflow workflows on both local and remote infrastructures, offering scalability and compatibility with scheduling systems. The tool's advantages include long-term use within research groups, uniform automation, and monitoring of workflow executions, making it suitable for larger teams and service units. WESkit can be integrated into cloud environments, contributing to the GA4GH cloud framework and supporting secure workflow execution on sensitive data. Its development continues as an open-source project, driven by the joint efforts of the DKFZ and Charité Universitätsmedizin Berlin, and supported by the German Bioinformatics Network de.NBI.
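For readers unfamiliar with the WES interface, the sketch below shows what submitting and polling a run against a WES endpoint such as WESkit could look like. The field names follow the GA4GH WES RunRequest schema; the base URL, workflow URL, and engine type string are placeholders, not a documented WESkit configuration:

```python
# Hedged sketch of a GA4GH WES run submission; URLs and the engine type
# string are placeholders for a concrete deployment.
import json
import requests

WES = "https://weskit.example.org/ga4gh/wes/v1"  # hypothetical base URL

# WES expects multipart/form-data; requests builds that from `files`.
fields = {
    "workflow_url": (None, "https://github.com/example/pipeline"),  # placeholder
    "workflow_type": (None, "SMK"),               # engine tag, e.g. Snakemake
    "workflow_type_version": (None, "7.30.1"),
    "workflow_params": (None, json.dumps({"sample_sheet": "samples.tsv"})),
}
run_id = requests.post(f"{WES}/runs", files=fields).json()["run_id"]

# Poll the run state (QUEUED, RUNNING, COMPLETE, ...).
state = requests.get(f"{WES}/runs/{run_id}/status").json()["state"]
print(run_id, state)
```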
A-096: Creating a FAIR-Compliant Metadata Framework for RNA Sequencing: A Case Study on Amyotrophic Lateral Sclerosis
Track: BOSC: Bioinformatics Open Source Conference
- Edīte Vārtiņa, Riga Stradiņš University, Latvia
- Tatjana Kiseļova, Riga Stradiņš University, Latvia
- Burak Özkan, Ulm University, Germany
- Egija Berga-Švītiņa, Riga Stradiņš University, Latvia
- Marcin Bączyk, Poznań University of Physical Education, Poland
- Francesco Roselli, Ulm University, Germany
- Baiba Vilne, Riga Stradiņš University, Latvia
Presentation Overview:
Background: A key challenge in using RNA-seq data is managing the associated metadata, which is often documented manually, inconsistently, and inefficiently, hindering the reproducibility and reusability of RNA-seq research. This is particularly true for complex diseases like Amyotrophic Lateral Sclerosis (ALS), where data is often scattered across multiple platforms and formats.
Data and Methods: To address the described metadata challenges, we are evaluating existing standards (MIAME/MINSEQE, DCMI/ISA, and EDAM) to identify gaps and guide the development of a unified, FAIR-compliant framework. We are conducting interviews with researchers and data stewards to gather user requirements, focusing on real-world metadata workflows and usability needs. These insights are informing the design of a user-friendly, ontology-driven tool for automated metadata generation. ALS-related RNA-seq datasets serve as a test case to assess the framework’s applicability and refine its design.
Results: Preliminary findings reveal several critical challenges in current RNA-seq metadata practices: inconsistent terminology across studies and platforms, fragmented and heterogeneous metadata formats, which complicate downstream reusability, and a high dependency on manual input. Researchers and data stewards consistently highlight the need for simplified, standardised and automated metadata entry solutions. The emerging framework will enforce the use of controlled vocabularies and community standards, support both human- and machine-readable formats, and enable interoperability with infrastructures such as WorkflowHub, BioContainers, and bio.tools, adhering to FAIR4RS principles.
Conclusions: Our work confirms the urgent need for a standardised, FAIR-compliant approach to RNA-seq metadata management to enhance the reproducibility and impact of research, especially in complex disease contexts like ALS.
A-098: Pfam database in 2025 – towards expanding protein families coverage
Track: BOSC: Bioinformatics Open Source Conference
- Nicole Alejandra Morveli Flores, EMBL-EBI, United Kingdom
- Antonina Andreeva, EMBL-EBI, United Kingdom
- Beatriz Lazaro Pinto, EMBL-EBI, United Kingdom
- Typhaine Paysan-Lafosse, EMBL-EBI, United Kingdom
- Sara Chuguransky, EMBL-EBI, United Kingdom
- Tiago Grego, EMBL-EBI, United Kingdom
- Alex Bateman, EMBL-EBI, United Kingdom
Presentation Overview:
Pfam serves as a comprehensive repository of protein families, represented by manually curated multiple sequence alignments and hidden Markov models, widely used as a key resource in biological research. Recent developments have transitioned Pfam from a standalone platform to full integration within InterPro, enabling streamlined management and more frequent updates. Since Pfam release 36.0, this transition has facilitated the addition of nearly 4,000 new families, bringing the total to 24,736 families, 780 clans, and 76.85% coverage of the reference proteome in the latest 37.4 release. Pfam methodology currently prioritises a structure-guided approach, leveraging AlphaFold2-predicted models to refine domain boundaries, uncover novel domains, and improve classification accuracy. In addition, Pfam now aligns with the ECOD structural classification, enhancing cross-referencing between sequence-based families and structural domains as well as tailoring classification details using the ECOD hierarchical framework. The Pfam curation workflow also takes advantage of The Encyclopedia of Domains (TED) resource to exploit protein domain data and systematically expand family annotations. Here we report the major efforts contributing to Pfam's growth and improved annotation quality, alongside some interesting findings providing important biological insights into previously uncharacterised protein families.
A-100: nf-core/seqinspector - a basic QC pipeline for sequencing core facilities
Track: BOSC: Bioinformatics Open Source Conference
- Adrien Coulier, Uppsala University, Department of Medical Sciences National Genomics Infrastructure Uppsala, Scilifelab, Sweden
- Alfred Kedhammar, Stockholm University, Department of Biochemistry and Biophysics, National Genomics Infrastructure Stockholm, Scilifelab, Sweden
- Cris Tuñí i Domínguez, Flomics Biotech, Barcelona, Spain
- Erkut Ilaslan, University of Copenhagen, Department of Cellular and Molecular Medicine, Denmark
- Franziska Bonath, Kungliga Tekniska Högskolan, Department of Gene Technology, National Genomics Infrastructure, Scilifelab, Sweden
- Johannes Alneberg, Kungliga Tekniska Högskolan, Department of Gene Technology, National Genomics Infrastructure Stockholm, Scilifelab, Sweden
- Karthik Nair, Uppsala University, Department of Medical Sciences National Genomics Infrastructure Uppsala, Scilifelab, Sweden
- Mahesh Binzer-Panchal, Uppsala Universitet, Department of Medical Biochemistry and Microbiology, NBIS, Scilifelab, Sweden
- Matilda Åslin, Uppsala University, Department of Medical Sciences National Genomics Infrastructure Uppsala, Scilifelab, Sweden
- Matthias Hörtenhuber, Uppsala University, Department of Immunology, Genetics and Pathology, Scilifelab Data Centre, Sweden
- Matthias Zepper, Stockholm University, Department of Biochemistry and Biophysics, National Genomics Infrastructure Stockholm, Scilifelab, Sweden
- Maxime Garcia, Seqera, Scientific Development, Sweden
- Thomas Adams, The James Hutton Institute, Department of Cell and Molecular Sciences, Dundee, Scotland, United Kingdom
- Yuk Woon Cheung, The University of Dundee, Division of Plant Sciences at the Hutton, Scotland, United Kingdom
- Nf-Core Community, nf-core, Sweden
Presentation Overview:
Providing users with high-quality data from a range of sequencing instruments is in the interest of every sequencing facility. To monitor sequencing quality, performing standardized yet flexible quality control for every sequencing project and sample that passes through a facility is crucial to ensuring consistent quality and dependable results.
The Nextflow pipeline nf-core/seqinspector is planned to be a unified quality control pipeline for data originating from instruments of various providers, such as Illumina, Oxford Nanopore Technologies, or Pacific Biosciences.
It will assess sequencing quality, duplication levels and complexity on a per-sample basis, in addition to highlighting adapter contents and technical artifacts. Furthermore, it will facilitate the detection of common biological contaminants that may have been introduced to the samples before or during library preparation.
Since facilities share flow cells and even sequencing lanes between different projects, report generation will be particularly versatile and customizable. Quality reports can be obtained at variable granularity, ranging from individual samples or projects to whole flow cells. It will therefore be possible to receive a single MultiQC report that summarizes all input samples, or individual MultiQC reports for sample groups determined by the sample sheet.
nf-core/seqinspector is developed by and for core facilities but can also be utilized by any entity that has access to a sequencer. The project is part of the open-source nf-core community, which operates under the FAIR principles.
A-102: QuasiSpecies Analyser (QSA) as a novel bioinformatic tool for the analysis of viral population genetics
Track: BOSC: Bioinformatics Open Source Conference
- Niccolò Guglietta, Department of Public Health, Experimental and Forensic Medicine, University of Pavia, Pavia, 27100, Italy
- Giorgio Gallinella, Department of Pharmacy and Biotechnology, University of Bologna, 40138 Bologna, Italy
Presentation Overview:
Quasispecies viruses exhibit high mutation rates, creating a dynamic genomic cloud of co-existing variants within a single host. High-Throughput Sequencing (HTS) offers deeper coverage than standard methods, revealing more information on intrinsic sequence heterogeneity. Most bioinformatic analyses focus on the consensus sequence, discarding crucial details on the viral population structure. To overcome this issue, QuasiSpecies Analyser (QSA) is a novel bioinformatic tool, under active development, that utilises previously discarded information from HTS output.
Starting from a set of HTS reads, QSA performs the basic steps of quality control, trimming, and alignment to a reference genome. From the alignment file, Position-Specific Scoring Matrices (PSSMs) are generated and converted into Position Probability Matrices (PPMs), which are then used to calculate an Efficiency (normalised Shannon entropy) value for each position. These values are summarised as alpha-diversity and delta-diversity, novel measures of intra- and inter-sample diversity, respectively.
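To make the Efficiency measure concrete, the sketch below computes a normalised Shannon entropy per position from a toy PPM; QSA's exact definitions and implementation may differ:

```python
# Illustration of per-position Efficiency (normalised Shannon entropy)
# from a position probability matrix; values are invented.
import numpy as np

# Toy PPM: rows = positions, columns = probabilities of A, C, G, T.
ppm = np.array([
    [1.00, 0.00, 0.00, 0.00],   # fully conserved position -> efficiency 0
    [0.70, 0.10, 0.10, 0.10],   # moderately variable
    [0.25, 0.25, 0.25, 0.25],   # maximally variable -> efficiency 1
])

def efficiency(p, eps=1e-12):
    """Shannon entropy normalised by its maximum, log2(4) for nucleotides."""
    h = -(p * np.log2(p + eps)).sum(axis=1)
    return h / np.log2(p.shape[1])

print(efficiency(ppm))  # ~[0.0, 0.68, 1.0] per position
```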
QSA has already been tested on Human Parvovirus B19 (B19V), successfully identifying highly variable and conserved regions. The alpha- and delta-diversity metrics capture sample-specific mutation patterns and differences between viral populations as expected.
Ongoing development aims to integrate haplotype reconstruction using Hidden Markov Models (HMMs) and a user-friendly graphical interface, making QSA an accessible and powerful tool for studying viral population genetics.
A-104: Data and code are first class research objects, how do we ensure they get the recognition they deserve?
Track: BOSC: Bioinformatics Open Source Conference
- Christopher Hunter, GigaScience Press, United Kingdom
- Yannan Fan, GigaScience Press, China
- Bastien Molcrette, GigaScience Press, France
- Mary Ann Tuli, GigaScience Press, United Kingdom
- Scott Edmunds, GigaScience Press, Hong Kong
Presentation Overview:
GigaScience Press publishes two journals, GigaScience and GigaByte, both of which have extremely high standards for transparency and reproducibility. To maintain these standards, we have a team of data curators (biocurators) whose job is to ensure that manuscripts are as transparent and reproducible as possible prior to publication.
Whilst we do have GigaDB, our own repository to host data associated with GigaScience Press articles, we try to use that only for data that do not have a natural home in a suitable stable public repository such as the INSDC or Software Heritage Library. GigaDB biocurators act as an interpretation layer between the often-verbose standards of FAIR data, and time-pressured authors, who require succinct and pertinent guidance specific to their manuscript. We accomplish this by scanning the manuscript prior to peer-review and instructing the submitting author(s) how to make their research objects available, either privately whilst under peer-review, or directly open. This ensures reviewers can spend their time reviewing the article rather than hunting for the data or code. Collecting the research objects together at this stage ensures that if the manuscript passes peer-review we can proceed straight away with the publication process.
Here we present a summary of the variety of different repositories that our biocurators recommend to authors, and how we ensure those data are suitably cited and linked from the journal articles and, where appropriate, from GigaDB datasets.
A-106: Biocentral: An Open Source Platform for Democratizing Bioinformatics Research
Track: BOSC: Bioinformatics Open Source Conference
- Sebastian Franz, Technical University of Munich (TUM), Germany
- Luisa Jiménez-Soto, University of Munich (LMU), Germany
- Burkhard Rost, Technical University of Munich (TUM), Germany
Presentation Overview:
The transformative power of modern bioinformatics and artificial intelligence offers unprecedented opportunities to accelerate scientific discovery - from understanding protein function to drug development. To fully realize this potential, we need stronger bridges between computational experts and experimental scientists, enabling collaborative innovation that crosses traditional disciplinary boundaries. While current developments are promising, many researchers still face technical barriers that prevent them from leveraging these powerful computational methods in their daily work.
We address this challenge with biocentral, an open-source platform that democratizes access to advanced bioinformatics tools. Unlike existing proprietary solutions, biocentral provides a free, intuitive graphical interface that empowers researchers, regardless of their programming expertise, to perform sophisticated analyses, visualizations, and machine learning tasks on biological data. It enables researchers to train models on their own data with a few clicks, while offering baselines, error bars for all metrics, and sanity checks. The platform integrates state-of-the-art tools while abstracting away technical complexity, allowing users to focus on data quality and methodological considerations.
Beyond mere accessibility, biocentral advocates for radical transparency in research through FAIR data management and ‘executable papers’ that make every analysis step reproducible. Our community-driven development approach ensures the platform evolves with researchers' needs while maintaining rigorous scientific standards. Initial deployments in a microbiology laboratory demonstrate how democratizing computational tools can accelerate research progress.
In an era where scientific advancement should not be limited by technical barriers or paywalls, biocentral represents another step toward truly open and equitable research practices.
A-108: Agora: An Open Science Platform for Evidence-Based Exploration of Alzheimer's Disease Therapeutic Targets
Track: BOSC: Bioinformatics Open Source Conference
- Jessica Britton, Sage Bionetworks, United States
- Jesse C. Wiley, Sage Bionetworks, United States
- Jaclyn Beck, Sage Bionetworks, United States
- Lawrence Yi, Sage Bionetworks, United States
- Beatriz Saldana, Sage Bionetworks, United States
- Brad MacDonald, Sage Bionetworks, United States
- Khai Do, Sage Bionetworks, United States
- Stockard Simon, Sage Bionetworks, United States
- Hallie Swan, Sage Bionetworks, United States
- Thomas Yu, Sage Bionetworks, United States
- Jay Hodgson, Sage Bionetworks, United States
- Anna Greenwood, Sage Bionetworks, United States
- Laura Heath, Sage Bionetworks, United States
- Susheel Varma, Sage Bionetworks, United Kingdom
Presentation Overview:
Background: Agora (https://agora.adknowledgeportal.org) is an openly available web application developed to enable a broad spectrum of Alzheimer's disease (AD) researchers to access the target-based evidence generated within the National Institute on Aging (NIA)'s translational research portfolio. Agora accelerates AD research and maximizes therapeutic discovery by sharing information using interactive tools, data visualizations, and summarized evidence.
Methods: Agora enables users to browse over 950 potential therapeutic targets nominated by the NIA's Accelerating Medicines Partnership in AD (AMP-AD) consortium, Target Enablement to Accelerate Therapy Development for AD (TREAT-AD) centers, and the broader AD research community. The platform provides visualizations and summarized information based on harmonized genome-wide human transcriptomic, proteomic, and metabolomic data analyses. It includes interactive tools designed for non-bioinformaticians to evaluate and compare complex multi-omic data, objectively ranks targets based on their role in AD, and provides access to experimental reagents and bioinformatics packages supporting further investigation.
Results: Recent updates to Agora include new TREAT-AD target nominations and enhanced connections to external resources, including a new drug development resource section and UniprotKB integration.
Conclusions: Advancing potential Alzheimer's disease (AD) therapeutics requires collaboration between academia and industry. Agora unites the AD research community by enabling the development and sharing of target hypotheses, accelerating the investigation of promising new therapeutic targets, pathways, and mechanisms.
A-110: Monarch-Ingest and MonarchKG
Track: BOSC: Bioinformatics Open Source Conference
- Daniel Korn, Department of Genetics, University of North Carolina at Chapel Hill, United States
- Katherina Cortes, University of Colorado Anschutz Medical Campus, United States
- Kevin Schaper, Department of Genetics, University of North Carolina at Chapel Hill, United States
- Corey Cox, Department of Genetics, University of North Carolina at Chapel Hill, United States
- Patrick Golden, Department of Genetics, University of North Carolina at Chapel Hill, United States
- Melissa Haendel, Department of Genetics, University of North Carolina at Chapel Hill, United States
Presentation Overview:
In recent years, biomedical knowledge graphs (BKGs) have become fundamental tools for organizing and understanding biological information by providing a centralized repository of biomedical datasets. We aim to present and share with the community the Monarch-Ingest tool and the knowledge graph it constructs, MonarchKG, a leading-edge, expert-curated BKG.
BKG construction is a large, complex process that demands data be retrieved, parsed, merged into a harmonized representation, and finally constructed into a data artifact. The Monarch-Ingest tool is a "turn-key" BKG construction suite that aims to streamline all parsing, aggregation, merging, and formatting steps into one pipeline. Our team has already catalogued 18 data sources comprising more than 80 data artifacts. The software suite consists of four core components: Koza, Cat-Merge, KGX, and our ingest catalog; each provides key parts of the build pipeline. The system is designed to extend easily to new data sources, allowing community members to contribute novel data to the build process and interlink it with all knowledge covered by the existing catalog.
MonarchKG is created by combining all available ingests in Monarch-Ingest. Covering phenotypes, genomics, genetic and clinical variants, chemical interactions, and model organism information, it aims to be a source of high-quality, domain-expert-curated biomedical knowledge.
Monarch-Ingest and its software components are available via GitHub (https://github.com/monarch-initiative/monarch-ingest/). MonarchKG is released monthly, with a Neo4j version available at https://neo4j.monarchinitiative.org/ and regularly updated dumps at https://data.monarchinitiative.org/monarch-kg/latest/.
A-112: The Overture Data Dictionary Viewer
Track: BOSC: Bioinformatics Open Source Conference
- Melanie Courtot, Ontario Institute for Cancer Research, Canada
- Mitchell Shiell, Ontario Institute of Cancer Research (OICR), Canada
- Patrick Dos Santos, Ontario Institute for Cancer Research (OICR), Canada
- Jon Eubank, Ontario Institute for Cancer Research (OICR), Canada
- Ciarán Schütte, Ontario Institute of Cancer Research (OICR), Canada
- Justin Richardsson, Ontario Institute for Cancer Research (OICR), Canada
- Saqib Ashraf, Ontario Institute for Cancer Research (OICR), Canada
- Lincoln Stein, Ontario Institute for Cancer Research, Canada
- Overture Team, Ontario Institute of Cancer Research (OICR), Canada
- Robin Haw, Ontario Institute of Cancer Research (OICR), Canada
Presentation Overview:
Overture is used to build platforms that enable researchers to organize and share their data quickly, flexibly, and at multiple scales. While it currently powers major international platforms like ICGC-ARGO (100,000+ participants) and VirusSeq (500,000+ genomes), there is an unfulfilled demand to address the needs of smaller research teams with less technical expertise. Researchers do not have time to learn complicated interfaces or train multiple team members on complex systems. Furthermore, data management systems are either too technical for non-developers (custom solutions) or too limited for serious research applications (spreadsheet editors).
One practical barrier is communicating data requirements to researchers. In Overture, Lectern data dictionaries define tabular submission standards but exist only as JSON objects accessible through API calls, creating significant friction for data submitters who need to understand schema requirements before contributing. How can we create user interfaces that abstract complexity and connect researchers with their data?
We address this by developing intuitive interfaces for Lectern schema viewing, transforming technical specifications into accessible, organized displays. Built as reusable components, these interfaces will initially serve the Pan-Canadian Genome Library (PCGL), a national initiative unifying Canada's human genome sequencing efforts. The schema viewer, similar to the successful ICGC ARGO dictionary viewer, will enhance our ability to communicate data requirements clearly, align with existing standards, and encourage submissions.
As we develop these interfaces for reuse, we're seeking feedback from the research community on:
● Interface design and user experience priorities
● Feature requirements and needs
● Real-world use cases and pain points
A-114: Advancing the Computability of AOPs: AOP-Wiki Releases 2.7 Through 2.8
Track: BOSC: Bioinformatics Open Source Conference
- Virginia Hench, Open BioData Modeling, LLC, United States
- Travis Karschnik, US EPA, Duluth, MN, United States
- David Williams, RTI International, United States
- Jaleh Abedini, US EPA, Research Triangle Park, NC, United States
- Nathalie Delrue, Organisation for Economic Co-operation and Development, France
- Magdalini Sachana, Organisation for Economic Co-operation and Development, France
- Dan Villeneuve, US EPA, Duluth, MN, United States
- Stephen Edwards, US EPA, Research Triangle Park, NC, United States
- Clemens Wittwehr, European Commission, Joint Research Centre, Italy
Presentation Overview:
An adverse outcome pathway (AOP) is an analytical framework used to organize biomechanistic information linking molecular-level exposure events to downstream adverse outcomes, important for regulatory decision-making. The AOP-Wiki is a globally accessible user interface for the AOP Knowledgebase (AOP-KB), the central AOP repository that was first launched in 2013. The AOP-Wiki and AOP-KB have undergone multiple iterations and reached version 2.6 in 2023.
For two years, the AOP-Wiki development team and the AOP-KB Coordination Group have worked together to advance beyond version 2.6, while gathering requirements for AOP-Wiki 3.0. Our vision for AOP-Wiki 3.0 involves a major overhaul of the data infrastructure and user interfaces to radically enhance the computability of AOPs and the overall FAIRness of the AOP-KB and Wiki. This poster covers release 2.7 from March 2024 and features in development for releases 2.7.2 and 2.8 (2025).
Release 2.8 will include an AOP collections feature and a set of UI improvements, referred to as knowledge organization system (KOS) features, that will enhance the findability and accessibility of the ontology terms used to define AOPs in the AOP-Wiki. The collections feature will support AOP network development and analysis by allowing users to build, save, and export collections of AOP elements ready for integration with network visualization tools like Cytoscape. Both features will benefit human users and serve integral roles in how we take advantage of AI tools and incorporate input from subject matter experts. Visuals demonstrating the design and utility of these features will be included in the poster.
A-116: circhemy: The alchemy of circular RNA ID conversion
Track: BOSC: Bioinformatics Open Source Conference
- Tobias Jakobi, University of Arizona, College of Medicine Phoenix, United States
Presentation Overview:
Back-splicing, a subtype of transcript splicing, can lead to circular RNA (circRNA) formation, marked by back-splice junctions (BSJ). Once it became evident that circRNAs can be detected directly from RNA-sequencing data, several computational approaches for identifying circRNAs from next-generation sequencing data were developed, such as CIRCexplorer2, CIRI2, and circtools (Jakobi et al., 2019). The circRNA community needed a means to centrally store circRNAs identified in different studies, tissues, and organisms, and several circRNA databases have been developed. While these databases provide valuable resources for researchers to explore circRNAs, comparing circRNAs between them is hardly possible due to the missing standardization of unique circRNA identifiers: the same circRNA carries different IDs in different databases, making it difficult for researchers to compare databases or even find their circRNA of interest.
The circhemy software is implemented in Python3 and requires only minimal dependencies to run. The software uses a local SQLite3 database to perform circRNA ID lookups and conversions efficiently, and additionally offers advanced query features, similar to SQL queries, to search for circRNAs matching user-specified criteria. Circhemy not only allows conversion of over two million circRNAs between 10 databases, but is also accessible via a user-friendly web interface, via the local command line, and via REST for tool-less integration into existing workflows. Importantly, circhemy also fully supports the recently proposed circRNA standard nomenclature, with nearly one million circRNAs already converted into this naming scheme.
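To illustrate the lookup approach described above, an SQLite-backed ID conversion can be as simple as the sketch below; the database file, table, and column names are invented for illustration, not circhemy's actual schema:

```python
# Hypothetical sketch of a circRNA ID lookup against a local SQLite
# database, in the spirit of circhemy's design; names are invented.
import sqlite3

con = sqlite3.connect("circrna_ids.sqlite3")  # placeholder database file
con.execute("""CREATE TABLE IF NOT EXISTS circrna (
                   circbase_id TEXT, circatlas_id TEXT, standard_name TEXT)""")

def convert(source_id: str, from_col: str, to_col: str) -> list[str]:
    """Translate an ID from one database's namespace to another's."""
    rows = con.execute(
        f"SELECT {to_col} FROM circrna WHERE {from_col} = ?", (source_id,)
    ).fetchall()
    return [r[0] for r in rows]

# e.g. convert("hsa_circ_0000001", "circbase_id", "standard_name")
```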
A-118: RO-Crate: Capturing FAIR research outputs in bioinformatics and beyond
Track: BOSC: Bioinformatics Open Source Conference
- Eli Chadwick, The University of Manchester, United Kingdom
- Stian Soiland-Reyes, The University of Manchester, United Kingdom
- Phil Reed, The University of Manchester, United Kingdom
- Claus Weiland, Leibniz Institute for Biodiversity and Earth System Research, Germany
- Dag Endresen, University of Oslo, Norway
- Felix Shaw, Earlham Institute, United Kingdom
- Timo Mühlhaus, RPTU Kaiserslautern-Landau, Germany
- Carole Goble, The University of Manchester, United Kingdom
Presentation Overview:
RO-Crate is a mechanism for packaging research outputs with structured metadata, providing machine-readability and reproducibility following the FAIR principles. It enables interlinking methods, data, and outputs with the outcomes of a project or a piece of work, even where distributed across repositories.
Researchers can distribute their work as an RO-Crate to ensure their data travels with its metadata, so that key components are correctly tracked, archived, and attributed. Data stewards and infrastructure providers can integrate RO-Crate into the projects and platforms they support, to make it easier for researchers to create and consume RO-Crates without requiring technical expertise.
Community-developed extensions called “profiles” allow the creation of tailored RO-Crates that serve the needs of a particular domain or data format.
Current uses of RO-Crate in bioinformatics include:
∙ Describing and sharing computational workflows registered with WorkflowHub
∙ Creating FAIR exports of workflow executions from workflow engines and biodiversity digital twin simulations
∙ Enabling an appropriate level of credit and attribution, particularly for currently under-recognised roles (e.g. sample gathering, processing, sample distribution)
∙ Capturing plant science experiments as Annotated Research Contexts (ARC), complex objects which include workflows, workflow executions, inputs, and results
∙ Defining metadata conventions for biodiversity genomics
This presentation will outline the RO-Crate project and highlight its most prominent applications within bioinformatics, with the aim of increasing awareness and sparking new conversations and collaborations within the BOSC community.
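For orientation, a minimal ro-crate-metadata.json following the RO-Crate 1.1 specification looks roughly like the sketch below, written here as a Python dict; the dataset name and file names are placeholders:

```python
# Minimal RO-Crate metadata sketch (RO-Crate 1.1); names are placeholders.
import json

crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # the metadata file descriptor required by the spec
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the root dataset: the packaged research output
            "@id": "./",
            "@type": "Dataset",
            "name": "Example analysis results",
            "hasPart": [{"@id": "results.csv"}],
        },
        {"@id": "results.csv", "@type": "File", "name": "Summary table"},
    ],
}

with open("ro-crate-metadata.json", "w") as fh:
    json.dump(crate, fh, indent=2)
```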
A-120: AutoPeptideML 2: An open source library for democratizing machine learning for peptide bioactivity prediction
Track: BOSC: Bioinformatics Open Source Conference
- Raúl Fernández-Díaz, IBM Research | UCD Conway Institute, Ireland
- Thanh Lam Hoang, IBM Research Dublin, Ireland
- Vanessa Lopez, IBM Research Dublin, Ireland
- Denis Shields, University College Dublin, Ireland
Presentation Overview:
Peptides are a rapidly growing drug modality with diverse bioactivities and accessible synthesis, particularly for canonical peptides composed of the 20 standard amino acids. However, enhancing their pharmacological properties often requires chemical modifications, increasing synthesis cost and complexity. Consequently, most existing data and predictive models focus on canonical peptides. To accelerate the development of peptide drugs, there is a need for models that generalize from canonical to non-canonical peptides.
We present AutoPeptideML, an open-source, user-friendly machine learning platform designed to bridge this gap. It empowers experimental scientists to build custom predictive models without specialized computational knowledge, enabling active learning workflows that optimize experimental design and reduce sample requirements. AutoPeptideML introduces key innovations: (1) preprocessing pipelines for harmonizing diverse peptide formats (e.g., sequences, SMILES); (2) automated sampling of negative peptides with matched physicochemical properties; (3) robust test set selection with multiple similarity functions (via the Hestia-GOOD framework); (4) flexible model building with multiple representation and algorithm choices; (5) thorough model evaluation for unseen data at multiple similarity levels; and (6) FAIR-compliant, interpretable outputs to support reuse and sharing. A webserver with GUI enhances accessibility and interoperability.
We validated AutoPeptideML on 18 peptide bioactivity datasets and found that automated negative sampling and rigorous evaluation reduce overestimation of model performance, promoting user trust. A follow-up investigation also highlighted the current limitations in extrapolating from canonical to non-canonical peptides using existing representation methods.
AutoPeptideML is a powerful platform for democratizing machine learning in peptide research, facilitating integration with experimental workflows across academia and industry.
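To illustrate the idea behind innovation (2), matched negative sampling, the sketch below picks pool peptides closest to each positive in a deliberately crude property space (length, net charge, hydrophobic fraction); this is a generic illustration, not AutoPeptideML's implementation:

```python
# Generic illustration of negative sampling matched on simple
# physicochemical properties; not AutoPeptideML's actual code.
import numpy as np

HYDROPHOBIC = set("AVILMFWY")

def props(seq: str) -> np.ndarray:
    """Crude property vector: length, net charge, hydrophobic fraction."""
    charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    hydro = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    return np.array([len(seq), charge, hydro], dtype=float)

def matched_negatives(positives, pool, n_per_pos=1):
    """For each positive, pick the pool peptide(s) nearest in property space."""
    P = np.array([props(s) for s in positives])
    Q = np.array([props(s) for s in pool])
    mu, sd = Q.mean(0), Q.std(0) + 1e-9   # z-score for comparable scales
    P, Q = (P - mu) / sd, (Q - mu) / sd
    picks = []
    for p in P:
        d = np.linalg.norm(Q - p, axis=1)
        picks.extend(np.argsort(d)[:n_per_pos])
    return [pool[i] for i in picks]
```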
A-122: LiMeTrack: A lightweight biosample management platform for the multicenter SATURN3 consortium
Track: BOSC: Bioinformatics Open Source Conference
- Florian Heyl, German Cancer Research Center (DKFZ), Division of Computational Genomics and Systems Genetics (B260), Germany
- Jonas Gassenschmidt, Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Germany
- Lukas Heine, Institute for AI in Medicine, Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Essen, Germany
- Frederik Voigt, Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Germany
- Jens Kleesiek, Institute for AI in Medicine, Cancer Research Center Cologne Essen (CCCE), University Medicine Essen, Essen, Germany
- Oliver Stegle, German Cancer Research Center (DKFZ), Division of Computational Genomics and Systems Genetics (B260), Germany
- Jens Siveke, Bridge Institute of Experimental Tumor Therapy (BIT), University Hospital Essen & University of Duisburg-Essen, Germany
- Melanie Boerries, Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Germany
- Roland Schwarz, Institute for Computational Cancer Biology (ICCB), University Hospital and University of Cologne, Germany
- Laura Godfrey, Institute for Computational Cancer Biology (ICCB), University Hospital and University of Cologne, Germany
Presentation Overview:
Biomedical research projects involving large patient cohorts are increasingly complex, both in terms of data modalities and number of samples. Hence, they require robust data management solutions to foster data integrity, reproducibility, and secondary use compliant with the FAIR principles. SATURN3, a German consortium with 17 partner sites, investigates intratumoral heterogeneity using patient biosamples. As part of a complex, multicenter workflow, high-level multimodal analyses include bulk, single-cell, and spatial omics, along with the corresponding data analysis. To manage this complexity and to avoid miscommunication, data loss, and de-synchronization across project sites, harmonization in a central infrastructure is essential. Additionally, real-time monitoring of the sample processing status must be accessible to all project partners throughout the project. This use case goes far beyond the capabilities of spreadsheets, which are susceptible to security vulnerabilities, versioning mistakes, data loss, and type conversion errors. Existing data management tools are often complex to set up or lack the flexibility to be adapted to specific project needs. To address these challenges, we introduce LightMetaTrack (LiMeTrack), a biosample management platform built on the Django framework. Key features include customizable and user-friendly forms for data entry and a real-time dashboard for project and sample status overview. LiMeTrack simplifies the creation and export of sample sheets, streamlining subsequent bioinformatics analyses and research workflows. By integrating real-time monitoring with robust sample tracking and data management, LiMeTrack improves research transparency and reproducibility, ensures data integrity, and optimizes workflows, making it a powerful solution for biosample management in multicenter biomedical research endeavours.
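As a sketch of how such sample-status tracking might look in Django, the model below is purely hypothetical; the field names and status choices are invented, not LiMeTrack's actual schema:

```python
# Hypothetical Django model illustrating the kind of biosample tracking
# LiMeTrack describes; belongs in a Django app's models.py.
from django.db import models

class Biosample(models.Model):
    class Status(models.TextChoices):
        COLLECTED = "collected"
        SHIPPED = "shipped"
        SEQUENCED = "sequenced"
        ANALYZED = "analyzed"

    sample_id = models.CharField(max_length=64, unique=True)
    site = models.CharField(max_length=128)        # contributing partner site
    modality = models.CharField(max_length=64)     # e.g. bulk, single-cell, spatial
    status = models.CharField(max_length=16, choices=Status.choices,
                              default=Status.COLLECTED)
    updated_at = models.DateTimeField(auto_now=True)  # drives a live dashboard
```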
A-124: Open-Source GPU Acceleration for State-of-the-Art Nanopore Basecalling with Slorado
Track: BOSC: Bioinformatics Open Source Conference
- Bonson Wong, School of Computer Science and Engineering, UNSW Sydney, Australia
- Hasindu Gamaarachchi, School of Computer Science and Engineering, UNSW Sydney, Australia
Presentation Overview:
Nanopore sequencing has become a popular technology for genomic research because of its cost-effectiveness and ability to sequence long reads. Nanopore technology offers solutions from portable sequencing devices, such as the MinION designed for in-field applications, to large-scale sequencing devices like the PromethION. A nanopore sequencer generates a time-series 'raw signal', which is then converted into a nucleobase sequence (A, C, G, T) through basecalling. Basecalling, however, can only be performed on a narrow range of hardware. Much of the implementation in the current state-of-the-art Dorado basecaller relies on a closed-source binary package for platform-specific optimisations. Dorado is developed specifically for high-compute Nvidia Graphics Processing Units (GPUs) as its main platform. Basecalling without these optimisations is impractical; researchers working in resource-constrained environments are therefore limited by Dorado's narrow hardware compatibility. We aim to open-source these large sections of the codebase to make basecalling technology accessible to researchers and developers. We provide two open-source software packages to the genomics community. 'Openfish' is a library that accelerates CRF decoding, tailored towards nanopore signal processing; it implements Dorado's decoding step on NVIDIA and AMD GPUs. As a framework for testing and benchmarking the entire basecalling pipeline, we have also built 'Slorado', a lean and open-source basecaller that can be easily compiled for NVIDIA and AMD machines.
A-126: Bridging the gap: advancing aging & dementia research through the open-access AD Knowledge Portal
Track: BOSC: Bioinformatics Open Source Conference
- Jo Scanlan, Sage Bionetworks, United States
- Amelia Kallaher, Sage Bionetworks, United States
- Zoe Leanza, Sage Bionetworks, United States
- Jessica Britton, Sage Bionetworks, United States
- Jaclyn Beck, Sage Bionetworks, United States
- Beatriz Saldana, Sage Bionetworks, United States
- Anthony Pena, Sage Bionetworks, United States
- William Poehlman, Sage Bionetworks, United States
- Victor Baham, Sage Bionetworks, United States
- Trisha Zintel, Sage Bionetworks, United States
- Jesse Wiley, Sage Bionetworks, United States
- Karina Leal, Sage Bionetworks, United States
- Jessica Malenfant, Sage Bionetworks, United States
- Laura Heath, Sage Bionetworks, United Kingdom
- Susheel Varma, Sage Bionetworks, United States
Presentation Overview:
The AD Knowledge Portal (adknowledgeportal.org) is an NIA-funded resource developed by Sage Bionetworks to facilitate Alzheimer's Disease research through open data sharing. The secure Synapse platform enables researchers to share data with proper attribution while ensuring compliance with FAIR principles.
The Portal aggregates resources from 14 NIH-funded research programs and 97 aging-related grants, housing approximately 800TB of data from over 11,000 individuals. This multimodal data encompasses genomics, transcriptomics, epigenetics, imaging, proteomics, metabolomics, and behavioural assays from various sources, including brain banks, longitudinal cohorts, cell lines, and animal models. Recent additions include 290 TB of single-cell and nucleus expression data, alongside experimental tools and computational resources.
All content is available under Creative Commons BY 4.0 licenses, with software under open-source licenses such as Apache 2.0. The Portal's code is publicly available on GitHub with comprehensive documentation. The Community Data Contribution Program extends the Portal's scope beyond NIA-funded projects.
Since January 2022, over 6,000 unique users have downloaded 12.57 PB of data, with monthly downloads doubling between 2023 and 2024. Portal data has been cited in over 1,000 publications since 2019, with more than half of these representing reuse of secondary data. Integration with platforms like CAVATICA and Terra enhances accessibility. Future developments include interoperability with AD Workbench, NACC, NIAGADS, and LONI, as well as new data types such as spatial transcriptomics and longitudinal data from Alzheimer's disease models.
A-128: VueGen: automating the generation of scientific reports
Track: BOSC: Bioinformatics Open Source Conference
- Sebastian Ayala-Ruano, Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Denmark
- Henry Webel, Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Denmark
- Alberto Santos Delgado, Novo Nordisk Foundation Center for Biosustainability, Denmark
Presentation Overview:
The analysis of omics data typically involves multiple bioinformatics tools and methods, each producing distinct output files. However, compiling these results into comprehensive reports often requires additional effort and technical skills. This creates a barrier for non-bioinformaticians, limiting their ability to produce reports from their findings. Moreover, the lack of streamlined reporting workflows impacts reproducibility and transparency, making it difficult to communicate results and track analytical processes.
Here, we present VueGen, an open-source software that addresses the limitations of current reporting tools by automating report generation from bioinformatics outputs, allowing researchers with minimal coding experience to communicate their results effectively. With VueGen, users can produce reports by simply specifying a directory containing output files, such as plots, tables, networks, Markdown text, and HTML components, along with the report format. Supported formats include documents (PDF, HTML, DOCX, ODT), presentations (PPTX, Reveal.js), Jupyter notebooks, and Streamlit web applications. To showcase VueGen’s functionality, we present two case studies and provide detailed documentation to help users generate customized reports.
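The core idea of mapping a directory of outputs to report components can be sketched as follows; the extension-to-component map and function names are invented for illustration, and VueGen's actual logic is richer:

```python
# Hypothetical illustration of directory-driven report assembly in the
# spirit of VueGen; names and the extension map are invented.
from pathlib import Path

COMPONENT_TYPES = {
    ".png": "plot", ".svg": "plot", ".csv": "table",
    ".md": "markdown", ".html": "html",
}

def collect_components(directory: str):
    """Group report-ready files in `directory` by inferred component type."""
    components = {}
    for path in sorted(Path(directory).rglob("*")):
        kind = COMPONENT_TYPES.get(path.suffix.lower())
        if kind:
            components.setdefault(kind, []).append(path)
    return components

# e.g. collect_components("results/") -> {"plot": [...], "table": [...], ...}
```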
VueGen was designed with accessibility and community contribution in mind, offering multiple implementation options for users with varying technical expertise. It is available as a Python package, a portable Docker image, and an nf-core module, leveraging established open-source ecosystems to facilitate integration and reproducibility. Furthermore, a cross-platform desktop application for macOS and Windows provides a user-friendly interface for users less familiar with command-line tools. The source code is freely available on https://github.com/Multiomics-Analytics-Group/vuegen. Documentation is provided at https://vuegen.readthedocs.io/.