Schedule subject to change
All times listed are in BST
Tuesday, July 22nd
11:20-12:00
Invited Presentation: The elephant in the (metabolomics) room: computational approaches for small molecule structural annotation from single biological study to data repositories
Confirmed Presenter: Warwick Dunn

Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Warwick Dunn

Presentation Overview:

Multiple omics research strategies (metabolomics, lipidomics, exposomics) focus on reporting small molecules in biological systems, with applications that include human disease, biotechnology, microbiomes and environmental impact. Many small molecule omics studies apply liquid chromatography-mass spectrometry (LC-MS) to simultaneously collect data on hundreds to low thousands of small molecules. The availability of these studies in data repositories (MetaboLights, Metabolomics Workbench, GNPS) is rapidly increasing and provides the opportunity for large-scale data reuse.

LC-MS data contain thousands of signals, with each small molecule typically detected as multiple complementary signals. These signals are used to structurally annotate small molecules, a required step for deriving biological knowledge. There have been significant advances both in the volume of data available in metabolomics data repositories and in the development of computational tools to structurally annotate and biologically interpret these data. However, the complexity of the data collected, the inability to sequence all common metabolomes and the sparse metabolite coverage of the libraries applied remain significant hurdles.

In this presentation I will (1) describe the different types of data collected in LC-MS small molecule studies, (2) review the current strategies applied to these data to convert from signal to chemical structure and the computational hurdles to overcome, and (3) discuss moving from single lab/single study analyses to the integration of studies available across multiple data repositories. By moving from small-scale to big data through reuse of publicly available data, we can rapidly advance our biological understanding across species and geography.

12:00-12:20
MetaboT: AI-based agent for natural language-based interaction with metabolomics knowledge graphs
Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Madina Bekbergenova, University of Cote d'Azur, University of Antwerp, France
  • Lucas Pradi, University of Cote d'Azur, France
  • Emma Tysinger, MIT, Cambridge, MA, USA, United States
  • Franck Michel, Université Côte d’Azur, CNRS, Inria, I3S, France., France
  • Florence Mehl, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland, Switzerland
  • Marco Pagni, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland, Switzerland
  • Wout Bittremieux, Department of Computer Science, University of Antwerp, Antwerp, Belgium, Belgium
  • Jean-Luc Wolfender, Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, Switzerland
  • Fabien Gandon, INRIA, Université Côte d’Azur, CNRS, I3S, France., France
  • Louis-Félix Nothias, Université Côte d'Azur, CNRS, Interdisciplinary Institute for Artificial Intelligence (3iA) Côte d'Azur, France

Presentation Overview:

The long abstract was submitted as a PDF.

12:20-12:40
Reference data-driven analysis for joint metabolome–microbiome readout from untargeted mass spectrometry data
Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Alejandro Mendoza Cantu, University of Antwerp, Belgium
  • Julia Gauglitz, University of Antwerp, Belgium
  • Wout Bittremieux, University of Antwerp, Belgium

Presentation Overview:

Untargeted tandem mass spectrometry (MS/MS) metabolomics enables broad chemical profiling of complex biological samples but is limited by low annotation rates and interpretability challenges. Reference data-driven (RDD) analysis addresses these limitations by leveraging metadata-rich MS/MS reference datasets to contextualize untargeted experiments.
RDD improves spectrum interpretation by matching experimental MS/MS spectra to comprehensive reference libraries enriched with hierarchical metadata. The workflow begins with raw MS/MS files from both study samples and reference materials (e.g., foods, microbes). Reference samples are annotated with structured ontologies (e.g., plant → fruit → pome → apple), allowing multi-level biological interpretation.
Both datasets are analyzed using GNPS molecular networking, which clusters spectra based on similarity. Clusters shared between study and reference samples are treated as spectral matches. A spectral count table is constructed by aggregating shared clusters across samples and reference files. These counts can be aggregated across different ontology levels to support flexible downstream analysis.
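The counting and aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' library: the file names, ontology paths, cluster memberships and the function name `count_table` are all hypothetical, and clustering is assumed to have been done already.

```python
from collections import Counter

# Hypothetical ontology paths for reference files (illustrative names only).
REFERENCE_ONTOLOGY = {
    "ref_apple.mzML": ("plant", "fruit", "pome", "apple"),
    "ref_pear.mzML": ("plant", "fruit", "pome", "pear"),
    "ref_yogurt.mzML": ("animal", "dairy", "fermented", "yogurt"),
}

# Hypothetical clustering output: cluster id -> files contributing spectra.
CLUSTERS = {
    1: {"study_A.mzML", "ref_apple.mzML"},
    2: {"study_A.mzML", "ref_apple.mzML", "ref_pear.mzML"},
    3: {"study_A.mzML", "ref_yogurt.mzML"},
    4: {"study_B.mzML", "ref_pear.mzML"},
    5: {"study_B.mzML"},  # no reference file, so it contributes no match
}


def count_table(clusters, ontology, level):
    """Count shared clusters per (study sample, ontology term) at one level."""
    counts = Counter()
    for files in clusters.values():
        refs = {f for f in files if f in ontology}
        studies = files - refs
        # A cluster counts once per term, even if several reference
        # files map to the same term at this ontology level.
        terms = {ontology[ref][level] for ref in refs}
        for study in studies:
            for term in terms:
                counts[(study, term)] += 1
    return counts


coarse = count_table(CLUSTERS, REFERENCE_ONTOLOGY, level=0)  # plant / animal
fine = count_table(CLUSTERS, REFERENCE_ONTOLOGY, level=3)    # apple / pear / ...
```

Changing `level` re-aggregates the same matches at a coarser or finer ontology level without rerunning the clustering.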
RDD was first demonstrated with the Global FoodOmics Project dataset, enabling dietary pattern reconstruction and increasing spectral usage by 5.1 ± 3.3-fold. We now expand this approach to microbiome analysis using MS/MS data from 488 American Gut Project participants, matched to a curated subset of the microbeMASST reference database. This enabled microbial profiling from metabolic data, capturing key gut taxonomic trends.
The full framework is available as an open-source Python library and web application, enabling custom dataset analysis without coding. RDD offers a generalizable strategy to enhance annotation and biological insight from untargeted MS/MS data.

12:40-13:00
Combined MS and MS/MS deconvolution of SWATH DIA data with the DIA-NMF software for comprehensive annotation in metabolomics
Confirmed Presenter: Diana Karaki, Département Médicaments et Technologies pour la Santé (DMTS), MetaboHUB, Université Paris-Saclay, CEA, INRAE, France

Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Diana Karaki, Département Médicaments et Technologies pour la Santé (DMTS), MetaboHUB, Université Paris-Saclay, CEA, INRAE, France
  • Annelaure Damont, Département Médicaments et Technologies pour la Santé (DMTS), MetaboHUB, Université Paris-Saclay, CEA, INRAE, France
  • Antoine Souloumiac, CEA-List, Université Paris-Saclay, 91120 Palaiseau, France, France
  • Francois Fenaille, Département Médicaments et Technologies pour la Santé (DMTS), MetaboHUB, Université Paris-Saclay, CEA, INRAE, France
  • Etienne Thevenot, Département Médicaments et Technologies pour la Santé (DMTS), MetaboHUB, Université Paris-Saclay, CEA, INRAE, France
  • Sylvain Dechaumet, Département Médicaments et Technologies pour la Santé (DMTS), MetaboHUB, Université Paris-Saclay, CEA, INRAE, France

Presentation Overview:

Data-independent acquisition (DIA), particularly Sequential Window Acquisition of All Theoretical Mass Spectra (SWATH-MS), is gaining momentum in untargeted metabolomics due to its ability to fragment all detected ions within large consecutive isolation windows in a single run. The main challenge lies in processing the resulting hybrid fragmentation data and extracting pure MS/MS spectra based on the similarity of retention time profiles from precursors and their fragment ions. We recently demonstrated the value of a Non-Negative Matrix Factorization (NMF) approach for DIA deconvolution, compared to existing peak modeling methods such as MS-DIAL and DecoMetDIA.
Here, we first extended our strategy to the simultaneous deconvolution of MS and MS/MS DIA data. This is not only more rigorous, since fragment ions of distinct ion species from the same molecule often share retention time profiles, but also more efficient, as pure MS1 spectra are now provided. Second, we redesigned the deconvolution strategy to extract all pure components from each retention time window in a single step, reducing redundancy and decreasing computation time. Post-processing quality filters were also included to discard weak or redundant components by analyzing their contribution to MS1 signals.
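As a rough illustration of the idea behind NMF deconvolution, the sketch below factors a small ions-by-scans intensity matrix V into pure spectra W and elution profiles H. It is not the DIA-NMF implementation: the abstract does not state which solver is used, so classic Lee-Seung multiplicative updates are assumed here for brevity, and the toy matrix, function names and parameters are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - W @ H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra)) ** 0.5


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor non-negative V (ions x scans) as W @ H.

    Each column of W is one pure spectrum; each row of H is the matching
    elution profile. Multiplicative updates keep both factors non-negative.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


# Toy data: two co-eluting compounds with overlapping spectra and profiles.
PURE_SPECTRA = [[1.0, 0.0], [0.5, 0.3], [0.2, 0.8], [0.0, 0.4]]  # ions x compounds
PROFILES = [[0, 1, 3, 1, 0, 0], [0, 0, 1, 3, 1, 0]]              # compounds x scans
V = matmul(PURE_SPECTRA, PROFILES)

W, H = nmf(V, rank=2)
```

Stacking MS1 and MS/MS ion traces as rows of the same matrix is one way to picture the combined deconvolution: ions that share an elution profile end up in the same column of W.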
In SWATH-DIA mode, we applied the DIA-NMF software to human plasma samples spiked with 47 chemical compounds at eight known concentrations (0–10 ng/mL). DIA-NMF identified more spiked compounds than MS-DIAL and DecoMetDIA at all concentrations. It also achieved higher reverse dot-product scores, indicating a better grouping of relevant fragments. These results highlight the value of the DIA-NMF method and software for integrated metabolomics workflows.

Towards mzTab-M 2.1 - Evolving the HUPO-PSI standard format for reporting of small molecule mass spectrometry results
Confirmed Presenter: Nils Hoffmann, IBG-5, Forschungszentrum Jülich, Jülich, Germany, Germany

Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Nils Hoffmann, IBG-5, Forschungszentrum Jülich, Jülich, Germany, Germany
  • Bo Burla, Singapore Lipidomics Incubator, Life Sciences Institute, National University of Singapore, Singapore
  • Yasin El Abiead, Skaggs School of Pharmacy and Pharmaceutical Sciences, UCSD, San Diego, USA, United States
  • Janik Kokot, Institute of Human Genetics, Medical University of Innsbruck, Innsbruck, Austria, Austria
  • Philippine Louail, Institute for Biomedicine, Eurac Research, Bolzano, Italy, Italy
  • Steffen Neumann, Leibniz Institute of Plant Biochemistry, Halle, Germany, Germany
  • Kozo Nishida, RIKEN Center for Biosystems Dynamics Research, Kobe, Japan, Japan
  • Thomas Payne, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, United Kingdom, United Kingdom
  • Johannes Rainer, Institute for Biomedicine, Eurac Research, Bolzano, Italy., Italy
  • Juan Antonio Vizcaíno, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, United Kingdom, United Kingdom
  • Ozgur Yurekten, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, United Kingdom., United Kingdom

Presentation Overview:

Mass spectrometry (MS) is central to modern large-scale metabolomics, but a lack of data format standardization for intermediate and final MS data analysis results still limits data sharing, database deposition, and reanalysis. To address this, the Human Proteome Organization’s Proteomics Standards Initiative (HUPO-PSI) and the Metabolomics Standards Initiative (MSI) originally developed mzTab-M 2.0.0 (published in 2019) as an open standard for reporting MS-based metabolomics data.

mzTab-M uses a simple, tab-separated text format designed for both human readability and computational processing, based on a JSON schema and complemented by controlled-vocabulary-defined metadata. The format is detailed in a specification document, while a reference implementation and validator ensure data quality and consistency.
The format comprehensively represents metabolomics results, including final quantification values and the identification evidence linking these values back to the raw MS features. Importantly, mzTab-M explicitly accommodates ambiguity in molecule identification, allowing researchers to clearly communicate levels of confidence. mzTab-M aims to be flexible by supporting CV-term controlled optional columns, thereby adapting to different experimental setups, applications and workflows.
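To give a feel for the tab-separated layout, here is a minimal reading sketch. It assumes the line prefixes of the mzTab-M 2.0 specification (MTD for metadata, SMH/SML for the small molecule summary, SFH/SMF for features, SEH/SME for evidence, COM for comments) and that ambiguous identifications are separated by "|". The fragment is deliberately incomplete and would not pass the validator, the identifiers and names in it are made up, and `parse_mztab_m` is a hypothetical helper rather than part of the reference implementation.

```python
# Toy fragment only: a real mzTab-M file has many more mandatory fields.
EXAMPLE = "\n".join([
    "COM\tillustrative fragment, not a valid file",
    "MTD\tmzTab-version\t2.0.0-M",
    "MTD\tmzTab-ID\tEXAMPLE-1",
    "",
    "SMH\tSML_ID\tdatabase_identifier\tchemical_name\treliability",
    "SML\t1\tEXAMPLEDB:1\tcompound A\t2",
    "SML\t2\tEXAMPLEDB:2|EXAMPLEDB:3\tcompound B|compound C\t3",
])

# Header prefix -> prefix of the data rows that the header describes.
HEADER_FOR_ROWS = {"SMH": "SML", "SFH": "SMF", "SEH": "SME"}


def parse_mztab_m(text):
    """Split mzTab-M text into a metadata dict and per-section row dicts."""
    metadata = {}
    headers = {}
    tables = {"SML": [], "SMF": [], "SME": []}
    for line in text.splitlines():
        fields = line.split("\t")
        prefix, values = fields[0], fields[1:]
        if prefix == "MTD":
            metadata[values[0]] = values[1]
        elif prefix in HEADER_FOR_ROWS:
            headers[HEADER_FOR_ROWS[prefix]] = values
        elif prefix in tables:
            tables[prefix].append(dict(zip(headers[prefix], values)))
        # COM lines and blank lines fall through and are skipped.
    return metadata, tables


metadata, tables = parse_mztab_m(EXAMPLE)
candidates = tables["SML"][1]["database_identifier"].split("|")
```

The second SML row shows how an ambiguous identification can be carried through as several candidates instead of being forced to a single answer.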

Initial implementations of mzTab-M in software such as xcms, mzmine and OpenMS, and its use for submission to repositories such as MetaboLights and GNPS, have shown that the format, its documentation and its implementations require significant updates and extensions. In mzTab-M 2.1.0 we therefore want to support these use cases as well as new MS technologies. We also want to provide improved integration with other HUPO-PSI formats for sample metadata, QC results and cross-links to mass spectra in public databases, and to implement more efficient and faster serialization and deserialization options.

14:00-14:20
Enhanced spectrum clustering for interpretable molecular networking
Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Janne Heirman, University of Antwerp, Belgium
  • Yasin El Abiead, University of California San Diego, United States
  • Wout Bittremieux, University of Antwerp, Belgium

Presentation Overview:

Small molecule tandem mass spectrometry (MS/MS) produces vast datasets, making interpretation challenging. Molecular networking aids analysis by linking spectra based on similarity, identifying groups of similar compounds. Clustering is a crucial preprocessing step to reduce data redundancy and enhance network interpretability. Here we introduce enhanced clustering approaches integrated into Falcon, an efficient clustering tool available via GNPS.

Low-quality spectra were removed, and noise peaks filtered before clustering spectra using the cosine similarity. Consensus spectra were generated with a novel noise-rejection algorithm. Clustering performance was evaluated using cluster completeness, the proportion of clustered spectra, and incorrect clustering rates. Clustering algorithms were evaluated using 502,993 MS/MS spectra acquired on a Thermo Astral mass spectrometer, with ground truth labels derived from MZmine-based feature detection.

We evaluated hierarchical clustering (single, average, and complete linkage) alongside density-based clustering with DBSCAN, which was previously used in Falcon. At a maximum threshold of 14% incorrectly clustered spectra, hierarchical clustering with complete linkage significantly outperforms DBSCAN, clustering 89.7% of spectra (11.8% incorrect), while DBSCAN clusters only 39.8% (12.9% incorrect). While DBSCAN yielded a higher completeness score (0.856), complete linkage maintained strong completeness (0.826) with better accuracy. Despite the overall high incorrect clustering rates—partly due to strict ground truth criteria—hierarchical clustering offers a substantial improvement in clustering performance.
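For readers unfamiliar with complete linkage, the following sketch clusters a handful of binned spectra by cosine distance. It is a naive illustration under simplifying assumptions, not Falcon's implementation: spectra are assumed to be pre-binned into equal-length intensity vectors, precursor mass is ignored, and the toy vectors, threshold and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def complete_linkage(vectors, max_distance):
    """Agglomerative clustering with complete linkage.

    Two clusters are merged only while the distance between their two most
    dissimilar members stays within max_distance, so every pair of spectra
    inside a cluster is guaranteed to be similar.
    """
    dist = [[1.0 - cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_distance:
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters


# Hypothetical binned spectra: two pairs of near-duplicates.
SPECTRA = [
    [10, 0, 5, 0],
    [9, 0, 6, 0],
    [0, 8, 0, 3],
    [0, 7, 1, 3],
]

clusters = complete_linkage(SPECTRA, max_distance=0.1)
```

Single linkage would instead use `min` over cross-cluster pairs, which can chain dissimilar spectra together through intermediates.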

By reducing redundancy and enhancing the interpretability of molecular networks, optimized clustering strategies like hierarchical clustering accelerate metabolite annotation and drive discovery in metabolomics research.

14:20-14:40
Deep Learning for Small Molecule Analog Discovery From Untargeted Mass Spectrometry
Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Juan Sebastian Piedrahita Giraldo, University of Antwerp, Belgium
  • Manuela Da Silva, Janssen Pharmaceutica, Belgium
  • Reza Shahneh, University of California Riverside, United States
  • Mingxun Wang, University of California Riverside, United States
  • Thomas De Vijlder, Janssen Pharmaceutica, Belgium
  • Kris Laukens, University of Antwerp, Belgium
  • Wout Bittremieux, University of Antwerp, Belgium

Presentation Overview:

Tandem mass spectrometry (MS/MS) is a key tool for analyzing the small molecule composition of biological samples. Untargeted metabolomics data analysis entails matching the experimental MS/MS spectra to spectral libraries based on spectrum similarity, typically using the cosine similarity. However, such heuristic techniques often fail to capture the structural similarity between molecules. With the purpose of discovering novel structural analogs, we developed a neural network model that has learned the relationship between MS/MS data and chemical structures.
Our approach, called SIMBA, consists of a twin transformer encoder that receives pairs of MS/MS spectra and predicts the structural similarity of their molecules. The model was trained in a multi-task setting to predict both the “substructure edit distance,” a novel domain-inspired metric that reflects the number of modifications between molecules, and the maximum common edge subgraph (MCES), thus learning both the number of structural differences and their atomic cardinality. Harnessing the learning capabilities of transformers, the model was trained on 200 million spectrum pairs from the NIST20 and MassSpecGym spectral libraries.
SIMBA significantly surpasses the state of the art for analog discovery, predicting the MCES with a higher Spearman correlation (r=0.93) compared to modified cosine (r=0.46). Likewise, SIMBA identifies analogs with high performance on the CASMI2022 dataset: SIMBA is able to retrieve analogs with a lower normalized MCES distance of 0.17 compared to traditional modified cosine (0.23) as well as deep learning approaches such as Spec2Vec (0.19) and MS2DeepScore (0.20).
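The Spearman correlation quoted above compares the ranking of predicted values with the ranking of true values. A minimal sketch of how it is computed follows; the numbers are invented purely for illustration and are not SIMBA outputs.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical true vs. predicted distances for five spectrum pairs.
true_distance = [1, 2, 3, 4, 5]
predicted = [2.2, 1.9, 3.1, 4.4, 7.0]
rho = spearman(true_distance, predicted)
```

Because only ranks matter, a predictor can score well even when its absolute values are off, as long as it orders the pairs correctly.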

14:40-15:00
Mass Spectrometry and Machine Learning Reveal Stool-Based Multi-Signatures for Diagnosis and Longitudinal Monitoring of Inflammatory Bowel Disease
Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Elmira Shajari, Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Canada, Canada
  • David Gagné, Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Canada, Canada
  • Patricia Roy, Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Canada, Canada
  • Mandy Malick, Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Canada, Canada
  • Maxime Delisle, Department of Medicine, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Canada, Canada
  • François-Michel Boisvert, Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Canada, Canada
  • Marie Brunet, Department of Pediatrics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Canada, Canada
  • Jean-François Beaulieu, Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Canada, Canada

Presentation Overview:

Background:
Monitoring inflammation activity in Inflammatory Bowel Disease (IBD) is essential for guiding treatment and preventing long-term complications. While fecal calprotectin is a common non-invasive biomarker, its diagnostic reliability declines significantly within the “gray zone” (50–300 µg/g), limiting its clinical utility. To address this challenge, we developed a stool-based proteomic biomarker panel for precise classification of inflammation activity in this diagnostically ambiguous range.
Methods:
We analyzed 155 stool samples from IBD patients for model training and reserved 53 samples for blind testing. The proteomic profiling was performed using SWATH-MS, a data-independent acquisition (DIA) mass spectrometry technique known for its reproducibility and depth. Protein- and peptide-level datasets were preprocessed separately. Feature selection was conducted using Boruta, LASSO, RF, and RFE. Features identified consistently across both data levels were prioritized. Six machine learning models (SVM, Random Forest, Naïve-Bayes, KNN, GLMnet, and XGBoost) were evaluated with 10-fold cross-validation, focusing on gray zone performance. Model interpretability was assessed using SHAP values, and GO enrichment analysis was used to explore the biological relevance of the selected features.
Results:
We identified 19 protein-level and 14 peptide-level discriminatory features, with five robust overlapping markers selected for final modeling. The Support Vector Machine (SVM) model achieved the highest performance: 0.96 precision and 0.88 recall during training, and 1.00 precision and 0.86 recall in blind testing. SHAP analysis confirmed biomarker contribution, and enriched GO terms were linked to immune and inflammatory pathways.
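For reference, precision and recall are derived from the confusion counts as shown in this short sketch. The counts used are arbitrary illustrative values, not figures from the study.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


# Arbitrary example: 8 active cases found, 2 false alarms, 2 active cases missed.
precision, recall = precision_recall(true_positives=8, false_positives=2, false_negatives=2)
```

A precision of 1.00 means no inactive case was flagged as active, while a recall below 1 means some active cases were missed.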
Conclusion:
This proteomic signature offers a promising non-invasive tool for resolving diagnostic uncertainty in IBD monitoring within the gray zone.

Rapid Deployment of Interactive and Visual Web Applications for Computational Mass Spectrometry
Confirmed Presenter: Tom David Müller, University of Tübingen, Germany

Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Tom David Müller, University of Tübingen, Germany
  • Arslan Siraj, University of Tübingen, Germany
  • Justin Cyril Sing, University of Toronto, Canada
  • Joshua Charkow, University of Toronto, Canada
  • Axel Walter, University of Tübingen, Germany
  • Samuel Wein, University of Tübingen, Germany
  • Ayesha Feroz, University of Tübingen, Germany
  • Matteo Pilz, University of Tübingen, Germany
  • Kyowon Jeong, University of Tübingen, Germany
  • Mingxuan Gao, University of Toronto, Canada
  • Wout Bittremieux, University of Antwerp, Belgium
  • Hannes Luc Röst, University of Toronto, Canada
  • Oliver Kohlbacher, University of Tübingen, Germany
  • Timo Sachsenberg, University of Tübingen, Germany

Presentation Overview:

Mass Spectrometry (MS) is a highly versatile bioanalytical technique with a myriad of experimental approaches, instrumentation, and computational tools. If a desired analysis is not already supported by existing desktop applications, bioinformaticians often integrate scripts and tools to produce custom analyses and visualizations. While effective, this approach requires specialized expertise and limits accessibility for non-technical users. Traditional workflow systems can standardize and scale such analyses, but often lack user-friendly interfaces, visualization capabilities, and support for interactive decision-making during execution.
To address these limitations, we present two freely available open-source solutions designed to streamline MS workflow development and deployment. pyOpenMS-viz enables rapid creation of publication-ready visualizations, such as spectra, chromatograms, and peak maps, with a single line of code, directly from pandas DataFrames, a common data structure in Python-based MS tools. Straightforward use cases that do not require complex development are well supported by Jupyter notebooks enhanced with pyOpenMS-viz.
For broader accessibility and reuse, the OpenMS WebApp template offers a lightweight framework to develop interactive web applications with minimal effort. These apps guide users through uploading files, setting parameters, executing workflows involving arbitrary scripts and command line tools, and visualizing results interactively. Visualizations from pyOpenMS-viz and other libraries are fully supported. Applications can be deployed online, allowing users to share results (e.g. with collaborators) via website URLs, or offline via automatically generated Windows executables.
Together, pyOpenMS-viz and the OpenMS WebApp template empower rapid prototyping, streamline deployment, and make MS workflows accessible to a wider scientific audience, promoting collaboration and reproducibility.

15:00-15:20
PepSi-Print: Unraveling Protein Fingerprints through Pairwise Intensity Ratios with a Peptide Siamese Network
Confirmed Presenter: Zixuan Xiao, Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising 85354, Germany, Germany

Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Zixuan Xiao, Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising 85354, Germany, Germany
  • Mathias Wilhelm, Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising 85354, Germany, Germany

Presentation Overview:

In MS-based bottom-up proteomics, proteins are enzymatically digested into peptides, which are then identified and quantified to infer proteins. While MS2 spectra provide peptide sequence information, the stochastic nature of peptide sampling and fragmentation introduces ambiguity in protein inference. Inspired by fragment ion intensity patterns (FIIP) used in MS2-based identification, we propose using peptide ion intensity patterns (PIIP) in MS1-based identification. Combined with isotope pattern, retention time, and ion mobility, PIIP defines protein fingerprints facilitating direct protein identification. We present a deep learning approach to model PIIP and integrate it into an MS1-based workflow.
In this pursuit, we leveraged a large bacterial dataset comprising 343 raw files. Exploratory analysis revealed highly consistent fingerprint patterns across measurements (median Pearson’s correlation = 0.9), suggesting a stable signal suitable for learning. Based on this, we developed PepSi-Print, a Siamese network architecture with Long Short-Term Memory arms and a regression head that predicts pairwise logarithmic intensity ratios between peptides from the same protein, thus avoiding the need for absolute intensity ground truths.
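The reason pairwise ratios avoid the need for absolute intensities can be shown with a small sketch of how such training targets might be built. This is an illustration only: the abstract says "logarithmic" without naming a base, so base 2 is assumed here, and the peptide labels, intensities and function name are hypothetical.

```python
import math
from itertools import combinations


def pairwise_log_ratios(intensities):
    """Return log2(I_a / I_b) for every unordered peptide pair of one protein.

    `intensities` maps a peptide label to its positive MS1 intensity.
    """
    return {
        (a, b): math.log2(intensities[a] / intensities[b])
        for a, b in combinations(sorted(intensities), 2)
    }


# Hypothetical peptides of one protein measured in two runs; the second run
# has ten times the overall signal (e.g. more protein loaded).
run_1 = {"pep1": 8.0, "pep2": 2.0, "pep3": 1.0}
run_2 = {name: 10.0 * value for name, value in run_1.items()}

targets_1 = pairwise_log_ratios(run_1)
targets_2 = pairwise_log_ratios(run_2)
```

Both runs yield the same targets because the common scale factor cancels in each ratio, which is what makes the fingerprint comparable across measurements.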
PepSi-Print achieves a median absolute error of 0.85 on pairwise predictions and 0.73 after aggregation at the sequence level. When applied to unseen raw files, predicted fingerprints correlate with observed ones with a median Pearson’s r of 0.75. Integrated into the DirectMS1 workflow, PepSi-Print improves peptide-feature-match (PFM) discrimination, reducing uncontrolled peptide-level false discovery rates (FDR) from >30% to as low as 2–3%. These learned fingerprints offer a novel protein-specific signal to enhance identification, enable isoform resolution, and improve quantification in MS1-only proteomics.

15:20-15:40
Identification of novel proteins by integrating ribosome profiling data into a transcriptomic language model for deeper mass spectrometry analyses
Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Nicolas Provencher, Département de biochimie et de génomique fonctionnelle, Université de Sherbrooke, Canada
  • Sebastien Leblanc, Département de biochimie et de génomique fonctionnelle, Université de Sherbrooke, Canada
  • Jean-Francois Jacques, Département de biochimie et de génomique fonctionnelle, Université de Sherbrooke, Canada
  • Xavier Roucou, Département de biochimie et de génomique fonctionnelle, Université de Sherbrooke, Canada

Presentation Overview:

Background – The human transcriptome contains millions of open reading frames (the ORFeome) potentially coding for currently unknown or unannotated proteins. Typical mass spectrometry (MS)-based proteomics pipelines use protein databases to identify known proteins in a biological sample. The detection of proteins encoded in the human ORFeome would require a customized database including millions of proteins. However, proteomics analyses cannot be performed with millions of proteins predicted from the human ORFeome because large protein databases cause unacceptably high false detection rates.

Goal – Identify the functional human ORFeome by excluding random ORFs, enabling deeper proteome coverage that includes unannotated proteins in addition to previously annotated ones.

Method – 78 ribosome profiling (Ribo-seq) studies from 47 unique tissue or cell line samples were reanalysed to curate a set of ORFs showing ribosomal activity. This set was added to the training set of TIS transformer, a transcriptomic language model used to predict functional ORFs across the human transcriptome.

Results – The retraining and inference of the TIS transformer model allowed us to obtain a database containing 210 000 new unique protein sequences. In a preliminary experiment, our reanalysis of the ‘Deep HeLa proteome’ confirmed the expression of previously undetected proteins.

Conclusion – Combining Ribo-seq with a transcriptomic language model can sort out relevant ORFs for the discovery of unannotated proteins. Reanalysing a larger set of MS studies should identify proteins with a high potential to be biologically relevant.

Zero-shot retention time prediction for unseen post-translational modifications with molecular structure encodings
Confirmed Presenter: Ceder Dens, Adrem Data Lab, Department of Computer Science, University of Antwerp, Belgium

Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Ceder Dens, Adrem Data Lab, Department of Computer Science, University of Antwerp, Belgium
  • Darien Yeung, Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Canada
  • Oleg Krokhin, Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, Canada
  • Kris Laukens, Adrem Data Lab, Department of Computer Science, University of Antwerp, Belgium
  • Wout Bittremieux, Adrem Data Lab, Department of Computer Science, University of Antwerp, Belgium

Presentation Overview:

Identifying proteoforms with diverse post-translational modifications (PTMs) remains challenging in mass spectrometry-based proteomics. PTMs regulate protein activity and interactions and impact stability and localization. Limited knowledge of PTMs and their impact on liquid chromatography (LC) behavior hinders peptide identification, particularly for modified peptides.
We introduce MoSTERT, a transformer-based model for retention time (RT) prediction of peptides with any PTM. MoSTERT encodes amino acids and their PTMs as a molecular structure, processed by a molecule encoder to generate residue-specific embeddings. A transformer then predicts RTs, even for peptides with unseen PTMs.
We enhance accuracy by introducing a two-step model, MoSTERT-2S. First, a regular transformer encoder predicts the RT of the unmodified sequence. Then, MoSTERT predicts the RT shift induced by the modification. This strategy leverages the high prediction accuracy for unmodified peptides and the superior input representation for modified peptides.
Trained on a dataset of 1.3M unmodified and 913K modified peptides (with 9 unique PTMs), MoSTERT was tested on external datasets, including ProteomeTools, with 70K peptides containing 16 unseen PTMs. Compared to DeepLC (MAE: 24.07 ± 13.44), MoSTERT-2S significantly improves RT prediction (MAE: 12.83 ± 11.83), setting a new state-of-the-art for peptides with novel PTMs.

15:40-16:00
Open modification proteogenomics fosters reproducible detection of non-canonical proteins
Format: In person

Moderator(s): Wout Bittremieux


Authors List:

  • Valeriia Vasylieva, Pediatrics Department, University of Sherbrooke, Canada
  • Enrico Massignani, Biomolecular Medicine Department, Ghent University, Belgium
  • Francis Bourassa, Pediatrics Department, University of Sherbrooke, Canada
  • Tine Claeys, Biomolecular Medicine Department, Ghent University, Belgium
  • Lennart Martens, Biomolecular Medicine Department, Ghent University, Belgium
  • Marie A. Brunet, Pediatrics Department, University of Sherbrooke, Canada

Presentation Overview:

MS-based proteomics enables the identification of thousands of proteins within a single sample. Analysis of MS/MS spectra involves database search engines, which match experimental spectra to theoretical spectra generated from in silico digestion of a reference proteome. A key limitation of this method lies in its dependence on the size of the search space. As it grows, the false discovery rate (FDR) can increase and the overall identification rate can decline.
Proteogenomic databases containing non-canonical proteins are notoriously large and flawed by FDR inflation. Unrestricted searches that consider all possible modifications further expand the search space. ionbot, a fast, semi-supervised machine learning-based search engine, handles large search spaces efficiently through a data-driven sequence tag-based approach. Here, we leveraged ionbot to enhance the identification of non-canonical proteins through unrestricted searches.
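One common way to estimate the FDR of a database search is the target-decoy approach, sketched below. The abstract does not say how FDR was estimated in this work, so this is background illustration only, and the scores and function name are hypothetical.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR among target PSMs scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) tuples. Decoy hits approximate the
    number of false target hits, so FDR is estimated as decoys / targets.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical peptide-spectrum matches: (score, matched a decoy sequence?).
PSMS = [
    (0.9, False),
    (0.8, False),
    (0.7, True),
    (0.6, False),
    (0.5, True),
    (0.4, False),
]

strict = fdr_at_threshold(PSMS, 0.75)
lenient = fdr_at_threshold(PSMS, 0.0)
```

A larger search space offers more sequences that can match by chance, so a stricter score threshold is needed to reach the same estimated FDR, which in turn lowers the identification rate.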
We showed that the identification rate is increased by 17% under a controlled FDR when using open modification search with ionbot. Open search with ionbot increased the detection of non-canonical proteins with 51% being supported by more than 1 PSM, compared to a mere 5% in a standard TPP search. Similarly, 40% of non-canonical proteins were detected in at least 2 samples with ionbot, compared to only 5% with TPP. 86% of non-canonical proteins were identified with at least one modified peptide, and 78% of peptides unique to non-canonical proteins were identified only in their modified form.
Our study highlights the importance of open searches for robust and reliable detection of non-canonical proteins.
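
The abstract does not state how FDR was estimated. As a minimal sketch, assuming the commonly used target-decoy strategy, the cutoff selection could look like the following. The `Psm` tuple, the scores, and the function name are hypothetical and are not part of ionbot or TPP.

```python
from typing import List, NamedTuple, Optional


class Psm(NamedTuple):
    score: float    # hypothetical search engine score, higher is better
    is_decoy: bool  # True if the spectrum matched a decoy sequence


def score_threshold_at_fdr(psms: List[Psm], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest target score at which the estimated FDR is within max_fdr.

    The FDR at a given rank is estimated as (#decoys so far) / (#targets so far),
    walking down the list from the best score. Returns None if no target passes.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = 0
    decoys = 0
    best: Optional[float] = None
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
            continue
        targets += 1
        if decoys / targets <= max_fdr:
            best = psm.score
    return best


example = [Psm(10, False), Psm(9, False), Psm(8, True), Psm(7, False), Psm(6, False)]
cutoff = score_threshold_at_fdr(example, max_fdr=0.2)
```

Under this assumption, a larger search space tends to produce more high-scoring random matches, so decoys appear higher in the ranking and fewer targets pass the same FDR limit.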

PCI-DB: A novel primary tissue immunopeptidome database to guide next-generation peptide-based immunotherapy development
Format: In person

Moderator(s): Wout Bittremieux


Authors List: Show

  • Steffen Lemke, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Marissa L. Dubbelaar, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Patrick Zimmermann, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Jens Bauer, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Annika Nelde, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Naomi Hoenisch-Gravel, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Jonas Scheid, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Marcel Wacker, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Susanne Jung, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Anna Dengler, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Yacine Maringer, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Hans-Georg Rammensee, Institute of Immunology, University of Tübingen, Tübingen, Germany
  • Cécile Gouttefangeas, Institute of Immunology, University of Tübingen, Tübingen, Germany
  • Sven Fillinger, Quantitative Biology Center (QBiC), University of Tübingen, Germany
  • Tatjana Bilich, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Jonas S. Heitmann, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany
  • Sven Nahnsen, Quantitative Biology Center (QBiC), University of Tübingen, Germany
  • Juliane S. Walz, Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Germany

Presentation Overview: Show

Various cancer immunotherapies rely on the T cell-mediated recognition of peptide antigens presented on human leukocyte antigens (HLA). However, the identification and selection of naturally presented peptide targets for the development of personalized as well as off-the-shelf immunotherapy approaches remain challenging. Here, we introduce the open-access Peptides for Cancer Immunotherapy Database (PCI-DB, https://pci-db.org/), a comprehensive resource of immunopeptidome data originating from various malignant and benign primary tissues that provides the research community with a convenient tool to facilitate the identification of peptide targets for immunotherapy development.
The PCI-DB includes > 6.6 million HLA class I and > 3.4 million HLA class II peptides from over 40 tissue types and cancer entities. A first application of the database provided insights into the presentation of cancer-testis antigens across malignant and benign tissues, enabling the identification and characterization of pan-tumor and entity-specific tumor-associated antigens as well as naturally presented neoepitopes from frequent cancer mutations.
Further, we used the PCI-DB to design personalized peptide vaccines for two patients suffering from metastatic cancer. In a retrospective analysis, PCI-DB enabled validation of the composition of a multi-peptide vaccine for each patient comprising non-mutated, highly frequent tumor-associated antigens matching the immunopeptidome of the individual patient's tumor and a neoepitope-based vaccine matching the mutational profile of the cancer patient. Both vaccines induced potent and long-lasting T-cell responses, accompanied by long-term survival of these advanced cancer patients. The PCI-DB is a highly versatile tool to broaden the understanding of cancer-related antigen presentation and, ultimately, supports the development of novel immunotherapies.

16:40-17:00
CellPick: a cell selection toolkit for spatial single cell proteomics
Format: In person

Moderator(s): Wout Bittremieux


Authors List: Show

  • Paolo Pellizzoni, Max Planck Institute of Biochemistry, Germany
  • Lucas Miranda, Max Planck Institute of Biochemistry, Germany
  • Caroline Weiss, Max Planck Institute of Biochemistry, Germany
  • Matthias Mann, Max Planck Institute of Biochemistry, Germany
  • Karsten Borgwardt, Max Planck Institute of Biochemistry, Germany

Presentation Overview: Show

We present CellPick: a computational tool for facilitating the selection of cells for laser microdissection in single cell proteomics applications.

Laser microdissection is a technique that allows the dissection of single cells from tissues via a high-powered laser. A naïve selection of the cells to be cut, such as random selection, is usually effective in contexts with abundant cells. However, it often results in the selection of contiguous shapes when applied to tissue regions with sparse cellular distribution, thereby risking damage to the cells during microdissection. To overcome this, we introduce a custom shape selection technique that employs a combinatorial optimization procedure for the selection of cells, ensuring the selection of non-contiguous shapes while approximately maximizing coverage within a restricted tissue area.
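
The abstract does not specify the optimization procedure. Purely as an illustration, a greedy heuristic can pick cells one at a time and reject any candidate that lies closer than a minimum gap to an already selected cell. Representing cells by their centroids, the `min_gap` parameter, and the function name are assumptions for this sketch and do not reflect CellPick's actual API.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def select_non_adjacent(centroids: List[Point], min_gap: float, max_cells: int) -> List[int]:
    """Greedily pick cell indices so that no two picked centroids are closer than min_gap.

    This is a simple heuristic, not an exact combinatorial optimizer: it scans the
    cells in input order and keeps each one that is far enough from all kept cells.
    """
    selected: List[int] = []
    for idx, (x, y) in enumerate(centroids):
        if len(selected) >= max_cells:
            break
        far_enough = all(
            math.hypot(x - centroids[j][0], y - centroids[j][1]) >= min_gap
            for j in selected
        )
        if far_enough:
            selected.append(idx)
    return selected


cells = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.0), (10.0, 0.0)]
picked = select_non_adjacent(cells, min_gap=2.0, max_cells=10)
```

A greedy scan gives no optimality guarantee; it only shows how a spacing constraint keeps selected shapes from touching.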

Often, in spatial proteomics, one seeks a statistical correlation between protein intensities and the positions of the cells at hand. Our tool allows the specification of two points of interest, such as two types of veins, establishing a gradient along a relevant axis. The selected shapes are then automatically assigned a value indicating how close they are to the two points of interest. This value gives their position on the gradient of interest, which in turn makes it possible to correlate protein intensity levels with the closeness to the points of interest.
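
The abstract does not define how the closeness value is computed. One plausible definition, assumed here only for illustration, is the relative distance to the two points, which is 0 at the first point and 1 at the second. The function names and the example numbers below are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gradient_position(cell: Point, point_a: Point, point_b: Point) -> float:
    """Relative position of a cell between two landmarks: 0 at point_a, 1 at point_b."""
    d_a = math.dist(cell, point_a)
    d_b = math.dist(cell, point_b)
    if d_a + d_b == 0:
        raise ValueError("the two landmark points must be distinct")
    return d_a / (d_a + d_b)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical landmarks, cell centroids, and intensities of one protein per cell.
landmark_a, landmark_b = (0.0, 0.0), (10.0, 0.0)
cell_centroids = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
intensities = [1.0, 2.0, 3.0]

positions = [gradient_position(c, landmark_a, landmark_b) for c in cell_centroids]
r = pearson(positions, intensities)
```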

We showcase an application of our tool in single cell spatial proteomics on liver samples.

17:00-17:20
A living proteomics benchmark for comprehensive evaluation of deep learning-based de novo peptide sequencing tools
Format: In person

Moderator(s): Wout Bittremieux


Authors List: Show

  • Marina Pominova, University of Antwerp, Belgium
  • Wout Bittremieux, University of Antwerp, Belgium
  • Charlotte Adams, University of Antwerp, Belgium
  • Ceder Dens, University of Antwerp, Belgium

Presentation Overview: Show

Mass spectrometry-based proteomics is essential for understanding protein composition and function, yet traditional sequence database-based methods face challenges in identifying novel peptides, post-translational modifications, and diverse proteomes. De novo peptide sequencing, which operates independently of sequence databases, offers a powerful approach for uncovering these unknown peptides. However, the lack of consistent evaluation frameworks for the growing array of deep learning-based de novo sequencing tools limits their adoption and effective application. Here, we introduce a comprehensive, community-driven benchmarking resource designed to assess the performance of various de novo sequencing tools across a broad range of experimental conditions and proteomic applications. Our benchmark employs heterogeneous datasets to establish a standardized evaluation framework, providing a transparent, evolving resource accessible through an interactive online dashboard. This benchmark is anticipated to offer key insights into tool performance, aiding researchers in selecting suitable tools and identifying areas for future refinement and development in de novo peptide sequencing.

Mavis: An Ensemble of Methods for Mean-Variance Trend Modeling and Bayesian Decision in Comparative Proteomics
Format: In person

Moderator(s): Wout Bittremieux


Authors List: Show

  • George Popescu, Mississippi State University, United States
  • Philip Berg, Mississippi State University, United States

Presentation Overview: Show

Motivation:
Comparative methods that use dataset-level information such as mean-variance have been highly successful for several types of -omics data. While a large number of software pipelines exist for the study of mass spectrometry data, tools for statistical modeling of dataset-level properties after data quantification are lacking. We address this gap by introducing Mavis, an ensemble of statistical methods implemented in R for mean-variance trend modeling and Bayesian decision in comparative proteomics.
Results:
Mavis facilitates dataset-specific modeling, particularly emphasizing models that utilize mean-variance (M-V) trend properties. Mavis builds on the recent methodologies to model proteomics M-V trends with gamma regression. It proposes a new M-V trend clustering method, coined Gamma Cluster Regression (GCR). Mavis implements two imputation strategies: a random forest imputation and a trend-based multiple imputation. Finally, we present a new ensemble method that makes statistical decisions by aggregating p-values of component methods. We evaluated Mavis across several label-free proteomics benchmark datasets; GCR paired with Baldur consistently delivered the best precision, and weighted limma outperformed limma-trend and t-test on most datasets, while the ensemble gave a robust decision across all data. The pipeline supports development of other mean-variance trends within the same comparative proteomics software framework. Finally, Mavis is available as an R (R Core Team, 2021) package on GitHub (https://github.com/PhilipBerg/mavis).
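
The abstract does not say which rule Mavis uses to aggregate p-values. As a hedged illustration of the general idea, the sketch below uses Fisher's method, a classical combination rule, written in Python rather than R and independent of the Mavis package. The function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_combine(p_values: Sequence[float]) -> float:
    """Combine p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null. For even degrees of freedom the
    survival function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one protein from two component methods.
combined = fisher_combine([0.05, 0.05])
```

Fisher's method assumes independent p-values. Component methods run on the same dataset are generally correlated, so a practical ensemble would need a rule that accounts for that dependence.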

17:20-18:00
Invited Presentation:
Format: In person

Moderator(s): Wout Bittremieux


Authors List: Show

  • TBD