FLASHTagger: An open-source web application for ion type- and precursor mass-free protein identification in top-down mass spectrometry
Confirmed Presenter: Kyowon Jeong, Applied Bioinformatics, Department for Computer Science, University of Tübingen, Germany
Room: 525
Format: In Person
Moderator(s): Timo Sachsenberg
Authors List:
- Kyowon Jeong, Applied Bioinformatics, Department for Computer Science, University of Tübingen, Germany
- Wonhyeuk Jung, Department of Cell Biology, Yale School of Medicine, United States
- Tom Müller, Applied Bioinformatics, Department for Computer Science, University of Tübingen, Germany
- Jaywon Lee, Department of Cell Biology, Yale School of Medicine, United States
- Aniruddha Panda, Department of Cell Biology, Yale School of Medicine, United States
- Jared Shaw, Department of Chemistry, University of Nebraska-Lincoln, United States
- Louise Buur, Bioinformatics Research Group, University of Applied Sciences Upper Austria, Austria
- Viktoria Dorfer, Bioinformatics Research Group, University of Applied Sciences Upper Austria, Austria
- Oliver Kohlbacher, Applied Bioinformatics, Department for Computer Science, University of Tübingen, Germany
- Kallol Gupta, Department of Cell Biology, Yale School of Medicine, United States
Presentation Overview:
The growing capacity to detect proteins and protein complexes by MS poses computational challenges for identifying them through top-down MS (TDMS). While alternative fragmentation methods such as ECD and UVPD open up multiple fragmentation pathways and thereby increase sequence coverage, they also complicate the interpretation of fragment spectra. Recently developed protocols such as complex-down MS often isolate protein complexes at once, yielding multiplexed fragment spectra. Together with the complex signal structure of TDMS spectra and frequent deconvolution errors, these factors make correct precursor ion interpretation difficult.
Addressing these issues, we present FLASHTagger, a high-sensitivity protein identification tool for TDMS platforms. Unlike most conventional database search tools, FLASHTagger is based on de novo sequence tags and thus runs without specifying fragment ion types or assuming monomeric proteoform precursors. The tags enable rapid protein searches with protein-level false discovery rate (FDR) control. Benchmark tests performed with EChcD datasets from monoclonal antibody and E. coli membrane proteins showed that FLASHTagger can reliably identify individual target proteins from MS/MS scans of multimeric complexes. Analysis of the matched tags revealed various ion types, including internal ions, and well-known protein modifications.
The current implementation of FLASHTagger focuses on low-complexity datasets, but analysis of complex datasets will be made available in the near future as part of our new proteoform search engine. We anticipate that the precursor-independent design of FLASHTagger will open the door to data-independent acquisition in top-down proteomics (TDP). FLASHTagger is deployed as part of the OpenMS web application FLASHViewer at https://abi-services.cs.uni-tuebingen.de/flashviewer/.
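The idea of a sequence tag that does not depend on the fragment ion type can be illustrated with a small sketch. It is not FLASHTagger's algorithm: the function names, the 0.01 Da tolerance, and the toy mass list are assumptions made for illustration, and the input is assumed to be a list of deconvolved monoisotopic fragment masses. A tag is read from consecutive mass differences that match amino acid residue masses, so a constant offset on all fragments (as between ion series) leaves the tag unchanged.

```python
# Monoisotopic residue masses in Da; "L" stands for both Leu and Ile (isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def match_residue(delta, tol=0.01):
    """Return the residue whose mass matches delta within tol, or None."""
    for aa, mass in RESIDUE_MASS.items():
        if abs(delta - mass) <= tol:
            return aa
    return None


def find_tags(masses, min_len=3, tol=0.01):
    """Return maximal sequence tags read from residue-sized mass differences."""
    masses = sorted(masses)
    n = len(masses)
    # Edge i -> j if the mass difference equals one residue mass.
    edges = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            aa = match_residue(masses[j] - masses[i], tol)
            if aa is not None:
                edges[i].append((j, aa))
    has_incoming = {j for outs in edges.values() for j, _ in outs}
    tags = set()

    def extend(node, tag):
        if not edges[node]:
            if len(tag) >= min_len:
                tags.add(tag)
            return
        for nxt, aa in edges[node]:
            extend(nxt, tag + aa)

    # Start only from masses that no other mass leads into (maximal paths).
    for i in range(n):
        if i not in has_incoming:
            extend(i, "")
    return tags


# Toy fragment ladder spelling T-L-E-F on an arbitrary 1000 Da base,
# plus two unrelated noise masses.
ladder = [1000.0, 1101.04768, 1214.13174, 1343.17433, 1490.24274]
noise = [1150.5, 1420.9]
print(find_tags(ladder + noise))
# Shifting every mass by the same offset gives the same tags.
print(find_tags([m + 17.02655 for m in ladder + noise]))
```

A real tagger would additionally have to handle mass errors that grow with mass, modified residues, and tags read in reverse for C-terminal fragments; none of that is covered here.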
Imputation of cancer proteomics data with a deep model that learns jointly from many datasets
Confirmed Presenter: Lincoln Harris, University of Washington, United States
Room: 525
Format: In Person
Moderator(s): Timo Sachsenberg
Authors List:
- Lincoln Harris, University of Washington, United States
- William Noble, University of Washington, United States
Presentation Overview:
TMT proteomics suffers from excessive missing values, especially in large-scale, multi-batch experimental settings. Imputation is an analytical solution to the missingness problem. Many methods exist for TMT proteomics imputation; however, few of them take advantage of deep neural networks, and none of them can learn jointly from multiple datasets. We introduce Lupine, a deep learning-based imputation tool that learns patterns of missingness across many mass spectrometry runs and experiments. We demonstrate that Lupine outperforms the current state-of-the-art and learns meaningful representations of experimental structure and protein physicochemical properties.
We first constructed a joint protein quantification matrix consisting of mass spectrometry runs from 10 cancer cohorts from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). These data were generated with a common experimental workflow and processing pipeline. We then developed a deep learning model that leverages matrix factorization to learn low-dimensional representations of proteins and mass spectrometry runs. This model, called Lupine, was trained on our joint protein quantification matrix.
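As a rough illustration of imputation by matrix factorization, the sketch below fits the simplest possible case: one latent factor per protein and one per run, estimated by alternating least squares on the observed entries only. It is not Lupine, which is a deep model; the function name, the rank-1 restriction, and the toy data are assumptions for illustration. Held-out entries are scored by mean squared error, the metric named in the abstract.

```python
def impute_rank1(matrix, n_iter=100):
    """Fill None entries using a rank-1 model x[i][j] ~ u[i] * v[j].

    Rows are proteins and columns are runs. Factors are updated by
    alternating least squares over the observed entries only.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows  # one latent value per protein
    v = [1.0] * n_cols  # one latent value per run
    for _ in range(n_iter):
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            denom = sum(v[j] ** 2 for j in obs)
            if denom > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / denom
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            denom = sum(u[i] ** 2 for i in obs)
            if denom > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / denom
    # Keep observed values, fill missing ones from the factors.
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy matrix that is exactly rank 1, with three entries held out.
protein_factor = [1.0, 2.0, 3.0, 4.0, 5.0]
run_factor = [2.0, 1.0, 0.5, 3.0]
truth = [[p * r for r in run_factor] for p in protein_factor]
held_out = [(0, 1), (2, 3), (4, 0)]
observed = [row[:] for row in truth]
for i, j in held_out:
    observed[i][j] = None

imputed = impute_rank1(observed)
mse = sum((imputed[i][j] - truth[i][j]) ** 2 for i, j in held_out) / len(held_out)
print(f"held-out MSE: {mse:.3g}")
```

Real quantification matrices are noisy and need more than one latent dimension, so this toy only shows the mechanics of learning from observed entries and evaluating on masked ones.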
We show that Lupine outperforms DreamAI, an ensemble imputation method that represents the current state-of-the-art for TMT proteomics. For each of the 10 CPTAC cohorts, the mean squared error of Lupine’s imputed values is lower than DreamAI’s. We also show that Lupine learns a latent representation of proteins that captures missingness fraction and other protein physicochemical properties. Lupine increases the number of differentially expressed proteins between CPTAC cohorts and improves clustering accuracy.
In summary, Lupine is the only existing proteomics imputation method that can learn jointly from many datasets.
Proteogenomics analysis of human tissues using pangenomes
Confirmed Presenter: Husen M. Umer, Bioscience Core Laboratory, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
Room: 525
Format: In Person
Moderator(s): Timo Sachsenberg
Authors List:
- Dong Wang, Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, China
- Robbin Bouwmeester, VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Aniel Sanchez, Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skane University Hospital Malmö, Sweden
- Mingze Bai, Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, China
- Husen M. Umer, Bioscience Core Laboratory, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Yasset Perez-Riverol, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
Presentation Overview:
The genomics landscape is evolving with the emergence of pangenomes, which challenge the conventional single-reference genome model. The new human pangenome reference adds an extra dimension by incorporating variation observed in different human populations. However, the increasing use of pangenomes in human reference databases poses challenges for proteomics, which currently relies on UniProt canonical/isoform-based reference proteomes. Including more variant information in human proteomes, such as small and long open reading frames and pseudogenes, prompts the development of complex proteogenomics pipelines for analysis and validation. This study explores the advantages of pangenomes, particularly the human reference pangenome, for proteomics and large-scale proteogenomics studies. We reanalyzed two large human tissue datasets using the quantms workflow to identify novel peptides and variant proteins from the pangenome samples. Using three search engines (SAGE, COMET, and MSGF+) followed by Percolator, we analyzed 91,833,481 MS/MS spectra from more than 30 normal human tissues. We developed a robust deep-learning framework to validate the novel peptides based on DeepLC, MS2PIP and pyspectrumAI. The results yielded 170,142 novel peptide-spectrum matches, 4,991 novel peptide sequences, and 3,921 single amino acid variants, corresponding to 2,367 genes across five population groups, demonstrating the effectiveness of our proteogenomics approach using the recent pangenome references.
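To make concrete what a novel variant peptide is, the following sketch applies a single amino acid variant to a protein sequence, digests both versions in silico, and keeps the peptides that occur only in the variant. It is a simplified illustration, not part of the quantms workflow: the protein sequence and variant are made up, the function names are hypothetical, and the trypsin rule (cleave after K or R unless followed by P, no missed cleavages) is an assumed simplification.

```python
def apply_saav(sequence, position, ref, alt):
    """Return the sequence with one amino acid replaced at a 1-based position."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[:position - 1] + alt + sequence[position:]


def tryptic_digest(sequence, min_len=6):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and sequence[i + 1] != "P"):
            peptide = sequence[start:i + 1]
            if len(peptide) >= min_len:
                peptides.append(peptide)
            start = i + 1
    return peptides


def novel_peptides(canonical, variant):
    """Peptides produced by the variant protein but not by the canonical one."""
    return set(tryptic_digest(variant)) - set(tryptic_digest(canonical))


# Hypothetical canonical protein and a V11M variant (illustration only).
canonical = "MASTEELKGAVLDPRSEQWTIHKLNGFYR"
variant = apply_saav(canonical, 11, "V", "M")
print(novel_peptides(canonical, variant))
```

Only peptides spanning the variant site differ between the two digests, which is why variant-aware databases add comparatively few searchable sequences per variant and why each match needs careful validation.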
Optimising Thermal Proteome Profiling experimental design with GPMelt
Confirmed Presenter: Cecile Le Sueur, EMBL Heidelberg, Germany
Room: 525
Format: In Person
Moderator(s): Timo Sachsenberg
Authors List:
- Cecile Le Sueur, EMBL Heidelberg, Germany
- Pablo Rivera-Mejías, EMBL Heidelberg, Germany
- Isabelle Becher, EMBL Heidelberg, Germany
- Mikhail Savitski, EMBL Heidelberg, Germany
- Magnus Rattray, The University of Manchester, United Kingdom
Presentation Overview:
Thermal proteome profiling (TPP) combines the cellular thermal shift assay with quantitative mass spectrometry to explore protein interactions and states proteome-wide. Temperature range TPP (TPP-TR) datasets consist of protein melting curves, which quantify non-denatured proteins across a temperature gradient. Thermal stability changes are statistically evaluated by comparing melting curves between conditions, such as drug treatment versus control. While powerful and versatile in uncovering new biology, TPP's experimental cost, including consumables and mass spectrometry measurement time, limits its accessibility to well-funded researchers. Moreover, the sample requirements hinder its application to rare samples. We recently introduced GPMelt, a statistical framework for TPP-TR datasets based on hierarchical Gaussian process models, which robustly integrates replicate information and handles any melting curve shape. Here, we propose an enhanced GPMelt model together with an optimised low-cost and low-sample TPP-TR experimental design. By halving the consumables and mass spectrometry measurement time, this work broadens TPP-TR accessibility to a wider scientific community and opens its application to precious samples. Additionally, it establishes a smooth connection between TPP-TR and 2D-TPP protocols and analyses. 2D-TPP datasets compare protein thermal stability across a larger number of conditions, using a distinct sample multiplexing strategy that hinders melting curve reconstruction. Adapting this multiplexing strategy in combination with the enhanced GPMelt model strengthens 2D-TPP discoveries by retaining melting curve modeling, and hence key biological information. Collectively, extensions to the GPMelt model, combined with optimised experimental designs for both TPP-TR and 2D-TPP, can significantly increase TPP’s effectiveness and dissemination among scientists, paving the way for groundbreaking biological discoveries.
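The following sketch shows how a Gaussian process can model a melting curve without assuming a sigmoidal shape. It is a single-level GP regression with a squared-exponential kernel and a zero-mean prior, not the hierarchical GPMelt model; the kernel choice, hyperparameter values, function names, and the simulated curve are all assumptions for illustration.

```python
import math


def rbf(x1, x2, length_scale, variance):
    """Squared-exponential covariance between two temperatures."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def gp_posterior_mean(train_x, train_y, test_x,
                      length_scale=4.0, variance=1.0, noise=1e-4):
    """Posterior mean of a zero-mean GP at test_x given noisy observations."""
    n = len(train_x)
    # Kernel matrix of the training temperatures plus noise on the diagonal.
    k = [[rbf(train_x[i], train_x[j], length_scale, variance)
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(k, train_y)
    return [sum(rbf(x, train_x[i], length_scale, variance) * alpha[i]
                for i in range(n)) for x in test_x]


# Simulated melting curve: fraction of non-denatured protein per temperature.
temps = [37.0 + 4.0 * k for k in range(8)]  # 37 to 65 degrees C
fractions = [1.0 / (1.0 + math.exp((t - 51.0) / 3.0)) for t in temps]

fitted = gp_posterior_mean(temps, fractions, temps)
between = gp_posterior_mean(temps, fractions, [47.0, 55.0])
print([round(f, 3) for f in fitted])
print([round(f, 3) for f in between])
```

Comparing conditions, sharing information across replicates, and choosing hyperparameters are the parts a hierarchical model such as GPMelt addresses; this sketch covers only the curve-fitting step for a single curve.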