Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

CompMS: Computational Mass Spectrometry

COSI Track Presentations

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
Saturday, July 7th
10:15 AM-10:20 AM
CompMS: Introduction
Room: Columbus CD
  • Oliver Kohlbacher, University of Tübingen, Germany

Presentation Overview:

Welcome and introduction to COSI CompMS

10:20 AM-11:00 AM
Expanding our View of the Surfaceome: New Bioinformatic Tools and Technologies for Mapping Glycoproteins from Small Sample Sizes and Human Primary Cells
Room: Columbus CD
  • Rebekah L. Gundry, Medical College of Wisconsin, United States

Presentation Overview:

Cell surface proteins, glycoproteins, and glycans play critical roles in maintaining cellular structure and adhesion, and act as gatekeepers controlling how cells send and receive exogenous signals. Therefore, the collection of these molecules (i.e. surfaceome) is a rich source of accessible targets for developing new tools and strategies to identify, study, and manipulate specific cell types of interest, from immunophenotyping to immunotherapy. Classic Cell Surface Capture Technology (CSC) is a chemoproteomic approach that enables selective enrichment and identification of extracellular domains of cell surface N-glycoproteins. Since 2007, CSC has been applied to identify >3000 cell surface N-glycoproteins from >125 human and rodent cell types, providing unique surfaceome views in a cell-type specific manner to reveal new immunophenotyping markers and proteins involved in development and disease. However, while the classic CSC approach is highly specific for cell surface N-glycoproteins, the method requires >80 million cells on average to produce high quality results, precluding its application to rare cell types. In this study, we developed new bioinformatic and technological approaches to address the need for surfaceome analyses of small numbers of cells, including human primary cells. Our new bioinformatic approach combines predictive and empirical evidence to efficiently prioritize the selection of cell type-specific proteins. Our new technological approach implements an automated liquid handling workstation for CSC sample processing, which minimizes human intervention and decreases processing time from 5 days to 50 hours. This new µCSC method successfully identifies >500 cell surface proteins from just 5-10 million cells with >90% specificity. We show the utility of these new bioinformatic and technological developments for discovering surface markers on primary isolated human cardiomyocytes and blood cell types.

11:00 AM-11:20 AM
Proceedings Presentation: SIMPLE: Sparse Interaction Model over Peaks of moLEcules for fast, interpretable metabolite identification from tandem mass spectra
Room: Columbus CD
  • Dai Hai Nguyen, Kyoto University, Japan
  • Canh Hao Nguyen, Kyoto University, Japan
  • Hiroshi Mamitsuka, Kyoto University, Japan

Presentation Overview:

Motivation: Recent success in metabolite identification from tandem mass spectra has been led by machine learning, which proceeds in two stages: mapping mass spectra to molecular fingerprint vectors and then retrieving candidate molecules from the database. In the first stage, i.e. fingerprint prediction, spectrum peaks are the features, and considering their interactions would be reasonable for more accurate identification of unknown metabolites. Existing approaches to fingerprint prediction are based only on individual peaks in the spectra, without explicitly considering peak interactions. Moreover, the current cutting-edge method is based on kernels, which makes it computationally heavy and its results hard to interpret.

Results: We propose two learning models that incorporate peak interactions for fingerprint prediction. First, we extend the state-of-the-art kernel learning method by developing kernels for peak interactions and combining them with kernels for peaks through multiple kernel learning (MKL). Second, we formulate a sparse interaction model for metabolite peaks, which we call SIMPLE, that is computationally efficient and interpretable for fingerprint prediction. The formulation of SIMPLE is convex and guarantees global optimization, for which we develop an alternating direction method of multipliers (ADMM) algorithm. Experiments using the MassBank dataset show that both models achieved prediction accuracy comparable to that of the current top-performance kernel method. Furthermore, SIMPLE clearly revealed individual peaks and their interactions that contribute to enhancing the performance of fingerprint prediction.
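
To make the idea of a sparse interaction model over peaks concrete, here is a minimal sketch of how one fingerprint bit could be scored from a binned spectrum. It assumes a second-order form (bias, plus a linear term over peaks, plus a pairwise term over peak pairs). The function name, the binning, and all weights are hypothetical values chosen for illustration; they are not taken from the presentation, and no training (ADMM or otherwise) is shown.

```python
def interaction_score(x, w, W, b=0.0):
    """Score one fingerprint bit: b + sum_i w_i*x_i + sum_(i,j) W_ij*x_i*x_j.

    x : list of peak intensities, one per m/z bin (assumed binning)
    w : dict {bin index: weight}, sparse main effects
    W : dict {(bin i, bin j): weight}, sparse pairwise interactions
    """
    linear = sum(weight * x[i] for i, weight in w.items())
    pairwise = sum(weight * x[i] * x[j] for (i, j), weight in W.items())
    return b + linear + pairwise


# Hypothetical spectrum with five m/z bins.
x = [0.0, 1.0, 0.5, 0.0, 2.0]
# Hypothetical sparse weights: only the non-zero entries are stored,
# so the contributing peaks and peak pairs can be read off directly.
w = {1: 0.4, 4: -0.1}
W = {(1, 2): 0.6, (2, 4): -0.2}

score = interaction_score(x, w, W, b=-0.1)
predicted_bit = 1 if score > 0 else 0
print(score, predicted_bit)
```

Because only non-zero weights are stored, the peaks and peak pairs that drive a prediction are visible by inspection, which is the interpretability property the abstract emphasizes.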

11:20 AM-11:40 AM
Linear deconvolution for highly sensitive targeted analysis of data-independent acquisition mass spectrometry proteomics
Room: Columbus CD
  • Ryan Peckner, Broad Institute of MIT and Harvard, United States
  • Samuel Myers, Broad Institute of MIT and Harvard, United States
  • Jarrett Egertson, University of Washington, United States
  • Sebastian Vaca, Broad Institute of MIT and Harvard, United States
  • Jennifer Abelin, Broad Institute of MIT and Harvard, United States
  • Michael MacCoss, University of Washington, United States
  • Steven Carr, Broad Institute of MIT and Harvard, United States
  • Jacob Jaffe, Broad Institute of MIT and Harvard, United States

Presentation Overview:

Mass spectrometry with data-independent acquisition (DIA) has emerged as a promising method to greatly improve the comprehensiveness and reproducibility of targeted and discovery proteomics, in theory systematically measuring all peptide precursors within a biological sample. Despite the technical maturity of DIA, the analytical challenges involved in discriminating between peptides with similar sequences in convoluted spectra have limited its applicability in important cases, such as the detection of single-nucleotide polymorphisms and alternative site localizations in phosphoproteomics data. We have developed Specter, an algorithm that uses linear algebra to deconvolute DIA mixture spectra directly in terms of a spectral library, circumventing the problems associated with typical fragment correlation-based approaches. I will describe the mathematical basis of Specter, demonstrate its high sensitivity and performance relative to other methods by means of several complex datasets, and show that Specter is able to successfully analyze cases involving highly similar peptides that are typically challenging for DIA analysis methods.
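
As a rough illustration of deconvoluting a mixture spectrum in terms of a spectral library, the sketch below fits a mixture as a non-negative linear combination of library spectra. Treating the problem as non-negative least squares, and solving it by projected gradient descent, are assumptions made here for a self-contained example; the abstract does not specify Specter's solver. The library spectra and the mixture are made-up numbers.

```python
def nnls_projected_gradient(library, mixture, steps=2000, lr=0.05):
    """Minimise ||sum_k c_k * library[k] - mixture||^2 subject to c_k >= 0.

    library : list of library spectra, each a list of intensities per bin
    mixture : observed spectrum on the same bins
    Returns one non-negative coefficient per library spectrum.
    """
    n_lib, n_bins = len(library), len(mixture)
    coeffs = [0.0] * n_lib
    for _ in range(steps):
        fitted = [sum(coeffs[k] * library[k][j] for k in range(n_lib))
                  for j in range(n_bins)]
        residual = [fitted[j] - mixture[j] for j in range(n_bins)]
        for k in range(n_lib):
            grad = 2.0 * sum(residual[j] * library[k][j] for j in range(n_bins))
            # Gradient step followed by projection onto c_k >= 0.
            coeffs[k] = max(0.0, coeffs[k] - lr * grad)
    return coeffs


# Hypothetical library: the first two entries share most fragment bins,
# mimicking highly similar peptides; the third is absent from the mixture.
library = [
    [1.0, 0.5, 0.0, 0.2],
    [1.0, 0.0, 0.5, 0.2],
    [0.0, 0.3, 0.3, 1.0],
]
# Mixture built as 3 * library[0] + 1 * library[1].
mixture = [4.0, 1.5, 0.5, 0.8]

coeffs = nnls_projected_gradient(library, mixture)
print([round(c, 3) for c in coeffs])
```

The two similar library entries differ in only two bins, yet the fit separates their contributions because every bin is used jointly rather than each fragment being scored on its own.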

11:40 AM-12:00 PM
Robust iTRAQ and TMT proteoform-level quantification and statistical analysis through Bayesian modelling
Room: Columbus CD
  • Alexander Phillips, University of Liverpool, United Kingdom
  • Richard Unwin, The University of Manchester, United Kingdom
  • Andrew Jones, University of Liverpool, United Kingdom
  • Simon Maskell, University of Liverpool, United Kingdom
  • Andrew Dowsey, University of Bristol, United Kingdom

Presentation Overview:

Introduction: Current methodologies for protein-level statistical analysis in LC-MS proteomics workflows generally do not account for digestion variability and outliers at both peptide and measurement level.

Methods: Running on an HTCondor cluster, our method implements a three-level hierarchical Bayesian model. For each protein a Poisson generalized linear model is constructed to estimate the unknown protein quantification pattern, the deviation of each peptide quantification pattern from the protein pattern, and the deviation of each feature quantification from its peptide quantification pattern. The protein pattern is inferred as the peptide pattern with the most consistent evidence in the data. We have also recently extended this model to realise proteoform-specific quantification by estimating shared-peptide contributions through mixture modelling.

Results: We have performed validation with a spike-in dataset, together with a production clinical study on control vs post-mortem human Alzheimer's brain, illustrating substantial benefits to differential expression testing specificity and the ability to robustly differentiate proteoforms for the first time.

Conclusion: We have demonstrated that Bayesian modelling can both robustly improve the quality of fold-change estimates in an LC-MS proteomics experiment, and yield proteoform-level differential quantifications under the substantial biological variability inherent in clinical studies.

12:00 PM-12:40 PM
Software solutions for multi-omic research
Room: Columbus CD
  • Evgenia Shiskova, University of Wisconsin, United States

Presentation Overview:

Recent advances in mass spectrometry (MS) technologies have afforded substantial increases in speed of data acquisition and experimental throughput and opened the door to large-scale MS-based profiling studies. Analysis of hundreds or even thousands of samples has become relatively routine, and such rapid generation of increasingly large datasets constitutes an emerging challenge for data processing and interpretation. Here we describe a web-based tool that is designed for non-programmers and supports on-the-fly hierarchical organization of quantitative MS data, automated statistical analyses, and generation of interactive visualizations. The online platform enables users of diverse backgrounds to rapidly access and explore measurements within a multi-omic dataset, assisting with data interpretation and potentially empowering biological insight.
The second part of the talk will focus on LipiDex – our open-source software suite for LC-MS/MS lipidomics data analysis. This tool performs accurate quantitation of co-isolated and co-eluting isobaric lipids and features streamlined data filtration and flexible in silico libraries. To ensure LipiDex covers the expanding portfolio of novel lipid classes and fragmentation techniques, we recently developed a new data-driven approach to automatically generate tailored lipid spectral libraries, called Library Forge. The algorithm leverages inherent structural modularity of lipids to learn their complex fragmentation behavior and generate high-quality libraries with minimal user input. Library Forge and the other tools in the LipiDex data processing environment enable easy integration of robust lipidomic analysis into any lab’s multi-omic toolkit.

12:40 PM-2:00 PM
Lunch Break
2:00 PM-2:40 PM
Proteoform Informatics: Computation in Top-Down Proteomics
Room: Columbus CD
  • Richard LeDuc, Northwestern University, United States

Presentation Overview:

Proteoforms are where the central dogma of molecular biology is closest to the phenotype, and top-down proteomics measures proteoforms. Here, I will use the computational workflow of the National Resource for Translational and Developmental Proteomics to emphasize current theoretical and social informatic challenges. Theoretical problems include the analyses to infer mass (AIMs), which translate raw tandem MS data into neutral mass values, and the calculation of FDR for protein and proteoform identifications. Social issues include efforts toward open proteoform standards, such as the recent ProForma notation, and the design and implementation of a centralized database of experimentally verified proteoforms.

2:40 PM-3:00 PM
Proceedings Presentation: Bayes networks for mass spectrometric metabolite identification via molecular fingerprints
Room: Columbus CD
  • Marcus Ludwig, Friedrich-Schiller-University Jena, Germany
  • Kai Dührkop, Friedrich-Schiller-University Jena, Germany
  • Sebastian Böcker, Friedrich-Schiller-University Jena, Germany

Presentation Overview:

Metabolites, small molecules that are involved in cellular reactions, provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem mass spectrometry to identify the thousands of compounds in a biological sample. Recently, we presented CSI:FingerID for searching in molecular structure databases using tandem mass spectrometry data. CSI:FingerID predicts a molecular fingerprint that encodes the structure of the query compound, then uses this to search a molecular structure database such as PubChem. Scoring of the predicted query fingerprint and deterministic target fingerprints is carried out assuming independence between the molecular properties constituting the fingerprint.

We present a scoring that takes into account dependencies between molecular properties. As before, we predict posterior probabilities of molecular properties using machine learning. Dependencies between molecular properties are modeled as a Bayesian tree network; the tree structure is estimated on the fly from the instance data. For each edge, we also estimate the expected covariance between the two random variables. For fixed marginal probabilities, we then estimate conditional probabilities using the known covariance. Now, the corrected posterior probability of each candidate can be computed, and candidates are ranked by this score. Modeling dependencies improves identification rates of CSI:FingerID by 2.85 percentage points.
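
The step from fixed marginals and a covariance to conditional probabilities can be illustrated for two binary properties. For binary X and Y, P(X=1, Y=1) = P(X=1) P(Y=1) + Cov(X, Y), and the conditionals follow by division. The sketch below applies this along the edges of a tree to obtain the probability of a candidate fingerprint. The function names, the clamping step, and the numbers are assumptions made for illustration; this shows only the tree factorisation and is not the CSI:FingerID scoring itself.

```python
def conditional_probs(p_parent, p_child, cov):
    """Return (P(child=1 | parent=1), P(child=1 | parent=0)).

    Assumes 0 < p_parent < 1. The joint P(parent=1, child=1) is clamped
    to the range allowed by the marginals so that results stay in [0, 1].
    """
    joint_11 = p_parent * p_child + cov
    lower = max(0.0, p_parent + p_child - 1.0)
    upper = min(p_parent, p_child)
    joint_11 = min(max(joint_11, lower), upper)
    given_1 = joint_11 / p_parent
    given_0 = (p_child - joint_11) / (1.0 - p_parent)
    return given_1, given_0


def tree_probability(fingerprint, marginals, parent, cov):
    """Probability of a binary fingerprint under a tree-structured model.

    parent[i] is the index of node i's parent (None for the root);
    cov[i] is the covariance on the edge from parent[i] to i.
    """
    prob = 1.0
    for node, bit in enumerate(fingerprint):
        par = parent[node]
        if par is None:
            p_one = marginals[node]
        else:
            given_1, given_0 = conditional_probs(
                marginals[par], marginals[node], cov[node])
            p_one = given_1 if fingerprint[par] == 1 else given_0
        prob *= p_one if bit == 1 else 1.0 - p_one
    return prob


# Hypothetical two-property example: property 1 depends on property 0.
marginals = [0.5, 0.4]
parent = [None, 0]
cov = [0.0, 0.1]

given_1, given_0 = conditional_probs(0.5, 0.4, 0.1)
print(given_1, given_0)
print(tree_probability([1, 1], marginals, parent, cov))
```

Setting every covariance to zero recovers the independence assumption described in the first paragraph of the abstract, since both conditionals then equal the child's marginal.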

3:00 PM-3:40 PM
Computational and Statistical Methods for Mass Spectrometry Imaging
Room: Columbus CD
  • Kylie Bemis, Northeastern University, United States

Presentation Overview:

Mass spectrometry imaging (MSI) allows visualization of the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra from spatial locations on its surface. The high dimensionality of the resulting spectra, combined with a complex spatial structure, poses a challenge for existing statistical methods. Recent improvements in mass and spatial resolution have led to larger datasets and larger file sizes, further compounding the statistical and computational challenge.

The rapid evolution in instrumentation for high-resolution MSI must be matched by an evolution in methods for statistical analysis and tools for experimental design. However, despite the proliferation of machine learning algorithms and ad hoc data analysis tools for MSI, the development and adoption of appropriate statistical methods and reproducible experimental design have lagged behind. Many experiments with otherwise high-quality data still suffer from inadequate sample sizes or flawed experimental design, and few methods for statistical analysis exist.

We present several statistical methods for MSI, including a statistical learning method for supervised classification and unsupervised segmentation, and a Bayesian method for class comparison. These methods take into account the spatial structure of MSI experiments and allow for statistical inference. We present case studies using these methods to show why accounting for the spatial structure and experimental design of MSI experiments is vital to achieving accurate and reproducible results. These methods are implemented in our open-source R package Cardinal, which is designed for statistical analysis of MSI data.

Lastly, we present a recent extension of the Cardinal package. We have developed a new framework for statistical computing with larger-than-memory bioinformatics data-on-disk in R. Released as the standalone R package matter, it has also been integrated with Cardinal in order to facilitate rapid development of novel statistical methods for high-resolution MSI.

3:40 PM-4:00 PM
CompMS Poster Flash Presentation
Room: Columbus CD
4:00 PM-4:40 PM
Coffee Break
4:40 PM-5:20 PM
Measuring proteomes with long strings: A new, unconstrained strategy in mass spectrum interpretation
Room: Columbus CD
  • Joshua Elias, Stanford University, United States

Presentation Overview:

Thousands of protein post-translational modifications (PTMs) dynamically impact nearly all cellular functions. Although mass spectrometry is suited to PTM identification, it has historically been biased towards a few with established enrichment procedures. To measure all possible PTMs across diverse proteomes, software must overcome two fundamental challenges: intractably large search spaces and difficulty distinguishing correct from incorrect spectrum interpretations. In this talk, I will describe TagGraph, software that overcomes both challenges with a string-based search method that is orders of magnitude faster than current approaches, and a probabilistic validation model optimized for PTM assignments. When applied to a large human proteomic data set, TagGraph triples confident spectrum identifications while revealing thousands of modification types spanning nearly one million sites across the proteome. This analysis revealed new contexts for highly abundant yet understudied PTMs such as proline hydroxylation. TagGraph expands our ability to survey the full proteomic landscape of PTMs, and opens new analysis possibilities that have been difficult to address with prior software.

5:20 PM-6:00 PM
COSI CompMS Business Meeting
Room: Columbus CD
  • William Noble, University of Washington, United States
  • Oliver Kohlbacher, University of Tübingen, Germany

Presentation Overview:

COSI Business Meeting