CompMS: Computational Mass Spectrometry

COSI Track Presentations

Attention presenters: please review the Speaker Information Page
Schedule subject to change
Tuesday, July 23rd
10:15 AM-10:20 AM
Welcome to CompMS COSI
Room: Shanghai 3/4 (Ground Floor)
  • Oliver Kohlbacher, Institute for Biomedical Informatics (IBMI), University of Tübingen, Tübingen, Germany
  • William Noble, University of Washington, United States
  • Olga Vitek, Northeastern University, United States

...

10:20 AM-11:00 AM
Mobi-DIK: A novel algorithm for analysis of data-independent acquisition (DIA) data coupled to ion mobility
Room: Shanghai 3/4 (Ground Floor)
  • Hannes Roest, University of Toronto, Canada

Data-independent acquisition (DIA) has gained popularity due to its
reproducibility and sensitivity in high-throughput proteomics studies.
Meanwhile, parallel-accumulation serial-fragmentation (PASEF) exploits
trapped ion mobility spectrometry to achieve a high duty cycle and
efficient ion usage and to improve peptide identification rates in DDA.
By coupling DIA isolation windows to the precise ion mobility elution
of the corresponding ions, the combination of windowed DIA with the
PASEF principle allows multiplexing of DIA windows in a single 100 ms
ion mobility separation of precursor ions. Here, we present Mobi-DIK
(Ion Mobility DIA Analysis Kit), a novel software package capable of analyzing
highly multiplexed diaPASEF data using the targeted extraction
paradigm first developed for SWATH-MS data. Our software exploits the
ion mobility dimension through selective extraction and scoring.

Results: Mobi-DIK is capable of splitting multiplexed scans
automatically and performing targeted extraction with high efficiency.
The simulation module can accurately simulate and predict coverage on
precursor, peptide and protein levels of different acquisition
schemes. Our data analysis pipeline Mobi-DIK is based on the open
source OpenMS C++ library and is capable of spectral library
generation and subsequent extraction, scoring and q-value estimation
for accurate FDR calculation. The software generates ion
mobility-enabled spectral libraries directly from highly fractionated
DDA PASEF runs from MaxQuant output and stores them in the
standardized TraML format. For analysis, Mobi-DIK automatically
calibrates mass (non-linear), retention time (non-linear) and drift
time (linear) between our assay library and experimental diaPASEF
runs, achieving less than 1% deviation in drift time values. Our
Mobi-DIK software combining DIA with IM in a targeted platform is
capable of quantifying 7000 proteins at 1% FDR.
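
The q-value estimation mentioned in this abstract is commonly done with a target-decoy strategy. The sketch below is a generic Python illustration of that idea, not Mobi-DIK's implementation: the function name `estimate_q_values` and the toy scores are hypothetical, ties between scores are not treated specially, and the abstract does not state which estimator the software uses.

```python
def estimate_q_values(scores, is_decoy):
    """Return q-values, in input order, from target and decoy scores.

    The FDR at a score threshold is estimated as
    (#decoys at or above the threshold) / (#targets at or above it).
    The q-value of an item is the smallest FDR among all thresholds
    that still accept that item.
    """
    # Visit items from best score to worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    fdrs = []
    targets = 0
    decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Make the estimates monotone by sweeping from the worst score upward.
    for k in range(len(fdrs) - 2, -1, -1):
        fdrs[k] = min(fdrs[k], fdrs[k + 1])

    # Map the sorted q-values back to the original input order.
    q_values = [0.0] * len(scores)
    for rank, i in enumerate(order):
        q_values[i] = fdrs[rank]
    return q_values


# Hypothetical toy data: higher score means a better match.
toy_scores = [10.0, 9.0, 8.0, 7.0, 6.0]
toy_is_decoy = [False, False, True, False, True]
toy_q = estimate_q_values(toy_scores, toy_is_decoy)
accepted = [s for s, q, d in zip(toy_scores, toy_q, toy_is_decoy)
            if q <= 0.01 and not d]
```

In this toy example only the two best-scoring targets pass a 1% threshold, because the first decoy appears at rank three.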

11:00 AM-11:20 AM
Proceedings Presentation: pNovo 3: precise de novo peptide sequencing using a learning-to-rank framework
Room: Shanghai 3/4 (Ground Floor)
  • Hao Yang, Institute of Computing Technology, CAS, China
  • Hao Chi, Institute of Computing Technology, CAS, China
  • Wen-Feng Zeng, Institute of Computing Technology, CAS, China
  • Wen-Jing Zhou, Institute of Computing Technology, CAS, China
  • Si-Min He, Institute of Computing Technology, CAS, China

Motivation: De novo peptide sequencing based on tandem mass spectrometry data is the key technology of shotgun proteomics for identifying peptides without any database and assembling unknown proteins. However, owing to the low ion coverage in tandem mass spectra, the order of certain consecutive amino acids cannot be determined if all of their supporting fragment ions are missing, which results in the low precision of de novo sequencing.
Results: In order to solve this problem, we developed pNovo 3, which used a learning-to-rank framework to distinguish similar peptide candidates for each spectrum. Three metrics for measuring the similarity between each experimental spectrum and its corresponding theoretical spectrum were used as important features, in which the theoretical spectra can be precisely predicted by the pDeep algorithm using deep learning. On seven benchmark data sets from six diverse species, pNovo 3 recalled 29–102% more correct spectra, and the precision was 11–89% higher than three other state-of-the-art de novo sequencing algorithms. Furthermore, compared with the newly developed DeepNovo, which also used the deep learning approach, pNovo 3 still identified 21–50% more spectra on the nine data sets used in the study of DeepNovo. In summary, the deep learning and learning-to-rank techniques implemented in pNovo 3 significantly improve the precision of de novo sequencing, and such a machine learning framework is worth extending to other related research fields to distinguish similar sequences.
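
The abstract does not name the three similarity metrics, so the Python sketch below uses cosine similarity purely as an assumed example of a spectrum similarity feature. It ranks candidates by that single feature, which is much simpler than a trained learning-to-rank model. The function names, peptide strings, and intensity values are all hypothetical.

```python
import math


def cosine_similarity(observed, predicted):
    """Cosine similarity between two equal-length intensity vectors.

    Each position stands for one theoretical fragment ion; a fragment
    that was not observed has intensity 0.
    """
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_observed = math.sqrt(sum(o * o for o in observed))
    norm_predicted = math.sqrt(sum(p * p for p in predicted))
    if norm_observed == 0.0 or norm_predicted == 0.0:
        return 0.0
    return dot / (norm_observed * norm_predicted)


def rank_candidates(observed, predicted_by_candidate):
    """Order candidate names from most to least similar to the observation."""
    return sorted(
        predicted_by_candidate,
        key=lambda name: cosine_similarity(observed, predicted_by_candidate[name]),
        reverse=True,
    )


# Hypothetical intensities for three fragment ions.
observed_intensities = [1.0, 0.0, 2.0]
candidates = {
    "PEPTIDE": [1.0, 0.0, 2.0],
    "PEPTIED": [0.0, 3.0, 0.0],
}
ranking = rank_candidates(observed_intensities, candidates)
```

A learning-to-rank model would combine several such features and learn their weights from labeled spectra instead of sorting on one value.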

11:20 AM-11:40 AM
Proceedings Presentation: NPS: scoring and evaluating the statistical significance of peptidic natural product–spectrum matches
Room: Shanghai 3/4 (Ground Floor)
  • Azat Tagirdzhanov, Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
  • Alexander Shlemov, Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
  • Alexey Gurevich, Center for Algorithmic Biotechnology, St. Petersburg State University, Russia

Motivation: Peptidic Natural Products (PNPs) are considered a promising compound class that has many applications in medicine. Recently developed mass spectrometry-based pipelines are transforming PNP discovery into a high-throughput technology. However, the current computational methods for PNP identification via database search of mass spectra are still in their infancy and could be substantially improved.
Results: Here we present NPS, a statistical learning-based approach for scoring PNP–spectrum matches. We incorporated NPS into two leading PNP discovery tools and benchmarked them on millions of natural product mass spectra. The results demonstrate an increase of more than 45% in the number of identified spectra and 20% more PNPs found at a false discovery rate of 1%.
Availability: NPS is available as a command line tool and as a web application at http://cab.spbu.ru/software/NPS

11:40 AM-12:20 PM
Anything to gain from single-cell measurements?
Room: Shanghai 3/4 (Ground Floor)
  • Manfred Claassen, ETH Zurich, Switzerland

Single cell technologies are transforming the investigation of cell states in biological processes and heterogeneous tissues. Identification of cell types/states and reconstruction of cell state sequences from such data typically rely on unsupervised approaches, and therefore fail to take advantage of phenotype information coming along with single experiments (e.g. disease status or time points), and in the presence of confounding variation fail to identify phenotype-associated cell subsets. To fill this gap, we have developed a suite of supervised learning approaches to identify phenotype-associated cell subsets from high-dimensional single cell data. Specifically, we demonstrate how (deep and shallow) convolutional neural networks can identify rare CMV infection- and multiple sclerosis-associated cell subsets in peripheral blood, extremely rare leukemic blast populations in minimal residual disease-like situations, and morphological patterns associated with severity of prostate cancer. Furthermore, we recently developed psupertime, a supervised approach to pseudotime ordering. For five single cell RNA-seq studies, we demonstrate ordering superior to that of conventional unsupervised pseudotime ordering techniques. We expect these supervised learning approaches to tap the potential of multi-experiment studies to come by enabling the identification and molecular characterization of phenotype-associated cell subpopulations in the complex tissue context across health and disease.

12:20 PM-12:40 PM
Proteomics & bioinformatics to evaluate the quality of transcriptome assembly and to measure the extent of animal intrapopulation variability
Room: Shanghai 3/4 (Ground Floor)
  • Yannick Cogne, Laboratory «Innovative technologies for Detection and Diagnostics» CEA-Marcoule, DRF-Li2D, France
  • Christine Almunia, Laboratory «Innovative technologies for Detection and Diagnostics» CEA-Marcoule, DRF-Li2D, France
  • Duarte Gouveia, Laboratory «Innovative technologies for Detection and Diagnostics» CEA-Marcoule, DRF-Li2D, France
  • Davide Degli Esposti, IRSTEA - Centre de Lyon, France
  • Olivier Pible, CEA / DSV, France
  • Olivier Geffard, IRSTEA - Centre de Lyon, France
  • Arnaud Chaumot, IRSTEA - Centre de Lyon, France
  • Jean Armengaud, Laboratory «Innovative technologies for Detection and Diagnostics» CEA-Marcoule, DRF-Li2D, France

Sentinel animals are widely used for monitoring the quality of our environment. We explored the response of freshwater gammarids by RNA-seq informed proteomics. To take into account the biodiversity of these gammarids, we measured the protein abundances of 164 individuals belonging to 7 species by shotgun label-free proteomics, yielding an extensive proteomic dataset. We also sequenced a male and a female of each of these 7 species by RNA-seq.
We explored different strategies to improve the assembly of the RNA-seq data, considering the number of MS/MS spectra assigned as a key parameter, and to optimize the construction of the RNA-seq derived protein sequence database. Once the proteomics data were interpreted, we specifically analysed two regional Gammarus pulex populations to characterize the potential proteome divergence induced in one site by natural bioavailable mono-metallic contamination, i.e. cadmium, compared to a non-contaminated site.
We observed that the intra-population proteome variability of long-term exposed G. pulex was inflated relative to the non-contaminated population. These bioinformatics results show that, while it remains a challenge for organisms whose genomes have not yet been sequenced, taking into account intra-population variability is important to better define the molecular players induced by toxic stress in a comparative field proteomics approach.

2:00 PM-2:40 PM
Complex-centric proteome profiling by SEC-SWATH-MS
Room: Shanghai 3/4 (Ground Floor)
  • Isabell Bludau, Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland

Proteins are major effectors and regulators of biological processes and can elicit multiple functions depending on their interaction with other proteins. Therefore, it is of central interest in systems biology to determine the interactions and cooperation of proteins as a function of cell state. We have developed an integrated experimental and computational technique for detecting in parallel hundreds of protein complexes, as well as changes in their composition and abundance between samples, in a single operation. The method consists of size exclusion chromatography (SEC) to fractionate native protein complexes, SWATH/DIA mass spectrometry to precisely quantify the proteins in each SEC fraction, and the computational framework CCprofiler to detect and quantify protein complexes by error-controlled, complex-centric analysis, using prior information from generic protein interaction maps (Heusel & Bludau et al., 2019). Application of our workflow to the HEK293 cell line proteome delineates 462 complexes composed of 2,127 protein subunits, entailing 7,673 unique protein-protein interactions. Our analysis further provided insights into novel sub-complexes and assembly intermediates of central regulatory complexes such as the proteasome. We have recently extended this workflow to study rearrangements of protein complex assemblies across different cell states, providing insights into assembly changes that are not captured by full proteome analyses. To increase throughput for such comparative SEC-SWATH-MS analyses, we established a fast protocol based on a 21-minute gradient on the EvoSep One HPLC system that enables the measurement of 65 SEC fractions of a biological sample per day, while minimizing loss of information. Furthermore, we extended the computational analysis workflow in CCprofiler to take advantage of available peptide-level information in the SEC-SWATH-MS data to investigate proteoform-specific complex integration.
Overall, the method provides novel insights into the interplay between different protein variants and their impact on protein interactions and functionality on an unprecedented, system-wide scale.

2:40 PM-3:00 PM
Proceedings Presentation: ADAPTIVE: leArning DAta-dePendenT, concIse molecular VEctors for fast, accurate metabolite identification from tandem mass spectra
Room: Shanghai 3/4 (Ground Floor)
  • Dai Hai Nguyen, Kyoto University, Japan
  • Canh Hao Nguyen, Bioinformatics Center, ICR, Kyoto University, Japan
  • Hiroshi Mamitsuka, Kyoto University / Aalto University, Japan

Motivation: Metabolite identification is an important task in metabolomics to enhance the knowledge of biological systems. There have been a number of machine learning based methods proposed for this task, which predict a chemical structure of a given spectrum through an intermediate (chemical structure) representation called molecular fingerprints. They usually have two steps: 1) predicting fingerprints from spectra; 2) searching chemical compounds (in database) corresponding to the predicted fingerprints. Fingerprints are feature vectors, which are usually very large to cover all possible substructures and chemical properties, and therefore heavily redundant, in the sense of having many molecular (sub)structures irrelevant to the task, causing limited predictive performance and slow prediction.
Results: We propose ADAPTIVE, which has two parts: learning two mappings 1) from structures to molecular vectors and 2) from spectra to molecular vectors. The first part learns molecular vectors for metabolites from given data, to be consistent with both spectra and chemical structures of metabolites. In more detail, molecular vectors are generated by a model parameterized by a message passing neural network (MPNN), and parameters are estimated by maximizing the correlation between molecular vectors and the corresponding spectra in terms of Hilbert-Schmidt Independence Criterion (HSIC). Molecular vectors generated by this model are compact and, importantly, adaptive (specific) to both the given data and the task of metabolite identification. The second part uses input output kernel regression (IOKR), the current cutting-edge method of metabolite identification. We empirically confirmed the effectiveness of ADAPTIVE by using a benchmark dataset, where ADAPTIVE outperformed the original IOKR in both predictive performance and computational efficiency.
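
For readers unfamiliar with HSIC, the Python sketch below computes the standard biased empirical estimate, trace(KHLH) / (n - 1)^2, from two kernel matrices. It is a generic illustration and not the ADAPTIVE code: the RBF kernel, the `gamma` value, the function names, and the toy data are assumptions, since the abstract does not specify the kernels used.

```python
import math


def rbf_kernel(points, gamma=1.0):
    """RBF (Gaussian) kernel matrix for a list of equal-length vectors."""
    n = len(points)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2
                                  for a, b in zip(points[i], points[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def center(kernel):
    """Double-center a symmetric kernel matrix, i.e. compute H K H."""
    n = len(kernel)
    row_means = [sum(row) / n for row in kernel]
    total_mean = sum(row_means) / n
    # For a symmetric matrix the column means equal the row means.
    return [
        [kernel[i][j] - row_means[i] - row_means[j] + total_mean
         for j in range(n)]
        for i in range(n)
    ]


def hsic(kernel_x, kernel_y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(kernel_x)
    kc = center(kernel_x)
    lc = center(kernel_y)
    # For symmetric centered matrices, the trace of their product equals
    # the sum of their element-wise products.
    total = sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


# Hypothetical toy data: four one-dimensional samples.
xs = [[0.0], [1.0], [2.0], [3.0]]
constant = [[5.0], [5.0], [5.0], [5.0]]

dependent_value = hsic(rbf_kernel(xs), rbf_kernel(xs))
constant_value = hsic(rbf_kernel(xs), rbf_kernel(constant))
```

A variable paired with itself gives a positive value, while pairing with a constant gives zero, because a constant carries no information about the other variable.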

3:00 PM-3:20 PM
Proceedings Presentation: Unsupervised segmentation of mass spectrometric ion images characterizes morphology of tissues
Room: Shanghai 3/4 (Ground Floor)
  • Dan Guo, Northeastern University, United States
  • Kylie Bemis, Northeastern University, United States
  • Catherine Rawlins, Northeastern University, United States
  • Jeffrey Agar, Northeastern University, United States
  • Olga Vitek, Northeastern University, United States

Mass spectrometry imaging (MSI) characterizes the spatial distribution of ions in complex biological samples such as tissues. Since many tissues have complex morphology, treatments and conditions often affect the spatial distribution of the ions in morphology-specific ways. Evaluating the selectivity and the specificity of ion localization and regulation across morphology types is biologically important. However, MSI lacks algorithms for segmenting images at both single-ion and spatial resolution. This manuscript contributes Spatial-DGMM, an algorithm and a workflow for the analyses of MSI experiments, that detects components of single-ion images with homogeneous spatial composition. The approach extends Dirichlet Gaussian mixture models (DGMM) to account for the spatial structure of MSI. Evaluations on simulated and experimental datasets with diverse MSI workflows demonstrated that Spatial-DGMM accurately segments ion images, and can distinguish ions with homogeneous and heterogeneous spatial distribution. We also demonstrated that the extracted spatial information is useful for downstream analyses, such as detecting morphology-specific ions, finding groups of ions with similar spatial patterns, and detecting changes in chemical composition of tissues between conditions.

3:20 PM-3:40 PM
SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information
Room: Shanghai 3/4 (Ground Floor)
  • Kai Dührkop, Friedrich Schiller University Jena, Germany
  • Markus Fleischauer, Friedrich Schiller University Jena, Germany
  • Marcus Ludwig, Friedrich Schiller University Jena, Germany
  • Alexander A. Aksenov, University of California, Los Angeles, United States
  • Alexey V. Melnik, University of California, Los Angeles, United States
  • Marvin Meusel, Friedrich Schiller University Jena, Germany
  • Pieter C. Dorrestein, Department of Chemistry and Biochemistry, UCSD, United States
  • Sebastian Böcker, Friedrich Schiller University Jena, Germany
  • Juho Rousu, Aalto University, Finland

The identification of molecules remains a central question in analytical chemistry, in particular for natural products research, untargeted metabolomics, etc. Mass spectrometry is a predominant experimental technique in these fields, but metabolite structural elucidation remains highly challenging. We report SIRIUS 4, a specialized tool that addresses two fundamental questions: What is the molecular formula of the query compound among all molecular formulas, both previously observed and unobserved? For both questions, SIRIUS 4 offers best-of-class performance; for searching in molecular structure databases, SIRIUS 4 integrates CSI:FingerID as a web service. In evaluation, the number of wrongly assigned molecular formulas decreased by 31.7% compared to SIRIUS 3.0. For structure elucidation, the CSI:FingerID web service achieved identification rates of 74% on challenging independent metabolomics datasets, searching in a biocompound structure database with 0.5 million structures. Finally, running times improved substantially (231- to 332-fold) compared to the previous version. As a result, users can now analyze full liquid chromatography-mass spectrometry (LC-MS) datasets, rather than just one spectrum at a time; MS-driven annotations can be obtained for all detected features, not just those passing a preliminary statistical test, say, on fold change.

3:40 PM-4:00 PM
CompMS COSI business meeting
Room: Shanghai 3/4 (Ground Floor)
  • Oliver Kohlbacher, Institute for Biomedical Informatics (IBMI), University of Tübingen, Tübingen, Germany
  • William Noble, University of Washington, United States
  • Olga Vitek, Northeastern University, United States

...

4:40 PM-6:00 PM
Poster lightning talks - CompMS
Room: Shanghai 3/4 (Ground Floor)
  • ...

...