It is now widely known that lipids play a key role in a variety of diseases, including diabetes, obesity, heart disease, cancer, and neurodegenerative diseases. Consequently, lipidomics, which studies the structure, function, and biochemical interactions of the complete set of lipids (the lipidome) in cells, fluids, or organisms, has been successfully applied in biomedical and pharmaceutical research. More recently, it has been realized that not only native lipids but also epilipids (e.g., enzymatically or non-enzymatically modified lipids, which are usually low-abundance species) can serve as valuable biomarkers of biochemical impairments. The need to explore the whole lipidome and epilipidome has boosted the use of untargeted approaches, now facilitated by recent advances in mass spectrometry instrumentation. Nevertheless, untargeted approaches generate massive amounts of data, so in silico tools for data analysis and interpretation become essential. Here, the Lipostar 2 software is described, with a special focus on strategies for epilipidomics and on tools for biomarker discovery and pharmaceutical applications.
Secondary metabolites are a diverse class of small molecules with many applications in medicine and industry. Metabolomics often relies on tandem mass spectrometry to discover novel compounds, but interpreting metabolomics mass spectra remains challenging and requires appropriate computational methods.
We developed VarMet, a database search tool for identifying variants of secondary metabolites in mass spectrometry data. This approach extends our previous molDiscovery model for small molecule fragmentation with a modification-tolerant search inspired by VarQuest and adapted to a wide range of metabolites, including polyketides, terpenes, and lipids. We evaluated VarMet on 8,765 spectra from the GNPS spectral library and a chemical database of 44,961 molecules from the NP Atlas and GNPS annotations. The correct variant was ranked first for 35% of heavy spectra (precursor mass >500 Da) and 10% of light spectra (precursor mass <500 Da). For 59% and 25% of the spectral groups, respectively, the correct variant was within the top ten identifications.
We applied VarMet to the Smenospongia aurea metabolome and demonstrated how it identified a smenamide variant missed by the molecular networking approach. Overall, VarMet fills a gap in the identification of novel variants of small molecules and better addresses the chemical diversity of secondary metabolites.
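The core idea of a modification-tolerant search can be sketched as follows. This is not VarMet's actual algorithm, only an illustration of the matching principle: a spectrum may be explained by a known metabolite whose mass differs from the precursor by a bounded modification mass shift, and candidates with smaller shifts are ranked first. All names and thresholds below are hypothetical.

```python
def find_variant_candidates(precursor_mass, database, max_delta=200.0):
    """Return (name, mass_shift) pairs for database entries whose mass
    lies within max_delta of the precursor, sorted by absolute shift."""
    candidates = []
    for name, mass in database.items():
        shift = precursor_mass - mass
        if abs(shift) <= max_delta:
            candidates.append((name, round(shift, 4)))
    # Smaller absolute modification mass shifts are ranked first.
    return sorted(candidates, key=lambda c: abs(c[1]))

# Illustrative toy database (masses are not real values).
database = {"smenamide A": 587.3, "terpene X": 430.2, "polyketide Y": 601.4}
hits = find_variant_candidates(601.31, database)
```

A real search would additionally score each candidate by how well its predicted fragmentation explains the observed MS/MS peaks, rather than by precursor mass alone.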
Tracer metabolomics makes it possible to follow an isotope-labeled substrate through downstream pathways. To understand changes between conditions of interest, differential analysis is then necessary. However, most available pipelines are dedicated to conventional metabolomics or are not open source. We have developed DIMet (Differential Isotope targeted Metabolomics), an open-source tool for differential analysis of isotopically labeled data.
DIMet accepts isotopologue and total metabolite abundances as input. A fold change is computed for each metabolite, and the statistical test of choice yields p-values, which are adjusted for multiple comparisons. Moreover, DIMet can jointly represent the results of differential analyses of isotope-labeled and RNA-seq data in the form of metabolograms.
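The per-metabolite computation described above can be sketched in a few lines. This is a generic illustration, not DIMet's implementation: it uses a Welch t-test as the "test of choice" and Benjamini-Hochberg as the multiple-comparison adjustment, both of which are assumptions.

```python
import numpy as np
from scipy import stats

def differential_analysis(cond_a, cond_b):
    """cond_a, cond_b: dicts mapping metabolite -> replicate abundances."""
    names = sorted(cond_a)
    results, pvals = {}, []
    for m in names:
        a = np.asarray(cond_a[m], float)
        b = np.asarray(cond_b[m], float)
        fc = a.mean() / b.mean()                       # fold change A vs B
        p = stats.ttest_ind(a, b, equal_var=False).pvalue  # Welch t-test
        results[m] = {"fold_change": fc, "pvalue": p}
        pvals.append(p)
    # Benjamini-Hochberg step-up adjustment for multiple comparisons
    order = np.argsort(pvals)
    n = len(pvals)
    adj = np.empty(n)
    prev = 1.0
    for rank, idx in list(enumerate(order, 1))[::-1]:
        prev = min(prev, pvals[idx] * n / rank)
        adj[idx] = prev
    for m, q in zip(names, adj):
        results[m]["padj"] = q
    return results

res = differential_analysis({"citrate": [10, 12, 11]}, {"citrate": [5, 6, 5]})
```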
We illustrate the use of DIMet on data acquired from glioblastoma P3 xenograft cancer cells. P3 wild-type and double LDHA/B knockout (KO) cells were exposed to hypoxia with 13C6-glucose as the tracer, and both tracer metabolomics and transcriptome data were obtained.
Isotopologue-labeled data were analyzed using DIMet, and RNA-seq data using DESeq2. The comparison of LDHA/B KO cells with controls yielded 21 differentially abundant metabolites (DAMs) and 8,287 differentially expressed genes (DEGs), respectively, showing impaired glycolysis, TCA cycle, and gluconeogenesis. DIMet, together with the differential transcriptomics, allowed us to draw conclusions on glioblastoma metabolic plasticity at the systems level.
Imaging mass spectrometry (MS) is an increasingly used tool to study metabolites and lipids with spatial resolution in tissues and single cells. For instance, METASPACE, the largest repository of spatial metabolomics datasets, contains over 8,000 public datasets from a range of organisms, tissues, and conditions. However, current imaging MS approaches allow only an ambiguous annotation of molecules based on their mass-to-charge ratio, limiting mechanistic insight in disease contexts and integration with other omics data. To address this issue, we present a network-based approach that leverages lipid-metabolic networks to improve imaging MS annotations. We propose to rank candidate molecules based on the assumption that metabolites and lipids are more likely to be detected if their upstream or downstream network counterparts are also detected. We evaluated the performance of our approach on matched single-cell and bulk LC-MS/MS lipidomics of cultured cells, showing improved accuracy of predicted annotations confirmed by MS/MS spectra. By reducing the number of possible annotations per ion to a few biochemically likely candidates, our approach enhances the interpretability of imaging MS datasets at the pathway level. This can facilitate the use of single-cell and spatial metabolomics to functionally understand metabolism in health and disease.
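The ranking assumption above lends itself to a simple sketch: a candidate annotation scores higher when more of its metabolic-network neighbors were also detected in the dataset. This is an illustration of the principle only, with all molecule names and the scoring rule being assumptions; the actual method likely uses a more refined score.

```python
def rank_candidates(candidates_per_ion, network, detected):
    """network: dict molecule -> set of upstream/downstream neighbors.
    Returns, per ion, candidates sorted by number of detected neighbors."""
    ranked = {}
    for ion, candidates in candidates_per_ion.items():
        scored = []
        for mol in candidates:
            neighbors = network.get(mol, set())
            score = len(neighbors & detected)  # detected network counterparts
            scored.append((mol, score))
        ranked[ion] = sorted(scored, key=lambda s: -s[1])
    return ranked

# Toy example: two isobaric candidates for one ion.
network = {"PC(34:1)": {"LPC(16:0)", "DG(34:1)"}, "PE(37:2)": {"LPE(18:1)"}}
detected = {"LPC(16:0)", "DG(34:1)"}
out = rank_candidates({"ion_1": ["PC(34:1)", "PE(37:2)"]}, network, detected)
```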
Understanding underlying biological processes is essential for providing effective treatment for diseases. Investigating complex biological behavior at the cellular and molecular levels requires profiling different aspects of human biology, such as small molecules (metabolites). Liquid chromatography-mass spectrometry (LC-MS) based methods are ideal tools for characterizing and investigating human health by profiling metabolites. LC-MS-based metabolomics methods can yield data on thousands of features, each characterized by its descriptors: a measured mass-to-charge ratio (m/z), chromatographic retention time (RT), and signal intensity (SI). LC-MS techniques are powerful but also highly sensitive and complex, and this complexity increases when comparing metabolites across different studies or LC-MS platforms. We present massSight, a computational tool to inspect and adjust trends, scale raw metabolite intensities, align peaks (annotated and unannotated) between separately acquired datasets, and remove redundancies in nontargeted LC-MS data arising from multiple ionization products of a single metabolite. The platform comes with a set of novel tools, complementary to LC-MS metabolite profiling techniques, to accurately profile features, align features across studies and platforms, perform scaling, and consolidate adducts and fragments of chemical compounds. massSight is open-source software with documentation available online at http://github.com/omicsEye/massSight.
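The simplest form of cross-dataset peak alignment can be sketched as a tolerance match on the feature descriptors mentioned above. This is a naive illustration, not massSight's algorithm (which additionally adjusts systematic trends); the ppm and RT tolerances are arbitrary example values.

```python
def align_features(ds_a, ds_b, ppm=10.0, rt_tol=0.5):
    """ds_a, ds_b: lists of (mz, rt) tuples from two separately acquired
    datasets; returns (index_a, index_b) pairs of putative matches."""
    matches = []
    for i, (mz_a, rt_a) in enumerate(ds_a):
        for j, (mz_b, rt_b) in enumerate(ds_b):
            mz_ok = abs(mz_a - mz_b) / mz_a * 1e6 <= ppm  # ppm tolerance
            rt_ok = abs(rt_a - rt_b) <= rt_tol            # RT window (min)
            if mz_ok and rt_ok:
                matches.append((i, j))
    return matches

# Toy datasets: one shared feature, one unmatched feature each.
a = [(180.0634, 2.10), (255.2330, 7.85)]
b = [(180.0639, 2.30), (512.5000, 9.00)]
pairs = align_features(a, b)
```

In practice, RT drift between platforms is usually corrected first (e.g., by fitting a drift curve on confidently matched anchors) before applying such a tolerance match.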
Hydrogen-deuterium exchange mass spectrometry (HDX-MS) has emerged as a powerful technique to explore the conformational dynamics of proteins and protein complexes in solution. In the bottom-up approach to MS, deuterium uptake is reported at the level of peptides, which complicates interpretation and means ad hoc approaches are used to resolve contradictions between overlapping peptides. Here we propose to leverage the overlap between peptides, the temporal component of the data, and the correlation along the sequence dimension to infer residue-level uptake patterns. Our model treats HDX-MS as a multiple change-point problem, inferring at which residues HDX has changed. Fitting our model in a Bayesian non-parametric framework allows inference of the number of parameters, quantitative assessment of the confidence of differential HDX, and uncertainty estimates of the temporal kinetics. We benchmark our approach against others using a three-way proteolytic digestion experiment and find that it outperforms other available methods. We illustrate our approach on a number of case studies.
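To give a feel for the change-point framing, here is a deliberately minimal single change-point sketch: pick the residue position that best splits an uptake profile into two homogeneous segments by minimizing the summed within-segment variance. The Bayesian non-parametric model above handles multiple change points, peptide overlap, and uncertainty, none of which this toy captures.

```python
import numpy as np

def best_changepoint(uptake):
    """Return the split index k minimizing the summed within-segment
    sum of squares of uptake[:k] and uptake[k:]."""
    x = np.asarray(uptake, float)
    best_k, best_cost = None, np.inf
    for k in range(1, len(x)):
        left, right = x[:k], x[k:]
        cost = ((left - left.mean()) ** 2).sum() \
             + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Toy residue-level uptake profile with a jump after residue 3.
profile = [0.1, 0.12, 0.11, 0.55, 0.6, 0.58]
split = best_changepoint(profile)
```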
Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive proteoform-level information than conventional bottom-up proteomics, which relies on digested peptides. Significant advancements have been made in TDP across different aspects, progressively unlocking its potential in numerous biological and medical applications. However, reliable and reproducible data analysis remains one of the major bottlenecks in TDP. A key prerequisite for robust data analysis is an objective estimation of the proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used scheme is FDR estimation via the target-decoy approach (TDA), which was primarily established for bottom-up proteomics. We present evidence that the commonly used TDA-based FDR estimation may not work at the proteoform level due to an overlooked factor, namely the erroneous deconvolution of precursor masses (hereafter, precursor deconvolution error), which leads to incorrect FDR estimation. We argue that the conventional TDA-based FDR in proteoform identification is in fact a protein-level FDR rather than a proteoform-level FDR unless the precursor deconvolution error rate is taken into account. To address this issue, we propose a formula that corrects the proteoform-level FDR bias by combining the TDA-based FDR with the precursor deconvolution error rate.
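For readers unfamiliar with TDA, the standard estimator is sketched below, together with a purely illustrative correction term. The `corrected_fdr` combination shown here is NOT the formula proposed in the abstract (which is not reproduced here); it only illustrates the idea that an additional error source inflates the effective proteoform-level FDR.

```python
def tda_fdr(target_scores, decoy_scores, threshold):
    """Classic target-decoy FDR estimate: decoys passing the score
    threshold divided by targets passing it."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    return decoys / targets if targets else 0.0

def corrected_fdr(fdr, deconv_error_rate):
    # Hypothetical union-bound-style combination, for illustration only:
    # an identification is wrong if the match is wrong OR the precursor
    # mass was deconvolved incorrectly.
    return min(1.0, fdr + deconv_error_rate)

fdr = tda_fdr([50, 42, 38, 20], [45, 15, 10, 5], threshold=30)
```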
Proteomic mass spectra have the intriguing property of being highly redundant in some of their information while being incomplete at the same time. The wealth of available spectra allows for deep learning approaches, which can be designed to work without pre-designed features, learning the fragmentation and interplay of molecules from data alone. The learning process includes steps ranging from the automated compilation of relevant datasets to the identification, quantification, and classification of spectra. Of particular interest is the actual explanation of the spectral patterns learned by neural networks. Relevant applications include immunoproteomics and metaproteomics.
With the widespread use of MS-based shotgun proteomics, countless datasets of different human cell types and tissues with deep proteome coverage are constantly being added to repositories, providing valuable quantitative information on proteome-wide protein copy numbers. However, this information remains largely underused because of the technical challenges of comparing protein levels across individual studies. Here we introduce MaxQuantAtlas, a software platform for the integration of MaxQuant-processed proteomics datasets over many samples acquired with label-free and label-based quantification strategies and instrument types.
MaxQuantAtlas produces one unified database table with concentration profiles over all samples. For isobaric labeling samples, we introduce algorithms for ratio decompression and combined MS1-MS2 quantification to obtain cellular protein abundances. Protein-level aggregated MS signals are scaled to concentrations using a proteomic-ruler-like method. For multivariate analysis, we developed a novel imputation method compatible with varying dynamic ranges. A two-dimensional quality score based on sample dynamic ranges and the correlation of housekeeping proteins detects problematic samples, leading to their automatic exclusion. We observed meaningful clustering of samples by biological origin, irrespective of quantification method. Samples yielded similar concentration profiles whether they were analyzed label-free or multiplexed with unrelated samples in TMT sets, demonstrating successful integration across technologies.
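The proteomic-ruler idea mentioned above can be sketched as follows: histones are stoichiometric with DNA, so the summed histone MS signal provides a per-cell anchor for scaling all protein signals to copy numbers. This is a simplified illustration, not MaxQuantAtlas code; the histone copy number constant and the omission of molecular-weight normalization are assumptions of this toy.

```python
def ruler_scale(intensities, histone_ids, copies_histones_per_cell=4.5e8):
    """intensities: dict protein -> aggregated MS signal.
    Scales all signals so summed histone signal maps to a fixed
    per-cell histone copy number; returns copies per cell."""
    histone_signal = sum(intensities[h] for h in histone_ids)
    factor = copies_histones_per_cell / histone_signal
    return {p: s * factor for p, s in intensities.items()}

# Toy signals (arbitrary units).
signals = {"H4": 2e9, "H2B": 3e9, "GAPDH": 1e10}
copies = ruler_scale(signals, histone_ids=["H4", "H2B"])
```

The published proteomic ruler additionally divides intensities by molecular weight and uses the genome size to fix the histone anchor; those refinements are omitted here for brevity.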
Thermal proteome profiling (TPP) is a proteome-wide technology combining the cellular thermal shift assay with quantitative mass spectrometry to provide insights into protein interactions and states. Statistical analysis of temperature-range TPP (TPP-TR) datasets relies on comparing protein melting curves, which describe the amount of non-denatured protein as a function of temperature, between different conditions (e.g., presence or absence of a drug). However, state-of-the-art models are restricted to sigmoidal melting behaviors, while unconventional melting curves represent up to 20% of TPP-TR datasets. We therefore propose a novel statistical framework based on hierarchical Gaussian Process models to make the analysis of TPP-TR datasets unbiased. The model scales to multiple conditions and complex TPP-TR protocols. In particular, the analysis of peptide-level TPP-TR datasets, which considers melting curves of tryptic peptides instead of protein averages, is implemented using deeper hierarchies. Unbiased analysis of these datasets, of high value for the study of protein post-translational modifications, was previously impossible due to the abundance of unconventional melting curves. Collectively, this statistical framework extends the analysis of TPP-TR datasets to both protein- and peptide-level melting curves, offering access to thousands of previously excluded melting curves and paving the way to new biological discoveries on protein interactions, localization, and functions.
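For context, the conventional sigmoidal model that the Gaussian Process framework generalizes can be sketched as a two-parameter logistic fit of the non-denatured fraction against temperature, one curve per protein. The parameterization below is a common simplified form, not necessarily the exact model used by existing TPP tools.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(temp, tm, slope):
    """Fraction of non-denatured protein at a given temperature;
    tm is the melting point, slope controls transition steepness."""
    return 1.0 / (1.0 + np.exp((temp - tm) / slope))

# Synthetic noiseless melting data generated from a known curve.
temps = np.array([37, 41, 45, 49, 53, 57, 61, 65], float)
fractions = sigmoid(temps, tm=50.0, slope=2.0)

# Fit recovers the melting point from the observations.
params, _ = curve_fit(sigmoid, temps, fractions, p0=[52.0, 3.0])
tm_est = params[0]
```

A thermal shift between conditions is then read off as the difference in fitted `tm` values; curves that are not sigmoidal break this scheme, which motivates the non-parametric approach described above.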
Unraveling the dynamic interaction between plants and endophytes provides enormous opportunities to improve microbiome-optimized plant growth and health, making crops less dependent on fertilizers and pesticides. A critical step is to unravel the plant biosynthetic pathways involved in the recruitment of endophytes and the regulatory networks governing them. Our project investigates the dynamics and architecture of plant gene regulatory networks (GRNs) to decipher plant biosynthetic pathways through integrative omics strategies. Here, whole transcriptomes and metabolomes from roots and root exudates of Arabidopsis thaliana are being investigated to connect expression patterns of genes to metabolites that play a crucial role in plant-endophyte interactions. To achieve this, we have developed a software tool, MEANtools, that predicts metabolic pathways by integrating transcriptomics with untargeted metabolomics data and by incorporating knowledge of known reactions and chemical structures from publicly available databases. We are testing MEANtools on high-resolution, time-resolved paired transcriptomics and metabolomics datasets from plants and other species. We further propose that the pathway-prediction accuracy of MEANtools can be enhanced by integrating strategies that associate spatial and temporal gene expression with metabolite abundances across samples to identify potentially causal links.
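One simple building block for associating gene expression with metabolite abundance across samples is a correlation screen, sketched below. This is a generic illustration, not MEANtools code; the gene and metabolite names, profiles, and correlation cutoff are all invented for the example.

```python
import numpy as np

def correlate(gene_profiles, metabolite_profiles, min_r=0.9):
    """Link genes to metabolites whose abundance profiles across the
    same samples correlate above min_r (Pearson)."""
    links = []
    for gene, gx in gene_profiles.items():
        for met, mx in metabolite_profiles.items():
            r = np.corrcoef(gx, mx)[0, 1]
            if r >= min_r:
                links.append((gene, met, round(r, 3)))
    return links

# Toy profiles over four time points / samples.
genes = {"PAL1": [1, 2, 4, 8], "CYP98A": [8, 4, 2, 1]}
mets = {"coumaric acid": [1.1, 2.2, 3.9, 8.3]}
links = correlate(genes, mets)
```

Correlation alone cannot establish causality; in practice such links would be filtered against known reaction chemistry, which is the role the reaction databases play in the approach described above.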
Liquid chromatography-high-resolution mass spectrometry (LC-HRMS) of complex samples rapidly generates gigabytes of data in a single study, of which only a fraction can be annotated, leaving a vast unknown chemical space to explore. This creates a need for flexible and scalable software tools for data processing. MZmine 3 is modular, platform-independent software with a vibrant open-source community. Recent contributions have expanded its capabilities to support various MS platforms, leading up to integrative workflows that combine ion mobility spectrometry (IMS) data from LC-MS and MS imaging. These workflows enable large-scale metabolomics and lipidomics research through spectral preprocessing, feature detection, and various options for compound identification for thousands of samples in parallel. The modern graphical user interface and interactive visualization plots facilitate data exploration and validation of results at every processing step. In addition, the MZmine Processing Wizard introduces easy-to-configure workflows for different MS platforms and research targets.
This presentation will highlight the latest advances in the MZmine project, guiding large-scale untargeted MS data analysis as well as automatic reference data generation. We also introduce a new open MSn spectral library that was generated with MZmine, leveraging a high throughput MSn data acquisition strategy. A total of 15,000 bioactive compounds and natural products were analyzed within 10 days in both positive and negative ion modes (3 minutes for a mix of 10 compounds). Preliminary results gave an automatic annotation rate of over 70% of the compounds, yielding a mean of 60 MSn spectra per compound for various ion adducts and fragmentation energies. We anticipate this unique high-quality MSn tree library will accelerate the development of annotation tools and new machine learning-based tools.
Ref: Schmid, R., Heuckeroth, S., Korf, A. et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01690-2