General Computational Biology

Schedule subject to change
All times listed are in CEST
Thursday, July 27th
Proceedings Presentation: PlasmoFAB: A Benchmark to Foster Machine Learning for Plasmodium falciparum Protein Antigen Candidate Prediction
Room: Lumière Auditorium
Format: Live from venue

  • Jonas Christian Ditz, University of Tübingen, Germany
  • Jacqueline Wistuba-Hamprecht, University of Tübingen, Germany
  • Timo Maier, University of Tübingen, Germany
  • Rolf Fendel, University of Tübingen, Germany
  • Nico Pfeifer, Department of Computer Science, University of Tübingen, Germany
  • Bernhard Reuter, Department of Computer Science, University of Tübingen, Germany

Motivation: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite Plasmodium falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines, which are needed for fighting and controlling malaria.

Results: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of Plasmodium falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for Plasmodium falciparum specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by our models that were trained on this tailored data.

hECA: Human Ensemble Cell Atlas as a Virtual Body for “In Data” Cellular Experiments
Room: Lumière Auditorium
Format: Live from venue

  • Sijie Chen, Tsinghua University, China
  • Lei Wei, Tsinghua University, China
  • Xuegong Zhang, Tsinghua University, China

Profiling the molecular features of all cells with their anatomical and functional attributes is essential for understanding the human body in health and disease. Scientists have been enthusiastic about building such atlases of human cells using single-cell omics technologies. The community has conducted more and more single-cell studies with the rapid development and popularization of single-cell RNA-sequencing technologies. A tremendous amount of single-cell data has been accumulating in the public domain. This suggests the possibility of building cell atlases by assembling such “shot-gun” data from scattered publications. Cell atlas assembly faces several major challenges compared with the shot-gun assembly of the human genome. We proposed a unified information framework for assembling atlases and built the first cell-centric human Ensemble Cell Atlas (hECA) assembled from scattered data. We developed the “in data” cell sorting scheme, which allows cells to be extracted from hECA using logic formulas, treating the atlas as a “virtual human body” to investigate scientific questions involving multiple organs and cell types. We also developed UniCoord, a multidimensional coordinate system for different physical and biological attributes of cells, by adopting a supervised variational autoencoder (VAE) neural network model, and trained it on hECA to make it represent the diversity of healthy human cells.
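
The idea of “in data” cell sorting, selecting cells by a logic formula over their annotations and expression values, can be illustrated with a minimal Python sketch. The record fields (`organ`, `cell_type`, `expr`), the toy atlas, and the helper `sort_cells` are hypothetical and chosen for illustration only; they are not hECA's actual interface.

```python
from typing import Any, Callable, Dict, List

Cell = Dict[str, Any]

def sort_cells(cells: List[Cell], formula: Callable[[Cell], bool]) -> List[Cell]:
    """Return the cells for which the logic formula evaluates to True."""
    return [cell for cell in cells if formula(cell)]

# Toy "atlas": each record carries annotations plus a few expression values.
# All values are made up for this example.
ATLAS: List[Cell] = [
    {"id": "c1", "organ": "lung", "cell_type": "T cell", "expr": {"CD3E": 5.0, "PTPRC": 3.1}},
    {"id": "c2", "organ": "liver", "cell_type": "hepatocyte", "expr": {"ALB": 9.2}},
    {"id": "c3", "organ": "liver", "cell_type": "T cell", "expr": {"CD3E": 0.2, "PTPRC": 2.0}},
    {"id": "c4", "organ": "heart", "cell_type": "fibroblast", "expr": {"COL1A1": 7.5}},
]

def cd3e_positive_outside_heart(cell: Cell) -> bool:
    """Logic formula: (organ != heart) AND (CD3E expression > 1.0)."""
    return cell["organ"] != "heart" and cell["expr"].get("CD3E", 0.0) > 1.0

selected = sort_cells(ATLAS, cd3e_positive_outside_heart)
selected_ids = [cell["id"] for cell in selected]
print(selected_ids)
```

Because the formula is an ordinary predicate, conditions on organ, cell type, and marker expression can be combined freely with `and`, `or`, and `not`.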

Cell reference atlas for transcriptional alterations of Mouse Trigeminal Ganglion Neurons revealed by Single-Cell Analysis
Room: Lumière Auditorium
Format: Live-stream

  • Gerda Cristal Villalba Silva, Baylor College of Medicine, Brazil
  • Jin Li, Baylor College of Medicine, China
  • Rui Chen, Baylor College of Medicine, United States

Trigeminal neurons play a crucial role in the perception of pain. Single-cell transcriptomics provides a promising avenue to decipher the underlying mechanisms of pain. However, a reference atlas is still missing. To fill this gap, we collected the most comprehensive datasets and performed a meta-analysis. We used four publicly available datasets for trigeminal tissues generated with different technologies: three Drop-seq datasets and one from 10x Genomics. Raw sequencing reads were aligned to the mm10 reference genome using CellRanger. Standardized quality control was performed to exclude estimated empty droplets, ambient RNA, and doublets using dropkick, SoupX, and DoubletFinder, respectively. The processed datasets were integrated with scVI to reduce sample effects. We clustered the cells using the Leiden algorithm, visualized the clusters with UMAP, and annotated them using known cell type marker genes. We obtained 44,770 cells in 16 clusters, consisting of 6 neuron populations, 3 glial, 4 immune, fibroblasts, and endothelial cells. To facilitate public use of the generated atlas, we provide an automated cell type annotation utility based on scArches, and the reference is visualized with CELLxGENE. This atlas may serve as a valuable data resource for the mouse trigeminal community.

Accurately clustering enormous numbers of sequences with Clusterize
Room: Lumière Auditorium
Format: Live from venue

  • Erik Wright, University of Pittsburgh, United States

Exponential growth in the volume of biological sequences presents ever-increasing bioinformatics challenges. Clustering is often a first step in bioinformatics workflows to reduce sequences to more manageable numbers. Therefore, clustering presents a scalability bottleneck that is constrained by time and memory limitations. Here, I describe the development of Clusterize, a novel method for accurately clustering with linear time and memory complexity. The Clusterize algorithm linearizes the clustering problem through a process termed relatedness sorting. After linear-time relatedness sorting, each sequence only needs to be compared to a fixed number of nearby sequences in the ordering and is clustered with one of them if it falls within the similarity threshold. I compare the performance of Clusterize with that of popular clustering programs, including CD-HIT, MMseqs, and UCLUST. Clusterize is able to quickly and accurately cluster tens of millions of homologous sequences, such as 16S amplicons, and non-homologous sequences, such as the UniProt database. Clusterize is far more accurate than another linear-time clustering algorithm, Linclust, on many typical clustering tasks. Overall, Clusterize represents a novel approach for scaling clustering to new heights while assisting with the continual flood of biological sequences. Clusterize is part of the DECIPHER package for R available from the Bioconductor package repository.
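
The general "sort first, then compare only to nearby neighbours" idea can be sketched in a few lines of Python. This is not the Clusterize algorithm: a plain lexicographic sort stands in for relatedness sorting (and is not linear-time), k-mer Jaccard similarity stands in for the real sequence similarity measure, and the names `cluster_sorted`, `window`, and `threshold` are assumptions made for this illustration.

```python
def kmers(seq, k=3):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (a simple stand-in measure)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

def cluster_sorted(seqs, threshold=0.5, window=2, k=3):
    """Cluster sequences by comparing each one only to the `window`
    sequences that precede it in a sorted ordering."""
    # Crude stand-in for relatedness sorting: lexicographic order.
    order = sorted(range(len(seqs)), key=lambda i: seqs[i])
    labels = [-1] * len(seqs)
    next_label = 0
    for pos, i in enumerate(order):
        # At most `window` comparisons per sequence.
        for j in order[max(0, pos - window):pos]:
            if similarity(seqs[i], seqs[j], k) >= threshold:
                labels[i] = labels[j]
                break
        if labels[i] == -1:  # no similar neighbour found: start a new cluster
            labels[i] = next_label
            next_label += 1
    return labels

seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTGGCCAATT", "TTGGCCAATG", "GGGGGGGGGG"]
labels = cluster_sorted(seqs)
print(labels)
```

With a fixed `window`, the number of pairwise comparisons grows linearly with the number of sequences; the quality of the result then depends entirely on how well the ordering places related sequences next to each other.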

Machine learning guides identification of virus antigen specificity based on deep T cell phenotypic profiles
Room: Lumière Auditorium
Format: Live from venue

  • Florian Schmidt, ImmunoScape Pte Ltd, Singapore
  • Hannah Fields, ImmunoScape Pte Ltd, United States
  • Yovita Purwanti, ImmunoScape Pte Ltd, Singapore
  • Ana Milojkovic, ImmunoScape Pte Ltd, Singapore
  • Syazwani Salim, ImmunoScape Pte Ltd, Singapore
  • Kan Xing Wu, ImmunoScape, Singapore
  • Yannick Simoni, ImmunoScape Pte Ltd, Singapore
  • Antonella Vitiello, ImmunoScape Pte Ltd, United States
  • Dan McLeod, ImmunoScape Pte Ltd, United States
  • Alessandra Nardin, ImmunoScape Pte Ltd, Singapore
  • Evan Newell, ImmunoScape Pte Ltd, Singapore
  • Katja Fink, ImmunoScape Pte Ltd, Singapore
  • Andreas Wilm, ImmunoScape Pte Ltd, Singapore
  • Michael Fehlings, ImmunoScape Pte Ltd, Singapore

Following viral infection, the human immune system generates broad and dynamic CD8+ T cell responses to virus antigens. Characterizing such T cell responses makes it possible to understand infection history and its contribution to protective immunity.

We performed in-depth profiling of CD8+ T cells reactive to CMV, EBV and Influenza virus derived antigens in peripheral blood samples from 114 healthy donors and 55 cancer patients using high-dimensional mass cytometry with combinatorial barcoding of peptide-MHC-I multimers and subsequent single cell RNA sequencing/VDJ-CITE-Seq for phenotypes and TCR repertoire analysis of identified antigen-specificities.

We analysed the expression of up to 138 surface markers from more than 500 antigen-specific T cell responses across six different HLA alleles by applying multiple machine learning approaches. Our data revealed unique phenotypic signatures of T cells specific for antigens from different virus categories. Based on these signatures, we built an ML approach to predict virus specificity from bulk CD8+ T cells. We validated our prediction capabilities in silico using an independent sample cohort and in vitro by TCR expression in a Jurkat reporter assay. Our data suggest that machine learning can be used as a statistically rigorous and unbiased way to accurately predict antigen specificity from T cell phenotypes.

Proceedings Presentation: Privacy Preserving Population Stratification for Collaborative Genomic Research
Room: Lumière Auditorium
Format: Live-stream

  • Leonard Dervishi, Case Western Reserve University, United States
  • Wenbiao Li, Case Western Reserve University, United States
  • Anisa Halimi, IBM Research Europe, Ireland
  • Xiaoqian Jiang, UTHealth at Houston, United States
  • Jaideep Vaidya, Rutgers University, United States
  • Erman Ayday, Case Western Reserve University, United States

The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying genetic differences among individuals that are due to subpopulations. One of the common methods used to group genomes of individuals based on ethnicity is principal component analysis (PCA). In this paper, we propose a framework to perform population stratification using PCA across multiple collaborators in a privacy-preserving way. In our proposed client-server-based scheme, we initially let the server train a global PCA model on a publicly available genomic dataset which contains individuals from multiple populations. The global PCA model is later used by each collaborator (client) to reduce the dimensionality of its local data. After adding noise to achieve local differential privacy (LDP), the collaborators send metadata (in the form of their local PCA outputs) about their research datasets to the server, which then aligns the local PCA results to identify the genetic differences among collaborators' datasets. Our results on real genomic data show that the proposed framework can perform population stratification with high accuracy while preserving the privacy of the research participants.
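
The client-side step described above (project local data with the global PCA model, then add noise before sharing) can be sketched as follows. This is an illustrative sketch, not the authors' protocol: the clipping bound, the use of the Laplace mechanism, and the names `project`, `privatize`, `GLOBAL_MEAN`, and `GLOBAL_COMPONENTS` are assumptions, and the toy numbers are invented.

```python
import random

def project(sample, mean, components):
    """Project one genotype vector onto the global principal components."""
    centered = [x - m for x, m in zip(sample, mean)]
    return [sum(c * v for c, v in zip(comp, centered)) for comp in components]

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize(projection, epsilon, bound, rng):
    """Clip each coordinate to [-bound, bound] and add Laplace noise.

    After clipping, two projections differ by at most 2 * bound * d in
    L1 norm, so a Laplace scale of (2 * bound * d) / epsilon gives
    epsilon-local differential privacy for the released vector.
    """
    clipped = [max(-bound, min(bound, p)) for p in projection]
    scale = 2.0 * bound * len(clipped) / epsilon
    return [p + laplace_noise(scale, rng) for p in clipped]

# Hypothetical global model published by the server (4 SNPs, 2 components).
GLOBAL_MEAN = [1.0, 0.5, 1.0, 0.5]
GLOBAL_COMPONENTS = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]

local_sample = [2, 0, 1, 1]  # minor-allele counts for one individual
projection = project(local_sample, GLOBAL_MEAN, GLOBAL_COMPONENTS)
noisy = privatize(projection, epsilon=1.0, bound=2.0, rng=random.Random(0))
print(projection, noisy)
```

A smaller `epsilon` gives stronger privacy but noisier outputs, which is the trade-off the server-side alignment has to tolerate.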

Proceedings Presentation: UNADON: Transformer-based model to predict genome-wide chromosome spatial position
Room: Lumière Auditorium
Format: Live from venue

  • Muyu Yang, Carnegie Mellon University, United States
  • Jian Ma, Carnegie Mellon University, United States

The spatial positioning of chromosomes relative to functional nuclear bodies is intertwined with genome functions such as transcription, but the sequence patterns and epigenomic features that collectively influence chromatin spatial positioning in a genome-wide manner are not well understood. Here, we develop a new transformer-based deep learning model, called UNADON, that predicts the genome-wide cytological distance to a specific type of nuclear body, as measured by TSA-seq, using both sequence features and epigenomic signals. Evaluations of UNADON in four cell lines (K562, H1, HFFc6, HCT116) show high accuracy in predicting chromatin spatial positioning to nuclear bodies when trained on a single cell line. UNADON also performed well in an unseen cell type. Importantly, we reveal potential sequence and epigenomic factors that affect large-scale chromatin compartmentalization to nuclear bodies. Together, UNADON provides new insights into the principles between sequence features and large-scale chromatin spatial localization, which has important implications for understanding nuclear structure and function.

Proceedings Presentation: The impossible challenge of estimating non-existent moments of the Chemical Master Equation
Room: Lumière Auditorium
Format: Live from venue

  • Vincent Wagner, University of Stuttgart, Germany
  • Nicole Radde, University of Stuttgart, Germany

Motivation: The Chemical Master Equation (CME) is a set of linear differential equations that describes the evolution of the probability distribution on all possible configurations of a (bio-)chemical reaction system. Since the number of configurations, and therefore the dimension of the CME, rapidly increases with the number of molecules, its applicability is restricted to small systems. A widely applied remedy for this challenge is the use of moment-based approaches, which consider the evolution of the first few moments of the distribution as summary statistics for the complete distribution.
Here, we investigate the performance of two moment-estimation methods for reaction systems whose equilibrium distributions encounter heavy-tailedness and hence do not possess statistical moments.
Results: We show that estimation via Stochastic Simulation Algorithm (SSA) trajectories loses consistency over time and that estimated moment values span a wide range even for large sample sizes. In comparison, the Method of Moments returns smooth moment estimates but is not able to indicate the non-existence of the allegedly predicted moments. We furthermore analyze the negative effect of a CME solution's heavy-tailedness on SSA run times and explain inherent difficulties.
While moment estimation techniques are a commonly applied tool in the simulation of (bio-)chemical reaction networks, we conclude that they should be used with care, as neither the system definition nor the moment estimation techniques themselves reliably indicate the potential heavy-tailedness of the CME's solution.
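
Why sample-based moment estimates become unreliable when a moment does not exist can be illustrated with a generic heavy-tailed distribution. The sketch below is an assumption-laden toy example: it uses Pareto samples rather than SSA trajectories of a reaction network, and the function names are hypothetical. For shape parameter alpha <= 1 the Pareto distribution has no finite mean, whereas for alpha = 3 the mean is 1.5.

```python
import random

def running_mean(samples):
    """Return the sample mean after each additional observation."""
    means, total = [], 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means

def pareto_samples(alpha, n, seed):
    """Draw n Pareto(alpha) samples (support x >= 1) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]

def final_means(alpha, n, seeds):
    """Final sample mean for several independent runs."""
    return [running_mean(pareto_samples(alpha, n, s))[-1] for s in seeds]

# Heavy tail (alpha = 0.5, no finite mean) versus lighter tail (alpha = 3.0).
heavy = final_means(alpha=0.5, n=10_000, seeds=range(5))
light = final_means(alpha=3.0, n=10_000, seeds=range(5))
print("heavy-tailed runs:", heavy)
print("light-tailed runs:", light)
```

Comparing the spread of the two printed lists across seeds gives an informal picture of the behaviour described in the abstract: a sample mean is always computable, but it need not estimate anything when the underlying moment does not exist.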

Leveraging the Genetic Correlation between Traits Improves the Detection of Epistasis in Genome-wide Association Studies
Room: Lumière Auditorium
Format: Live from venue

  • Julian Stamp, Brown University, United States
  • Alan Denadel, Brown University, United States
  • Daniel Weinreich, Brown University, United States
  • Lorin Crawford, Microsoft Research New England, United States

In this study, we present the “multivariate MArginal ePIstasis Test” (mvMAPIT), a multi-outcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis, that is, the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of the correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized GWA studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and to approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics.

Proceedings Presentation: CellBRF: a feature selection method for single-cell clustering using cell balance and random forest
Room: Lumière Auditorium
Format: Live from venue

  • Yunpei Xu, Central South University, China
  • Hong-Dong Li, Central South University, China
  • Cui-Xiang Lin, Central South University, China
  • Ruiqing Zheng, Central South University, China
  • Yaohang Li, Old Dominion University, China
  • Jinhui Xu, State University of New York at Buffalo, United States
  • Jianxin Wang, Central South University, China

Motivation: Single cell RNA-sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods do not make full use of the information of cell-type-discriminating ability of genes. We hypothesize that incorporating such information could further boost the performance of single cell clustering.

Results: We develop CellBRF, a feature selection method that considers genes’ relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it proposes a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on thirty-three scRNA-seq datasets representing diverse biological scenarios, and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the strong performance of our selected features with three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy.

Availability and implementation: All source codes of CellBRF are freely available at

Inferring Sex-Specific Genetic Signal in Hypertension by Gene-Based Association Methods on UK-Biobank Data
Room: Lumière Auditorium
Format: Live from venue

  • Roei Zucker, Hebrew University of Jerusalem, Israel
  • Michael Kovalerchik, Hebrew University of Jerusalem, Israel
  • Michal Linial, The Hebrew University of Jerusalem, Israel

Hypertension is a polygenic disease that affects over 1.2 billion adults aged 30–79 worldwide. It is a major risk factor for renal, cerebrovascular, and cardiovascular diseases. The heritability of hypertension is estimated to be high; nevertheless, the understanding of the underlying mechanisms remains scarce and incomplete. Using a novel method called PWAS (proteome-wide association study) on participants from the UK Biobank (UKB), we discovered 70 statistically significant associated genes, most of which failed to reach significance by the routine GWAS, which is variant-based. Our findings were validated against independent cohorts, including the Finnish Biobank, and confirmed a substantial fraction of the PWAS hypertension-associated genes. The gene-based analyses that were performed on both sexes separately revealed a sex-dependent genetic signal with a stronger component associated with females. Analysis of the measurements for systolic and diastolic blood pressure for the entire UKB cohort confirmed the dominant genetic contribution for females. In this study, we will demonstrate the advantage of applying gene-based association methods over the classical GWAS in interpretability and in identifying sex-specific genetic signals as a lead towards mechanistic understanding of hypertension and related phenotypes.

Genetic interactions between translesion DNA synthesis enzymes in cancer
Room: Lumière Auditorium
Format: Live from venue

  • Aleix Bayona-Feliu, Institute for Research in Biomedicine (IRB BARCELONA), Spain
  • Marcos Moreno, Institute for Research in Biomedicine (IRB BARCELONA), Spain
  • Sergio Marín-Edo, Institute for Research in Biomedicine (IRB BARCELONA), Spain
  • Marcel McCullough, Institute for Research in Biomedicine (IRB BARCELONA), Spain
  • Miguel-Martín Álvarez, Institute for Research in Biomedicine (IRB BARCELONA), Spain
  • Fran Supek, Institute for Research in Biomedicine (IRB BARCELONA), Spain

Cancer-associated mutagenesis is accelerated by genome instability (GIN), a hallmark of cancer cells. GIN promotes cell transformation by altering cancer driver genes, and it may also facilitate the adaptation of tumor cells to therapy. The DNA repair machinery plays a central role in preventing DNA damage and GIN, and indeed various DNA repair pathways are often deficient in cancer. One of these pathways is translesion DNA synthesis (TLS), which allows replication forks to bypass bulky DNA lesions, preventing gross chromosomal rearrangements but at the cost of introducing point mutations and indels. TLS is a conserved mechanism that diversified during evolution, leading to multiple TLS enzymes with different but overlapping specificities for DNA lesions in human cells. Given their common ancestry, however, TLS polymerases perform redundant functions and can substitute for each other, which makes them very difficult to study using single knock-outs (KOs). Here, we develop a double knock-out strategy using CRISPR-Cas12 to analyze pairwise deletions of TLS enzymes in human cells, and unveil interactions between them under a wide range of commonly used chemotherapeutic agents in lung adenocarcinoma and ovarian carcinoma cell models. Understanding genetic interactions between DNA repair enzymes under particular therapies may reveal new strategies to specifically target cancer cells.

A Fair Experimental Comparison of Neural Network Architectures for Latent Representations of Multi-Omics for Drug Response Prediction
Room: Lumière Auditorium
Format: Live from venue

  • Tony Hauptmann, Johannes Gutenberg University of Mainz, Germany
  • Stefan Kramer, Johannes Gutenberg University of Mainz, Germany

Recent years have seen a surge of novel neural network architectures for multi-omics integration. One important parameter is the integration depth: the point at which the latent representations are computed or merged, which can be early, intermediate, or late. The literature on integration methods grows steadily; however, almost nothing is known about the relative performance of these methods under fair experimental conditions and across different use cases. We developed a comparison framework that trains multi-omics integration methods under equal conditions. We incorporated four recent deep learning methods, early integration, PCA, and a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a drug response data set with multiple omics data. Our experiments confirmed that early integration has the lowest predictive performance. Overall, statistically significant differences were rarely observed; in terms of the average ranks of methods, however, Super.FELT performed best in a cross-validation setting and Omics Stacking performed best on the external test set. When faced with a new data set, Super.FELT is therefore a good option in the cross-validation setting, and Omics Stacking in the external test set setting.

Pharmaceutical patent landscaping: A novel approach to understand patents from the drug discovery perspective
Room: Lumière Auditorium
Format: Live from venue

  • Yojana Gadiya, University of Bonn, Germany
  • Philip Gribbon, Fraunhofer ITMP, Germany
  • Martin Hofmann-Apitius, Fraunhofer SCAI, Germany
  • Andrea Zaliani, Fraunhofer ITMP, Germany

Patents play a crucial role in the drug discovery process by providing legal protection for discoveries and incentivising investments in research and development. By identifying patterns within patent data resources, researchers can gain insight into the market trends and priorities of the pharmaceutical industries, as well as additional perspectives on more fundamental aspects such as the emergence of potential new drug targets. In this paper, we used the PEMT to integrate and analyse patent literature for rare diseases (RD) and Alzheimer's disease (AD). This is followed by a systematic review of the underlying patent landscape to decipher trends and applications in patents. We start by discussing organisations involved in R&D in AD and RD. This allows us to gain an understanding of the importance of AD and RD from specific organisational perspectives. Next, we analysed the historical focus of patents for therapeutic targets and correlated them with market scenarios, allowing the identification of prominent targets for a disease. Lastly, we identified repurposed drugs within the two diseases with the help of patents. The study demonstrates the expanded applicability of patent documents from legal to drug discovery, design, and research, thus providing a valuable resource for future drug discovery efforts.

CARMEN: a pan-HLA and pan-cancer proteogenomic database on antigen presentation to support cancer immunotherapy
Room: Lumière Auditorium
Format: Live from venue

  • Ashwin Adrian Kallor, International Center for Cancer Vaccine Science, University of Gdansk, Poland
  • Michał Waleron, International Center for Cancer Vaccine Science, University of Gdansk, Poland
  • Patricia Eugenio, LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  • Catia Pesquita, LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  • Daniel Faria, LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  • Fabio Massimo Zanzotto, University of Rome Tor Vergata, Rome, Italy
  • Christophe Battail, French Alternative Energies and Atomic Energy Commission (CEA), Paris, France
  • Ajitha Rajan, University of Edinburgh, Edinburgh, United Kingdom
  • Javier Alfaro, International Center for Cancer Vaccine Science, University of Gdansk, Poland

Cancer immunotherapy has greatly improved the quality of life of cancer patients, and it hinges on the discovery of novel cancer antigens that could be targeted to improve disease outcomes. We have developed a pan-cancer, pan-HLA, and pan-tissue database containing immunopeptidomics data mapped to transcriptomic, genomic, immunological and biochemical data. The database was generated from 77 different publicly available immunopeptidomics mass spectrometry datasets collected between 2015 and 2022 (73 cancer and 4 normal datasets), covering 15 different types of cancers and 152 different HLA-I alleles. The peptides contained in our database were obtained by a combination of closed, open and de novo searches using an in-house developed computational pipeline. Following rigorous false discovery rate (FDR) estimation at 1% and a second-round search to eliminate any false signals that may not have been detected in the previous round of FDR estimation, we obtained a list of 11.2 million peptide-HLA combinations comprising both coding and non-coding regions of the genome as well as bacterial peptides. These peptides have been mapped to chromosomal coordinates to facilitate adoption of this resource on antigen presentation by the genomics community. Our database includes a FAIR knowledge graph which contextualizes and enriches the data.

Integrative Multi-Omics Analysis Reveals Novel Immune Subtypes of Colorectal Cancer
Room: Lumière Auditorium
Format: Live-stream

  • Chuling Hu, Department of General Surgery (Department of Colorectal Surgery), The Sixth Affiliated Hospital, Sun Yat-sen University, China
  • Du Cai, The Sixth Affiliated Hospital, Sun Yat-sen University, China
  • Weiqiang You, The Sixth Affiliated Hospital, Sun Yat-sen University, China
  • Junwei Liu, Guangzhou Laboratory, China
  • Cheng-Hang Li, The Hong Kong University of Science and Technology, China
  • Min-Yi Lv, The Sixth Affiliated Hospital, Sun Yat-sen University, China
  • Bao-Wen Gai, The Sixth Affiliated Hospital, Sun Yat-sen University, China
  • Jiaxin Lei, School of Medicine, Shenzhen Campus of Sun Yat-Sen University, China
  • Run Xian Wang, The Fifth Affiliated Hospital of Sun Yat-sen University, China
  • Xiao-Jian Wu, The Sixth Affiliated Hospital, Sun Yat-sen University, China
  • Feng Gao, The Sixth Affiliated Hospital, Sun Yat-sen University, China

Background: Heterogeneity of tumor immune microenvironment accounts for differential prognosis and immunotherapy response among colorectal cancer (CRC) patients. Here, we developed novel immune subtypes through integrative multi-omics analysis to characterize CRC heterogeneity.

Methods: Immune-related gene expression, mutation, and methylation profiles were collected from TCGA (n = 627) to perform multi-omics factor analysis (MOFA) and establish the Multi-Omics Tumor Immune Features Clustering of CRC (MotifCC). Transcriptomic, genomic, and epigenetic landscapes were analyzed to characterize differences among MotifCC clusters. Independent validation of MotifCC was performed in our large-scale, in-house COCC cohort (Clinical Omics Study of Colorectal Cancer in China, n = 1001).

Results: The three MotifCC clusters showed distinct characteristics. Cluster1 was a high immune and stroma infiltration subtype with the worst prognosis. Cluster2 was characterized by low immune infiltration, high metabolic intensity, and the best survival. With medium immune infiltration and intermediate prognosis, Cluster3 was distinguished by the highest methylation and stemness status. In addition, Cluster3 was associated with a better response to immunotherapy despite its intermediate prognosis.

Conclusion: We established the MotifCC, novel immune subtypes capturing the multi-omics heterogeneity of CRC and facilitating patient stratification for immunotherapy.

ProsperousPlus: An integrated platform for protease-specific substrate cleavage prediction and machine learning model construction of more than 100 proteases
Room: Lumière Auditorium
Format: Live-stream

  • Fuyi Li, Northwest A&F University, China
  • Jiangning Song, Monash University, Australia

Proteases play a crucial role in various cellular processes, and the precise cleavage of substrates by proteases is essential for these processes to occur correctly. Accurately predicting substrate cleavage sites is a crucial step in understanding protease function and substrate specificity. Many bioinformatics methods have been developed to predict protease-specific substrate cleavage sites. However, with the development of mass spectrometry technology, an enormous amount of protease substrate cleavage data has been generated and will continue to grow. Consequently, it is neither efficient nor practical to train new models on these rapidly accumulating data every year and update the prediction server for the wider community. In this study, we developed a multi-faceted, versatile bioinformatics tool, termed ProsperousPlus, that enables fast, accurate and high-throughput prediction of substrate cleavage sites for 110 proteases. Benchmarking tests show that ProsperousPlus achieves competitive predictive performance compared with state-of-the-art approaches. Furthermore, ProsperousPlus helps users without a programming background to build their own customised in-house models and meet specific needs. It is anticipated that researchers with little bioinformatics expertise will be able to efficiently use rapidly accumulating substrate cleavage data to train in-house prediction models to meet their specific requirements.

Variant impact based patient similarity networks for cancer subtype analysis
Room: Lumière Auditorium
Format: Live from venue

  • Hakime Öztürk, DKFZ, Germany
  • Nagarajan Paramasivam, DKFZ, Germany
  • Simon Kreutzfeldt, NCT, Germany
  • Peter Horak, NCT, Germany
  • Christoph Heilig, NCT, Germany
  • Stefan Fröhling, NCT, Germany
  • Daniel Huebschmann, DKFZ, Germany
  • Oliver Stegle, DKFZ, Germany

Computational methods that decipher rare and private somatic changes can provide critical insights into the underlying mechanisms of cancer development and progression. Identifying potential cancer subtypes that might be associated with diverse biological responses is a key first step to define target therapeutics.

Machine and deep learning (ML/DL) methods that use clinical and/or multi-omics data have been adopted for the identification of cancer subtypes. There also exists a growing collection of sequence-based ML/DL models that accurately predict different epigenetic traits (e.g. transcription factor binding) and allow the impact of individual somatic aberrations to be estimated. Applying sequence-based ML/DL on a genome-wide scale makes it possible to augment somatic mutations with a model-based view that captures functionally relevant differences between individuals.

In this study, we adopt SEI, a sequence-based DL model that is trained to predict more than 21K different regulatory activities, to obtain mutation impact embeddings. We first identify mutations with strong impacts through investigating clusters of alternative and reference sequence embeddings. Then, mutation impact embeddings are utilized to generate a patient similarity network (PSN) for unsupervised identification of patient subgroups. The proposed approach provides a novel strategy of utilizing variant impact scores in PSNs for cancer subtyping.
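
How a patient similarity network might be built from per-patient embedding vectors can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the two-dimensional toy embeddings, the use of cosine similarity, the k-nearest-neighbour rule, and the names `cosine` and `knn_network` are all hypothetical choices for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_network(embeddings, k=1):
    """Undirected edge set linking each patient to its k most similar patients."""
    edges = set()
    for patient, vec in embeddings.items():
        scored = sorted(
            ((cosine(vec, other), q) for q, other in embeddings.items() if q != patient),
            reverse=True,
        )
        for _, neighbour in scored[:k]:
            edges.add(tuple(sorted((patient, neighbour))))
    return edges

# Toy per-patient embeddings (e.g. an aggregate of mutation impact embeddings).
patient_embeddings = {
    "P1": [1.0, 0.0],
    "P2": [0.9, 0.1],
    "P3": [0.0, 1.0],
    "P4": [0.1, 0.9],
}

network = knn_network(patient_embeddings, k=1)
print(sorted(network))
```

Connected components or a community detection step on such a network would then give candidate patient subgroups.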