Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


General Computational Biology

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
All times listed are in UTC
Thursday, July 29th
11:00-11:20
Unified Methods for Feature Selection in Large-Scale Genomic Studies with Censored Survival Outcomes
Format: Pre-recorded with live Q&A

Moderator(s): Xuegong Zhang

  • Lauren Spirko-Burns, Temple University, United States
  • Karthik Devarajan, Fox Chase Cancer Center, United States

Presentation Overview:

One of the major goals of large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, as such genes can provide insight into the disease process. With rapid developments in high-throughput genomic technologies over the past two decades, researchers can monitor the expression levels of tens of thousands of genes and proteins, producing enormous datasets in which the number of genomic features far exceeds the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcomes; however, the Cox model assumes proportional hazards (PH), an assumption that is unlikely to hold for every feature. When applied to genomic features exhibiting some form of non-proportional hazards (NPH), these methods can under- or over-estimate the effects. We propose a broad array of marginal screening techniques that aid feature ranking and selection by accommodating various forms of NPH. We evaluate the performance of our measures using extensive simulation studies and publicly available cancer genomics datasets, and demonstrate that the proposed methods successfully address NPH in genomic feature selection and outperform existing methods.
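For context, the conventional baseline that this abstract contrasts against can be sketched as a univariate Cox score test computed feature by feature. The Python below is an illustrative sketch only, not the authors' NPH-aware screening methods: it assumes a samples-by-features NumPy array, handles ties Breslow-style at β = 0, and the function names `cox_score_statistic` and `rank_features` are hypothetical.

```python
import numpy as np


def cox_score_statistic(x, time, event):
    """Univariate Cox score-test statistic at beta = 0 for one feature.

    Ties are handled Breslow-style: every subject with time >= t_i is
    counted as at risk at event time t_i.
    """
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    score, info = 0.0, 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]                    # risk set at this event time
        x_risk = x[at_risk]
        mean = x_risk.mean()
        score += x[i] - mean                         # observed minus expected covariate
        info += (x_risk ** 2).mean() - mean ** 2     # risk-set variance of x
    return score ** 2 / info if info > 0 else 0.0


def rank_features(X, time, event):
    """Rank the columns of X (samples x features) by decreasing score statistic."""
    scores = np.array([cox_score_statistic(X[:, j], time, event)
                       for j in range(X.shape[1])])
    return np.argsort(-scores), scores


# Hypothetical toy data: feature 0 tracks survival time, feature 1 does not.
X = np.column_stack([[6, 5, 4, 3, 2, 1], [0, 1, 0, 1, 0, 1]])
order, scores = rank_features(X, np.arange(1, 7), np.ones(6))
```

Because this statistic is derived under proportional hazards, a feature whose effect changes direction during follow-up can receive a small score even when it is strongly prognostic, which is the failure mode the proposed methods target.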

11:20-11:40
Revealing cell-to-cell variability changes in the aging immune cells by applying accurate gene expression variability metric
Format: Pre-recorded with live Q&A

Moderator(s): Xuegong Zhang

  • Atefeh Taherian Fard, University of Queensland, Australia
  • Jessica Mar, The University of Queensland, Australia
  • Huiwen Zheng, The University of Queensland, Australia
  • Xiao Dong, Department of Genetics, Albert Einstein College of Medicine, United States
  • Jan Vijg, Department of Genetics, Albert Einstein College of Medicine, United States

Presentation Overview:

During ageing, cell-to-cell variability has been shown to increase in multiple organs and tissues, reflecting the heterogeneity that results from stochastic cell-to-cell variation. Although the concept of cell-to-cell variability is not new, different metrics are used to measure it, and it is unclear which approach is optimal.

We conducted a systematic evaluation of 12 commonly applied metrics to assess their performance on both simulated and experimentally derived scRNA-seq datasets. We investigated how these metrics perform with respect to data structure, stably expressed genes and other properties. The best-performing metric was then applied to publicly available aging datasets to investigate cell-to-cell variability changes in immune cells. As a result, we identified lists of differentially variable genes within and across cell types that showed distinct biological functions in aging. We also identified cell-type-specific transcription factors and then constructed regulatory networks based on them for different age groups. The connectivity of co-expressed genes in these networks changed significantly, and the direction of change matched the cell-to-cell variability changes in aging immune cells. Through these analyses, we highlight the importance of capturing cell-to-cell variability in complex biological processes, and its specificity at the cell-type level.
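The abstract does not list the 12 metrics it compares. As a hedged illustration, the sketch below computes three widely used per-gene variability summaries (squared coefficient of variation, Fano factor, and standard deviation of log1p-transformed expression); the choice of metrics and the function name `variability_metrics` are assumptions, and none of them is claimed to be the best-performing metric identified in the talk.

```python
import numpy as np


def variability_metrics(counts):
    """Per-gene variability summaries for a cells x genes expression matrix."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)                       # sample variance across cells
    with np.errstate(divide="ignore", invalid="ignore"):
        cv2 = np.where(mean > 0, var / mean ** 2, np.nan)  # squared coefficient of variation
        fano = np.where(mean > 0, var / mean, np.nan)      # variance-to-mean ratio
    sd_log = np.log1p(counts).std(axis=0, ddof=1)          # spread on a log scale
    return {"cv2": cv2, "fano": fano, "sd_log": sd_log}


# Hypothetical example: gene 0 is constant, gene 1 varies across three cells.
metrics = variability_metrics([[5, 1], [5, 3], [5, 5]])
```

Comparing such a metric gene by gene between young and old cells of the same type is one simple way to shortlist candidate differentially variable genes.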

11:40-12:00
Unlocking insights into cellular senescence through single cell transcriptomics of ageing mesenchymal stem cells
Format: Pre-recorded with live Q&A

Moderator(s): Xuegong Zhang

  • Atefeh Taherian Fard, Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Australia
  • Jessica Mar, Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Australia

Presentation Overview:

Cellular senescence protects against cancer and is also involved in other fundamental biological processes such as development, tissue repair and ageing. A clear understanding of the molecular mechanisms that define the progression of senescence is critical for identifying new therapeutic strategies for age-related diseases. Recent advances in single-cell (sc) technologies have helped to elucidate the regulatory mechanisms and modulators of individual cells, and applying these technologies has the potential to unlock insights into cellular senescence across different tissues and cell types. Here, for the first time, scRNA-seq data were generated to investigate the gene expression heterogeneity of mesenchymal stem cells (MSCs) undergoing replicative senescence. We computationally characterised MSC sub-populations at different stages of the cell cycle, compared the transcriptional profiles of cells transitioning from a proliferative to a senescent state, and identified the key factors driving this transition. We found at least three different senescent phenotypes in ageing MSCs. Using novel computational methods and statistical approaches for scRNA-seq data analysis, we identified senescent phenotypes linked to the senescence-associated secretory phenotype (SASP) and to oncogene- and SASP-induced senescence escapees, revealing a previously unappreciated level of heterogeneity associated with the senescent phenotype.
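Comparing cells along a proliferative-to-senescent axis often starts from per-cell gene-signature scores. The sketch below is a generic illustration, not the authors' pipeline: it scores each cell by the mean z-score of a marker set, and the example markers (MKI67 and TOP2A for proliferation, CDKN2A for senescence), the toy expression values and the function name `signature_score` are illustrative assumptions.

```python
import numpy as np


def signature_score(expr, genes, signature):
    """Mean per-gene z-score over the signature genes, one value per cell.

    expr: cells x genes array; genes: column names; signature: gene names.
    """
    expr = np.asarray(expr, dtype=float)
    wanted = set(signature)
    cols = [i for i, g in enumerate(genes) if g in wanted]
    if not cols:
        raise ValueError("none of the signature genes are present")
    sub = expr[:, cols]
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0                      # constant genes contribute z = 0
    z = (sub - sub.mean(axis=0)) / sd
    return z.mean(axis=1)


genes = ["MKI67", "TOP2A", "CDKN2A", "GAPDH"]
expr = [[10, 8, 0, 5],    # hypothetical proliferating cell
        [5, 4, 1, 5],
        [0, 0, 9, 5]]     # hypothetical senescent-like cell
proliferation = signature_score(expr, genes, ["MKI67", "TOP2A"])
senescence = signature_score(expr, genes, ["CDKN2A"])
```

The difference between the two scores gives each cell a rough position on the proliferative-to-senescent axis, which could then be compared across sub-populations.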

12:00-12:20
Pathway-primed explainable neural network for scRNA-Seq data
Format: Pre-recorded with live Q&A

Moderator(s): Xuegong Zhang

  • Carlos Loucera, Clinical Bioinformatics Area, Fundacion Progreso y Salud (FPS), Spain
  • Joaquin Dopazo, Clinical Bioinformatics Area, Fundacion Progreso y Salud (FPS), Spain
  • Pelin Gundogdu, Clinical Bioinformatics Area, Fundacion Progreso y Salud (FPS), Spain
  • Inmaculada Álamo-Álvarez, Clinical Bioinformatics Area, Fundacion Progreso y Salud (FPS), Spain
  • Isabel A. Nepomuceno-Chamorro, Dpto. Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Spain

Presentation Overview:

In this work, we propose an intelligible pathway-driven neural network (NN) for solving cell-type-related problems at single-cell resolution while providing a biologically meaningful representation. The network architecture is conditioned on biological priors (knowledge-primed layers) extracted from a well-known curated source, the KEGG metabolic and signaling pathway database. Although the NN is trained in a supervised setting, it can also be used for unsupervised tasks by exploiting the data representation learned by its intermediate layers (feature learning). Furthermore, the biological priors improve the interpretability of the network while also reducing the dimensionality of the scRNA-seq data. Finally, the encoded representation is also used to train anomaly detection algorithms, which allow us to detect abnormal or unseen cell types.
To illustrate the capabilities of our model, we compared its performance against popular cell-type identification methods on a diverse set of problems: supervised cell-type annotation, unsupervised clustering visualization, unknown cell-type identification and biological interpretation. Our model obtains competitive results in all areas while providing a functional interpretation of the data through the activation scores of the knowledge-primed layers.
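A knowledge-primed layer of the kind described can be sketched as a dense layer whose weight matrix is masked by gene–pathway membership, so each pathway unit only receives input from its member genes. This NumPy forward pass is a minimal sketch under that assumption; the gene and pathway names are hypothetical placeholders rather than KEGG identifiers, and `build_mask` and `PathwayLayer` are not the authors' implementation.

```python
import numpy as np


def build_mask(genes, pathways):
    """Binary genes x pathways matrix: 1 where the gene belongs to the pathway."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(pathways)))
    for j, members in enumerate(pathways.values()):
        for g in members:
            if g in index:                 # pathway genes absent from the data are ignored
                mask[index[g], j] = 1.0
    return mask


class PathwayLayer:
    """Dense layer whose connections are restricted by a gene-pathway mask."""

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = mask
        self.weights = rng.normal(scale=0.1, size=mask.shape)
        self.bias = np.zeros(mask.shape[1])

    def __call__(self, x):
        # Re-applying the mask on every pass keeps non-member weights at zero
        # even if self.weights were later updated by training.
        z = x @ (self.weights * self.mask) + self.bias
        return np.maximum(z, 0.0)          # ReLU output = one score per pathway


genes = ["GENE_A", "GENE_B", "GENE_C"]             # hypothetical genes
pathways = {"PATHWAY_1": ["GENE_A", "GENE_B"],     # hypothetical pathways
            "PATHWAY_2": ["GENE_C", "GENE_X"]}
layer = PathwayLayer(build_mask(genes, pathways))
scores = layer(np.array([[1.0, 2.0, 3.0]]))        # one cell, three genes
```

Because each hidden unit corresponds to one pathway, its activation can be read directly as a pathway-level score, which is the property the abstract relies on for functional interpretation.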

12:40-13:00
scGNN: a novel graph neural network framework for single-cell RNA-Seq analyses
Format: Pre-recorded with live Q&A

Moderator(s): Xuegong Zhang

  • Juexin Wang, University of Missouri, United States
  • Anjun Ma, Ohio State University, United States
  • Qin Ma, Ohio State University, United States
  • Dong Xu, Univ. of Missouri-Columbia, United States

Presentation Overview:

Single-cell RNA-sequencing (scRNA-Seq) is widely used to reveal the heterogeneity and dynamics of tissues, organisms, and complex diseases, but its analysis still faces major challenges, including sequencing sparsity and complex differential patterns in gene expression. We introduce scGNN (single-cell graph neural network), a hypothesis-free deep learning framework for scRNA-Seq analyses. The framework formulates and aggregates cell–cell relationships with graph neural networks and models heterogeneous gene expression patterns using a left-truncated mixture Gaussian model. scGNN integrates three iterative multi-modal autoencoders and outperforms existing tools for gene imputation and cell clustering on four benchmark scRNA-Seq datasets. In an Alzheimer’s disease study with 13,214 single nuclei from postmortem brain tissues, scGNN successfully illustrated disease-related neural development and the underlying differential mechanism. scGNN provides an effective representation of gene expression and cell–cell relationships, and it can be applied as a general framework for scRNA-Seq analyses.
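The cell–cell graph that a graph neural network aggregates over is often built as a k-nearest-neighbour (KNN) graph on a low-dimensional cell embedding. The sketch below shows that construction plus one generic graph-convolution step with symmetric normalisation; it is an illustration under these assumptions, not scGNN's actual graph construction, autoencoders, or left-truncated mixture Gaussian model, and `knn_graph` and `gcn_layer` are hypothetical names.

```python
import numpy as np


def knn_graph(embedding, k):
    """Symmetric 0/1 adjacency matrix linking each cell to its k nearest cells."""
    embedding = np.asarray(embedding, dtype=float)
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = (diff ** 2).sum(axis=-1)            # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)             # exclude self-neighbours
    neighbours = np.argsort(dist, axis=1)[:, :k]
    n = embedding.shape[0]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)              # make the graph undirected


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])         # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)


# Hypothetical 2-D embedding of four cells forming two tight pairs.
cells = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
adj = knn_graph(cells, k=1)
hidden = gcn_layer(adj, cells, np.ones((2, 3)))
```

Each graph-convolution step mixes a cell's features with those of its neighbours, which is how relationships between similar cells can inform imputation and clustering.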

13:00-13:20
Towards an integrative multi-omics workflow
Format: Pre-recorded with live Q&A

Moderator(s): Xin Gao

  • Florian Jeanneret, Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France
  • Stéphane Gazut, Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France

Presentation Overview:

The advent of high-throughput techniques has greatly enhanced biological discovery. In recent years, multi-omics data analysis has taken the front seat in improving physiological understanding. Handling functional enrichment results from different types of biological data raises practical questions.

We propose an integrative workflow, wrapped in the Bioconductor R package multiSight, to better interpret biological processes in a multi-omics approach. In this work, we present this workflow applied to breast cancer data from The Cancer Genome Atlas (TCGA) related to Invasive Ductal Carcinoma (IDC) and Invasive Lobular Carcinoma (ILC). Pathway enrichment by Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA) was conducted using either features from differential expression analysis (DEA) or features selected by multi-block sPLS-DA methods. Comprehensive comparisons of the enrichment results were then carried out using classical enrichment analysis, pooling of p-values with Stouffer's Z-score method, and clustering of pathways into biological themes.
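The two statistical building blocks named here can be sketched with the Python standard library: a one-sided hypergeometric ORA p-value for a single pathway, and Stouffer's method for pooling per-omics p-values into one. This is a minimal sketch, not multiSight's API; the function names `ora_pvalue` and `stouffer` are hypothetical, and the clamping constants are arbitrary choices that keep the inverse normal CDF finite.

```python
from math import comb
from statistics import NormalDist


def ora_pvalue(overlap, n_selected, pathway_size, universe_size):
    """One-sided hypergeometric P(X >= overlap) for pathway over-representation.

    overlap: selected genes that fall in the pathway; n_selected: selected genes;
    pathway_size: pathway genes in the universe; universe_size: all tested genes.
    """
    upper = min(n_selected, pathway_size)
    tail = sum(comb(pathway_size, i) * comb(universe_size - pathway_size, n_selected - i)
               for i in range(overlap, upper + 1))
    return tail / comb(universe_size, n_selected)


def stouffer(pvalues, weights=None):
    """Combine one-sided p-values with the (optionally weighted) Stouffer Z method."""
    normal = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    # Clamp so inv_cdf stays finite, then convert each p-value to a Z score.
    z = [-normal.inv_cdf(min(max(p, 1e-300), 1 - 1e-12)) for p in pvalues]
    z_comb = sum(w * zi for w, zi in zip(weights, z)) / sum(w * w for w in weights) ** 0.5
    return normal.cdf(-z_comb)


p_ora = ora_pvalue(overlap=2, n_selected=2, pathway_size=2, universe_size=10)  # 1/45
p_pooled = stouffer([0.01, 0.01])
```

For example, a pathway with p = 0.01 in each of two omics layers pools to a combined p-value of about 5e-4 with equal weights, reflecting consistent evidence across layers.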

Our work shows that ORA enrichment with sPLS-DA-selected features, combined with pooling of pathway p-values by Stouffer's method, leads to enrichment maps more closely associated with the physiological knowledge of the IDC and ILC phenotypes than ORA and GSEA with features driven by differential expression.

13:20-13:40
Topological Strategies for the Analysis of Rhythmic Dynamics in Transcriptomic Time-Series Data
Format: Pre-recorded with live Q&A

Moderator(s): Xin Gao

  • Elan Ness-Cohn, Northwestern University, United States
  • Rosemary Braun, Northwestern University, United States

Presentation Overview:

The circadian clock drives the oscillatory expression of thousands of genes across all tissues and has significant implications for human health. RNA-seq time-series experiments interrogate the mechanistic links between transcriptional rhythms and phenotypic outcomes. Analysis methods must overcome the challenges of sparse temporal sampling, noisy data, and dynamics that are not strictly periodic.

We present two complementary methods to overcome these challenges: “TimeCycle” detects oscillatory dynamical components in noisy, sparsely sampled data, and “TimeChange” quantifies how gene rhythms change across experimental conditions. Both methods leverage a data transformation technique known as time-delay embedding to reconstruct the underlying state space for each gene of interest. Takens’ embedding theorem implies that rhythmic dynamics will exhibit circular patterns in the embedded space. “TimeCycle” quantifies the circularity of the embedding using persistent homology, an algebraic method for discerning the topological features of data. The persistence scores are compared to a biologically informed null model that considers RNA transcription and degradation rates to identify cycling genes. “TimeChange” nonparametrically compares the distributions of points in the embedded space to assess whether the topological structures differ significantly between phenotypes, thereby quantifying differences in transcriptional dynamics without requiring knowledge of the underlying model.
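Time-delay embedding itself is short to write down: each point in the embedded space stacks a value with lagged copies of itself. The sketch below builds that embedding and, purely for illustration, uses the coefficient of variation of distances from the centroid as a crude circularity proxy. That proxy is an assumption standing in for, and much weaker than, TimeCycle's persistent-homology score and null model; `delay_embed` and `radius_cv` are hypothetical names.

```python
import numpy as np


def delay_embed(series, dim=2, tau=1):
    """Delay embedding: row t is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    series = np.asarray(series, dtype=float)
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.column_stack([series[i * tau:i * tau + n] for i in range(dim)])


def radius_cv(points):
    """Crude circularity proxy: spread of distances from the centroid (0 = perfect circle)."""
    radii = np.linalg.norm(points - points.mean(axis=0), axis=1)
    return radii.std() / radii.mean() if radii.mean() > 0 else 0.0


t = np.arange(51)
rhythmic = delay_embed(np.sin(2 * np.pi * t / 12), dim=2, tau=3)  # lag = quarter period
trend = delay_embed(t.astype(float), dim=2, tau=3)
```

With a lag of a quarter period, a sinusoid maps onto a circle (near-zero radius variation), whereas a monotone trend maps onto a line; real time series are noisier and more sparsely sampled, which is why a topological score with a null model is needed.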

We demonstrate each method’s accuracy and reliability using synthetic and real data.

13:40-14:00
Predicting T Cell Activation for Mutational Epitopes
Format: Pre-recorded with live Q&A

Moderator(s): Xin Gao

  • Emilio Dorigatti, Helmholtz Zentrum München, Ludwig Maximilian Universität München, Germany
  • Felix Drost, Helmholtz Zentrum München, Technische Universität München, Germany
  • Adrian Straub, Technische Universität München, Germany
  • Philipp Hilgendorf, Technische Universität München, Germany
  • Kilian Schober, Universitätsklinikum Erlangen, Germany
  • Dirk Busch, Technische Universität München, Germany
  • Benjamin Schubert, Helmholtz Zentrum München, Technische Universität München, Germany

Presentation Overview:

The recognition of pathogen- and tumor-derived epitopes by T cells plays a crucial role in the adaptive immune system. Understanding and predicting the binding between T cell receptors (TCRs) and epitopes will therefore greatly benefit the development of novel immunotherapies. However, this task remains challenging, as there can be more than 10^20 TCRs in nature and every human harbors at least 10^7 different TCRs. In this work, we analyzed a novel dataset of experimentally determined T cell activation for 36 different murine TCRs in response to systematic single-amino-acid mutations of the widely used model epitope SIINFEKL. This comprehensive dataset of 5,472 unique pMHC-TCR interactions allowed us to build random forest models that accurately predict T cell activation both within and across TCRs. Predictive performance was consistently high for both qualitative and quantitative prediction, and was further validated by spatial modelling of the TCR-epitope complex: the predictive model showed high sensitivity to epitope positions that were found to lie in close proximity to the TCR.
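The dataset size reported here is consistent with exhaustive single-substitution scanning: SIINFEKL has 8 positions and each can be replaced by 19 alternative amino acids, giving 8 × 19 = 152 mutant epitopes, and 152 × 36 TCRs = 5,472 interactions. The sketch below enumerates those mutants and one-hot encodes peptides as simple features; the encoding and the function names `single_mutants` and `one_hot` are assumptions, and the talk's actual random forest features may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_mutants(epitope):
    """All peptides differing from `epitope` at exactly one position."""
    mutants = []
    for pos, wild_type in enumerate(epitope):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutants.append(epitope[:pos] + aa + epitope[pos + 1:])
    return mutants


def one_hot(peptide):
    """Flattened position x amino-acid indicator vector of length len(peptide) * 20."""
    vec = [0] * (len(peptide) * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1
    return vec


mutants = single_mutants("SIINFEKL")
features = [one_hot(m) for m in mutants]
```

Feature vectors like these could be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`, with one model per TCR (within-TCR) or TCR descriptors added as extra features (across-TCR).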

14:20-14:40
MoSwA: Protein Sequence Diversity Motif Switch Analyser for Viruses
Format: Pre-recorded with live Q&A

Moderator(s): Xin Gao

  • Asif M. Khan, Perdana University, Kuala Lumpur, Malaysia / Bezmialem Vakif University, Beykoz, Istanbul, Turkey
  • Muhammet Celik, Bezmialem Vakif University, Istanbul, Turkey / Konya Food and Agriculture University, Konya, Turkey
  • Kaushal Kumar Singh, Quantum Cipher Private Limited, New Delhi, India
  • Shan Tharanga, Centre for Bioinformatics, School of Data Sciences, Perdana University, Kuala Lumpur, Malaysia

Presentation Overview:

Protein sequence diversity is one of the major challenges in the design of interventions against viruses. Shannon’s entropy has been used as a quantitative measure of protein sequence diversity, applied via a user-defined k-mer sliding window. Studies have classified the distinct k-mer peptides at a given position into diversity motifs based on their incidence: index (the predominant sequence), major (the most common variant), unique (singletons) and minor (incidence between that of the major variant and the unique peptides). Motif switching at a given k-mer alignment position is a phenomenon in which a fitness change in one or more amino acids, such as through mutation, changes the incidence of a given k-mer sequence across its overlapping positions, resulting in a change of sequence rank and thus a change of motif. Identifying k-mer positions that exhibit a motif switch and determining the nature of the switches has been challenging, given the large number of possible switch combinations and their omnipresence. Herein, we present MoSwA (https://github.com/macelik/MoSwA), a tool that not only identifies all alignment k-mer positions that exhibit motif switching, but also provides a multi-faceted and extensive characterisation of the switches. MoSwA takes a protein multiple sequence alignment as input and enables comparative analyses of motif switches within and between viral species proteomes.
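The entropy and motif definitions above can be sketched in a few lines of Python. This is an illustrative reading of those definitions, not MoSwA's code: gapped or truncated k-mers are skipped, ties in incidence are broken by order of first appearance, and the default window of k = 9 and all function names are assumptions.

```python
import math
from collections import Counter


def kmers_at(alignment, pos, k):
    """k-mer peptides starting at alignment column `pos`, skipping gapped or short ones."""
    peptides = []
    for seq in alignment:
        kmer = seq[pos:pos + k]
        if len(kmer) == k and "-" not in kmer:
            peptides.append(kmer)
    return peptides


def shannon_entropy(peptides):
    """Shannon entropy (bits) of the k-mer distribution at one position."""
    counts = Counter(peptides)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(alignment, k=9):
    """Entropy for every k-mer window along the alignment."""
    length = len(alignment[0])
    return [shannon_entropy(kmers_at(alignment, pos, k)) for pos in range(length - k + 1)]


def classify_motifs(peptides):
    """Label each distinct peptide as index, major, minor or unique by incidence."""
    labels = {}
    for rank, (pep, count) in enumerate(Counter(peptides).most_common()):
        if rank == 0:
            labels[pep] = "index"      # predominant sequence
        elif count == 1:
            labels[pep] = "unique"     # singleton
        elif rank == 1:
            labels[pep] = "major"      # most common variant
        else:
            labels[pep] = "minor"      # between major and unique
    return labels


# Hypothetical toy alignment of eight 4-residue sequences.
alignment = ["AAAB", "AAAB", "AAAB", "AAAC", "AAAC", "AAAE", "AAAE", "AAAD"]
motifs = classify_motifs(kmers_at(alignment, 0, 4))
```

A motif switch would then show up as a peptide whose label changes between overlapping positions, which is the bookkeeping MoSwA automates across a whole alignment.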

14:40-15:00
Design of protein variants with desirable properties using Deep Mutational Scanning and Machine Learning approaches
Format: Pre-recorded with live Q&A

Moderator(s): Xin Gao

  • David Medina, Centre for Biotechnology and Bioengineering, Department of Chemical Engineering and Biotechnology, University of Chile, Chile
  • Alvaro Olivera-Nappa, Centre for Biotechnology and Bioengineering, Department of Chemical Engineering and Biotechnology, University of Chile, Chile

Presentation Overview:

Designing proteins with desirable properties is one of the most significant biotechnological challenges. Rational design and directed evolution have been widely used, but both have difficulties and limitations. Deep Mutational Scanning has emerged as an alternative to these two classic approaches; however, it requires computational strategies and methods to extract information from the data. Recently, design techniques have increasingly incorporated Machine Learning methods into their protocols. Despite these significant advances, the root problem persists: there are no computational tools that facilitate design simply and with high precision. To address this problem, we propose a new approach that combines physicochemical properties and digital signal processing as a protein representation strategy, with ensemble learning to train high-performing predictive models. The design component considers co-evolution through epistatic models, as well as thermodynamics. Finally, we incorporate an optimization system based on genetic algorithms to generate sequences with the desired properties. Our approach has been tested on improving the stability of glycoproteins and the enantioselectivity of epoxide hydrolases, achieving a precision above 87% and demonstrating the utility of the proposed method and the advantages of combining different points of view to represent a problem.
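One common way to combine a physicochemical property with digital signal processing is to map each residue to a numeric index and take the magnitude of its discrete Fourier transform. The sketch below assumes the Kyte–Doolittle hydropathy scale and a fixed FFT length so that sequences of different lengths yield equal-length feature vectors; both choices, and the function name `spectral_features`, are assumptions rather than the authors' exact encoding.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale (one of many possible physicochemical indices).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def spectral_features(sequence, n_fft=64):
    """Magnitude spectrum of the hydropathy signal; length n_fft // 2 + 1."""
    signal = np.array([KYTE_DOOLITTLE[aa] for aa in sequence], dtype=float)
    signal -= signal.mean()                             # remove the constant (DC) component
    spectrum = np.abs(np.fft.rfft(signal, n=n_fft))     # zero-pads or truncates to n_fft
    return spectrum / len(sequence)                     # scale by sequence length


features = spectral_features("MKTAYIAKQR")
alternating = spectral_features("ID" * 32)              # period-2 hydropathy pattern
```

The resulting vectors could serve as inputs to an ensemble regressor or classifier; periodic property patterns appear as peaks in the corresponding frequency bins (here, the period-2 pattern peaks in the last bin).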

15:00-15:20
Enhancing Protein-Protein Interaction Prediction with Deep Learning
Format: Live-stream

Moderator(s): Xin Gao

  • Wei Wang, University of California, Los Angeles, United States

Presentation Overview:

Protein–protein interaction (PPI) prediction is a fundamental computational biology problem. To address it, extensive research efforts have been made to extract predefined features from protein sequences, on which statistical classifiers are then trained to classify PPIs. However, such explicit features are usually costly to extract and typically have limited coverage of PPI information. We present an end-to-end framework for PPI prediction that incorporates a deep neural network architecture leveraging both robust local features and contextualized information, which are important for capturing the mutual influence of protein sequences. It reduces the data pre-processing effort required by other systems and generalizes well to different application scenarios. Moreover, it shows promising performance on the more challenging problems of interaction type prediction and binding affinity estimation, where existing approaches fall short.
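A generic way to combine local sequence features with pair-level symmetry is a siamese encoder: the same convolutional encoder is applied to both proteins and the two embeddings are combined with an order-independent operation. The NumPy forward pass below is a minimal, untrained sketch of that idea; the window size, filter count, element-wise product, example sequences and class name `SiamesePPI` are assumptions, and it omits the contextualized component the abstract mentions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Length x 20 indicator matrix for a protein sequence."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, index[aa]] = 1.0
    return x


class SiamesePPI:
    """Untrained shared-encoder model; sequences must be at least `window` long."""

    def __init__(self, window=3, filters=8, seed=0):
        rng = np.random.default_rng(seed)
        self.window = window
        self.conv = rng.normal(scale=0.1, size=(window * len(AMINO_ACIDS), filters))
        self.out = rng.normal(scale=0.1, size=filters)

    def encode(self, seq):
        x = one_hot(seq)
        # 1-D convolution as a matrix product over sliding windows (local features).
        windows = np.stack([x[i:i + self.window].ravel()
                            for i in range(len(seq) - self.window + 1)])
        h = np.maximum(windows @ self.conv, 0.0)
        return h.max(axis=0)                     # global max pooling -> fixed-size vector

    def predict(self, seq_a, seq_b):
        # Element-wise product is symmetric, so predict(a, b) == predict(b, a).
        z = (self.encode(seq_a) * self.encode(seq_b)) @ self.out
        return 1.0 / (1.0 + np.exp(-z))         # interaction probability


model = SiamesePPI()
p = model.predict("MKTAYIAKQR", "GSHMLEDPVA")    # hypothetical sequences
```

Using an element-wise product makes the predicted score independent of the order of the two proteins, which is desirable for an undirected interaction; training would fit `conv` and `out` to labelled pairs.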



International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176
