ISMB/ECCB 2021 Special Sessions

Schedule subject to change

SST01: Representation learning in biology

Sunday, July 25 (11:00 – 12:20; 12:40 – 14:00; 14:20 – 15:20 UTC)

Organizers:

Christian Dallago, Technical University of Munich, Germany
Ananthan Nambiar, University of Illinois at Urbana-Champaign, United States
Ali Madani, Salesforce Research, United States
Peter Koo, Cold Spring Harbor Laboratory, United States

Biological data needs to be transformed into a meaningful representation to take full advantage of computational techniques. Human curation of computable representations has long been preferred for analytical and predictive tasks on biological data. However, the hand-crafted expert approach poses challenges for the encoding of vast biological datasets of yet-uncharacterized biological processes. In recent years, advanced statistical approaches, such as deep learning, have shown great promise in delivering tools to learn computable representations of data. These approaches learn representations even in the absence of supervised understanding, and thus could be transformational in biology. Over the last few years, many representation learning approaches have been used to gain insights into biological processes using protein or genome sequences, chemical sequences, transcriptomics and proteomics data, and biomedical imaging.

In this special session we would like to give an overview of how representation learning has reached various disciplines in biology, and how it adds to the toolkit available to bio-researchers today. Additionally, we want to give a platform for machine learning experts to discuss the fundamental statistical tools necessary to learn useful representations. For this purpose, we invited prominent speakers who played a role in the advancement of representation learning in biology. The talks will cover the broad concept of representation learning in biology, with particular use cases in areas such as learning embeddings for protein sequences or transcriptomics. This program is aimed at machine learning scientists interested in moving into the biological domain and at biologists interested in understanding cutting-edge concepts in machine learning. Finally, we open the floor to emerging researchers by promoting submissions via lightning talks and a poster session. Find up-to-date details at https://representation.learning.bio

SST02: Computational Biology going Green

Sunday, July 25 (11:00 – 12:20; 12:40 – 14:00; 14:20 – 15:20 UTC)

Organizers:

Geoff Barton, University of Dundee, United Kingdom
Alex Bateman, EMBL-EBI, United Kingdom
Michael Inouye, University of Cambridge and Baker Institute, United Kingdom

We are living at a time of unprecedented human-driven environmental change that threatens catastrophic consequences for society and the environment. Our science has great potential to contribute to environmental science and biodiversity research outcomes. Globally, IT is thought to account for up to 7% of carbon dioxide emissions and this is likely to grow. Therefore, computational biology is directly contributing to the climate crisis. In this special session we will draw in experts from across Computational Biology and High Performance Computing (HPC) to present on the challenges of making computation sustainable. This will cover computational efficiency, life cycle, local data centres, cloud and costs.

The following topics and questions would be considered within the scope of this special session:

  • How to design a green data centre
  • How do I know if my cloud provider is green?
  • Discussion of the size of required computation to solve a problem. Do you really need to compute across all the data?
  • Impact of GPU vs conventional computer hardware.
  • Short, high-intensity computing (e.g. 10,000 CPUs for 1 hour) vs longer, low-intensity computing (a laptop or a few cores for days/weeks).
  • How green is your desktop?
  • What happens to hardware at the end of its life?
  • How often should I refresh my computer hardware?
  • How to influence your institution to consider the environmental impact of computing in data centre planning.

SST03: New developments in AI for Integrating imaging and genomic data

Tuesday, July 27 (11:00 – 12:20; 12:40 – 14:00; 14:20 – 15:20 UTC)

Organizers:

Olivier Gevaert, Stanford University, United States
William Hsu, UCLA, United States
Arvind Rao, University of Michigan, United States

Vast amounts of biomedical data are now routinely available for patients, ranging from radiographic images to clinical and genomic data, spanning multiple biological scales. AI and machine learning are increasingly used to enable pattern discovery to link such data for improvements in patient diagnosis, prognosis and tailoring treatment response. Yet, few studies focus on how to link different types of imaging and molecular data in synergistic ways, and to develop data fusion approaches for clinical decision support. This tutorial will describe considerations, approaches, software toolkits, and open challenges related to multi-omics, multi-modal and multi-scale data fusion of imaging and molecular data.

This special session will focus on emerging AI applications on spatial transcriptomics data, and the use of multi-modal data for clinical oncology. Bioinformaticists and computational biologists will get an in-depth overview of the different types of imaging data that exist, how to model them, and how they can be linked to genomic data.

SST04: Emerging gain-of-function mutations and their characterization by multi-omics network biology

Wednesday, July 28 (11:00 – 12:20; 12:40 – 14:00; 14:20 – 15:20 UTC)

Organizers:

Zeynep Coban-Akdemir, The University of Texas Health Science Center at Houston, United States
Stephen Yi, The University of Texas at Austin, United States

  • Quantitative and analytical technologies to understand cellular networks and their rewiring by mutations
  • Artificial intelligence and machine learning to bridge genotypes and phenotypes in health and disease
  • Computational modeling of the functional impact of sequence variations involved in disease development and progression
  • Prediction and annotation of driver genomic aberrations leading to changes in cellular functions
  • Prioritization of QTLs that modulate gene regulation and cellular decisions
  • Distinguishing RNA versus protein expression/regulation based on genetic variants by multi-omics networks
  • Modeling signaling network perturbation and dynamics based on structural genomics and proteomics

Traditionally, disease-causing mutations were thought to disrupt gene function. However, it is becoming increasingly clear that many deleterious mutations can exhibit 'gain-of-function' (GOF) behavior [1]. Systematic investigation of such mutations has been lacking and largely overlooked. Advances in next-generation sequencing have identified thousands of genomic variants that perturb the normal functions of proteins, further contributing to diverse phenotypic consequences in disease [2]. Elucidating the functional pathways rewired by GOF mutations will be crucial for prioritizing disease-causing variants and their resultant therapeutic liabilities [3]. In distinct cell types (with varying genotypes), precise signal transduction controls cell decisions, including gene regulation and phenotypic output [4]. When signal transduction goes awry due to GOF mutations, it can give rise to various disease types. Quantitative and molecular technologies are in demand to understand cellular networks and their perturbations by GOF mutations, bridging genotype and phenotype in health and disease [5]. This may provide explanations for 'missing heritability' in previous genome-wide association studies. We envision that it will be instrumental to push the current paradigm towards a thorough functional and quantitative modeling of all GOF mutations and their mechanistic molecular events involved in disease development and progression. Many fundamental questions pertaining to genotype-phenotype relationships remain unresolved. For example, what are common types of genomic aberrations leading to GOF? How do interaction networks undergo rewiring upon GOF mutations? Which GOF mutations are key for gene regulation and cellular decisions? What are the GOF mechanisms at the RNA and protein regulation levels? Is it possible to leverage GOF mutations to reprogram signal transduction in cells, aiming to cure disease?

To begin to address these questions, in this special session, we will cover a wide range of topics regarding GOF disease mutations and their characterization by multi-omic networks. We highlight the fundamental function of GOF mutations and discuss the potential mechanistic effects in the context of signaling networks [6,7]. We also discuss advances in bioinformatic and computational resources, which will dramatically help with studies on the functional and phenotypic consequences of GOF mutations.

Together, this special session addresses an emerging area in computational biology that is becoming an important area of research. The session is innovative because it will provide unique insights into prioritizing driver functional GOF disease mutations and uncovering individualized molecular mechanisms. Furthermore, it is significant because it will greatly facilitate the functional annotation of a large number of GOF mutations, providing a fundamental link between genotype and phenotype in human disease.

References

1 Li, Y., Zhang, Y., Li, X., Yi, S. & Xu, J. Gain-of-Function Mutations: An Emerging Advantage for Cancer Biology. Trends Biochem Sci 44, 659-674 (2019). PMID: 31047772.
2 Yi, S. et al. Functional variomics and network perturbation: connecting genotype to phenotype in cancer. Nat Rev Genet 18, 395-410 (2017). PMID: 28344341. PMC6020840.
3 Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647-660 (2015). PMID: 25910212. PMC4441215.
4 Li, Y. et al. Gene Regulatory Network Perturbation by Genetic and Epigenetic Variation. Trends Biochem Sci 43, 576-592 (2018). PMID: 29941230. PMC6215597.
5 Latysheva, N. S. et al. Molecular Principles of Gene Fusion Mediated Rewiring of Protein Interaction Networks in Cancer. Mol Cell 63, 579-592 (2016). PMID: 27540857. PMC5003813.
6 Coban-Akdemir, Z. et al. Identifying Genes Whose Mutant Transcripts Cause Dominant Disease Traits by Potential Gain-of-Function Alleles. Am J Hum Genet 103, 171-187 (2018). PMID: 30032986. PMC6081281.
7 Litchfield, K. et al. Escape from nonsense-mediated decay associates with anti-tumor immunogenicity. Nat Commun 11, 3800 (2020). PMID: 32733040. PMC7393139.

SST05: Single Cell and Spatial Data Analysis

Friday, July 30 (11:00 – 12:20; 12:40 – 14:00; 14:20 – 15:20 UTC)

Organizers:

Malte Luecken, Helmholtz Center Munich, Germany
Shyam Prabhakar, Genome Institute of Singapore, Singapore
Florian Schmidt, Genome Institute of Singapore, Singapore
Fabian Theis, Helmholtz Center Munich, Germany
Barbara Treutlein, ETH Zürich, Switzerland

Since its first use in 2009, single cell technology (SCT) has revolutionized our understanding of molecular biology, providing unprecedented resolution on cellular identity, diversity, development, and function. The rapid advancement of SCT in the past decade has made it the default approach to profile transcriptomes, proteomes and, more recently, genomes as well as epigenomes. With the emergence of multimodal assays to profile various “omics” within a single cell, SCTs have crossed the next frontier in enabling new biological discoveries not only in basic science but also with respect to clinical applications. The latter will benefit tremendously from single cell perturbation experiments, which provide unique insights into cell-cell signaling and interaction that are critical for translational applications. Spatial methods such as single-cell in situ sequencing and imaging-based methods add a new dimension to the study of single cells in diverse tissues, which will open new opportunities, for instance in cancer biology and neuroscience. Ultimately, it is increasingly clear that SCTs will be key to bringing personalized and precision medicine into the clinic.

However, to leverage the unique advantages of single cell data, it must be analysed in a scalable, yet robust and interpretable way. Due to the diversity, technical biases and high level of noise in single cell data, software and algorithm development is challenging. For instance, the best clustering strategy for a specific transcriptomics data type may prove unusable for single cell chromatin accessibility data. Even within a single data modality, the experimental strategy greatly influences data analysis. Methods that integrate datasets to build reference atlases, which overcome the limitations of individual experiments, in turn raise new questions about what is meaningful variation versus unwanted noise between datasets. Recently, high-throughput assays for multi-modal and/or spatial characterization of single cells have added yet another layer of complexity. The development of integrative data analysis methods is thus essential to obtaining biological insights that can be translated into products for diagnostics or into personalized / precision medicine strategies.