Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

RegSys COSI

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
All times listed are in UTC
Monday, July 26th
11:00-11:40
COSI REGSYS KEYNOTE: Evolution of epigenomes towards drug resistance in breast cancer
Format: Live-stream

Moderator(s): Judith Zaugg

  • Céline Vallot
11:40-12:00
Design and power analysis for multi-sample single cell transcriptomics experiments
Format: Pre-recorded with live Q&A

Moderator(s): Judith Zaugg

  • Katharina T. Schmid, Institute of Computational Biology, Helmholtz Zentrum München, Germany
  • Cristiana Cruceanu, Department of Translational Research, Max Planck Institute for Psychiatry, Munich, Germany
  • Anika Böttcher, Institute of Diabetes and Regeneration Research, Helmholtz Diabetes Center, Helmholtz Zentrum München, Germany
  • Heiko Lickert, Institute of Diabetes and Regeneration Research, Helmholtz Diabetes Center, Helmholtz Zentrum München, Germany
  • Elisabeth B. Binder, Department of Translational Research, Max Planck Institute for Psychiatry, Munich, Germany
  • Fabian J. Theis, Institute of Computational Biology, Helmholtz Zentrum München, Germany
  • Matthias Heinig, Institute of Computational Biology, Helmholtz Zentrum München, Germany

Presentation Overview:

Single cell RNA-seq has revolutionized transcriptomics by providing cell type resolution for interindividual differential gene expression and expression quantitative trait loci analyses. However, efficient power analysis methods accounting for the characteristics of single cell data and interindividual comparison are missing.
We present a statistical framework for design and power analysis of multi-sample single cell transcriptomics experiments. The model relates sample size, number of cells per individual and sequencing depth to the power of detecting differentially expressed genes and expression quantitative trait loci genes within cell types. The overall power is decomposed into the probability of detecting the expression in sparse single cell RNA-seq and the power of the statistical test.
The power estimates of our model were supported by simulation-based methods, while the model requires drastically less runtime and memory. It thus enables fast systematic comparison of alternative experimental designs and optimization for a limited budget. We evaluated data-driven priors for a range of applications and single cell platforms. In many settings, shallow sequencing of high numbers of cells leads to higher overall power than deep sequencing of fewer cells.
The model including priors is implemented as an R package, scPower, which is available on GitHub and accessible as a web tool.

12:00-12:20
Single-cEll Marker IdentificaTiON by Enrichment Scoring
Format: Pre-recorded with live Q&A

Moderator(s): Judith Zaugg

  • Anna Hendrika Cornelia Vlot, Berlin Institute for Medical Systems Biology, Germany
  • Setareh Maghsudi, University of Tübingen, Germany
  • Uwe Ohler, Berlin Institute for Medical Systems Biology, Germany

Presentation Overview:

Cell identity marker identification from single-cell omics data commonly consists of differential testing between cell clusters. The assignment of cells to clusters is nontrivial and often requires prior knowledge. Yet, cluster assignment uncertainties are not generally taken into account. In response, we present SEMITONES, a method for cluster-independent marker identification. This method identifies marker genes of potentially overlapping neighbourhoods using a linear regression framework quantifying feature selectivity to a certain cell neighbourhood. SEMITONES is implemented in Python 3 and freely available on GitHub (www.github.com/ohlerlab/SEMITONES). In healthy human haematopoiesis single-cell RNA-sequencing, SEMITONES accurately identifies known and potential novel marker genes, including known markers not identified by Seurat v3 or the cluster-independent differential expression testing method singleCellHaystack. SEMITONES also outperforms competitors on simulated scRNA-seq data (SEMITONES AUROC 0.78, others 0.54-0.73). Further applications of the method include the construction of co-enrichment graphs, identification of cis-regulatory regions from single-cell ATAC-seq data, and identification of spatially restricted markers from spatial transcriptomics data. Finally, SEMITONES can be used for the inverse problem, i.e. the annotation of cells based on significant enrichment of known marker genes. Overall, SEMITONES provides a flexible and accurate framework for cluster-independent marker identification from diverse single-cell omics data.
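
The idea of scoring a gene by regressing its expression on each cell's similarity to a reference cell can be sketched as follows. This is only an illustration of the general principle named in the abstract, not the SEMITONES implementation or API: the RBF similarity, the plain least-squares slope used as the score, and every name and data value below are assumptions.

```python
import math


def rbf_similarity(cell, reference, gamma=1.0):
    """Similarity of a cell to the reference cell (ASSUMED RBF kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(cell, reference))
    return math.exp(-gamma * sq_dist)


def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx if sxx > 0 else 0.0


def enrichment_scores(embedding, expression, reference_index, gamma=1.0):
    """Score each gene by the slope of expression vs. similarity to one reference cell."""
    reference = embedding[reference_index]
    similarity = [rbf_similarity(cell, reference, gamma) for cell in embedding]
    return {gene: regression_slope(similarity, values)
            for gene, values in expression.items()}


# Toy data (hypothetical): six cells in a 2D embedding forming two groups.
embedding = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
expression = {
    "geneA": [5, 4, 6, 0, 0, 1],  # high near cell 0
    "geneB": [0, 1, 0, 5, 6, 4],  # high far from cell 0
}
scores = enrichment_scores(embedding, expression, reference_index=0)
```

In this toy example `geneA` receives a positive score and `geneB` a negative one for reference cell 0, without any cluster assignment being made.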

12:40-13:20
COSI REGSYS KEYNOTE: Learning large-scale perturbation effects in single cell genomics
Format: Live-stream

Moderator(s): Uwe Ohler

  • Fabian Theis
13:20-13:40
Iterative single-cell multi-omic integration using online learning
Format: Pre-recorded with live Q&A

Moderator(s): Uwe Ohler

  • Chao Gao, University of Michigan, Ann Arbor, MI, United States
  • Jialin Liu, University of Michigan, Ann Arbor, MI, United States
  • April Kriebel, University of Michigan, Ann Arbor, MI, United States
  • Sebastian Preissl, University of California San Diego, La Jolla, CA, United States
  • Chongyuan Luo, The Salk Institute for Biological Studies, La Jolla, CA, United States
  • Rosa Castanon, The Salk Institute for Biological Studies, La Jolla, CA, United States
  • Justin Sandoval, The Salk Institute for Biological Studies, La Jolla, CA, United States
  • Angeline Rivkin, The Salk Institute for Biological Studies, La Jolla, CA, United States
  • Joseph Nery, The Salk Institute for Biological Studies, La Jolla, CA, United States
  • Margarita Behrens, The Salk Institute for Biological Studies, La Jolla, CA, United States
  • Joseph Ecker, The Salk Institute for Biological Studies, La Jolla, CA, United States
  • Bing Ren, University of California San Diego, La Jolla, CA, United States
  • Joshua Welch, University of Michigan, Ann Arbor, MI, United States

Presentation Overview:

Integrating large single-cell gene expression, chromatin accessibility and DNA methylation datasets requires general and scalable computational approaches. Here we describe online integrative non-negative matrix factorization (iNMF), an algorithm for integrating large, diverse and continually arriving single-cell datasets. Our approach scales to arbitrarily large numbers of cells using fixed memory, iteratively incorporates new datasets as they are generated and allows many users to simultaneously analyze a single copy of a large dataset by streaming it over the internet. Iterative data addition can also be used to map new data to a reference dataset. Comparisons with previous methods indicate that the improvements in efficiency do not sacrifice dataset alignment and cluster preservation performance. We demonstrate the effectiveness of online iNMF by integrating more than 1 million cells on a standard laptop, integrating large single-cell RNA sequencing and spatial transcriptomic datasets, and iteratively constructing a single-cell multi-omic atlas of the mouse motor cortex.

13:40-14:00
Motif syntax determinants of single-cell chromatin dynamics in human somatic cell reprogramming
Format: Pre-recorded with live Q&A

Moderator(s): Uwe Ohler

  • Surag Nair, Stanford University, United States
  • Mohamed Ameen, Stanford University, United States
  • Laksshman Sundaram, Stanford University, United States
  • Akshay Balsubramani, Stanford University, United States
  • Glenn Markov, Stanford University, United States
  • David Burns, Stanford University, United States
  • Helen Blau, Stanford University, United States
  • Kevin Wang, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States

Presentation Overview:

Ectopic induction of the Yamanaka factors OCT4, SOX2, KLF4, and MYC (OSKM) in somatic cells initiates a multi-phasic process that culminates in the conversion of a fraction of starting cells into an embryonic stem cell (ESC)-like state. To overcome challenges of low efficiency and high heterogeneity, we profile the chromatin and expression dynamics at single-cell resolution over a time course of human fibroblasts induced with OSKM. We train neural networks to learn predictive regulatory sequence models of base-resolution chromatin accessibility profiles from each of the cell states across the reprogramming pseudotime trajectories. Locus-level interpretation of cell state-specific deep learning models yields dynamic motif syntax maps of each cis-regulatory element (CRE) that often showcase different repertoires of cooperative transcription factors regulating CREs at different stages of reprogramming. Synthesizing motif syntax maps across modules of CREs highlights previously underappreciated sequence determinants: a new class of anti-accessibility motifs whose presence is predictive of lower accessibility, and low-affinity OSK motifs in transient CREs. Collectively, our data and analysis refine our understanding of human somatic cell reprogramming by providing a detailed resource that links stage-specific combinatorial transcription factor activity across dynamic CREs with gene regulation that results in distinct non-reprogramming and reprogramming trajectories.

14:20-14:40
Identification of transcription factor co-binding partners using non-negative matrix factorization
Format: Pre-recorded with live Q&A

Moderator(s): Alejandra Medina-Rivera

  • Ieva Rauluseviciute, Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  • Timothée Launay, Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  • Jaime A Castro-Mondragon, Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  • Anthony Mathelier, Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway

Presentation Overview:

Transcription factor (TF) binding to DNA is key to transcription regulation. While the binding properties of many individual TFs are well known, there is limited understanding of how TFs interact with DNA cooperatively, either forming dimers or co-binding to the same region. Such combinatorial binding of TFs is important to cell differentiation, development, and responses to external stimuli. We present COBIND, a novel method based on non-negative matrix factorization (NMF) to automatically reveal co-appearing binding motifs and infer co-binding TF partners. Specifically, NMF is applied to one-hot encoded regions flanking direct TF-DNA interactions from UniBind, which are used as anchors to identify non-redundant co-appearing motifs at fixed distances. Using motif similarity and protein-protein interaction knowledge, COBIND culminates with the identification of co-binding TF partners and their underlying binding grammar. We applied COBIND to 6,237 TFBS datasets for 404 TFs in 7 species. The method uncovers well known co-binding events (e.g. SOX2-POU5F1 and CTCF-ZBTB3) together with new co-binding configurations not yet reported in the literature. We show that co-binding configurations are usually recurrent within TF families and tend to be more evolutionarily conserved than individual binding sites for several TFs.
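
As an illustration of the input step mentioned in this abstract, the sketch below one-hot encodes the regions flanking an anchor site into a non-negative matrix of the kind an NMF routine could factorize. It is not COBIND code: the function names, the fixed anchor coordinates, and the flank length are hypothetical, and the factorization itself is left out.

```python
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as one row of four 0/1 values per base (N -> all zeros)."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in sequence.upper()]


def flank_matrix(sequences, anchor_start, anchor_end, flank=5):
    """Build one flattened one-hot row per sequence from the bases flanking the anchor.

    The anchor occupies positions [anchor_start, anchor_end) in every sequence.
    """
    rows = []
    for seq in sequences:
        if anchor_start < flank or anchor_end + flank > len(seq):
            raise ValueError("sequence too short for the requested flank length")
        left = seq[anchor_start - flank:anchor_start]
        right = seq[anchor_end:anchor_end + flank]
        rows.append([value for position in one_hot(left + right) for value in position])
    return rows


# Toy example (hypothetical): anchor "CG" at positions 3-4, two flanking bases per side.
matrix = flank_matrix(["AAACGTTT", "GGGCGAAA"], anchor_start=3, anchor_end=5, flank=2)
```

Each row has `4 * 2 * flank` entries, all 0 or 1, so the matrix satisfies the non-negativity that NMF requires.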

14:40-15:00
Proceedings Presentation: Resolving diverse protein-DNA footprints from exonuclease-based ChIP experiments
Format: Pre-recorded with live Q&A

Moderator(s): Alejandra Medina-Rivera

  • Anushua Biswas, CSIR-National Chemical Laboratory, India
  • Leelavati Narlikar, CSIR-National Chemical Laboratory, India

Presentation Overview:

High-throughput chromatin immunoprecipitation (ChIP) sequencing-based assays capture genomic regions associated with the profiled transcription factor (TF). ChIP-exo is a modified protocol, which uses lambda exonuclease to digest DNA close to the TF-DNA complex, in order to improve on the positional resolution of the TF-DNA contact. Because digestion occurs in the 5′–3′ orientation, it produces directional footprints near the complex, on both sides of the DNA. Like all ChIP-based methods, ChIP-exo reports a mixture of different regions associated with the TF: those bound directly as well as via intermediaries. However, the distribution of footprints is likely to be indicative of the complex forming at the DNA. We present ExoDiversity, which uses a model-based framework to learn a joint distribution over footprints and motifs, thus resolving the mixture of ChIP-exo footprints into diverse binding modes. It uses no prior motif or TF information and automatically learns the number of different modes from the data. We show its application on a wide range of TFs and organisms/cell-types. Because its goal is to explain the complete set of reported regions, it is able to identify co-factor TF motifs that appear in a small fraction of the dataset. Further, ExoDiversity discovers small nucleotide variations within and outside canonical motifs, which co-occur with variations in footprints, suggesting that the TF-DNA structural configuration at those regions is likely to be different. Finally, we show that detected modes have specific DNA shape features and conservation signals, giving insights into the structure and function of the putative TF-DNA complexes.

15:00-15:20
Domain adaptive neural networks improve cross-species prediction of transcription factor binding
Format: Pre-recorded with live Q&A

Moderator(s): Alejandra Medina-Rivera

  • Kelly Cochran, Stanford University, United States
  • Divyanshi Srivastava, The Pennsylvania State University, United States
  • Avanti Shrikumar, Stanford University, United States
  • Akshay Balsubramani, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States
  • Shaun Mahony, The Pennsylvania State University, United States

Presentation Overview:

The intrinsic DNA sequence preferences and cell-type specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell-type specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species-specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results demonstrate that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.

Tuesday, July 27th
11:00-11:40
COSI REGSYS KEYNOTE: Multi-omics data integration methods to study rare genetic diseases
Format: Live-stream

Moderator(s): Simon van Heeringen

  • Anaïs Baudot
11:40-12:00
Evaluating the predictive power of enhancer-mediated cell-type specific gene regulatory networks
Format: Pre-recorded with live Q&A

Moderator(s): Simon van Heeringen

  • Aryan Kamal, EMBL, Germany
  • Christian Arnold, EMBL, Germany
  • Annique Claringbould, EMBL, Germany
  • Sophia Müller-Dott, EMBL, Germany
  • Neha Daga, EMBL, Germany
  • Olga Sigalova, EMBL, Germany
  • Maksim Kholmatov, EMBL, Germany
  • Lixia He, University Hospital Heidelberg, Germany
  • Caroline Pabst, University Hospital Heidelberg, Germany
  • Judith Zaugg, EMBL, Germany

Presentation Overview:

Disease-associated genetic variants often lie in non-coding regions, and likely have a regulatory role. To understand their effects it is crucial to identify genes that are modulated by specific regulatory elements (e.g. enhancers). However, these regulatory elements are also modulated by the activity of transcription factors (TFs) that regulate them, often in a cell-type specific manner. Thus, regulatory elements integrate genetic, epigenetic, and TF-mediated cellular signals. TFs, regulatory elements, and their target genes form a gene regulatory network (GRN) that comprises cell-type specific links. Many methods exist for reconstructing GRNs from high-throughput expression or chromatin data. However, most methods either lack regulatory elements (i.e. connect TFs directly to genes), or focus on enhancer-gene links (i.e. lacking TFs), and thus are unable to integrate the impact of genetic variants and TFs simultaneously. Another important bottleneck for GRN reconstruction is the lack of a framework to globally evaluate their performance. To address these challenges, we (1) present a method for reconstructing cell-type specific GRNs that integrate enhancers and TFs, (2) present a general framework for evaluating GRNs based on their cell-type specific predictive power, and (3) apply our approach to understand the response to infection in macrophages.

12:00-12:20
Combining in vitro quantification with in vivo detection of protein-DNA interactions reveals subtle signals in gene regulatory networks
Format: Pre-recorded with live Q&A

Moderator(s): Simon van Heeringen

  • Yuning Zhang, Center for Genomic and Computational Biology, Duke University, United States
  • Raluca Gordan, Duke University, United States

Presentation Overview:

Transcription factors (TFs) bind specific sites across the genome to regulate gene expression. To understand this regulation, genomic binding sites of TFs are oftentimes profiled using in vivo assays such as ChIP-seq (which captures binding in the cell but is prone to technical biases), as well as in vitro assays such as protein binding microarray or PBM (which yields quantitative binding measurements but lacks cellular context).

We show that using quantitative in vitro measurements to interpret ChIP-seq data can help us uncover subtle signals in genomic targeting by TFs. First, to understand the relationship between in vitro and in vivo DNA-binding data, we simulated the generation of ChIP-seq reads based on PBM-derived binding probabilities combined with chromatin accessibility. We found that in vitro binding specificities are largely retained in the cell, although noise is introduced by the ChIP-seq experimental steps.

Next, we focused on two model systems of competitive binding by paralogous TFs. With prior knowledge from in vitro binding data, we uncovered in vivo evidence that TF paralogs compete for DNA-binding in a fine-tuned fashion driven by subtle differences in specificity. This finding supports the strategy of combining in vitro quantification with in vivo profiling to understand TF-driven regulation.
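
The read simulation mentioned in this abstract (reads generated from PBM-derived binding probabilities combined with chromatin accessibility) could be sketched roughly as below. This is a deliberately simplified stand-in, not the authors' simulation: it assumes reads are drawn per site with probability proportional to binding probability times accessibility, ignores fragment lengths and experimental noise, and uses hypothetical names and values throughout.

```python
import random


def simulate_read_counts(binding_prob, accessibility, n_reads, seed=0):
    """Draw n_reads reads across sites, weighting each site by binding_prob * accessibility."""
    if len(binding_prob) != len(accessibility):
        raise ValueError("binding_prob and accessibility must have the same length")
    weights = [p * a for p, a in zip(binding_prob, accessibility)]
    if sum(weights) <= 0:
        raise ValueError("at least one site needs a positive weight")
    rng = random.Random(seed)  # seeded for reproducibility
    sampled_sites = rng.choices(range(len(weights)), weights=weights, k=n_reads)
    counts = [0] * len(weights)
    for site in sampled_sites:
        counts[site] += 1
    return counts


# Toy example (hypothetical values): the second site binds strongly in vitro
# but lies in closed chromatin, so it should receive no simulated reads.
counts = simulate_read_counts(
    binding_prob=[0.9, 0.8, 0.1],
    accessibility=[1.0, 0.0, 1.0],
    n_reads=1000,
)
```

Comparing such simulated counts with observed ChIP-seq signal is one way to ask how much of the in vivo pattern is already explained by in vitro specificity and accessibility alone.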

12:40-13:00
Characterizing the cellular diversity of early development of the human hindbrain and spinal cord
Format: Pre-recorded with live Q&A

Moderator(s): Michael Hoffman

  • Sushmita Roy, Wisconsin Institute for Discovery, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, United States
  • Junha Shin, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Nisha Iyer, Wisconsin Institute for Discovery, Department of Biomedical Engineering, University of Wisconsin-Madison, United States
  • Sunnie Grace McCalla, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Stephanie Cuskey, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Jacky Tian, Wisconsin Institute for Discovery, Department of Biomedical Engineering, University of Wisconsin-Madison, United States
  • Tessa Doersch, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Noah Nichol, Wisconsin Institute for Discovery, Department of Biomedical Engineering, University of Wisconsin-Madison, United States
  • Randolph Ashton, Wisconsin Institute for Discovery, Department of Biomedical Engineering, University of Wisconsin-Madison, United States

Presentation Overview:

Single-cell RNA-seq (scRNA-seq) enables us to profile genome-wide gene expression in individual cells. A key problem in scRNA-seq data analysis is the robust identification of cell types and the genes that drive the cell-type specificity. We developed a two-step computational pipeline that first uses consensus clustering to define cell types and then applies multi-task clustering to identify combinatorial patterns of gene expression. We applied our approach to a novel scRNA-seq dataset profiling the chemically induced differentiation of human pluripotent stem cells (hPSCs) into major cell types of the hindbrain and spinal cord. We identified 17 groups of cardinal cell clusters that were each further organized into hierarchically related subgroups and are associated with known neuronal gene markers and previously measured mouse in vivo gene expression patterns. We applied our multi-task clustering algorithm to identify gene modules for each of the cell subtypes in the cardinal cluster set and identified several genes that exhibit lineage-specific patterns of expression. Taken together, our computational pipeline and novel dataset should be a useful resource for deciphering the cellular complexity of early human neuronal development.

13:00-13:20
Mapping the DNA accessibility landscape of B-ALL patients revealed principles of cancer evolution.
Format: Pre-recorded with live Q&A

Moderator(s): Michael Hoffman

  • Giacomo Corleone, IRCCS Regina Elena National Cancer Institute, Italy
  • Stefano Di Giovenale, IRCCS Regina Elena National Cancer Institute, Italy
  • Cristina Sorino, IRCCS Regina Elena National Cancer Institute, Italy
  • Maurizio Fanciulli, IRCCS Regina Elena National Cancer Institute, Italy

Presentation Overview:

Pediatric B-cell Acute Lymphoblastic Leukemia (B-ALL) is the primary cause of death from hematological disease in children. Despite the enormous improvement of treatments based on innovative immunotherapy, new approaches that are effective in relapsed patients remain an urgent need in clinical practice.

In this work, we have built the most extensive map of the accessibility landscape of B-ALL to date from 35 B-ALL patients. We integrated this map with a plethora of transcriptomic and epigenomic pan-cancer profiles to define the key determinant of B-ALL post-therapy relapse. We observe that relapsed patients are dominated by regulatory elements (N=~6000) originally represented at diagnosis that shrink under treatment and subsequently re-expand, driving the relapse. Motif analysis coupled with multi-omics integration suggests that these elements are likely regulated by the binding of the transcription factors ERG, EBF, and RUNX.

We identified an enhancer regulating the dCMP deaminase gene (DCTD) and gained insights into its mechanistic regulation. While the DCTD gene is broadly expressed in human tissues, enhancer activity is detected only in leukemia cell lines (GTEx data). To directly test the regulatory potential of the DCTD enhancer, we generated conditional knock-outs of the DCTD enhancer and gene in primary cell lines. Our data revealed that the DCTD enhancer is a crucial determinant of DCTD mRNA expression and protein abundance. Strikingly, DCTD enhancer depletion abrogates proliferation, thus suggesting that DCTD enhancer activity is a driver of clonal proliferation in B-ALL relapse.

Taken together, our data revealed that regulatory activity dynamically changes during cancer progression and represents principal phenomena underlying functional mechanisms of B-ALL.

13:20-13:40
Identifying cell-state-associated alternative splicing events and their co-regulation
Format: Pre-recorded with live Q&A

Moderator(s): Michael Hoffman

  • Carlos Buen Abad Najar, University of California, Berkeley, United States
  • Prakruthi Burra, University of California, Berkeley, United States
  • Nir Yosef, University of California, Berkeley, United States
  • Liana Lareau, University of California, Berkeley, United States

Presentation Overview:

Alternative splicing shapes the transcriptome and contributes to cell identity, but single-cell RNA sequencing has struggled to capture the impact of alternative splicing. We previously showed that low recovery of mRNAs from single cells led to erroneous conclusions about the cell-to-cell variability of alternative splicing. Here, we present a new method, Psix, to confidently identify splicing that changes across a landscape of single cells, using a probabilistic model that is robust against the data limitations of scRNA-seq. Its autocorrelation-inspired approach finds patterns of alternative splicing that correspond to cell identity without the need for explicit cell clustering, labeling, or trajectory inference. Applying Psix to mouse brain development data, we identify genes whose alternative splicing patterns cluster into modules of co-regulation. We show that the exons in these modules are enriched for binding by neuronal splicing factors, and that their changes in splicing correspond to changes in expression of these splicing factors. Thus, Psix reveals cell-type-dependent splicing patterns and the wiring of the splicing regulatory networks that control them. Our new method will enable scRNA-seq analysis to go beyond transcription to understand layers of post-transcriptional regulation in determining cell identity.

13:40-14:00
Viral integration transforms chromatin to drive oncogenesis
Format: Pre-recorded with live Q&A

Moderator(s): Michael Hoffman

  • Mehran Karimzadeh, University of Toronto, Canada
  • Christopher Arlidge, University Health Network, Canada
  • Ariana Rostami, University of Toronto, Canada
  • Mathieu Lupien, University Health Network, Canada
  • Scott V. Bratman, University Health Network, Canada
  • Michael M. Hoffman, University Health Network, Canada

Presentation Overview:

Human papillomavirus (HPV) drives almost all cervical cancers and up to ~70% of head and neck cancers. Frequent integration into the host genome occurs only for tumourigenic strains of HPV. We found that viral integration events often occurred along with changes in chromatin state and expression of genes near the integration site. We investigated whether introduction of new transcription factor binding sites due to HPV integration could invoke these changes. ChIP-seq revealed that a conserved CTCF sequence motif in the HPV genome with enriched chromatin accessibility signal bound CTCF in 5 HPV+ cancer cell lines. Significant changes in CTCF binding pattern and increases in chromatin accessibility occurred exclusively within 100 kbp of HPV integration sites. The chromatin changes co-occurred with outsized changes in transcription and alternative splicing of local genes. We analyzed the essentiality of genes upregulated around HPV integration sites of The Cancer Genome Atlas (TCGA) HPV+ tumours. HPV integration upregulated genes with significantly higher essentiality. Our results suggest that introduction of a new CTCF binding site due to HPV integration reorganizes chromatin and upregulates genes essential for tumour viability in some HPV+ tumours. These findings emphasize a newly recognized role of HPV integration in oncogenesis.

14:20-15:00
COSI REGSYS KEYNOTE: Regulation and human disease - from genome to networks
Format: Live-stream

Moderator(s): Shaun Mahony

  • Olga Troyanskaya
15:00-15:20
Learning determinants of nucleosome positioning through sequence-based models of cell-free DNA
Format: Pre-recorded with live Q&A

Moderator(s): Shaun Mahony

  • Christopher Probert, Stanford University, United States
  • Charles McAnany, Stowers Institute for Medical Research, United States
  • Jennifer Caswell-Jin, Stanford University, United States
  • Melanie Weilert, Stowers Institute for Medical Research, United States
  • Zhicheng Ma, Stanford University, United States
  • Julia Zeitlinger, Stowers Institute for Medical Research, United States
  • Christina Curtis, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States

Presentation Overview:

DNA sequence significantly influences nucleosome positioning through both short-range intrinsic effects and longer-range extrinsic effects mediated by DNA binding proteins. Existing analyses of sequence determinants on nucleosome position in the human genome lack fully generalizable predictive models incorporating both types of effects. We introduce a novel framework for end-to-end learning and interpretation of sequence determinants of in vivo nucleosome position measured from fragmentation matrices of blood plasma-derived cell-free DNA. We first train a fully dense convolutional neural network to predict nucleosome occupancy directly from DNA sequence at base pair-resolution. We then present a suite of purpose-developed sequence interpretation techniques to survey the intricate balance between sequence-driven intrinsic and TF-driven extrinsic effects in diverse genomic contexts. Our study presents novel insights into nucleosome positioning determinants, including finding epistatic interactions between nonadjacent bases synergistically influencing occupancy.

Wednesday, July 28th
11:00-11:40
COSI REGSYS KEYNOTE: Tracing the evolution of gene regulation and the emergence of new traits
Format: Live-stream

Moderator(s): Anthony Mathelier

  • Camille Berthelot
11:40-12:00
Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network
Format: Pre-recorded with live Q&A

Moderator(s): Anthony Mathelier

  • Charles Lecellier, CNRS, France
  • Laurent Brehelin, LIRMM/CNRS, France
  • Mathys Grapotte, IGMM, France
  • Chloé Bessiere, Master STIC pour la Santé, France
  • Manu Saraswat, University of British Columbia, India
  • Fantom Consortium, RIKEN, Japan
  • Clément Chatelain, SANOFI R&D, France

Presentation Overview:

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. In this work (Nature Communications, in press), we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called Short Tandem Repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

12:00-12:20
Sequence-based modeling of genome 3D architecture from kilobase to chromosome-scale
Format: Pre-recorded with live Q&A

Moderator(s): Anthony Mathelier

  • Jian Zhou, University of Texas Southwestern Medical Center, United States

Presentation Overview:

To learn the complex sequence dependencies of multiscale genome architecture, we developed a sequence-based deep learning approach, Orca, that predicts genome 3D architecture from kilobase to whole-chromosome scale, covering structures including chromatin compartments. Orca also captures the sequence dependencies of diverse types of interactions, from CTCF-mediated to enhancer-promoter interactions and Polycomb-mediated interactions. Orca enables the interpretation of the effects of any structural variant of any size on multiscale genome organization. We show that the models accurately recapitulate the effects of experimentally studied structural variants of varying sizes (300bp-80Mb). Furthermore, Orca enables in silico virtual screen assays to probe the sequence basis of genome 3D organization at different scales. At the chromatin compartment scale, based on virtual screens of sequence activities, we propose a new model for the sequence basis of chromatin compartments: sequences at active transcription start sites are primarily responsible for establishing the expression-active compartment A, while the inactive compartment B typically requires extended stretches of AT-rich sequences (at least 6-12kb) and can form ‘passively’ without depending on any particular sequence pattern. Orca thus effectively provides an “in silico genome observatory” to predict variant effects on genome structure and probe the sequence-based mechanisms of genome organization.

12:40-13:20
COSI REGSYS KEYNOTE: Chromatin Conformation during Early Embryonic Development
Format: Live-stream

Moderator(s): Raluca Gordan

  • Juanma Vacquerizas
13:20-13:40
Proceedings Presentation: DECODE: A Deep-learning Framework for Condensing Enhancers and Refining Boundaries with Large-scale Functional Assays
Format: Pre-recorded with live Q&A

Moderator(s): Raluca Gordan

  • Mark Gerstein, Yale University, United States
  • Min Xu, Carnegie Mellon University, United States
  • Zhanlin Chen, Yale University, United States
  • Jing Zhang, UC Irvine, United States
  • Jason Liu, Yale University, United States
  • Yi Dai, UC Irvine, United States
  • Donghoon Lee, Icahn School of Medicine at Mount Sinai, United States
  • Martin Min, NEC Laboratories America, United States

Presentation Overview:

Summary: Mapping distal regulatory elements, such as enhancers, is the cornerstone for investigating genome evolution, understanding critical biological functions, and ultimately elucidating how genetic variations may influence diseases. Previous enhancer prediction methods have used either unsupervised approaches or supervised methods with limited training data. Moreover, past approaches have operationalized enhancer discovery as a binary classification problem without accurate enhancer boundary detection, producing low-resolution annotations with redundant regions and reducing the statistical power for downstream analyses (e.g., causal variant mapping and functional validations). Here, we addressed these challenges via a two-step model called DECODE. First, we employed direct enhancer activity readouts from novel functional characterization assays, such as STARR-seq, to train a deep neural network classifier for accurate cell-type-specific enhancer prediction. Second, to improve the annotation resolution (~500 bp), we implemented a weakly-supervised object detection framework for enhancer localization with precise boundary detection (at 10 bp resolution) using gradient-weighted class activation mapping.
Results: Our DECODE binary classifier outperformed the state-of-the-art enhancer prediction methods by 24% in transgenic mouse validation. Further, DECODE object detection can condense enhancer annotations to only 12.6% of the original size, while still reporting higher conservation scores and genome-wide association study variant enrichments. Overall, DECODE improves the efficiency of regulatory element mapping with graphics processing units for deep-learning applications and is a powerful tool for enhancer prediction and boundary localization.
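
As a rough illustration of the boundary-refinement idea described in the summary above, the sketch below condenses a coarse enhancer call by keeping only the contiguous run of 10 bp bins around the strongest importance score. The function name `refine_boundary`, the relative threshold of 0.5, and the toy scores are assumptions made for illustration. They are not taken from DECODE, which derives its scores from gradient-weighted class activation mapping on a trained network.

```python
def refine_boundary(scores, region_start, bin_size=10, rel_threshold=0.5):
    """Return (start, end) of the contiguous high-score run around the peak bin.

    scores        -- per-bin importance values (hypothetical stand-in for a
                     class-activation track)
    region_start  -- genomic coordinate of the first bin
    bin_size      -- bin width in bp (10 bp, as quoted in the abstract)
    rel_threshold -- fraction of the peak score a bin must reach (assumed)
    """
    if not scores:
        raise ValueError("scores must not be empty")

    # Locate the bin with the highest importance score.
    peak = max(range(len(scores)), key=lambda i: scores[i])
    cutoff = rel_threshold * scores[peak]

    # Extend left and right while neighbouring bins stay above the cutoff.
    left = peak
    while left > 0 and scores[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(scores) - 1 and scores[right + 1] >= cutoff:
        right += 1

    # Half-open interval [start, end) in genomic coordinates.
    start = region_start + left * bin_size
    end = region_start + (right + 1) * bin_size
    return start, end


toy_scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.1, 0.6]
print(refine_boundary(toy_scores, region_start=1000))  # (1020, 1050)
```

In this toy case an 80 bp call shrinks to 30 bp. The isolated high bin at the end is ignored because it is not contiguous with the peak, which is one possible design choice among several.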

13:40-14:00
Proceedings Presentation: EnHiC: Learning fine-resolution Hi-C contact maps using a generative adversarial framework
Format: Pre-recorded with live Q&A

Moderator(s): Raluca Gordan

  • Yangyang Hu, University of California Riverside, United States
  • Wenxiu Ma, University of California Riverside, United States

Presentation Overview:

Motivation:
The Hi-C technique has enabled the genome-wide mapping of chromatin interactions. However, high-resolution Hi-C data requires costly, deep sequencing and has therefore been achieved in only a limited number of cell types. Machine learning models based on neural networks have been developed as a remedy to this problem.

Results:
In this work, we propose a novel method, EnHiC, for predicting high-resolution Hi-C matrices from low-resolution input data based on a generative adversarial network (GAN) framework. Inspired by non-negative matrix factorization, our model fully exploits the unique properties of Hi-C matrices and extracts rank-1 features from multi-scale low-resolution matrices to enhance the resolution. Using three human Hi-C datasets, we demonstrate that EnHiC accurately and reliably enhances the resolution of Hi-C matrices and outperforms other GAN-based models. Moreover, EnHiC-predicted high-resolution matrices facilitate accurate detection of TADs and fine-scale chromatin interactions.

Availability:
EnHiC is publicly available at https://github.com/wmalab/EnHiC.
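
To make the notion of a rank-1 feature of a symmetric contact matrix concrete, here is a minimal sketch that approximates a symmetric non-negative matrix as `scale * v v^T` using plain power iteration. This is a standard linear-algebra stand-in chosen for illustration, not EnHiC's learned decomposition. The function names `rank1_approx` and `outer` and the toy matrix are assumptions.

```python
import math


def rank1_approx(matrix, n_iter=100):
    """Approximate a symmetric non-negative matrix as scale * v v^T.

    Uses power iteration from a uniform positive start vector and returns
    (scale, v), where v has unit length.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(n_iter):
        # One matrix-vector product followed by normalisation.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # all-zero matrix: nothing to approximate
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the scale of the rank-1 term.
    scale = sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))
    return scale, v


def outer(scale, v):
    """Rebuild the rank-1 matrix scale * v v^T."""
    return [[scale * a * b for b in v] for a in v]


# Toy symmetric "contact" matrix that is exactly rank 1 (outer product of [3, 2, 1]).
contacts = [[9.0, 6.0, 3.0],
            [6.0, 4.0, 2.0],
            [3.0, 2.0, 1.0]]
scale, v = rank1_approx(contacts)
print([[round(x, 6) for x in row] for row in outer(scale, v)])
```

For a matrix that is only approximately rank 1, the same routine returns the dominant component and the residual carries the remaining structure.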

14:20-14:40
Interpretable deep learning for chromatin-informed inference of transcriptional programs driven by somatic alterations across cancers
Format: Pre-recorded with live Q&A

Moderator(s): Sushmita Roy

  • Yifeng Tao, Carnegie Mellon University, United States
  • Xiaojun Ma, University of Pittsburgh, United States
  • Drake Palmer, University of Pittsburgh, United States
  • Russell Schwartz, Carnegie Mellon University, United States
  • Xinghua Lu, University of Pittsburgh, United States
  • Hatice Osmanbeyoglu, University of Pittsburgh, United States

Presentation Overview:

Cancer is a disease of gene dysregulation, where cells acquire somatic and epigenetic alterations that drive aberrant cellular signaling. Interpreting patient somatic alterations within context-specific regulatory programs will facilitate personalized therapeutic decisions for each individual. We develop a partially interpretable neural network model with encoder-decoder architecture, called Chromatin-informed Inference of Transcriptional Regulators Using Self-attention mechanism (CITRUS), to model the impact of somatic alterations on cellular states and further onto downstream gene expression programs. The encoder module employs a self-attention mechanism to model the contextual functional impact of somatic alterations in a tumor-specific manner. Furthermore, the model uses a layer of hidden nodes to explicitly represent the state of transcription factors (TFs), and the decoder learns the relationships between TFs and their target genes guided by a sparse prior based on TF binding motifs in the tumor-type-specific open chromatin regions. We applied CITRUS to genomic, mRNA sequencing, and ATAC-seq data from tumors of 17 cancer types profiled by The Cancer Genome Atlas. Our computational framework outperforms the competing models in predicting RNA expression and enables us to share information across tumors to learn patient-specific TF activities, revealing similarities and differences in regulatory programs between and within tumor types.
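
For readers unfamiliar with self-attention, the following minimal sketch shows generic scaled dot-product attention over a small set of item embeddings, which could stand for the altered genes of one tumor. It is a textbook formulation given under that assumption. The abstract does not specify CITRUS's exact attention variant, and the names `softmax` and `self_attention` and the toy vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Generic scaled dot-product attention.

    Each query is compared with every key; the resulting weights form a
    context-dependent average of the value vectors.
    Returns (outputs, weights).
    """
    d = len(keys[0])
    n_out = len(values[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(n_out)])
    return outputs, weights


# Toy embeddings for three hypothetical altered genes in one tumor.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, att = self_attention(emb, emb, emb)
for row in att:
    print([round(a, 3) for a in row])
```

Each row of the weight matrix sums to one, so every item's output is a weighted mixture of all items present in the same tumor. This is the sense in which the representation is contextual.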

14:40-15:00
Proceedings Presentation: scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling
Format: Pre-recorded with live Q&A

Moderator(s): Sushmita Roy

  • Dongyuan Song, University of California, Los Angeles, United States
  • Kexin Li, University of California, Los Angeles, United States
  • Jingyi Li, University of California, Los Angeles, United States
  • Zachary Hemminger, University of California, Los Angeles, United States
  • Roy Wollman, University of California, Los Angeles, United States

Presentation Overview:

Motivation: Single-cell RNA sequencing (scRNA-seq) captures whole transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for closer study. This raises the question of how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity, and extra (e.g., spatial) information; however, they typically can only measure up to a few hundred genes. A second challenging question is therefore how to select genes for targeted gene profiling based on existing scRNA-seq data.
Results: Here we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and cell-type annotation on targeted gene profiling data.
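
PNMF approximates an expression matrix X by W Wᵀ X with a non-negative basis W of shape genes × bases. The sketch below shows one plausible way to turn such a basis into a short gene list, by ranking genes on their largest loading across the retained bases. The abstract does not state how scPNMF itself ranks genes, so this rule, the function name `select_genes`, and the toy basis and gene labels are assumptions for illustration only.

```python
def select_genes(basis, gene_names, n_genes):
    """Rank genes by their largest loading in a non-negative basis.

    basis      -- list of rows, one per gene, each holding that gene's
                  weights across the retained bases (hypothetical values)
    gene_names -- gene labels in the same order as the rows
    n_genes    -- size of the targeted panel to return
    """
    if len(basis) != len(gene_names):
        raise ValueError("basis rows and gene_names must align")
    scored = [(max(row), name) for row, name in zip(basis, gene_names)]
    # Highest loading first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:n_genes]]


toy_basis = [
    [0.9, 0.0],  # g1 loads on basis 1
    [0.1, 0.0],  # g2 barely loads on anything
    [0.0, 0.8],  # g3 loads on basis 2
    [0.2, 0.3],  # g4 loads weakly on both
]
print(select_genes(toy_basis, ["g1", "g2", "g3", "g4"], n_genes=2))  # ['g1', 'g3']
```

Because the basis is sparse and non-negative, each retained gene can be read as a marker of a specific basis, which is what makes such a panel interpretable.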

15:00-15:20
A Comparative Genomics Approach to Identifying Candidate Enhancers Associated with Phenotypes
Format: Pre-recorded with live Q&A

Moderator(s): Sushmita Roy

  • Irene Kaplow, Carnegie Mellon University, United States
  • Daniel Schäffer, Carnegie Mellon University, United States
  • Kathleen Foley, Lehigh University, United States
  • Wynn Meyer, Lehigh University, United States
  • Morgan Wirthlin, Carnegie Mellon University, United States
  • Danielle Levesque, University of Maine, United States
  • Emma Teeling, University College Dublin, Ireland
  • Badoi Phan, Carnegie Mellon University, United States
  • Patrick Sullivan, University of North Carolina at Chapel Hill, United States
  • Alyssa Lawler, Carnegie Mellon University, United States
  • Ashley R. Brown, Carnegie Mellon University, United States
  • Xiaomeng Zhang, Carnegie Mellon University, United States
  • Chaitanya Srinivasan, Carnegie Mellon University, United States
  • Diane Genereux, University of Massachusetts Medical School, United States
  • Elinor Karlsson, University of Massachusetts Medical School, United States
  • Andreas Pfenning, Carnegie Mellon University, United States

Presentation Overview:

Advances in genome sequencing have provided a comprehensive view of cross-species conservation across small segments of nucleotides. These conservation measures have proven invaluable for associating phenotypic variation, both within and across species, with variation in genotype at protein-coding genes or very highly conserved enhancers. However, these approaches cannot be applied to the vast majority of enhancers, where the conservation level of individual nucleotides is often low even when enhancer function is conserved. To overcome this limitation, we developed the TACIT-ML (Tissue Aware Conservation Inference Through Machine Learning) approach, in which convolutional neural network models learn the regulatory code connecting genome sequence to open chromatin in a tissue of interest, allowing us to accurately predict cases where differences in genotype are associated with differences in open chromatin in that tissue at enhancer regions. We apply this technique to identify dozens of new associations between genetic variation in orthologous motor cortex and liver enhancers across 222 Boreoeutherian mammals and differences in species’ brain size, longevity, carnivory, and body temperature. One of the longevity-associated liver enhancers is near TDG, a gene that regulates DNA repair and methylation in an age-dependent manner and whose liver expression across species is associated with longevity.



International Society for Computational Biology
525-K East Market Street, RM 330
Leesburg, VA, USA 20176