Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide




Schedule subject to change
Wednesday, July 15th
10:40 AM-10:45 AM
RegSys Introduction
Format: Live-stream

  • Shaun Mahony, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, United States
  • Alejandra Medina-Rivera, International Laboratory for Human Genome Research, Universidad Nacional Autónoma de México, Querétaro, Mexico
10:45 AM-11:20 AM
RegSys KEYNOTE: Detection of functional cis-regulatory variations causal for rare genetic disorders
Format: Live-stream

  • Wyeth Wasserman, Centre for Molecular Medicine and Therapeutics, University of British Columbia, Canada

Presentation Overview:

Despite great expectations and widespread access to whole genome sequencing, definitively causal cis-regulatory sequence variations have remained elusive. Three case studies will be presented of cis-regulatory alterations, detected using computational approaches, that cause strabismus, glutaminase deficiency and a novel neurodevelopmental disorder. Each features a distinct mechanism: a short 4-bp deletion, a trinucleotide repeat expansion and a large structural inversion. A semi-quantitative framework for scoring the evidence supporting candidate cis-regulatory variants is also presented, based on a reference collection of cis-regulatory alterations in genes encoding DNA-binding transcription factors. The scoring framework has two sections: one covering the evidence linking the genotype to the phenotype, and one covering the evidence that the sequence variation functionally alters gene regulation.

11:20 AM-11:40 AM
Hox binding specificity is directed by DNA sequence preferences and differential abilities to engage inaccessible chromatin
Format: Pre-recorded with live Q&A

  • Divyanshi Srivastava, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, United States
  • Milica Bulajić, Department of Biology, New York University, New York, NY, United States
  • Esteban Mazzoni, Department of Biology, New York University, New York, NY, United States
  • Shaun Mahony, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, United States

Presentation Overview:

The Hox transcription factors (TFs) bind similar target sequence motifs in vitro, yet they specify distinct cellular fates in the developing embryo. How the Hox TFs achieve the specificity required to drive differential cell fates remains unclear. Here, we induce the expression of several Hox TFs in the uniform cellular context of embryonic stem cell-derived progenitor motor neurons (pMNs). We ask whether Hox TFs bind differentially in pMNs, and whether differential Hox TF binding is determined by sequence preferences or by the ability of Hox TFs to differentially interact with preexisting pMN chromatin. We find that Hox TFs bind both “shared” and “unique” sites in the genome, and that sequence preferences are unable to explain the observed diversity in Hox TF binding. To examine the degree to which preexisting chromatin predicts induced Hox binding, we develop a novel bimodal neural network framework that estimates how strongly sequence and pMN chromatin features each predict induced Hox TF binding. Results from our network suggest that the preexisting chromatin environment determines the binding of different Hox TFs to varying degrees. For Hox TFs that bind similar sequences, differential abilities to bind inaccessible chromatin drive differential Hox TF binding.
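The idea of a bimodal network can be illustrated with a minimal forward pass: one convolutional branch reads the DNA sequence, a second dense branch reads chromatin features, and the two meet at a single output unit so each branch's share of the logit can be inspected. This is a sketch under stated assumptions; the layer sizes, parameter names (`filters`, `W_chr`, `w_seq`, `w_chr`, `b`) and the additive merge are hypothetical and are not the authors' architecture. Random untrained weights are used only to show the data flow.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode DNA as a (len, 4) array; N or other characters stay all-zero."""
    x = np.zeros((len(seq), 4))
    for i, b in enumerate(seq.upper()):
        j = BASES.find(b)
        if j >= 0:
            x[i, j] = 1.0
    return x

def conv1d_max(x, filters):
    """Valid 1D convolution of x (L, 4) with filters (F, W, 4), then ReLU and global max pooling."""
    F, W, _ = filters.shape
    out = np.empty((F, x.shape[0] - W + 1))
    for p in range(x.shape[0] - W + 1):
        out[:, p] = np.tensordot(filters, x[p:p + W], axes=([1, 2], [0, 1]))
    return np.maximum(out, 0).max(axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bimodal_forward(seq, chrom, params):
    """Hypothetical two-branch model: returns the binding probability plus the
    separate sequence and chromatin contributions to the output logit."""
    h_seq = conv1d_max(one_hot(seq), params["filters"])   # sequence branch
    h_chr = np.maximum(params["W_chr"] @ chrom, 0)          # chromatin branch (dense + ReLU)
    seq_logit = params["w_seq"] @ h_seq
    chr_logit = params["w_chr"] @ h_chr
    return sigmoid(seq_logit + chr_logit + params["b"]), seq_logit, chr_logit
```

Keeping the merge additive at the logit is one simple way to compare how much each input type contributes; a trained model would learn these weights from induced binding data.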

12:00 PM-12:20 PM
Deep learning models of 422 C2H2 Zinc Finger transcription factor binding profiles reveal alternate combinatorial DNA binding sequence preferences
Format: Pre-recorded with live Q&A

  • Abhimanyu Banerjee, Stanford University, United States
  • Irene Kaplow, Carnegie Mellon University, United States
  • Frank Schmitges, University of Toronto, Canada
  • Lixia Jiang, Stanford University, United States
  • Zheng Zuo, Stanford University, United States
  • Zahoor Zafarullah, Stanford University, United States
  • Georgi Marinov, Stanford University, United States
  • Ernest Radovani, University of Toronto, Canada
  • Laksshman Sundaram, Stanford University, United States
  • Avanti Shrikumar, Stanford University, United States
  • Hamed Najafabadi, McGill University, Canada
  • Jack Greenblatt, University of Toronto, Canada
  • Polly Fordyce, Stanford University, United States
  • Michael Snyder, Stanford University, United States
  • Tim Hughes, University of Toronto, Canada
  • Anshul Kundaje, Stanford University, United States

Presentation Overview:

The C2H2 zinc finger (C2H2-ZF) family of transcription factors (TFs) constitutes half of all human TFs. These TFs contain multiple zinc finger (ZF) domains that could combinatorially bind distinct alternate motifs. Their in vivo DNA binding preferences remain largely unexplored due to challenges in profiling and motif discovery. To comprehensively characterize the binding preferences of these TFs, we performed ChIP-seq on 208 C2H2-ZF TFs using eGFP-TF fusion constructs in HEK293 cells, complementing previous ChIP-seq and ChIP-exo studies of C2H2-ZFs. We trained neural networks (NNs) to learn accurate sequence models of the base-resolution binding profiles of 422 C2H2-ZF TFs, and distilled robust motif representations from the NN models. Several TFs exhibited novel alternate DNA binding preferences involving distinct motifs that aligned to different combinations of ZF domains based on the C2H2-ZF B1H “Recognition Code”. We performed Spec-seq experiments to validate variably spaced alternate motifs and novel motifs that differ from B1H-predicted motifs. Our motifs were significantly longer than previously discovered motifs and implicated far more ZF domains in DNA binding than previously reported. Our results significantly expand the cis-regulatory lexicon of the human genome.
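Motif representations distilled from models are usually expressed as position weight matrices (PWMs). Because each C2H2 zinc finger typically contacts about three base pairs, a motif aligned to k adjacent fingers spans roughly 3k bp. The sketch below, assuming a uniform background and a small pseudocount (both hypothetical choices), converts a count matrix to a log-odds PWM and scores every window of a sequence on the forward strand; it is a generic illustration, not the motif-distillation method used in this work.

```python
import numpy as np

BASES = "ACGT"

def pwm_from_counts(counts, pseudocount=0.5, background=None):
    """Convert a (W, 4) count matrix (columns A, C, G, T) to a log2-odds PWM."""
    counts = np.asarray(counts, dtype=float) + pseudocount
    probs = counts / counts.sum(axis=1, keepdims=True)
    bg = np.full(4, 0.25) if background is None else np.asarray(background, dtype=float)
    return np.log2(probs / bg)

def scan(seq, pwm):
    """Log-odds score of every length-W window of seq (ACGT only, forward strand)."""
    idx = np.array([BASES.index(b) for b in seq.upper()])
    W = pwm.shape[0]
    return np.array([pwm[np.arange(W), idx[p:p + W]].sum()
                     for p in range(len(idx) - W + 1)])
```

For example, a 3-bp motif built from counts favouring G-C-G scores highest at the window where `GCG` occurs.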

12:20 PM-12:40 PM
Proceedings Presentation: Fully Interpretable Deep Learning Model of Transcriptional Control
Format: Pre-recorded with live Q&A

  • Yi Liu, University of Chicago, United States
  • Kenneth Barr, University of Chicago, United States
  • John Reinitz, University of Chicago, United States

Presentation Overview:

The universal expressibility assumption of Deep Neural Networks (DNNs) is the key motivation behind recent work in the systems biology community to employ DNNs to solve important problems in functional genomics and molecular genetics. Typically, such investigations have taken a "black box" approach in which the internal structure of the model is set purely by machine learning considerations, with little attention to representing the internal structure of the biological system in the mathematical structure of the DNN. DNNs have not yet been applied to the detailed modeling of transcriptional control, in which mRNA production is controlled by the binding of specific transcription factors to DNA, in part because such models are formulated in terms of specific chemical equations that appear different in form from those used in neural networks. In this paper, we give an example of a DNN that can model the detailed control of transcription in a precise and predictive manner. Its internal structure is fully interpretable and is faithful to the underlying chemistry of transcription factor binding to DNA. We derive our DNN from a systems biology model that was not previously recognized as having a DNN structure. Although we apply our DNN to data from the early embryo of the fruit fly Drosophila, this system serves as a testbed for the analysis of much larger datasets obtained by systems biology studies on a genomic scale.
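To see how chemical equations can take the form of network layers, consider a generic thermodynamic-style sketch (an assumption for illustration, not the authors' derived DNN): the first layer computes the equilibrium fractional occupancy K·c/(1 + K·c) of each binding site, and the second layer sums occupancies weighted by signed TF efficiencies and applies a logistic activation. Every unit has a physical meaning; `site_tf`, `site_K`, `efficiency` and `threshold` are hypothetical parameter names.

```python
import numpy as np

def fractional_occupancy(conc, K):
    """Equilibrium occupancy of a single site: K*c / (1 + K*c)."""
    return K * conc / (1.0 + K * conc)

def transcription_rate(conc, site_tf, site_K, efficiency, threshold, r_max=1.0):
    """Hypothetical two-layer model.
    conc: TF concentrations; site_tf: TF index bound at each site;
    site_K: association constant of each site; efficiency: signed effect per TF
    (+ activator, - repressor)."""
    occ = fractional_occupancy(conc[site_tf], site_K)   # layer 1: one unit per binding site
    drive = np.sum(efficiency[site_tf] * occ)            # layer 2: weighted sum of occupancies
    return r_max / (1.0 + np.exp(-(drive - threshold)))  # logistic output = expression rate
```

Adding a repressor's concentration lowers the predicted rate, and each weight can be read directly as a binding constant or a regulatory efficiency.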

2:00 PM-2:40 PM
RegSys KEYNOTE: Deep Learning of Immune Differentiation
Format: Live-stream

  • Sara Mostafavi, University of British Columbia, Canada

Presentation Overview:

The mammalian genome contains several million cis-regulatory elements, whose differential activity, marked by open chromatin, determines cellular differentiation. While the growing availability of functional genomics assays allows us to systematically identify cis-regulatory elements across varied cell types, how the DNA sequence of cis-regulatory elements is decoded and orchestrated on a genome scale to determine cellular differentiation remains beyond our grasp. In this talk, I’ll present recent work using machine learning as a tool to understand the relationship between regulatory sequence and cellular function in the context of immune cell differentiation. In particular, I’ll present our deep learning approach (AI-TAC) for integrating a large and granular compendium of epigenomic data, and will describe approaches to robustly interpreting complex models in order to uncover mechanistic insights into immune gene regulation (Yoshida et al., Cell 2019; Maslova et al., bioRxiv 2019). Our work shows that a deep learning approach to genome-wide chromatin accessibility can uncover patterns of immune transcriptional regulators that are directly coded in the DNA sequence, thus providing a powerful in-silico framework (an in-silico assay of sorts) to mechanistically probe the relationship between regulatory sequence and its function.

2:40 PM-3:00 PM
A guide to predicting activity of enhancer orthologs in hundreds of species
Format: Pre-recorded with live Q&A

  • Irene Kaplow, Carnegie Mellon University, United States
  • Morgan Wirthlin, Carnegie Mellon University, United States
  • Alyssa Lawler, Carnegie Mellon University, United States
  • Xiaoyu Zhang, Carnegie Mellon University, United States
  • Ashley Brown, Carnegie Mellon University, United States
  • Andreas Pfenning, Carnegie Mellon University, United States

Presentation Overview:

Many phenotypes, including vocal learning, longevity, and brain size, have evolved through changes in gene expression, meaning that their differences across species are caused by differences in genome sequence at enhancers. While some of the genes involved in these phenotypes have been identified, in most cases it remains unknown which enhancers are responsible and how genome sequence differences in those enhancers have led to differences in gene expression. We developed a machine learning model that predicts species-specific brain enhancer activity from genome sequences at orthologs of open chromatin regions.
We trained our models using brain ATAC-seq data from mouse, rat, rhesus macaque, and human. Our model achieved an AUPRC of 0.88 on the entire validation set and 0.70 on species-specific enhancers and non-enhancers. We used our models to predict brain enhancer activity across hundreds of mammals, and demonstrated that the similarity in predictions between species is negatively correlated with evolutionary distance. We then used our predictions to identify clade-specific enhancers and showed that these predictions were consistent with our data. Our approach to predicting the activity of enhancer orthologs, and our metrics for evaluating such predictions, can be applied to any tissue or cell type with open chromatin data available from multiple species.
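The AUPRC values quoted above summarize the precision-recall curve. A minimal sketch of one common estimator, average precision (the mean of precision at each rank where a true enhancer is retrieved), is shown below; it matches the step-wise definition used by scikit-learn's `average_precision_score` when scores have no ties, and it breaks ties by input order, which is a simplifying assumption. The authors' exact evaluation code is not described here.

```python
import numpy as np

def average_precision(labels, scores):
    """Area under the precision-recall curve as average precision.
    labels: 1 for enhancer, 0 for non-enhancer; scores: model predictions."""
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = labels[order]                                  # labels sorted by decreasing score
    if not hits.any():
        raise ValueError("need at least one positive example")
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())
```

Unlike AUROC, a random ranking yields an AUPRC close to the fraction of positives, which is why it is informative for imbalanced subsets such as species-specific enhancers.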

3:20 PM-3:40 PM
Comparison of chromatin contact maps from GAM and Hi-C reveals method-specific interactions linked with active and inactive chromatin
Format: Pre-recorded with live Q&A

  • Christoph J. Thieme, MDC Berlin, Germany
  • Robert A. Beagrie, Weatherall Institute of Molecular Medicine, United Kingdom
  • Catherine Baugher, Ohio University, United States
  • Yingnan Zhang, Ohio University, United States
  • Markus Schueler, MDC Berlin, Germany
  • Carlo Annunziatella, Università di Napoli Federico II, Italy
  • Yichao Li, Ohio University, United States
  • Alexander Kukalev, MDC Berlin, Germany
  • Dorothee C.A. Kraemer, MDC Berlin, Germany
  • Rieke Kempfer, MDC Berlin, Germany
  • Antonio Scialdone, Helmholtz Zentrum München, Germany
  • Mario Nicodemi, Università di Napoli Federico II, Italy
  • Lonnie R. Welch, Ohio University, United States
  • Ana Pombo, MDC Berlin, Germany

Presentation Overview:

Gene expression is functionally coupled with 3D genome configuration. Genome Architecture Mapping (GAM) is a ligation-free, genome-wide method that maps chromatin contacts in 3D by measuring the frequency of locus co-segregation across an ensemble of ultra-thin nuclear slices of random orientation. To compare GAM and Hi-C, we used our new high-throughput, multiplexed GAM pipeline to produce a deep dataset from mouse embryonic stem cells, and we devised a procedure to extract contacts preferentially detected by either GAM or Hi-C. Strong contacts enriched in the GAM data show a 2-fold enrichment of feature pairs associated with TF binding (including CTCF), histone marks, and enhancers. In contrast, feature pairs enriched in Hi-C-specific contacts are characterized by heterochromatin marks (H3K9me3 and H4K20me3). In general, genomic regions with increased transcriptional activity often form GAM-detected contacts that are underestimated by Hi-C. We are currently investigating whether these differences can be explained by increased contact multiplicity, which could limit ligation-dependent detection. Our findings expand our current understanding of 3D genome folding and highlight the importance of orthogonal approaches.
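GAM's raw signal is how often two loci are detected in the same nuclear slice, compared with how often each is detected alone. A minimal sketch, assuming a binary slice-by-locus detection matrix, computes co-segregation frequencies and normalizes them with normalized pointwise mutual information (NPMI), one common choice for GAM data; it is shown as an assumption, not as the normalization used in this study.

```python
import numpy as np

def gam_npmi(seg):
    """seg: (n_slices, n_loci) binary matrix; seg[s, i] = 1 if locus i was detected in slice s.
    Returns pairwise co-segregation frequencies and NPMI in [-1, 1]."""
    seg = np.asarray(seg, dtype=float)
    n = seg.shape[0]
    p_i = seg.mean(axis=0)              # detection frequency of each locus
    p_ij = (seg.T @ seg) / n            # co-segregation frequency of each pair
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / np.outer(p_i, p_i))
        npmi = pmi / -np.log(p_ij)
    npmi[p_ij == 0] = -1.0              # never co-detected: NPMI lower bound
    return p_ij, npmi
```

Loci that always appear together reach NPMI = 1, and loci never seen in the same slice get -1, so contacts can be compared across locus pairs with different detection rates.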

3:40 PM-4:00 PM
Mustache: Multi-scale Detection of Chromatin Loops from Hi-C and Micro-C Maps using Scale-Space Representation
Format: Pre-recorded with live Q&A

  • Ferhat Ay, La Jolla Institute for Immunology, United States
  • Abbas Roayaei Ardakany, La Jolla Institute for Allergy and Immunology, United States
  • Halil Tuvan Gezer, Sabanci University, Turkey
  • Stefano Lonardi, University of California Riverside, United States

Presentation Overview:

We present Mustache, a new method for multi-scale detection of chromatin loops from Hi-C and Micro-C contact maps using scale-space theory, a technical advance in computer vision. When applied to high-resolution Hi-C and Micro-C data, Mustache detects loops at a wide range of genomic distances, identifying structural and regulatory interactions that are supported by independent conformation capture experiments as well as by known correlates of loop formation such as CTCF binding, enhancers and promoters. Unlike the commonly used HiCCUPS tool, Mustache runs on general-purpose CPUs and is very time-efficient, with a runtime of only a few minutes per chromosome for 5-kb-resolution human genome contact maps. Extensive experimental results show that Mustache reports two to three times as many loops as HiCCUPS, and that these loops are reproducible across replicates. It also recovers a larger proportion of published ChIA-PET and HiChIP loops than HiCCUPS. A comparative analysis of Mustache’s results on Hi-C and Micro-C data confirms strong agreement between the two datasets, with Micro-C providing better power for loop detection. Overall, our experimental results show that Mustache enables a more efficient and comprehensive analysis of chromatin looping from high-resolution Hi-C and Micro-C datasets. Mustache is freely available at https://github.com/ay-lab/mustache.
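Scale-space detection treats loops as blobs in the contact map. The sketch below shows the general difference-of-Gaussians (DoG) idea: smooth the log-transformed map at increasing scales, subtract adjacent levels, and keep pixels that are local maxima across both position and scale. The scales and threshold are hypothetical defaults, and the sketch omits Mustache's statistical testing and filtering; the repository above is the reference for the actual tool.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_peaks(contacts, sigmas=(1.0, 1.6, 2.56), threshold=0.0):
    """Difference-of-Gaussians blob detection on a contact matrix.
    Returns (row, col, sigma) for each local maximum in (scale, row, col)."""
    m = np.log1p(np.asarray(contacts, dtype=float))
    levels = np.stack([gaussian_filter(m, s) for s in sigmas])
    dog = levels[:-1] - levels[1:]                   # finer minus coarser smoothing
    is_max = dog == maximum_filter(dog, size=3)      # 3x3x3 neighbourhood
    peaks = np.argwhere(is_max & (dog > threshold))
    return [(int(r), int(c), sigmas[k]) for k, r, c in peaks]
```

The scale at which a peak is found indicates its approximate size, which is what lets a single pass detect both sharp and diffuse loops.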

4:00 PM-4:20 PM
Deciphering the role of 3D genome organization in breast cancer susceptibility
Format: Pre-recorded with live Q&A

  • Sushmita Roy, University of Wisconsin-Madison, United States
  • Brittany Baur, University of Wisconsin-Madison, United States
  • Da-Inn Lee, University of Wisconsin-Madison, United States
  • Jill Haag, University of Wisconsin-Madison, United States
  • Michael Gould, University of Wisconsin-Madison, United States

Presentation Overview:

Cancer risk from environmental exposure is modulated by an individual’s genetics and age at exposure. This age-specific period of susceptibility is referred to as a “Window of Susceptibility” (WOS). Radiation exposure poses a high breast cancer risk for women exposed between early childhood and young adulthood, and this risk is reduced for exposure in the mid-30s. Rats have a similar WOS for developing mammary cancer. Previous studies have identified a looping interaction between a genomic region and a known cancer gene, PAPPA. However, the global role of three-dimensional (3D) genome organization in WOS is not known. We therefore generated Hi-C and RNA-seq data in rat mammary epithelial cells within and outside the WOS. We compared the temporal changes in chromosomal looping to those in expression and found that interactions with significantly higher counts within the WOS are significantly enriched for differentially expressed genes that are more highly expressed within the WOS. To systematically identify differential domains of interaction, we leveraged symmetric non-negative matrix factorization. This revealed clusters of dynamic regions that change their chromosome conformation between the two time points. Our results suggest that WOS-specific changes in 3D genome organization are linked to transcriptional changes that may increase susceptibility to breast cancer at an early age.
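Symmetric non-negative matrix factorization approximates a symmetric, non-negative region-by-region matrix A as H·Hᵀ with H ≥ 0, so each row of H reads as a region's membership in k clusters. A minimal sketch using a common damped multiplicative update is shown below; how the input matrix was built from the two time points and how k was chosen are not specified in the abstract, so both are left to the caller.

```python
import numpy as np

def symmetric_nmf(A, k, n_iter=500, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric non-negative A (n x n) as H @ H.T with H >= 0 (n x k)
    using damped multiplicative updates; row i of H gives region i's cluster memberships."""
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.1, 1.0, size=(A.shape[0], k)) * np.sqrt(A.mean() / k)
    for _ in range(n_iter):
        num = A @ H
        den = H @ (H.T @ H) + eps
        H *= (1.0 - beta) + beta * num / den    # stays non-negative by construction
    return H
```

Assigning each region to `H.argmax(axis=1)` gives hard clusters; on a block-diagonal input the two blocks separate into different clusters.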

4:20 PM-4:40 PM
dcHiC: Differential Compartment Analysis of Hi-C datasets.
Format: Pre-recorded with live Q&A

  • Abhijit Chakraborty, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States
  • Jeffrey Wang, La Jolla Institute for Immunology, United States

Presentation Overview:

Principal Component Analysis (PCA) of Hi-C data provides critical insight into genome organization. The first principal component (PC1) score determines A/B compartmentalization and reflects the underlying chromatin state. Although determining the component score for each sample is simple, comparing this score across multiple Hi-C samples is not straightforward. This limits comparative analysis of genome organization across the cell types and conditions for which Hi-C datasets are available. Here, we introduce a systematic approach, ‘dcHiC’ (differential compartment analysis of Hi-C), to measure principal component scores of Hi-C profiles and identify differential compartments across datasets.

dcHiC employs Hierarchical Multiple Factor Analysis (HMFA) to balance the contributions of multiple Hi-C datasets before performing PCA. This normalizes component scores and helps identify significant compartment changes across the genome. We also introduce compartmental Gene Set Enrichment Analysis, or ‘cGSEA’, to find biologically relevant differential compartments across datasets. We used dcHiC to compare a total of 57 human Hi-C datasets encompassing various biological conditions. We observed that dcHiC identified the fewest differences between replicate experiments and detected previously validated compartmental changes during cellular differentiation. Compartmentalization is a critical process, and we believe that our framework will help to understand its role in genome organization and its downstream effects in a systematic manner.
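The contrast between per-sample PC1 and a balanced joint analysis can be sketched in a few lines. The first function computes a compartment-style score for one sample as PC1 of the correlation matrix of an observed/expected map. The second follows the multiple factor analysis (MFA) idea: divide each sample's centered matrix by its first singular value so no sample dominates, run one joint PCA, and project each sample onto the shared axis. This is a simplified, non-hierarchical sketch assuming equal bin counts across samples; it is not dcHiC's implementation.

```python
import numpy as np

def pc1_scores(oe):
    """PC1 of the bin-by-bin correlation matrix of one observed/expected Hi-C matrix."""
    corr = np.corrcoef(np.asarray(oe, dtype=float))
    corr = corr - corr.mean(axis=0)                  # center columns
    u, s, _ = np.linalg.svd(corr, full_matrices=False)
    return u[:, 0] * s[0]

def mfa_partial_pc1(oe_list):
    """MFA-style joint PC1: returns an (n_bins, n_samples) matrix of per-sample
    scores on one shared axis, so scores are directly comparable across samples."""
    blocks = []
    for oe in oe_list:
        c = np.corrcoef(np.asarray(oe, dtype=float))
        c = c - c.mean(axis=0)
        blocks.append(c / np.linalg.svd(c, compute_uv=False)[0])   # balance each sample
    X = np.hstack(blocks)
    v1 = np.linalg.svd(X, full_matrices=False)[2][0]                # shared first axis
    n = X.shape[1] // len(blocks)
    return np.column_stack([len(blocks) * b @ v1[i * n:(i + 1) * n]
                            for i, b in enumerate(blocks)])
```

PC signs are arbitrary, so in practice the axis is usually oriented with an external track such as gene density before calling A and B compartments; the joint analysis at least guarantees that all samples share the same orientation.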

5:00 PM-5:40 PM
RegSys KEYNOTE: A single cell lens into regulatory inference
Format: Live-stream

  • Dana Pe'er, Program for Computational and Systems Biology, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, United States

Presentation Overview:


5:40 PM-6:00 PM
How to build regulatory networks from single-cell gene expression data
Format: Pre-recorded with live Q&A

  • Aditya Pratapa, Virginia Tech, United States
  • Amogh Jalihal, Virginia Tech, United States
  • Jeffrey Law, Virginia Tech, United States
  • Aditya Bharadwaj, Virginia Tech, United States
  • T. M. Murali, Virginia Tech, United States

Presentation Overview:

Nearly twenty methods have been developed to infer gene regulatory networks (GRNs) from single-cell RNA-seq data. An experimentalist seeking to analyze a new dataset faces a daunting task in selecting an appropriate inference method, since there are no widely accepted ground-truth datasets for assessing algorithm accuracy and the criteria for evaluating and comparing methods vary. We present BEELINE, a systematic evaluation framework for algorithms that infer GRNs from scRNA-seq data. We find that the accuracies of the algorithms are moderate. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we develop recommendations on GRN algorithms for end users. We present choices for ground-truth networks, selecting genes for analysis, evaluation measures, and generation of simulated scRNA-seq datasets. Finally, we discuss the potential of supervised algorithms for GRN inference.
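One evaluation measure used in this kind of benchmark is early precision: the fraction of true edges among an algorithm's top-k predictions, where k is the number of edges in the ground-truth network, often reported relative to a random predictor. A minimal sketch, assuming directed edges without self-loops for the random baseline (BEELINE's exact conventions for ties and self-loops may differ), is:

```python
def early_precision(ranked_edges, true_edges):
    """Fraction of true edges among the top-k predictions, k = number of true edges.
    ranked_edges: list of (regulator, target) tuples, best first."""
    truth = set(true_edges)
    k = len(truth)
    return sum(edge in truth for edge in ranked_edges[:k]) / k

def early_precision_ratio(ranked_edges, true_edges, n_genes):
    """Early precision divided by network density, the precision a random ranking
    would achieve (assumes directed edges, no self-loops)."""
    density = len(set(true_edges)) / (n_genes * (n_genes - 1))
    return early_precision(ranked_edges, true_edges) / density
```

A ratio above 1 means the method ranks true regulatory edges higher than chance.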

Thursday, July 16th
10:40 AM-11:20 AM
RegSys KEYNOTE: Genetic regulation of gene expression, environmental contexts and disease risk
Format: Live-stream

  • Francesca Luca

Presentation Overview:

Human complex traits result from genetic and environmental factors. Genetic regulation of gene expression is a key mechanism for complex trait variation. However, we have limited knowledge of the molecular mechanisms underlying interactions between genetic and environmental factors (GxE). We have developed allele-specific and eQTL mapping approaches to identify genetic effects on gene expression that are modified by cellular and environmental contexts. Our studies demonstrated that 50% of genes whose expression is regulated by GxE are associated with complex traits. Furthermore, environmental factors amplify genetic risk in a gene- and genotype-specific manner. These context-specific genetic effects can be further complicated by cell composition heterogeneity and by psychosocial experiences. We have studied these complex interactions in a cohort of children with asthma. Using bulk RNA-sequencing and a machine learning approach, we have denoised and imputed transcriptional signatures of biological, clinical and psychosocial factors in leukocytes. To characterize the contribution of the cellular context to the response to immuno-modulators, we used single-cell RNA-sequencing. We identified hundreds of context-specific genetic effects on both mean gene expression and its variance. By combining our results with a transcriptome-wide association study, we identified several genes associated with asthma risk that have GxE effects on gene expression. Our results highlight the importance of identifying genetic effects on gene regulation across cell types and environments to fully capture and understand the likely mechanisms underlying inter-individual variation in disease risk.
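A GxE effect on expression can be made concrete with the textbook interaction model, expression ~ b0 + bG·g + bE·e + bGxE·(g·e), where a non-zero bGxE means the genetic effect depends on the environment. The sketch below fits it by ordinary least squares, assuming allele dosage coding for the genotype and a 0/1 treatment indicator; the allele-specific and eQTL mapping approaches described in the talk are more elaborate than this.

```python
import numpy as np

def fit_gxe(expr, genotype, env):
    """Least-squares fit of expr ~ b0 + bG*g + bE*e + bGxE*(g*e).
    genotype: allele dosage (0/1/2); env: 0/1 environment or treatment indicator."""
    g = np.asarray(genotype, dtype=float)
    e = np.asarray(env, dtype=float)
    X = np.column_stack([np.ones_like(g), g, e, g * e])
    coef, *_ = np.linalg.lstsq(X, np.asarray(expr, dtype=float), rcond=None)
    return dict(zip(["intercept", "genotype", "env", "gxe"], coef))
```

With real data, the `gxe` coefficient would be tested against zero (for example with a t-test or likelihood ratio test) across many genes, with multiple-testing correction.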

11:20 AM-11:40 AM
Characterization of regulatory variants in promoters with enhancer activity and their relation with human diseases
Format: Pre-recorded with live Q&A

  • Lucia Ramirez-Navarro, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Mexico
  • Salvatore Spicuglia, Aix-Marseille Univ, INSERM UMR 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
  • Lisa Strug, University of Toronto; The Hospital for Sick Children, Toronto, ON, Canada
  • Jessica Dennis, Vanderbilt Genetics Institute; University of British Columbia, Vancouver, BC, Canada
  • Alejandra Medina-Rivera, International Laboratory for Human Genome Research, Universidad Nacional Autónoma de México, Querétaro, Mexico

Presentation Overview:

Around 2 to 6% of promoters have been shown to have enhancer potential (ePromoters), and gene ontology enrichment analysis shows that they are related to inflammatory or stress responses. These sequences share epigenetic characteristics with enhancers and contact other promoters in 3D interactions. Given that the majority of genetic variants associated with human diseases and traits are located in non-coding DNA, in this follow-up analysis we set out to characterize regulatory variants in ePromoters by their effects on transcription factor binding.

Using a collection of variants retrieved from the GWAS Catalog, ClinVar and the GTEx project, we identified 109 and 190 variants likely to affect the binding of 42 and 46 transcription factors in HeLa and K562 cells, respectively. Most of these variants are also associated with diseases that have an inflammatory component, supporting our hypothesis that ePromoters are associated with the inflammatory response.

Understanding ePromoters and the regulatory mechanisms that affect their dual function will help identify the causes of human diseases and traits.
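A simple way to flag a variant that may alter TF binding is to compare the best motif match for the reference and alternative alleles. The sketch below assumes a log-odds PWM and scans both strands; the toy GATA-like motif in the usage example is hypothetical, and the talk's actual prioritization pipeline is not described in the abstract.

```python
import numpy as np

BASES = "ACGT"

def best_score(seq, pwm):
    """Maximum score of a (W, 4) log-odds PWM over all windows of seq on both strands."""
    comp = str.maketrans("ACGT", "TGCA")
    W = pwm.shape[0]
    best = -np.inf
    for s in (seq.upper(), seq.upper().translate(comp)[::-1]):
        idx = [BASES.index(b) for b in s]
        for p in range(len(idx) - W + 1):
            best = max(best, sum(pwm[j, idx[p + j]] for j in range(W)))
    return float(best)

def variant_delta(ref_seq, pos, alt, pwm):
    """Change in best motif score when the base at 0-based pos is replaced by alt.
    Negative values suggest the alternative allele weakens the predicted site."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)
```

For a toy PWM scoring +1 for each matched base of `GATA` and -2 otherwise, changing the first `A` of `TTGATATT` to `C` lowers the best score from 4 to 1, a delta of -3.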

12:00 PM-12:20 PM
Learning global patterns of epigenetic variation across individuals
Format: Pre-recorded with live Q&A

  • Jason Ernst, UCLA, United States
  • Jennifer Zou, UCLA, United States

Presentation Overview:

Many studies have identified variation across individuals in transcription factor binding, gene expression, histone modifications, and other molecular phenotypes. Although these sources of variation have been useful for understanding the regulation of specific data types at a single genomic site, it is often unclear how these data types and regions are related and how they interact in regulatory networks. In this study, we propose a method to identify global patterns of histone modifications, spanning multiple marks and individuals, that recur in many regions of the genome. We learn a multivariate hidden Markov model in which all histone marks in all individuals are used as features, by applying a “stacked” version of the ChromHMM framework. We applied this framework to a dataset of 75 individuals with 3 marks (H3K27ac, H3K4me1, H3K4me3) in lymphoblastoid cell lines (LCLs) and to a dataset of 93 individuals, comprising autism cases and controls, with 2 marks (H3K27ac, H3K4me3). We show how these global patterns of epigenetic variation across individuals can be used for three applications: (1) to improve power in local histone quantitative trait loci studies; (2) to identify putative trans-regulators; and (3) to identify regions of the genome that are enriched for complex disease risk.
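In a stacked model, each genomic bin becomes one observation whose features are every (individual, mark) track, so a single HMM state captures a joint pattern across people and marks. The sketch below prepares such input, binarizing each track against a Poisson background whose mean is the track's genome-wide average count, in the spirit of ChromHMM's default binarization; control data, local backgrounds and the HMM fitting itself are omitted, and the threshold is an assumed default.

```python
import numpy as np
from scipy.stats import poisson

def binarize(counts, pvalue=1e-4):
    """1 if a bin's read count is improbably high under Poisson(mean count of the track)."""
    counts = np.asarray(counts)
    threshold = poisson.isf(pvalue, counts.mean())   # counts above this have P(X >= count) <= pvalue
    return (counts > threshold).astype(np.int8)

def stack_tracks(tracks):
    """tracks: dict mapping (individual, mark) -> per-bin read counts.
    Returns sorted feature names and a (n_bins, n_individuals * n_marks) binary matrix."""
    names = sorted(tracks)
    X = np.column_stack([binarize(tracks[name]) for name in names])
    return names, X
```

Because the feature count grows with the number of individuals, the resulting states describe how a region's marks vary across the cohort rather than within one epigenome.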

12:20 PM-12:40 PM
Proceedings Presentation: MAGGIE: leveraging genetic variation to identify DNA sequence motifs mediating transcription factor binding and function
Format: Pre-recorded with live Q&A

  • Zeyang Shen, University of California San Diego, United States
  • Marten Hoeksema, University of California San Diego, United States
  • Zhengyu Ouyang, University of California San Diego, United States
  • Christopher Benner, University of California San Diego, United States
  • Christopher Glass, University of California San Diego, United States

Presentation Overview:

Motivation: Genetic variation in regulatory elements can alter transcription factor (TF) binding by mutating a TF binding motif, which in turn may affect the activity of the regulatory elements. However, it is unclear which TFs are prone to be affected by a given variant. Current motif analysis tools either prioritize TFs based on motif enrichment without linking to a function or are limited in their applications due to the assumption of linearity between motifs and their functional effects.
Results: We present MAGGIE, a novel method for identifying motifs mediating TF binding and function. By leveraging measurements from diverse genotypes, MAGGIE uses a statistical approach to link mutation of a motif to changes in an epigenomic feature without assuming a linear relationship. We benchmark MAGGIE across various applications using both simulated and biological datasets and demonstrate its improved sensitivity and specificity compared to state-of-the-art motif analysis approaches. We use MAGGIE to reveal insights into the divergent functions of distinct NF-κB factors in pro-inflammatory macrophages, showing its promise in discovering novel functions of TFs. The Python package for MAGGIE is freely available at https://github.com/zeyang-shen/maggie.
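The idea of linking motif mutation to a functional change without assuming linearity can be illustrated with a paired, rank-based test: for each variant site, compare the motif score of the allele with the higher epigenomic signal to that of the allele with the lower signal. The sketch below uses a Wilcoxon signed-rank test on those differences; it illustrates the general idea only, and whether it matches MAGGIE's exact statistics should be checked against the repository above.

```python
import numpy as np
from scipy.stats import wilcoxon

def motif_mutation_test(score_pos, score_neg):
    """score_pos[i], score_neg[i]: motif scores of the allele with higher vs lower
    epigenomic signal at variant site i. Pairs with no motif change are dropped.
    Returns the median score difference and a two-sided signed-rank p-value."""
    diff = np.asarray(score_pos, dtype=float) - np.asarray(score_neg, dtype=float)
    diff = diff[diff != 0]
    _, p = wilcoxon(diff)
    return float(np.median(diff)), float(p)
```

A significantly positive median suggests that stronger motif matches accompany stronger signal, implicating the TF in that feature; only the ranks of the differences matter, not their linear relationship to the signal.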

3:20 PM-4:00 PM
RegSys KEYNOTE: A unified atlas of CD8 T cell dysfunctional states in cancer and infection
Format: Live-stream

  • Christina Leslie

Presentation Overview:

CD8 T cells play an essential role in defense against viral and bacterial infections and in tumor immunity. Deciphering T cell loss of functionality is complicated by the conspicuous heterogeneity of CD8 T cell states described across different experimental and clinical settings. By carrying out a unified analysis of over 300 ATAC-seq and RNA-seq experiments from twelve independent studies of CD8 T cell dysfunction in cancer and infection, we defined a shared differentiation trajectory towards terminal dysfunction and its underlying transcriptional drivers, and revealed a universal early bifurcation of functional and dysfunctional T cell activation states across models. Experimental dissection of acute and chronic viral infection using scATAC-seq and allele-specific scRNA-seq identified state-specific transcription factors and captured the emergence of highly similar TCF1+ progenitor-like populations at an early branch point, at which the epigenetic features of functional and dysfunctional T cells diverge. Our atlas of CD8 T cell states will facilitate mechanistic studies of T cell immunity and translational efforts.

4:00 PM-4:20 PM
Investigating regional somatic mutation rate variation across functional elements in whole cancer genomes
Format: Pre-recorded with live Q&A

  • Christian Lee, Ontario Institute of Cancer Research, Canada
  • Diala Abd-Rabbo, Ontario Institute of Cancer Research, Canada
  • Jüri Reimand, Ontario Institute of Cancer Research, Canada

Presentation Overview:

Tumorigenesis is driven by somatic mutations that allow cells to acquire the hallmarks of cancer. Mutation rates in the genome vary at the nucleotide and megabase scales; however, variation at intermediate scales remains largely unexplored. Understanding this landscape may uncover mutational processes at gene regulatory and genome architectural elements, inform the discovery of coding and non-coding driver mutations, and identify clinical associations.
We recently developed RM2, a statistical framework to detect regional mutation rates and signature activity across whole genomes. RM2 leverages negative binomial regression and calculates nucleotide and megabase-scale covariates to estimate regional differences in genomic elements compared to flanking sequences. Using a pan-cancer dataset of 2,500 whole genomes from the PCAWG project, we confirmed mutation enrichment in CTCF binding sites and discovered a stronger signal in sites co-bound with RAD21. We also found that transcription start sites of highly expressed genes show significantly elevated mutation rates. Subsets of these sites participate in cancer-related pathways, show signals of positive selection and contain motif-disrupting mutations that overlap chromatin interactions displaying coordinated changes in gene expression, perhaps indicating non-coding mechanisms of oncogenesis. In summary, our systematic approach characterizes novel mutational processes that shape cancer genomes at classes of gene regulatory elements.
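The core of an element-versus-flank comparison with negative binomial regression can be sketched as follows: model the mutation count of each window with a log link, an indicator for whether the window lies in the element class, optional covariates, and log(number of sites) as an offset; exp of the indicator's coefficient then estimates the fold change in mutation rate. This minimal NB2 maximum-likelihood sketch does not reproduce RM2's nucleotide- and megabase-scale covariates.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def nb_negloglik(params, y, X, offset):
    """NB2 negative log-likelihood: mu = exp(X @ beta + offset), var = mu + alpha * mu**2."""
    beta, log_alpha = params[:-1], params[-1]
    mu = np.exp(X @ beta + offset)
    r = np.exp(-log_alpha)                              # r = 1 / alpha
    ll = (gammaln(y + r) - gammaln(r) - gammaln(y + 1)
          + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))
    return -ll.sum()

def fit_element_effect(mut_counts, is_element, n_sites, covariates=None):
    """Fit log(mu) = b0 + b_elem * is_element [+ covariates] + log(n_sites).
    Returns exp(b_elem), the estimated rate fold change of elements vs flanks."""
    y = np.asarray(mut_counts, dtype=float)
    cols = [np.ones_like(y), np.asarray(is_element, dtype=float)]
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float).reshape(len(y), -1))
    X = np.column_stack(cols)
    offset = np.log(np.asarray(n_sites, dtype=float))   # exposure: sites per window
    x0 = np.zeros(X.shape[1] + 1)
    x0[0] = np.log(max(y.sum(), 0.5) / np.exp(offset).sum())   # start at the pooled rate
    res = minimize(nb_negloglik, x0, args=(y, X, offset), method="L-BFGS-B")
    return float(np.exp(res.x[1])), res
```

The dispersion term absorbs the extra variability that a Poisson model would attribute to a real element effect, which makes the enrichment estimate more conservative.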

4:40 PM-5:20 PM
RegSys KEYNOTE: RNA-binding proteins and their targets
Format: Live-stream

  • Quaid Morris, Memorial Sloan Kettering Cancer Centre, United States

Presentation Overview:

RNA-binding proteins (RBPs) recognize specific RNA sequences or structural patterns; computational models of these patterns, called motifs, are most easily derived from in vitro selection assays. Recently, high-throughput assays of dozens of RBPs have produced millions of target sequences for individual RBPs. I will describe recent efforts in my lab to develop new motif-finding methods that use these data to infer highly detailed sequence and structural binding preferences for RBPs. Ultimately, our goal is to assign binding preferences to every RBP. We have developed new matrix factorization methods that map between RNA-binding domain (RBD) sequence and RNA specificity. These methods improve on the state of the art in homology-based RNA specificity prediction and automatically infer the RNA-contacting residues in RBDs.

5:20 PM-5:40 PM
Proceedings Presentation: TopicNet: a framework for measuring transcriptional regulatory network change
Format: Pre-recorded with live Q&A

  • Mark Gerstein, Yale University, United States
  • Shaoke Lou, Yale University, United States
  • Tianxiao Li, Yale University, United States
  • Xiangmeng Kong, Yale University, United States
  • Jing Zhang, Yale University, United States
  • Jason Liu, Yale University, United States
  • Donghoon Lee, Yale University, United States

Presentation Overview:

Next-generation sequencing data highlight comprehensive and dynamic changes in the human gene regulatory network. Moreover, changes in regulatory network connectivity (network “rewiring”) manifest different regulatory programs in multiple cellular states. However, due to the dense and noisy nature of the connectivity in regulatory networks, directly comparing the gains and losses of targets of key TFs is not very informative. Thus, here we seek an abstracted, lower-dimensional representation to understand the main features of network change. In particular, we propose a method called TopicNet that applies latent Dirichlet allocation (LDA) to extract meaningful functional topics for a collection of genes regulated by a TF. We then define a rewiring score to quantify the large-scale changes in the regulatory network in terms of topic change for a TF. Using this framework, we can pinpoint particular TFs that change greatly in network connectivity between different cellular states, which is particularly relevant in oncogenesis. Also, by incorporating gene expression data, we define a topic activity score that gives the degree to which a topic is active in a particular cellular state. Furthermore, we show how activity differences can highlight differential survival in certain cancers.
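Once an LDA model has turned a TF's target-gene set in each cellular state into a mixture over topics, a rewiring score only needs to measure how far that mixture moves. The sketch below uses the Jensen-Shannon divergence for this purpose; the choice of divergence is a hypothetical illustration, and TopicNet's actual rewiring score may be defined differently.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence in bits between two discrete distributions
    (0 = identical, 1 = disjoint support)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rewiring_score(theta_state1, theta_state2):
    """Hypothetical rewiring score for one TF: divergence between the topic mixtures
    inferred from its target genes in two cellular states."""
    return js_divergence(theta_state1, theta_state2)
```

Because it compares topic mixtures rather than individual edges, this kind of score is insensitive to the many small, noisy target gains and losses that make direct edge comparison uninformative.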

5:40 PM-6:00 PM
Defining cell type-varying networks of chromatin marks and transcription factors underlying cell-fate transition dynamics
Format: Pre-recorded with live Q&A

  • Shilu Zhang, University of Wisconsin-Madison, United States
  • Sushmita Roy, University of Wisconsin-Madison, United States

Presentation Overview:

Cell-fate specification is the process of taking a cell from one state and turning it into a different state. Cell-fate specification is determined by interactions between transcription factors (TFs) and chromatin state that together determine cell type-specific expression. However, our understanding of which combinatorial interactions are at play and how they change is limited. Recently, genome-wide datasets for multiple chromatin marks and TFs have become available that can be leveraged to identify how chromatin marks and TFs interact and how these interactions impact the expression state. Here we present Cell type Varying Networks (CVN), a multi-task learning framework that captures the relationship between chromatin marks, TFs and expression levels in each cell type on a lineage, and helps explain how changes in chromatin state and TF binding influence changes in expression levels. Compared to existing approaches such as ChromNet, CVNs perform better at predicting interactions. Additionally, we applied CVN to data from four cellular stages during reprogramming, identified subnetworks and chromatin hubs that are common to or differentially wired across the cellular stages, and provided insight into network changes during reprogramming. Taken together, CVN is a powerful framework for inferring cell type-specific interactions between chromatin marks, TFs and expression.