Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


RegSys: Regulatory and Systems Genomics

COSI Track Presentations

Attention Presenters - please review the Speaker Information Page available here
Schedule subject to change
Monday, July 9th
10:15 AM-10:20 AM
Welcome to RegSys Day 1
Room: Grand Ballroom B
  • Julia Zeitlinger, Stowers Institute, United States

Presentation Overview: Show

Welcome to RegSys COSI

10:20 AM-10:50 AM
Keynote: A tale of DNA shape analysis: a user manual and next steps
Room: Grand Ballroom B
  • Remo Rohs, University of Southern California, United States

Presentation Overview: Show

Many structures of protein-DNA complexes have been solved and high-throughput binding assays were developed because structural biology and genomics researchers were equally puzzled by the question of how proteins bind DNA with high specificity. However, there was little communication between these two fields of research. High-throughput DNA shape prediction established a cross talk between both fields. The primary goal of DNA shape analysis remains the quest for mechanistic insights into protein-DNA readout modes based on sequencing data without the need for structure determination. A plethora of high-throughput sequencing data is available from a variety of experimental approaches. In contrast, structural biology, albeit being an atomic-resolution approach, often reports the binding of a protein to only a single DNA target. A number of studies incorporated DNA shape features in the quantitative modeling of binding specificities. These studies emphasized the importance of interactions between nucleotide positions within a binding site and its flanks, although the definitions of DNA sequence versus shape still differ in structural biology and genomics. Next steps focus on understanding biological phenomena such as purifying selection, additional layers of binding specificity determinants such as DNA methylation and histone modifications, and new computational approaches including artificial intelligence and quantum computing.

10:50 AM-11:00 AM
Divergence in DNA specificity among paralogous transcription factors contributes to their differential in vivo binding
Room: Grand Ballroom B
  • Ning Shen, Duke University, United States
  • Yuning Zhang, Duke University, United States
  • Jingkang Zhao, Duke University, United States
  • Raluca Gordan, Duke University, United States

Presentation Overview: Show

DNA-binding specificity is a fundamental characteristic of transcription factors (TFs). In eukaryotes, most TF-coding genes have undergone gene duplication and divergence during evolution, resulting in paralogous factors with highly conserved DNA-binding domains and recognizing similar DNA sequence motifs. However, paralogous TFs oftentimes bind to distinct targets in the cell, and they perform distinct regulatory functions. The differential genomic targeting by paralogous TFs is generally assumed to be due to interactions with protein cofactors or the chromatin environment. Using a computational-experimental framework called iMADS (integrative Modeling and Analysis of Differential Specificity), we show that, contrary to previous assumptions, paralogous TFs bind differently to genomic target sites even in vitro. We used iMADS to quantify, model, and analyze specificity differences between 11 TFs from 4 protein families. We found that paralogous TFs have diverged mainly at medium and low affinity sites, which are poorly captured by current motif models. We identify sequence and shape features differentially preferred by paralogous TFs, and we show that the intrinsic differences in specificity among paralogous TFs contribute to their differential in vivo binding. Thus, our study represents a step forward in deciphering the molecular mechanisms of differential specificity in TF families.

11:00 AM-11:20 AM
Mismatched base-pairs locally distort DNA structure and can induce increased DNA-binding by transcription factor proteins
Room: Grand Ballroom B
  • Ariel Afek, Duke University, United States
  • Raluca Gordan, Duke University, United States

Presentation Overview: Show

Transcription factors (TFs) are known to recognize DNA using both sequence (direct) and shape (indirect) readout. To investigate the contribution of shape to protein-DNA binding, we use mismatches (i.e. mis-paired bases) to induce significant structural changes in TF-DNA binding sites, with minimal changes in the DNA sequence of these sites.

We present Saturation Mismatch Binding Assay (SaMBA), the first assay to characterize the effects of mismatches on TF-DNA binding in high-throughput. For genomic sequences of interest, SaMBA generates DNA duplexes containing all possible single-base mismatches, and quantitatively assesses the effects of the mismatches on TF-DNA interactions.

We applied SaMBA to measure binding of 21 TFs (covering 14 structural families) to thousands of mismatched sequences, and mapped the impact of mismatches on these TFs. For all tested factors we found that DNA mismatches within binding sites can significantly increase TF binding levels. Furthermore, for several TFs we identified non-specific genomic regions that become strongly bound after certain mismatches are introduced.

Structural analyses of mismatches that increase TF binding revealed that these mismatches oftentimes distort the naked DNA to induce shapes that are also present in the protein-bound sites, thus providing direct evidence of the contribution of DNA shape to protein-DNA recognition.

11:20 AM-11:40 AM
Enhancer RNA profiling predicts transcription factor activity
Room: Grand Ballroom B
  • Joseph Azofeifa, University of Colorado, Boulder, United States
  • Mary Allen, University of Colorado, Boulder, United States
  • Josephina Hendrix, University of Colorado, Boulder, United States
  • Timothy Read, University of Colorado, Boulder, United States
  • Jonathan Rubin, University of Colorado, Boulder, United States
  • Robin Dowell, University of Colorado, Boulder, United States

Presentation Overview: Show

Transcription factors (TFs) manage cellular response via binding to transcriptional regulatory elements, including both enhancers and promoters. In both scenarios, short unstable bidirectional transcripts co-occur. At enhancers, these transcripts are termed enhancer RNAs (eRNAs). While the function of eRNAs remains unclear, we have shown that these transcripts provide a molecular marker of locations of transcription factor activity (Azofeifa et al., Genome Research 2018). Key to our approach is a mixture model of RNA polymerase that precisely infers the positions of polymerase loading genome-wide. Using this method, we have begun to uncover previously unappreciated elements of transcription factor activity across numerous cell types and conditions.

11:40 AM-12:00 PM
Identification of Transcription Factor Binding Sites using ATAC-seq
Room: Grand Ballroom B
  • Zhijian Li, RWTH Aachen Medical Faculty, Germany
  • Marcel H. Schulz, Saarland University, Germany
  • Martin Zenke, RWTH Aachen Medical Faculty, Germany
  • Ivan G. Costa, RWTH Aachen University, Germany

Presentation Overview: Show

The Assay for Transposase-Accessible Chromatin followed by sequencing (ATAC-seq) is a simple and fast protocol for detection of open chromatin. However, computational footprinting in ATAC-seq, i.e. the search for regions with depletion of cleavage events due to transcription factor binding, has been poorly explored so far. We propose HINT-ATAC, a footprinting method that addresses ATAC-seq-specific protocol artifacts. HINT-ATAC uses a probabilistic framework based on variable-order Markov models to learn the complex sequence cleavage preferences of the transposase enzyme. Moreover, we observed strand-specific cleavage patterns around the binding sites of transcription factors, which are determined by local nucleosome architecture. HINT-ATAC uses local nucleosome architecture to significantly outperform competing footprinting methods in predicting transcription factor binding sites supported by ChIP-seq.
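
The idea of learning sequence cleavage preferences can be illustrated with a much simpler stand-in. The sketch below is not HINT-ATAC: it swaps the variable-order Markov model for a fixed-length k-mer ratio (frequency of a k-mer around observed cut sites divided by its frequency in the whole sequence). The function names, the centering convention and the pseudocount are assumptions made for illustration only.

```python
from collections import Counter


def kmer_counts(seq, k, positions):
    """Count the k-mers whose window starts k // 2 bases before each position."""
    half = k // 2
    counts = Counter()
    for p in positions:
        start = p - half
        if start < 0 or start + k > len(seq):
            continue  # skip windows that run off the sequence
        counts[seq[start:start + k]] += 1
    return counts


def cleavage_bias(seq, cut_sites, k=4, pseudocount=1.0):
    """Ratio of k-mer frequency at cut sites to background frequency.

    Values above 1 suggest the enzyme cuts that sequence context more
    often than expected by chance; values below 1 suggest the opposite.
    """
    observed = kmer_counts(seq, k, cut_sites)
    background = kmer_counts(seq, k, range(len(seq)))
    n_obs = sum(observed.values())
    n_bg = sum(background.values())
    n_kmers = len(background)
    bias = {}
    for kmer, bg in background.items():
        obs_freq = (observed[kmer] + pseudocount) / (n_obs + pseudocount * n_kmers)
        bg_freq = (bg + pseudocount) / (n_bg + pseudocount * n_kmers)
        bias[kmer] = obs_freq / bg_freq
    return bias


if __name__ == "__main__":
    toy_seq = "ACGTTTTTACGTTTTT"
    toy_bias = cleavage_bias(toy_seq, cut_sites=[2, 10], k=4)
    print(sorted(toy_bias.items(), key=lambda kv: -kv[1])[:3])
```

A footprinting method would then divide raw cleavage counts by such a bias estimate before looking for depleted regions; how HINT-ATAC performs that correction is not described in the abstract.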

12:00 PM-12:20 PM
Proceedings Presentation: SigMat: A Classification Scheme for Gene Signature Matching
Room: Grand Ballroom B
  • Jinfeng Xiao, University of Illinois at Urbana-Champaign, United States
  • Charles Blatti, University of Illinois at Urbana-Champaign, United States
  • Saurabh Sinha, University of Illinois at Urbana-Champaign, United States

Presentation Overview: Show

Motivation: Several large-scale efforts have been made to collect gene expression signatures from a variety of biological conditions, such as response of cell lines to treatment with drugs, or tumor samples with different characteristics. These gene signature collections are utilized through bioinformatics tools for “signature matching”, whereby a researcher studying an expression profile can identify previously cataloged biological conditions most related to their profile. Signature matching tools typically retrieve from the collection the signature that has highest similarity to the user-provided profile. Alternatively, classification models may be applied where each biological condition in the signature collection is a class label; however, such models are trained on the collection of available signatures and may not generalize to the novel cellular context or cell line of the researcher’s expression profile.
Results: We present an advanced multi-way classification algorithm for signature matching, called SigMat, that is trained on a large signature collection from a well-studied cellular context, but can also classify signatures from other cell types by relying on an additional, small collection of signatures representing the target cell type. It uses these “tuning data” to learn two additional parameters that help adapt its predictions for other cellular contexts. SigMat outperforms other similarity scores and classification methods in identifying the correct label of a query expression profile from as many as 244 candidate classes (drug treatments) cataloged by the LINCS L1000 project. SigMat retains its high accuracy in cross-cell line applications even when the amount of tuning data is severely limited.
Availability: SigMat is available on GitHub at https://github.com/JinfengXiao/SigMat.
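
The baseline that the Motivation describes, retrieving the catalogued signature most similar to a query profile, can be sketched as follows. This is not SigMat itself, which is a trained classifier with tuning parameters; Pearson correlation as the similarity score, the function names and the toy signatures are all assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def best_match(query, collection):
    """Return the label of the most similar signature and all scores."""
    scores = {label: pearson(query, sig) for label, sig in collection.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical signatures over the same four genes.
    signatures = {
        "drug_A": [1.0, 2.0, 3.0, 4.0],
        "drug_B": [4.0, 3.0, 2.0, 1.0],
        "drug_C": [1.0, -1.0, 1.0, -1.0],
    }
    label, all_scores = best_match([0.9, 2.1, 2.9, 4.2], signatures)
    print(label, all_scores)
```

Such a retrieval has no mechanism for adapting to a new cell line, which is the limitation the abstract says the tuning data are meant to address.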

12:20 PM-12:30 PM
Candidate non-coding driver mutations in super-enhancers and long-range chromatin interaction networks
Room: Grand Ballroom B
  • Juri Reimand, Ontario Institute for Cancer Research, Canada

Presentation Overview: Show

A catalogue of mutations that drive tumorigenesis and progression is essential to understanding tumor biology and developing therapies. Protein-coding driver mutations have been well-characterized by large exome-sequencing studies; however, many tumors have no mutations in protein-coding drivers and few non-coding drivers besides the TERT promoter are known. To fill this gap, we analyzed 150,000 cis-regulatory regions in 1,844 whole cancer genomes from the ICGC-TCGA PCAWG project. Using our new method, ActiveDriverWGS, we found 41 frequently mutated regulatory elements (FMREs) enriched in non-coding SNVs and indels characterized by aging-associated mutation signatures and frequent structural variants. FMREs were enriched in super-enhancers and long-range chromatin interactions, suggesting that the mutations drive cancer by altering distal gene regulation. The chromatin interaction network of FMREs and target genes revealed associations of mutations and differential gene expression of known and novel cancer genes, activation of immune response pathways and altered enhancer marks. Thus distal genomic regions may include additional, infrequently mutated drivers that act on target genes via chromatin loops. Our study is an important step towards finding such regulatory regions and deciphering the somatic mutation landscape of the non-coding genome.

12:30 PM-12:31 PM
What We Talk About When We Talk About Enhancers
Room: Grand Ballroom B
  • Mary Lauren Benton, Vanderbilt University, United States
  • Sai Charan Talipineni, University of Pittsburgh, United States
  • Dennis Kostka, University of Pittsburgh, United States
  • John Capra, Vanderbilt University, United States

Presentation Overview: Show

Non-coding gene regulatory enhancers are essential to transcription in mammalian cells. As a result, numerous experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. Most studies consider enhancers identified by only a single method, and concordance between sets from different methods has not been comprehensively evaluated. We assessed the similarities of enhancer sets identified by ten representative strategies in four biological contexts and evaluated the robustness of resulting downstream conclusions. We demonstrate significant dissimilarity between enhancer sets in genomic characteristics, evolutionary conservation, and association with functional loci. We find most regions identified as enhancers are supported by only one method. The disagreement is sufficient to influence interpretation of functional loci, and to lead to disparate conclusions about enhancer biology and disease mechanisms. We also find limited evidence that regions identified by multiple methods are better enhancer candidates than regions identified by a single strategy. Our results highlight the inherent complexity of enhancer biology and argue that current approaches have yet to adequately account for enhancer diversity. To facilitate assessment of enhancer diversity in future studies, we developed creDB, a database of enhancer annotations designed to integrate into bioinformatics workflows.

12:31 PM-12:32 PM
Differential Analysis of Regulatory Elements Based on ChIP-seq Data
Room: Grand Ballroom B
  • Verena Heinrich, Max Planck Institute for Molecular Genetics, Germany
  • Anna Ramisch, Max-Planck-Institut fuer Molekulare Genetik, Germany
  • Martin Vingron, Max Planck Institut fuer molekulare Genetik, Germany

Presentation Overview: Show

Enhancers are critical for gene regulation not only in differentiation processes but also during disease development. It remains a challenge to identify these regulatory elements in a cell-type or even disease-state dependent manner. Thus, rather than comparing separate epigenetic signature tracks, we propose an approach to computationally map and compare enhancers across different samples and conditions.
Here we present a two-step framework to predict and assign condition-dependent enhancers solely based on ChIP-seq histone modification data. In the first step, a random forest-based classifier is trained on a set of high-confidence regions and used for enhancer prediction. We will demonstrate that the presented approach can be applied across different tissues and species without the need for retraining.
In a second step, all regions are assigned to different biological conditions by applying a permutation test directly to enhancer probability values and are subsequently grouped into regulatory units by incorporating topologically associating domains (TADs).
We have applied our strategy to several projects which encompass different numbers and types of conditional states and were able to prioritize candidate enhancer regions that are correlated to the respective biological question.
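
A permutation test on enhancer probability values, as mentioned in the second step, could look roughly like the sketch below. The abstract does not state the test statistic or the number of permutations, so the choices here (absolute difference in mean probability, exact enumeration of all group splits, hypothetical function name and data) are assumptions.

```python
from itertools import combinations


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated, so this is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        count += 1
    return extreme / count


if __name__ == "__main__":
    # Hypothetical classifier probabilities for one region in two conditions.
    condition_1 = [0.90, 0.80, 0.95]
    condition_2 = [0.10, 0.20, 0.15]
    print(permutation_pvalue(condition_1, condition_2))
```

With three replicates per condition there are only 20 possible splits, so the smallest attainable two-sided p-value is 0.1; larger sample sizes or random sampling of permutations would be needed for finer resolution.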

12:32 PM-12:33 PM
ChIP-exo analysis highlights Fkh transcription factors as hubs that integrate multi-scale networks in budding yeast
Room: Grand Ballroom B
  • Thierry Mondeel, University of Amsterdam, Netherlands
  • Petter Holland, Chalmers University of Technology, Sweden
  • Jens Nielsen, Chalmers University of Technology, Sweden
  • Matteo Barberis, University of Amsterdam, Netherlands

Presentation Overview: Show

Forkhead (Fkh) transcription factors are evolutionarily conserved among eukaryotes and coordinate timely cell cycle progression. In budding yeast, Fkh are expressed during a lengthy window of the cell cycle and are thus potentially able to function as hubs integrating multiple cellular networks. Here, we report on a novel ChIP-exo dataset of Fkh targets. ChIP-exo combines ChIP with lambda exonuclease digestion followed by high-throughput sequencing, allowing identification of a nearly complete set of binding sites at single-nucleotide resolution. The available software for ChIP-seq analyses, GEM and MACE, yielded problems when analyzing the ChIP-exo dataset. Therefore, we have developed a novel ChIP-exo data analysis method, which we named maxPeak. This method confirms known Fkh targets and points to many novel ones across various cellular processes. We analyzed target genes with respect to their functional enrichment, temporal expression during the cell cycle, and the metabolic pathways they occur in. Furthermore, we present a comprehensive overview of the current knowledge of Fkh targets by integrating our results with complementary genome-wide studies available in the literature, also pointing at differences in metabolic targets between Fkh. Our work highlights Fkh as hubs that integrate multi-scale regulatory networks to achieve proper timing of cell division in budding yeast.

12:33 PM-12:34 PM
Genome-wide identification of enhancer release and retargeting, a novel disease mechanism involving enhancer-gene target switching
Room: Grand Ballroom B
  • Joydeep Mitra, Albert Einstein College of Medicine, United States
  • Soohwan Oh, University of California San Diego, La Jolla, CA, United States
  • Wenbo Li, University of Texas Health Science Center, United States
  • Michael Rosenfeld, University of California San Diego, La Jolla, CA, United States
  • Zhengdong Zhang, Albert Einstein College of Medicine, United States

Presentation Overview: Show

Enhancers and promoters both play indispensable roles in gene transcription activation. Recently, we observed that mutation in, or loss of, a preferred cognate promoter can release its regulatory enhancer to loop to, and activate, an alternative promoter in its chromosomal neighborhood. Here, we present a novel computational approach to identify such 'enhancer release and retargeting' (ERR) events on a genome-wide scale, and their implications in human diseases, through statistical analysis and integration of various genomic data sets and the GWAS catalog of human complex diseases. We identified putative ERR events, with a count ranging from 31 to 525, in all 48 human tissues available in the current version of GTEx. Over a hundred of them are common to multiple tissues, with some occurring in as many as 36 tissue types. In several ERR events, enhancer retargeting would cause activation of genes associated with diseases. Our analysis shows that ERR, a previously unobserved and unsuspected mechanism by which genetic alterations of promoters cause activation of alternative gene promoters, is a common occurrence in the transcriptomes of multiple tissue types. Moreover, our study suggests that ERR may also point to a previously overlooked mechanism underlying disease and developmental defect risk.

12:34 PM-12:35 PM
Network Inference from Single-Cell Transcriptomic Data Using Granger Causality
Room: Grand Ballroom B
  • Atul Deshpande, University of Wisconsin Madison; Morgridge Institute of Research, United States
  • Anthony Gitter, University of Wisconsin-Madison, United States

Presentation Overview: Show

Advances in single-cell transcriptomics have enabled observing gene expression in individual cells, providing a detailed view of dynamic biological processes. Cells’ expression states allow them to be ordered based on their progression through a process such as differentiation. Ordered data can be valuable in understanding the underlying gene-gene regulatory interactions that control the process. Regulatory interactions between genes can manifest as 'causal' relationships in their expression trends; for example, increases and decreases in expression of a regulator gene may consistently precede those of its target genes. However, the distribution of cells along the process is not uniform, preventing the use of standard mathematical methods for detecting dependencies in temporal data, including Granger causality.

We present an ensemble approach using a generalized Lasso-based Granger causality test suitable for analyzing irregular time series to infer gene regulatory networks from ordered single-cell data. Modified Borda count aggregation combines multiple rankings obtained from diverse kernel-based Granger causality analyses. The kernel smooths over the irregularly-spaced and missing observations.
We apply our algorithm to mouse embryonic stem-cell differentiation datasets and demonstrate that it recovers gold standard transcriptional regulatory interactions more accurately than existing single-cell network inference algorithms.
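
The rank-aggregation step can be illustrated with a plain Borda count. The abstract refers to a modified Borda count whose details are not given, so the standard version below is a stand-in; the edge labels and function name are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several best-to-worst rankings into one consensus ranking.

    In a ranking of n items, the item at position 0 earns n points, the
    next earns n - 1, and so on. Items missing from a ranking earn nothing
    from it. Ties are broken alphabetically to keep the output deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    # Hypothetical edge rankings from three Granger causality runs
    # that used different kernel settings.
    runs = [
        ["A->B", "B->C", "A->C"],
        ["A->B", "A->C", "B->C"],
        ["B->C", "A->B", "A->C"],
    ]
    print(borda_aggregate(runs))
```

Aggregating ranks rather than raw test statistics avoids having to put scores from differently parameterized analyses on a common scale.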

12:35 PM-12:36 PM
Novel normalization and clustering methods for accurately identifying subtypes of single cells in fresh human brains
Room: Grand Ballroom B
  • Yungil Kim, Icahn School of Medicine at Mt. Sinai, United States
  • John Fullard, Icahn School of Medicine at Mt. Sinai, United States
  • Ying-Chih Wang, Icahn School of Medicine at Mt. Sinai, United States
  • Kristin Beaumont, Icahn School of Medicine at Mt. Sinai, United States
  • Robert P. Sebra, Icahn School of Medicine at Mt. Sinai, United States
  • Panos Roussos, Icahn School of Medicine at Mt. Sinai, United States

Presentation Overview: Show

A better understanding of regulatory architectures and underlying disease etiology substantially enhances the targeting of effective risk variants or biological entities in complex diseases, including Alzheimer's disease (AD). To resolve the heterogeneity of various immune cell types, researchers recently applied scRNA-seq to AD transgenic mice to identify potential markers. However, current analytical pipelines for the error-prone single-cell assays either use conventional methods from bulk RNA-seq studies with vulnerable assumptions or incorporate partial findings from scRNA-seq data into statistical methods. Because of inevitable noise and high sparsity in such high-dimensional data, together with a complex mixture of biological stochasticity and technical variability, the accuracy of the analytical outcomes is highly questionable. Using more than 100,000 cells from fresh human brains of 12 individuals profiled via Drop-seq protocols, we developed a novel computational framework for identifying immune-related cell subtypes at unprecedentedly high resolution by iteratively combining parametric modeling and nonparametric approaches. We show that our approach successfully identifies known as well as hidden and rare subtypes, and reveals their associated marker genes more accurately than current methods. Our method would provide a detailed description of the mechanistic interplay among distinct immune cells at multiple scales and could therefore assist the development of better therapeutics for neurological diseases.

12:36 PM-12:37 PM
Bridging the Gap Between Detailed Biophysical Modeling and Large-Scale Inference of Expression Regulation
Room: Grand Ballroom B
  • Konstantine Tchourine, NYU - Center for Genomics and Systems Biology, United States
  • Christine Vogel, New York University, United States
  • Richard Bonneau, New York University, United States

Presentation Overview: Show

Despite the availability of large datasets of RNA and protein expression data, the underlying regulatory mechanisms that allow a cell to adequately respond to a wide range of external stimuli have not been fully quantified. For example, many existing transcription network inference methods typically do not estimate biophysical parameters involved in RNA transcription, but instead rely on probabilistic methods such as random decision forests. Conversely, proper biophysical modeling has typically only been implemented in small-scale systems, such as the lac operon. The primary goal of my work is to demonstrate that increasing the level of biophysical detail can improve large-scale modeling and inference of regulation.

First, I demonstrate that explicitly accounting for RNA degradation improves transcription regulatory network inference in S. cerevisiae and B. subtilis, and that RNA half-lives estimated de novo via condition- and gene-specific network inference optimization correspond to experimentally measured RNA half-lives. Furthermore, I demonstrate that more accurate mathematical treatment of the contribution of transcriptional repression to RNA expression regulation infers more biophysically accurate models of regulation for many genes.
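
To see why RNA degradation matters for inference, consider the simplest kinetic picture: constant synthesis and first-order decay, dx/dt = s - k*x with k = ln(2) / half-life. The abstract does not specify the model actually used, so this form, the function names and the parameter values are assumptions for illustration.

```python
from math import exp, log


def degradation_rate(half_life):
    """First-order decay constant k corresponding to a given half-life."""
    return log(2) / half_life


def rna_level(t, synthesis_rate, half_life, x0=0.0):
    """Closed-form solution of dx/dt = synthesis_rate - k * x at time t."""
    k = degradation_rate(half_life)
    steady_state = synthesis_rate / k
    return steady_state + (x0 - steady_state) * exp(-k * t)


if __name__ == "__main__":
    # Two hypothetical genes switched on at t = 0 with the same synthesis
    # rate but different half-lives (arbitrary time units).
    for half_life in (5.0, 30.0):
        levels = [round(rna_level(t, 2.0, half_life), 2) for t in (0, 10, 20, 40)]
        print(half_life, levels)
```

Under this model a long-lived transcript approaches its new steady state slowly, so the same regulator activity produces different expression trajectories for genes with different half-lives, which is the effect an inference method ignoring degradation cannot capture.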

12:37 PM-12:38 PM
Functional impact of genomic rearrangements on gene expression and chromatin organization
Room: Grand Ballroom B
  • Yad Ghavi-Helm, Institute of Functional Genomics of Lyon, France
  • Aleksander Jankowski, EMBL, Germany
  • Sascha Meiers, EMBL, Germany
  • Rebecca Rodríguez Viales, EMBL, Germany
  • Jan Korbel, EMBL, Germany
  • Eileen E. M. Furlong, EMBL, Germany

Presentation Overview: Show

Complex regulatory programs of multicellular organisms are controlled by regulatory elements, which can be located at large distances from their target genes. The scope of action of regulatory elements is constrained by the spatial chromatin organization, in particular within Topologically Associating Domains (TADs). Recent studies investigated how genomic variations that affect TAD boundaries lead to changes in TAD structure and gene expression. However, those results remain limited to a small number of loci.

To globally assess the relationship between gene expression and chromatin organization, we utilized highly rearranged balancer chromosomes in Drosophila melanogaster. These chromosomes feature multiple types of genomic variations at different scales. We compared gene expression in developing embryos using RNA-seq between the rearranged chromosomes and their wild-type counterparts. Doing so in a heterozygous cross allowed us to intrinsically account for trans regulatory effects. We also quantified the differences in chromatin organization, using Hi-C.

In line with previous studies, we found that differential gene expression is correlated with local changes in genome topology. Surprisingly though, we observed that changes in large-scale chromatin organization do not globally correlate with changes in gene expression, despite the frequent disruption of TADs. Overall, our results are indicative of robust mechanisms buffering genomic variation.

12:40 PM-2:00 PM
Lunch Break
2:00 PM-2:30 PM
Keynote: Using Manifold Learning to Gain Insights Into Complex Biological Systems from Single-Cell Data
Room: Grand Ballroom B
  • Smita Krishnaswamy, Yale School of Medicine, United States

Presentation Overview: Show

Recent advances in single-cell technologies enable deep insights into cellular development, gene regulation, and phenotypic diversity by measuring gene expression and epigenetic information for thousands of single cells in a single experiment. While these technologies hold great potential for improving our understanding of cellular states and progression, they also pose new challenges in terms of scale, complexity, noise and measurement artifacts, which require advanced mathematical and algorithmic tools to extract underlying biological signals. In this talk, I cover one of the most promising techniques to tackle these problems: manifold learning, and the related manifold assumption in data analysis. Manifold learning provides a powerful framework for algorithmic approaches to process and batch-normalize the data, visualize it, and understand dynamics and phenotypic heterogeneity. I will cover two alternative approaches to manifold learning, diffusion-based and deep learning-based, and show results from several projects that involve imputation, batch-normalization, denoising, visualization and learning of dynamics from various sorts of biological systems.

2:30 PM-2:40 PM
Single-cell RNA-seq reveals two types of tissue-specific promoters with different expression characteristics
Room: Grand Ballroom B
  • Vivekanandan Ramalingam, Stowers Institute, United States
  • Malini Natarajan, Stowers Institute, United States
  • Jeff Johnston, Stowers Institute, United States
  • Julia Zeitlinger, Stowers Institute, United States

Presentation Overview: Show

Embryonic development results in differentiated cells with diverse tissue-specific functions across an organism. While genetic studies in Drosophila have revealed regulatory networks that control embryonic development, the mechanisms that regulate the expression of tissue-specific genes in differentiated cells remain unclear. We identified the tissue-specific genes in the late Drosophila embryo using single-cell RNA-seq. Our analysis revealed the presence of two promoter types among late-induced tissue-specific genes. One type is enriched for TATA motifs and has limited accessibility throughout the embryo, while the other type is depleted of TATA motifs, has higher promoter accessibility and shows paused RNA polymerase II throughout the embryo, independent of gene expression. Interestingly, the single-cell RNA-seq data reveal that the paused promoters have more robust expression when expressed, but also have higher basal expression in tissues where these genes are not expressed. TATA genes, on the other hand, are highly tissue-specific with little basal expression but result in larger variations among the cells that show expression. We conclude that tissue-specific genes have evolved to use different promoter types with distinct expression characteristics and propose that the higher accessibility of paused promoters makes them more responsive to activation.

2:40 PM-3:00 PM
Proceedings Presentation: Scalable preprocessing for sparse scRNA-seq data exploiting prior knowledge
Room: Grand Ballroom B
  • Sumit Mukherjee, University of Washington, United States
  • Yue Zhang, University of Washington, United States
  • Joshua Fan, University of Washington, United States
  • Georg Seelig, University of Washington, United States
  • Sreeram Kannan, University of Washington, United States

Presentation Overview: Show

Motivation: Single cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (1) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (2) Many tools simply cannot handle the size of the resulting datasets. (3) Prior biological knowledge such as bulk RNA-seq information of certain cell types or qualitative marker information is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge.
Results: We find that preprocessing using UNCURL consistently improves performance of commonly used scRNA-seq tools for clustering, visualization, and lineage estimation, both in the absence and presence of prior knowledge. Finally we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells.
Availability: Source code is available at https://github.com/

3:00 PM-3:20 PM
Bulk Regulatory Peak Deconvolution using Single Cell RNA-seq
Room: Grand Ballroom B
  • Michael Kleyman, Carnegie Mellon University, United States
  • Ziv Bar-Joseph, Carnegie Mellon University, United States

Presentation Overview: Show

Single cell genomic studies provide new information about cell state, heterogeneity, and regulation at an unprecedented scale. To date, most of these studies have focused on single cell RNA-seq (scRNA-seq). However, to truly understand cell state and gene regulation, regulatory events such as chromatin accessibility, transcription factor binding, and histone marks must also be explored at the resolution of single cells. Unfortunately, due to factors such as time scale, number of reads, and cost, single cell transcription regulatory data has been significantly more difficult to obtain than scRNA-seq data. To address this problem, we introduce a method that integrates bulk and single cell data from the same cell populations. Our method uses a probabilistic biclustering model, which jointly evaluates the likelihood of single cell expression data and bulk regulation information (for example ATAC-seq or ChIP-seq data). We next infer the transcriptional regulatory state of the cis-regulatory region of each gene of interest in each cell using the gene and cell bicluster assignments, which have been optimized using an expectation maximization algorithm. Analysis of simulated and real data indicates that the method can accurately determine single cell regulatory events even when these were only profiled in bulk.

3:20 PM-3:40 PM
Proceedings Presentation: Unsupervised embedding of single-cell Hi-C data (2)
Room: Grand Ballroom B
  • Jie Liu, University of Washington, United States
  • Dejun Lin, University of Washington, United States
  • Gurkan Yardimci, University of Washington, United States
  • William Noble, University of Washington, United States

Presentation Overview: Show

Single-cell Hi-C (scHi-C) data promises to enable scientists
to interrogate the 3D architecture of DNA in the nucleus of the
cell, studying how this structure varies stochastically or along
developmental or cell cycle axes. However, Hi-C data analysis
requires methods that take into account the unique characteristics
of this type of data. In this work, we explore whether methods that
have been developed previously for the analysis of bulk Hi-C data
can be applied to scHi-C data. We apply methods
designed for analysis of bulk Hi-C data to scHi-C data in
conjunction with unsupervised embedding. We find that one of these
methods, HiCRep, when used in conjunction with multidimensional
scaling (MDS), strongly outperforms three other methods, including a
technique that has been used previously for scHi-C analysis. We
also provide evidence that the HiCRep/MDS method is robust to
extremely low per-cell sequencing depth, that this robustness is
improved even further when high-coverage and low-coverage cells are
projected together, and that the method can be used to jointly embed
cells from multiple published datasets.

3:40 PM-4:00 PM
Manifold Alignment Reveals Correspondence between Single Cell Transcriptome and Epigenome Dynamics
Room: Grand Ballroom B
  • Joshua Welch, Broad Institute of MIT and Harvard, United States
  • Alexander Hartemink, Duke University, United States
  • Jan Prins, The University of North Carolina at Chapel Hill, United States

Presentation Overview: Show

Single cell genomic techniques promise to yield key insights into the dynamic interplay between gene expression and epigenetic modification. However, the experimental difficulty of performing multiple measurements on the same cell currently limits efforts to combine multiple genomic data sets into a united picture of single cell variation. We present an approach called MATCHER that computationally circumvents the challenges of performing multiple genomic measurements on a single cell by inferring single cell multi-omic profiles from single cell transcriptomic and epigenetic measurements performed on different cells of the same type. MATCHER works by first learning a separate manifold for the trajectory of each kind of genomic data, then aligning the manifolds to infer a shared trajectory in which cells measured using different techniques are directly comparable. Using scM&T-seq and sc-GEM data, we confirm that MATCHER accurately predicts true single cell correlations between DNA methylation and gene expression without using known cell correspondence information. We also used MATCHER to infer correlations among gene expression, chromatin accessibility, and histone modifications in single mouse embryonic stem cells and human induced pluripotent stem cells. These results represent a first step toward a united picture of heterogeneous transcriptomic and epigenetic states in single cells.
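The alignment idea in this abstract can be illustrated with a deliberately simplified sketch. It assumes each assay has already produced a one-dimensional pseudotime per cell (the manifold-learning step is not shown), and it aligns the two assays by mapping pseudotime to empirical quantiles. The function names and toy values are hypothetical and are not taken from the MATCHER software.

```python
def master_time(pseudotime):
    """Map each pseudotime value to its empirical quantile in [0, 1]."""
    n = len(pseudotime)
    order = sorted(range(n), key=lambda i: pseudotime[i])
    quantile = [0.0] * n
    for rank, idx in enumerate(order):
        quantile[idx] = rank / (n - 1) if n > 1 else 0.0
    return quantile


def match_cells(time_a, time_b):
    """For each cell in assay A, find the cell in assay B closest in shared time."""
    return [
        min(range(len(time_b)), key=lambda j: abs(time_b[j] - t))
        for t in time_a
    ]


# Toy pseudotimes on unrelated scales (hypothetical values).
rna_pseudotime = [0.1, 0.9, 0.4, 0.6]
methyl_pseudotime = [30.0, 5.0, 12.0, 50.0, 20.0]

rna_t = master_time(rna_pseudotime)
methyl_t = master_time(methyl_pseudotime)

# pairs[i] is the index of the methylation cell matched to RNA cell i.
pairs = match_cells(rna_t, methyl_t)
print(pairs)
```

Once both assays are on a common time axis, measurements from matched cells can be correlated as if they came from the same cell, which is the kind of inference the abstract describes.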

4:00 PM-4:40 PM
Coffee Break
4:40 PM-5:10 PM
Keynote: Chromatin boundaries are under strong negative selection
Room: Grand Ballroom B
  • Katie Pollard, University of California San Francisco, United States

Presentation Overview: Show

The potential impact of structural variants includes not only the duplication or deletion of coding sequences, but also the perturbation of non-coding DNA regulatory elements and structural chromatin features, including topological domains (TADs). Individual structural variants at TAD boundaries have been implicated both in cancer and developmental disease; this likely occurs via ‘enhancer hijacking’, whereby removal of the TAD boundary exposes enhancers to new target transcription start sites. We hypothesized that TAD boundaries would display evidence for negative selection against loss of their insulating function. To test this hypothesis systematically on a genome-wide scale, we quantified rates at which different genomic elements and chromatin states are covered by fixed deletions between primates and polymorphic deletions in human populations. These analyses of thousands of genomes demonstrated that active chromatin states are depleted in deletions, whereas repressed states are not. This signature of negative selection is as strong at TAD boundaries as at promoters of broadly expressed genes. Thus, the chromatin landscape constrains structural variation both within healthy humans and across primate evolution. In contrast, in patients with developmental delay, deletions occur remarkably uniformly across genomic features including TAD boundaries, suggesting a broad role for enhancer hijacking in human disease.

5:10 PM-5:20 PM
Hierarchical Domain Structure Reveals the Divergence of Activity among TADs and Boundaries
Room: Grand Ballroom B
  • Lin An, The Pennsylvania State University, United States
  • Tao Yang, The Pennsylvania State University, United States
  • Jiahao Yang, Tsinghua University, China
  • Johannes Nuebler, Massachusetts Institute of Technology, United States
  • Qunhua Li, The Pennsylvania State University, United States
  • Yu Zhang, The Pennsylvania State University, United States

Presentation Overview: Show

Mammalian genomes are organized at different levels. As one of the fundamental structural units, Topologically Associating Domains (TADs) play a key role in the gene regulatory machinery. Recent studies found that hierarchical structures are also present within some TADs. However, precise identification of the locations of hierarchical TAD structures remains challenging. Here we present HitHiC, a dynamic programming based method that can accurately and quickly uncover hierarchical TAD structures from Hi-C data. Through a systematic evaluation, we show that HitHiC has better accuracy, reproducibility and running speed than existing methods. We applied HitHiC to high resolution Hi-C matrices and found that TADs with nested structures are in general more active than those without. Furthermore, we identified a group of boundaries that are shared by multiple TADs, which we call super boundaries. We showed that the super boundaries are highly enriched with active chromatin states and expressed genes. This observation of super boundaries is potentially consistent with the asymmetric movement in the loop extrusion model. Altogether, our results provide new insights into the complex system of gene regulation.
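As a rough illustration of how dynamic programming can partition a contact matrix into domains, the sketch below solves a flat (non-hierarchical) segmentation problem. The scoring function, the `gamma` penalty, and the toy matrix are assumptions made for this example; the abstract does not describe HitHiC's actual scoring or its recursion over nested domains.

```python
def segment_domains(contacts, gamma):
    """Split bins 0..n-1 into contiguous domains maximizing the summed block score.

    The score of a block is the sum of (contact - gamma) over all of its cells,
    so blocks are rewarded only where contacts exceed the gamma baseline.
    """
    n = len(contacts)
    # prefix[i][j] holds the sum of contacts[a][b] for a < i and b < j.
    prefix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            prefix[i + 1][j + 1] = (
                contacts[i][j] + prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j]
            )

    def block_score(start, end):
        """Score of the square block covering bins start..end-1."""
        total = (
            prefix[end][end] - prefix[start][end]
            - prefix[end][start] + prefix[start][start]
        )
        size = end - start
        return total - gamma * size * size

    # best[end] is the best total score for segmenting bins 0..end-1.
    best = [0.0] + [float("-inf")] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            score = best[start] + block_score(start, end)
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Trace back the chosen domain boundaries.
    domains = []
    end = n
    while end > 0:
        domains.append((back[end], end))
        end = back[end]
    return domains[::-1]


# Toy symmetric contact matrix with two dense 3-bin blocks (hypothetical values).
toy = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
domains = segment_domains(toy, gamma=5)
print(domains)
```

A hierarchical caller would additionally search for nested domains inside each block; that step is omitted here.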

5:20 PM-5:40 PM
Proceedings Presentation: Predicting CTCF-mediated chromatin loops using CTCF-MP
Room: Grand Ballroom B
  • Ruochi Zhang, Carnegie Mellon University, United States
  • Yuchuan Wang, Carnegie Mellon University, United States
  • Yang Yang, Carnegie Mellon University, United States
  • Yang Zhang, Carnegie Mellon University, United States
  • Jian Ma, Carnegie Mellon University, United States

Presentation Overview: Show

The three-dimensional organization of chromosomes within the cell nucleus is highly regulated. CTCF is known to be an important architectural protein that mediates long-range chromatin loops. Recent studies have shown that the majority of CTCF binding motif pairs at chromatin loop anchor regions are in convergent orientation. However, it remains unknown whether the genomic context at the sequence level can determine if a convergent CTCF motif pair is able to form a chromatin loop. In this paper, we directly ask whether, and which, sequence-based features (other than the motif itself) may be important for establishing CTCF-mediated chromatin loops. We found that motif conservation measured by "branch-of-origin", which accounts for motif turnover in evolution, is an important feature. We developed a new machine learning algorithm called CTCF-MP, based on word2vec, to demonstrate that sequence-based features alone can predict whether a pair of convergent CTCF motifs would form a loop. Together with functional genomic signals from CTCF ChIP-seq and DNase-seq, CTCF-MP is able to make highly accurate predictions on whether a convergent CTCF motif pair would form a loop in a single cell type and also across different cell types. Our work represents an important step toward understanding the sequence determinants that may guide the formation of complex chromatin architectures.
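The candidate set that such a classifier would score can be sketched as follows. This is a minimal illustration that assumes motif hits are given as (position, strand) tuples on one chromosome and that a distance cutoff is applied; the function name, the cutoff, and the toy coordinates are hypothetical and not part of CTCF-MP.

```python
def convergent_pairs(motifs, max_distance):
    """Return (upstream, downstream) motif positions in convergent orientation.

    A pair is convergent when the upstream motif lies on the forward strand
    and the downstream motif lies on the reverse strand, so the two motifs
    point toward each other.
    """
    ordered = sorted(motifs)
    pairs = []
    for i, (pos_a, strand_a) in enumerate(ordered):
        if strand_a != "+":
            continue
        for pos_b, strand_b in ordered[i + 1:]:
            if pos_b - pos_a > max_distance:
                break  # later motifs are even farther away
            if strand_b == "-":
                pairs.append((pos_a, pos_b))
    return pairs


# Toy motif hits on a single chromosome (hypothetical coordinates).
motifs = [(1000, "+"), (5000, "-"), (8000, "+"), (20000, "-"), (900000, "-")]
candidates = convergent_pairs(motifs, max_distance=100000)
print(candidates)
```

Each candidate pair would then be described by sequence-derived and functional genomic features before being passed to a classifier.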

5:40 PM-5:41 PM
Uncoupling of transcription and cytodifferentiation in mouse spermatocytes with impaired meiosis
Room: Grand Ballroom B
  • Alexander Fine, The Jackson Laboratory & Tufts University, United States
  • Robyn Ball, Stanford University, United States
  • Yasuhiro Fujiwara, The Jackson Laboratory, United States
  • Mary Ann Handel, The Jackson Laboratory & Tufts University, United States
  • Gregory Carter, The Jackson Laboratory & Tufts University, United States

Presentation Overview: Show

Cell differentiation is driven by changes in gene expression that manifest as changes in cellular phenotype or function. Altered cellular phenotypes, stemming from genetic mutations or other perturbations, are widely assumed to directly correspond to changes in the transcriptome and vice versa. Here, we use the cytologically well-defined Prdm9 mutant mouse as a model of developmental arrest to demonstrate that parallel programs of cellular differentiation and transcription can become dissociated. By comparing cytological phenotype markers and transcriptomes in wild-type and mutant spermatocytes, we identified multiple instances of cellular and transcriptional uncoupling in Prdm9-/- mutants. Most notably, Prdm9-/- germ cells arrest cytologically in late-leptotene/zygotene but nevertheless develop gene expression signatures characteristic of later, post-arrest developmental substages. These findings suggest that transcriptome changes may not reliably map to cellular phenotypes in perturbed systems.

5:41 PM-5:42 PM
Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response
Room: Grand Ballroom B
  • Magali Champion, Paris Descartes University, France
  • Kevin Brennan, Stanford University, United States
  • Tom Croonenborghs, Broad Institute, KU Leuven, United States
  • Andrew Gentles, Stanford University, United States
  • Nathalie Pochet, Harvard University, United States
  • Olivier Gevaert, Stanford University, United States

Presentation Overview: Show

The availability of increasing volumes of multi-omics profiles across many cancers promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenge is to integrate these multiple levels of omics profiles and especially to analyze them across many cancers. Here we present AMARETTO, an algorithm that addresses both challenges in three steps. First, AMARETTO identifies potential cancer driver genes through integration of copy number, DNA methylation and gene expression data. Then AMARETTO connects these driver genes with co-expressed target genes that they control, defined as regulatory modules. Thirdly, we connect AMARETTO modules identified from different cancer sites into a pancancer network to identify cancer driver genes. Here we applied AMARETTO in a pancancer study comprising eleven cancer sites and confirmed that AMARETTO captures hallmarks of cancer. We also demonstrated that AMARETTO enables the identification of novel pancancer driver genes. In particular, our analysis led to the identification of pancancer driver genes of smoking-induced cancers and ‘antiviral’ interferon-modulated innate immune response.

5:42 PM-5:43 PM
The Epigenomic Landscape of Aberrant Splicing in Cancer
Room: Grand Ballroom B
  • Donghoon Lee, Yale University, United States
  • Jing Zhang, Yale University, United States
  • Mark Gerstein, Yale University, United States

Presentation Overview: Show

Nearly all protein-coding genes undergo alternative RNA splicing, which provides an important means of expanding transcriptome diversity beyond the scope of genomic information. While splicing is an elaborate process, it can be prone to errors that could become pathogenic. Unsurprisingly, aberrant splicing, which collectively refers to splicing events that could confer risk of a disease, is often implicated in cancer.
Recent studies have revealed that splicing regulation is characterized by increased levels of nucleosome density and positioning, DNA methylation, and distinct histone modification patterns. However, most studies on aberrant splicing have focused on identifying genomic- and transcriptomic-level variations within splice sites, cis-acting splicing regulatory elements, and trans-acting splicing factors. The extent, nature, and effects of epigenomic dysregulation in aberrant splicing remain unresolved.
By systematically profiling the epigenomic landscape of aberrant splicing using transcriptomic and epigenomic data from the ENCODE and the Epigenome Roadmap projects, we aimed to (1) identify chromatin status and distinct epigenetic signatures that characterize aberrant splicing in cancer, (2) classify aberrant splicing by different classes of epigenomic dysregulation, and (3) elucidate the role of epigenomic control in aberrant splicing. The proposed study will significantly advance our understanding of the epigenomic contribution to aberrant splicing in cancer.

5:43 PM-5:44 PM
Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients
Room: Grand Ballroom B
  • Andre Kahles, ETH Zurich, Switzerland
  • Kjong-Van Lehmann, ETH Zurich, Switzerland
  • Nora C Toussaint, ETH Zurich, NEXUS Personalized Health Technologies, Switzerland
  • Matthias Hüser, ETH Zurich, Switzerland
  • Stefan G Stark, ETH Zurich, Switzerland
  • Timo Sachsenberg, University of Tübingen, Germany
  • Oliver Stegle, EMBL-EBI, United Kingdom
  • Oliver Kohlbacher, University of Tübingen, Germany
  • Chris Sander, Harvard University, United States
  • Gunnar Rätsch, ETH Zurich, Switzerland

Presentation Overview: Show

Building on the tremendous resource of The Cancer Genome Atlas (TCGA), we carried out a comprehensive analysis of the transcriptomes of 8,705 cancer patients over a total of 32 cancer types. We uniformly processed both RNA and Whole Exome sequencing data from TCGA and extracted alternative splicing (AS) events and tumor variants. We observe thousands of AS events present in cancer samples that are absent from TCGA normals or GTEx samples and find a consistent increase of splicing in cancer vs normal (≈30%). In a genome-wide association of splicing and somatic variation we confirmed known trans-associations involving SF3B1 and U2AF1 and identified three additional trans-acting variants (IDH1, TADA1, PPP2R1A).
Integrating data from protein-MS for Breast and Ovarian Cancer samples, we were able to confirm on average ≈1.7 peptides derived from novel exon-exon junctions compared to ≈0.6 SNV-derived peptides per tumor sample, for peptides that were also predicted MHC-I binders. Hence, by including neoantigens derived from novel exon-exon junctions, the fraction of samples for which at least one putative neoantigen can be identified increases from 30% to 75%, presenting a new class of splicing-associated potential neoantigens that could be exploited for immunotherapy.

5:44 PM-5:45 PM
Imputed gene associations identify replicable trans-acting genes enriched in transcription pathways
Room: Grand Ballroom B
  • Heather Wheeler, Loyola University Chicago, United States
  • Sally Ploch, Loyola University Chicago, United States
  • Alvaro Barbeira, University of Chicago, United States
  • Hae Kyung Im, University of Chicago, United States

Presentation Overview: Show

Regulation of gene expression is an important mechanism through which genetic variation can affect complex traits. A substantial portion of gene expression variation can be explained by both local (cis) and distal (trans) genetic variation. Much progress has been made in uncovering cis-acting expression quantitative trait loci (cis-eQTL), but trans-acting eQTL have been more difficult to identify and replicate. Rather than testing every SNP for association with every gene, we first imputed the component of gene expression determined by local genetic variation. Then, we tested this imputed gene expression component for association with observed expression of genes on different chromosomes to identify trans-acting genes. Gene expression imputation models were trained by applying statistical machine learning to independent eQTL panels. We leverage a recent extension of PrediXcan called MulTiXcan, which is a gene level association method that aggregates imputation models across multiple eQTL panels, to identify 1159 trans-acting genes and their 1247 targets, for a total of 3657 trans-acting/target gene pairs (FDR < 0.05). Trans-acting genes identified by MulTiXcan are enriched in transcription and transcription factor pathways, which indicates our method uncovers genes of expected function.
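The two-step test described above can be sketched in a few lines. This is a toy illustration only: the SNP weights, dosages, and expression values are invented, a plain Pearson correlation stands in for the association test, and none of the names below come from PrediXcan or MulTiXcan.

```python
import math


def impute_expression(dosages, weights):
    """Genetically predicted expression: weighted sum of local SNP dosages."""
    return [sum(w * d for w, d in zip(weights, person)) for person in dosages]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Step 1: impute the cis-determined expression of a candidate trans-acting gene.
# Rows are individuals, columns are two local SNPs (hypothetical dosages).
dosages = [[0, 1], [1, 0], [2, 1], [1, 2], [2, 2]]
cis_weights = [0.5, -0.2]  # hypothetical weights from a trained imputation model
imputed = impute_expression(dosages, cis_weights)

# Step 2: test the imputed component against the observed expression of a
# gene on a different chromosome (hypothetical measurements).
observed_target = [0.7, 1.9, 2.7, 1.1, 2.3]
r = pearson(imputed, observed_target)
print(round(r, 3))
```

Testing one imputed component per gene, rather than every SNP against every gene, is what reduces the multiple-testing burden in this design.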

5:45 PM-5:46 PM
RnBeads 2018 – comprehensive analysis of DNA methylation data
Room: Grand Ballroom B
  • Michael Scherer, Max Planck Institute for Informatics, Germany
  • Fabian Müller, Max Planck Institute for Informatics, Germany
  • Yassen Assenov, German Cancer Research Center, Germany
  • Pavlo Lutsik, German Cancer Research Center, Germany
  • Jörn Walter, Saarland University, Germany
  • Thomas Lengauer, Max Planck Institute for Informatics, Germany
  • Christoph Bock, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Austria

Presentation Overview: Show

DNA methylation is a well-studied epigenetic mark attributed with key roles in normal cell differentiation and cancer. Several software packages facilitate individual DNA methylation analysis steps, such as normalization and differential analysis. However, tools that provide a start-to-finish pipeline are rare. To fill this gap, we developed the RnBeads software package[1]. It is structured into the modules: import and export, quality control, preprocessing, covariate inference, and exploratory and differential analysis. Here, we present a substantially extended version of RnBeads with major improvements on each of the modules, including support of new data types (e.g. the Illumina EPIC array), new inference methods (such as epigenetic age prediction and estimating immune cell content of cancer samples) and improved usability by a new graphical user interface. We showcase this on four reproducible examples, each highlighting the new features: a large array-based blood data set, a whole-genome bisulfite sequencing data set on human hematopoiesis, a reduced-representation bisulfite sequencing data set on Ewing sarcoma and a benchmark data set for cross-platform integration. RnBeads represents a comprehensive tool for DNA methylation analysis and is available through R/Bioconductor.
[1] Assenov, Y. et al. Comprehensive analysis of DNA methylation data with RnBeads. Nat. Methods 11, 1138–1140 (2014).

5:46 PM-5:47 PM
Principal Component Region Set Analysis: Facilitating Interpretation of PCA Dimensions for DNA Methylation Data
Room: Grand Ballroom B
  • John Lawson, University of Virginia, United States

Presentation Overview: Show

Principal component analysis (PCA) is a widely used technique for dimensionality reduction and visualization in genomics, where the number of dimensions can be thousands or even hundreds of thousands. However, since each principal component (PC) is a linear combination of original dimensions, the meaning of the new dimensions can be hard to interpret. For PCA of DNA methylation data, the cytosines which are the original dimensions may not have a clear biological annotation, further hindering interpretation. Currently, there is a lack of methods for interpreting PCs of DNA methylation data. We present a method which annotates PCs using sets of genomic regions corresponding to a given biological annotation, such as transcription factor binding or histone modifications. We tested the method on DNA methylation data from breast cancer, confirming known associations, and data from the rare childhood cancer Ewing sarcoma, discovering novel associations. Our method is computationally efficient, scales well with increasing number of samples, and will fit well into existing analysis workflows. This method will be broadly useful to help researchers understand variation in DNA methylation among samples.
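One way to picture the annotation step is sketched below. It assumes the PCA has already been run and that per-cytosine loadings for one principal component are available. The scoring rule (mean absolute loading inside the region set minus the mean outside it) is an assumption made for illustration and may differ from the scoring used by the presented method; all names and values are hypothetical.

```python
def region_set_score(loadings, regions):
    """Mean |loading| of cytosines inside the region set minus the mean outside.

    loadings maps a cytosine position (single chromosome assumed) to its
    loading on one principal component; regions is a list of half-open
    (start, end) intervals.
    """
    inside, outside = [], []
    for position, loading in loadings.items():
        hit = any(start <= position < end for start, end in regions)
        (inside if hit else outside).append(abs(loading))
    if not inside or not outside:
        return 0.0  # score is undefined without both groups
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Hypothetical PC1 loadings for five cytosines.
pc1_loadings = {100: 0.8, 150: -0.7, 400: 0.05, 900: -0.1, 950: 0.02}

# Hypothetical region sets, e.g. binding sites of two transcription factors.
region_sets = {
    "TF_A": [(50, 200)],
    "TF_B": [(350, 450), (890, 960)],
}

scores = {name: region_set_score(pc1_loadings, regs) for name, regs in region_sets.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A region set that ranks first for a component suggests that the annotation it represents is associated with the variation captured by that component.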

5:47 PM-5:48 PM
Learning a human-mouse functional genomics conservation score
Room: Grand Ballroom B
  • Soo Bin Kwon, University of California, Los Angeles, United States
  • Jason Ernst, University of California, Los Angeles, United States

Presentation Overview: Show

Identifying regions that are similar in function among species is crucial because they are likely to be functionally conserved and therefore important. In particular, finding functionally similar regions between human and mouse would help us make informed use of murine models. Previous studies are limited in that they require matching cell types across species and measure conservation at the functional genomics level only for a small portion of the genome.

Thus, we developed a score that quantifies conservation at the functional genomics level between human and mouse. Scores are generated at base-pair resolution by a neural network trained to learn the characteristics of functionally similar regions. Features consist of ChromHMM annotations and peak calls from DNase-seq, ChIP-seq, and CAGE experiments across human and mouse cell types. Unlike previous methods, our method does not require matching cell types, taking advantage of the wealth of publicly available data.

Regulatory elements conserved in sequence or active in similar cell types in human and mouse score highly, suggesting that our score captures conservation of regulatory activity. Our score is moderately correlated with sequence conservation scores, which suggests that our method offers complementary information to existing genomic annotations based on sequence alignment.

5:48 PM-5:49 PM
SpectralTAD: identification of topologically associated domains (TADs) using spectral clustering
Room: Grand Ballroom B
  • Kellen Cresswell, Virginia Commonwealth University, United States
  • John Stansfield, Virginia Commonwealth University, United States
  • Mikhail Dozmorov, Virginia Commonwealth University, United States

Presentation Overview: Show

Topologically associated domains (TADs) are regions of the genome defined by strong intra-TAD interaction patterns. TADs have been shown to be associated with a variety of genomic functions, including gene regulation. Despite this importance, there is no consensus on how to properly detect TADs from raw Hi-C contact matrices.

We propose a method that reframes TAD detection as a spectral clustering problem. To perform clustering, the contact matrix is interpreted as the adjacency matrix of a graph whose edge weights indicate the number of inter-loci contacts. Spectral clustering is performed with each cluster corresponding to a unique TAD. Clusters are assigned based on the iterative discretization method described by Yu and Shi (2003). We demonstrate how the eigengap, a common heuristic for determining the number of clusters for spectral clustering, fails when analyzing Hi-C matrices, and introduce a novel alternative for detecting the optimal number of TADs based on maximizing cluster-wise silhouette scores. Our method is implemented in the SpectralTAD R package.

Results show that TAD boundaries identified with this method co-locate heavily with CTCF peaks. Additionally, this method produces TADs that have better separation when compared to other commonly used methods.
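To make the graph view of a contact matrix concrete, the sketch below splits a toy matrix into two groups using the second eigenvector of the normalized adjacency matrix, computed by power iteration. It is a two-cluster simplification written for illustration: it does not implement the multi-cluster discretization of Yu and Shi or the silhouette-based choice of cluster number, and the function name and toy matrix are hypothetical rather than taken from the SpectralTAD package.

```python
import math


def spectral_bipartition(contacts, iterations=200):
    """Split bins into two groups by the sign of the second spectral vector.

    Assumes a symmetric contact matrix in which every bin has a positive
    total contact count.
    """
    n = len(contacts)
    degree = [sum(row) for row in contacts]
    # Normalized adjacency D^-1/2 A D^-1/2, shifted by the identity so that
    # all eigenvalues are non-negative and power iteration orders them by value.
    m = [
        [
            contacts[i][j] / math.sqrt(degree[i] * degree[j]) + (1.0 if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    # The leading eigenvector is proportional to sqrt(degree); project it out
    # so the iteration converges to the second eigenvector instead.
    norm = math.sqrt(sum(degree))
    top = [math.sqrt(d) / norm for d in degree]
    vec = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        dot = sum(v * t for v, t in zip(vec, top))
        vec = [v - dot * t for v, t in zip(vec, top)]
        vec = [sum(m[i][j] * vec[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(v * v for v in vec))
        vec = [v / length for v in vec]
    return [0 if v < 0 else 1 for v in vec]


# Toy symmetric contact matrix with two dense 3-bin blocks (hypothetical values).
toy = [[10 if (i < 3) == (j < 3) else 1 for j in range(6)] for i in range(6)]
labels = spectral_bipartition(toy)
print(labels)
```

With more than two domains, several eigenvectors are used at once and the number of clusters has to be chosen, which is where the silhouette criterion described in the abstract comes in.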

5:49 PM-5:50 PM
Prediction of complete Hi-C contact maps from genomic sequence
Room: Grand Ballroom B
  • Christopher JF Cameron, McGill University, Canada
  • Josée Dostie, McGill University, Canada
  • Mathieu Blanchette, McGill University, Canada

Presentation Overview: Show

The three-dimensional (3D) organization of genomes plays a key role in the regulation of genes. High-throughput chromosome conformation capture (Hi-C) allows 3D genomic organization to be determined by capturing all chromatin contacts within a cell population. Much work has already been done to computationally predict these contacts using one or more types of biochemical data (e.g., nucleosome positioning, ChIP-seq, Hi-C) as input. Although informative, these models cannot be applied to cell types or genomes where the required input data is unavailable (e.g., ancestral genomes). Moreover, most studies only predict a subset of all genome-wide chromatin interactions, which are typically found at relatively short distances (<1 Mb). Here, we describe the supervised regression problem of predicting complete Hi-C contact maps from genomic sequence alone. To address this problem, we define multiple features derived from genomic sequence data that allow machine learning algorithms to fit the underlying distribution of a Hi-C contact map at restriction-fragment resolution. We show that our models provide (i) accurate predictions of Hi-C contact frequency by properly weighting input features relevant to a particular cell type as well as (ii) insight into potential factors contributing to chromatin architecture changes.

5:50 PM-5:51 PM
Identification of enhancer target genes by correlating gene expression and epigenetic modifications within topologically associated domains
Room: Grand Ballroom B
  • Konstantin Okonechnikov, German Cancer Research Center (DKFZ), Germany
  • Serap Erkek, Izmir Biomedicine and Genome Center, Turkey
  • Stephen C. Mack, Baylor College of Medicine, United States
  • Kristian W. Pajtler, German Cancer Research Center (DKFZ), Germany
  • Stefan M. Pfister, German Cancer Research Center (DKFZ), Germany
  • Lukas Chavez, University of California San Diego, United States

Presentation Overview: Show

Integrative analysis of histone modifications across diverse tissue types and diseases has uncovered the dependence of gene regulation on chromatin organization. High-throughput technologies for analyzing genome-wide chromosomal conformation have revealed that chromatin is arranged in topologically associated domains (TADs), which remain largely stable across cell types, while intra-TAD activities are cell type specific. Consequently, detailed knowledge about TAD boundaries can be utilized for associating epigenomic signals with their target genes. For example, we have recently identified enhancer-associated genes in 42 primary ependymoma brain tumors across six distinct molecular subgroups by H3K27ac ChIP-sequencing. Our TAD-guided analysis leveraged Hi-C data previously generated from human fetal fibroblasts and revealed promising molecular targets for improved treatment of ependymoma tumors. We have now implemented our analysis strategy as an open-source R package that can be applied to any heterogeneous cohort of samples analyzed by a combination of gene expression and epigenetic profiling techniques, with or without sample-matched chromosomal conformation information. To investigate the impact of tumor-specific TADs, we have generated chromosomal conformation data from patient-derived ependymoma cell lines. Our preliminary results confirm that enhancer-associated genes can largely be inferred by borrowing TAD information from unrelated reference samples.
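The TAD-guided pairing can be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are on a single chromosome, Pearson correlation across samples is used as the association measure, and the cutoff, function names, and toy values are hypothetical rather than taken from the R package mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def tad_index(position, tads):
    """Index of the half-open TAD interval containing position, or None."""
    for k, (start, end) in enumerate(tads):
        if start <= position < end:
            return k
    return None


def enhancer_gene_links(enhancers, genes, tads, min_r):
    """Link each enhancer to correlated genes located in the same TAD."""
    links = []
    for e_name, (e_pos, e_signal) in enhancers.items():
        e_tad = tad_index(e_pos, tads)
        if e_tad is None:
            continue
        for g_name, (g_pos, g_expr) in genes.items():
            if tad_index(g_pos, tads) != e_tad:
                continue  # never pair across a TAD boundary
            r = pearson(e_signal, g_expr)
            if r >= min_r:
                links.append((e_name, g_name, round(r, 3)))
    return links


# Reference TADs and per-sample signals across four samples (hypothetical values).
tads = [(0, 1000), (1000, 2000)]
enhancers = {"enh1": (200, [1.0, 2.0, 3.0, 4.0])}  # position, H3K27ac signal
genes = {
    "geneA": (800, [2.0, 4.0, 6.0, 8.5]),   # same TAD, correlated
    "geneB": (1500, [2.0, 4.0, 6.0, 8.0]),  # correlated but in another TAD
    "geneC": (500, [5.0, 1.0, 4.0, 2.0]),   # same TAD, not correlated
}
links = enhancer_gene_links(enhancers, genes, tads, min_r=0.8)
print(links)
```

Because the TAD intervals are an input, the same code runs whether they come from sample-matched Hi-C or from an unrelated reference sample, which mirrors the borrowing strategy described in the abstract.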

5:51 PM-5:52 PM
Room: Grand Ballroom B
  • Aparna Gorthi, University of Texas Health San Antonio, United States
  • Alexander Bishop, University of Texas Health San Antonio, United States
  • Yidong Chen, UT Health Science Center at San Antonio, United States

Presentation Overview: Show

Ewing sarcoma is an aggressive pediatric cancer predominantly driven by EWS-FLI1. Little is known about the systemic impact of EWS-FLI1 and the underlying basis of its chemosensitivity. We probed a genome-wide RNAi screen and identified transcription, RNA metabolism and DNA damage response as being required for Ewing sarcoma viability. Interestingly, these processes were also altered in response to damage in expression profiles of Ewing sarcoma cell lines. We found a highly significant accumulation of R-loops (three-stranded RNA:DNA structures), a consequence of transcription dysregulation, in Ewing sarcoma. We developed an analysis pipeline to compare genome-wide R-loops with other ChIP-seq data and found a strong concordance between R-loops and RNA Polymerase II, as well as the DNA repair protein BRCA1, in terms of both peak height (depicting level of enrichment) and coverage. Importantly, BRCA1 co-localization was significantly higher at genes also bound by EWS-FLI1. Further, BRCA1 localization was diminished following damage in control cell lines but less so in Ewing sarcoma. Finally, these observations were confirmed by experimental evidence of impaired homologous recombination and sensitivity to PARP1 inhibitors. In conclusion, our study combines bioinformatics and experimental data to establish the underlying basis of Ewing sarcoma chemosensitivity.

Tuesday, July 10th
8:35 AM-8:40 AM
Welcome to RegSys Day 2
Room: Grand Ballroom B
8:40 AM-9:10 AM
Keynote: What are the big barriers to a complete understanding of gene regulation?
Room: Grand Ballroom B
  • Yoav Gilad, University of Chicago, United States

Presentation Overview: Show

Regulatory variation plays a central role in the genetics of complex traits; however, it remains challenging to determine which variants in the genome impact gene regulation. Since 2007, we have used a panel of 70 unrelated HapMap Yoruba individuals to develop a model system for studying the genetic variants that drive differences in gene regulation between individuals. We have characterized a wide range of cellular phenotypes in these individuals, thus creating an unparalleled resource for studying links between genetic variation and regulation. Most recently, we established iPSCs from these 70 individuals and started collecting single cell data from the iPSCs and differentiated cells. These data allow us to characterize dynamic associations between genotype and regulatory phenotypes. I will discuss the insights we have gained from this system and, more generally, our current understanding of gene regulation and the gene regulatory code.

9:10 AM-9:20 AM
Long noncoding RNAs as sequence-specific DNA-binding factors
Room: Grand Ballroom B
  • Chao-Chung Kuo, RWTH Aachen Medical Faculty, Germany
  • Nevcin Senturk, DKFZ, Germany
  • Ingrid Grummt, DKFZ, Germany
  • Ivan G. Costa, RWTH Aachen University, Germany

Presentation Overview: Show

The importance of protein-DNA interactions in gene regulation is indisputable. Yet, the role of RNA-DNA interactions in gene regulation has been poorly explored so far. We are interested in triple helices, where a single RNA strand binds to the major groove of a double helix and individual nucleobases form specific Hoogsteen hydrogen bonds with adenine or guanine residues of the purine-rich DNA strand. There is increasing evidence for a role of triple helix binding in transcription regulation. So far, computational methods for triple helix detection are based on enumerating all triple helices, i.e. short sequences with a high proportion of bases following the triple helix code, for a given pair of RNA and DNA sequences. We describe here a method for statistically characterizing the potential of a set of RNAs to bind to particular DNA regions. Triplex Domain Finder (TDF) identifies regions within the RNAs (DNA binding domains) with the highest potential for forming triple helices. Case studies on long noncoding RNAs known to form triple helices demonstrate that TDF is able to recover known regions of RNA and DNA forming triple helices. Moreover, sequencing confirms triple helix binding sites of a known and a novel Meg3 DNA binding domain.

9:20 AM-9:40 AM
Proceedings Presentation: Covariate-Dependent Negative Binomial Factor Analysis of RNA Sequencing Data
Room: Grand Ballroom B
  • Siamak Zamani Dadaneh, Texas A&M University, United States
  • Mingyuan Zhou, The University of Texas at Austin, United States
  • Xiaoning Qian, Texas A&M University, United States

Presentation Overview:

Motivation: High-throughput sequencing technologies, in particular RNA sequencing (RNA-seq), have become the basic practice for genomic studies in biomedical research. In addition to studying genes individually, for example, through differential expression analysis, investigating coordinated expression variations of genes may help reveal the underlying cellular mechanisms to derive better understanding and more effective prognosis and intervention strategies. Although a variety of co-expression network based methods exist to analyze microarray data for this purpose, blindly extending these microarray methods may introduce unnecessary bias; it is crucial to develop methods well adapted to RNA-seq data to identify the functional modules of genes with similar expression patterns.
Results: We have developed a fully Bayesian covariate-dependent negative binomial factor analysis method (dNBFA) for RNA-seq count data, to capture coordinated gene expression changes while considering effects from covariates reflecting different influencing factors. Unlike existing co-expression network based methods, our proposed model does not require multiple ad-hoc choices on data processing, transformation, or co-expression measures, and can be directly applied to RNA-seq data. Furthermore, being capable of incorporating covariate information, the proposed method can tackle setups with complex confounding factors in different experiment designs. Finally, the natural model parameterization removes the need for a normalization preprocessing step, as commonly adopted to compensate for the effect of sequencing-depth variations. Efficient Bayesian inference of model parameters is derived by exploiting conditional conjugacy via novel data augmentation techniques. Experimental results on several real-world RNA-seq datasets on complex diseases suggest dNBFA as a powerful tool for discovering the gene modules with significant differential expression and meaningful biological insight.
Availability: dNBFA is implemented in the R language and is available at https://github.com/siamakz/dNBFA.

9:40 AM-10:15 AM
Coffee Break
10:15 AM-10:45 AM
Keynote: Predictive local sequence features can distinguish the binding specificity of transcription factor family members
  • Jun Song, University of Illinois at Urbana-Champaign, United States

Presentation Overview:

All members of a given transcription factor (TF) family may recognize highly similar core motifs; yet, they often have distinct binding patterns and functions. For example, the basic helix-loop-helix TF family consists of a diverse set of TFs that bind a consensus hexamer motif called the E-box and regulate important developmental and cellular processes. Two well-known members of this family are the oncogenic transcription factors MYC and microphthalmia-associated transcription factor (MITF). MITF is the master regulator of melanocyte differentiation; paradoxically, it is also a proliferation-promoting lineage-restricted oncogene in melanomas and implicated in conferring drug resistance. MYC is another potent oncoprotein in multiple types of cancers. MITF and MYC not only share the core binding motif, but are also the two most highly expressed bHLH-Zip transcription factors in melanocytes, raising the possibility that they may compete for common binding sites. We built computational predictive models, including a novel convolutional boosted decision tree, that use genetic sequence features flanking E-boxes to accurately distinguish MITF vs. MYC-MAX binding sites. This finding demonstrates that specific combinatorial sequence features that interact with E-boxes play an important role in differentially recruiting MITF vs. MYC to target genes.

10:45 AM-11:00 AM
PREDICTD: PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition
Room: Grand Ballroom B
  • Timothy Durham, University of Washington, United States
  • Maxwell Libbrecht, University of Washington, United States
  • J Jeffry Howbert, University of Washington, United States
  • Jeffrey Bilmes, University of Washington, United States
  • William Noble, University of Washington, United States

Presentation Overview:

The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project seek to characterize the epigenome in diverse cell types using assays that identify, for example, genomic regions with modified histones or accessible chromatin. These efforts have produced thousands of datasets but cannot possibly measure each epigenomic factor in all cell types. To address this, we present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to computationally impute missing experiments. PREDICTD leverages an elegant model called “tensor decomposition” to impute many experiments simultaneously. Tensor decomposition learns a low-rank representation of the epigenome that captures latent patterns in ChIP-seq and DNase-seq experiments from the Roadmap Epigenomics data corpus. Compared with the current state-of-the-art method, ChromImpute, PREDICTD produces lower overall mean squared error, and combining the two methods yields further improvement. We show that PREDICTD data captures enhancer activity at noncoding human-accelerated regions. PREDICTD provides reference imputed data and open-source software for investigating new cell types, and demonstrates the utility of tensor decomposition and cloud computing, both promising technologies for bioinformatics.
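
The idea of imputing a missing experiment from a low-rank tensor model can be illustrated with a small sketch. This is not the PREDICTD software: it assumes a plain rank-2 factorization over three hypothetical axes (cell type, assay, genomic position), fitted by alternating least squares on the observed entries only, with synthetic data and illustrative names (`impute`, `_update`) chosen for this example.

```python
import numpy as np

def _update(factor, others, data, observed, axis):
    """Refit one factor matrix by least squares, using observed entries only."""
    d = np.moveaxis(data, axis, 0)
    m = np.moveaxis(observed, axis, 0)
    # One design row per index pair of the two remaining axes.
    design = (others[0][:, None, :] * others[1][None, :, :]).reshape(-1, factor.shape[1])
    for i in range(d.shape[0]):
        mask = m[i].reshape(-1)
        if mask.any():
            target = d[i].reshape(-1)[mask]
            factor[i] = np.linalg.lstsq(design[mask], target, rcond=None)[0]

def impute(data, observed, rank=2, sweeps=20, seed=0):
    """Approximate data[c, a, p] by sum_k C[c, k] * A[a, k] * P[p, k]."""
    rng = np.random.default_rng(seed)
    factors = [rng.normal(size=(n, rank)) for n in data.shape]
    for _ in range(sweeps):
        for axis in range(3):
            others = [f for j, f in enumerate(factors) if j != axis]
            _update(factors[axis], others, data, observed, axis)
    C, A, P = factors
    return np.einsum("ck,ak,pk->cap", C, A, P)

# Synthetic example: 4 cell types x 3 assays x 40 positions (assumed sizes).
rng = np.random.default_rng(1)
true_factors = [rng.normal(size=(n, 2)) for n in (4, 3, 40)]
data = np.einsum("ck,ak,pk->cap", *true_factors)

observed = np.ones(data.shape, dtype=bool)
observed[0, 1, :] = False  # pretend assay 1 was never run in cell type 0

imputed = impute(data, observed)
fit_mse = np.mean((imputed - data)[observed] ** 2)
heldout_mse = np.mean((imputed - data)[~observed] ** 2)
print("observed-entry MSE:", fit_mse)
print("held-out experiment MSE:", heldout_mse)
```

The hidden slice plays the role of an experiment that was not performed; the model fills it in from patterns shared across the other cell types and assays.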

11:00 AM-11:20 AM
Systematic evaluation of multimodal approaches to predict in vivo transcription factor binding across cell types
Room: Grand Ballroom B
  • Akshay Balsubramani, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States

Presentation Overview:

Genome-wide in vivo transcription factor (TF) binding landscapes vary widely across cell states. It is infeasible to assay binding for all TFs in all cell states, motivating complementary computational approaches to predict TF binding in unseen cell types. We conducted the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge, an open community endeavor to assess state-of-the-art computational TF binding prediction methods. Forty international teams predicted whole-genome TF binding for 31 TFs in 13 cell types - including previously unreleased ENCODE data from primary cells - using DNA sequence, chromatin accessibility, and gene expression inputs. In this context, predicting binding on an unassayed cell type is a statistically difficult problem, requiring predictors to learn patterns of binding that generalize across cell types, but are not driven by sequence alone. We analyze in detail the computational techniques used to achieve this in the challenge, yielding robust insights into such patterns across tissues and top methods, and identifying especially difficult cell-type-specific binding landscapes modulated by cell-type-specific cofactors. Our data, methodology, and analysis provide a unique, standardized resource for research into practically useful genome-wide TF binding site prediction methods, which will be needed as the human regulatory network is mapped.

11:20 AM-11:40 AM
Deep neural networks for characterizing sequence and chromatin pre-determinants of transcription factor binding
Room: Grand Ballroom B
  • Shaun Mahony, The Pennsylvania State University, United States
  • Divyanshi Srivastava, The Pennsylvania State University, United States
  • Begüm Aydin, New York University, United States
  • Akshay Kakumanu, The Pennsylvania State University, United States
  • Esteban Mazzoni, New York University, United States

Presentation Overview:

In any given cell type, each transcription factor’s (TF) binding targets are determined by a complex interplay between its DNA sequence binding preference and the cell-specific chromatin environment. While several recent methods focus on understanding the relationships between TF binding and concurrent (i.e. same cell type) chromatin data, less attention has been paid to understanding chromatin pre-determinants of future TF binding in dynamic regulatory settings. Here, we present a deep neural network architecture that jointly models DNA sequence and prior chromatin data of heterogeneous types to predict future binding of induced TFs.

We will demonstrate our approach in the analysis of divergent neuronal subtype specification by two proneural bHLH factors, Ascl1 and Neurog2. While we express both factors in the same initial cell type, mouse embryonic stem cells, their divergent genomic binding patterns result in differential chromatin accessibility and activity landscapes that affect the future genomic binding of shared downstream neurogenic TFs. We will demonstrate that our neural networks can effectively characterize the degree to which DNA-binding of induced TFs in this system is affected by prior chromatin landscapes.

11:40 AM-12:00 PM
Fine Mapping of Chromatin Interactions via Neural Networks
Room: Grand Ballroom B
  • Artur Jaroszewicz, University of California, Los Angeles, United States
  • Jason Ernst, University of California, Los Angeles, United States

Presentation Overview:

High-throughput chromatin conformation assays such as Hi-C have enabled genome-wide detection of long-range chromatin contacts, which have been shown to be integral in various regulatory mechanisms. However, interactions from Hi-C experiments are typically identified at relatively coarse resolutions (e.g., 5-25 kb), so these experiments do not robustly identify interactions at a fine scale.

We present a novel computational method, Chromatin Interaction Siamese Convolutional Neural Net (ChISCNN), to fine map Hi-C detected interactions to their likely source at a high resolution. Using high resolution information within DNase-seq and ChIP-seq data for transcription factors and histone marks, we trained a Siamese Convolutional Neural Network (SCNN) to discriminate between true interactions and non-interactions. We then use a feature importance algorithm along with the SCNN to assign each pair of 100 bp subregions a score that corresponds to its importance in the Hi-C interaction. We demonstrate the effectiveness of our approach both by comparing our predictions to independent genome annotations and by recovering original Hi-C peaks after extending their boundaries. Finally, we discuss what signals give chromatin interactions their specificity.

12:00 PM-12:20 PM
Reconstructing differentiation networks and their regulation from time series single-cell expression data
Room: Grand Ballroom B
  • Jun Ding, Carnegie Mellon University, United States
  • Bruce Aronow, Cincinnati Children's Hospital, United States
  • Naftali Kaminski, Yale School of Medicine, United States
  • Joseph Kitzmiller, Cincinnati Children's Hospital, United States
  • Jeffrey Whitsett, Cincinnati Children's Hospital, United States
  • Ziv Bar-Joseph, Carnegie Mellon University, United States

Presentation Overview:

Generating detailed and accurate organogenesis models using single-cell RNA-seq data remains a major challenge. Current methods have relied primarily on the assumption that descendant cells are similar to their parents in terms of gene expression levels. This assumption does not always hold for in vivo studies, which often include infrequently sampled, unsynchronized, and diverse cell populations. Thus, additional information may be needed to determine the correct ordering and branching of progenitor cells and the set of transcription factors (TFs) that are active during advancing stages of organogenesis. To enable such modeling, we have developed a method that learns a probabilistic model that integrates expression similarity with regulatory information to reconstruct the dynamic developmental cell trajectories. When applied to mouse lung developmental data, the method accurately distinguished different cell types and lineages. Existing and new experimental data validated the ability of the method to identify key regulators of cell fate.

12:20 PM-12:40 PM
Quantitating translational control: mRNA abundance-dependent and independent contributions and the mRNA sequences that specify them
Room: Grand Ballroom B
  • Jingyi Jessica Li, University of California, Los Angeles, United States
  • Guo-Liang Chew, Fred Hutchinson Cancer Research Center, United States
  • Mark D. Biggin, Lawrence Berkeley National Laboratory, United States

Presentation Overview:

Translation rate per mRNA molecule correlates positively with mRNA abundance. As a result, protein levels do not scale linearly with mRNA levels, but instead scale with the abundance of mRNA raised to the power of an ‘amplification exponent’. Here we show that to quantitate translational control, the translation rate must be decomposed into two components. One, TRmD, depends on the mRNA level and defines the amplification exponent. The other, TRmIND, is independent of mRNA amount and impacts the correlation coefficient between protein and mRNA levels. We show that in Saccharomyces cerevisiae TRmD represents ∼20% of the variance in translation and directs an amplification exponent of 1.20 with a 95% confidence interval [1.14, 1.26]. TRmIND constitutes the remaining ∼80% of the variance in translation and explains ∼5% of the variance in protein expression. We also find that TRmD and TRmIND are preferentially determined by different mRNA sequence features: TRmIND by the length of the open reading frame and TRmD both by a ∼60 nucleotide element that spans the initiating AUG and by codon and amino acid frequency. Our work provides more appropriate estimates of translational control and implies that TRmIND is under different evolutionary selective pressures than TRmD.
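
The power-law scaling described above can be made concrete with a small synthetic example. The sketch below is not the authors' analysis: it assumes that protein level is proportional to mRNA abundance times translation rate, generates a hypothetical mRNA-dependent component that scales as mRNA^0.20 and a hypothetical mRNA-independent component drawn at random, and then recovers the amplification exponent as the slope of a log-log regression. All names and simulated values are illustrative.

```python
import numpy as np

def amplification_exponent(mrna, protein):
    """Slope of log10(protein) against log10(mRNA)."""
    slope, _intercept = np.polyfit(np.log10(mrna), np.log10(protein), 1)
    return slope

rng = np.random.default_rng(0)
n_genes = 500

# Simulated mRNA abundances spanning three orders of magnitude.
mrna = 10 ** rng.uniform(0, 3, size=n_genes)

# Hypothetical mRNA-dependent component: rises with abundance as mRNA^0.20.
tr_dependent = mrna ** 0.20

# Hypothetical mRNA-independent component: log-normal noise unrelated to mRNA.
tr_independent = 10 ** rng.normal(0.0, 0.3, size=n_genes)

# Assumed model: protein is proportional to mRNA times translation rate.
protein = mrna * tr_dependent * tr_independent

b = amplification_exponent(mrna, protein)
r = np.corrcoef(np.log10(mrna), np.log10(protein))[0, 1]
print("estimated amplification exponent:", round(b, 3))
print("protein-mRNA correlation (log scale):", round(r, 3))
```

In this toy setting the dependent component sets the slope (1 + 0.20), while the independent component leaves the slope unchanged and only lowers the correlation between protein and mRNA levels.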

12:40 PM-2:00 PM
Lunch Break
2:00 PM-2:30 PM
Keynote: From Genetics To Therapeutics: Uncovering And Manipulating The Circuitry Of Non-coding Disease Variants
Room: Grand Ballroom B

Presentation Overview:

Perhaps the greatest surprise of human genome-wide association studies (GWAS) is that 90% of disease-associated regions do not affect proteins directly, but instead lie in non-coding regions with putative gene-regulatory roles. This has increased the urgency of understanding the non-coding genome, as a key component of understanding human disease. To address this challenge, we generated maps of genomic control elements across 127 primary human tissues and cell types, and tissue-specific regulatory networks linking these elements to their target genes and their regulators. We have used these maps and circuits to understand how human genetic variation contributes to disease and cancer, providing an unbiased view of disease genetics and sometimes re-shaping our understanding of common disorders. For example, we find evidence that genetic variants contributing to Alzheimer’s disease act primarily through immune processes, rather than neuronal processes. We also find that the strongest genetic association with obesity acts via a master switch controlling energy storage vs. energy dissipation in our adipocytes, rather than through the control of appetite in the brain. We also combine genetic information with regulatory annotations and epigenomic variation across patients and healthy controls to discover new disease genes and regions with roles in Alzheimer’s disease, heart disease, and prostate cancer, and to understand cellular diversity through single-cell RNA-seq and pleiotropic effects by integration with rich intermediate phenotypes and electronic health records. Lastly, we develop technologies for systematically manipulating these circuits by high-throughput reporter assays, genome editing, and gene targeting in human cells and in mice, demonstrating tissue-autonomous therapeutic avenues in Alzheimer’s disease, obesity, and cancer.
These results provide a roadmap for translating genetic findings into mechanistic insights and ultimately therapeutic treatments for complex disease and cancer.

2:40 PM-3:00 PM
Consensus architectures of regulatory DNA actuation across 420 human cell types and states
Room: Grand Ballroom B
  • Wouter Meuleman, Altius Institute for Biomedical Sciences, United States
  • Alexander Muratov, Altius Institute for Biomedical Sciences, United States
  • Eric Rynes, Altius Institute for Biomedical Sciences, United States
  • Sean Thomas, Altius Institute for Biomedical Sciences, United States
  • Eric Haugen, Altius Institute for Biomedical Sciences, United States
  • Richard Sandstrom, Altius Institute for Biomedical Sciences, United States
  • Rajinder Kaul, Altius Institute for Biomedical Sciences, United States
  • John Stamatoyannopoulos, Altius Institute for Biomedical Sciences, United States

Presentation Overview:

The human genome encodes vast numbers of non-coding elements whose combined actuation patterns reflect regulatory processes across cellular states and conditions. Comprehensively annotated high-resolution maps of regulatory regions and their inter-cell type dynamics have been lacking.

To address this issue, we applied a joint experimental and computational approach, integrating deeply sequenced DNase I hypersensitivity assays spanning 420 distinct human cell types and states. These data enable a systematic and principled approach to studying regulatory architecture and dynamics on a global scale. We define a common coordinate system for regulatory DNA marked by DNase I hypersensitive sites, encompassing over 3 million elements defined with unprecedented resolution and confidence.

Through systematic analysis of the dynamics of these regulatory regions across samples, we derive a collection of regulatory flavors, providing a novel multi-component annotation of the human regulome. Using admixtures of multiple components, we decompose biological features of cell and tissue samples and define the extent to which individual regulatory elements contribute to broader cellular regulatory programs.

These previously unappreciated features readily reveal the functional properties of genes and pathways based on their regulatory landscape. Moreover, they provide a fundamentally new framework for understanding how disease-associated variation maps to genome function.

3:00 PM-3:20 PM
Continuous-trait probabilistic model for comparing multi-species functional genomic data
Room: Grand Ballroom B
  • Yang Yang, Carnegie Mellon University, United States
  • Quanquan Gu, University of Virginia, United States
  • Yang Zhang, Carnegie Mellon University, United States
  • Takayo Sasaki, Florida State University, United States
  • Julianna Crivello, University of Connecticut, United States
  • Rachel O'Neill, University of Connecticut, United States
  • David Gilbert, Florida State University, United States
  • Jian Ma, Carnegie Mellon University, United States

Presentation Overview:

A large amount of multi-species functional genomic data from high-throughput assays are becoming available to help understand the molecular mechanisms for phenotypic diversity across species. However, continuous-trait probabilistic models, which are key to such comparative analysis, remain under-explored. Here we develop a new model, called phylogenetic hidden Markov Gaussian processes (Phylo-HMGP), to simultaneously infer heterogeneous evolutionary states of functional genomic features in a genome-wide manner. Both simulation studies and real data application demonstrate the effectiveness of Phylo-HMGP. Importantly, we applied Phylo-HMGP to analyze a new cross-species DNA replication timing (RT) dataset from the same cell type in five primate species (human, chimpanzee, orangutan, gibbon, and green monkey). We demonstrate that our Phylo-HMGP model enables discovery of genomic regions with distinct evolutionary patterns of RT. Our method provides a generic framework for comparative analysis of multi-species continuous functional genomic signals to help reveal regions with conserved or lineage-specific regulatory roles.

3:20 PM-3:40 PM
Dynamic Regulatory Module Networks for integrative inference of cell type-specific regulatory programs
Room: Grand Ballroom B
  • Deborah Chasman, University of Wisconsin-Madison, United States
  • Rupa Sridharan, University of Wisconsin-Madison, United States
  • Sushmita Roy, University of Wisconsin-Madison, United States

Presentation Overview:

Changes in transcriptional regulatory networks can significantly alter cell fate. To gain insight into transcriptional dynamics, several studies have profiled transcriptomes and epigenomes at different stages of a developmental process. However, we lack systematic methods to integrate these data across multiple cell types to infer cell type regulatory networks.

We present a novel approach, Dynamic Regulatory Module Networks (DRMNs), to model regulatory network dynamics on a cell lineage. DRMNs represent a cell type specific network by a set of expression modules and associated regulatory programs, and probabilistically model the transitions between cell types. DRMNs learn a cell type’s regulatory networks from input expression and epigenomic profiles using multi-task learning to exploit cell type relatedness.

We applied DRMNs to two datasets measuring transcriptomic and epigenomic profiles for multiple stages of cellular reprogramming in mouse. Compared to a baseline method that does not model cell type relationships, DRMNs more faithfully capture the similarity of regulatory networks inferred for each cell type, while maintaining the same or better quality of predicted expression. DRMN modules are enriched for known pluripotency regulators and identify key genes that transition in expression during reprogramming, suggesting that DRMNs can successfully infer cell type specific regulatory networks.

3:40 PM-4:00 PM
Assessing the Gene Regulatory Landscape in 1,188 Human Tumors
Room: Grand Ballroom B
  • Kjong-Van Lehmann, ETH Zurich, Switzerland
  • Claudia Calabrese, EMBL-EBI, United Kingdom
  • Lara Urban, EMBL-EBI, United Kingdom
  • Fenglin Liu, Peking University, China
  • Serap Erkek, EMBL, Germany
  • Nuno Fonseca, EMBL-EBI, United Kingdom
  • Andre Kahles, ETH Zurich, Switzerland
  • Helena Kilpinen, University College London, United Kingdom
  • Julia Markowski, Max Delbrück Center for Molecular Medicine, Germany
  • Sebastian Waszak, EMBL, Germany
  • Jan Korbel, EMBL, Germany
  • Zemin Zhang, Peking University, China
  • Alvis Brazma, EMBL-EBI, United Kingdom
  • Gunnar Rätsch, ETH Zurich, Switzerland
  • Roland Schwarz, Max Delbrück Center for Molecular Medicine, Germany
  • Oliver Stegle, EMBL-EBI, United Kingdom

Presentation Overview:

To better understand the regulatory effects of germline and somatic variants on gene expression, we have analyzed matched whole-genome sequencing and RNA-seq data of 1,188 cancers from the ICGC PanCancer Analysis of Whole Genomes (PCAWG) across 27 cancer types. We have created a germline as well as a somatic eQTL map. To analyze the effect of somatic variants on gene expression, we applied multiple models to account for cancer-specific factors (e.g., purity and low recurrence within the population). We identified 2,509 germline eQTL as well as 649 somatic eQTL. The latter were enriched for poised promoters and enhancer regions. To assess the regulatory role of somatic copy number alterations, germline variants, and somatic variants, we analyzed their effect in the context of changes in gene expression and allele-specific expression (ASE). We found that most of the observed gene expression variability is explained by somatic copy number alterations. Finally, we have also analyzed global mutational signatures in the context of gene expression and germline variants. This work represents the first large-scale assessment of the effects of both germline and somatic genetic variation on gene expression in cancer and creates a valuable resource cataloguing these effects.

4:20 PM-4:40 PM
Proceedings Presentation: Quantifying the similarity of topological domains across normal and cancer human cell types
Room: Grand Ballroom B
  • Natalie Sauerwald, Carnegie Mellon University, United States
  • Carl Kingsford, Carnegie Mellon University, United States

Presentation Overview:

Motivation: Three-dimensional chromosome structure has been increasingly shown to influence various levels of cellular and genomic functions. Through Hi-C data, which maps contact frequency on chromosomes, it has been found that structural elements termed topologically associating domains (TADs) are involved in many regulatory mechanisms. However, we have little understanding of the level of similarity or variability of chromosome structure across cell types and disease states. In this work we present a method to quantify resemblance and identify structurally similar regions between any two sets of TADs.
Results: We present an analysis of 23 human Hi-C samples representing various tissue types in normal and cancer cell lines. We quantify global and chromosome-level structural similarity, and compare the relative similarity between cancer and non-cancer cells. We find that cancer cells show higher structural variability around commonly mutated pan-cancer genes than normal cells at these same locations.
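
The abstract does not name the similarity measure, so the following sketch is only an illustration of how two sets of TADs on the same chromosome might be compared. As an assumption made for this example, each TAD set is treated as a partition of genomic bins and the two partitions are compared with the variation of information, a standard distance between clusterings (0 means identical). The helper names, the half-open bin intervals, and the treatment of bins outside any TAD as one shared group are all illustrative choices.

```python
from collections import Counter
from math import log

def bin_labels(tads, n_bins):
    """Label each bin with the index of the TAD covering it (-1 if none).

    `tads` is a list of (start, end) bin intervals, end exclusive.
    """
    labels = [-1] * n_bins
    for idx, (start, end) in enumerate(tads):
        for b in range(start, end):
            labels[b] = idx
    return labels

def entropy(labels):
    """Shannon entropy (in nats) of the label distribution."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def variation_of_information(labels_a, labels_b):
    """VI(A, B) = 2 * H(A, B) - H(A) - H(B); smaller means more similar."""
    joint = list(zip(labels_a, labels_b))
    return 2 * entropy(joint) - entropy(labels_a) - entropy(labels_b)

# Two hypothetical TAD sets over the same 10 bins.
n_bins = 10
tads_a = [(0, 4), (4, 10)]
tads_b = [(0, 5), (5, 10)]

labels_a = bin_labels(tads_a, n_bins)
labels_b = bin_labels(tads_b, n_bins)

vi_same = variation_of_information(labels_a, labels_a)
vi_diff = variation_of_information(labels_a, labels_b)
print("VI between identical TAD sets:", vi_same)
print("VI between shifted TAD sets:", vi_diff)
```

Restricting the bins to a window of the chromosome gives a local version of the same score, which is one way to look for structurally similar regions rather than a single genome-wide number.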

4:40 PM-5:00 PM
Coffee Break (on the go) to Closing Keynote