Time | Title | Authors |
10:15 AM - 10:20 AM | 3D Genome: Session Overview and Introductions | |
10:20 AM - 11:00 AM | Impact of structural variants on 3D genome structure in cancer cells | Feng Yue, Penn State, PA, United States |
11:00 AM - 11:10 AM | Identification of locus-specific changes in chromosome conformation between cell types reveals enrichment of enhancers | Lila Rieber, The Pennsylvania State University, United States |
11:10 AM - 11:20 AM | Identification of differential TADs across conditions and cell lines | Tao Yang, The Pennsylvania State University, United States |
11:20 AM - 11:30 AM | Unlocking the TAD: Chromatin folding with CHROMATIX | Alan Perez-Rathke, University of Illinois at Chicago, United States |
11:30 AM - 11:40 AM | SPADIS: An Algorithm for Selecting Predictive and Diverse SNPs in GWAS | Serhan Yılmaz, Bilkent University, Turkey |
11:40 AM - 11:50 AM | A graph-regularized non-negative matrix factorization method to discover organizational units of chromosomes | Da-Inn Lee, University of Wisconsin-Madison, United States |
11:50 AM - 12:00 PM | TreeHiC: Hierarchical testing for differential chromatin interaction analysis | Ye Zheng, University of Wisconsin-Madison, United States |
12:00 PM - 12:40 PM | Continuous-trait probabilistic model for comparing nuclear genome organization of multiple species | |
2:00 PM - 2:40 PM | High-resolution analysis of chromatin conformation capture data | Mathieu Blanchette, McGill University, Canada |
2:40 PM - 2:45 PM | FIND: difFerential chromatin INteractions Detection using a spatial Poisson process | Mohamed Nadhir Djekidel, Tsinghua University, China |
2:45 PM - 2:50 PM | Chromatin interaction networks revealed unique connectivity patterns of broad H3K4me3 domains and super enhancers in 3D chromatin | Asa Thibodeau, Jackson Laboratory, United States |
2:50 PM - 2:55 PM | HiCAGE: an R package for large-scale annotation and visualization of 3C-based genomic data | Michael Workman, Cedars-Sinai Medical Center, United States |
2:55 PM - 3:00 PM | CompartmentExplorer: an accurate method for genomic compartments prediction from 3D genome data | Haitham Ashoor, The Jackson Laboratory for Genomic Medicine, United States |
3:00 PM - 3:10 PM | FitHiChIP: Statistical analysis of high-resolution HiChIP and PLAC-seq data | Sourya Bhattacharyya, La Jolla Institute for Allergy and Immunology, United States |
3:10 PM - 3:20 PM | Conserved CTCF binding sites act as allosteric hotspots: A computational knock-out study using nC-SAC model | Gamze Gursoy, Yale University, United States |
3:20 PM - 4:00 PM | Modeling and predicting the 3D genome | William Noble, University of Washington, United States |
4:40 PM - 5:20 PM | Connections between the structure and function of 3D genome folding | Geoff Fudenberg, UC San Francisco, CA, United States |
5:20 PM - 5:30 PM | Nuclear topology modulates the mutational landscapes of cancer genomes | Subhajyoti De, Rutgers University, United States |
5:30 PM - 5:40 PM | Computational Detecting Cancer-associated Disorders of Chromatin Interactions from ChIA-PET data | Yong Chen, The University of Texas at Dallas, United States |
5:40 PM - 5:45 PM | In silico modelling of longevity in Drosophila – a network approach | Bethany Hall, Nottingham Trent University, United Kingdom |
5:45 PM - 5:50 PM | Chromosomal dynamics predicted by an elastic network model explains genome-wide accessibility and long-range couplings | She Zhang, University of Pittsburgh, United States |
5:50 PM - 5:55 PM | mHi-C: robust leveraging of multi-mapping reads in Hi-C analysis | Ye Zheng, University of Wisconsin-Madison, United States |
5:55 PM - 6:00 PM | In silico prediction of high resolution chromosomal contact counts in multiple cell lines | Brittany Baur, University of Wisconsin-Madison, United States |
Cell-type-specific variation in chromosome conformation may be related to biological function, but the specific loci that drive this variation have not been identified. Metrics for comparing Hi-C experiments either calculate global measures of similarity between experiments or identify differential pairwise contact frequencies; there is currently no method for identifying locus-specific changes in localization. We developed a method, MultiMDS, to simultaneously infer and align 3D structures from two Hi-C datasets. The output of the method is two aligned 3D structures, which are used to calculate locus-specific changes between the datasets. By applying MultiMDS to Hi-C data from GM12878 and K562, we quantified the degree to which cell-type-specific structural changes occurred along the A/B compartment axis, representing the nuclear interior/nuclear periphery axis. On average, 42% of changes occurred along this axis (compared to 33% expected by chance), demonstrating the importance of lamina-associated repression in cell type identity. Focusing on changes within the A compartment, we found that loci that undergo intra-compartment changes were enriched for enhancers in one or both cell types. For loci with a cell-type-specific enhancer, we found that, in the cell type that lacks the enhancer, these loci gain interactions with loci enriched for Polycomb repression.
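As a rough illustration of how locus-specific changes can be read off two aligned structures, the sketch below computes a per-locus displacement and the share of total squared displacement that falls along a chosen axis. It is not MultiMDS itself: the coordinates, the axis direction, and the function names are hypothetical, and the abstract does not state exactly how the 42% figure was calculated. For displacements with no preferred direction, the expected share along any one of three orthogonal axes is 1/3, which is consistent with the 33% baseline quoted above.

```python
import math

def locus_changes(struct_a, struct_b):
    """Per-locus Euclidean displacement between two already-aligned 3D structures."""
    return [math.dist(p, q) for p, q in zip(struct_a, struct_b)]

def fraction_along_axis(struct_a, struct_b, axis):
    """Share of total squared displacement that lies along the direction of `axis`."""
    norm = math.sqrt(sum(c * c for c in axis))
    unit = [c / norm for c in axis]
    along = 0.0
    total = 0.0
    for p, q in zip(struct_a, struct_b):
        delta = [qc - pc for pc, qc in zip(p, q)]
        projection = sum(d * u for d, u in zip(delta, unit))
        along += projection * projection
        total += sum(d * d for d in delta)
    return along / total if total else 0.0

# Toy coordinates (hypothetical): three loci in two cell types, already aligned.
cell_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
cell_2 = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 0.0, 1.0)]
compartment_axis = (0.0, 0.0, 1.0)  # assumed direction of the A/B compartment axis

changes = locus_changes(cell_1, cell_2)
share = fraction_along_axis(cell_1, cell_2, compartment_axis)
```

Here the second locus moves the most, and a little under two thirds of the total squared displacement lies along the assumed compartment axis.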
Topologically associating domains (TADs) are essential in constraining the activity of transcriptional regulatory elements. Previous studies have observed that changes in TAD structures are associated with altered transcriptional outcomes, suggesting that architectural changes may play an important role in regulating gene expression. Identifying differential TAD structures across conditions will provide insights into condition-specific regulatory mechanisms and help identify potential pharmacologic targets. However, little work has so far been done on detecting differential TAD structures.
Here we present a novel statistical method that can accurately and quickly uncover differential TAD structures from Hi-C data. Through a systematic evaluation, we show that our method outperforms the existing methods. To validate the identifications, we applied our method to a Hi-C dataset obtained from a knockout experiment that depleted a critical transcription regulator that co-localizes with CTCF, and identified the changes in TAD structures between the wild type and the knockout. Our results show that the identified differential TAD structures correspond well with the depleted sites, confirming the biological relevance of our identifications. We further used our method to study differential TADs and their relationship with gene expression in two cell lines of the hematopoietic lineage. This analysis reveals interesting biological insights.
Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants. Many methods have been developed to select a subset of variant loci that are associated with or predictive of the phenotype. Selecting SNPs that are close on SNP-SNP networks has proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. In this paper, we propose a novel method called SPADIS that favors the selection of remotely located SNPs that are associated with the phenotype. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor (1 − 1/e) approximation. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs is selected; it also identifies more candidate genes and runs faster. We investigate, for the first time, the use of Hi-C data to construct the SNP-SNP network in the context of the SNP selection problem, and it yields consistent improvements in regression performance.
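The greedy step mentioned above can be sketched in a few lines. The example below is a generic greedy maximizer for a monotone submodular function under a cardinality limit, the setting in which the (1 − 1/e) guarantee is known to hold. The objective used here, weighted coverage of network regions, is only a stand-in chosen because it rewards diverse picks; it is not the SPADIS objective, and the SNP names, regions, and weights are hypothetical.

```python
def coverage_value(selected, covers, weights):
    """Weighted coverage: total weight of the network regions touched by the selection."""
    covered = set()
    for snp in selected:
        covered |= covers[snp]
    return sum(weights[region] for region in covered)

def greedy_select(covers, weights, k):
    """Pick up to k SNPs, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = sorted(covers)  # sorted so that ties break deterministically
    current = 0.0
    for _ in range(k):
        best_snp, best_gain = None, 0.0
        for snp in remaining:
            gain = coverage_value(selected + [snp], covers, weights) - current
            if gain > best_gain:
                best_snp, best_gain = snp, gain
        if best_snp is None:  # no remaining SNP improves the objective
            break
        selected.append(best_snp)
        remaining.remove(best_snp)
        current += best_gain
    return selected, current

# Hypothetical toy input: which regions each SNP touches, and how much each region is worth.
covers = {
    "snp_a": {"region_1", "region_2"},
    "snp_b": {"region_2"},
    "snp_c": {"region_3"},
}
weights = {"region_1": 2.0, "region_2": 1.0, "region_3": 1.5}

chosen, value = greedy_select(covers, weights, k=2)
```

After picking snp_a, the redundant snp_b adds nothing, so the second pick is snp_c from a different part of the network. This diminishing-returns behaviour is what makes a submodular objective a natural fit for selecting diverse features.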
Hi-C sequencing technology provides key insights into the 3D structures of the human genome. Although peak detection from Hi-C experiments is a well-studied problem, quantitative comparison of Hi-C data across different cellular conditions (often referred to as the "differential (interaction) analysis" problem) largely depends on methods borrowed from RNA-seq data analysis. Such comparisons have critical shortcomings when testing a large collection of hypotheses in large-scale Hi-C studies. As a result, these existing strategies for detecting differential interactions fail to control the rate of false discovery for reported findings in many simulations and experimental Hi-C studies, hindering their comparative analysis.
Here, we present TreeHiC, the first hierarchical testing procedure for quantitative comparison applied to Hi-C. We demonstrate that this framework can detect differential interactions while controlling the FDR in complex large-scale Hi-C studies under a wide range of settings. It is also considerably more powerful than existing methods, especially in sparse testing problems where the number of hypotheses can reach the millions and the signal-to-noise ratio is weak. Additionally, while the current version of TreeHiC implements methodology for Hi-C differential analysis, it is easily extendable to similar data such as ChIA-PET and HiChIP. TreeHiC is open source and can be downloaded from https://github.com/duydnguyen/TreeHiC.
Broad domain promoters and super enhancers are regulatory elements that govern cell-specific functions and harbor disease-associated sequence variants. These elements are characterized by distinct epigenomic profiles, such as expanded deposition of histone marks H3K27ac for super enhancers and H3K4me3 for broad domains; however, little is known about how they interact with each other and the rest of the genome in three-dimensional chromatin space. Using network theory methods, we studied chromatin interactions between broad domains and super enhancers in three ENCODE cell lines (K562, MCF7, GM12878) obtained via ChIA-PET, Hi-C, and HiChIP assays. In these networks, broad domains and super enhancers interact more frequently with each other than their typical counterparts do. Network measures and graphlets revealed distinct connectivity patterns associated with these regulatory elements that are robust across cell types and alternative assays. Machine learning models showed that these connectivity patterns could effectively discriminate broad domains from typical promoters and super enhancers from typical enhancers. Finally, targets of broad domains in these networks were enriched in disease-causing SNPs of cognate cell types. Taken together, these results suggest a robust and unique organization of the chromatin around broad domains and super enhancers: loci critical for pathologies and cell-specific functions.
Hi-C data were originally used to predict A/B compartments and sub-compartments. Genomic regions falling into the same compartment (or sub-compartment) are more likely to interact with each other than regions falling into different compartments. It has been reported that compartments (or sub-compartments) show distinct genomic and epigenomic features.
We present CompartmentExplorer, the first method that predicts genomic compartments from ChIA-PET data and sub-compartments from Hi-C and ChIA-PET data. In addition to a traditional A/B compartment prediction method, CompartmentExplorer implements an efficient method based on graph embedding followed by k-means clustering to predict sub-compartments from 3D genome data.
Using CompartmentExplorer, we show that combined ChIA-PET data from CTCF and RNAPII factors can be utilized to predict A/B compartments with accuracy comparable to Hi-C data (91% agreement). Further, we compared CompartmentExplorer sub-compartment predictions with three other methods based on hidden Markov models (HMMs), k-means clustering, and spectral clustering. Using different validation techniques, including network centrality measures and functional assessment by epigenome and transcriptome data, we show that CompartmentExplorer outperforms the other methods. For instance, CompartmentExplorer shows significantly (p-value < 2.2e-16) lower closeness centrality and higher betweenness centrality within each sub-compartment than other methods. CompartmentExplorer also assigns the most distinct epigenetic and transcriptomic profiles to different sub-compartments.
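To make the clustering half of "graph embedding followed by k-means" concrete, here is a minimal sketch of Lloyd's k-means algorithm applied to embedding vectors that are assumed to be already computed. The abstract does not describe how CompartmentExplorer builds its embedding or initializes the clusters, so the two-dimensional toy vectors, the starting centroids, and the function name below are all hypothetical.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it was
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return labels, centroids

# Hypothetical 2D embeddings of six genomic bins forming two obvious groups.
embedding = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = kmeans(embedding, centroids=[embedding[0], embedding[3]])
```

Each label would then be read as a sub-compartment assignment for the corresponding bin. In practice the number of clusters and the initialization strategy matter a great deal, and both are left open here.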
HiChIP and PLAC-seq techniques combine chromatin immunoprecipitation (ChIP) with genome-wide chromosome conformation capture (Hi-C) methods to identify chromatin contacts associated with a protein or histone modification of interest. Applying Hi-C-specific computational methods to HiChIP data to identify significant interactions yields low recall and/or low precision. Our proposed statistical method and overall pipeline, FitHiChIP, detects significant interactions by characterizing distance-dependent decay and assay-specific biases. FitHiChIP identifies 1D peaks from HiChIP data, supports external peaks, and normalizes peaks and non-peaks separately for their coverage. Results on published data from three cell lines and three primary immune cell types (Fang 2016, Mumbach 2016, 2017) show that FitHiChIP recovers a high percentage of reported interactions from ChIA-PET and Hi-C data, while reducing the number of false negatives originally reported for these datasets (e.g. enhancer contacts from the IL2RA promoter in T cells). For H3K27ac-based HiChIP experiments from naïve CD4 T cells, FitHiChIP identifies nearly 90k significant interactions from each biological replicate, the majority of which are links between and within enhancers and promoters, and 77% (66%) of which are reproducible between replicates of the same (different) donor. FitHiChIP supports multiple input formats and multiprocessing, and processes very high resolution (e.g., 1 kb) HiChIP data within hours.
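The idea of characterizing distance-dependent decay can be illustrated with the simplest possible baseline: average the contact counts at each genomic distance, then compare every observed count with the average for its distance. This is only a sketch of that general idea, not FitHiChIP's statistical model, which also accounts for assay-specific biases. The bin indices, counts, and function names are hypothetical.

```python
from collections import defaultdict

def expected_by_distance(contacts):
    """Mean contact count at each genomic distance, measured in bins."""
    totals = defaultdict(float)
    pairs = defaultdict(int)
    for (i, j), count in contacts.items():
        distance = abs(j - i)
        totals[distance] += count
        pairs[distance] += 1
    return {d: totals[d] / pairs[d] for d in totals}

def observed_over_expected(contacts):
    """Ratio of each observed count to the mean count at the same distance."""
    expected = expected_by_distance(contacts)
    ratios = {}
    for (i, j), count in contacts.items():
        baseline = expected[abs(j - i)]
        if baseline > 0:  # skip distances where nothing was observed
            ratios[(i, j)] = count / baseline
    return ratios

# Hypothetical contact counts between four consecutive bins.
contacts = {
    (0, 1): 10, (1, 2): 20, (2, 3): 30,  # distance 1
    (0, 2): 4, (1, 3): 12,               # distance 2
    (0, 3): 2,                           # distance 3
}

expected = expected_by_distance(contacts)
ratios = observed_over_expected(contacts)
```

Pairs with a ratio well above 1, such as (2, 3) and (1, 3) in this toy input, interact more than their separation alone would suggest. A real analysis would replace the raw ratio with a significance test and multiple-testing correction.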
The 3D conformation of DNA in the nucleus impacts gene expression, DNA replication, and multiple human diseases. The field of genome-wide chromosome conformation studies has advanced dramatically over the last decade, driven by advances in sequencing technology and development of novel assays such as Hi-C. In this talk, I will discuss some of the computational challenges posed by this type of large-scale data, including questions of statistical significance of observed contacts as well as inference of 3D structures. I will also describe how single-cell variants of the Hi-C assay can help us to understand cell-to-cell variability in 3D structure along developmental and cell cycle time scales.
Nuclear organization of genomic DNA affects processes of DNA damage and repair, yet its effects on mutational landscapes in cancer genomes remain unclear. Here we analyzed genome-wide somatic mutations from 366 samples of six cancer types. We found that lamina-associated regions, which are typically localized at the nuclear periphery, displayed higher somatic mutation frequencies than did the interlamina regions at the nuclear core. This effect was observed even after adjustment for features such as GC percentage, chromatin, and replication timing. Furthermore, mutational signatures differed between the nuclear core and periphery, thus indicating differences in the patterns of DNA-damage or DNA-repair processes. For instance, smoking and UV-related signatures, as well as substitutions at certain motifs, were more enriched in the nuclear periphery. Thus, the nuclear architecture may influence mutational landscapes in cancer genomes beyond the previously described effects of chromatin structure and replication timing.
To determine the genetic factors causing variation in longevity, several genome-wide association studies (GWAS) have been carried out on panels of long-lived individuals. Most studies tend to have little impact due to small sample sizes. For this reason, model organisms such as Drosophila melanogaster have become increasingly important in identifying genetic factors affecting longevity.
In this study, a network approach was used to predict novel genes, genomic regions and single nucleotide polymorphisms (SNPs) that play a role in longevity by integrating three-dimensional (3D) chromosome interaction data and two GWAS datasets (Burke et al. 2013; Ivanov et al. 2015). We hypothesise that the 3D architecture of the Drosophila genome dictates the co-location of specific genes and genomic regions. Genes and/or SNPs residing within these co-located genomic regions may influence longevity either independently or cumulatively. To identify influential nodes, the properties of the networks were calculated (clustering, modularity and PageRank). These nodes were further analysed using Gene Ontology.
References
Burke, M.K. et al. 2013. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biology and Evolution, 6: 1-11.
Ivanov, D.K. et al. 2015. Longevity GWAS using the Drosophila genetic reference panel. Journals of Gerontology Series A: Biomedical Sciences and Medical Sciences, 70: 1470-1478.
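Of the network properties named in the Drosophila longevity abstract, PageRank is the easiest to show compactly. The sketch below runs the standard power iteration on a small undirected interaction network. The abstract does not say which software or parameters were used, so the damping factor, the node names, and the toy edges are hypothetical.

```python
def pagerank(adjacency, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on an undirected graph given as node -> set of neighbours."""
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if neighbours:
                # Split this node's rank evenly among its neighbours.
                share = damping * rank[v] / len(neighbours)
                for u in neighbours:
                    new_rank[u] += share
            else:
                # A node with no edges spreads its rank uniformly over all nodes.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank

# Hypothetical network: one region in 3D contact with three others.
network = {
    "region_hub": {"region_a", "region_b", "region_c"},
    "region_a": {"region_hub"},
    "region_b": {"region_hub"},
    "region_c": {"region_hub"},
}

scores = pagerank(network)
most_influential = max(scores, key=scores.get)
```

In this toy network the hub receives the highest score, which is the sense in which such measures flag influential nodes for follow-up, for example with Gene Ontology analysis.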
Understanding the three-dimensional (3D) architecture of chromatin and its relation to gene expression and regulation is fundamental to understanding how the genome functions. Advances in Hi-C technology now permit us to study 3D genome organization, but we still lack an understanding of the structural dynamics of chromosomes. The dynamic couplings between regions separated by large genomic distances (>50 Mb) have yet to be characterized. We adapted a well-established protein-modeling framework, the Gaussian Network Model (GNM), to model chromatin dynamics using Hi-C data. We show that the GNM can identify spatial couplings at multiple scales: it can quantify the correlated fluctuations in the positions of gene loci, find large genomic compartments and smaller topologically associating domains (TADs) that undergo en bloc movements, and identify dynamically coupled distal regions along the chromosomes. We show that the predictions of the GNM correlate well with genome-wide experimental measurements. We use the GNM to identify novel cross-correlated distal domains (CCDDs) representing pairs of regions distinguished by their long-range dynamic coupling and show that CCDDs are associated with increased gene co-expression. Together, these results show that the GNM provides a mathematically well-founded unified framework for modeling chromatin dynamics and assessing the structural basis of genome-wide observations.
Regulatory sequence elements such as enhancers can regulate the expression level of a gene hundreds of kilobases away through chromosomal looping, which brings regulatory elements into three-dimensional proximity with target genes. Although a large number of Chromosome Conformation Capture (3C) technologies have emerged, they have been limited to well-characterized model cell lines due to sequencing costs and the number of cells required for making reliable measurements at high resolution. We have developed a regression-based method, HiC-Reg, to generate in silico contact counts using one-dimensional regulatory signals. Our predicted counts recapitulate significant interactions identified in true count data, are enriched for CTCF bi-directional motifs, and can be used to identify topologically associating domains. Additionally, we leveraged chromatin mark data from the Roadmap Epigenomics Mapping Consortium to generate predictions in 22 human cell lines. To verify our predictions in a completely independent cell line, we compared our predicted counts with true counts in embryonic stem cells, for which low-resolution Hi-C data are available, and found good agreement between the two. Taken together, our regression-based framework can generate high-resolution profiles of contact counts that capture individual pair-level as well as higher-order organizational units of chromosome conformation.