RegSys COSI

Schedule subject to change
All times listed are in CDT
Wednesday, July 13th
10:30-11:10
Keynote Presentation: Integrating prior knowledge and single-cell multi-omics to understand cellular regulation
Room: Madison A
Format: Live-stream

Moderator(s): Alejandra Medina-Rivera

  • Julio Saez-Rodriguez, Heidelberg University, Germany
11:10-11:30
Proceedings Presentation: Do-calculus enables estimation of causal effects in partially observed biomolecular pathways
Room: Madison A
Format: Live from venue

Moderator(s): Alejandra Medina-Rivera

  • Sara Mohammad Taheri, Northeastern University, United States
  • Jeremy Zucker, Pacific Northwest National Laboratory, United States
  • Charles Tapley Hoyt, Laboratory of Systems Pharmacology, Harvard Medical School, United States
  • Karen Sachs, Next Generation Analytics, United States
  • Vartika Tewari, Northeastern University, United States
  • Robert Ness, Microsoft Research, United States
  • Olga Vitek, Northeastern University, United States


Presentation Overview:

Estimating causal queries, such as changes in protein abundance in response to a perturbation, is a fundamental task in the analysis of biomolecular pathways.
The estimation requires experimental measurements on the pathway components. However, in practice, many pathway components are left unobserved (latent) because they are either unknown or difficult to measure. Latent variable models (LVMs) are well-suited for such estimation. Unfortunately, LVM-based estimation of causal queries can be inaccurate when parameters of the latent variables are not uniquely identified, or when the number of latent variables is misspecified. This has limited the use of LVMs for causal inference in biomolecular pathways. In this manuscript, we propose a general and practical approach for LVM-based estimation of causal queries.
We prove that, despite the challenges above, LVM-based estimators of causal queries are accurate if the queries are identifiable according to Pearl's do-calculus, and we describe an algorithm for their estimation. We illustrate the breadth and practical utility of this approach on four synthetic and two experimental case studies in which the structure of the biomolecular pathway challenges existing methods for causal query estimation.
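Identifiability under do-calculus means that an interventional quantity can be rewritten purely in terms of observational distributions. As a minimal illustration (not the authors' LVM-based estimator), the sketch below assumes a hypothetical, fully observed three-node pathway with Z → X, X → Y and Z → Y, uses made-up probabilities, and compares naive conditioning P(Y=1 | X=x) with the backdoor adjustment P(Y=1 | do(X=x)) = Σ_z P(Y=1 | x, z) P(z), one formula that do-calculus can derive for this graph.

```python
from itertools import product

# Hypothetical binary pathway: Z -> X, Z -> Y, X -> Y (Z confounds X and Y).
# All probabilities are illustrative assumptions, not data from the study.
P_Z1 = 0.3                                   # P(Z=1)
P_X1_given_Z = {0: 0.2, 1: 0.8}              # P(X=1 | Z=z)
P_Y1_given_XZ = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.4, (1, 1): 0.9}   # P(Y=1 | X=x, Z=z)

def joint(x, y, z):
    """Observational joint P(X=x, Y=y, Z=z) implied by the model."""
    pz = P_Z1 if z else 1 - P_Z1
    px = P_X1_given_Z[z] if x else 1 - P_X1_given_Z[z]
    py = P_Y1_given_XZ[(x, z)] if y else 1 - P_Y1_given_XZ[(x, z)]
    return pz * px * py

def naive(x):
    """P(Y=1 | X=x): plain conditioning, biased by the confounder Z."""
    num = sum(joint(x, 1, z) for z in (0, 1))
    den = sum(joint(x, y, z) for y, z in product((0, 1), repeat=2))
    return num / den

def backdoor(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z), using only the observational joint."""
    total = 0.0
    for z in (0, 1):
        p_z = sum(joint(xx, y, z) for xx, y in product((0, 1), repeat=2))
        p_xz = sum(joint(x, y, z) for y in (0, 1))
        total += joint(x, 1, z) / p_xz * p_z
    return total

def truth(x):
    """Ground truth under intervention: cut Z -> X and set X = x."""
    return sum((P_Z1 if z else 1 - P_Z1) * P_Y1_given_XZ[(x, z)] for z in (0, 1))

print(round(backdoor(1) - backdoor(0), 3))  # 0.33
```

Here the adjusted causal effect of X on Y is 0.33, matching the interventional ground truth, while naive conditioning suggests roughly 0.58 because Z raises both X and Y.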

11:30-11:50
Chromatin dynamics from genetic perturbations are associated with transcription and reveal a larger gene regulatory network
Room: Madison A
Format: Live from venue

Moderator(s): Alejandra Medina-Rivera

  • Kevin Moyung, Duke University, United States
  • Yulong Li, Duke University, United States
  • Alexander Hartemink, Duke University, United States
  • David MacAlpine, Duke University, United States


Presentation Overview:

Epigenetic mechanisms contribute to gene regulation by altering chromatin accessibility, resulting in changes in transcription factor (TF) and nucleosome occupancy throughout the genome. A major challenge is defining and dissecting this complex chromatin-mediated code to model regulation and predict gene expression. Existing methods such as ChIP-seq focus on single factors, and expression-based studies alone cannot distinguish direct from indirect regulation.

We address this with a factor-agnostic, reverse-genetics approach: using MNase-seq, we capture genome-wide TF and nucleosome occupancies in response to the individual deletion of 201 transcriptional regulators in yeast. Analyzing differences in TF and nucleosome organization recapitulated well-established pathways. We found major chromatin changes associated with differential expression, and distinguished direct from indirect regulation by incorporating motif and binding evidence for the TFs and their target genes. Analysis of differential chromatin revealed potentially novel interactions that were not captured in expression-based studies, such as locus-specific chromatin changes that may prime a locus for an efficient transcriptional response. Overall, this approach allows us to closely examine the interplay between TFs and nucleosomes genome-wide and to generate a larger, more complete regulatory network, providing a deeper understanding of the complex relationship between chromatin organization and gene regulation.

11:50-12:10
Dictys: dynamic gene regulatory network inference from single-cell multi-omics
Room: Madison A
Format: Live from venue

Moderator(s): Alejandra Medina-Rivera

  • Lingfei Wang, Massachusetts General Hospital, Harvard Medical School, United States
  • Nikolaos Trasanidis, Imperial College London, United Kingdom
  • Guanlan Dong, Harvard Medical School, United States
  • Luca Pinello, Massachusetts General Hospital, Harvard Medical School, United States


Presentation Overview:

The gene regulatory network (GRN) determines cell function and is rewired during development. GRN inference directly quantifies gene regulatory activity, for which expression level is a proxy. Single-cell transcriptome and chromatin accessibility measurements may help infer context-specific GRNs at a practical cost, but they face the dual challenges of single-cell data and network inference. We developed Dictys for context-specific and dynamic single-cell GRN inference, addressing direct causality, feedback loops, and data sparsity with transcription factor footprinting, stochastic kinetic models, and probabilistic programming. We proposed regulation marker gene discovery and differential regulation analyses for cell-type-specific GRNs to capture regulatory program shifts hidden in mean-expression-based analyses such as expression marker genes and differential expression. We reconstructed dynamic GRNs along developmental processes by kernel smoothing of static GRNs. With its time-resolved regulatory profiles, the dynamic GRN enables unbiased discovery of lineage-determining genes throughout haematopoiesis from a single experiment. Dictys recovered monotonic and transient modes of gene regulatory programs in waves at their corresponding stages, as well as complex regulatory shifts unobservable in mean expression. Dictys enables such in-depth analyses with its customizable, movie-based Integrative Network Viewer. Single-cell dynamic GRNs provide unique biological insights through time resolution and direct quantification of gene regulatory activity.
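The abstract reconstructs dynamic GRNs by kernel smoothing of static GRNs. A minimal sketch of that idea, assuming each static GRN is a dictionary of edge weights attached to a pseudotime point, is a Gaussian-kernel (Nadaraya-Watson) weighted average evaluated along the trajectory. The function name, bandwidth, edge representation and gene names are hypothetical, and Dictys itself may differ in where and how the kernel is applied.

```python
import math

def smooth_grn(static_grns, times, t, bandwidth=0.1):
    """Gaussian-kernel (Nadaraya-Watson) estimate of edge weights at pseudotime t.

    static_grns: list of dicts mapping (regulator, target) -> edge weight, one
                 per static GRN inferred around the matching entry of `times`.
    Edges missing from a static GRN are treated as weight 0.
    """
    kernel = [math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in times]
    norm = sum(kernel)
    edges = set().union(*static_grns)  # every edge seen in any static GRN
    return {
        edge: sum(k * grn.get(edge, 0.0) for k, grn in zip(kernel, static_grns)) / norm
        for edge in edges
    }

# Hypothetical example: one edge whose weight rises between two pseudotime windows.
grns = [{("GATA1", "KLF1"): 0.2}, {("GATA1", "KLF1"): 0.8}]
trajectory = [smooth_grn(grns, [0.0, 1.0], t, bandwidth=0.3) for t in (0.0, 0.5, 1.0)]
```

Evaluating the smoothed network on a grid of pseudotimes yields a time-resolved edge-weight profile; a larger bandwidth gives smoother but less time-specific profiles.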

12:10-12:30
GAZE: A single-cell gene regulatory inference framework from transcriptomics and epigenomics data
Room: Madison A
Format: Live from venue

Moderator(s): Alejandra Medina-Rivera

  • Fatemeh Behjati, Uniklinikum and Goethe University Frankfurt, Germany
  • Shamim Ashrafiyan, Uniklinikum and Goethe University Frankfurt, Germany
  • Dennis Hecker, Uniklinikum and Goethe University Frankfurt, Germany
  • Marcel Schulz, Uniklinikum and Goethe University Frankfurt, Germany


Presentation Overview:

Single-cell sequencing has become a prevalent approach to interrogate cell-type-specific signatures and cellular heterogeneity, helping researchers unravel the underlying complexities of diseases. This, however, creates a need to integrate single-cell omics data with specialized machine learning approaches capable of inferring key regulatory players at single-cell granularity. Although numerous methods have been proposed for discovering transcriptional regulation from scRNA-seq data, they fail to deliver a comprehensive view of the whole regulatory landscape.
Here, we address these limitations by incorporating diverse single-cell modalities. We have established a versatile statistical framework, called GAZE, for comprehensive, integrative analysis of single-cell data.
This allows us to broaden the current understanding of transcriptional regulatory mechanisms through identifying the key players involved in differential regulation of various cell types.
Interrogating these models (regression coefficients or SHAP values) reveals interesting and novel regulatory aspects. Additionally, we designed dedicated tests for the inferred regulatory activities to identify key genes or TFs driving cell regulation.
Finally, we have implemented an R Shiny application to easily visualize and retrieve important regulators at the single-cell or meta-cell level.

14:30-15:10
Keynote Presentation: Understanding the interaction between sleep and chromatin regulation in Autism
Room: Madison A
Format: Live from venue

Moderator(s): Saurabh Sinha

  • Lucia Peixoto
15:10-15:30
CREMA: Extracting Gene Regulation Mechanisms from Single Cell Multi-omics Assays
Room: Madison A
Format: Live from venue

Moderator(s): Saurabh Sinha

  • Zidong Zhang, Princeton University, United States
  • Frédérique Ruf-Zamojski, Icahn School of Medicine at Mount Sinai, United States
  • Daniel Bernard, McGill University, Canada
  • Stuart Sealfon, Icahn School of Medicine at Mount Sinai, United States
  • Olga Troyanskaya, Princeton University, Flatiron Institute, United States


Presentation Overview:

Inferring gene regulation mechanisms is critical to understanding metazoan cell differentiation and diversity. Recent breakthroughs in single-cell multi-omics technologies enable simultaneous measurement of the transcriptome and the epigenetic state of the genome in the same cell. Here we present CREMA (Control of Regulation Extracted from Multi-omics Assays), the first computational framework for extracting regulatory mechanisms, including transcription factors (TFs) and regulatory domains, from single-cell multi-omics datasets. CREMA has two advantages: 1) it incorporates chromatin accessibility to identify direct TF-target relations rather than indirect correlations; 2) it identifies regulatory domains in both proximal and distal regions. We showed CREMA’s superior performance in reconstructing regulatory networks and identifying distal regulatory domains compared with inference from only one modality of the multi-omics dataset. We validated CREMA’s predictions using mutation datasets and physical evidence from ChIP-seq and Hi-C profiles. Finally, we applied CREMA to identify the regulatory TFs in both the proximal and distal regulatory domains that explain the cell-type-specific expression of genes in mouse pituitary tissue. Overall, CREMA is a powerful framework for interpreting gene regulation mechanisms from single-cell multi-omics datasets.

16:00-16:40
Keynote Presentation: Integrated analysis of single cell data across technologies and modalities
Room: Madison A
Format: Live-stream

Moderator(s): Xiuwei Zhang

  • Rahul Satija


Presentation Overview:

Mapping single-cell sequencing profiles to comprehensive reference datasets represents a powerful alternative to unsupervised analysis. Reference datasets, however, are predominantly constructed from single-cell RNA-seq data, and cannot be used to annotate datasets that do not measure gene expression. Here we introduce ‘bridge integration’, a method to harmonize single-cell datasets across modalities by leveraging a multi-omic dataset as a molecular bridge. Each cell in the multi-omic dataset constitutes an element in a ‘dictionary’, which can be used to reconstruct unimodal datasets and transform them into a shared space. We demonstrate that our procedure can accurately harmonize transcriptomic data with independent single-cell measurements of chromatin accessibility, histone modifications, DNA methylation, and protein levels. Moreover, we demonstrate how dictionary learning can be combined with sketching techniques to substantially improve computational scalability, and harmonize 8.6 million human immune cell profiles from sequencing and mass cytometry experiments. Our approach aims to broaden the utility of single-cell reference datasets and facilitate comparisons across diverse molecular modalities.

16:40-17:00
Proceedings Presentation: MOJITOO: a fast and universal method for integration of multimodal single cell data
Room: Madison A
Format: Live from venue

Moderator(s): Xiuwei Zhang

  • Mingbo Cheng, RWTH Aachen, Germany
  • Zhijian Li, RWTH Aachen, Germany
  • Ivan G. Costa, RWTH Aachen, Germany


Presentation Overview:

Motivation: The advent of multi-modal single-cell sequencing techniques has shed new light on molecular mechanisms by simultaneously inspecting the transcriptome, epigenome and proteome of the same cell. However, to date, existing computational approaches for integrating multimodal single-cell data are either computationally expensive, require users to specify parameters, or can only be applied to particular modalities.

Results: Here we present a single-cell multi-modal integration method, named MOJITOO (Multi-mOdal Joint IntegraTion of cOmpOnents). MOJITOO uses canonical correlation analysis for fast, parameter-free detection of a shared representation of cells from multimodal single-cell data. Moreover, the estimated canonical components can be used for interpretation, i.e., to associate modality-specific molecular features with the latent space. We evaluate MOJITOO on bi- and tri-modal single-cell datasets and show that it outperforms existing methods in computational requirements, preservation of the original latent spaces, and clustering.

Availability: The software is available at https://github.com/CostaLab/MOJITOO
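MOJITOO builds its shared representation with canonical correlation analysis (CCA). A minimal NumPy sketch of plain two-view CCA, computed by orthonormalizing each centered embedding with a QR decomposition and taking an SVD of the cross-product, shows where canonical correlations and paired cell scores come from. The inputs are assumed to be per-modality reduced embeddings of the same cells (for example PCA for RNA and LSI for ATAC) with full column rank; the function name is hypothetical, and MOJITOO's own implementation may differ in preprocessing, component selection, and handling of more than two modalities.

```python
import numpy as np

def cca(X, Y, n_components=None):
    """Two-view canonical correlation analysis of the same cells.

    X: (cells x p) embedding of modality 1; Y: (cells x q) embedding of modality 2.
    Returns canonical correlations (descending) and the paired cell scores.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)             # orthonormal basis of each centered view
    Qy, _ = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)  # singular values = canonical correlations
    k = n_components or min(X.shape[1], Y.shape[1])
    scores_x = Qx @ U[:, :k]             # canonical variates of modality 1
    scores_y = Qy @ Vt.T[:, :k]          # canonical variates of modality 2
    return s[:k], scores_x, scores_y

# Toy check: Y is a noisy linear transform of X, so all correlations are near 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 5)) + 1e-4 * rng.normal(size=(200, 5))
corr, sx, sy = cca(X, Y)
```

Each pair of columns in `sx` and `sy` has correlation equal to the matching entry of `corr`. Combining the paired scores into one cell embedding, for instance by averaging them, is one simple, assumed choice here rather than MOJITOO's documented procedure.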

17:00-17:20
Liam tackles complex multimodal single-cell data integration challenges
Room: Madison A
Format: Live from venue

Moderator(s): Xiuwei Zhang

  • Pia Rautenstrauch, Humboldt-Universität zu Berlin & Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Germany
  • Uwe Ohler, Humboldt-Universität zu Berlin & Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Germany


Presentation Overview:

Paired multimodal single-cell sequencing data provides unprecedented insight into the molecular state of cells but poses complex challenges for data integration. Data with complementary information and distinct statistical properties need to be combined (vertical integration), and intricate study designs and meta-analyses require sophisticated non-linear batch effect removal (horizontal integration).

Here, we present liam (leveraging information across modalities), a deep generative model for vertical and horizontal integration of paired multimodal single-cell data. Liam learns a joint low-dimensional representation of two single-cell modalities while accounting for batch effects. It supports pairwise combinations of gene expression, chromatin accessibility, and cell surface protein measurements. Liam ranked 2nd and 4th on the CITE-seq and Multiome data, respectively (online training), in the NeurIPS 2021 Multimodal Single-Cell Data Integration competition. Additionally, we apply liam to a treatment-control dataset with replicates. By integrating the data with different choices of batch variable, we disentangle technical from biological variation and retain selected treatment effects in the data representation, illustrating liam's batch effect removal capabilities and its flexibility with respect to study design.

Our results demonstrate that liam successfully extends previous approaches for unimodal batch integration to the multimodal setting and will help unlock the potential of paired multimodal single-cell sequencing data.

17:20-17:40
The rise of sparser single-cell RNAseq datasets: consequences and opportunities
Room: Madison A
Format: Live from venue

Moderator(s): Xiuwei Zhang

  • Gerard Bouland, Delft University of Technology, Netherlands
  • Ahmed Mahfouz, Leiden University Medical Center, Netherlands
  • Marcel Reinders, Delft University of Technology, Netherlands


Presentation Overview:

Continuous developments in single-cell RNA-sequencing (scRNA-seq) technology result in datasets with ever more cells. Despite the value of larger datasets, for example in boosting statistical power, additional challenges arise as the datasets become sparser. The sparsity of scRNA-seq data has generally been seen as a problem, especially because standard count distribution models fail to explain the observed excess of zeros.

Using 52 datasets published between 2015 and 2021, we show that scRNA-seq datasets grow by 12,089 cells per year on average. In addition, we observed a strong negative correlation between the number of cells and the degree of sparsity (Pearson’s r = -0.72, p-value = 1.23 × 10⁻⁹). As the field moves towards sparser datasets, it is vital to discuss the consequences of the ever-increasing abundance of zero measurements.

Through experiments, we show that, regardless of the level of sparsity, the majority of the signal is captured in the binarized expression profile (a zero represents a zero count and a one a non-zero count). We also show that discarding counts and using a binarized representation of scRNA-seq data does not lower performance across diverse analysis tasks, including cell type identification and differential expression analysis. Finally, we discuss the benefits of binarization.
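The two quantities discussed above, sparsity as the fraction of zero entries and the binarized expression profile, are simple to compute. A minimal pure-Python sketch, assuming a genes x cells count matrix stored as a list of rows with hypothetical toy values:

```python
def sparsity(counts):
    """Fraction of zero entries in a count matrix given as a list of rows."""
    values = [v for row in counts for v in row]
    return sum(1 for v in values if v == 0) / len(values)

def binarize(counts):
    """Binarized expression profile: 1 for a non-zero count, 0 for a zero count."""
    return [[1 if v != 0 else 0 for v in row] for row in counts]

# Toy genes x cells count matrix (hypothetical values).
counts = [[0, 3, 0, 1],
          [5, 0, 0, 0],
          [0, 0, 2, 7]]
binary = binarize(counts)
print(binary)                      # [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
print(round(sparsity(counts), 3))  # 0.583
```

Binarization leaves sparsity unchanged, since it only discards the magnitude of non-zero counts; the zero pattern, which carries most of the signal in sparse data, is preserved exactly.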

17:40-18:00
Characterizing cellular heterogeneity in chromatin state with scCUT&Tag-pro and scChromHMM
Room: Madison A
Format: Live from venue

Moderator(s): Xiuwei Zhang

  • Avi Srivastava, New York Genome Center, United States


Presentation Overview:

Technologies that profile chromatin modifications at single-cell resolution offer enormous promise for functional genomic characterization, but the sparsity of the measurements and integrating multiple binding maps represent substantial challenges. Here we introduce single-cell (sc)CUT&Tag-pro, a multimodal assay for profiling protein–DNA interactions coupled with the abundance of surface proteins in single cells. In addition, we introduce single-cell ChromHMM, which integrates data from multiple experiments to infer and annotate chromatin states based on combinatorial histone modification patterns. We apply these tools to perform an integrated analysis across nine different molecular modalities in circulating human immune cells. We demonstrate how these two approaches can characterize dynamic changes in the function of individual genomic elements across both discrete cell states and continuous developmental trajectories, nominate associated motifs and regulators that establish chromatin states and identify extensive and cell-type-specific regulatory priming. Finally, we demonstrate how our integrated reference can serve as a scaffold to map and improve the interpretation of additional scCUT&Tag datasets.

Thursday, July 14th
10:15-10:55
Keynote Presentation: Deciphering gene regulatory networks of cell fate
Room: Madison A
Format: Live from venue

Moderator(s): Shaun Mahony

  • Sushmita Roy


Presentation Overview:

Cell fate specification is a dynamic process during which gene regulatory networks (GRNs) transition between different states and define cell type-specific patterns of gene expression. Identifying such cell type-specific gene regulatory networks is important for understanding how cells differentiate to diverse lineages from a pluripotent state, how differentiated cells can be reprogrammed to a pluripotent state, and how these networks get disrupted in diseases such as cancer and developmental disorders. Advances in sequencing technologies enable us to perform high-throughput molecular phenotyping in bulk populations and thousands of individual cells at different omic levels. However, the limited ability of existing tools to effectively integrate these datasets and analyze GRN dynamics during cell fate specification is a major hurdle. Furthermore, validation of GRN predictions remains a major bottleneck. In this talk I will present some of our ongoing efforts towards mapping and validating genome-scale regulatory networks as well as defining cell type-specific regulatory networks. Application of our approach to hematopoietic differentiation and mouse cellular reprogramming predicted key regulatory nodes and network components likely important for establishing different cell type-specific expression programs.

10:55-11:15
scMoMaT: mosaic integration of single cell multi-omics data using matrix trifactorization
Room: Madison A
Format: Live from venue

Moderator(s): Shaun Mahony

  • Ziqi Zhang, Georgia Institute of Technology, United States
  • Haoran Sun, Georgia Institute of Technology, United States
  • Xiuwei Zhang, Georgia Institute of Technology, United States


Presentation Overview:

Data integration tasks in single-cell genomics can be categorized into horizontal (the same modality is measured in different batches), vertical (multiple modalities are jointly measured in the same cells), diagonal (different modalities are measured in different sets of cells), and mosaic (multiple batches and multiple modalities in any arrangement) scenarios. To integrate as many datasets as possible for the same biological system, methods that can handle any combination of these scenarios are needed. We propose scMoMaT, a method that integrates an arbitrary number of single-cell multi-omics data matrices covering all four integration scenarios. scMoMaT tri-factorizes each data matrix into latent factors of two entities and an association matrix between the entities. We test scMoMaT on four datasets that cover various data integration scenarios, and the results show that scMoMaT can uncover cell-type-specific biomarkers across modalities while learning a unified cell representation. The results also show the robustness of scMoMaT when integrating datasets with disproportionate cell type composition between batches.
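The tri-factorization step can be illustrated on a single matrix: a cells x features matrix X is approximated as C A Fᵀ, with non-negative cell factors C, feature factors F, and an association matrix A between them. The NumPy sketch below uses Lee-Seung-style multiplicative updates as an assumed optimizer; scMoMaT jointly factorizes many matrices that share factors, and its objective, regularization and optimizer may differ. The function name, ranks and iteration count are hypothetical.

```python
import numpy as np

def nmtf(X, k_cells, k_features, n_iter=500, eps=1e-10, seed=0):
    """Non-negative tri-factorization X ~ C @ A @ F.T of one cells x features matrix.

    C: cells x k_cells cell factors; F: features x k_features feature factors;
    A: k_cells x k_features association matrix linking the two sets of factors.
    X must be non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    C = rng.random((n, k_cells))
    A = rng.random((k_cells, k_features))
    F = rng.random((m, k_features))
    for _ in range(n_iter):
        # Multiplicative update of one block with the other two held fixed;
        # ratios of non-negative terms keep every factor non-negative.
        C *= (X @ F @ A.T) / (C @ (A @ (F.T @ F) @ A.T) + eps)
        F *= (X.T @ C @ A) / (F @ (A.T @ (C.T @ C) @ A) + eps)
        A *= (C.T @ X @ F) / ((C.T @ C) @ A @ (F.T @ F) + eps)
    return C, A, F

# Toy non-negative data with a known tri-factor structure (hypothetical sizes).
rng = np.random.default_rng(1)
X = rng.random((30, 3)) @ rng.random((3, 2)) @ rng.random((20, 2)).T
C, A, F = nmtf(X, 3, 2)
rel_err = np.linalg.norm(X - C @ A @ F.T) / np.linalg.norm(X)
```

Because cell factors and feature factors have their own ranks, A can express how each group of cells relates to each group of features, which is what makes the factorization interpretable across modalities.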

11:15-11:35
HOCMO: A higher-order correlation model to deconvolute epigenetic microenvironment in breast cancer
Room: Madison A
Format: Live from venue

Moderator(s): Shaun Mahony

  • Min Shi, Washington University in St. Louis School of Medicine, United States
  • Rintsen Sherpa, Washington University in St. Louis School of Medicine, United States
  • Klindziuk Liubou, Washington University in St. Louis School of Medicine, United States
  • Stefanie Kriel, Washington University in St. Louis School of Medicine, United States
  • Shamim Mollah, Washington University in St. Louis School of Medicine, United States


Presentation Overview:

An in-depth understanding of epithelial breast cell response to growth-promoting ligands is required to elucidate how signals from the microenvironment affect cell-intrinsic regulatory networks and their resultant cellular phenotypes, such as cell growth, progression, and differentiation. Understanding the cellular response to these signals is particularly important in understanding the mechanisms of breast cancer initiation and progression. There is increasing evidence that aberrant epigenetic marks are present in cells of the breast tumor microenvironment and are known to affect primary cellular processes like proliferation, differentiation, and apoptosis. However, the mechanisms by which epigenetic microenvironment signals influence these cellular phenotypes are complex and currently not well established. To deconvolute the complexity of the epigenetic microenvironment signals in breast cancer, we developed a novel correlation model: HOCMO (Higher-Order Correlation Model), using proteomics data to reveal the regulatory dynamics among signaling proteins, histones, and growth-promoting ligands in the breast epithelial cells. In the proposed model, we used a non-negative tensor factorization model to generate a correlation score to reveal the 3-way correlations among ligands, histones, and proteins. Our method revealed the onset of specific protein-histone signatures in response to growth ligands contributing to distinct cellular phenotypes indicative of breast cancer initiation and progression.
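HOCMO's 3-way correlations come from a non-negative tensor factorization of ligand, histone and protein measurements. A minimal NumPy sketch of a rank-R non-negative CP decomposition with multiplicative updates shows the general technique; the abstract does not define HOCMO's correlation score, so the sketch stops at the factorization, and the factor names, rank and update scheme are assumptions rather than the authors' implementation.

```python
import numpy as np

def nn_cp(T, rank, n_iter=500, eps=1e-10, seed=0):
    """Non-negative CP decomposition T[i, j, k] ~ sum_r L[i, r] * H[j, r] * P[k, r].

    T: ligands x histones x proteins tensor of non-negative values.
    Each component r links a ligand profile, a histone profile and a protein profile.
    """
    rng = np.random.default_rng(seed)
    L, H, P = (rng.random((d, rank)) for d in T.shape)
    for _ in range(n_iter):
        # Multiplicative updates: tensor contracted with the other two factors,
        # divided by the factor times the Hadamard product of the others' Gram matrices.
        L *= np.einsum("ijk,jr,kr->ir", T, H, P) / (L @ ((H.T @ H) * (P.T @ P)) + eps)
        H *= np.einsum("ijk,ir,kr->jr", T, L, P) / (H @ ((L.T @ L) * (P.T @ P)) + eps)
        P *= np.einsum("ijk,ir,jr->kr", T, L, H) / (P @ ((L.T @ L) * (H.T @ H)) + eps)
    return L, H, P

def reconstruct(L, H, P):
    """Rebuild the 3-way tensor from its CP factors."""
    return np.einsum("ir,jr,kr->ijk", L, H, P)

# Toy tensor: 4 ligands x 5 histones x 6 proteins with a rank-2 structure (hypothetical).
rng = np.random.default_rng(2)
T = reconstruct(rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2)))
L, H, P = nn_cp(T, rank=2)
rel_err = np.linalg.norm(T - reconstruct(L, H, P)) / np.linalg.norm(T)
```

Large entries in the same column of L, H and P point to a ligand, histone and protein that co-vary within one component, which is the kind of 3-way association the model is meant to surface.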

11:35-11:55
Computational modeling of mRNA degradation dynamics using deep neural networks
Room: Madison A
Format: Live-stream

Moderator(s): Shaun Mahony

  • Ofir Yaish, Ben-Gurion University, Israel
  • Yaron Orenstein, Ben-Gurion University, Israel


Presentation Overview:

Motivation: Messenger RNA (mRNA) degradation plays critical roles in post-transcriptional gene regulation. A major component of mRNA degradation is determined by 3′-UTR elements. Hence, researchers are interested in studying mRNA dynamics as a function of 3′-UTR elements. A recent study measured the mRNA degradation dynamics of tens of thousands of 3′-UTR sequences using a massively parallel reporter assay. However, the computational approach used to model mRNA degradation was based on a simplifying assumption of a linear degradation rate. Consequently, the underlying mechanism of 3′-UTR elements is still not fully understood.

Results: Here, we developed deep neural networks to predict mRNA degradation dynamics and interpreted the networks to identify regulatory elements in the 3′-UTR and their positional effect. Given a 110-nt-long 3′-UTR sequence and an initial mRNA level as input, the model predicts mRNA levels at eight consecutive time points. Our deep neural networks significantly improved prediction of mRNA degradation dynamics compared with extant methods for the task. On the interpretability front, using Integrated Gradients, our convolutional neural network (CNN) models identified known and novel cis-regulatory sequence elements of mRNA degradation. Moreover, using mutagenesis analysis, we newly discovered the positional effect of various 3′-UTR elements.
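CNNs over nucleotide sequences usually consume a one-hot encoding, so the model input described above can be sketched as a 110 x 4 indicator matrix paired with the initial mRNA level. This pure-Python sketch assumes an A/C/G/T column order, maps U to T, encodes unknown bases as all-zero rows, and uses a hypothetical layout for the initial level; the authors' actual encoding may differ.

```python
def one_hot(seq, length=110):
    """One-hot encode a 3'-UTR sequence as `length` rows of [A, C, G, T] indicators.

    Unknown bases (e.g. N) become all-zero rows; U is treated as T.
    """
    if len(seq) != length:
        raise ValueError(f"expected {length} nt, got {len(seq)}")
    index = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows

def model_input(seq, initial_level):
    """Pair the encoded sequence with the initial mRNA level (hypothetical layout)."""
    return one_hot(seq), float(initial_level)

x, level = model_input("ACGU" + "A" * 106, 1.0)
```

Keeping the initial level separate from the sequence matrix lets a network process the sequence with convolutions and inject the scalar later, one common design for this kind of input.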

11:55-12:15
A Vision Transformer-based approach for identifying key markers in chromatin state associated with transcription
Room: Madison A
Format: Live from venue

Moderator(s): Shaun Mahony

  • Alexander Hartemink, Duke University, United States
  • Trung Tran, Duke University, United States


Presentation Overview:

In eukaryotic cells, chromatin exists in a complex and constantly changing state. This chromatin state is partially determined by the dynamic binding of proteins and other DNA binding factors (DBFs), including histones, transcription factors (TFs), and polymerases, that interact with one another, the genome, and other molecules in an exceedingly large number of possible configurations. Understanding how changing chromatin configurations associate with transcription remains a fundamental research problem. To address this problem, we developed a neural network model based on Vision Transformers to predict gene expression from chromatin state alone. We trained our models on high-resolution chromatin state, captured using MNase-seq, to predict strand-specific gene expression. While our models are flexible enough to be readily extended to other chromatin datasets, using MNase-seq alone we were able to accurately predict transcript levels, without overfitting, with an R² of 0.6. We used the learned attention weights in our transformer networks to identify novel chromatin features that precisely classify modes of gene expression.

13:15-13:55
Keynote Presentation: Chromatin spatial reorganization during erythroid differentiation
Room: Madison A
Format: Live-stream

Moderator(s): Ferhat Ay

  • Mayra Furlan-Magaril, Institute of Cellular Physiology, UNAM, Mexico City
13:55-14:15
Detecting higher-order structural changes in 3D genome organization with multi-task matrix factorization
Room: Madison A
Format: Live from venue

Moderator(s): Ferhat Ay

  • Da-Inn Lee, University of Wisconsin-Madison, United States
  • Sushmita Roy, University of Wisconsin-Madison, United States


Presentation Overview:

Three-dimensional (3D) genome organization, which determines how DNA is packaged inside the nucleus, has emerged as a key regulatory mechanism of cellular processes. High-throughput chromosome conformation capture (Hi-C) technologies have enabled the study of 3D genome organization by experimentally measuring interactions among genomic regions in 3D space. Analysis of Hi-C data has revealed higher-order organizational units such as topologically associating domains (TADs). Changes or disruptions to such units have been associated with disease, development, and evolution. Therefore, a key problem in regulatory genomics is to systematically detect higher-order structural changes across Hi-C datasets from multiple conditions. Existing computational methods either do not model higher-order structural units or only compare pairs of Hi-C datasets. We address these limitations with Tree-Guided Integrated Factorization (TGIF), a new multi-task Non-negative Matrix Factorization (NMF) approach. TGIF models complex relationships among multiple Hi-C datasets as a tree such that closely related Hi-C datasets have similar lower-dimensional representations. TGIF provides a statistically significant set of differential TAD boundaries with higher precision than existing approaches. Application to a cardiomyocyte differentiation time-course dataset identified time-point-specific TAD boundaries overlapping a retrotransposon element previously shown to be important for cell fate specification in humans and apes.

14:15-14:35
Identifying differential chromatin contacts from HiChIP data
Room: Madison A
Format: Live from venue

Moderator(s): Ferhat Ay

  • Sourya Bhattacharyya, La Jolla Institute for Immunology, United States
  • Daniela Salgado-Figueroa, La Jolla Institute for Immunology, United States
  • Ferhat Ay, La Jolla Institute for Immunology, United States


Presentation Overview:

Chromatin loops from HiChIP/PLAC-seq data report regulatory and structural interactions at high resolution. Differential chromatin loops between two conditions (e.g., different cell types, before/after perturbation) help us annotate condition-specific activities of genes, regulatory elements, and genetic variants. Existing differential HiChIP loop callers employ count-based tests from edgeR or DESeq2, which were designed mainly for RNA-seq analysis. By default, these methods cannot model the exponential distance decay of HiChIP contacts and fail to detect differences in long-range loops (>500 kb). Stratification approaches have been proposed to counter the distance decay, but they still have lower statistical power for detecting differences in long-range loops. We implemented DiffHiChIP, a comprehensive framework for assessing these differential HiChIP loop calling models. DiffHiChIP incorporates both DESeq2 and edgeR, uses p-values corrected by independent hypothesis weighting, and implements stratification by equal-occupancy binning to properly analyze long-range loops. DiffHiChIP further incorporates edgeR with a generalized linear model (GLM) and defines four additional models employing a quasi-likelihood F-test, a likelihood ratio test, or fold-change-specific thresholds (TREAT). The GLM-based models show higher precision in recovering short- and long-range differential contacts, which are highly supported by the respective Hi-C backgrounds. Overall, DiffHiChIP provides a comprehensive benchmark for differential HiChIP loop analysis.
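Equal-occupancy binning stratifies loops by genomic distance so that each stratum holds roughly the same number of loops, which keeps the sparse long-range loops from being swamped by abundant short-range ones when tests are run per stratum. A minimal pure-Python sketch, with a hypothetical function name and bin count; DiffHiChIP's own binning (for example, how loops at identical distances or the number of bins are handled) may differ.

```python
def equal_occupancy_bins(distances, n_bins):
    """Assign each loop to one of `n_bins` distance strata of (near-)equal size.

    Loops are ranked by genomic distance; bin sizes differ by at most one.
    Loops with tied distances may fall into adjacent bins.
    Returns a list of bin indices aligned with `distances`.
    """
    n = len(distances)
    order = sorted(range(n), key=lambda i: distances[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // n  # ranks 0..n-1 map evenly onto 0..n_bins-1
    return labels

# Hypothetical loop distances in bp.
loops = [10_000, 2_000_000, 50_000, 600_000, 20_000, 1_000_000]
print(equal_occupancy_bins(loops, 3))  # [0, 2, 1, 1, 0, 2]
```

The differential test and multiple-testing correction would then be applied within each stratum, so long-range loops are compared against other long-range loops rather than against the whole distance range.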

14:35-14:55
scGAD: single-cell gene associating domain scores for exploratory analysis of scHi-C data
Room: Madison A
Format: Live from venue

Moderator(s): Ferhat Ay

  • Siqi Shen, University of Wisconsin-Madison, United States
  • Ye Zheng, Fred Hutchinson Cancer Research Center, United States
  • Sündüz Keleş, University of Wisconsin-Madison, United States


Presentation Overview:

Quantitative tools are needed to leverage the unprecedented resolution of single-cell high-throughput chromatin conformation (scHi-C) data and integrate it with other single-cell data modalities. We present single-cell gene associating domain (scGAD) scores as a dimension reduction and exploratory analysis tool for scHi-C data. scGAD enables summarization at the gene unit while accounting for inherent gene-level genomic biases. Low-dimensional projections with scGAD capture clustering of cells based on their 3D structures. Significant chromatin interactions within and between cell types can be identified with scGAD. We further show that scGAD facilitates the integration of scHi-C data with other single-cell data modalities by enabling its projection onto reference low-dimensional embeddings. This multi-modal data integration provides an automated and refined cell-type annotation for scHi-C data.

14:55-15:15
Base-resolution deep learning models of chromatin accessibility reveal combinatorial sequence motif syntax and regulatory variation
Room: Madison A
Format: Live from venue

Moderator(s): Shaun Mahony

  • Anusri Pampari, Stanford University, United States
  • Anna Shcherbina, Stanford University, United States
  • Surag Nair, Stanford University, United States
  • Avanti Shrikumar, Stanford University, United States
  • Aman Patel, Stanford University, United States
  • Austin Wang, Stanford University, United States
  • Soumya Kundu, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States


Presentation Overview:

Chromatin accessibility profiles (DNase-seq and ATAC-seq) exhibit multi-resolution shapes and spans regulated by cooperative binding of transcription factors (TFs). This landscape is challenging to mine because of confounding bias from the assay-specific enzymes (DNase I/Tn5). Existing methods struggle to account for enzyme bias and base-resolution complexity, and thus miss the high-resolution architecture of TF profiles. Here we introduce ChromBPNet to address both of these aspects.

ChromBPNet is an optimized convolutional neural network architecture that models the influence of genomic sequence context on base-resolution chromatin accessibility profiles. ChromBPNet models trained on five canonical ENCODE cell lines achieved superior predictive performance on held-out chromosomes, while optimally regressing out DNase I/Tn5 enzyme bias. The models perform well over a range of sequencing depths, while denoising and de-sparsifying low-coverage signal profiles at individual cREs.

We improved interpretation methods for de novo inference of the contribution of individual nucleotides across all putative cREs in the genome, thereby revealing predictive motif instances and their combinatorial interaction effects on base-resolution profiles.

Finally, we developed a new variant effect score which predicts the impact of non-coding variants on the strength and shape of base-resolution chromatin profiles. Our models accurately predict quantitative trait loci associated with binding and accessibility in lymphoblastoid cell lines.

15:45-16:25
Keynote Presentation: Landscapes of Human cis-regulatory Elements and Transcription Factor Binding Sites Evolutionarily Constrained in the Mammalian Lineage
Room: Madison A
Format: Live-stream

Moderator(s): Shaun Mahony

  • Zhiping Weng


Presentation Overview:

Understanding the regulatory landscape of the human genome has been a long-standing goal of modern biology. Contemporary approaches identify regulatory elements using biochemical signals, including epigenetic marks and transcription factor (TF) occupancy; the evolutionary constraint of the resulting elements varies. The Zoonomia consortium’s 241 genomes are sufficient to achieve single-base resolution of evolutionary constraint in placental mammals. We used Zoonomia’s reference-free genome alignment and conservation score to characterize the human regulatory landscape, examining roughly one million candidate cis-regulatory elements (cCREs), 21 thousand core promoters, and 15.6 million sites bound by 367 TFs (TFBSs). We identified a group of cCREs (439,461, occupying 4% of the human genome) and TFBSs (2,024,062; 0.8% of the genome) under mammalian constraint. Genes near constrained elements function in fundamental cellular processes such as metabolism and development, and these elements yield high heritability enrichment for a panel of 69 diverse human traits. Unconstrained elements lie near genes that allow mammals to negotiate their environment (odor perception, immune response, and transposon repression), and 132 TFs are enriched in binding to genomic repeats. Our annotated elements should help interpret the regulatory landscape of the human genome.

16:25-16:45
Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin
Room: Madison A
Format: Live-stream

Moderator(s): Shaun Mahony

  • Meghana Kshirsagar, Microsoft, United States
  • Han Yuan, Calico Labs, United States
  • Juan Lavista Ferres, Microsoft, United States
  • Christina Leslie, Memorial Sloan-Kettering Cancer Center, United States


Presentation Overview:

We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. BindVAE can disentangle an input DNA sequence into distinct latent factors that encode generally expressed TFs, composite patterns for TFs involved in cooperative binding, and the genomic context surrounding the binding sites, and it learns cell type-specific in vivo binding signals. On the task of retrieving expressed TF motifs for a given cell type, we find that BindVAE has higher precision than other motif discovery approaches.