Posters - Schedules



Monday, July 11 and Tuesday, July 12 between 12:30 PM CDT and 2:30 PM CDT
Wednesday, July 13 between 12:30 PM CDT and 2:30 PM CDT
Session A Poster Set-up and Dismantle
Session A Posters set up:
Monday, July 11 between 7:30 AM CDT and 10:00 AM CDT
Session A Posters dismantle:
Tuesday, July 12 at 6:00 PM CDT
Session B Poster Set-up and Dismantle
Session B Posters set up:
Wednesday, July 13 between 7:30 AM CDT and 10:00 AM CDT
Session B Posters dismantle:
Thursday, July 14 at 2:00 PM CDT
Virtual: A High-resolution Peak Caller for ATAC-seq
COSI: RegSys
  • Liangti Dai, Centre for Computational Biology, MRC WIMM, University of Oxford; Doctoral Training Centre, University of Oxford, United Kingdom
  • Gerton Lunter, Department of Epidemiology, University of Groningen; Centre for Computational Biology, MRC WIMM, University of Oxford, Netherlands


Presentation Overview:

Genome-wide chromatin accessibility provides extensive information on gene regulation: accessible genomic regions contain active cis-regulatory elements (e.g., promoters and enhancers) that regulate gene expression. ATAC-seq is a fast, sensitive and popular technique for profiling chromatin accessibility. However, its variable signal-to-noise ratio, the lack of a background control and single-cell-level heterogeneity all complicate data analysis. A core step in processing ATAC-seq data is peak calling, which identifies openly accessible regions (peaks) from aggregated ATAC-seq signals. Existing peak calling algorithms are adapted from models designed for control-guided frameworks or rely on prior knowledge of nucleosome size, and have reported drawbacks of low sensitivity in capturing rare signals and/or high computational cost. To achieve higher-resolution peak calling by making optimal use of the data, we propose a statistical model based on both the coverage and the size of ATAC-seq fragments. In particular, we explicitly model the mechanism that generates ATAC-seq fragments, implemented as a Hidden Markov Model in which the hidden states correspond to different levels of accessibility at base-pair resolution. Evaluation on various bulk and single-cell ATAC-seq datasets has shown improved sensitivity and resolution in retrieving active regulatory regions, detecting cell type-specific DNA binding motifs, and capturing rare signals.
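The per-base HMM idea above can be illustrated with a generic Viterbi decoder. This is a minimal sketch, not the authors' model: it assumes Poisson-distributed coverage in each hidden state and a fixed probability of staying in the same state between adjacent bases, and it uses coverage only (the fragment-size component of the described model is not represented). The function name `viterbi_poisson` and the state rates are hypothetical.

```python
import math

def viterbi_poisson(counts, rates, stay_prob=0.99):
    """Most likely hidden accessibility state at each base pair.

    Hypothetical helper (not the authors' code).
    counts: per-base fragment coverage (non-negative ints)
    rates: assumed Poisson mean coverage for each hidden state (needs >= 2 states)
    stay_prob: assumed probability of keeping the same state at the next base
    """
    n_states = len(rates)
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]

    def log_emit(state, k):
        # Poisson log-likelihood of observing coverage k in this state
        lam = rates[state]
        return k * math.log(lam) - lam - math.lgamma(k + 1)

    # Uniform initial state distribution
    score = [math.log(1.0 / n_states) + log_emit(s, counts[0]) for s in range(n_states)]
    backptr = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit(j, k))
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Trace back the highest-scoring path
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Working in log space avoids numerical underflow over long genomic stretches.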

Virtual: Accurate estimation of intrinsic biases for improved analysis of bulk and single-cell chromatin accessibility sequencing data with SELMA
COSI: RegSys
  • Shengen Shawn Hu, University of Virginia, United States
  • Lin Liu, Shanghai Jiao Tong University, China
  • Qi Li, Tsinghua University, China
  • Wenjing Ma, University of Virginia, United States
  • Michael Guertin, University of Connecticut, United States
  • Clifford Meyer, Dana-Farber Cancer Institute, United States
  • Ke Deng, Tsinghua University, China
  • Tingting Zhang, University of Pittsburgh, United States
  • Chongzhi Zang, University of Virginia, United States


Presentation Overview:

Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound chromatin accessibility profiling data analysis. Existing computational tools are limited in their ability to account for such intrinsic biases and are not designed for analyzing single-cell data. Here, we present Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimation from both bulk and single-cell DNase-seq and ATAC-seq data. SELMA can use internal mitochondrial DNA data to improve bias estimation. We show that transcription factor binding inference from DNase footprints can be improved by incorporating biases estimated with SELMA. Furthermore, we show the strong effect of intrinsic biases in single-cell ATAC-seq data, and develop the first single-cell ATAC-seq intrinsic bias correction model to improve cell clustering. SELMA can enhance the performance of existing bioinformatics tools and improve the analysis of both bulk and single-cell chromatin accessibility sequencing data.
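As a point of reference for what an intrinsic cleavage bias is, the sketch below computes a simple observed-over-expected ratio for the k-mer surrounding each cut site. It is not SELMA: SELMA fits a simplex-encoded linear model, whereas this sketch assumes a plain frequency ratio, a single strand, and cut positions already corrected for any enzyme offset. `kmer_cut_bias` and its parameters are hypothetical.

```python
from collections import Counter

def kmer_cut_bias(genome, cut_sites, k=4):
    """Observed/expected frequency ratio of k-mers centred on cut sites.

    Hypothetical helper (not SELMA).
    genome: reference sequence as a string
    cut_sites: 0-based cleavage positions; each k-mer starts k//2 bases upstream
    Returns {k-mer: ratio} for k-mers observed at cut sites.
    """
    half = k // 2
    # Background: frequency of every k-mer in the sequence
    background = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total_bg = sum(background.values())
    observed = Counter()
    for pos in cut_sites:
        start = pos - half
        if start >= 0 and start + k <= len(genome):
            observed[genome[start:start + k]] += 1
    total_obs = sum(observed.values())
    return {
        kmer: (count / total_obs) / (background[kmer] / total_bg)
        for kmer, count in observed.items()
    }
```

A ratio above 1 means the enzyme cuts at that sequence context more often than its genomic frequency alone would predict.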

Virtual: An allele-specific expression analysis pipeline to unravel regulatory mechanisms in complex genetic phenotypes
COSI: RegSys
  • Daan van Beek, MaCSBio, Maastricht University, Netherlands
  • Job Verdonschot, Maastricht University Medical Centre+, Netherlands
  • Kasper Derks, MUMC+ Maastricht University, Netherlands
  • Han Brunner, Clinical Genetics, Netherlands
  • Theo de Kok, Maastricht University, Netherlands
  • Ilja Arts, Maastricht Centre for Systems Biology (MaCSBio), Netherlands
  • Stephane Heymans, Maastricht University, Netherlands
  • Martina Kutmon, Maastricht University, Netherlands
  • Michiel Adriaens, Maastricht University, Netherlands


Presentation Overview:

Allele-specific expression (ASE) analysis measures the relative abundance of alleles at heterozygous loci as a proxy for cis-regulatory variation, which affects the personal transcriptome and proteome. This study describes the development and application of an ASE analysis pipeline on a unique cohort of 87 patients from the Maastricht Cardiomyopathy Registry with dilated cardiomyopathy (DCM), a complex genetic disorder with a remaining gap in explained heritability. The pipeline offers RNA sequencing data processing, individual- and population-level ASE analyses, group comparisons and several intuitive visualizations such as Manhattan plots and protein-protein interaction networks. We found an overrepresentation of known DCM-associated genes among the significant results across the cohort, indicating that cis-regulatory variation could partially explain the phenotypic heterogeneity of the disease. In addition, we identified genes of interest, such as pseudogenes and non-structural genes, that have not been associated with DCM through conventional methods such as genome-wide association or differential gene expression studies. This indicates that ASE analysis adds a layer to conventional genomic and transcriptomic analyses for candidate gene identification and biological insight.
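At a single heterozygous site, allelic imbalance is commonly assessed with a binomial test of reference versus alternative read counts. The sketch below is a generic exact two-sided test under an assumed null proportion of 0.5; it is not this pipeline's statistic, and real ASE analyses often also correct for reference-mapping bias and overdispersion. `binomial_ase_pvalue` is a hypothetical name.

```python
from math import comb

def binomial_ase_pvalue(ref_count, alt_count, p=0.5):
    """Two-sided exact binomial test for allelic imbalance at one site.

    Hypothetical helper: sums the probabilities of all outcomes that are
    at most as likely as the observed reference count under the null p.
    """
    n = ref_count + alt_count
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Small relative tolerance so ties are not lost to rounding
    pval = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, pval)
```

Across many sites or individuals, these p-values would still need multiple-testing correction before calling genes significant.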

Virtual: Characterization of gene regulatory and interaction networks across cell types during SARS-CoV-2 infection using single cell RNA sequencing data
COSI: RegSys
  • Leo J. Arteaga-Vázquez, International Laboratory for Human Genome Research, Mexico
  • Monica Padilla-Galvez, International Laboratory for Human Genome Research, Mexico
  • Alejandra Medina-Rivera, International Laboratory for Human Genome Research, Mexico
  • Ana B. Villaseñor-Altamirano, Pulmonary and Critical Care Medicine, Brigham and Women's Hospital; Harvard Medical School, United States
  • Yalbi I. Balderas-Martinez, National Institute for Respiratory Diseases Ismael Cosío Villegas, Mexico
  • Daniel Blanco-Melo, Fred Hutch, United States


Presentation Overview:

The SARS-CoV-2 genome sequence and the genes involved in its infection process have been described. Knowing how the virus affects cells and how the cells respond provides a broader view for developing future treatments. One way to understand what happens during a viral infection is to analyze gene regulatory networks (GRNs) and interaction networks (INs). Here, we collected three publicly available single-cell RNA sequencing datasets and inferred GRNs and INs to characterize the gene regulatory mechanisms of cells infected by SARS-CoV-2. Our results revealed both previously described and new genes involved in the infection process. We observed increased activity of ATF transcription factors in B cells (B), epithelial cells (EC), macrophages, mast cells (mast), natural killer cells (NK), T cells (T), dendritic cells (DC), megakaryocytes, monocytes, and plasma cells (PC) in both bronchoalveolar lavage fluid (BALF) and peripheral blood mononuclear cells (PBMCs). Moreover, we observed increased ligand-receptor interactions between CD74 and the ligands APP, COPA, and MIF. In BALF, macrophages expressed CD74, while EC, DC, B, mast, NK, PC, and T cells expressed the ligands; in PBMCs, B, DC, monocytes, and PC expressed CD74, while B, T, DC, macrophages, megakaryocytes, monocytes, neutrophils, PC, and NK expressed the ligands.
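Ligand-receptor interactions between cell types are often scored from the mean expression of the ligand in the sending cell type and of the receptor in the receiving cell type. The sketch below assumes that simple product score; the abstract does not state which scoring tool was used, and established tools add permutation-based significance testing that is omitted here. Function and argument names are hypothetical; `MIF` and `CD74` are taken from the text.

```python
def lr_scores(expr, cell_types, ligand, receptor):
    """Score every sender -> receiver cell-type pair for one ligand-receptor pair.

    Hypothetical helper.
    expr: {gene: [expression per cell]}
    cell_types: cell-type label per cell, aligned with the expression lists
    Score = mean ligand expression in sender * mean receptor expression in receiver.
    """
    def mean_in(gene, ctype):
        vals = [v for v, c in zip(expr[gene], cell_types) if c == ctype]
        return sum(vals) / len(vals)

    types = sorted(set(cell_types))
    return {
        (sender, receiver): mean_in(ligand, sender) * mean_in(receptor, receiver)
        for sender in types
        for receiver in types
    }
```

Comparing such scores between infected and control samples is one way to flag interactions that increase during infection.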

Virtual: Heterogeneity of enhancers embodies shared and representative functional groups underlying developmental and cell type-specific gene regulation
COSI: RegSys
  • Wei Song, National Institutes of Health, United States
  • Ivan Ovcharenko, National Institutes of Health, United States


Presentation Overview:

Enhancers in a given tissue coordinately carry out regulatory functions. In this work, we identified distinct enhancer subclasses linked to development, differentiation, and cellular identity across multiple tissues and cell lines. During development, enhancer functional heterogeneity encompasses the formation of ubiquitous enhancers (11%), non-classical regulatory mechanisms (62%), and chromatin structure (12%). Ubiquitously active enhancers remain active across multiple cell lines and comprise 10% of differentiated cell enhancers. Ubiquitous enhancers are accompanied by a large enhancer subclass (33% to 63% of mature cell enhancers) with functions specific to the corresponding lineage. The remaining enhancers (27-40%) take part in establishing regulatory chromatin structure and facilitating interactions of cell type-specific enhancers with their target promoters. In addition to observing specialized functions of enhancers within a cell type, we show that properly accounting for enhancer heterogeneity leads to a 10% increase in enhancer classification accuracy, which significantly improves the modeling of enhancers and the identification of underlying regulatory mechanisms. In summary, our observations suggest that although cell type-specific enhancers are heterogeneous and coordinate different regulatory programs, enhancers from different cell lines maintain common categories of functional groups across development and differentiation stages, indicating a higher-order rule governing enhancer-gene regulation.

Virtual: Identifying the regulatory networks that modulate the transcriptional responses to SARS-CoV-2 infection in humans
COSI: RegSys
  • Monica Padilla-Galvez, International Laboratory for Human Genome Research, Mexico
  • Leo J. Arteaga-Vazquez, International Laboratory for Human Genome Research, Mexico
  • Ana B. Villaseñor-Altamirano, Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, United States
  • Yalbi I Balderas-Martinez, Instituto Nacional de Enfermedades Respiratorias, Mexico
  • Daniel Blanco-Melo, Fred Hutch, United States
  • Javier De-Las-Rivas, Centro de Investigación del Cáncer (CiC-IBMCC, Universidad de Salamanca-CSIC), Spain
  • Alejandra Medina-Rivera, Universidad Nacional Autónoma de México, Mexico


Presentation Overview:

A better understanding of the pathophysiology underlying COVID-19 across tissues and within cells upon SARS-CoV-2 infection is needed. While the main actors in this process have been identified (receptor, cofactors, interferon, and the cytokine cascade), no consensus has been reached on how these elements assemble into gene regulatory networks (GRNs) and how those networks change across tissues. Moreover, discovered networks should be compared across studies to evaluate reproducibility. Here, we perform an integrative and robust GRN analysis, comparing SARS-CoV-2 infection with infection by other viruses and comparing COVID-19 across tissues. For each condition, we searched for enrichment and differential activation of regulons, thereby identifying the specific sets of critical transcription factors and their target genes. As expected, we find that the most enriched upregulated regulons drive a pro-inflammatory response. Furthermore, we observe enrichment of novel regulons for SARS-CoV-2 infection: ZNF595, ZNF816, and POU2F3. Our tissue analysis showed that 195 regulons are shared by at least 3 tissues, with the greatest similarity among lung, liver and heart; tissue-specific interactions could explain extrapulmonary manifestations of COVID-19. Our study corroborates previous findings and provides further insight into the regulatory mechanisms driving COVID-19, which could be relevant to developing alternative treatment options for patients.
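One simple way to test whether a regulon is enriched among upregulated genes is a one-sided hypergeometric test on the overlap between the regulon's target genes and the upregulated set. This is a sketch under that assumption only; the abstract does not specify the enrichment statistic used, and regulon-activity methods typically score per-sample activity instead. All names are hypothetical.

```python
from math import comb

def hypergeom_sf(overlap, universe, regulon_size, de_size):
    """P(X >= overlap) for X ~ Hypergeometric(universe, regulon_size, de_size)."""
    total = comb(universe, de_size)
    upper = min(regulon_size, de_size)
    return sum(
        comb(regulon_size, k) * comb(universe - regulon_size, de_size - k)
        for k in range(overlap, upper + 1)
    ) / total

def regulon_enrichment(targets, upregulated, universe):
    """Hypothetical helper: one-sided enrichment p-value of a regulon's targets."""
    universe = set(universe)
    targets = set(targets) & universe
    upregulated = set(upregulated) & universe
    overlap = len(targets & upregulated)
    return hypergeom_sf(overlap, len(universe), len(targets), len(upregulated))
```

Restricting both gene sets to the same universe (for example, all genes detected in the dataset) keeps the test from being inflated by genes that could never be observed.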

Virtual: MPRAsnakeflow: A Snakemake based framework for MPRA data analysis
COSI: RegSys
  • Pyaree Mohan Dash, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Germany
  • Martin Kircher, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Germany
  • Max Schubach, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Germany


Presentation Overview:

Nucleotide variation in gene regulatory elements can have significant effects on gene expression and phenotype, and such non-coding variants have been found to be a major cause of human disease. The need for accurate quantification of variant effects has therefore given rise to high-throughput technologies such as Massively Parallel Reporter Assays (MPRAs), which enable simultaneous testing of thousands of candidate regulatory elements (CREs).
We have built MPRAsnakeflow, a computational tool for MPRA data analysis based on the workflow manager Snakemake. Our tool overcomes the limitations of custom MPRA data-analysis pipelines tailored to a specific type of MPRA or to specific computing environments. MPRAsnakeflow is an open-source, fast, reproducible, user-friendly tool for measuring CRE activity and variant effects, available at https://github.com/kircherlab/MPRAsnakeflow. It is compatible with various MPRA experimental designs, including lentiviral and episomal MPRAs. It covers the assignment of barcodes to reporter sequences as well as the count-sequencing steps, and computes variant effects by contrasting reference and alternative sequences. Moreover, it includes rich quality measurements and additional analyses, for example barcode-overlap statistics between replicates and various downsampling approaches to assess confidence in CRE activity. MPRAsnakeflow helps users better understand their MPRAs and identify possible bottlenecks and experimental errors.
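A common way to express CRE activity is the log2 ratio of normalized RNA to DNA counts, with a variant effect taken as the difference between the alternative and reference activities. The sketch below assumes counts already aggregated per CRE, counts-per-million normalization and a pseudocount of 1; MPRAsnakeflow works at the barcode level and its exact statistics may differ. `cre_activity` and `variant_effect` are hypothetical names, not MPRAsnakeflow functions.

```python
import math

def cre_activity(dna_counts, rna_counts, pseudocount=1.0):
    """log2(RNA/DNA) activity per CRE after library-size normalisation.

    Hypothetical helper. dna_counts / rna_counts: {cre_id: aggregated count}.
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    activity = {}
    for cre, dna in dna_counts.items():
        rna = rna_counts.get(cre, 0)
        # Counts per million, with a pseudocount to avoid log(0)
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        activity[cre] = math.log2(rna_cpm / dna_cpm)
    return activity

def variant_effect(activity, ref_id, alt_id):
    """Hypothetical helper: allelic effect as alt minus ref log2 activity."""
    return activity[alt_id] - activity[ref_id]
```

Normalizing by the DNA count corrects for how many copies of each reporter were delivered, so the ratio reflects transcriptional output per copy.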

Virtual: Multiple target identification against Mycobacterium tuberculosis active infection
COSI: RegSys
  • Praveena Koyyada, University of Hyderabad, India
  • Dr Seema Mishra, University of Hyderabad, India


Presentation Overview:

The anthropogenic evolution of Mycobacterium tuberculosis H37Rv into a multi-drug resistant form has enabled it to evade a new class of antitubercular drug, bedaquiline. A transformative drug regimen against tuberculosis is therefore needed to mitigate the infection. Aiming for a paradigm shift in treatment strategy against Mycobacterium, our study adopts a mathematical and statistical approach and proposes the transcription regulators Rv0452 (TetR regulator), Rv0880 (MarR family regulator), Rv1027c, Rv1176c (PadR family), Rv2034 (ArsR repressor) and Rv3862c (WhiB6) as potential targets for multi-TF targeting. These coregulators may play a regulatory role for functional genes involved in virulence and adaptation, host cell cytoplasmic component metabolism, oxidoreductase activity and cellular encapsulation. Extending the statistical analysis to Mycobacterium-infected human cord blood CD34+ cells revealed that the integrin subunit alpha genes ITGAM and ITGAX, the complement C3b/C4b receptor gene CR1, S100A8 and FGR (tyrosine-protein kinase, Src2) may play a role in Mycobacterium pathogenesis in the host. Based on these insights into both Mycobacterium tuberculosis H37Rv (pathogen) and human cord blood CD34+ cells (host), we derived a host-pathogen interaction model depicting the cross-talk between them during infection.

Virtual: Transcriptional signatures of cell-cell interactions are dependent on cellular context
COSI: RegSys
  • Brendan Innes, University of Toronto, Canada
  • Gary Bader, University of Toronto, Canada


Presentation Overview:

Cell-cell interactions are often predicted from single-cell transcriptomics data based on observing receptor and corresponding ligand transcripts in cells. These predictions could theoretically be improved by inspecting the transcriptome of the receptor-expressing cell for evidence of gene expression changes in response to the ligand. It is commonly expected that a given receptor, in response to ligand activation, will have a characteristic downstream gene expression signature. However, this assumption has not been well tested. We used ligand perturbation data from both the high-throughput Connectivity Map resource and published transcriptomic assays of cell lines and purified cell populations to determine whether ligand signals have unique and generalizable transcriptional signatures across biological conditions. Most of the receptors we analyzed did not have such characteristic gene expression signatures; instead, these signatures were highly dependent on cell type. Cell context is thus important when considering transcriptomic evidence of ligand signaling, which makes it challenging to build generalizable ligand-receptor interaction signatures to improve cell-cell interaction predictions.

Virtual: Varimax-PCA reveals strain-specific differences in rat single-cell transcriptomics map
COSI: RegSys
  • Delaram Pouyabahar, The Donnelly Centre, University of Toronto, Canada
  • Sai Chung, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Olivia Pezzutti, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Catia Perciani, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Sherry Wang, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Xue-Zhong Ma, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Chao Jiang, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Trevor Chung, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Manmeet Sekhon, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Justin Manuel, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Xu-Chun Chen, Ajmera Transplant Centre, Toronto General Hospital Research Institute, Canada
  • Ian McGilvray, Multi-Organ Transplant Program, Toronto General Hospital Research Institute, Canada
  • Sonya MacParland, Ajmera Transplant Centre, Toronto General Hospital Research Institute, University of Toronto, Canada
  • Gary Bader, The Donnelly Centre, University of Toronto, Lunenfeld-Tanenbaum Research Institute, Canada


Presentation Overview:

Single-cell RNA-sequencing is able to identify the gene expression heterogeneity within complex biological systems, though interpretation is challenging due to a mix of biological and technical factors. Previous studies have demonstrated the utility of reduced dimensional representations to identify shared cellular attributes and unique biological processes across single-cell datasets. However, in many cases, the inferred dimensions from standard matrix factorization methods may not align with biologically meaningful gene expression programs, and nonlinear methods often suffer from a lack of interpretability. Here, we developed a computational pipeline based on varimax-PCA to identify and interpret the biological and technical sources of variation in single-cell transcriptomics maps. We demonstrate the utility of this pipeline on a novel single-cell map of the healthy rat liver, leading to key insights on strain-specific differences in this liver transplantation model. Our pipeline guided the cell-type annotation of a single-cell rat liver map that was confounded by ambient RNA and highlighted the enrichment of pro-inflammatory signals in myeloid cells of the Lewis rat strain, which is important for understanding the biology of this model system. These findings were experimentally validated by performing ex vivo LPS stimulation experiments followed by intracellular cytokine staining to measure myeloid cell inflammatory response.
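Varimax rotates a loading matrix (for example, the genes-by-components loadings from PCA) so that each gene loads strongly on as few components as possible, which is what makes the rotated components easier to interpret. The sketch below implements raw varimax (without Kaiser row normalization) using Kaiser's pairwise planar rotations; it illustrates the general technique and is not the authors' pipeline code. `varimax` is a hypothetical helper.

```python
import numpy as np

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax rotation of a (features x components) loading matrix.

    Hypothetical helper. Each sweep rotates every pair of columns by the
    closed-form angle that maximises the varimax criterion for that pair.
    """
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i].copy(), L[:, j].copy()
                u, v = x**2 - y**2, 2 * x * y
                A, B = u.sum(), v.sum()
                C, D = (u**2 - v**2).sum(), 2 * (u * v).sum()
                # Optimal planar rotation angle for this pair of components
                phi = 0.25 * np.arctan2(D - 2 * A * B / p, C - (A**2 - B**2) / p)
                largest = max(largest, abs(phi))
                c, s = np.cos(phi), np.sin(phi)
                L[:, i] = x * c + y * s
                L[:, j] = -x * s + y * c
        if largest < tol:  # converged: no pair wants to rotate further
            break
    return L
```

Because every step is an orthogonal rotation, the variance each gene shares with the retained components is unchanged; only its distribution across components changes.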

X-001: Characterizing cellular heterogeneity in chromatin state with scCUT&Tag-pro and scChromHMM
COSI: RegSys
  • Avi Srivastava, New York Genome Center, United States


Presentation Overview:

Technologies that profile chromatin modifications at single-cell resolution offer enormous promise for functional genomic characterization, but the sparsity of the measurements and the integration of multiple binding maps represent substantial challenges. Here we introduce single-cell (sc)CUT&Tag-pro, a multimodal assay for profiling protein–DNA interactions coupled with the abundance of surface proteins in single cells. In addition, we introduce single-cell ChromHMM, which integrates data from multiple experiments to infer and annotate chromatin states based on combinatorial histone modification patterns. We apply these tools to perform an integrated analysis across nine different molecular modalities in circulating human immune cells. We demonstrate how these two approaches can characterize dynamic changes in the function of individual genomic elements across both discrete cell states and continuous developmental trajectories, nominate associated motifs and regulators that establish chromatin states, and identify extensive and cell-type-specific regulatory priming. Finally, we demonstrate how our integrated reference can serve as a scaffold to map and improve the interpretation of additional scCUT&Tag datasets.

X-002: Paperfly: ab initio binding site reconstruction
COSI: RegSys
  • Kateřina Faltejsková, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Czechia
  • Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Czechia


Presentation Overview:

The specific recognition of a DNA locus is a widely studied problem. It is generally agreed that recognition depends not only on the binding motif but also on the larger context of the DNA binding site. To study the binding site together with the full sequence context of the binding motif, we introduce PAPerFly (Partial Assembly-based Peak Finder), a new tool capable of reconstructing the binding site from ChIP-seq or similar experimental data.

Using a novel heuristic algorithm that borrows approaches from genome assembly, Paperfly can reconstruct the unique binding sites captured in a sequencing experiment without using a reference genome. Additionally, we show that Paperfly can be combined with a standard data processing pipeline to link the unique sequences of the binding site to their respective abundance across the genome.
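To illustrate how genome-assembly ideas can reconstruct sequences without a reference, the sketch below counts k-mers, keeps those seen at least `min_count` times, and greedily extends contigs while there is exactly one unused continuation. This is a toy de Bruijn-style extension, not Paperfly's heuristic; it ignores reverse complements and sequencing errors, and `assemble_contigs` is a hypothetical name.

```python
from collections import Counter

def assemble_contigs(reads, k=5, min_count=2):
    """Greedy contigs from solid k-mers (toy de Bruijn-style extension).

    Hypothetical helper: extension stops at dead ends and branch points.
    """
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    solid = {km for km, c in counts.items() if c >= min_count}
    used = set()

    def extend(contig, forward):
        while True:
            end = contig[-(k - 1):] if forward else contig[:k - 1]
            options = [end + b if forward else b + end for b in "ACGT"]
            nxt = [km for km in options if km in solid and km not in used]
            if len(nxt) != 1:  # dead end or ambiguous branch
                return contig
            used.add(nxt[0])
            contig = contig + nxt[0][-1] if forward else nxt[0][0] + contig

    contigs = []
    # Seed from the most abundant unused k-mer first
    for seed in sorted(solid, key=lambda km: (-counts[km], km)):
        if seed in used:
            continue
        used.add(seed)
        contig = extend(seed, forward=True)
        contig = extend(contig, forward=False)
        contigs.append(contig)
    return contigs
```

The `min_count` filter plays the role of a coverage threshold: k-mers seen only once are treated as noise and cannot extend a contig.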

The source code of the tool is freely available at https://github.com/Caeph/paperfly or at https://doi.org/10.5281/zenodo.6379332.

X-003: Differential analysis of super enhancers by accounting for internal dynamics
COSI: RegSys
  • Xiang Liu, H. Lee Moffitt Cancer Center and Research Institute, United States
  • Mingxiang Teng, H. Lee Moffitt Cancer Center and Research Institute, United States


Presentation Overview:

Super enhancers (SEs) are broad regulatory domains with elevated activity in gene regulation. SEs usually span at least thousands of base pairs and contain multiple constituent enhancers (CEs). Alteration of CEs within SEs has been found to be strongly associated with disease-related dysregulation. Here, we propose a novel computational method to identify differential SEs by weighting the effects of CE activities and locations. In addition to overall activity changes, our method detects four additional categories of differential SEs with distinct CE structural alterations. By linking the altered activity of these differential SE categories to transcription factor (TF) binding and gene expression through 3D chromatin interactions, we found that each SE category regulates gene expression with distinct impacts, highlighting the differing regulatory roles of these previously unexplored SE features. Compared with an existing method, our method improved the identification of differential SEs, maximizing the discernment of cell identities and supporting functional interpretation within the same cancer type.

X-004: Regulatory Network Inference Using Nascent RNA Sequencing Data
COSI: RegSys
  • Rutendo Sigauke, University of Colorado - Anschutz Medical Campus, United States
  • Lynn Sanford, University of Colorado - Boulder, United States
  • Zachary Maas, University of Colorado - Boulder, United States
  • Robin Dowell, University of Colorado - Boulder, United States


Presentation Overview:

Gene regulatory networks (GRNs) aim to link co-regulated genes using compendia of gene expression datasets. Traditional GRNs typically do not include non-coding enhancer regions, whose transcription is too low and whose transcripts are too unstable to be detected by RNA-seq. By leveraging accessibility assays such as ATAC-seq and binding experiments such as ChIP-seq, recent work has begun to incorporate enhancer regions into regulatory networks. However, these assays are noisy and lack temporal resolution. Here we use nascent RNA sequencing data, which gives an immediate readout of transcription at both enhancers and genes in a single experiment, to build transcription regulatory networks (TRNs). We manually curated over 2,800 published nascent RNA sequencing experiments across 20 organisms. These data were assessed for quality and processed using a standard pipeline. For each organism, we identified highly correlated enhancer and gene transcripts, yielding putative enhancer-gene pairs. Our preliminary results identified known interacting enhancer-gene pairs as well as novel interactions, expanding our knowledge of TRNs.
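The enhancer-gene pairing step can be sketched as correlating enhancer and gene transcription levels across samples and keeping nearby pairs above a threshold. The distance window (`max_distance`), the correlation cutoff (`min_r`) and the use of Pearson correlation are assumptions for illustration, not the parameters used in this work; the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def enhancer_gene_pairs(enhancers, genes, max_distance=500_000, min_r=0.8):
    """Hypothetical helper: putative enhancer-gene pairs.

    enhancers / genes: {name: (chrom, position, [transcription per sample])}
    Returns (enhancer, gene, r) tuples sorted by decreasing correlation.
    """
    pairs = []
    for e_name, (e_chr, e_pos, e_sig) in enhancers.items():
        for g_name, (g_chr, g_pos, g_sig) in genes.items():
            # Only consider genes on the same chromosome within the window
            if e_chr != g_chr or abs(e_pos - g_pos) > max_distance:
                continue
            r = pearson(e_sig, g_sig)
            if r >= min_r:
                pairs.append((e_name, g_name, r))
    return sorted(pairs, key=lambda t: -t[2])
```

With thousands of samples per organism, a rank-based correlation could be substituted if a few extreme samples dominate the signal.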

X-005: Annotating scATAC-seq pseudobulk clusters
COSI: RegSys
  • Aybuge Altay, Max Planck Institute of Molecular Genetics, Germany
  • Yufei Zhang, Max Planck Institute for Molecular Genetics, Germany
  • Martin Vingron, Max Planck Institute for Molecular Genetics, Germany


Presentation Overview:

Chromatin structure can control the accessibility of potential gene regulatory elements in a dynamic and cell-type-specific manner and therefore plays a critical role in gene regulation. Although genome-wide chromatin structure can be measured in ‘bulk’ by technologies such as ATAC-seq, measurements at single-cell (sc) resolution (e.g., scATAC-seq) suffer from an abundance of zeros in the resulting data. Annotating cell types in scATAC-seq data remains challenging, mainly because of the lack of marker open regions that characterize cell types and the sparsity of scATAC-seq data. To overcome these limitations, we create scATAC-seq pseudobulk clusters by summing the reads in each scATAC-seq cluster. We then co-embed these pseudobulks with FACS-sorted bulk ATAC-seq in PCA space and annotate each pseudobulk with the closest bulk cell type. We couple this approach with transcription factor (TF) footprinting analysis and train a classifier on bulk ATAC-seq TF footprinting profiles to predict the cell types of pseudobulks. This strategy offers a feasible way to overcome sparsity and leverages the large body of characterized bulk ATAC-seq data. When applied to human primary blood and brain data, our pipeline noticeably resolves the cell-type annotations and performs comparably to existing methods, demonstrating the strength of our strategy.
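The pseudobulk-and-co-embed step can be sketched as follows: sum reads per cluster, normalize pseudobulks and bulk profiles together, project them jointly with PCA, and label each pseudobulk with its nearest bulk sample. Counts-per-million normalization, a log1p transform, Euclidean distance and `n_pcs` are assumptions; the TF-footprinting classifier described above is not included, and `annotate_pseudobulks` is a hypothetical name.

```python
import numpy as np

def annotate_pseudobulks(cell_counts, cluster_labels, bulk_profiles, bulk_names, n_pcs=2):
    """Label each scATAC-seq cluster with its nearest bulk sample in joint PCA space.

    Hypothetical helper.
    cell_counts: cells x regions count matrix
    bulk_profiles: bulk samples x regions count matrix (same regions)
    """
    labels = np.asarray(cluster_labels)
    clusters = sorted(set(cluster_labels))
    # Pseudobulk: sum reads over all cells in each cluster
    pseudo = np.vstack([cell_counts[labels == c].sum(axis=0) for c in clusters])
    combined = np.vstack([pseudo, bulk_profiles]).astype(float)
    # Counts per million, then log1p, so library size does not drive the embedding
    combined = np.log1p(combined / combined.sum(axis=1, keepdims=True) * 1e6)
    centred = combined - combined.mean(axis=0)
    # Joint PCA via SVD of the centred matrix
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]
    pseudo_pc, bulk_pc = scores[:len(clusters)], scores[len(clusters):]
    result = {}
    for c, point in zip(clusters, pseudo_pc):
        dists = np.linalg.norm(bulk_pc - point, axis=1)
        result[c] = bulk_names[int(np.argmin(dists))]
    return result
```

Summing before normalizing is what turns many near-empty single-cell profiles into one profile dense enough to compare with bulk data.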

X-006: Transcriptomic and network analyses identify novel regulators of placental development
COSI: RegSys
  • Ha T. H. Vu, Iowa State University, United States
  • Haninder Kaur, Iowa State University, United States
  • Rebekah R. Starks, Iowa State University, United States
  • Geetu Tuteja, Iowa State University, United States


Presentation Overview:

The placenta is critical during pregnancy, impacting the health of both the mother and the fetus. However, our understanding of the molecular mechanisms underlying its early development is limited. Due to ethical constraints associated with studying early human pregnancy, we used a mouse model to identify novel regulators of placental development through network analysis.
RNA-seq data were generated from mouse placental tissue at embryonic day (e) 7.5, e8.5, and e9.5. We identified 922 e7.5-specific genes, 915 e8.5-specific genes and 1952 e9.5-specific genes using hierarchical clustering and differential expression analysis. Gene interaction networks were inferred at each timepoint using GENIE3 and then subclustered using the GLay algorithm. This approach identified subnetworks composed of genes with cell-type-specific expression according to human single-cell RNA-seq data, and it performed similarly to other reference-free deconvolution approaches. We further defined 117 hub genes, from which we selected two candidates that had not previously been studied in placental development. We experimentally validated that these candidates indeed regulate processes necessary for placental function.
In conclusion, we inferred gene interaction networks, determined network modules that represented specific placental cell populations, and validated that our approaches can be used to identify novel regulators of placental development.

X-007: Accurately and efficiently rescuing multi-mapped reads in ChIP-seq analysis pipelines
COSI: RegSys
  • Alexis Morrissey, The Pennsylvania State University, United States
  • Shaun Mahony, The Pennsylvania State University, United States


Presentation Overview:

Several studies have demonstrated that transposable elements (TEs) and other repetitive regions can harbor gene regulatory elements such as transcription factor binding sites. Unfortunately, repetitive regions pose problems for short-read sequencing assays such as ChIP-seq. Because the same TE can occur at multiple genomic locations, reads originating from it can align equally well to several places; these are known as multi-mapped reads. In most ChIP-seq analysis pipelines, reads that align to multiple genomic locations are discarded during preprocessing, so regulatory signals occurring in repetitive regions have largely been overlooked. Here, we develop an approach to allocate multi-mapped ChIP-seq reads accurately and in a user-friendly manner. Our method, Allo, combines the probabilistic mapping of ChIP-seq reads with a convolutional neural network that recognizes the read distribution features of potential ChIP-seq peaks. Allo not only assigns multi-mapped reads more accurately than alternative methods but also produces read-level output in the form of a corrected alignment file. Therefore, the output of our method can be passed to any downstream peak caller and is easily added to existing pipelines with very few modifications. The goal of our work is to encourage the study of repetitive regions in gene regulation by providing a readily usable approach for resolving multi-mapped read locations.
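For contrast with Allo, a simple baseline for rescuing multi-mapped reads assigns each read fractionally to its candidate loci in proportion to the uniquely mapped reads nearby. This sketch assumes that proportional rule with a pseudocount of 1 and does not include Allo's convolutional neural network or its read-level alignment output; `allocate_multireads` is a hypothetical name.

```python
def allocate_multireads(unique_counts, multireads):
    """Fractionally assign multi-mapped reads in proportion to unique-read support.

    Hypothetical helper (not Allo).
    unique_counts: {locus: uniquely mapped reads near that locus}
    multireads: one list of candidate loci per multi-mapped read
    Returns {locus: expected number of multi-mapped reads assigned}.
    """
    assigned = {locus: 0.0 for locus in unique_counts}
    for candidates in multireads:
        # +1 pseudocount so loci without unique reads keep a small share
        weights = [unique_counts.get(loc, 0) + 1 for loc in candidates]
        total = sum(weights)
        for loc, w in zip(candidates, weights):
            assigned[loc] = assigned.get(loc, 0.0) + w / total
    return assigned
```

The intuition is that a repeat copy lying inside a real binding site should also collect uniquely mappable reads from its flanks, so it deserves a larger share of the ambiguous reads.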

X-008: scGAD: single-cell gene associating domain scores for exploratory analysis of scHi-C data
COSI: RegSys
  • Siqi Shen, University of Wisconsin - Madison, United States
  • Ye Zheng, Fred Hutchinson Cancer Research Center, United States
  • Sündüz Keleş, University of Wisconsin - Madison, United States


Presentation Overview:

Quantitative tools are needed to leverage the unprecedented resolution of single-cell high-throughput chromatin conformation (scHi-C) data and integrate it with other single-cell data modalities. We present single-cell gene associating domain (scGAD) scores as a dimension reduction and exploratory analysis tool for scHi-C data. scGAD enables summarization at the gene unit while accounting for inherent gene-level genomic biases. Low-dimensional projections with scGAD capture clustering of cells based on their 3D structures. Significant chromatin interactions within and between cell types can be identified with scGAD. We further show that scGAD facilitates the integration of scHi-C data with other single-cell data modalities by enabling its projection onto reference low-dimensional embeddings. This multi-modal data integration provides an automated and refined cell-type annotation for scHi-C data.

X-009: Identification of disease-specific developmental gene regulatory networks at cellular resolution
COSI: RegSys
  • Kalpana Hanthanan Arachchilage, University of Wisconsin-Madison, United States
  • Daifeng Wang, University of Wisconsin-Madison, United States


Presentation Overview: Show

Understanding gene regulation in disease development remains elusive, especially at cellular resolution. To address this, we developed a computational framework to predict disease-specific developmental gene regulatory networks (GRNs) at the cell-type level. First, the framework infers possible cell trajectories from single-cell transcriptomic data of both disease and control samples and identifies conserved and disease-specific trajectories, which suggest underlying cellular developmental processes. Second, it estimates cell pseudotimes and orders cells along the trajectories. It then applies Convergent Cross Mapping (CCM) to infer causal gene-gene relationships for each trajectory and uses the relationships originating from transcription factor genes to form developmental GRNs. Finally, we conducted a disease-specificity analysis of these GRNs along the developmental trajectories to identify disease-specific subnetworks. We applied this framework to emerging scRNA-seq datasets in neurodevelopmental diseases such as Down syndrome (DS) and compared our results with existing state-of-the-art methods such as SCENIC. In addition to many predictions that were consistent across methods, we revealed developmental changes in disease-specific GRNs at the cell-type level, providing deeper mechanistic insight into gene regulation in disease development.
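Convergent Cross Mapping, introduced by Sugihara et al. (2012), tests whether x influences y by asking how well the time-delay embedding (shadow manifold) of y can reconstruct x. This Python sketch treats pseudotime-ordered expression of two genes as x and y; the embedding dimension `E`, lag `tau`, and the simplex-style exponential neighbour weights follow the generic CCM recipe and are assumptions, not the framework's reported settings.

```python
import numpy as np


def delay_embed(series: np.ndarray, E: int, tau: int) -> np.ndarray:
    """Time-delay embedding: row i is (s[t], s[t - tau], ..., s[t - (E - 1) * tau])
    with t = i + (E - 1) * tau."""
    start = (E - 1) * tau
    n = len(series) - start
    return np.column_stack([series[start - j * tau:start - j * tau + n] for j in range(E)])


def cross_map_skill(x, y, E: int = 2, tau: int = 1) -> float:
    """Estimate x from the shadow manifold of y and return the Pearson correlation
    between estimated and observed x (the cross-map skill)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    manifold = delay_embed(y, E, tau)
    target = x[(E - 1) * tau:]  # x at the time index of each manifold row
    # All pairwise distances; O(n^2) memory, fine for a few thousand cells.
    dist = np.linalg.norm(manifold[:, None, :] - manifold[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point may not be its own neighbour
    estimates = np.empty(len(manifold))
    for i in range(len(manifold)):
        nn = np.argsort(dist[i])[:E + 1]   # E + 1 nearest neighbours
        d = dist[i, nn]
        w = np.exp(-d / max(d[0], 1e-12))  # weights relative to the nearest one
        estimates[i] = np.dot(w / w.sum(), target[nn])
    return float(np.corrcoef(estimates, target)[0, 1])


# Sanity check on a smooth periodic signal: cross-mapping a series from its own
# manifold should give skill close to 1.
t = np.linspace(0, 20 * np.pi, 500)
print(round(cross_map_skill(np.sin(t), np.sin(t)), 3))
```

To compare directions, one would compute `cross_map_skill(x, y)` and `cross_map_skill(y, x)` on increasing prefixes of the pseudotime-ordered series and look for skill that rises and saturates with length (convergence), which is the signature CCM uses; the thresholds for calling an edge are not given in the abstract.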

X-010: The rise of sparser single-cell RNAseq datasets: consequences and opportunities
COSI: RegSys
  • Gerard Bouland, Delft University of Technology, Netherlands
  • Ahmed Mahfouz, Leiden University Medical Center, Netherlands
  • Marcel Reinders, Delft University of Technology, Netherlands


Presentation Overview: Show

Continuous developments in single-cell RNA sequencing (scRNA-seq) technology have produced datasets with ever more cells. Although larger datasets are valuable, for example for boosting statistical power, additional challenges arise because the datasets also become sparser. The sparsity of scRNA-seq data has generally been seen as a problem, especially because standard count distribution models fail to explain the observed excess of zeros.

Using 52 datasets published between 2015 and 2021, we show that scRNA-seq datasets grow by 12,089 cells per year on average. In addition, we observed a strong negative correlation between the number of cells and the degree of sparsity (Pearson’s r = -0.72, p-value = 1.23 × 10⁻⁹). As the field moves towards sparser datasets, it is vital to discuss the consequences of the ever-increasing abundance of zero measurements.

Through experiments, we show that, regardless of the level of sparsity, most of the signal is captured by the binarized expression profile (in which a zero represents a zero count and a one represents a non-zero count). We also show that discarding counts and using a binarized representation of scRNA-seq data does not lower performance across diverse analysis tasks, including cell type identification and differential expression analysis. Finally, we discuss the benefits of binarization.
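The sparsity measure and the binarized representation are simple to state precisely. This numpy sketch uses a hypothetical toy count matrix (genes x cells), not any of the 52 datasets. The per-gene detection rate, the mean of the binarized profile across cells, is shown as one way a binarized representation can still carry a gene's expression level; it is an illustrative choice, not necessarily the authors'.

```python
import numpy as np


def sparsity(counts: np.ndarray) -> float:
    """Fraction of entries in a count matrix that are exactly zero."""
    return float(np.mean(counts == 0))


def binarize(counts: np.ndarray) -> np.ndarray:
    """1 for a non-zero count, 0 for a zero count."""
    return (counts != 0).astype(np.int8)


def detection_rate(counts: np.ndarray) -> np.ndarray:
    """Per-gene fraction of cells with a non-zero count (genes x cells input)."""
    return binarize(counts).mean(axis=1)


counts = np.array([[0, 3, 0],
                   [1, 0, 7]])  # 2 genes x 3 cells, toy values
print(sparsity(counts))        # 0.5
print(binarize(counts))        # [[0 1 0]
                               #  [1 0 1]]
print(detection_rate(counts))  # [0.33333333 0.66666667]
```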

X-011: Analysis of Whole-Methylome in Alzheimer’s Disease Cohort Using Statistics and Deep Learning
COSI: RegSys
  • Coleman Breen, University of Wisconsin, Madison, United States
  • Reid Alisch, University of Wisconsin, Madison, United States
  • Kirk Hogan, University of Wisconsin, Madison, United States
  • Sunduz Keles, University of Wisconsin, Madison, United States


Presentation Overview: Show

DNA methylation (DNAm) is one of the most important sources of epigenomic variability and plays an underappreciated and nuanced role in gene expression. Differential methylation is involved in many diseases, including neurodegenerative diseases such as Late-Onset Alzheimer’s Disease (LOAD). While the appreciation for epigenetic contributions to LOAD has grown in recent years, further work is needed to understand DNAm’s role in the disease. Using a large prospective cohort composed of patients with LOAD, Mild Cognitive Impairment (MCI), and age-matched controls, we seek to understand the relationship between DNAm and LOAD. Two primary questions guide our analysis.
First, how do the methylation signatures of LOAD, MCI, and control patients differ from one another? To answer this, we identify differentially methylated regions using classical statistical techniques with appropriate false discovery rate control. Early results reveal a suite of differentially methylated regions, many of which have not been reported before.
Second, how does the presence of cis-regulatory elements (CREs) interact with single-nucleotide variants (SNVs) to modulate methylation levels? We use a deep learning model to predict methylation levels from a participant’s genotype paired with local CREs. Preliminary analysis suggests that genotypic data paired with reference candidate CREs (cCREs) can recapitulate an individual’s DNAm profile.
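For the first question, the abstract mentions false discovery rate control without naming the procedure. The Benjamini-Hochberg step-up procedure is a standard choice and is assumed in this Python sketch; the per-region p-values are hypothetical inputs that would come from whatever test compares methylation across the LOAD, MCI, and control groups.

```python
import numpy as np


def benjamini_hochberg(pvalues, alpha: float = 0.05) -> np.ndarray:
    """Boolean mask of hypotheses rejected at false discovery rate alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up rule: find the largest k with p_(k) <= (k / m) * alpha ...
    passes = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()
        reject[order[:k + 1]] = True  # ... and reject the k smallest p-values
    return reject


# Hypothetical per-region p-values from a test of methylation difference between groups.
region_p = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(region_p))  # [ True False False False]
```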

X-012: Modular Hybrid Systems for Gene Regulatory System Analysis
COSI: RegSys
  • Kārlis Čerāns, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Lelde Lace, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Gatis Melkus, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Juris Viksna, Institute of Mathematics and Computer Science, University of Latvia, Latvia
  • Sandra Siliņa, Institute of Mathematics and Computer Science, University of Latvia, Latvia


Presentation Overview: Show

We introduce a novel modular hybrid system formalism that allows the identification of independent system components while modelling both the discrete and continuous aspects of hybrid system behaviour. We use the formalism to model the biologically well-known lambda phage system, demonstrating the possibilities for model analysis at different levels of abstraction, and also apply it to several new phage models. We provide analytical and visual tools that capture the essential aspects of the hybrid system's behaviour space and show how it depends on quantitative relations among model parameters. The new contributions include methods for identifying stable regions in model state spaces and switching conditions that irrevocably lead the system into a single region of stability. Notably, we introduce the notion of hybrid system projections, which allows models to be built and analysed in a modular way.
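The formalism itself is not spelled out in the abstract, but the flavour of a hybrid gene-regulatory model can be shown with a piecewise-linear (Glass-type) network: threshold crossings define discrete modes, and the dynamics inside each mode are linear ODEs. This Python sketch is a toy CI/Cro mutual-repression switch loosely inspired by the lambda phage decision; the rate `k`, decay `gamma`, and threshold `theta` are hypothetical values, and this is not the authors' modular formalism or their projection operation.

```python
import numpy as np


def simulate_switch(x0, steps=2000, dt=0.01, k=1.0, gamma=1.0, theta=0.5):
    """Euler simulation of a two-gene piecewise-linear mutual-repression switch.

    State x = (ci, cro). The discrete mode is the pair of on/off production
    switches, set by whether each repressor lies below threshold theta. Within a
    mode the dynamics are linear: d(ci)/dt = k * [cro < theta] - gamma * ci, and
    symmetrically for cro. Returns the final state and the mode at every step.
    """
    x = np.array(x0, dtype=float)
    modes = []
    for _ in range(steps):
        ci_on = x[1] < theta   # CI is produced only while Cro is below threshold
        cro_on = x[0] < theta  # Cro is produced only while CI is below threshold
        modes.append((bool(ci_on), bool(cro_on)))
        production = k * np.array([ci_on, cro_on], dtype=float)
        x = x + dt * (production - gamma * x)
    return x, modes


final, modes = simulate_switch((0.8, 0.1))
print(final.round(3), modes[-1])  # CI-high region, mode (True, False)
```

Starting from (0.8, 0.1) the system settles in the CI-high region, and starting from (0.1, 0.8) it settles in the Cro-high region; these two attractors play the role of regions of stability, and the threshold `theta` is where switching conditions would be analysed.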

X-013: Liam tackles complex multimodal single-cell data integration challenges
COSI: RegSys
  • Pia Rautenstrauch, Humboldt-Universität zu Berlin & Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Germany
  • Uwe Ohler, Humboldt-Universität zu Berlin & Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Germany


Presentation Overview: Show

Paired multimodal single-cell sequencing data provides unprecedented insight into the molecular state of cells but poses complex challenges for data integration. Data with complementary information and distinct statistical properties need to be combined (vertical integration), and intricate study designs and meta-analyses require sophisticated non-linear batch effect removal (horizontal integration).

Here, we present liam (leveraging information across modalities), a deep generative model for vertical and horizontal integration of paired multimodal single-cell data. Liam learns a joint low-dimensional representation of two single-cell modalities while accounting for batch effects. It supports pairwise combinations of gene expression, chromatin accessibility, and cell surface protein measurements. Liam ranked 2nd for CITE-seq data and 4th for Multiome data (online training) in the 2021 NeurIPS Multimodal Single-Cell Data Integration competition. Additionally, we apply liam to a treatment-control dataset with replicates. By integrating the data with different choices of batch variable, we disentangle technical from biological variation and retain selected treatment effects in the data representation, illustrating liam's batch effect removal capabilities and its flexibility with respect to study design.

Our results demonstrate that liam successfully extends previous approaches for unimodal batch integration to the multimodal setting and will help unlock the potential of paired multimodal single-cell sequencing data.

X-014: Chromatin dynamics from genetic perturbations are associated with transcription and reveal a larger gene regulatory network
COSI: RegSys
  • Kevin Moyung, Duke University, United States
  • Yulong Li, Duke University, United States
  • Alexander Hartemink, Duke University, United States
  • David MacAlpine, Duke University, United States


Presentation Overview: Show

Epigenetic mechanisms contribute to gene regulation by altering the accessibility of chromatin, resulting in changes in transcription factor (TF) and nucleosome occupancy throughout the genome. A major challenge is defining and dissecting this complex chromatin-mediated code to model regulation and predict gene expression. Existing assays such as ChIP-seq profile only a single factor at a time, and expression-based studies alone cannot distinguish between direct and indirect regulation.

We address this by employing a factor-agnostic, reverse-genetics approach: using MNase-seq, we capture genome-wide TF and nucleosome occupancies in response to the individual deletion of 201 transcriptional regulators in yeast. Analyzing differences in TF and nucleosome organization recapitulated well-established pathways. We found major chromatin changes associated with differential expression and determined whether the underlying regulation was direct or indirect by incorporating motif and binding evidence for the TFs and their target genes. Analysis of differential chromatin revealed potentially novel interactions that expression-based studies did not capture, such as locus-specific chromatin changes that may prime a locus for an efficient transcriptional response. Overall, this approach allows us to closely examine the interplay between TFs and nucleosomes genome-wide and to generate a larger, more complete regulatory network, providing a deeper understanding of the complex relationship between chromatin organization and gene regulation.

X-015: A Vision Transformer-based approach for identifying key markers in chromatin state associated with transcription
COSI: RegSys
  • Alexander Hartemink, Duke University, United States
  • Trung Tran, Duke University, United States


Presentation Overview: Show

In eukaryotic cells, chromatin exists in a complex and constantly changing state. This chromatin state is partially determined by the dynamic binding of proteins and other DNA binding factors (DBFs), including histones, transcription factors (TFs), and polymerases, which interact with one another, the genome, and other molecules in an enormous number of possible configurations. Understanding how changing chromatin configurations associate with transcription remains a fundamental research problem. To address this problem, we developed a neural network model based on Vision Transformers to predict gene expression from chromatin state alone. We trained our models on high-resolution chromatin state, captured using MNase-seq, to predict strand-specific gene expression. While the flexibility of our models allows easy extension to other chromatin datasets, we were able to accurately predict transcript levels, without overfitting, with an R² of 0.6 using MNase-seq alone. We used the learned attention weights in our transformer networks to identify novel chromatin features that precisely classify modes of gene expression.
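A minimal numpy sketch of the two Vision Transformer ingredients mentioned above: cutting a 2D chromatin-state matrix into patches (tokens) and computing self-attention weights over them. It assumes the MNase-seq input is arranged as a fragment-length by position matrix; the weights are random rather than learned, and positional embeddings, multiple heads, and the regression head are omitted, so the attention values here carry no biological meaning.

```python
import numpy as np


def patchify(matrix: np.ndarray, patch: int) -> np.ndarray:
    """Cut a 2D matrix into non-overlapping patch x patch tiles, one flattened row per tile."""
    rows, cols = matrix.shape[0] // patch, matrix.shape[1] // patch
    m = matrix[:rows * patch, :cols * patch]  # drop edges that do not fill a whole tile
    return (m.reshape(rows, patch, cols, patch)
             .transpose(0, 2, 1, 3)
             .reshape(rows * cols, patch * patch))


def self_attention(tokens: np.ndarray, d_model: int = 16, seed: int = 0):
    """Single-head scaled dot-product self-attention with random (untrained) weights.
    Returns the attended token embeddings and the (n_tokens x n_tokens) attention matrix."""
    rng = np.random.default_rng(seed)
    embed = rng.normal(scale=tokens.shape[1] ** -0.5, size=(tokens.shape[1], d_model))
    wq, wk, wv = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(3))
    x = tokens @ embed
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d_model)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v, attn


chromatin = np.random.default_rng(1).poisson(2.0, size=(8, 12)).astype(float)  # toy MNase-seq matrix
tokens = patchify(chromatin, 4)   # 2 x 3 tiles -> 6 tokens of length 16
out, attn = self_attention(tokens)
print(tokens.shape, attn.shape)   # (6, 16) (6, 6)
```

In a trained model, each row of `attn` says how much one chromatin patch draws on every other patch, which is the quantity inspected when attention weights are used to point at informative chromatin features.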

X-016: De novo patient subtype and phenotype discovery from single-cell data using UDON
COSI: RegSys
  • Kairavee Thakkar, University of Cincinnati, United States
  • Emely Verweyen, Cincinnati Children's Hospital Medical Center, United States
  • Grant Schulert, Cincinnati Children's Hospital Medical Center, United States
  • Nathan Salomonis, Cincinnati Children's Hospital Medical Center, United States


Presentation Overview: Show

Single-cell genomics has yielded impressive insights into shared disease programs among patients with similar phenotypes. However, many diseases are inherently heterogeneous, with differing underlying genetics and gene regulatory programs across patients with similar clinical phenotypes. Hence, defining patient-specific subtypes and phenotypic associations is likely to uncover new disease programs and targets for precision therapy. We describe a new approach called Unsupervised Discovery Of Novel disease programs (UDON) from larger single-cell patient cohorts. Using sparse non-negative matrix factorization, UDON defines clusters from patient pseudobulks normalized against healthy controls. Applied to peripheral blood mononuclear cells (PBMCs) from five controls and 21 Systemic Juvenile Idiopathic Arthritis (SJIA) patients with heterogeneous disease states, UDON finds shared patient subtypes in distinct immune cell populations that are associated with previously defined disease pathways, including heterogeneous induction of complement activation in patients with SJIA lung disease (SJIA-LD) and interferon-gamma signaling in macrophage activation syndrome (SJIA-MAS). Using an associated new approach called Statistical Association Test for ClinicAl PhenotYpes (SATAY-UDON), we find that these clusters frequently associate with different clinical covariates, which could be experimentally validated directly from donor serum. Extension of UDON to additional diseases provides significant new insights into the molecular, cellular, and patient heterogeneity in diverse pathological conditions.
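As described, UDON normalizes patient pseudobulks against healthy controls and then applies sparse non-negative matrix factorization (NMF). This Python sketch illustrates that idea under stated assumptions: the normalization (log2 fold change over the mean control profile, clipped at zero) and the sparsity mechanism (an L1 penalty on the loading matrix `H` inside Lee-Seung multiplicative updates) are choices made for illustration, not UDON's published settings. Each pseudobulk is labelled by its dominant program.

```python
import numpy as np


def normalize_to_controls(patients, controls, pseudocount=1.0):
    """patients: genes x patient pseudobulks; controls: genes x control pseudobulks
    (same cell type). Returns log2 fold changes over the mean control profile,
    clipped at zero so the matrix is non-negative (assumption: up-regulation only)."""
    reference = np.log2(controls + pseudocount).mean(axis=1, keepdims=True)
    return np.clip(np.log2(patients + pseudocount) - reference, 0.0, None)


def sparse_nmf(V, k, l1=0.1, iters=500, seed=0, eps=1e-9):
    """V ~ W @ H with W, H >= 0, via multiplicative updates; the l1 term shrinks H,
    so each pseudobulk tends to load on few programs."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + l1 + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        norms = np.linalg.norm(W, axis=0) + eps  # fix the W/H scale ambiguity
        W /= norms
        H *= norms[:, None]                      # W @ H is unchanged
    return W, H


# Toy data: two gene programs, each up-regulated in two of four patient pseudobulks.
V = np.zeros((6, 4))
V[:3, :2] = 5.0
V[3:, 2:] = 5.0
W, H = sparse_nmf(V, k=2)
print(H.argmax(axis=0))  # cluster label (dominant program) per pseudobulk
```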

X-017: Global TF Network Inference in Humans
COSI: RegSys
  • Sandeep Acharya, Washington University in St Louis, United States
  • Woo Jung, Washington University in St Louis, United States
  • Thomas Westbrook, Washington University in St Louis, United States


Presentation Overview: Show

We report on a new TF network map of the direct-functional targets of human transcription factors (TFs). Direct-functional regulators of a gene are TFs that bind in the gene’s regulatory DNA and modulate its transcription rate. We use data on TF binding locations and gene expression levels in response to TF perturbation. We aim to build a comprehensive global TF network map by performing large-scale integration of binding and perturbation data and overcoming challenges in combining evidence from two data types.

One such challenge is the lack of overlap between the genes in whose regulatory DNA a TF binds and the genes that respond to perturbation of that TF. This problem is partly an artifact of using arbitrary significance thresholds, which we address with Dual Threshold Optimization (DTO). We further improve the convergence of evidence by using network edges inferred by the TF network mapping algorithm NetProphet together with DTO. We build a global TF network map by combining TF binding data from ReMap 2020 with TF perturbation data from several human TF perturbation studies across cell types and tissues. To evaluate the biological relevance of our network, we perform GO enrichment analysis on the gene set regulated by each TF in our network and in other state-of-the-art networks.
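Dual Threshold Optimization, as summarized above, replaces fixed significance cutoffs with a search over a binding cutoff and a response cutoff. A minimal Python sketch, assuming both evidence types are available as gene rankings over the same gene universe and that overlap significance is scored with a hypergeometric upper tail; the cutoff grid and the function names are hypothetical. Because the best p-value is chosen by search, it is biased and needs its own null distribution (for example, from permuted rankings), which this sketch omits.

```python
from math import comb
from typing import List, Sequence, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K bound genes, n responsive genes)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def dual_threshold(binding_rank: List[str], response_rank: List[str],
                   grid: Sequence[int]) -> Tuple[float, int, int, int]:
    """binding_rank / response_rank: gene IDs ordered from strongest to weakest evidence.
    Returns (p_value, binding_cutoff, response_cutoff, overlap) for the most significant pair."""
    N = len(binding_rank)
    best = None
    for b in grid:
        bound = set(binding_rank[:b])
        for r in grid:
            overlap = len(bound & set(response_rank[:r]))
            p = hypergeom_sf(overlap, N, b, r)
            if best is None or p < best[0]:
                best = (p, b, r, overlap)
    return best


genes = [f"g{i}" for i in range(20)]
binding = genes                          # hypothetical binding ranking
response = genes[:5] + genes[19:4:-1]    # top 5 agree, the rest are reversed
print(dual_threshold(binding, response, grid=[5, 10, 15]))  # best cutoffs: b=5, r=5, overlap 5
```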

X-018: Identifying cell context-specific regulatory programs from spatial transcriptomics data
COSI: RegSys
  • April Sagan, University of Pittsburgh, United States
  • Hatice Osmanbeyoglu, University of Pittsburgh, United States


Presentation Overview: Show

Elaborately orchestrated transcriptional programs distinguish specialized cell types and define their functionality. Combinations of transcription factors (TFs) drive these transcriptional programs and control cellular identity and functional state. Neighboring cells of other types are also critical for instructing cell-specific transcriptional programs. Rapid advances in spatial transcriptomics (ST) technologies offer highly multiplexed profiling of RNAs while preserving the spatial context of the tissue. Multiple computational methods have been introduced to analyze ST data, including methods that identify spatial patterns of gene expression. However, it is not yet clear how best to leverage these omics datasets to systematically estimate the TF activities that influence cell states related to human health and disease.

We develop a linear mixed-effects model that integrates gene expression, cis-regulatory information, spatial data, and imaging data to reveal cell context-specific transcriptional programs. By combining the features at each spot with those of its neighboring spots, we infer spatially resolved, spot-specific TF activities. We apply our method to a publicly available breast cancer spatial transcriptomics dataset to identify spatially variable TFs, cluster spots into regions with distinct regulatory features, and quantify the spatial relationship between TF activities and cell types.
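The abstract does not spell out how spot and neighbour features are combined. A simple, assumed construction is to append, for each spot, the mean feature vector of its k nearest spots, which a downstream model (such as the linear mixed-effects model described above, not shown here) could then use. The default `k=6` reflects the six immediate neighbours on a hexagonal spot grid such as 10x Visium; it is an assumption, not a reported setting, and the function name is hypothetical.

```python
import numpy as np


def neighbor_augmented_features(coords: np.ndarray, features: np.ndarray, k: int = 6) -> np.ndarray:
    """coords: spots x 2 array of spot positions; features: spots x genes.
    Returns spots x (2 * genes): each spot's own features followed by the mean
    features of its k nearest other spots."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                   # exclude the spot itself
    nearest = np.argsort(dist, axis=1)[:, :k]        # spots x k neighbour indices
    neighbour_mean = features[nearest].mean(axis=1)  # spots x genes
    return np.hstack([features, neighbour_mean])


coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
features = np.array([[1.0], [2.0], [3.0]])
print(neighbor_augmented_features(coords, features, k=1))
```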

X-019: Detecting higher-order structural changes in 3D genome organization with multi-task matrix factorization
COSI: RegSys
  • Da-Inn Lee, University of Wisconsin-Madison, United States
  • Sushmita Roy, University of Wisconsin-Madison, United States


Presentation Overview: Show

Three-dimensional (3D) genome organization, which determines how DNA is packaged inside the nucleus, has emerged as a key regulatory mechanism of cellular processes. High-throughput chromosome conformation capture (Hi-C) technologies have enabled the study of 3D genome organization by experimentally measuring interactions among genomic regions in 3D space. Analysis of Hi-C data has revealed higher-order organizational units such as topologically associating domains (TADs). Changes or disruptions to such units have been associated with disease, development, and evolution. Therefore, a key problem in regulatory genomics is to systematically detect higher-order structural changes across Hi-C datasets from multiple conditions. Existing computational methods either do not model higher-order structural units or only compare pairs of Hi-C datasets. We address these limitations with Tree-Guided Integrated Factorization (TGIF), a new multi-task non-negative matrix factorization (NMF) approach. TGIF models complex relationships among multiple Hi-C datasets as a tree such that closely related Hi-C datasets have similar lower-dimensional representations. TGIF provides a statistically significant set of differential TAD boundaries with higher precision than existing approaches. Application to a cardiomyocyte differentiation time-course dataset identified time point-specific TAD boundaries overlapping a retrotransposon element previously shown to be important for cell fate specification in humans and apes.

X-020: Genome-wide Transcriptional Regulatory Network of Human Immunological Memory Using Single-Cell Multiome-seq
COSI: RegSys
  • Alexander Katko, Cincinnati Children's, United States
  • Michael Kotliar, Cincinnati Children's, United States
  • Svetlana Korinfskaya, Cincinnati Children's, United States
  • Joseph Wayman, Cincinnati Children's, United States
  • Leah Kottyan, Cincinnati Children's, United States
  • Artem Barski, Cincinnati Children's, United States
  • Emily Miraldi, Cincinnati Children's, United States


Presentation Overview: Show

Immunological memory is induced by an initial exposure to a pathogen or vaccination and allows the organism to respond more rapidly and efficiently to a repeat encounter with the same pathogen. In T cells, effector memory cells can start secreting cytokines within minutes of re-exposure, compared with days for naive cells. Previous work demonstrated that memory correlates with the priming of genomic regulatory elements proximal to rapid-recall genes; however, causality has not yet been established.

Here, we present a transcriptional regulatory network (TRN) of immunological memory using human CD4+ T cell single-cell multiome-seq data (parallel gene expression and chromatin accessibility). TRNs explain cellular behavior by describing interactions between transcription factors and their gene targets. Pairing gene expression and chromatin accessibility profiles will improve cell type identification and TRN inference. The experimental strategy includes a “spike-in” control that will be utilized to distinguish true biological signal from batch effect. The TRN will be used to delineate the transcription factors that induce and maintain the memory-dependent “poised” regulatory elements near rapid-recall genes, identifying potential molecular mediators of long-term maintenance of immune memory. These models will help improve understanding of immune memory and will guide the development of new vaccinations and therapies for autoimmunity.

X-021: Global transcriptional profiling of the effects of sleep deprivation at the gene, transcript, and single-nuclear level
COSI: RegSys
  • Kaitlyn Ford, Department of Translational Medicine and Physiology, Washington State University, United States
  • Elena Zuin, University of Padova, Italy
  • Dario Righelli, University of Padova, Italy
  • Christine Muheim, Department of Translational Medicine and Physiology, Washington State University, United States
  • Alexander Popescu, Department of Translational Medicine and Physiology, Washington State University, United States
  • Elizabeth Medina, Department of Translational Medicine and Physiology, Washington State University, United States
  • Hannah Schoch, Department of Translational Medicine and Physiology, Washington State University, United States
  • Kristan Singletary, Department of Translational Medicine and Physiology, Washington State University, United States
  • Stephanie Hicks, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, United States
  • Davide Risso, University of Padova, Italy
  • Lucia Peixoto, Department of Translational Medicine and Physiology, Washington State University, United States


Presentation Overview: Show

It is well established that sleep deprivation (SD) has broad effects on cortical gene expression (Franken et al., 1999; Gerstner et al., 2016; Hor et al., 2019; Mackiewicz et al., 2007; Maret et al., 2007). However, it is unclear how three, five, and six hours of SD differ from one another, what occurs at the transcript level, and which cell types mediate these changes within the cortex. We studied how different SD conditions correlate with molecular changes while maintaining reproducibility across studies, combining our own data with public data to answer these questions through a comprehensive and reproducible lens. We detected differences in molecular changes that depend on the amount of SD and on the response to recovery sleep. Differences in alternative splicing events were also detected. Additionally, we discovered that SD preferentially affects glutamatergic neurons. This comprehensive view helps us further understand the deleterious effects of SD.

X-022: Systems-level transcriptional regulation of C. elegans metabolism
COSI: RegSys
  • Shivani Nanda, University of Massachusetts Medical School, United States
  • Wen Wang, University of Minnesota, United States
  • Xuhang Li, University of Massachusetts Medical School, United States
  • Hefei Zhang, University of Massachusetts Medical School, United States
  • Chad Myers, University of Minnesota, United States
  • Lutfu Safak Yilmaz, University of Massachusetts Medical School, United States
  • Albertha Jm Walhout, University of Massachusetts Medical School, United States


Presentation Overview: Show

Metabolism is precisely controlled to ensure organismal development and homeostasis under fluctuating dietary and environmental conditions. There are multiple mechanisms by which metabolism is regulated, including allostery and transcription of metabolic genes. However, the extent to which transcription contributes to metabolic control at a network level is not known because existing studies have focused on individual metabolic genes and pathways. In this study, we used available gene expression datasets to study transcriptional regulation of metabolism in the nematode Caenorhabditis elegans at a network level. We found that more than two-thirds of C. elegans metabolic genes exhibit differential gene expression during development, and across different tissues and conditions. Using a supervised approach, we found that many metabolic pathways are transcriptionally coregulated. We also used an unsupervised approach to define pathway boundaries within the metabolic network. Finally, we exploited pathway-level coexpression of genes to connect orphan genes to the metabolic network and annotate transcription factors to coexpressed pathways. Our findings show that transcriptional regulation of metabolism plays a much greater role in varied biological processes than commonly believed. This study broadens our understanding of systems-level transcriptional regulation of metabolism and provides a blueprint for similar studies in humans.
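The abstract does not define its coexpression statistic. One simple, assumed choice for asking whether a pathway is transcriptionally coregulated is the mean pairwise Pearson correlation of its genes across samples (developmental stages, tissues, conditions). This Python sketch computes it with a hypothetical function name; judging significance would additionally need a null, such as random gene sets of the same size, which is not shown.

```python
import numpy as np


def pathway_coexpression(expr: np.ndarray, genes: list, pathway: list) -> float:
    """Mean pairwise Pearson correlation among pathway genes.

    expr: genes x samples expression matrix; genes: row names of expr;
    pathway: gene names in the pathway (names absent from expr are ignored)."""
    idx = [genes.index(g) for g in pathway if g in genes]
    if len(idx) < 2:
        raise ValueError("need at least two pathway genes present in expr")
    r = np.corrcoef(expr[idx])                               # pairwise gene-gene correlations
    return float(r[np.triu_indices(len(idx), k=1)].mean())  # each pair counted once


expr = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0],
                 [3.0, 1.0, 2.0]])
print(pathway_coexpression(expr, ["a", "b", "c"], ["a", "b"]))  # ~1.0 (perfectly coexpressed)
```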

X-023: Base-resolution deep learning models of chromatin accessibility reveal combinatorial sequence motif syntax and regulatory variation
COSI: RegSys
  • Anusri Pampari, Stanford University, United States
  • Anna Shcherbina, Stanford University, United States
  • Surag Nair, Stanford University, United States
  • Avanti Shrikumar, Stanford University, United States
  • Aman Patel, Stanford University, United States
  • Austin Wang, Stanford University, United States
  • Soumya Kundu, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States


Presentation Overview: Show

Chromatin accessibility profiles (DNase-seq and ATAC-seq) exhibit multi-resolution shapes and spans that are regulated by cooperative binding of transcription factors (TFs). This landscape is challenging to mine because of confounding bias from the assay-specific enzymes (DNase-I and Tn5). Existing methods struggle to account for enzyme bias and base-resolution complexity and thus miss the high-resolution architecture of TF profiles. Here we introduce ChromBPNet to address both of these aspects.

ChromBPNet is an optimized convolutional neural network architecture that models the influence of genomic sequence context on base-resolution chromatin accessibility profiles. ChromBPNet trained on five ENCODE canonical cell lines achieved superior predictive performance in held-out chromosomes, while optimally regressing out DNase-I/Tn5 enzyme bias. The models are highly performant over a range of sequencing depths, while de-noising and de-sparsifying low coverage signal profiles at individual cREs.

We improved interpretation methods for de novo inference of the contribution of individual nucleotides across all putative cREs in the genome, thereby revealing predictive motif instances and their combinatorial interaction effects on base-resolution profiles.

Finally, we developed a new variant effect score which predicts the impact of non-coding variants on the strength and shape of base-resolution chromatin profiles. Our models accurately predict quantitative trait loci associated with binding and accessibility in lymphoblastoid cell lines.
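The abstract introduces a new variant effect score but does not define it. Models in the BPNet family predict both a total count and a base-resolution profile shape for an input sequence, so a natural way to compare the reference and alternate alleles (assumed here, not ChromBPNet's documented score) is to report the log fold change in predicted counts (strength) and the Jensen-Shannon divergence between predicted profiles (shape). The inputs below are hypothetical predictions; no model is called.

```python
import numpy as np


def variant_effect(ref_counts: float, alt_counts: float, ref_profile, alt_profile,
                   eps: float = 1e-12):
    """Compare model predictions for the reference vs. alternate allele.

    ref_counts / alt_counts: predicted total accessibility counts in the window.
    ref_profile / alt_profile: predicted per-base profiles (non-negative, any scale).
    Returns (log2 fold change in counts, Jensen-Shannon divergence of the
    normalized profiles in bits; 0 = identical shape, 1 = disjoint)."""
    lfc = np.log2((alt_counts + eps) / (ref_counts + eps))
    p = np.asarray(ref_profile, dtype=float)
    q = np.asarray(alt_profile, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log(0 / b) is taken as 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return float(lfc), 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical predictions: the alternate allele doubles accessibility without changing shape.
print(variant_effect(100.0, 200.0, [1, 2, 1], [2, 4, 2]))
```

Separating strength from shape matters because a variant can disrupt a footprint (changing shape) while leaving total accessibility in the window nearly unchanged.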

X-024: Regulation of capsule production in C. neoformans through changes in TF activity
COSI: RegSys
  • Cynthia Z. Ma, Washington University in St Louis, United States
  • Michael R. Brent, Washington University in St Louis, United States


Presentation Overview: Show

Cryptococcus neoformans is a pathogenic fungus, and a major virulence factor is its capsule, which protects it from the host's immune system. Certain transcription factors (TFs) are known to affect capsule production, as India ink imaging of TF knockout (KO) mutants shows differences in capsule phenotype compared with wild-type strains. However, phenotyping KO mutants alone is insufficient to clarify the role of TFs and their activity patterns in the signaling pathway that triggers capsule production in response to host-like conditions.

In this work, we infer the TF activity (TFA) levels from measured target gene expression, first in samples from TF knockout (TFKO) strains to objectively validate that TFA inference can identify meaningful activity changes, and then in samples subjected to combinations of host-like environmental signals, such as changes in temperature, CO2 concentration, and nutrient availability. The inferred activity values, as well as measured capsule phenotypes, are used to model the relationship between external signals and TFA, as well as between TFA and capsule production. We show how certain host-like conditions trigger changes in the activity of certain TFs, allowing us to hypothesize more specific links between signals and TFs, as well as make capsule phenotype predictions for knock-out mutants that had yet to be imaged. We report experimental data showing that some of those TF knockouts did in fact alter capsule size.
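The abstract does not state the inference model. One common formulation, assumed here, treats each target gene's expression change as a linear combination of TF activities weighted by a known signed network (as in network component analysis) and estimates the activities by least squares. In a TF knockout sample, the deleted TF's estimated activity should drop, which is the kind of validation the abstract describes. The network, expression values, and function name are hypothetical.

```python
import numpy as np


def infer_tfa(expression: np.ndarray, network: np.ndarray) -> np.ndarray:
    """expression: genes x samples (e.g. log2 fold change vs. a reference condition).
    network:    genes x TFs signed regulatory weights (0 = no edge).
    Returns TFs x samples activities A minimising ||expression - network @ A||^2."""
    activity, *_ = np.linalg.lstsq(network, expression, rcond=None)
    return activity


# Toy network: TF0 activates genes 0-1; TF1 activates gene 2 and represses gene 3.
network = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, -1.0]])
# Sample 0 mimics a TF0 knockout (its targets drop); sample 1 mimics a TF1-activating condition.
expression = np.array([[-2.0, 0.0],
                       [-2.0, 0.0],
                       [0.0, 1.5],
                       [0.0, -1.5]])
print(np.round(infer_tfa(expression, network), 3))
```

Here the estimate recovers a TF0 activity of -2 in the knockout-like sample and a TF1 activity of 1.5 in the second sample; with real data, the network would be only partially known and the fit would not be exact.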

X-025: CREMA: Extracting Gene Regulation Mechanisms from Single Cell Multi-omics Assays
COSI: RegSys
  • Zidong Zhang, Princeton University, United States
  • Frédérique Ruf-Zamojski, Icahn School of Medicine at Mount Sinai, United States
  • Daniel Bernard, McGill University, Canada
  • Stuart Sealfon, Icahn School of Medicine at Mount Sinai, United States
  • Olga Troyanskaya, Princeton University, Flatiron Institute, United States


Presentation Overview: Show

Inferring gene regulation mechanisms is critical to understanding metazoan cell differentiation and diversity. Recent breakthroughs in single-cell multi-omics technologies enable simultaneous measurement of the transcriptome and the epigenetic state of the genome in the same cell. Here we present CREMA (Control of Regulation Extracted from Multi-omics Assays), the first computational framework for extracting regulatory mechanisms, including transcription factors (TFs) and regulatory domains, from single-cell multi-omics datasets. CREMA has two advantages: 1) it incorporates chromatin accessibility to identify direct TF-target relations rather than indirect correlations; and 2) it identifies regulatory domains in both proximal and distal regions. We showed that CREMA outperforms inference from only one modality of the multi-omics dataset in reconstructing regulatory networks and identifying distal regulatory domains. We validated CREMA’s predictions using mutation datasets and physical evidence from ChIP-seq and Hi-C profiles. Finally, we applied CREMA to identify the regulatory TFs in both proximal and distal regulatory domains that explain the cell type-specific expression of genes in mouse pituitary tissue. Overall, CREMA is a powerful framework for interpreting gene regulation mechanisms from single-cell multi-omics datasets.

X-026: Exploring the role of genetic ancestry in EWS-FLI1 induced epigenetic and transcriptional programs in Ewing Sarcoma tumorigenesis
COSI: RegSys
  • Rachel M. Moss, University of Minnesota, United States
  • Kelsie Becklin, University of Minnesota, United States
  • Lauren J. Mills, University of Minnesota, United States
  • Branden S. Moriarity, University of Minnesota, United States
  • Beau R. Webber, University of Minnesota, United States
  • Logan G. Spector, University of Minnesota, United States


Presentation Overview: Show

Ewing Sarcoma (ES) is a rare but deadly bone tumor, with little improvement in survival for decades despite knowledge of the driving fusion oncoprotein EWS-FLI1. ES incidence in children of European ancestry is nearly ten times that in children of primarily African ancestry. To better understand how EWS-FLI1 transforms the transcriptome, we leveraged this ancestry difference to study ES tumorigenesis. Eight iPSC lines with a range of % African ancestry were obtained, differentiated into neural crest cells, and then transduced with a lentivirus expressing GFP-2A-EWS/FLI1. Interestingly, we identified 3,189 differentially expressed genes (DEGs) associated with % African ancestry at 96 hours post induction, including ancestry-specific changes in the expression of the known EWS-FLI1 target CCND1. At 96 hours post induction, we also observed 663 peaks differentially bound by EWS-FLI1 by ancestry, mapping to 364 unique genes. Eighty of these genes are both differentially expressed and differentially bound based on African ancestry and may be among the early critical targets that start the cascade of molecular changes in the transcriptome and chromatin remodeling in ES. As EWS-FLI1 itself has proven elusive to direct targeting, studying its immediate downstream effects has the potential to establish new druggable biological pathways for the treatment of ES.

X-027: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin
COSI: RegSys
  • Meghana Kshirsagar, Microsoft, United States
  • Han Yuan, Calico Labs, United States
  • Juan Lavista Ferres, Microsoft, United States
  • Christina Leslie, Memorial Sloan-Kettering Cancer Center, United States


Presentation Overview:

We present BindVAE, a novel unsupervised deep learning approach based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. BindVAE disentangles an input DNA sequence into distinct latent factors that encode generally expressed TFs, composite patterns for TFs involved in cooperative binding, and the genomic context surrounding binding sites, and in doing so learns cell-type-specific in vivo binding signals. On the task of retrieving expressed TF motifs for a given cell type, BindVAE achieves higher precision than other motif discovery approaches.

X-028: Collocation analysis of genomic intervals
COSI: RegSys
  • Tao Ma, Mayo Clinic, United States
  • Liguo Wang, Mayo Clinic, United States


Presentation Overview:

Significantly collocated genomic intervals suggest functional relevance, dependency, and genetic interaction. Collocation analysis of genomic intervals is widely used to annotate, integrate, and explore genomic variants and features identified in genome-wide studies. Unlike overlap analysis of gene sets, collocation analysis of two sets of genomic intervals is ill-defined and is usually done in an arbitrary, suboptimal, and oversimplified manner. Here, we introduce and evaluate new metrics to rigorously and quantitatively measure the colocalization of genomic intervals. We demonstrate that these approaches can rediscover co-factors and master regulators that traditional methods miss, using ChIP-seq, ATAC-seq, and single-cell ATAC-seq data. Source code and documentation are freely available at https://cobind.readthedocs.io/en/latest/.
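To make the idea of quantifying collocation concrete, the sketch below computes one classic baseline, the base-pair Jaccard index between two interval sets (base pairs covered by both sets divided by base pairs covered by either). This is an illustrative assumption, not necessarily one of the new metrics the authors introduce, and the function names are hypothetical rather than part of the cobind API. Intervals are assumed to be BED-style (chrom, start, end) tuples with 0-based, half-open coordinates.

```python
from typing import List, Tuple

# Hypothetical interval type: (chrom, start, end), 0-based half-open like BED.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or touching ones on each chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of distinct base pairs covered by an interval set."""
    return sum(end - start for _, start, end in merge(intervals))


def intersection_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs covered by both sets, via a two-pointer sweep over merged lists."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Both lists are sorted by chromosome name, so advance the smaller one.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        total += max(0, min(end_a, end_b) - max(start_a, start_b))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return total


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A intersect B| / |A union B| (0.0 if both are empty)."""
    inter = intersection_bp(a, b)
    union = covered_bp(a) + covered_bp(b) - inter
    return inter / union if union > 0 else 0.0
```

A score of 0 means the sets share no bases and 1 means identical coverage. Judging whether an observed score is significant would additionally require a null model (for example, shuffled intervals), which this sketch omits.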

X-029: preciseTAD: a machine-learning framework for predicting boundaries of 3D genomic elements
COSI: RegSys
  • Mikhail Dozmorov, Virginia Commonwealth University, United States
  • Spiro Stilianoudakis, Virginia Commonwealth University, United States
  • Maggie Marshall, Virginia Commonwealth University, United States


Presentation Overview:

Chromosome conformation capture technologies (Hi-C) have revealed extensive folding of DNA into discrete 3D domains, such as Topologically Associating Domains (TADs) and chromatin loops. Correct binding of CTCF and cohesin at domain boundaries is integral to maintaining the proper structure and function of these 3D domains. 3D domains have been mapped at resolutions of 1 kilobase and coarser; however, it has not been possible to define their boundaries at the resolution of boundary-forming proteins.
To predict domain boundaries at base-pair resolution, we developed preciseTAD, an optimized transfer learning framework trained on high-resolution genome annotation data. In contrast to boundaries from current TAD/loop callers, preciseTAD-predicted boundaries are strongly supported by experimental evidence. Importantly, this approach can accurately delineate boundaries in cells without Hi-C data. preciseTAD provides a powerful framework for improving our understanding of how genomic regulators shape the 3D structure of the genome at base-pair resolution.
preciseTAD is an R/Bioconductor package available at https://bioconductor.org/packages/preciseTAD/.

X-030: Alternative splicing in single cell RNA-seq data
COSI: RegSys
  • Marie Van Hecke, UGent-IMEC, Belgium
  • Kathleen Marchal, UGent-IMEC, Belgium


Presentation Overview:

Alternative splicing is an important regulatory process within cells; it plays a role in many diseases and is abundant in cancer. Because of its cell-type-specific nature, it long remained understudied. With the rise of single-cell technologies, RNA-seq studies can now be performed at the resolution of single cells, so the mixture of cell types in bulk data no longer obscures splice-variant characterization. However, the increasingly large data sizes and the additional noise inherent to single-cell data, e.g. drop-outs, call for efficient and robust methods tailored to single cells. We propose a graph-based method that constructs gene-specific splice graphs from the reference genome and enriches them with additional splice events observed in the data. Splice variation in the graphs is then quantified by computing percent spliced in (PSI) values for each component in the graph, and probabilistic graphical models are used to estimate these PSI values accurately and robustly.
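As a minimal illustration of the quantity being estimated, PSI for a splice event is commonly computed as the fraction of informative reads that support inclusion of a component rather than its exclusion. The sketch below contrasts that naive ratio with a Beta-binomial posterior mean, a simple smoothing choice assumed here for illustration; it is not the authors' probabilistic graphical model, and the function names and the Beta(1, 1) prior are hypothetical. It also ignores the read-length and junction-count normalization that real tools typically apply.

```python
from typing import Optional


def naive_psi(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Fraction of informative reads supporting inclusion. Returns None when a cell
    has no informative reads (e.g. a drop-out), since PSI is then undefined."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def smoothed_psi(inclusion_reads: int, exclusion_reads: int,
                 alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of PSI assuming a Beta(alpha, beta) prior and binomial read
    counts. With few reads, the estimate is pulled toward alpha / (alpha + beta)."""
    return (inclusion_reads + alpha) / (inclusion_reads + exclusion_reads + alpha + beta)


# Hypothetical per-cell (inclusion, exclusion) read counts for one exon.
cells = [(3, 1), (0, 0), (40, 10)]
for inc, exc in cells:
    print(inc, exc, naive_psi(inc, exc), round(smoothed_psi(inc, exc), 3))
```

A cell with 3 inclusion reads and 1 exclusion read gets 0.75 naively but about 0.67 after smoothing, while a cell with no informative reads gets the prior mean 0.5 instead of no value. This illustrates why sparse single-cell counts benefit from a probabilistic estimate.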

X-031: Identifying cell-specific cis-regulatory interactions from single cell multi-omics data
COSI: RegSys
  • Sunduz Keles, University of Wisconsin-Madison, United States
  • Shuyang Chen, University of Wisconsin-Madison, United States


Presentation Overview:

Single-cell multi-omics experiments that jointly profile gene expression and chromatin accessibility in the same cell provide new opportunities for studying enhancer-gene interactions across cell populations. Existing methods have limited applicability because they assume a common network across all cells within a cell type and do not account for cellular heterogeneity. Inspired by cell-specific co-expression networks constructed from scRNA-seq data, we developed GEES, a method to identify cell-specific cis-regulatory interactions from multi-modal single-cell data.
When evaluated on benchmark datasets, GEES outperforms state-of-the-art methods for identifying enhancer-gene interactions. Detailed interrogation of cell-specific interacting gene-enhancer pairs reveals that, while these interactions tend to be shared by similar cell types, a gene can be regulated by different enhancers in a cell-type-specific manner. In addition to baseline analysis, which elucidates interacting gene-enhancer pairs, GEES enables downstream analysis of multi-modal single-cell data by using gene-enhancer pairs as units. GEES’s ability to quantify the heterogeneity of enhancer-gene interactions even within a single cell type makes it especially appealing for studying cell differentiation processes and cellular mechanisms of disease.

X-032: Leveraging epigenomes and three-dimensional genome organization for interpreting regulatory variation
COSI: RegSys
  • Jacob Schreiber, Stanford University, United States
  • William Stafford Noble, University of Washington, United States
  • Sushmita Roy, University of Wisconsin-Madison, United States
  • Brittany Baur, University of Wisconsin-Madison, United States
  • Junha Shin, University of Wisconsin-Madison, United States
  • Shilu Zhang, University of Wisconsin-Madison, United States
  • Yi Zhang, University of Illinois at Urbana-Champaign, United States
  • Mohith Manjunath, University of Illinois Urbana–Champaign, United States
  • Jun Song, University of Illinois at Urbana-Champaign, United States


Presentation Overview:

Understanding the impact of regulatory variants on complex phenotypes is a significant challenge because the genes and pathways targeted by such variants are typically unknown. Furthermore, a regulatory variant can influence a particular gene’s expression in a cell-type- or tissue-specific manner. Cell-type-specific long-range regulatory interactions between a distal regulatory sequence and a gene can be used to examine the impact of regulatory variants on complex phenotypes. However, high-resolution maps of such long-range interactions are available for only a handful of cell types. To address this challenge, we developed L-HiC-Reg, a Random Forests regression method to predict high-resolution contact counts in new cell types, and a network-based framework to identify candidate cell-type-specific gene networks targeted by variants from a genome-wide association study (GWAS). We generated a compendium of predicted interactions in 55 Roadmap Epigenomics Mapping Consortium cell types, which we used to interpret regulatory SNPs in the NHGRI GWAS catalogue and to identify downstream pathways across cell types for fifteen phenotypes. The compendium of long-range interactions and downstream pathways can be queried at https://pages.discovery.wisc.edu/~bbaur/Roadmap_RegulatoryVar/Roadmap/ along with a tool for visualizing SNP target genes and pathways across all cell types and phenotypes.

X-033: Dysregulation of transcription factor networks unveils new targets in different BCP-ALL rearrangements
COSI: RegSys
  • Saloe Bispo, Faculdades Pequeno Príncipe and Instituto de Pesquisa Pelé Pequeno Príncipe, Brazil
  • Roberto Rosati, Faculdades Pequeno Príncipe and Instituto de Pesquisa Pelé Pequeno Príncipe, Brazil
  • Gabriela Canalli, Faculdades Pequeno Príncipe and Instituto de Pesquisa Pelé Pequeno Príncipe, Brazil
  • Sara Alves, Faculdades Pequeno Príncipe and Instituto de Pesquisa Pelé Pequeno Príncipe, Brazil
  • Liana Oliveira, Instituto de Pesquisa Pelé Pequeno Príncipe, Brazil


Presentation Overview:

Pediatric B-cell precursor acute lymphoblastic leukemia (BCP-ALL) is the most common cancer in children. In current treatment protocols, treatment intensity is stratified by risk classification models that include an ever-growing series of molecular characteristics. Computational approaches that integrate data from RNA-seq, transcriptional regulatory networks (TRNs), SNVs, and non-coding RNAs can greatly assist in understanding complex diseases such as BCP-ALL. Non-coding RNAs are widely expressed in cancers, and changes in their expression levels have been shown to be relevant factors in tumorigenesis. However, differences in non-coding RNA expression between BCP-ALL subtypes, in the context of regulatory networks and SNVs in TFs, have not been described to date. In this context, we built TRNs from expression profiles in which transcription factors, with or without non-silent exome mutations, act as key regulators of various target non-coding RNAs. Using publicly available RNA-seq and exome sequencing data from a cohort of 195 BCP-ALL cases, we identified a regulatory network involving non-coding RNAs previously described as aberrantly expressed in different types of cancer. Transcription factors harboring non-silent mutations appear to be involved in regulatory networks of coding and non-coding genes that are differentially expressed between the evaluated subtypes.

X-034: Tree-structured Matrix factorization for integration and analysis of heterogeneously related datasets
COSI: RegSys
  • Junha Shin, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Erika Da-Inn Lee, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Saptarshi Pyne, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States
  • Elizabeth E. Capowski, Waisman Center, University of Wisconsin-Madison, United States
  • David Gamm, Waisman Center, University of Wisconsin-Madison, United States
  • Sushmita Roy, Wisconsin Institute for Discovery, University of Wisconsin-Madison, United States


Presentation Overview:

Single-cell sequencing technologies are revolutionizing biological and biomedical research by providing high-throughput molecular phenotyping of individual cells at different omic levels, including the transcriptome and epigenome. A major challenge is to systematically integrate heterogeneously related datasets, such as those arising from a time course or from hierarchically related contexts. Although a number of approaches exist to integrate and compare multiple samples, most project samples into a shared lower-dimensional space, which might not be optimal for samples from a time course or hierarchy. To integrate and compare diverse heterogeneously related datasets, we developed Tree-structured Matrix Factorization (TMF). TMF extends non-negative matrix factorization (NMF) to a multi-task learning framework that factorizes multiple datasets jointly. We compared TMF to several existing algorithms on three datasets representing different platforms, biological samples, and species. Compared with existing methods, TMF has the best overall rank for both clustering and batch effect correction. Analyzing the TMF results on the multi-species dataset, which represents different brain regions, we identified several conserved clusters enriched for immune function and cell cycle regulation. These results suggest that TMF is an effective approach for integrating single-cell omic datasets, especially from hierarchically related contexts.
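For readers unfamiliar with the base technique, the sketch below factorizes a single non-negative matrix with the standard Lee-Seung multiplicative updates for the squared Frobenius loss. It shows only the plain NMF that TMF extends. How TMF couples the factorizations across its tree of related datasets is not described in this abstract and is not shown here. The function name, random initialization, and iteration count are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def nmf(X: np.ndarray, k: int, n_iter: int = 200, seed: int = 0, eps: float = 1e-10):
    """Approximate a non-negative matrix X (m x n) as W @ H, with W (m x k) >= 0 and
    H (k x n) >= 0, using multiplicative updates that do not increase
    ||X - W H||_F^2 (up to the small eps safeguard)."""
    if np.any(X < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Update H with W fixed; eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Update W with the new H fixed.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data: a hypothetical features-by-cells matrix built from 2 non-negative programs.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout. This is what lets each factor be read as an additive "program" shared across samples.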

X-035: Transcriptional dynamics of sleep deprivation and subsequent recovery sleep in the murine cortex
COSI: RegSys
  • Alexander Popescu, Department of Translational Medicine and Physiology, Elson S. Floyd College of Medicine, Washington State University, United States
  • Kaitlyn Ford, Department of Translational Medicine and Physiology, Elson S. Floyd College of Medicine, Washington State University, United States
  • Christine Muheim, Department of Translational Medicine and Physiology, Elson S. Floyd College of Medicine, Washington State University, United States
  • Elizabeth Medina, Department of Translational Medicine and Physiology, Elson S. Floyd College of Medicine, Washington State University, United States
  • Hannah Schoch, Department of Translational Medicine and Physiology, Elson S. Floyd College of Medicine, Washington State University, United States
  • Kristan Singletary, Department of Translational Medicine and Physiology, Elson S. Floyd College of Medicine, Washington State University, United States
  • Stephanie Hicks, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, United States
  • Davide Risso, University of Padova, Italy
  • Lucia Peixoto, Department of Translational Medicine and Physiology, Elson S. Floyd College of Medicine, Washington State University, United States


Presentation Overview:

Sleep loss regulates gene expression throughout the brain and impacts learning and memory. However, the molecular consequences of sleep deprivation and the ability of sleep to restore baseline gene expression remain underexplored. Our goal is to provide a genome-wide overview of in vivo transcriptional dynamics in the cortex in response to sleep loss and subsequent recovery sleep. We integrated data from our group with publicly available RNA-seq data to investigate reproducible gene expression changes across multiple time points: 3, 5, and 6 hours of sleep deprivation (SD3, SD5, SD6) and 2 and 6 hours of recovery sleep (RS2, RS6). To correct for variability in sleep deprivation procedures and other batch effects, we used the RUVSeq package in R/Bioconductor together with a gold standard of positive and negative control genes for 5 hours of sleep deprivation that we had previously assembled. We identified over 5,000 genes differentially expressed with sleep deprivation and recovered ~80% of the positive controls. Based on gene expression patterns across time points, we performed clustering and functional annotation, allowing us to develop a model of transcriptional regulation in response to sleep that is both physiologically and biologically relevant.

X-036: Dynamic regulatory module networks for examining context-specific gene regulatory networks
COSI: RegSys
  • Thomas Irving, Departments of Bacteriology and Agronomy, University of Wisconsin-Madison, Madison, WI, United States
  • Matias Kirst, School of Forest Resources and Conservation, University of Florida, Gainesville, FL, United States
  • James Thomson, Morgridge Institute for Research, Madison, WI, United States
  • Ron Stewart, Morgridge Institute for Research, Madison, WI, United States
  • Scott Swanson, Morgridge Institute for Research, Madison, WI, United States
  • Rupa Sridharan, Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, United States
  • Pamela Soltis, Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
  • Doug Soltis, Department of Biology, University of Florida, Gainesville, FL, United States
  • Robert Guralnick, Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
  • Ryan Folk, Department of Biological Sciences at Mississippi State University, MS, United States
  • Sushmita Roy, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
  • Lucas Gontijo Silva Maia, Departments of Bacteriology and Agronomy, University of Wisconsin-Madison, Madison, WI, United States
  • Morten Seirup, Morgridge Institute for Research, Madison, WI, United States
  • Sanhita Chakraborty, Departments of Bacteriology and Agronomy, University of Wisconsin-Madison, Madison, WI, United States
  • Daniel Conde, School of Forest Resources and Conservation, University of Florida, Gainesville, FL, United States
  • Alireza Fotuhi Siahpirani, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States
  • Sara Knaack, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States
  • Jean-Michel Ane, Departments of Bacteriology and Agronomy, University of Wisconsin-Madison, Madison, WI, United States
  • Deborah Chasman, Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, United States


Presentation Overview:

Changes in transcriptional regulatory networks significantly alter the function of a cell, affecting both normal and disease processes. To examine transcriptional dynamics, several studies have profiled transcriptomes and epigenomes at different stages of a developmental process. Integrating these data across multiple time points to infer context-specific regulatory networks is a significant challenge because of the small number of measurements. We present Dynamic Regulatory Module Networks (DRMNs), which predict regulatory networks in a cell-type- or time-point-specific manner by leveraging the relationships between these contexts in a multi-task learning approach. We applied DRMN to datasets from plant and mammalian systems with different temporal dynamics, including cellular reprogramming, cellular differentiation, and host-microbe symbiosis. These datasets measure expression and a variety of regulatory signals such as accessibility and chromatin marks. We used DRMNs to examine the transcriptional and chromatin dynamics of nitrogen-fixing symbiosis in Medicago truncatula roots and experimentally validated DRMN-predicted regulators, establishing novel regulatory relationships of transcription factors in symbiosis. Application of DRMN to these diverse datasets identified known and novel regulators underlying the major transcriptional dynamics of each process and demonstrated that DRMN is a broadly applicable approach for analyzing regulatory genomic time-course data.

X-037: Do-calculus enables estimation of causal effects in partially observed biomolecular pathways
COSI: RegSys
  • Sara Mohammad Taheri, Northeastern University, United States
  • Jeremy Zucker, Pacific Northwest National Laboratory, United States
  • Charles Tapley Hoyt, Laboratory of Systems Pharmacology, Harvard Medical School, United States
  • Karen Sachs, Next Generation Analytics, United States
  • Vartika Tewari, Northeastern University, United States
  • Robert Ness, Microsoft Research, United States
  • Olga Vitek, Northeastern University, United States


Presentation Overview:

Estimating causal queries, such as changes in protein abundance in response to a perturbation, is a fundamental task in the analysis of biomolecular pathways.
The estimation requires experimental measurements on the pathway components. However, in practice, many pathway components are left unobserved (latent) because they are either unknown or difficult to measure. Latent variable models (LVMs) are well-suited for such estimation. Unfortunately, LVM-based estimation of causal queries can be inaccurate when parameters of the latent variables are not uniquely identified, or when the number of latent variables is misspecified. This has limited the use of LVMs for causal inference in biomolecular pathways. In this manuscript, we propose a general and practical approach for LVM-based estimation of causal queries.
We prove that, despite the challenges above, LVM-based estimators of causal queries are accurate if the queries are identifiable according to Pearl's do-calculus, and we describe an algorithm for their estimation. We illustrate the breadth and practical utility of this approach by estimating causal queries in four synthetic and two experimental case studies in which the structures of the biomolecular pathways challenge existing methods for causal query estimation.
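As a textbook illustration of what "identifiable" means here (a standard result, not the authors' estimator): if an observed set of variables Z satisfies the backdoor criterion relative to a treatment X and an outcome Y, that is, no variable in Z is a descendant of X and Z blocks every path between X and Y that enters X through an incoming arrow, then do-calculus reduces the interventional query to purely observational quantities:

```latex
P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr) \;=\; \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)
```

Every term on the right-hand side involves only X, Y, and Z, so the query can be estimated from data even when other pathway components are latent. When no such adjustment set or other do-calculus reduction exists, the query is not identifiable from the observed variables.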