Posters

Print your poster in Chicago

View Posters By Category

Session A (July 7 and July 8): 3Dgeomics (Special Session), CAMDA, CompMS, Education, Evolution and Comparative Genomics, Function, HiTSeq, MLCSB, NetBio, RNA, SysMod, TransMed, VarI, General Computational Biology

Session B (July 9 and July 10): 3DSIG, Bio-Ontologies, BioVis, CAMDA, Evolution and Comparative Genomics, HiTSeq, MICROBIOME, MLCSB, RegSys, SCANGEN: Single-cell cancer genomics (Special Session), General Computational Biology

B-690: Functional impact of genomic rearrangements on gene expression and chromatin organization

COSI: RegSys

Yad Ghavi-Helm, Institute of Functional Genomics of Lyon, France
Aleksander Jankowski, EMBL, Germany
Sascha Meiers, EMBL, Germany
Rebecca Rodríguez Viales, EMBL, Germany
Jan Korbel, EMBL, Germany
Eileen E. M. Furlong, EMBL, Germany

Short Abstract: Complex regulatory programs of multicellular organisms are controlled by regulatory elements, which can be located at large distances from their target genes. The scope of action of regulatory elements is constrained by spatial chromatin organization, in particular within Topologically Associating Domains (TADs). Recent studies investigated how genomic variations that affect TAD boundaries lead to changes in TAD structure and gene expression. However, those results remain limited to a small number of loci. To globally assess the relationship between gene expression and chromatin organization, we used highly rearranged balancer chromosomes in Drosophila melanogaster. These chromosomes carry multiple types of genomic variation at different scales. Using RNA-seq, we compared gene expression in developing embryos between the rearranged chromosomes and their wild-type counterparts. Performing this comparison in a heterozygous cross allowed us to intrinsically account for trans-regulatory effects. We also quantified differences in chromatin organization using Hi-C. In line with previous studies, we found that differential gene expression is correlated with local changes in genome topology. Surprisingly, however, we observed that changes in large-scale chromatin organization do not globally correlate with changes in gene expression, despite the frequent disruption of TADs. Overall, our results point to robust mechanisms that buffer genomic variation.

B-692: Differential Analysis of Regulatory Elements Based on ChIP-seq Data

COSI: RegSys

Verena Heinrich, Max Planck Institute for Molecular Genetics, Germany
Anna Ramisch, Max-Planck-Institut fuer Molekulare Genetik, Germany
Martin Vingron, Max Planck Institut fuer molekulare Genetik, Germany

Short Abstract: Enhancers are critical for gene regulation not only in differentiation processes but also during disease development. It remains a challenge to identify these regulatory elements in a cell-type- or even disease-state-dependent manner. Thus, rather than comparing separate epigenetic signature tracks, we propose an approach to computationally map and compare enhancers across different samples and conditions. Here we present a two-step framework to predict and assign condition-dependent enhancers based solely on ChIP-seq histone modification data. In the first step, a random-forest-based classifier is trained on a set of high-confidence regions and used for enhancer prediction. We will demonstrate that this approach can be applied across different tissues and species without retraining. In the second step, all regions are assigned to different biological conditions by applying a permutation test directly to the enhancer probability values, and are subsequently grouped into regulatory units by incorporating topologically associated domains (TADs). We have applied our strategy to several projects encompassing different numbers and types of conditional states and were able to prioritize candidate enhancer regions that are correlated with the respective biological question.
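The abstract does not specify how the permutation test is set up. A minimal sketch, assuming one enhancer probability per sample for a single region and a two-sided test on the difference of condition means, could look like this (`permutation_pvalue` is a hypothetical name, not the authors' code):

```python
import random
from statistics import mean


def permutation_pvalue(cond_a, cond_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of mean enhancer
    probabilities between two conditions (one value per sample).

    Hypothetical helper for illustration, not the authors' implementation."""
    rng = random.Random(seed)
    observed = abs(mean(cond_a) - mean(cond_b))
    pooled = list(cond_a) + list(cond_b)
    n_a = len(cond_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to the two conditions
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# A region with high enhancer probability in condition A only.
p = permutation_pvalue([0.90, 0.85, 0.95, 0.88], [0.10, 0.20, 0.15, 0.05])
print(f"p = {p:.4f}")
```

With four samples per condition only 70 label splits exist, so even a perfectly separated region cannot score much below 2/70 (about 0.029); smaller p-values require more replicates.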

B-693: Network-based identification of disease genes in expression data: the GeneSurrounder method

COSI: RegSys

Sahil Shah, Northwestern University, United States
Rosemary Braun, Northwestern University, United States

Short Abstract: The ability to profile the expression levels of thousands of genes simultaneously and identify the genes associated with a disease has opened new avenues in understanding disease mechanisms and developing precision medicine interventions. Since physical and functional cellular networks have been organized into databases, it has become possible to develop methods that analyze expression data in the context of these networks. A key challenge is to combine expression data with systems-level information and still obtain specific molecular targets. We present a new analysis technique, GeneSurrounder, that identifies specific disease-associated genes while taking into account the complex network of cellular interactions. GeneSurrounder identifies genes that (i) appear to influence nearby genes on the network, which (ii) are themselves dysregulated and associated with the disease under study. We apply GeneSurrounder to three distinct ovarian cancer studies using a global KEGG network and show that our method yields more consistent results across multiple studies of the same phenotype than competing methods. By identifying disease-associated genes, such methods can open new avenues for precision medicine.

B-694: Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients

COSI: RegSys

Andre Kahles, ETH Zurich, Switzerland
Kjong-Van Lehmann, ETH Zurich, Switzerland
Nora C Toussaint, ETH Zurich, NEXUS Personalized Health Technologies, Switzerland
Matthias Hüser, ETH Zurich, Switzerland
Stefan G Stark, ETH Zurich, Switzerland
Timo Sachsenberg, University of Tübingen, Germany
Oliver Stegle, EMBL-EBI, United Kingdom
Oliver Kohlbacher, University of Tübingen, Germany
Chris Sander, Harvard University, United States
Gunnar Rätsch, ETH Zurich, Switzerland

Short Abstract: Building on the tremendous resource of The Cancer Genome Atlas (TCGA), we carried out a comprehensive analysis of the transcriptomes of 8,705 cancer patients across 32 cancer types. We uniformly processed both RNA and whole-exome sequencing data from TCGA and extracted alternative splicing (AS) events and tumor variants. We observe thousands of AS events in cancer samples that are absent from TCGA normal or GTEx samples, and find a consistent increase in splicing in cancer vs. normal (≈30%). In a genome-wide association of splicing and somatic variation, we confirmed known trans-associations involving SF3B1 and U2AF1 and identified three additional trans-acting variants (IDH1, TADA1, PPP2R1A). Integrating protein mass spectrometry data for breast and ovarian cancer samples, we were able to confirm on average ≈1.7 peptides per tumor sample derived from novel exon-exon junctions, compared with ≈0.6 SNV-derived peptides, considering only peptides that were also predicted MHC-I binders. Hence, by including neoantigens derived from novel exon-exon junctions, the fraction of samples for which at least one putative neoantigen can be identified increases from 30% to 75%, presenting a new class of splicing-associated potential neoantigens that could be exploited for immunotherapy.

B-695: Identification of enhancer target genes by correlating gene expression and epigenetic modifications within topologically associated domains

COSI: RegSys

Konstantin Okonechnikov, German Cancer Research Center (DKFZ), Germany
Serap Erkek, Izmir Biomedicine and Genome Center, Turkey
Stephen C. Mack, Baylor College of Medicine, United States
Kristian W. Pajtler, German Cancer Research Center (DKFZ), Germany
Stefan M. Pfister, German Cancer Research Center (DKFZ), Germany
Lukas Chavez, University of California San Diego, United States

Short Abstract: Integrative analysis of histone modifications across diverse tissue types and diseases has uncovered the dependence of gene regulation on chromatin organization. High-throughput technologies for analyzing genome-wide chromosomal conformation have revealed that chromatin is arranged in topologically associated domains (TADs), which remain largely stable across cell types, while intra-TAD activities are cell-type specific. Consequently, detailed knowledge of TAD boundaries can be used to associate epigenomic signals with their target genes. For example, we recently identified enhancer-associated genes in 42 primary ependymoma brain tumors across six distinct molecular subgroups using H3K27ac ChIP-seq. Our TAD-guided analysis leveraged Hi-C data previously generated from human fetal fibroblasts and revealed promising molecular targets for improved treatment of ependymoma tumors. We have now implemented our analysis strategy as an open-source R package that can be applied to any heterogeneous cohort of samples profiled by a combination of gene expression and epigenetic techniques, with or without sample-matched chromosomal conformation data. To investigate the impact of tumor-specific TADs, we have generated chromosomal conformation data from patient-derived ependymoma cell lines. Our preliminary results confirm that enhancer-associated genes can largely be inferred by borrowing TAD information from unrelated reference samples.

B-696: The use of a new approach to identify methylation patterns in specific tumor types: preliminary data in breast cancer

COSI: RegSys

Gabriel Wajnberg, Atlantic Cancer Research Institute, Brazil
Ilyass Hajji, Atlantic Cancer Research Institute, Canada
Dina Soliman, Atlantic Cancer Research Institute, Canada
Nicolas Crapoulet, Atlantic Cancer Research Institute, Canada
Rodney Ouellette, Atlantic Cancer Research Institute, Canada

Short Abstract: DNA methylation is an epigenetic modification in which a methyl group is added to cytosine in DNA. This modification, which regulates gene expression, is commonly found in repetitive sequences. However, it can also be associated with repression or activation of genes with important roles in the biology of different tumor types. We applied an innovative approach that uses different types of comparisons between methylation microarray data from different cancer patients as filters. Publicly available phase 3 methylation microarray data were obtained from the TCGA database. We used a total of 567 samples from different normal tissues, including 27 normal breast tissue samples, and 1405 samples from tumor tissues of different cancer types. As a result, we identified 764 genes with differentially methylated regions specific to breast tumors, 15 of which have been associated with this type of cancer. In addition, methylation of 8 genes has already been detected in cell-free DNA from the plasma of breast cancer patients. The same method can be used to identify specific candidate biomarkers in other tumor types. This new approach will help identify new, specific diagnostic biomarker candidates not previously described in breast cancer, including in blood plasma.

B-697: Long noncoding RNAs as sequence-specific DNA-binding factors

COSI: RegSys

Chao-Chung Kuo, RWTH Aachen Medical Faculty, Germany
Nevcin Senturk, DKFZ, Germany
Ingrid Grummt, DKFZ, Germany
Ivan G. Costa, RWTH Aachen University, Germany

Short Abstract: The importance of protein-DNA interactions in gene regulation is indisputable. Yet the role of RNA-DNA interactions in gene regulation has been poorly explored so far. We are interested in triple helices, in which a single RNA strand binds in the major groove of a DNA double helix and individual nucleobases form specific Hoogsteen hydrogen bonds with adenine or guanine residues of the purine-rich DNA strand. There is increasing evidence for the involvement of triple helix binding in transcription regulation. So far, computational methods for triple helix detection are based on enumerating all triple helices, i.e. short sequences with a high proportion of bases following the triple helix code, for a given pair of RNA and DNA sequences. We describe here a method for statistically characterizing the potential of a set of RNAs to bind to particular DNA regions. Triplex Domain Finder (TDF) identifies the regions within the RNAs (DNA-binding domains) with the highest potential for forming triple helices. Case studies on long noncoding RNAs known to form triple helices demonstrate that TDF is able to recover known RNA and DNA regions forming triple helices. Moreover, sequencing confirms the triple helix binding sites of a known and a novel Meg3 DNA-binding domain.
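As an illustration of what the triple helix code means in practice, the sketch below scans for ungapped, mismatch-free stretches in which an RNA follows the pyrimidine-motif rules (U pairing with A, protonated C pairing with G) in parallel orientation to the purine-rich DNA strand. This is a deliberately simplified assumption: it ignores the other triplex motifs, mismatches, and any statistical assessment, and `triplex_hits` is a hypothetical name.

```python
# Pyrimidine (parallel) motif of the triple helix code: an RNA U pairs with
# an A, and a protonated C pairs with a G, on the purine-rich DNA strand.
PYRIMIDINE_MOTIF = {"U": "A", "C": "G"}


def triplex_hits(rna, purine_strand, min_len=10):
    """Return (rna_start, dna_start, length) for every maximal ungapped,
    mismatch-free stretch of at least min_len positions where the RNA
    follows the pyrimidine-motif code in parallel orientation."""
    hits = []
    # offset = DNA index - RNA index along one diagonal of the alignment
    for offset in range(-(len(rna) - 1), len(purine_strand)):
        run = 0
        for i, base in enumerate(rna):
            j = i + offset
            if 0 <= j < len(purine_strand) and PYRIMIDINE_MOTIF.get(base) == purine_strand[j]:
                run += 1
                continue
            if run >= min_len:
                hits.append((i - run, i - run + offset, run))
            run = 0
        if run >= min_len:  # a run that reaches the end of the RNA
            start = len(rna) - run
            hits.append((start, start + offset, run))
    return hits


print(triplex_hits("UUCCUUCUCC", "GGAAGGAAGAGGTT"))  # [(0, 2, 10)]
```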

B-698: Candidate non-coding driver mutations in super-enhancers and long-range chromatin interaction networks

COSI: RegSys

Juri Reimand, Ontario Institute for Cancer Research, Canada

Short Abstract: A catalogue of mutations that drive tumorigenesis and progression is essential to understanding tumor biology and developing therapies. Protein-coding driver mutations have been well characterized by large exome-sequencing studies; however, many tumors have no mutations in protein-coding drivers, and few non-coding drivers besides the TERT promoter are known. To fill this gap, we analyzed 150,000 cis-regulatory regions in 1,844 whole cancer genomes from the ICGC-TCGA PCAWG project. Using our new method, ActiveDriverWGS, we found 41 frequently mutated regulatory elements (FMREs) enriched in non-coding SNVs and indels, characterized by aging-associated mutation signatures and frequent structural variants. FMREs were enriched in super-enhancers and long-range chromatin interactions, suggesting that the mutations drive cancer by altering distal gene regulation. The chromatin interaction network of FMREs and their target genes revealed associations of mutations with differential expression of known and novel cancer genes, activation of immune response pathways, and altered enhancer marks. Thus, distal genomic regions may include additional, infrequently mutated drivers that act on target genes via chromatin loops. Our study is an important step towards finding such regulatory regions and deciphering the somatic mutation landscape of the non-coding genome.
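ActiveDriverWGS itself is not described in detail here. As a rough, simplified stand-in for the underlying idea (testing whether a regulatory element carries more mutations than its local background predicts), one can compare the mutation count in an element against a Poisson expectation derived from the mutation rate in flanking sequence. This is an assumption for illustration, not the published model, and both function names are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail
    so that very small p-values keep their precision."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total, i = 0.0, k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        # Past the mode the terms shrink monotonically, so stop once negligible.
        if i > lam and term <= total * 1e-17:
            break
    return min(total, 1.0)


def element_enrichment_p(muts_in_element, element_bp, muts_in_flanks, flank_bp):
    """One-sided test: is the element more mutated than its flanks predict?"""
    expected = muts_in_flanks / flank_bp * element_bp
    return poisson_upper_tail(muts_in_element, expected)


# 5 mutations in a 1 kb element vs. 2 mutations in 10 kb of flanking sequence
print(element_enrichment_p(5, 1_000, 2, 10_000))
```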

B-699: Identification of Transcription Factor Binding Sites using ATAC-seq

COSI: RegSys

Zhijian Li, RWTH Aachen Medical Faculty, Germany
Marcel H. Schulz, Saarland University, Germany
Martin Zenke, RWTH Aachen Medical Faculty, Germany
Ivan G. Costa, RWTH Aachen University, Germany

Short Abstract: The Assay for Transposase-Accessible Chromatin followed by sequencing (ATAC-seq) is a simple and fast protocol for detecting open chromatin. However, computational footprinting in ATAC-seq data, i.e. the search for regions depleted of cleavage events due to transcription factor binding, has been poorly explored so far. We propose HINT-ATAC, a footprinting method that addresses ATAC-seq-specific protocol artifacts. HINT-ATAC uses a probabilistic framework based on variable-order Markov models to learn the complex sequence cleavage preferences of the transposase enzyme. Moreover, we observed strand-specific cleavage patterns around transcription factor binding sites, which are determined by local nucleosome architecture. By exploiting local nucleosome architecture, HINT-ATAC significantly outperforms competing footprinting methods in predicting transcription factor binding sites, as evaluated against ChIP-seq.

B-700: Principal Component Region Set Analysis: Facilitating Interpretation of PCA Dimensions for DNA Methylation Data

COSI: RegSys

John Lawson, University of Virginia, United States

Short Abstract: Principal component analysis (PCA) is a widely used technique for dimensionality reduction and visualization in genomics, where the number of dimensions can be in the thousands or even hundreds of thousands. However, since each principal component (PC) is a linear combination of the original dimensions, the meaning of the new dimensions can be hard to interpret. For PCA of DNA methylation data, the original dimensions are individual cytosines, which may lack a clear biological annotation, further hindering interpretation. Currently, there is a lack of methods for interpreting PCs of DNA methylation data. We present a method that annotates PCs using sets of genomic regions corresponding to a given biological annotation, such as transcription factor binding or histone modifications. We tested the method on DNA methylation data from breast cancer, confirming known associations, and on data from the rare childhood cancer Ewing sarcoma, discovering novel associations. Our method is computationally efficient, scales well with an increasing number of samples, and will fit well into existing analysis workflows. This method will be broadly useful to help researchers understand variation in DNA methylation among samples.
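One simple way to score a region set against a principal component, assumed here for illustration rather than taken from the abstract, is to average the absolute loadings of the CpGs that fall inside the set's regions; sets scoring high relative to other sets are candidates for explaining that PC. The sketch takes precomputed loadings as input, and `region_set_score` is a hypothetical name.

```python
import math


def region_set_score(cpg_positions, loadings, regions):
    """Mean absolute PC loading of CpGs that fall inside a region set.

    cpg_positions: list of (chrom, pos) for each CpG (the PCA input dimensions)
    loadings:      loading of each CpG on one PC, in the same order
    regions:       list of half-open (chrom, start, end) intervals
    """
    inside = [
        abs(w)
        for (chrom, pos), w in zip(cpg_positions, loadings)
        if any(c == chrom and s <= pos < e for c, s, e in regions)
    ]
    # NaN signals that the region set does not overlap any measured CpG.
    return sum(inside) / len(inside) if inside else math.nan


cpgs = [("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr2", 50)]
pc1_loadings = [0.8, -0.6, 0.1, 0.05]
print(region_set_score(cpgs, pc1_loadings, [("chr1", 150, 350)]))  # mean of 0.6 and 0.1
```

The linear scan over regions keeps the sketch short; for genome-scale data an interval index would replace it.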

B-701: Sequence-based prediction of regulatory genomic regions with an improved deep-learning method

COSI: RegSys

Koh Onimaru, RIKEN Center for Biosystems Dynamics Research (BDR), Japan

Short Abstract: Advances in DNA sequencing technologies have dramatically increased the amount of genome sequence data derived from diverse species and individual humans. The next major challenge is a deeper understanding of what genome sequences encode and how to extract useful information from them. Such sequence-based understanding would ultimately make phenotypes predictable from genome sequences. While the syntax of protein-coding genes is well understood (which allows us to predict some phenotypic consequences, such as those of nonsense mutations), the basic rules of non-coding gene regulatory sequences remain largely unexplored. In this presentation, I would like to introduce a deep-learning-based approach to tackle this limitation. I design a deep convolutional neural network that can learn and predict regulatory DNA sequences. The key features of my method are: a) a convolutional layer that integrates information from the forward and reverse DNA strands; b) a simplified data structure; and c) a new quality index to filter out low-quality data from the training data set. These features improve the prediction accuracy of the model. Furthermore, by extracting what the model has learned, I show preliminary results that may be useful for interpreting genomic information.
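A common way to make a convolutional filter treat both DNA strands alike, shown here as an assumption since the abstract does not detail its layer, is to scan the one-hot sequence and its reverse complement with the same filter and keep the stronger response. The pure-Python sketch below (hypothetical function names, no deep-learning library) illustrates the idea with a single filter.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def conv1d(x, kernel):
    """Valid 1-D cross-correlation of a one-hot sequence with a k x 4 kernel."""
    k = len(kernel)
    return [
        sum(x[i + r][c] * kernel[r][c] for r in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]


def strand_aware_max(seq, kernel):
    """Scan both strands with the same kernel and keep the strongest response."""
    forward = conv1d(one_hot(seq), kernel)
    reverse = conv1d(one_hot(reverse_complement(seq)), kernel)
    return max(forward + reverse)


gata_filter = one_hot("GATA")  # toy filter that rewards an exact GATA match
print(strand_aware_max("CCTATCCC", gata_filter))  # GATA occurs only on the reverse strand
```

Sharing one set of weights across both strands halves the number of parameters a network must learn for palindromic or strand-agnostic motifs.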

B-702: GENOME-WIDE ANALYSIS OF EWS-FLI1 DRIVEN TRANSCRIPTION REPROGRAMMING: IMPACT ON DNA DAMAGE RESPONSE IN EWING SARCOMA

COSI: RegSys

Aparna Gorthi, University of Texas Health San Antonio, United States
Alexander Bishop, University of Texas Health San Antonio, United States
Yidong Chen, UT Health Science Center at San Antonio, United States

Short Abstract: Ewing sarcoma is an aggressive pediatric cancer predominantly driven by EWS-FLI1. Little is known about the systemic impact of EWS-FLI1 or the underlying basis of Ewing sarcoma chemosensitivity. We analyzed a genome-wide RNAi screen and identified transcription, RNA metabolism, and the DNA damage response as required for Ewing sarcoma viability. Interestingly, these processes were also altered in the expression profiles of Ewing sarcoma cell lines in response to damage. We found a highly significant accumulation of R-loops (three-stranded RNA:DNA structures), a consequence of transcription dysregulation, in Ewing sarcoma. We developed an analysis pipeline to compare genome-wide R-loop and other ChIP-seq data, and found strong concordance between R-loops and RNA Polymerase II, as well as the DNA repair protein BRCA1, in terms of both peak height (reflecting the level of enrichment) and coverage. Importantly, BRCA1 co-localization was significantly higher at genes also bound by EWS-FLI1. Further, BRCA1 localization was diminished following damage in control cell lines, but less so in Ewing sarcoma. Finally, these observations were confirmed by experimental evidence of impaired homologous recombination and sensitivity to PARP1 inhibitors. In conclusion, our study combines bioinformatics and experimental data to establish the underlying basis of Ewing sarcoma chemosensitivity.

B-703: The BaMM web server for de-novo motif discovery and regulatory sequence analysis

COSI: RegSys

Anja Kiesel, Ludwig Maximilian University of Munich, Germany
Wanwan Ge, MPI-BPC, Germany
Christian Roth, MPI-BPC, Germany
Johannes Soeding, MPI BPC, Germany

Short Abstract: We have shown previously that higher-order Bayesian Markov Models (BaMMs) perform substantially better than PWMs or first-order models for motif discovery [Siebert M, NAR, 2016]. To make high-order BaMMs of improved quality available to the community and to let users combine various standard analyses, we developed the BaMM web server with user-friendly interfaces and results pages. The BaMM web server offers four tools: (i) de-novo motif discovery in a sequence set, (ii) scanning a sequence set with motifs to find motif occurrences, (iii) searching our BaMM database for motifs similar to an input motif, and (iv) browsing and keyword searching in the database. Our motif database contains motifs for 798 transcription factors, trained on data from two ChIP-seq databases for human and mouse. In contrast to other resources, e.g. JASPAR and HOCOMOCO, we represent sequence motifs not by PWMs but by 4th-order BaMMs. To address the inadequacy of P- and E-values as measures of motif quality, which correlate poorly with the biological relevance of a motif, we developed the AURRC score (area under the recall versus true-positive-to-false-positive-ratio curve). The AURRC score summarizes how well the motif model can distinguish true motif instances from the background. The BaMM server is freely accessible at https://bammmotif.mpibpc.mpg.de.

B-704: What We Talk About When We Talk About Enhancers

COSI: RegSys

Mary Lauren Benton, Vanderbilt University, United States
Sai Charan Talipineni, University of Pittsburgh, United States
Dennis Kostka, University of Pittsburgh, United States
John Capra, Vanderbilt University, United States

Short Abstract: Non-coding gene regulatory enhancers are essential to transcription in mammalian cells. As a result, numerous experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. Most studies consider enhancers identified by only a single method, and concordance between sets from different methods has not been comprehensively evaluated. We assessed the similarities of enhancer sets identified by ten representative strategies in four biological contexts and evaluated the robustness of resulting downstream conclusions. We demonstrate significant dissimilarity between enhancer sets in genomic characteristics, evolutionary conservation, and association with functional loci. We find most regions identified as enhancers are supported by only one method. The disagreement is sufficient to influence interpretation of functional loci, and to lead to disparate conclusions about enhancer biology and disease mechanisms. We also find limited evidence that regions identified by multiple methods are better enhancer candidates than regions identified by a single strategy. Our results highlight the inherent complexity of enhancer biology and argue that current approaches have yet to adequately account for enhancer diversity. To facilitate assessment of enhancer diversity in future studies, we developed creDB, a database of enhancer annotations designed to integrate into bioinformatics workflows.
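Concordance between two enhancer sets can be quantified in several ways, and the abstract does not say which statistic was used. One common choice, assumed here, is the base-pair Jaccard index |A ∩ B| / |A ∪ B| over the genomic positions covered by each set. A minimal sketch with hypothetical function names:

```python
def merge(intervals):
    """Merge overlapping or touching half-open (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged


def covered_bp(intervals):
    """Number of base pairs covered by a set of intervals."""
    return sum(end - start for _, start, end in merge(intervals))


def jaccard_bp(a, b):
    """Base-pair Jaccard similarity |A ∩ B| / |A ∪ B| of two interval sets."""
    union = covered_bp(list(a) + list(b))
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|
    intersection = covered_bp(a) + covered_bp(b) - union
    return intersection / union if union else 0.0


set_a = [("chr1", 0, 100), ("chr2", 500, 600)]
set_b = [("chr1", 50, 150)]
print(jaccard_bp(set_a, set_b))  # 50 shared bp out of 250 covered bp
```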

B-705: Reconstructing differentiation networks and their regulation from time series single-cell expression data

COSI: RegSys

Jun Ding, Carnegie Mellon University, United States
Bruce Aronow, Cincinnati Children's Hospital, United States
Kaminski Naftali, Yale School of Medicine, United States
Joseph Kitzmiller, Cincinnati Children's Hospital, United States
Jeffrey Whitsett, Cincinnati Children's Hospital, United States
Ziv Bar-Joseph, Carnegie Mellon University, United States

Short Abstract: Generating detailed and accurate organogenesis models using single-cell RNA-seq data remains a major challenge. Current methods have relied primarily on the assumption that descendant cells are similar to their parents in terms of gene expression levels. These assumptions do not always hold for in vivo studies, which often include infrequently sampled, unsynchronized, and diverse cell populations. Thus, additional information may be needed to determine the correct ordering and branching of progenitor cells and the set of transcription factors (TFs) that are active during advancing stages of organogenesis. To enable such modeling, we have developed a method that learns a probabilistic model that integrates expression similarity with regulatory information to reconstruct the dynamic developmental cell trajectories. When applied to mouse lung developmental data, the method accurately distinguished different cell types and lineages. Existing and new experimental data validated the ability of the method to identify key regulators of cell fate.

B-706: Fine Mapping of Chromatin Interactions via Neural Networks

COSI: RegSys

Artur Jaroszewicz, University of California, Los Angeles, United States
Jason Ernst, University of California, Los Angeles, United States

Short Abstract: High-throughput chromatin conformation assays such as Hi-C have enabled genome-wide detection of long-range chromatin contacts, which have been shown to be integral to various regulatory mechanisms. However, interactions from Hi-C experiments are typically identified at relatively coarse resolution (e.g., 5-25 kb) and thus are not robustly resolved at a fine scale. We present a novel computational method, Chromatin Interaction Siamese Convolutional Neural Net (ChISCNN), to fine-map Hi-C-detected interactions to their likely source at high resolution. Using high-resolution information within DNase-seq and ChIP-seq data for transcription factors and histone marks, we trained a Siamese Convolutional Neural Network (SCNN) to discriminate between true interactions and non-interactions. We then use a feature importance algorithm together with the SCNN to assign each pair of 100-bp subregions a score that reflects its importance in the Hi-C interaction. We demonstrate the effectiveness of our approach both by comparing our predictions with independent genome annotations and by recovering original Hi-C peaks after extending their boundaries. Finally, we discuss which signals give chromatin interactions their specificity.

B-707: Imputed gene associations identify replicable trans-acting genes enriched in transcription pathways

COSI: RegSys

Heather Wheeler, Loyola University Chicago, United States
Sally Ploch, Loyola University Chicago, United States
Alvaro Barbeira, University of Chicago, United States
Hae Kyung Im, University of Chicago, United States

Short Abstract: Regulation of gene expression is an important mechanism through which genetic variation can affect complex traits. A substantial portion of gene expression variation can be explained by both local (cis) and distal (trans) genetic variation. Much progress has been made in uncovering cis-acting expression quantitative trait loci (cis-eQTLs), but trans-acting eQTLs have been more difficult to identify and replicate. Rather than testing every SNP for association with every gene, we first imputed the component of gene expression determined by local genetic variation. We then tested this imputed component for association with the observed expression of genes on other chromosomes to identify trans-acting genes. Gene expression imputation models were trained by applying statistical machine learning to independent eQTL panels. We leverage MulTiXcan, a recent extension of PrediXcan and a gene-level association method that aggregates imputation models across multiple eQTL panels, to identify 1159 trans-acting genes and their 1247 targets, for a total of 3657 trans-acting/target gene pairs (FDR < 0.05). Trans-acting genes identified by MulTiXcan are enriched in transcription and transcription factor pathways, indicating that our method uncovers genes of the expected function.
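The abstract reports gene pairs at FDR < 0.05 without naming the correction procedure. Assuming the widely used Benjamini-Hochberg procedure, the adjustment can be sketched as follows (`benjamini_hochberg` is a hypothetical helper, not part of MulTiXcan):

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]
print(qvals, significant)
```

Note that the q-value for p = 0.03 is raised to match that of p = 0.04: the running minimum keeps q-values monotone in the p-value ranking, so only the first test passes the 0.05 threshold here.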

B-708: Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response

COSI: RegSys

Magali Champion, Paris Descartes University, France
Kevin Brennan, Stanford University, United States
Tom Croonenborghs, Broad Institute, KU Leuven, United States
Andrew Gentles, Stanford University, United States
Nathalie Pochet, Harvard University, United States
Olivier Gevaert, Stanford University, United States

Short Abstract: The increasing availability of multi-omics profiles across many cancers promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenge is to integrate these multiple levels of omics profiles and, in particular, to analyze them across many cancers. Here we present AMARETTO, an algorithm that addresses both challenges in three steps. First, AMARETTO identifies potential cancer driver genes by integrating copy number, DNA methylation, and gene expression data. Second, AMARETTO connects these driver genes with the co-expressed target genes they control, defined as regulatory modules. Third, we connect AMARETTO modules identified from different cancer sites into a pancancer network to identify cancer driver genes. We applied AMARETTO in a pancancer study comprising eleven cancer sites and confirmed that AMARETTO captures hallmarks of cancer. We also demonstrated that AMARETTO enables the identification of novel pancancer driver genes. In particular, our analysis identified pancancer driver genes of smoking-induced cancers and of the ‘antiviral’ interferon-modulated innate immune response.

B-709: COOPERATION BETWEEN POLYCOMB AND TRITHORAX/COMPASS COMPLEXES AT BIVALENT PROMOTERS

COSI: RegSys

Ruslan Sadreyev, Harvard University, United States

Short Abstract: Both repressive (H3K27me3) and active (H3K4me3) histone modifications are present at key developmental promoters in embryonic stem cells due to the co-localization of repressive Polycomb group (PcG) and activating Trithorax/COMPASS group (TrxG) protein complexes. Direct functional interactions between PcG and TrxG at these bivalent promoters remain unclear. Our integrative analysis of public RNA-seq and ChIP-seq datasets for multiple PcG and TrxG proteins revealed a quantitative genome-wide correlation between the chromatin occupancies of Kdm2b, a component of Polycomb Repressive Complex 1 (PRC1), and Mll2, an essential H3K4 methylase component of TrxG. This correlation suggested potential functional crosstalk between Kdm2b and Mll2 at both active and repressed promoters. Experimental validation of this hypothesis revealed that loss of Kdm2b resulted in depletion of Mll2 at promoters genome-wide, suggesting that Kdm2b is required for Mll2 occupancy at both bivalent and active promoters. Loss of Kdm2b or of the core PRC1 component Ring1b also reduced H3K4me3 at bivalent promoters. This surprising finding suggests a direct pathway for cooperation between PcG and TrxG complexes at bivalent promoters, an unexpected modification of the current model of bivalency. In addition, our results reveal a genome-wide role for Kdm2b independent of the full PRC1 complex.

B-710: Discovery of HIF-dependent Alternative Splicing Events

COSI: RegSys

Natalie Davidson, MSKCC, United States
Gunnar Rätsch, ETH Zurich, Switzerland
Philipp Markolin, ETH Zurich, Switzerland
Christian Hirt, ETH Zurich, Switzerland
Christoph Chabbert, ETH Zurich, Switzerland
Nicola Zamboni, ETH Zurich, Switzerland
Gerald Schwank, ETH Zurich, Switzerland
Wilhelm Krek, ETH Zurich, Switzerland

Short Abstract: Hypoxia is prevalent in many tumors and regulates malignant tumor progression, notably through hypoxia-inducible transcription factors (HIFs). The downstream effects of HIFs on alternative splicing (AS) are unclear. To identify HIF-dependent AS, we performed RNA-seq on human pancreatic ductal adenocarcinoma (PDAC) cells subjected to hypoxia or normoxia, with or without ARNT/HIF1B, a dimerization partner in the hypoxia transcriptional response. We identified 538 HIF-dependent events (FDR < 15%), of which 38 have a percent-spliced-in change (delta-PSI) > 0.05. We experimentally validated events using multiple PDAC cell lines and patient-derived PDAC organoids. More than half (22/38) were confirmed in TCGA. We compared PSI values between tumor and normal tissues across cancer types with sufficient sample size. In breast cancer, where HIFs are upregulated, 10 of the 22 events show a significant PSI difference. Among TCGA breast cancer (BRCA) patients, we found differential usage of the hypoxia-inducible event we identified in SLC35A3 (q-value = 2.8e-08; t-test), a transporter involved in metabolism. Focusing on SLC35A3, we illustrate how HIF-dependent isoforms constitute a novel regulatory mechanism in hypoxia biology. In summary, we report the discovery of hypoxia-inducible isoforms in PDAC, linking HIF proteins with post-transcriptional regulation. We validate a set of HIF-dependent splicing events in several model systems and investigate their prevalence in human cancer patients.
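For readers unfamiliar with the delta-PSI threshold: for a cassette exon, PSI is commonly estimated as the fraction of informative reads that support exon inclusion. The sketch below assumes junction-read counts and halves the inclusion count because an included exon is supported by two junctions; real pipelines differ in these details, and the function names and counts are hypothetical.

```python
import math


def psi(inclusion_reads, exclusion_reads, inclusion_junctions=2):
    """Percent spliced in (as a fraction) for a cassette exon.

    Inclusion reads are divided by the number of inclusion junctions so that
    they are comparable with reads from the single skipping junction."""
    inc = inclusion_reads / inclusion_junctions
    total = inc + exclusion_reads
    return inc / total if total > 0 else math.nan


def delta_psi(reference_counts, condition_counts):
    """Change in PSI from a reference to a condition; each is (inc, exc)."""
    return psi(*condition_counts) - psi(*reference_counts)


normoxia = (60, 10)  # hypothetical junction counts: inclusion, exclusion
hypoxia = (20, 30)
print(delta_psi(normoxia, hypoxia))  # -0.5, i.e. |delta-PSI| > 0.05
```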

B-711: The Epigenomic Landscape of Aberrant Splicing in Cancer

COSI: RegSys

Donghoon Lee, Yale University, United States
Jing Zhang, Yale University, United States
Mark Gerstein, Yale University, United States

Short Abstract: Nearly all protein-coding genes undergo alternative RNA splicing, which provides an important means of expanding transcriptome diversity beyond the scope of genomic information. Splicing is an elaborate process and can be prone to errors that may become pathogenic. Unsurprisingly, aberrant splicing, which collectively refers to splicing events that could confer disease risk, is often implicated in cancer. Recent studies have revealed that splicing regulation is associated with increased nucleosome density, nucleosome positioning, DNA methylation, and distinct histone modification patterns. However, most studies of aberrant splicing have focused on identifying genomic- and transcriptomic-level variations within splice sites, cis-acting splicing regulatory elements, and trans-acting splicing factors. The extent, nature, and effects of epigenomic dysregulation in aberrant splicing remain unknown. By systematically profiling the epigenomic landscape of aberrant splicing using transcriptomic and epigenomic data from the ENCODE and Roadmap Epigenomics projects, we aim to (1) identify chromatin states and distinct epigenetic signatures that characterize aberrant splicing in cancer, (2) classify aberrant splicing by class of epigenomic dysregulation, and (3) elucidate the role of epigenomic control in aberrant splicing. The proposed study will significantly advance our understanding of the epigenomic contribution to aberrant splicing in cancer.

B-712: A natural language system for analyzing gene regulation

COSI: RegSys

Xue Zhang, Tufts University School of Medicine, United States
Mark Burstein, Smart Information Flow Technologies, United States
Scott Friedman, Smart Information Flow Technologies, United States
Jeffrey Rye, Smart Information Flow Technologies, United States
Brent Cochran, Tufts University School of Medicine, United States

Short Abstract: As part of a DARPA program called “Communicating with Computers”, we are developing a natural-language-driven dialog system for analyzing gene expression. This system is one part of a larger system that supports the development of executable network models of biological pathways, network path analysis, and the integration of cancer proteomic and genetic data. The overall system uses an agent-based architecture and incorporates state-of-the-art natural language processing, reasoning systems, and dialog management that captures the semantics of biological molecules and processes. We have developed an agent for the system that we call the “Transcription Factor and Targeting Agent” (TFTA). The TFTA can answer queries about which transcription factors are associated with a given gene, which genes are bound by a given transcription factor, the tissue-specific expression of a gene, and the association of a gene and a transcription factor with a pathway. The TFTA can also find transcription factors common to a list of genes and pathways common to a set of genes, and it answers queries about microRNAs and their target genes. The goal of the system is to allow biologists to easily access and analyze gene regulatory relations using natural language.

B-713: iPhDnet: Network-Based Global Chromatin Profile Fingerprinting Reveals Therapeutic Efficacy in Breast Cancer

COSI: RegSys

Shamim Mollah, University of California San Diego, United States
Shankar Subramaniam, University of California San Diego, United States

Short Abstract: It is widely recognized that disruptions of chromatin-based mechanisms caused by epigenetic alterations through histone modifications contribute to cancer development and progression. These histone modifications are highly reversible, making them potential drug targets in cancer therapy. While many molecularly targeted drugs have the potential to revert these modifications, their precise mechanisms of action, i.e., the alterations they cause in gene regulatory networks, are poorly understood. To address this problem, we developed an integrated phosphoprotein-histone-drug network (iPhDnet) that serves as a window into histone modifications in breast cancer, revealing molecular fingerprints that we refer to as “global chromatin profile fingerprints”. The model uses a hybrid approach: an unsupervised clustering method generates histone signatures, and a supervised multivariate regression method predicts histone modifications from high-information-content mass spectrometry data.

B-714: Genome-wide identification of enhancer release and retargeting, a novel disease mechanism involving enhancer-gene target switching

COSI: RegSys

Joydeep Mitra, Albert Einstein College of Medicine, United States
Soohwan Oh, University of California San Diego, La Jolla, CA, United States
Wenbo Li, University of Texas Health Science Center, United States
Michael Rosenfeld, University of California San Diego, La Jolla, CA, United States
Zhengdong Zhang, Albert Einstein College of Medicine, United States

Short Abstract: Enhancers and promoters both play indispensable roles in transcriptional activation. Recently, we observed that mutation in, or loss of, a preferred cognate promoter can release its regulatory enhancer to loop to, and activate, an alternative promoter in its chromosomal neighborhood. Here, we present a novel computational approach that identifies such 'enhancer release and retargeting' (ERR) events on a genome-wide scale and assesses their implications for human diseases, through statistical analysis and integration of various genomic datasets and the GWAS catalog of human complex diseases. We identified putative ERR events, ranging from 31 to 525 per tissue, in all 48 human tissues available in the current version of GTEx. Over a hundred of them are shared by multiple tissues, some occurring in as many as 36 tissue types. In several ERR events, enhancer retargeting would cause activation of genes associated with diseases. Our analysis shows that ERR, a previously unobserved and unsuspected mechanism by which genetic alterations of promoters cause activation of alternative gene promoters, is common in the transcriptomes of multiple tissue types. Moreover, our study suggests that ERR may represent a previously overlooked mechanism underlying the risk of disease and developmental defects.

B-715: Novel analysis of HT-SELEX data elucidates the role of DNA shape in transcription factor binding

COSI: RegSys

Soumitra Pal, National Institutes of Health, United States
Jan Hoinka, National Institutes of Health, United States
Teresa Przytycka, NIH, United States

Short Abstract: Characterizing the DNA-binding specificities of transcription factors (TFs) is of primary importance for studying gene regulation. Recently, several lines of evidence have suggested that both DNA sequence and shape contribute to TF binding. However, providing direct evidence for the role of DNA shape in TF binding has been challenging because it is difficult to separate the sequence and structure contributions to binding. To address this challenge, we developed a novel way of analyzing the results of in vitro HT-SELEX experiments for TF-DNA binding. Specifically, the presence of motif-free sequences in late HT-SELEX cycles and their enrichment in weak binders allowed us to detect evidence for the role of DNA shape features in TF binding. Our approach revealed that, even in the absence of a sequence motif, TFs weakly bind DNA molecules enriched in specific shape features that were often TF-specific. Surprisingly, we also found that some properties of DNA shape contribute to promiscuous binding by all tested TF families. Strikingly, such promiscuously bound shapes correspond to the most frequent shapes formed by DNA. We propose that this promiscuous binding facilitates sliding of TFs along the DNA molecule before they lock into their binding sites.

B-716: mirDIP 4.1: integrative database of human microRNA target predictions

COSI: RegSys

Tomas Tokar, University Health Network, Toronto, Ontario, Canada
Chiara Pastrello, University Health Network, Toronto, Ontario, Canada
Andrea Rossos, University Health Network, Toronto, Ontario, Canada
Mark Abovsky, University Health Network, Canada
Anne-Christin Hauschild, University Health Network, Toronto, Ontario, Canada
Mike Tsay, University Health Network, Toronto, Ontario, Canada
Richard Lu, University Health Network, Toronto, Ontario, Canada
Igor Jurisica, University Health Network, Toronto, Ontario, Canada

Short Abstract: The MicroRNA Data Integration Portal (mirDIP) is an integrative database of human microRNA target predictions. In its recent version (v4.1), mirDIP compiles nearly 152 million computational predictions obtained from 30 different resources, covering almost all known human microRNAs and the vast majority of human genes. In contrast to other integrative resources, the scope of mirDIP extends beyond collecting existing data. Predictions obtained from individual resources were standardized to current nomenclature, and their predictive precision was benchmarked against currently available experimental evidence. We used statistical learning to infer what we refer to as an integrative score, which assesses overall confidence that a given microRNA-target interaction exists. We demonstrated that the integrative score provides more precise predictions than those obtained from individual resources. Importantly, owing to its integrative nature, predictions derived using this score are also less biased than those from other resources, which tend to identify targets belonging to specific biological processes or pathways (likely due to the underlying knowledge bias). Using mirDIP, we identified previously unknown functional classes of microRNAs and revealed novel associations between microRNAs and various human pathologies. Altogether, mirDIP provides a comprehensive and reliable resource for miRNA-target predictions, substantially advancing human microRNA-related research.

B-717: SpectralTAD: identification of topologically associated domains (TADs) using spectral clustering

COSI: RegSys

Kellen Cresswell, Virginia Commonwealth University, United States
John Stansfield, Virginia Commonwealth University, United States
Mikhail Dozmorov, Virginia Commonwealth University, United States

Short Abstract: Topologically associated domains (TADs) are regions of the genome defined by strong intra-TAD interaction patterns. TADs have been shown to be associated with a variety of genomic functions, including gene regulation. Despite their importance, there is no consensus on how to properly detect TADs from raw Hi-C contact matrices. We propose a method that reframes TAD detection as a spectral clustering problem. To perform clustering, the contact matrix is interpreted as the adjacency matrix of a graph whose edge weights indicate the number of contacts between loci. Spectral clustering is then performed, with each cluster corresponding to a unique TAD. Clusters are assigned using the iterative discretization method described in (Yu and Shi 2003). We demonstrate how the eigengap, a common heuristic for determining the number of clusters in spectral clustering, fails when analyzing Hi-C matrices, and we introduce a novel alternative for detecting the optimal number of TADs based on maximizing cluster-wise silhouette scores. Our method is implemented in the SpectralTAD R package. Results show that TAD boundaries identified with this method colocalize strongly with CTCF peaks. Additionally, this method produces TADs that are better separated than those from other commonly used methods.
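A minimal NumPy sketch of the general idea, assuming a symmetric contact matrix is used directly as a weighted adjacency matrix. It is not the SpectralTAD implementation: k-means with deterministic farthest-point initialization stands in for the Yu and Shi (2003) discretization, the whole matrix is clustered at once rather than in windows, clusters are not forced to be contiguous, and silhouettes are computed in the k-dimensional spectral embedding. The hypothetical helper `choose_k` returns the smallest candidate whose mean silhouette is within a small tolerance of the best.

```python
import numpy as np


def spectral_embedding(contacts, k):
    """Row-normalized eigenvectors for the k largest eigenvalues of D^-1/2 A D^-1/2."""
    a = np.asarray(contacts, dtype=float)
    deg = a.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0))
    norm_adj = inv_sqrt[:, None] * a * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm_adj)  # eigenvalues come back in ascending order
    emb = vecs[:, -k:]
    lengths = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(lengths, 1e-12)


def kmeans(x, k, iters=100):
    """Lloyd k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def mean_silhouette(x, labels):
    """Average silhouette width; singleton clusters score 0 by convention."""
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a_i = dist[i, same].sum() / (same.sum() - 1)  # self-distance is 0
        b_i = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a_i, b_i)
        scores.append(0.0 if denom == 0 else (b_i - a_i) / denom)
    return float(np.mean(scores))


def choose_k(contacts, candidates, tol=1e-6):
    """Smallest candidate k whose mean silhouette is within tol of the best."""
    scores = {}
    for k in candidates:
        emb = spectral_embedding(contacts, k)
        scores[k] = mean_silhouette(emb, kmeans(emb, k))
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= best - tol)


# Toy contact matrix: domains of 3, 4 and 5 bins on a weak uniform background.
sizes = [3, 4, 5]
n_bins = sum(sizes)
contacts = np.ones((n_bins, n_bins))
start = 0
for size in sizes:
    contacts[start:start + size, start:start + size] += 10.0
    start += size

print(choose_k(contacts, candidates=[2, 3, 4]))
```

On this idealized block matrix the silhouette criterion recovers the three planted domains; real Hi-C matrices are noisier and distance-decaying, which is where the eigengap heuristic is reported to fail.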

B-718: Discovery of biased orientation of DNA motif sequences affecting enhancer-promoter interactions and transcription of genes

COSI: RegSys

Naoki Osato, Osaka University, Japan

Short Abstract: Chromatin interactions play important roles in enhancer-promoter interactions (EPI) and in regulating gene transcription. CTCF and cohesin proteins are located at the anchors of chromatin interactions, forming loop structures. The DNA-binding sequences of CTCF show an orientation bias at chromatin interaction anchors, with the forward-reverse (FR) orientation frequently observed. However, it is still unclear which other proteins are associated with chromatin interactions. To find DNA-binding motif sequences of transcription factors (TFs), such as CTCF, that affect EPI and gene transcription, we predicted transcriptional target genes based on enhancer-promoter association (EPA). EPA was shortened at the genomic locations of TF DNA-binding motifs in FR or reverse-forward (RF) orientation. The expression levels of target genes predicted from EPA were compared with those of target genes predicted from promoters only. In total, 351 biased orientations of DNA motifs significantly affected the expression levels of putative transcriptional target genes in monocytes from all four individuals examined, including motifs of known transcription factors associated with chromatin interactions and EPI, such as CTCF, cohesin (RAD21 and SMC3), ZNF143, and YY1. Moreover, EPI predicted using the FR or RF orientation of some DNA motifs overlapped with chromatin interaction data (Hi-C) more than other EPA did.

B-719: Uncoupling of transcription and cytodifferentiation in mouse spermatocytes with impaired meiosis

COSI: RegSys

Alexander Fine, The Jackson Laboratory & Tufts University, United States
Robyn Ball, Stanford University, United States
Yasuhiro Fujiwara, The Jackson Laboratory, United States
Mary Ann Handel, The Jackson Laboratory & Tufts University, United States
Gregory Carter, The Jackson Laboratory & Tufts University, United States

Short Abstract: Cell differentiation is driven by changes in gene expression that manifest as changes in cellular phenotype or function. Altered cellular phenotypes, stemming from genetic mutations or other perturbations, are widely assumed to directly correspond to changes in the transcriptome and vice versa. Here, we use the cytologically well-defined Prdm9 mutant mouse as a model of developmental arrest to demonstrate that parallel programs of cellular differentiation and transcription can become dissociated. By comparing cytological phenotype markers and transcriptomes in wild-type and mutant spermatocytes, we identified multiple instances of cellular and transcriptional uncoupling in Prdm9-/- mutants. Most notably, Prdm9-/- germ cells arrest cytologically in late-leptotene/zygotene but nevertheless develop gene expression signatures characteristic of later, post-arrest developmental substages. These findings suggest that transcriptome changes may not reliably map to cellular phenotypes in perturbed systems.

B-720: Mismatched base-pairs locally distort DNA structure and can induce increased DNA-binding by transcription factor proteins

COSI: RegSys

Ariel Afek, Duke University, United States
Raluca Gordan, Duke University, United States

Short Abstract: Transcription factors (TFs) are known to recognize DNA using both sequence (direct) and shape (indirect) readout. To investigate the contribution of shape to protein-DNA binding, we use mismatches (i.e. mis-paired bases) to induce significant structural changes in TF-DNA binding sites, with minimal changes in the DNA sequence of these sites. We present Saturation Mismatch Binding Assay (SaMBA), the first assay to characterize the effects of mismatches on TF-DNA binding in high-throughput. For genomic sequences of interest, SaMBA generates DNA duplexes containing all possible single-base mismatches, and quantitatively assesses the effects of the mismatches on TF-DNA interactions. We applied SaMBA to measure binding of 21 TFs (covering 14 structural families) to thousands of mismatched sequences, and mapped the impact of mismatches on these TFs. For all tested factors we found that DNA mismatches within binding sites can significantly increase TF binding levels. Furthermore, for several TFs we identified non-specific genomic regions that become strongly bound after certain mismatches are introduced. Structural analyses of mismatches that increase TF binding revealed that these mismatches oftentimes distort the naked DNA to induce shapes that are also present in the protein-bound sites, thus providing direct evidence of the contribution of DNA shape to protein-DNA recognition.
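The combinatorics of a saturation mismatch library can be sketched in Python. This assumes mismatches are introduced on either strand of a perfectly paired duplex, giving six mismatched duplexes per position; whether SaMBA uses both strands, and how it handles flanking sequence, is an assumption here, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_watson_crick(x, y):
    return COMPLEMENT[x] == y


def single_mismatch_duplexes(top):
    """Enumerate duplexes that differ from the perfect duplex by one mismatched pair.

    A duplex is a (top, bottom) pair of strings aligned position by position,
    with the bottom strand written 3'->5' so that bottom[i] pairs with top[i].
    """
    perfect_bottom = "".join(COMPLEMENT[b] for b in top)
    variants = []
    for i, (t, b) in enumerate(zip(top, perfect_bottom)):
        for alt in "ACGT":
            if alt != b:  # substitute on the bottom strand: (t, alt) is not Watson-Crick
                variants.append((i, "bottom", top,
                                 perfect_bottom[:i] + alt + perfect_bottom[i + 1:]))
            if alt != t:  # substitute on the top strand: (alt, b) is not Watson-Crick
                variants.append((i, "top", top[:i] + alt + top[i + 1:], perfect_bottom))
    return variants


for pos, strand, t, b in single_mismatch_duplexes("GAT")[:3]:
    print(pos, strand, t, b)
```

A binding site of length n therefore yields 6n mismatched duplexes under this assumption, each differing from the reference at exactly one base pair.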

B-721: Using Markov Random Field to Model Gene Expression in the 3D Genome

COSI: RegSys

Naihui Zhou, Iowa State University, United States
Iddo Friedberg, Iowa State University, United States
Mark Kaiser, Iowa State University, United States

Short Abstract: Chromatin and its 3D organization play important roles in the function of eukaryotic cells. With advances in chromosome conformation capture (3C) technologies such as Hi-C, more long-range intra-chromosomal and inter-chromosomal interactions between genomic loci have come to light. In particular, the 3D organization of the genome may play important roles in transcriptional regulation. The “transcription factory” theory is one such hypothesis: nuclear subcompartments are dynamically organized so that the genes within them are transcribed in a coordinated way. This study attempts to further consolidate the “transcription factory” theory using a spatial Markov Random Field (MRF) model. By directly modelling gene expression values on a spatial neighborhood network inferred from Hi-C data, we were able to estimate the level of spatial dependency among protein-coding genes in human IMR90 cells. We overcame the computational challenges of large matrices by using the double Metropolis algorithm to carry out Markov Chain Monte Carlo (MCMC) simulation for this Bayesian model. Our study confirms the spatial dependency of transcription among neighboring genes in the 3D genome organization on a global scale. Further insights can be gained into the mechanism of differential expression in response to stimuli involving chromatin compartments.
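The abstract does not give the model's exact form, so the following sketch assumes an auto-normal (Gaussian) MRF in which each gene's expression, given its Hi-C neighbors, has mean mu + eta times the summed neighbor deviations from mu. Neighbors are defined by a hypothetical contact `threshold`, and eta (the spatial dependency) is estimated by maximum pseudo-likelihood, a cheaper swapped-in technique rather than the Bayesian double Metropolis MCMC used in the study.

```python
import numpy as np


def neighbor_graph(contacts, threshold):
    """Adjacency matrix: genes i and j are neighbors if their contact count >= threshold."""
    adj = (np.asarray(contacts, dtype=float) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def conditional_means(expr, adj, mu, eta):
    """Auto-normal MRF: E[x_i | neighbors] = mu + eta * sum_{j in N(i)} (x_j - mu)."""
    resid = np.asarray(expr, dtype=float) - mu
    return mu + eta * (adj @ resid)


def pseudo_likelihood_eta(expr, adj, mu):
    """Maximum pseudo-likelihood eta, assuming a constant conditional variance.

    With Gaussian conditionals this reduces to least squares of each gene's
    deviation on the summed deviations of its neighbors.
    """
    resid = np.asarray(expr, dtype=float) - mu
    s = adj @ resid
    return float(resid @ s / (s @ s))


contacts = np.array([[0, 5, 1, 0],
                     [5, 0, 6, 1],
                     [1, 6, 0, 7],
                     [0, 1, 7, 0]], dtype=float)
adj = neighbor_graph(contacts, threshold=5)   # chain 0-1-2-3
expr = np.array([2.0, 4.0, 6.0, 8.0])
print(pseudo_likelihood_eta(expr, adj, mu=5.0))
print(conditional_means(expr, adj, mu=5.0, eta=0.5))
```

A positive eta means genes that are close in 3D tend to deviate from the mean in the same direction, which is the kind of spatial dependency the study tests for.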

B-722: Novel normalization and clustering methods for accurately identifying subtypes of single cells in fresh human brains

COSI: RegSys

Yungil Kim, Icahn School of Medicine at Mt. Sinai, United States
John Fullard, Icahn School of Medicine at Mt. Sinai, United States
Ying-Chih Wang, Icahn School of Medicine at Mt. Sinai, United States
Kristin Beaumont, Icahn School of Medicine at Mt. Sinai, United States
Robert P. Sebra, Icahn School of Medicine at Mt. Sinai, United States
Panos Roussos, Icahn School of Medicine at Mt. Sinai, United States

Short Abstract: A better understanding of regulatory architectures and the underlying disease etiology would substantially improve the targeting of effective risk variants or biological entities in complex diseases, including Alzheimer's disease (AD). To resolve the heterogeneity of immune cell types, researchers recently applied scRNA-seq to AD transgenic mice to identify potential markers. However, current analytical pipelines for error-prone single-cell assays either use conventional methods from bulk RNA-seq studies whose assumptions are easily violated, or incorporate only partial findings from scRNA-seq data into statistical methods. Because of inevitable noise and high sparsity in such high-dimensional data, and a complex mixture of biological stochasticity and technical variability, the accuracy of the analytical outcomes is highly questionable. Using more than 100,000 cells from fresh brain tissue of 12 individuals, profiled with Drop-seq, we developed a novel computational framework that identifies immune-related cell subtypes at unprecedentedly high resolution by iteratively combining parametric modeling and nonparametric approaches. We show that our approach successfully identifies known as well as hidden and rare subtypes, and reveals their associated marker genes more accurately than current methods. Our method could provide a detailed description of the mechanistic interplay among distinct immune cells at multiple scales and thereby support the development of better therapeutics for neurological diseases.

B-723: Network Inference from Single-Cell Transcriptomic Data Using Granger Causality

COSI: RegSys

Atul Deshpande, University of Wisconsin Madison; Morgridge Institute of Research, United States
Anthony Gitter, University of Wisconsin-Madison, United States

Short Abstract: Advances in single-cell transcriptomics have enabled observing gene expression in individual cells, providing a detailed view of dynamic biological processes. Cells’ expression states allow them to be ordered based on their progression through a process such as differentiation. Ordered data can be valuable in understanding the underlying gene-gene regulatory interactions that control the process. Regulatory interactions between genes can manifest as 'causal' relationships in their expression trends; for example, increases and decreases in expression of a regulator gene may consistently precede those of its target genes. However, the distribution of cells along the process is not uniform, preventing the use of standard mathematical methods for detecting dependencies in temporal data, including Granger causality. We present an ensemble approach using a generalized Lasso-based Granger causality test suitable for analyzing irregular time series to infer gene regulatory networks from ordered single-cell data. Modified Borda count aggregation combines multiple rankings obtained from diverse kernel-based Granger causality analyses. The kernel smooths over the irregularly-spaced and missing observations. We apply our algorithm to mouse embryonic stem-cell differentiation datasets and demonstrate that it recovers gold standard transcriptional regulatory interactions more accurately than existing single-cell network inference algorithms.
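The aggregation step can be illustrated with a standard Borda count in Python. The modification used in this work is not described in the abstract, so this sketch shows only the plain scheme: in a ranking of length n, the edge at position p (0-based) earns n - p points, edges missing from a ranking earn nothing from it, and edges are re-ranked by total points. The edge names are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine rankings of candidate (regulator, target) edges with a standard Borda count.

    Each ranking lists edges best first. Returns all edges sorted by total
    points, highest first, breaking ties by edge name for reproducibility.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, edge in enumerate(ranking):
            scores[edge] += n - position
    return sorted(scores, key=lambda e: (-scores[e], e))


# Three rankings, e.g. from Granger analyses run with different kernel settings.
rankings = [
    [("A", "B"), ("A", "C"), ("B", "C")],
    [("A", "C"), ("A", "B"), ("B", "C")],
    [("A", "B"), ("B", "C"), ("A", "C")],
]
print(borda_aggregate(rankings))
```

Rank-based aggregation like this is insensitive to the differing scales of the individual test statistics, which is one reason it suits an ensemble of heterogeneous analyses.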

B-724: Divergence in DNA specificity among paralogous transcription factors contributes to their differential in vivo binding

COSI: RegSys

Ning Shen, Duke University, United States
Yuning Zhang, Duke University, United States
Jingkang Zhao, Duke University, United States
Raluca Gordan, Duke University, United States

Short Abstract: DNA-binding specificity is a fundamental characteristic of transcription factors (TFs). In eukaryotes, most TF-coding genes have undergone gene duplication and divergence during evolution, resulting in paralogous factors with highly conserved DNA-binding domains that recognize similar DNA sequence motifs. However, paralogous TFs often bind to distinct targets in the cell and perform distinct regulatory functions. The differential genomic targeting by paralogous TFs is generally assumed to be due to interactions with protein cofactors or the chromatin environment. Using a computational-experimental framework called iMADS (integrative Modeling and Analysis of Differential Specificity), we show that, contrary to previous assumptions, paralogous TFs bind differently to genomic target sites even in vitro. We used iMADS to quantify, model, and analyze specificity differences between 11 TFs from 4 protein families. We found that paralogous TFs have diverged mainly at medium- and low-affinity sites, which are poorly captured by current motif models. We identify sequence and shape features differentially preferred by paralogous TFs, and we show that the intrinsic differences in specificity among paralogous TFs contribute to their differential in vivo binding. Thus, our study represents a step forward in deciphering the molecular mechanisms of differential specificity in TF families.

B-725: Hierarchical Domain Structure Reveals the Divergence of Activity among TADs and Boundaries

COSI: RegSys

Lin An, The Pennsylvania State University, United States
Tao Yang, The Pennsylvania State University, United States
Jiahao Yang, Tsinghua University, China
Johannes Nuebler, Massachusetts Institute of Technology, United States
Qunhua Li, The Pennsylvania State University, United States
Yu Zhang, The Pennsylvania State University, United States

Short Abstract: Mammalian genomes are organized at multiple levels. As one of the fundamental structural units, Topologically Associating Domains (TADs) play a key role in the gene regulatory machinery. Recent studies found that hierarchical structures are also present within some TADs. However, precisely locating hierarchical TAD structures remains challenging. Here we present HitHiC, a dynamic-programming-based method that accurately and quickly uncovers hierarchical TAD structures from Hi-C data. Through a systematic evaluation, we show that HitHiC has better accuracy, reproducibility, and running speed than existing methods. Applying HitHiC to high-resolution Hi-C matrices, we found that TADs with nested structures are in general more active than those without. Furthermore, we identified a group of boundaries shared by multiple TADs, which we call super boundaries. We showed that super boundaries are highly enriched with active chromatin states and expressed genes. This observation of super boundaries is potentially consistent with asymmetric movement in the loop extrusion model. Altogether, our results provide new insights into the complex system of gene regulation.
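HitHiC's objective function is not given in the abstract. As a rough illustration of how dynamic programming can segment a Hi-C matrix into contiguous domains, this Python sketch maximizes the sum, over domains, of within-domain contacts minus a hypothetical constant `background`. It finds a single level of domains and does not reproduce HitHiC's hierarchical (nested) search or scoring.

```python
import numpy as np


def segment_score(contacts, start, end, background):
    """Sum of (contact - background) over bin pairs p < q inside [start, end)."""
    block = np.asarray(contacts, dtype=float)[start:end, start:end] - background
    return float(np.triu(block, k=1).sum())


def optimal_domains(contacts, background):
    """Boundaries of the contiguous partition of bins that maximizes the total score."""
    n = len(contacts)
    best = [0.0] + [-np.inf] * n  # best[j]: optimal score for bins [0, j)
    back = [0] * (n + 1)          # back[j]: start bin of the last domain in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + segment_score(contacts, i, j, background)
            if cand > best[j]:
                best[j], back[j] = cand, i
    # Trace the last-domain starts back from the end of the chromosome.
    boundaries = [n]
    while boundaries[-1] > 0:
        boundaries.append(back[boundaries[-1]])
    return boundaries[::-1]


# Toy matrix: domains of 3, 4 and 5 bins on a weak uniform background.
sizes = [3, 4, 5]
n_bins = sum(sizes)
contacts = np.ones((n_bins, n_bins))
start = 0
for size in sizes:
    contacts[start:start + size, start:start + size] += 10.0
    start += size

print(optimal_domains(contacts, background=5.0))
```

This naive version recomputes each segment score and runs in roughly cubic time per scored block; practical methods precompute cumulative sums so each segment is scored in constant time.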

B-726: Integrative analysis of environmental stress responsive gene networks and their drivers

COSI: RegSys

Irem Celen, University of Delaware, United States
Chandran Sabanayagam, University of Delaware, United States

Short Abstract: Environmental changes have profound effects on the well-being and survival of living organisms. For example, environmental stress factors have been found to account for over 70% of chronic diseases, yet knowledge about their genome-wide impacts remains limited. Therefore, it is crucial to uncover environmentally responsive genetic mechanisms at the systems level. We built comprehensive gene co-expression networks from in-house and publicly available Caenorhabditis elegans transcriptome data collected under different environmental stress conditions, such as exposure to microgravity and dietary alterations. Our gene networks predicted known gene networks with high accuracy (>88%) and are significantly enriched for previously identified protein-protein interactions (P<0.01). Moreover, through guilt-by-association, the networks predicted 19% to 98% more interactions for known pathways. The function of the gene networks was highly correlated with the observed phenotype. By incorporating data on histone modifications and transcription factors, we identified putative drivers of the gene co-expression networks. Overall, our combined approach can facilitate exploration of the disease-driving genetic responses to environmental stimuli and help identify potential therapeutic targets.

B-727: Bridging the Gap Between Detailed Biophysical Modeling and Large-Scale Inference of Expression Regulation

COSI: RegSys

Konstantine Tchourine, NYU - Center for Genomics and Systems Biology, United States
Christine Vogel, New York University, United States
Richard Bonneau, New York University, United States

Short Abstract: Despite the availability of large RNA and protein expression datasets, the regulatory mechanisms that allow a cell to respond adequately to a wide range of external stimuli have not been fully quantified. For example, many existing transcriptional network inference methods do not estimate the biophysical parameters involved in RNA transcription, but instead rely on probabilistic methods such as random decision forests. Conversely, proper biophysical modeling has typically been implemented only in small-scale systems, such as the lac operon. The primary goal of my work is to demonstrate that increasing the level of biophysical detail can improve large-scale modeling and inference of regulation. First, I demonstrate that explicitly accounting for RNA degradation improves transcriptional regulatory network inference in S. cerevisiae and B. subtilis, and that RNA half-lives estimated de novo via condition- and gene-specific network inference optimization correspond to experimentally measured RNA half-lives. Furthermore, I demonstrate that a more accurate mathematical treatment of the contribution of transcriptional repression to RNA expression yields more biophysically accurate models of regulation for many genes.
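One common way to make RNA degradation explicit is first-order kinetics, dx/dt = production - k x, with k = ln 2 / half-life. The Python sketch below assumes that form (the abstract does not state the model used) and recovers a per-interval production rate from an evenly sampled expression time series using forward differences; `transcription_rate` is a hypothetical helper name.

```python
import math

import numpy as np


def transcription_rate(expr, dt, half_life):
    """Estimate production (transcription) rate per sampling interval.

    Assumes dx/dt = production - k * x with k = ln(2) / half_life, and
    approximates dx/dt by forward differences, so the result has one fewer
    element than `expr`.
    """
    x = np.asarray(expr, dtype=float)
    k = math.log(2) / half_life
    dxdt = np.diff(x) / dt
    return dxdt + k * x[:-1]


# At steady state (flat expression) production exactly balances degradation.
print(transcription_rate([4.0, 4.0, 4.0], dt=1.0, half_life=2.0))
```

Ignoring the k x term would equate production with the observed change in expression, which underestimates transcription for unstable transcripts; this is the intuition behind modeling degradation explicitly.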

B-728: Neuronal identity and neuron-specific gene cycling between PDF and DN1 neurons in the Drosophila brain.

COSI: RegSys

Marta Iwanaszko, Northwestern University, United States
Elzbieta Kula-Eversole, Michigan State University, United States
Ravi Allada, Northwestern University, United States
Rosemary Braun, Northwestern University, United States

Short Abstract: Approximately 150 of the 200,000 neurons in the Drosophila brain comprise the fly's circadian neural network. Two main cell types, PDF and DN1, have diverse functions in guiding fly circadian activity and behavior. PDF (or “morning”) cells control the morning peak of activity and are important in short photoperiods. DN1 cells, part of a group called “evening” cells, control the evening peak of activity as well as the morning peak, and enhance morning arousal. DN1 cells feed back to morning and evening cells and promote sleep. It has recently been shown that temperature and sleep regulate the activity of DN1 neurons. Using a variety of computational methods, we analyzed cycling patterns of gene expression in RNA-seq data from both DN1 and PDF neurons under different diet and temperature conditions. While the core clock genes retain their oscillation phase in both cell types and under all conditions, we observe clock-regulated genes with a strong phase difference between the neuron types. Differential expression analysis shows that temperature affects gene cycling profiles more strongly than diet in these neurons. Analysis of gene rhythmicity, differential cycling, and functional clustering provides insights into these neurons’ distinctive functions and their responses to environmental perturbations.

B-729: PREDICTD: PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition

COSI: RegSys

Timothy Durham, University of Washington, United States
Maxwell Libbrecht, University of Washington, United States
J Jeffry Howbert, University of Washington, United States
Jeffrey Bilmes, University of Washington, United States
William Noble, University of Washington, United States

Short Abstract: The Encyclopedia of DNA Elements (ENCODE) and the Roadmap Epigenomics Project seek to characterize the epigenome in diverse cell types using assays that identify, for example, genomic regions with modified histones or accessible chromatin. These efforts have produced thousands of datasets but cannot possibly measure each epigenomic factor in all cell types. To address this, we present a method, PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition (PREDICTD), to computationally impute missing experiments. PREDICTD leverages an elegant model called “tensor decomposition” to impute many experiments simultaneously. Tensor decomposition learns a low-rank representation of the epigenome that captures latent patterns in ChIP-seq and DNase-seq experiments from the Roadmap Epigenomics data corpus. Compared with the current state-of-the-art method, ChromImpute, PREDICTD produces lower overall mean squared error, and combining the two methods yields further improvement. We show that PREDICTD data captures enhancer activity at noncoding human-accelerated regions. PREDICTD provides reference imputed data and open-source software for investigating new cell types, and demonstrates the utility of tensor decomposition and cloud computing, both promising technologies for bioinformatics.
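To illustrate how a low-rank tensor model can impute a missing experiment, here is a small NumPy sketch of a CP (PARAFAC) decomposition fitted by alternating least squares on observed entries only, with modes standing for cell type, assay, and genomic position. This is an assumed, simplified stand-in: it omits PREDICTD's regularization scheme, training procedure, and cloud parallelization, and the toy tensor is synthetic.

```python
import numpy as np


def masked_cp_als(tensor, mask, rank, iters=200, reg=1e-6, seed=0):
    """Fit a rank-R CP model to the observed entries (mask == True) of a 3-way tensor."""
    rng = np.random.default_rng(seed)
    t = np.where(mask, tensor, 0.0)
    factors = [rng.uniform(0.5, 1.5, size=(d, rank)) for d in t.shape]
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        for mode in range(3):
            # Put `mode` first; the other two axes keep their relative order.
            t_m = np.moveaxis(t, mode, 0)
            m_m = np.moveaxis(mask, mode, 0)
            f1, f2 = (factors[m] for m in range(3) if m != mode)
            # Row j * len(f2) + k holds f1[j] * f2[k], matching a C-order reshape.
            design = (f1[:, None, :] * f2[None, :, :]).reshape(-1, rank)
            for i in range(t_m.shape[0]):
                obs = m_m[i].reshape(-1)
                x = design[obs]
                y = t_m[i].reshape(-1)[obs]
                factors[mode][i] = np.linalg.solve(x.T @ x + ridge, x.T @ y)
    return factors


def reconstruct(factors):
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


# Synthetic rank-1 "epigenome": 2 cell types x 3 assays x 4 positions.
u, v, w = np.array([1.0, 2.0]), np.array([1.0, 3.0, 2.0]), np.array([2.0, 1.0, 1.0, 3.0])
truth = np.einsum("i,j,k->ijk", u, v, w)
mask = np.ones(truth.shape, dtype=bool)
mask[1, 2, 3] = False  # pretend this experiment was never run
imputed = reconstruct(masked_cp_als(truth, mask, rank=1))
print(truth[1, 2, 3], imputed[1, 2, 3])
```

Because the factors are shared across all experiments, the held-out entry is predicted from the cell-type, assay, and position patterns learned elsewhere in the tensor, which is what lets many experiments be imputed simultaneously.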

B-730: Integration of ATAC-seq and RNA-seq identifies tissue-specific genes and reveals actively repressed gene networks

COSI: RegSys

Rebekah Starks, Iowa State University, United States
Geetu Tuteja, Iowa State University, United States

Short Abstract: The placenta is crucial during pregnancy, regulating proper fetal growth and development. However, many aspects of placental function and development are not yet fully understood. We therefore aimed to identify active and repressed gene networks in the mouse placenta at E9.5. We generated open chromatin data using ATAC-seq and integrated it with previously published transcriptomic data. RNA-seq reads were quantified as transcripts per million (TPM), and ATAC-seq reads were quantified at gene promoters as the maximum read pileup (coverage). We then grouped genes based on their TPM and promoter coverage values. Genes with high expression and high coverage were enriched for housekeeping functions. Surprisingly, we identified genes with high expression and medium-to-low coverage that were enriched for placenta-related terms, including vasculogenesis and endothelial cell migration. We also identified genes with low expression and high promoter coverage; within this group, we extracted a protein-protein interaction network enriched for neuronal functions. Finally, we generalized these findings by running our analysis pipeline on eight other tissues and cell lines. We found that genes with medium-to-low coverage and high expression are consistently enriched for tissue-specific terms and genes. We also identified potentially repressed neuronal networks in placental cells and embryonic stem cells.
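
TPM normalizes read counts by transcript length and then rescales so that the values in each sample sum to one million. Below is a minimal sketch of that calculation, followed by a two-way grouping of genes on TPM and promoter coverage. The cutoffs, the two-level coverage split and the gene names are hypothetical placeholders, since the abstract does not give the thresholds used.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: reads per kilobase, rescaled to sum to 1e6."""
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    scale = sum(rpk.values()) / 1e6
    return {g: r / scale for g, r in rpk.items()}

def group_gene(tpm_value, promoter_coverage, tpm_cutoff=10.0, coverage_cutoff=30.0):
    """Assign a gene to an expression/accessibility group.
    The cutoffs are hypothetical, not the authors' values."""
    expression = "high expression" if tpm_value >= tpm_cutoff else "low expression"
    accessibility = ("high coverage" if promoter_coverage >= coverage_cutoff
                     else "medium-low coverage")
    return expression, accessibility

# Hypothetical read counts and transcript lengths.
counts = {"GeneA": 100, "GeneB": 300, "GeneC": 600}
lengths = {"GeneA": 1000, "GeneB": 1000, "GeneC": 2000}
values = tpm(counts, lengths)
print(values)
print(group_gene(50.0, 12.0))
```

Note that GeneB and GeneC end up with the same TPM: GeneC has twice the reads but is twice as long.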

B-731: Continuous-trait probabilistic model for comparing multi-species functional genomic data

COSI: RegSys

Yang Yang, Carnegie Mellon University, United States
Quanquan Gu, University of Virginia, United States
Yang Zhang, Carnegie Mellon University, United States
Takayo Sasaki, Florida State University, United States
Julianna Crivello, University of Connecticut, United States
Rachel O'Neill, University of Connecticut, United States
David Gilbert, Florida State University, United States
Jian Ma, Carnegie Mellon University, United States

Short Abstract: A large amount of multi-species functional genomic data from high-throughput assays is becoming available to help understand the molecular mechanisms underlying phenotypic diversity across species. However, continuous-trait probabilistic models, which are key to such comparative analysis, remain under-explored. Here we develop a new model, called phylogenetic hidden Markov Gaussian processes (Phylo-HMGP), to simultaneously infer heterogeneous evolutionary states of functional genomic features genome-wide. Both simulation studies and real data applications demonstrate the effectiveness of Phylo-HMGP. Importantly, we applied Phylo-HMGP to a new cross-species DNA replication timing (RT) dataset from the same cell type in five primate species (human, chimpanzee, orangutan, gibbon, and green monkey). We demonstrate that Phylo-HMGP enables discovery of genomic regions with distinct evolutionary patterns of RT. Our method provides a generic framework for comparative analysis of multi-species continuous functional genomic signals to help reveal regions with conserved or lineage-specific regulatory roles.
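
Continuous-trait models on a phylogeny are often built from Gaussian processes. In the simplest case, Brownian motion, the covariance between two species' trait values is σ² times the branch length their root-to-tip paths share. The sketch below computes that covariance for a small tree with made-up branch lengths. It is background only: the abstract does not state which process Phylo-HMGP uses, the hidden Markov layer is omitted, and all names are hypothetical.

```python
def root_path(node, parent):
    """Map each branch on the path from `node` to the root (keyed by the
    branch's child node) to its length."""
    path = {}
    while node in parent:
        node_parent, length = parent[node]
        path[node] = length
        node = node_parent
    return path

def brownian_covariance(species, parent, sigma2=1.0):
    """Tip covariance under Brownian motion: sigma2 times shared path length."""
    paths = {s: root_path(s, parent) for s in species}
    return {
        (a, b): sigma2 * sum(paths[a][n] for n in paths[a].keys() & paths[b].keys())
        for a in species for b in species
    }

# Hypothetical tree ((human:1, chimpanzee:1):2, gibbon:3); lengths are illustrative only.
parent = {
    "human": ("HC", 1.0),
    "chimpanzee": ("HC", 1.0),
    "HC": ("root", 2.0),
    "gibbon": ("root", 3.0),
}
cov = brownian_covariance(["human", "chimpanzee", "gibbon"], parent, sigma2=0.5)
print(cov[("human", "chimpanzee")], cov[("human", "gibbon")])
```

Closely related species share more of their history and therefore covary more, which is the intuition behind detecting regions whose signal departs from this expected pattern.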

B-732: Genome-wide Prediction and Characterization of Long Non-coding RNAs (lncRNAs) Involved in Drought and Heat Stress in Switchgrass (Panicum virgatum L.)

COSI: RegSys

Ketaki Bhide, Purdue University, United States
Shaojun Xie, Purdue University, United States
Sulbha Choudhari, Purdue University, United States
Malay C Saha, Noble Research Institute, United States
Venu Kalavacharla, Delaware State University, United States
Jyothi Thimmapuram, Purdue University, United States

Short Abstract: Long non-coding RNAs (lncRNAs) play crucial roles in many developmental processes in plants. In particular, plant lncRNAs have emerged as important regulatory elements in responses to biotic and abiotic stress. To date, lncRNAs have not been identified in switchgrass, and their regulatory roles are unknown. In this study, we predicted lncRNAs from RNA-Seq data of switchgrass plants under heat and drought stress, using two tools: Coding Potential Calculator (CPC) and Plant Long Non-Coding RNA Prediction by Random Forest (PLncPRO). A total of 14,144 novel candidate lncRNAs were predicted, of which 90, 44 and 128 were differentially expressed under drought, heat, and drought + heat stress, respectively, relative to the control. Candidate lncRNAs were further characterized by length distribution, exon number, AU content and annotation category. Genes that overlap with, or lie within 10 kb of, differentially expressed lncRNAs were extracted, and their associated Gene Ontology (GO) terms were analyzed. Neighboring genes of differentially expressed lncRNAs were associated with stress-related GO terms. This study will enable exploration of the potential regulatory roles of switchgrass lncRNAs under specific stress conditions and provide information for uncovering lncRNA functions in switchgrass.
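
The neighboring-gene step can be sketched as an interval test: a gene qualifies if it overlaps the lncRNA or the gap between them is at most 10 kb. The coordinate convention (0-based, half-open), the tuple layout and the toy gene names are assumptions for illustration.

```python
def genes_near(lncrna, genes, window_bp=10_000):
    """Return names of genes overlapping `lncrna` or within `window_bp` of it.
    Intervals are (chrom, start, end), 0-based and half-open."""
    chrom, start, end = lncrna
    hits = []
    for name, (g_chrom, g_start, g_end) in genes.items():
        if g_chrom != chrom:
            continue
        # Gap between the two intervals; 0 when they overlap or abut.
        gap = max(0, g_start - end, start - g_end)
        if gap <= window_bp:
            hits.append(name)
    return sorted(hits)

lnc = ("Chr01", 50_000, 52_000)
genes = {
    "g1": ("Chr01", 51_000, 60_000),  # overlaps
    "g2": ("Chr01", 62_000, 70_000),  # 10,000 bp downstream
    "g3": ("Chr01", 62_001, 70_000),  # 10,001 bp downstream
    "g4": ("Chr02", 50_000, 52_000),  # other chromosome
    "g5": ("Chr01", 30_000, 40_000),  # 10,000 bp upstream
}
print(genes_near(lnc, genes))
```

For genome-wide use, an interval tree or a sorted sweep would replace the linear scan over genes.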

B-733: BART: Functional transcription factor prediction from query gene sets using public ChIP-seq data

COSI: RegSys

Zhenjia Wang, University of Virginia, United States
Chongzhi Zang, University of Virginia, United States

Short Abstract: Identifying the functional transcription factors that regulate a given gene set is an important problem in gene regulation studies. Conventional approaches, such as DNA sequence motif analysis, cannot predict functional binding of specific factors and are not sensitive enough to detect factors binding at distal enhancers. Here we present Binding Analysis for Regulation of Transcription (BART), a novel computational method and software package for predicting functional transcription factors that regulate a query gene set or associate with a query genomic profile, based on more than 6000 existing ChIP-seq datasets for over 400 factors in human or mouse. This method demonstrates the advantage of using publicly available data for regulatory genomics research.

B-734: Modeling TF Binding Sites Gives Insight Into Regulatory Mechanisms of Drug Response

COSI: RegSys

Xiaoman Xie, University of Illinois at Urbana-Champaign, United States
Casey Hanson, University of Illinois at Urbana-Champaign, United States
Saurabh Sinha, University of Illinois at Urbana-Champaign, United States

Short Abstract: We present a new computational pipeline to identify transcription factors (TFs) associated with inter-individual variation in drug response. It integrates genotype, gene expression, and cytotoxicity data from a panel of ~300 cell lines with transcription factor binding sites (TFBS) derived from ENCODE and motifs from various databases. The first component of the pipeline, STAPMM, predicts the impact of a SNP on TF binding; we demonstrate its efficacy by predicting allele-specific binding SNPs and comparing its performance with other methods, such as gkmSVM. The second component assesses the extent to which SNPs affecting TFBS are proximal to drug response genes (genes whose expression co-varies with cytotoxicity), thereby associating TFs with drug response. We predicted 38 significant (TF, Drug) pairs at an FDR threshold of 0.05, 21 of which are not significant in the absence of TF binding predictors. Among the 38 pairs, three, (ELF1, Epirubicin), (ELF1, Doxorubicin) and (SP1, Carboplatin), were experimentally validated. A closer look at (ELF1, Doxorubicin) showed that 25 of the 44 drug response genes predicted to be affected by SNPs that alter TF binding are central to the apoptosis pathway.
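
The abstract reports significance at an FDR threshold of 0.05 but does not name the correction procedure. Assuming the widely used Benjamini-Hochberg step-up procedure, the thresholding step can be sketched as follows; the p-values are made up.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking hypotheses rejected at FDR `alpha`
    (Benjamini-Hochberg step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k whose sorted p-value satisfies p_(k) <= k/m * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical p-values for ten (TF, Drug) pairs.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
```

The step-up rule rejects every hypothesis ranked at or below the largest passing rank, even if some of those individual p-values miss their own rank threshold.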

B-735: MPRAnalyze: A statistical framework for Massively Parallel Reporter Assay (MPRA) data

COSI: RegSys

Tal Ashuach, University of California, Berkeley, United States
David Fischer, Helmholtz Centre, Institute of Computational Biology, Munich, Germany
Anat Kreimer, University of California, Berkeley, Department of Electrical Engineering and Computer Science, Berkeley, CA, United States
Fabian Theis, Helmholtz Centre, Institute of Computational Biology, Munich, Germany
Nadav Ahituv, University of California, San Francisco, United States
Nir Yosef, University of California, Berkeley, Department of Electrical Engineering and Computer Science, Berkeley, CA, United States

Short Abstract: Massively parallel reporter assays (MPRAs) enable testing of thousands of regulatory DNA sequences in a single, quantitative experiment. Because MPRA is still a nascent technology, there is no established set of computational methods dedicated to leveraging its promise effectively. Developing such methods could improve future selection of MPRA candidate sequences, enhance our ability to predict functional regulatory sequences, and increase our understanding of the regulatory code and how its alteration can have phenotypic consequences. Here we present MPRAnalyze, a statistical framework dedicated to analyzing MPRA count data. MPRAnalyze addresses all major questions posed in MPRA experiments: estimating the magnitude of a regulatory sequence's effect in a single condition, and comparing the activity of regulatory sequences across multiple conditions. The framework allows for various distributional assumptions and uses generalized linear models to account for uncertainty in both DNA and RNA observations, control for various sources of unwanted variation, and incorporate negative controls for robust hypothesis testing, thereby providing clear quantitative answers in complex experimental designs. We demonstrate the robustness, accuracy and applicability of MPRAnalyze on simulated data and published data sets. MPRAnalyze is implemented as a publicly available R package.

B-736: Extending the Reach of Pathway Analysis Through Tissue-Specific Enhancer-Gene Links

COSI: RegSys

Caitlin Mills, University of Southern California, United States
Huaiyu Mi, University of Southern California, United States

Short Abstract: Pathway analysis of whole genome sequencing (WGS) data has greatly increased our understanding of the genetic determinants of disease, yet most disease-associated WGS variants lie outside protein-coding regions and are thought to reside in regulatory regions, including many enhancers. To use this information, enhancers must be mapped into existing pathway networks via their target genes. Our aim is to associate individual enhancers with the specific genes they regulate, so that pathway networks can be extended to include enhancer function in a tissue-specific manner. We developed a pipeline that creates links between putative enhancers and genes to infer enhancer target genes. We applied this pipeline to data from ENCODE, Ensembl, VISTA, FANTOM, GTEx, and EMBL-EBI to build a database of enhancer-gene links that captures tissue-specific relationships between enhancers and genes and records which assays from which data sources were used to generate each link. Using epigenetic marks associated with enhancer activity, ChIA-PET data, single-tissue eQTL variants located within enhancers, and topologically associating domain data, we linked 340,119 enhancers to 18,674 protein-coding genes in 72 tissues and cell types.
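
Of the evidence types listed, TAD data is the simplest to illustrate: one rule links an enhancer to every gene whose transcription start site (TSS) lies in the same TAD. This is a hypothetical sketch of that single rule, not the authors' pipeline, which combines several data sources; intervals are assumed 0-based and half-open, and all names and coordinates are made up.

```python
def link_within_tads(enhancers, gene_tss, tads):
    """Link each enhancer to every gene whose TSS falls in the TAD that fully
    contains the enhancer. Enhancers/TADs are (chrom, start, end); TSSs are (chrom, pos)."""
    links = set()
    for chrom, t_start, t_end in tads:
        inside = [e for e, (c, s, en) in enhancers.items()
                  if c == chrom and t_start <= s and en <= t_end]
        targets = [g for g, (c, pos) in gene_tss.items()
                   if c == chrom and t_start <= pos < t_end]
        links.update((e, g) for e in inside for g in targets)
    return sorted(links)

tads = [("chr1", 0, 100_000), ("chr1", 100_000, 250_000)]
enhancers = {
    "enh1": ("chr1", 20_000, 21_000),
    "enh2": ("chr1", 150_000, 151_000),
    "enh3": ("chr1", 99_500, 100_500),  # spans a TAD boundary: not linked
}
gene_tss = {"GENE_A": ("chr1", 50_000), "GENE_B": ("chr1", 120_000), "GENE_C": ("chr1", 200_000)}
print(link_within_tads(enhancers, gene_tss, tads))
```

A TAD rule alone tends to over-link in large domains, which is one reason to combine it with eQTL, ChIA-PET and tissue-specific activity evidence.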

B-737: ChIP-exo analysis highlights Fkh transcription factors as hubs that integrate multi-scale networks in budding yeast

COSI: RegSys

Thierry Mondeel, University of Amsterdam, Netherlands
Petter Holland, Chalmers University of Technology, Sweden
Jens Nielsen, Chalmers University of Technology, Sweden
Matteo Barberis, University of Amsterdam, Netherlands

Short Abstract: Forkhead (Fkh) transcription factors are evolutionarily conserved among eukaryotes and coordinate timely cell cycle progression. In budding yeast, Fkh proteins are expressed during a long window of the cell cycle and could therefore act as hubs integrating multiple cellular networks. Here, we report a novel ChIP-exo dataset of Fkh targets. ChIP-exo combines ChIP with lambda exonuclease digestion followed by high-throughput sequencing, which allows identification of a nearly complete set of binding sites at single-nucleotide resolution. Available ChIP-seq analysis software, GEM and MACE, ran into problems when analyzing this ChIP-exo dataset. We therefore developed a novel ChIP-exo data analysis method, which we named maxPeak. This method confirms known Fkh targets and points to many novel ones across various cellular processes. We analyzed target genes with respect to their functional enrichment, their temporal expression during the cell cycle, and the metabolic pathways in which they occur. Furthermore, we present a comprehensive overview of current knowledge of Fkh targets by integrating our results with complementary genome-wide studies in the literature, and we point out differences in metabolic targets between Fkh proteins. Our work highlights Fkh as hubs that integrate multi-scale regulatory networks to achieve proper timing of cell division in budding yeast.

B-738: Sequential regulatory activity prediction across chromosomes with convolutional neural networks

COSI: RegSys

David Kelley, Calico Labs, United States
Yakir Reshef, MIT, United States
Maxwell Bileschi, Google, United States
David Belanger, Google, United States
Cory Y. McLean, Google, United States
Jasper Snoek, Google, United States

Short Abstract: Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we developed a machine learning system to predict cell type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. We introduce densely connected dilated convolution layers to propagate information across large sequence distances in a convolutional neural network. Using this architecture, the system identified promoters and distal regulatory elements and synthesized their content to make effective gene expression predictions. We trained models to predict thousands of human genomic profiles across hundreds of cell types. Model predictions for the influence of genomic variants on gene expression align well with causal variants underlying human eQTLs mapped by the Genotype-Tissue Expression project. We demonstrate how these predictions can be used to generate mechanistic hypotheses to enable fine mapping of disease loci.
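
Dilated convolutions widen a layer's view of the sequence without enlarging the kernel: with dilation d, a kernel of size k spans (k - 1)·d + 1 input positions. The sketch shows a single dilated 1-D convolution and the receptive field of a stack of stride-1 dilated layers. It ignores the dense connections, pooling and everything else in the actual architecture; the kernel size and dilation schedule in the example are assumptions.

```python
def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation, as in most deep-learning
    libraries): y[i] = sum_j kernel[j] * x[i + j * dilation]."""
    span = (len(kernel) - 1) * dilation
    return [sum(k * x[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(x) - span)]

def receptive_field(kernel_size, dilations):
    """Input positions seen by one output of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
# Ten layers with kernel size 3: doubling dilations vs. no dilation.
print(receptive_field(3, [2 ** i for i in range(10)]), receptive_field(3, [1] * 10))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth (2^(L+1) - 1 for kernel size 3), whereas undilated layers add only two positions each, which is why dilation suits long-range regulatory context.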

B-739: Prediction of complete Hi-C contact maps from genomic sequence

COSI: RegSys

Christopher Jf Cameron, McGill University, Canada
Josée Dostie, McGill University, Canada
Mathieu Blanchette, McGill University, Canada

Short Abstract: The three-dimensional (3D) organization of genomes plays a key role in the regulation of genes. High-throughput chromosome conformation capture (Hi-C) allows 3D genomic organization to be determined by capturing all chromatin contacts within a cell population. Much work has already been done to computationally predict these contacts using one or more types of biochemical data (e.g., nucleosome positioning, ChIP-seq, Hi-C) as input. Although informative, these models cannot be applied to cell types or genomes for which the required input data are unavailable (e.g., ancestral genomes). Moreover, most studies predict only a subset of all genome-wide chromatin interactions, typically those at relatively short distances (<1 Mb). Here, we describe the supervised regression problem of predicting complete Hi-C contact maps from genomic sequence alone. To address this problem, we define multiple features derived from genomic sequence data that allow machine learning algorithms to fit the underlying distribution of a Hi-C contact map at restriction-fragment resolution. We show that our models provide (i) accurate predictions of Hi-C contact frequency by properly weighting input features relevant to a particular cell type as well as (ii) insight into potential factors contributing to chromatin architecture changes.

B-740: Impact of Individual Genetic Diversity on Gene Relationship Network Architectures

COSI: RegSys

Yuichi Aoki, Tohoku University, Japan
Takeshi Obayashi, Tohoku University, Japan
Ikuko N Motoike, Tohoku University, Japan
Kengo Kinoshita, Tohoku University, Japan

Short Abstract: With an ever-increasing number of organisms whose genomes are available, directly estimating functionally related genes from the genome is a fundamental challenge in computational biology. To this end, we investigated genomic features associated with functional relationships between genes, using gene coexpression information. Gene coexpression, the similarity of gene expression profiles, provides a genome-wide approximation of functional gene relationships at the level of transcriptional regulation. Comparing the similarity of genomic features with the strength of gene coexpression, we found that genes belonging to the same evolutionary age group tend to be strongly coexpressed. By comparing allele frequencies calculated from several large genome cohort studies, we further found that individual genetic diversity differed significantly between older and younger gene loci. We also investigated the effect of genetic diversity on the architecture of other gene networks, such as a protein-protein interaction network. We anticipate that our results will be a starting point for understanding the mechanisms underlying the evolution of cellular systems, and for developing a genome-based gene function prediction method that takes individual genetic diversity into account.

B-741: RnBeads 2018 – comprehensive analysis of DNA methylation data

COSI: RegSys

Michael Scherer, Max-Planck Institute for Informatics, Germany
Fabian Müller, Max Planck Institute for Informatics, Germany
Yassen Assenov, German Cancer Research Center, Germany
Pavlo Lutsik, German Cancer Research Center, Germany
Jörn Walter, Saarland University, Germany
Thomas Lengauer, Max Planck Institute for Informatics, Germany
Christoph Bock, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Austria

Short Abstract: DNA methylation is a well-studied epigenetic mark with key roles in normal cell differentiation and cancer. Several software packages facilitate individual DNA methylation analysis steps, such as normalization and differential analysis. However, tools that provide a start-to-finish pipeline are rare. To fill this gap, we developed the RnBeads software package [1]. It is structured into modules for import and export, quality control, preprocessing, covariate inference, and exploratory and differential analysis. Here, we present a substantially extended version of RnBeads with major improvements to each module, including support for new data types (e.g. the Illumina EPIC array), new inference methods (such as epigenetic age prediction and estimation of the immune cell content of cancer samples) and improved usability through a new graphical user interface. We showcase these features on four reproducible examples, each highlighting new functionality: a large array-based blood data set, a whole-genome bisulfite sequencing data set on human hematopoiesis, a reduced-representation bisulfite sequencing data set on Ewing sarcoma, and a benchmark data set for cross-platform integration. RnBeads is a comprehensive tool for DNA methylation analysis and is available through R/Bioconductor. [1] Assenov, Y. et al. Comprehensive analysis of DNA methylation data with RnBeads. Nat. Methods 11, 1138–1140 (2014).

B-742: Identification of human silencers by correlating cross-tissue epigenetic profiles and gene expression

COSI: RegSys

Di Huang, NIH, United States
Ivan Ovcharenko, NIH, United States

Short Abstract: see the attached.

B-743: Genome-wide mapping of transcriptional regulation and metabolism describes information-processing units in Escherichia coli

COSI: RegSys

Daniela Ledezma-Tejeida, UNAM, Mexico
Julio Collado-Vides, UNAM, Mexico
Cecilia Ishida, UNAM, Mexico

Short Abstract: In the face of changes in their environment, bacteria adjust gene expression levels and produce appropriate responses. The individual layers of this process have been widely studied: the transcriptional regulatory network describes the regulatory interactions that produce changes in the metabolic network, and both are coordinated by the signaling network, but the interplay between them has never been described systematically. Here, we formalize the detection and processing of environmental information mediated by individual transcription factors (TFs) using a concept termed genetic sensory response units (GENSOR units), which are composed of four components: (1) a signal, (2) signal transduction, (3) a genetic switch, and (4) a response. We used experimentally validated data sets from two databases to assemble a GENSOR unit for each of the 189 local TFs of Escherichia coli K-12 in RegulonDB. Further analysis suggested that feedback is common in signal processing, and that the response mediated by each TF shows a gradient of functional complexity, as opposed to a one regulator/one capacity rule. Finally, we provide examples of other GENSOR unit applications, such as hypothesis generation, detailed description of cellular decision making, and elucidation of indirect regulatory mechanisms.

B-744: Constructing Robust Gene Co-expression Networks from RNA-seq Data

COSI: RegSys

Kayla Johnson, Michigan State University, United States
Arjun Krishnan, Michigan State University, United States

Short Abstract: As the cost of RNA sequencing has continued to fall, the amount of publicly available RNA-seq data has continued to grow. This technology offers several advantages over microarrays, including the ability to capture both known and novel transcripts and all isoforms. Constructing gene co-expression networks is a predominant method for studying gene function in specific biological contexts. However, integrating RNA-seq data from multiple sources into an accurate co-expression network remains a significant challenge, largely because of the need for read count normalization and the presence of batch effects from different experiments, which introduce non-biological variation into the data. In this research, we leverage thousands of uniformly aligned RNA-seq samples from various experiments and tissues to address these challenges. We construct gene co-expression networks for different experimental conditions (such as different tissues) using different normalization and batch effect correction methods to find the best methodology for each pipeline. The resulting networks are evaluated on their ability to recover documented gene relationships.
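
A co-expression network of the kind described can be sketched in three steps: normalize counts within each sample, correlate each pair of genes across samples, and keep pairs above a cutoff. The abstract compares several normalization and batch-correction methods; log-CPM, Pearson correlation and the 0.9 cutoff below are assumptions chosen for brevity, batch correction is omitted, and the gene names and values are hypothetical.

```python
import math

def log_cpm(samples):
    """log2(counts-per-million + 1) for each sample (a dict gene -> count)."""
    result = []
    for counts in samples:
        total = sum(counts.values())
        result.append({g: math.log2(c / total * 1e6 + 1) for g, c in counts.items()})
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as no edge
    return sxy / math.sqrt(sxx * syy)

def coexpression_edges(samples, threshold=0.9):
    """Gene pairs whose profiles across samples have |Pearson r| >= threshold."""
    genes = sorted(samples[0])
    profiles = {g: [s[g] for s in samples] for g in genes}
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(profiles[g1], profiles[g2])
            if abs(r) >= threshold:
                edges.append((g1, g2, r))
    return edges

# Hypothetical, already-normalized profiles for three genes in three samples;
# raw counts would first be passed through log_cpm().
demo = [{"GeneA": 1.0, "GeneB": 2.0, "GeneC": 5.0},
        {"GeneA": 2.0, "GeneB": 4.0, "GeneC": 1.0},
        {"GeneA": 3.0, "GeneB": 6.0, "GeneC": 4.0}]
print(coexpression_edges(demo))
```

In a multi-study setting, batch effects would add shared non-biological structure to these profiles, which is why the choice of correction method can change which edges survive the cutoff.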

B-745: Highly dynamic chromatin interactions drive neurogenesis through gene regulatory networks

COSI: RegSys

Valeriya Malysheva, The Babraham Institute, United Kingdom
Marco Antonio Mendoza-Parra, The Institute of Genetics and Molecular and Cellular Biology (IGBMC), France
Matthias Blum, The European Bioinformatics Institute (EMBL-EBI), United Kingdom
Hinrich Gronemeyer, The Institute of Genetics and Molecular and Cellular Biology (IGBMC), France

Short Abstract: Cell fate acquisition is a fundamental process in the ontogeny of multicellular organisms, involving a plethora of intrinsic and extrinsic instructive signals that direct the lineage progression of pluripotent cells. In the present study, we reveal the role of the chromatin interactome in commitment to early neurogenesis and in propagation of the initiating signal, by reconstructing dynamic loop-enhanced Gene Regulatory Networks (eGRNs) that integrate transcriptome, chromatin accessibility and long-range chromatin interaction data over time. We observe highly dynamic re-wiring of chromatin interactions already at very early stages of neuronal differentiation. Long-range chromatin interactions are massively reorganized: only 30% of the initial interactome is conserved through cell differentiation, while new interactions are established as early as 6 hours after induction of neurogenesis. By integrating chromatin interactions with temporal epigenome and transcriptome data, we identify a group of key regulatory elements that respond to and propagate the initial signal. Our data reveal an enormous capacity of the morphogen to reorganize long-range chromatin interactions by “reading” distant epigenetic signals and chromatin accessibility to drive cell fate acquisition. These results suggest that the differential establishment of chromatin contacts directs the acquisition of cell fate.

B-746: Slice It: A genome-wide resource and visualization tool to facilitate designing sgRNAs for CRISPR/Cas9 screens, to edit protein-RNA interaction sites in the human genome

COSI: RegSys

Sasank Vemuri, IUPUI, United States
Rajneesh Srivastava, IUPUI, United States
Seyedsasan Hashemikhabir, IUPUI, United States
Sarath Chandra Janga, Indiana University – Purdue University Indianapolis, United States

Short Abstract: Several UV cross-linking protocols, such as eCLIP, have been established to delineate the molecular interactions between RNA-binding proteins (RBPs) and their target RNAs. With the advancement of pooled CRISPR/Cas9 screens, it is now possible to perturb noncoding genomic regions and hence re-investigate the global impact of these proteins. We present SliceIt (http://sliceit.soic.iupui.edu/), a database providing an in silico sgRNA (guide RNA) library to facilitate such high-throughput screens. We used CRISPR-DO to design ~4.8 million unique sgRNAs targeting all possible RBP binding sites from ENCODE eCLIP experiments of 123 RBPs in HepG2 and K562 cell lines. SliceIt provides a user-friendly environment built on the Elasticsearch framework. It is available in both table and genome browser views, facilitating easy navigation of RBP binding sites, designed sgRNAs, and exon expression levels across 53 human tissues, along with the prevalence of SNPs and GWAS hits on binding sites. Users can also upload custom tracks in various file formats (in the browser) to navigate and compare additional genomic features and omics data on the hg38 genome. SliceIt provides a one-stop repertoire of sgRNAs for RBP binding sites, along with several layers of functional information, to help design both low- and high-throughput CRISPR/Cas9 screens.
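
SliceIt's guides were designed with CRISPR-DO, whose scoring is not reproduced here. As a much simpler sketch of the underlying search, assuming the standard SpCas9 layout of a 20-nt spacer immediately 5' of an NGG PAM, candidate sites can be found by scanning both strands of a sequence. The function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_spacers(seq, spacer_len=20):
    """Return (start, strand, spacer, pam) for every spacer followed by an NGG PAM.
    `start` is the 0-based position of the spacer on the forward strand."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(n - spacer_len - 2):
            pam = s[i + spacer_len:i + spacer_len + 3]
            if pam[1:] == "GG":
                # On the reverse strand, s[i:i+L] covers forward positions [n-i-L, n-i).
                start = i if strand == "+" else n - i - spacer_len
                hits.append((start, strand, s[i:i + spacer_len], pam))
    return hits

example = "A" * 20 + "TGG" + "A" * 5
print(find_spacers(example))
print(find_spacers(revcomp(example)))
```

To target an RBP binding site, one would scan the site's sequence plus some flank and keep spacers whose cut position falls inside the site; off-target and efficiency scoring, which tools like CRISPR-DO provide, are left out.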

B-935: Learning a human-mouse functional genomics conservation score

COSI: RegSys

Soo Bin Kwon, University of California, Los Angeles, United States
Jason Ernst, University of California, Los Angeles, United States

Short Abstract: Identifying regions that are similar in function between species is crucial because such regions are likely to be functionally conserved and therefore important. In particular, finding functionally similar regions between human and mouse would help us make informed use of murine models. Previous approaches are limited in that they require matching cell types across species and measure conservation at the functional genomics level for only a small portion of the genome. We therefore developed a score that quantifies conservation at the functional genomics level between human and mouse. Scores are generated at base-pair resolution by a neural network trained to learn the characteristics of functionally similar regions. Features consist of ChromHMM annotations and peak calls from DNase-seq, ChIP-seq, and CAGE experiments across human and mouse cell types. Unlike previous methods, our method does not require matching cell types, taking advantage of the wealth of publicly available data. Regulatory elements conserved in sequence or active in similar cell types in human and mouse score highly, suggesting that our score captures conservation of regulatory activity. Our score is moderately correlated with sequence conservation scores, suggesting that our method offers information complementary to existing genomic annotations based on sequence alignment.

ISMB 2018
