SPONSORS:

Silver

Silver Sponsor: Sanofi



General

General Sponsor - IBM Research

General Sponsor - MAGNet

General Sponsor - National Cancer Institute

RECOMB/ISCB RegSysGen 2014 Sponsor - NRNB

Cytoscape Sponsors

RECOMB/ISCB RegSysGen 2014 Sponsor - Agilent Technologies

RECOMB/ISCB RegSysGen 2014 Sponsor - Cytoscape

REGULATORY GENOMICS PRESENTATIONS & ABSTRACTS

Presented on Thursday, November 13 and Friday, November 14

Updated Nov 11, 2014



THURSDAY, NOVEMBER 13



9:45 am – 10:05 am

RG T01
Identifying genetic and environmental determinants of gene expression


Roger Pique-Regi1, Christopher Harvey1, Gregory Moyerbrailean1, Omar Davis1, Donovan Watza1, Xiaoquan Wen2, Francesca Luca1

1Wayne State University, 2University of Michigan, Ann Arbor

The effect of genetic variants on a molecular pathway, and ultimately on the individual’s phenotype, is likely modulated by “environmental” factors. However, it is generally difficult to determine in which tissues and conditions genetic variants may have a functional impact. We denote the functional genetic variants that show cellular environment-specific effects as GxE expression quantitative trait loci (GxE-eQTLs). Achieving a better understanding of the mechanisms underlying GxE-eQTLs is a critical step in understanding the link between genotype and complex phenotypes.

To identify and characterize GxE-eQTLs we have established a new two-step and cost-effective experimental approach. In the first step, we identify global changes in gene expression using low-coverage sequencing of pools of highly multiplexed samples. In the second step, we select a subset of samples for deep sequencing and allele-specific analysis. For the first step, we generated 960 RNA-seq libraries in pools of 96 spanning 265 cellular environments across 5 cell types (3 individuals), and 53 different treatments (including hormones, dietary components, environmental contaminants and metal ions). Relevant GO categories were enriched in the observed global gene expression changes (e.g., immune response for Dexamethasone, ion homeostasis for Zinc). We then analyzed allele-specific expression (ASE) using a novel method (QuASAR) that allows for joint genotyping and allele-specific analysis on RNA-seq data. Across 56 cellular environments we discovered 7738 instances of ASE (FDR<10%), corresponding to 6234 unique ASE genes. Using a Bayesian model across treatments within cell types, we observed that generally >95% of ASE signals are shared and their effect sizes are highly concordant (posterior correlation coefficient 0.9). This is highly consistent with previous analyses of condition-specific eQTLs. Nevertheless, out of 112,564 tests we still estimate 2318 loci with a Bayes posterior probability supporting GxE interaction (1273 sites treatment-specific and 1045 sites control-specific, GxE-eQTLs). Genes that are differentially expressed also show a higher enrichment for condition-specific ASE. Our results constitute a first comprehensive catalog of GxE-eQTLs and we anticipate that it will contribute to the discovery and understanding of GxE interactions underlying complex traits.
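
As a rough illustration of what an allele-specific expression test looks like at a single heterozygous site, the sketch below runs a plain two-sided binomial test on reference and alternate read counts. It is not QuASAR, which the abstract describes as performing joint genotyping and allele-specific analysis; the function name `ase_binomial_pvalue` and the read counts used with it are hypothetical.

```python
from math import comb


def ase_binomial_pvalue(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Two-sided binomial p-value for allelic imbalance at one heterozygous site.

    Assumes reads are independent draws with probability p_ref of carrying the
    reference allele under the null (no imbalance). Suitable for modest read
    counts only; very deep sites would need a log-space implementation.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence either way

    def pmf(k: int) -> float:
        return comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k)

    observed = pmf(ref_reads)
    # Sum the probability of every outcome at most as likely as the observed one.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)
```

A balanced site such as `ase_binomial_pvalue(5, 5)` gives a p-value of 1, while a fully one-sided site such as `ase_binomial_pvalue(10, 0)` gives 2/1024. In a genome-wide setting the per-site p-values would still need multiple-testing correction (the abstract reports results at FDR<10%).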

...............................................................................................................................
Thursday, November 13
10:05 am – 10:25 am

RG T02
A pooling-based approach to mapping genetic variants associated with DNA methylation

Irene Kaplow1, Sarah Mah2, Julia MacIsaac2, Michael Kobor2, Hunter Fraser1

1Stanford University, 2University of British Columbia

DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover less than 2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from the pooled DNA of 60 human cell lines and identified over 2000 genetic variants associated with DNA methylation. We found that these variants are enriched in tissue-specific transcription factor binding sites and can also be associated with chromatin accessibility and gene expression. In sum, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any species, without the need for individual-level genotype or methylation data.

...............................................................................................................................
Thursday, November 13
10:25 am – 10:45 am

RG T03 • FULL LENGTH MANUSCRIPT
Are all genetic variants in DNase I sensitivity regions functional?


Gregory A. Moyerbrailean1, Chris T. Harvey1, Cynthia A. Kalita1, Xiaoquan Wen1, Francesca Luca1, Roger Pique-Regi1

1Wayne State University

A detailed mechanistic understanding of the direct functional consequences of DNA variation on gene regulatory mechanisms is critical for a complete understanding of complex trait genetics and evolution. Here, we present a novel approach that integrates sequence information and DNase I footprinting data to predict the impact of a sequence change on transcription factor binding. Applying this approach to 653 DNase-seq samples, we identified 3,831,862 regulatory variants predicted to affect active regulatory elements for a panel of 1,372 transcription factor motifs. Using QuASAR, we validated the non-coding variants predicted to be functional by examining allele-specific binding (ASB). Combining the predictive model and the ASB signal, we identified 3,217 binding variants within footprints that are significantly imbalanced (20% FDR). Even though most variants in DNase I hypersensitive regions may not be functional, we estimate that 56% of our annotated functional variants show actual evidence of ASB. To assess the effect these variants may have on complex phenotypes, we examined their association with complex traits using GWAS and observed that ASB-SNPs are enriched 1.22-fold for complex trait variants. Furthermore, we show that integrating footprint annotations into GWAS meta-study results improves identification of likely causal SNPs and provides a putative mechanism by which the phenotype is affected.

...............................................................................................................................
Thursday, November 13
11:10 am – 11:30 am

RG T04
Viral and retrotransposon sequences have shaped the preferred contexts for APOBEC-mediated mutagenesis

Jeffrey Chen1, Thomas MacCarthy1

1Stony Brook University

The AID/APOBEC gene family of cytidine deaminases consists of mutagenic enzymes that have evolved roles in innate immunity such as virus restriction and suppression of transposable elements, particularly in mammals. The ancestral APOBEC gene, Activation Induced Deaminase (AID) arose early in vertebrate evolution and plays a key adaptive immunity role (somatic hypermutation of the Immunoglobulin genes) in all jawed vertebrates. Biochemical and in vivo profiling of many APOBECs shows they cause C to T transitions and have evolved a variety of local DNA sequence context preferences. APOBEC3F, for example, has a preference for mutations at TTC sites whereas APOBEC3G has a preference for CCC. We assess the impact of each motif on a set of potential target genes to investigate how individual preferences have been shaped. By specifically examining the impact of replacement mutations we demonstrate that the known APOBEC preferences maximally impact retrotransposons while minimally impacting essential host genes. Furthermore, permutation analysis of several mammalian virus genomes shows these have evolved to avoid the impact of these mutations. Our results also suggest that APOBEC preferences impose restrictions on codon and amino acid usage in their target genes by, for example, heavily disfavoring amino acid pairs that must encode the TTC motif favored by APOBEC3F.

...............................................................................................................................
Thursday, November 13
11:30 am – 11:50 am

RG T05
Quantitative modeling of transcription factor binding specificities using DNA shape

Tianyin Zhou1, Ning Shen2, Lin Yang1, Namiko Abe3, John Horton2, Richard Mann3, Harmen Bussemaker3, Raluca Gordan2, Remo Rohs1

1University of Southern California, 2Duke University, 3Columbia University

Our current knowledge of genome function is the result of sequence-based data in the form of one-dimensional strings of letters. However, DNA-binding proteins recognize the double helix as a three-dimensional object. Therefore, an understanding of transcription factor (TF) binding specificity must ultimately include DNA shape. The sequence-structure relationship in DNA is highly degenerate, and different nucleotide sequences can give rise to the same structure, while single nucleotide sequence variants sometimes change DNA shape over a region of several base pairs. To explore these effects on a genomic scale, we developed a method for the high-throughput prediction of DNA shape features. We used these structural features to augment nucleotide sequence in binding specificity models derived from statistical machine learning approaches such as support vector regression (SVR) and regularized multiple linear regression (MLR). Using these approaches, we learned in vitro DNA binding specificity models from protein binding microarray (PBM), genomic-context PBM, and HT-SELEX/SELEX-seq data. Based on data for many TFs from diverse protein families, we demonstrated that shape-augmented models are generally more efficient than existing sequence models in terms of accuracy, number of features, and computation time. Our models provide information on the importance of specific DNA sequence and shape features and thus reveal TF family-specific readout mechanisms and better explain why a given TF binds in vivo to a specific genomic target site.

...............................................................................................................................
Thursday, November 13
11:50 am – 12:10 pm

RG T06
Genome-wide map of regulatory interactions in the human genome


Nastaran Heidari1, Douglas Phanstiel1, Michael Snyder1

1Stanford University

Increasing evidence suggests that interactions between regulatory genomic elements play an important role in regulating gene expression. We generated a genome-wide interaction map of regulatory elements in human cells (ENCODE tier 1 cells, K562, GM12878) using Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) experiments targeting six broadly distributed factors. Bound regions covered 80% of DNase I hypersensitive sites including 99.7% of TSS and 98% of enhancers. Correlating this map with ChIP-seq and RNA-seq data sets revealed cohesin, CTCF, and ZNF143 as key components of three-dimensional (3D) chromatin structure and revealed how distal chromatin state affects gene transcription. Comparison of interactions between cell types revealed that enhancer-promoter interactions were highly cell-type specific. Construction and comparison of distal and proximal regulatory networks revealed stark differences in structure and biological function. Proximal binding events are enriched at genes with housekeeping functions while distal binding events interact with genes involved in dynamic biological processes including response to stimulus. This study reveals new mechanistic and functional insights into regulatory region organization in the nucleus.

...............................................................................................................................
Thursday, November 13
12:10 pm – 12:30 pm

RG T07
Deconvolution of massively parallel reporter assays tiling 15,000 human regulatory regions reveals activating and repressive regulatory sites at nucleotide-level resolution

Jason Ernst1, Tarjei Mikkelsen2, Manolis Kellis3

1University of California, Los Angeles, 2Broad Institute, 3Massachusetts Institute of Technology

Massively parallel reporter assays have enabled genome-scale validation experiments towards gaining a systems-level view of gene regulation. A series of studies have demonstrated their use for testing thousands of predicted enhancers, dissecting regulatory motifs within them, and testing synthetically designed sequences. However, even with tens of thousands of sequences tested in a single assay, it has been impractical to dissect large numbers of regions at nucleotide level resolution, without an a priori knowledge of predicted regulatory motifs, limiting their large scale use to validation, but not discovery.

Here, we overcome this limitation, and present a new Bayesian tiling deconvolution approach, which combines experimental tiling of regulatory regions using 31 sequences of length 145 bp at 5 bp intervals covering 295 bp in total with computational deconvolution of the resulting signal to infer a nucleotide-level view of regulatory activity across thousands of regulatory regions. By exploiting the multiple overlapping sequences in a probabilistic framework, our method is also robust to noisy or missing measurements, and enables high-resolution inferences with a very small number of tested sequences per target region. This enables the de novo discovery of individual binding sites, and inference of their activating or repressive action in a single experiment across thousands of candidate regions. In contrast, activating and repressive sites are generally not distinguishable in current DNase hypersensitivity footprinting assays, as they both show footprints.
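
The tiling design described above (31 tiles of 145 bp at 5 bp steps, spanning 145 + 30 × 5 = 295 bp) can be sketched as follows. The per-base estimate here is a naive average over the tiles covering each base, used only as a stand-in to show how overlapping tiles tolerate missing measurements; it is not the Bayesian deconvolution the abstract refers to, and the function name and signal values are hypothetical.

```python
from typing import List, Optional


def naive_tiling_profile(
    tile_signal: List[Optional[float]], tile_len: int = 145, step: int = 5
) -> List[Optional[float]]:
    """Per-base activity as the mean signal of all measured tiles covering the base.

    tile_signal[i] is the reporter signal of the tile starting at i * step,
    or None when that tile's measurement is missing.
    """
    n_tiles = len(tile_signal)
    region_len = tile_len + (n_tiles - 1) * step  # 145 + 30 * 5 = 295 for 31 tiles
    sums = [0.0] * region_len
    counts = [0] * region_len
    for i, value in enumerate(tile_signal):
        if value is None:
            continue  # skip missing tiles; overlapping neighbours still cover most bases
        start = i * step
        for pos in range(start, start + tile_len):
            sums[pos] += value
            counts[pos] += 1
    # Bases covered by no measured tile are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]
```

With 31 tiles the returned profile has 295 entries. Dropping a single tile leaves most bases covered by neighbouring tiles, which is the redundancy a probabilistic model can exploit more rigorously than this average does.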

We apply this method to more than 15,000 regions in the human genome, in two ENCODE cell types, selected based on the presence of DNase hypersensitivity and chromatin marks covering a diverse range of regulatory regions, including enhancers, promoters, and insulator regions. Our method resulted in a regulatory activity score for more than 4.5 million nucleotides, which we used to predict bases of activation and repression. These nucleotides showed strong enrichments for motifs associated with activation or repression in the cell type.

Our method enables an unbiased, de novo, and high-resolution view of regulatory bases, which complements current motif scanning and DNase hypersensitivity footprinting approaches, and provides the first nucleotide-level view of activating and repressive sites across a sizeable fraction of the regulatory human genome.

...............................................................................................................................
Thursday, November 13
1:55 pm – 2:15 pm

RG T08 • FULL LENGTH MANUSCRIPT
cDREM: Inferring dynamic combinatorial gene regulation


Aaron Wise1, Ziv Bar-Joseph1

1Carnegie Mellon University

Motivation: Genes are often combinatorially regulated by multiple transcription factors (TFs). Such combinatorial regulation plays an important role in development and facilitates the ability of cells to respond to different stresses. While a number of approaches have utilized sequence and ChIP-based datasets to study combinatorial regulation, these have often ignored the combinatorial logic and the dynamics associated with such regulation.

Results: Here we present cDREM, a new method for reconstructing dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and utilizes the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation, and the logical function they implement. We tested cDREM on yeast and human data sets. Using yeast we show that the predicted combinatorial sets agree with other high-throughput genomic datasets and improve upon prior methods developed to infer combinatorial regulation. By applying cDREM to study human response to flu we were able to identify several combinatorial TF sets, some of which were known to regulate immune response while others represent novel combinations of important TFs.

...............................................................................................................................
Thursday, November 13
2:15 pm – 2:35 pm

RG T09
ATAC-seq is predictive of chromatin state


Chuan-Sheng Foo1, Sarah Denny1, Jason Buenrostro1, William Greenleaf1, Anshul Kundaje1

1Stanford University

Distinct combinations of chromatin modifications (chromatin states) have been found to be associated with different types of active and repressed functional elements in the human genome such as promoters, enhancers, and transcribed elements. Previously, multivariate hidden Markov models (e.g. ChromHMM and Segway) have been used to learn combinatorial chromatin states and automatically annotate genomes. However, such methods typically require multiple high-quality chromatin mark datasets as input, thus limiting their applicability in practice. Chromatin ChIP-seq experiments are time-consuming and costly to perform, and more importantly, require large amounts of input material to obtain reliable signal. We (Greenleaf lab) recently developed an assay, ATAC-seq, that accurately profiles genome-wide chromatin accessibility, DNA binding protein footprints, and nucleosome positioning from low amounts of input material based on direct in vitro transposition of sequencing adaptors into native chromatin. We previously showed that loci with different chromatin states (learned from histone modification ChIP-seq datasets) showed distinct distributions of ATAC-seq insert sizes in aggregate.

In this work, we further this connection between chromatin architecture and chromatin states by showing that chromatin architecture is in fact predictive of chromatin state at individual loci. More concretely, we show that a machine learning model trained on various features derived solely from ATAC-seq data is able to accurately predict different classes of regulatory elements in active and repressed chromatin states in cell lines and primary cells. The success of our method suggests that different classes of regulatory elements are associated with distinct open chromatin and nucleosome positioning signatures. We explore the feasibility of cross-cell-line chromatin state prediction and determine the minimum sequencing depth required for good predictive performance by subsampling reads. In conclusion, when applied to ATAC-seq data, our method enables high quality genome-wide chromatin state annotations from low quantities of input material using a single assay, potentially enabling the in vivo dissection of chromatin states from (rare) sorted cell populations in primary tissue.

...............................................................................................................................
Thursday, November 13
2:35 pm – 2:55 pm

RG T10
Integrative analysis of haplotype-resolved epigenomes across human tissues


Inkyung Jung1, Danny Leung1, Nisha Rajagopal1, Bing Ren1

1Ludwig Institute of Cancer Research

Allelic differences between the two sets of chromosomes can affect the propensity of inheritance in humans; however, the extent of such differences in the human genome has yet to be fully explored. Here, for the first time, we delineate allelic chromatin modifications and transcriptomes amongst a broad set of human tissues, enabled by a chromosome-spanning haplotype reconstruction strategy. The resulting masses of haplotype-resolved epigenomic maps are the first of their kind and reveal extensive allelic biases in the transcription of human genes, which appear to be primarily driven by genetic variations. Furthermore, allelic resolution of chromatin states allows us to discover cis-regulatory relationships between genes and their control sequences. These maps also uncover intriguing characteristics of cis-regulatory elements and tissue-restricted activities of repetitive elements. The rich datasets described here will enhance our understanding of the mechanisms controlling tissue-specific gene expression programs.

...............................................................................................................................
Thursday, November 13
2:55 pm – 3:15 pm

RG T11
Mechanistic basis of causal non-coding FTO obesity variant


Melina Claussnitzer1, Simon Dankel2, Gerald Quon3, Han Kim4, Hans Hauner5, Manolis Kellis3

1Harvard Medical School, 2University of Bergen, 3Massachusetts Institute of Technology, 4University of Toronto, 5Technical University Munich

Genome-wide association (GWA) studies revealed thousands of non-coding complex trait and disease genetic associations, whose mechanistic underpinnings remain elusive. Here, we leverage the Roadmap and ENCODE epigenomic maps across diverse human tissues and cell types to gain new insights into the regulatory underpinnings of the strongest genetic association with risk of polygenic obesity (i.e., the FTO obesity-associated locus) as a model for deciphering non-coding complex trait genetic associations. We find that the obesity-associated region harbors a >10kb super-enhancer candidate active in the adipose lineage, suggesting a role of adipose in FTO locus activity. We narrow down the causal region from 47kb and 82 variants ultimately to a single-nucleotide causal variant and identify its long-distance downstream target genes located up to a million nucleotides away in adipocytes. Using a comparative motif module analysis approach, we show that the causal variant overlaps a cluster of cross-species conserved regulatory motif instances for predicted master regulators enriched in obesity-associated variants across the genome, and that the risk allele disrupts the binding of the predicted transcriptional repressor, which is highly expressed in adipose cells. We demonstrate regulator-dependent repression of both the enhancer and its target genes in adipocytes conditional on the risk allele. We further confirm that the identified single-nucleotide change results in cellular phenotypes consistent with obesity, including decreased mitochondrial energy dissipation and increased triglyceride accumulation, based on repressor-dependent and allele-conditional differences in primary human adipocytes.
Lastly, we evaluate the obesity role of the identified causal variant at the organismal level in transgenic mice, by generating an adipose-specific inhibition of target gene activity, opposite to their tissue-specific risk allele-dependent de-repression by the variant, and find dramatic differences in body weight, fat accumulation, and mitochondrial energy expenditure. Overall, our results suggest that an intronically located non-coding variant is the causal variant underlying the FTO association with obesity, by disrupting regulator-mediated repression of the identified long-distance target genes, and resulting in an adipose-specific shift from mitochondrial energy production to energy storage. We propose that the FTO locus controls energy dissipation in the form of heat, in a cell-autonomous way, opening up the potential for new therapeutics that directly target mitochondrial activity in adipocytes by exploiting the cell-regulatory circuitry of repressor, variant, and target that we have unraveled in this study. Overall, we here introduce a general model for the elucidation of non-coding variants associated with complex traits and disease, including: (1) establish the relevant tissue and cell type; (2) establish the target genes; (3) establish the causal variant; (4) recognize the upstream regulator; (5) establish the cellular phenotypic consequences; and (6) establish the organismal phenotypic consequences.

...............................................................................................................................
Thursday, November 13
3:40 pm – 4:00 pm

RG T12
Systematic detection of spatio-temporal patterns of epigenetic changes


Petko Fiziev1, Constantinos Chronis1, Kathrin Plath1, Jason Ernst1

1University of California, Los Angeles

Histone modifications associate with important regulatory regions such as promoters and enhancers that control the expression of genes. Time-course genome-wide maps of these epigenetic marks have become available in a growing number of biological settings, including somatic cell reprogramming and differentiation processes, circadian rhythms, embryogenesis, and lymphocyte development. However, our understanding of the underlying cellular processes remains limited because the current bioinformatics tools often fail to utilize fully the temporal aspects of this data. Here, we present a novel computational method for systematic detection of major classes of spatio-temporal patterns of epigenetic changes. The method takes as input data a series of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments for a single histone mark that are performed at consecutive time points during a given biological process. The method uses a probabilistic mixture model that explicitly models the spatio-temporal nature of the data to identify regions for which the mark either expands or contracts significantly with time or holds steady. We present applications of our method on data from somatic cell reprogramming in mouse and from cardiac differentiation that help in understanding the regulatory dynamics of these processes.

...............................................................................................................................
Thursday, November 13
4:00 pm – 4:20 pm

RG T13
MyoD induces active and poised chromatin structures during transdifferentiation


Dinesh Manandhar1, Lingyun Song1, Ami Kabadi1, Charles Gersbach1, Raluca Gordan1, Greg Crawford1

1Duke University

Overexpression of transcription factor (TF) MyoD has been shown to transdifferentiate cells from non-myogenic lineages into cells with muscle-like expression and functional characteristics. However, expression studies show that the transdifferentiated cells have only some myogenic genes upregulated. Chromatin level reprogramming is also incomplete. In this work, we investigate the reasons behind incomplete MyoD-induced transdifferentiation of fibroblasts, including potential MyoD cofactors, DNA methylation, and posttranslational histone modifications. We analyzed high-throughput chromatin accessibility (DNase-seq) data, in vivo MyoD binding (ChIP-seq) data, and global gene expression (RNA-seq) data on primary skin fibroblast cells transduced with inducible MyoD, and compared against the data obtained from starting fibroblast cells and target myoblasts and myotubes. Our study of local chromatin changes genome-wide suggests that the chromatin state of transdifferentiated fibroblasts is intermediary between fibroblast and muscle chromatin states. Importantly, we observed a continuum of chromatin reprogramming in the MyoD-induced fibroblasts, indicating that complete reprogramming is achieved in only a small fraction of the genome. We also see evidence that during MyoD-induced transdifferentiation, chromatin closes more easily than it opens up. Using random forest and support vector machine classifiers, we show that various genetic and epigenetic features dictate the efficiency of chromatin level reprogramming. For instance, fibroblast DNase hypersensitive sites (DHSs) with higher GC content tend to stay open more than DHSs with low GC content. Our analysis of TF motifs and histone modification data suggests that the presence of certain TFs or histone modification marks at or around a genomic site can dictate the efficiency of chromatin reprogramming. Analysis of gene expression data shows that reprogramming of genes correlates well with reprogrammed chromatin state. 
Nonetheless, enriched levels of “poised” or “memory” state chromatin are also observed around such genes. This indicates that MyoD is capable of inducing both active and poised chromatin structures that are similar to primary muscle lineages, and that other additional factors – such as Uhrf1, a chromatin remodeler under-expressed in transdifferentiated cells – can potentially help improve the reprogramming efficiency. Interestingly, we also found that although MyoD binding in non-DHSs opens up the chromatin at many genomic loci, a large fraction of MyoD-bound sites remain closed. Most of these closed sites lack MyoD-specific binding sites, which suggests that during transdifferentiation MyoD can also bind non-specifically or via protein cofactors.

...............................................................................................................................
Thursday, November 13
4:20 pm – 4:40 pm

RG T14
Epigenomics of the Mammalian Brain


Chongyuan Luo1

1Joseph Ecker Laboratory, The Salk Institute for Biological Studies

The mammalian brain consists of numerous neuronal and non-neuronal cell types that are functionally indispensable. The abundances of distinct cell types can differ by orders of magnitude. The absence of representation for sparse, but nevertheless functionally critical, cell types in whole tissue genomic analyses presents a challenge. We generated an extensive epigenomics dataset from nuclei of specific neuronal types purified by the INTACT (isolation of nuclei tagged in specific cell types) approach. We identified abundant epigenomic and regulatory differences in nearly 200,000 discrete regions across pyramidal neurons, parvalbumin (PV)-expressing and vasoactive intestinal peptide (VIP)-expressing interneurons in adult mouse. Non-CG DNA methylation accounts for nearly half of all DNA methylation in adult mouse neurons and inversely correlates with cell-type specific gene expression. Significant interactions between putative transcription factor binding and epigenomic features suggest an interplay between sequence specific and non-specific mechanisms for maintaining neuronal type specification. Our results stress the importance of cell-type specific methods for studying epigenomes and identification of gene regulatory mechanisms in complex tissues such as the mammalian brain.

...............................................................................................................................



FRIDAY, NOVEMBER 14



9:45 am – 10:05 am

RG T15
Cell type-specific regulatory networks reveal common pathways of disease variants


Gerald Quon1, Melina Claussnitzer1, Michal Grzadkowski1, Manolis Kellis1

1Massachusetts Institute of Technology

Genome-wide association studies (GWAS) have identified thousands of single nucleotide variants associated with diverse human traits, but understanding their combined action in complex systems remains an open challenge. With more than 80% of lead GWAS SNPs located in non-coding regions of the genome rich in regulatory elements, functionally characterizing these variants necessitates knowledge of (1) the locations of cell type specific enhancers; (2) the identity of the target genes of those enhancers; and (3) the interactions between these target genes to identify disrupted pathways and subnetworks. However, existing gene-centric networks may not be suitable for network analysis because of uncertainty over the target genes of non-coding GWAS loci.

Using enhancer and promoter maps for 127 cell types predicted by the Roadmap Epigenomics Consortium, we have constructed directed cell type specific networks, where nodes represent four types of elements: transcriptional regulators, non-coding regulatory elements (enhancers, promoters), SNPs, and target genes. Edges lead from transcription factors to regulatory elements, and regulatory elements to genes, while SNPs are connected to tagged regulatory elements. To leverage these networks for GWAS analysis, we developed an efficient probabilistic model to simultaneously (1) identify the target regulatory element of each individual non-coding GWAS locus, from among the set of all tagged elements; (2) identify TF regulators whose binding sites are characteristic of GWAS target regulatory elements and distinguish them from other active regulatory elements in the cell type; and (3) identify other regulatory elements likely to harbor additional weak effect GWAS variants.

We applied our networks and model to a diverse range of GWAS traits, including metabolic (diabetes), lipid, and brain disorders. We find that our predicted regulators of complex traits recapitulate many known regulators from the literature, and we have experimentally validated several novel regulators in type 2 diabetes. Our cross-validation experiments, which hold out subsets of GWAS loci during model training, demonstrate that our networks have predictive power for identifying GWAS variants. We also found that, based on network structure alone, a subset of sub-genome-wide significant GWAS loci exhibit exceptionally strong evidence for involvement in trait variation, and their predicted target genes are also implicated in the complex trait. Finally, we show that target genes of predicted target regulatory elements of GWAS loci yield sensible phenotypes in mouse when mutated. Taken together, these results suggest that these regulatory element-centric networks, combined with our novel probabilistic model, can help yield insight into the important genomic players of complex traits and disease.

...............................................................................................................................
Friday, November 14
10:05 am – 10:25 am

RG T16
Assessing the impact of non-silent somatic mutations on protein activity


Mariano Alvarez1, Federico Giorgi1, Yao Shen1, Andrea Califano1

1Columbia University

Large-scale sequencing of cancer genomes often reveals thousands of non-silent (amino acid-changing) somatic mutations (NSSMs) in proteins. However, not all cancer mutations affect the molecular function of the mutated protein. Current computational approaches to differentiate functional from non-functional variants are based on predicting physicochemical effects of the substitutions, taking into account the protein surface placement in interaction sites, secondary and tertiary structure features, and evolutionary conservation of the affected protein domains. There is no approach, however, to estimate the effect of somatic mutations on protein activity in an unbiased and genome-wide fashion.

In this work, we measured the association between non-silent somatic mutations and the activity of the mutated protein, as estimated by the VIPER algorithm, for 3,912 tumor samples, representing 14 different tumor types, for which matched gene expression and exome or genome profile data are available from TCGA. For this, we assembled tissue type-specific regulatory models and inferred the activity of each transcription factor and signaling protein in each sample using the VIPER algorithm. We then tested whether samples harboring NSSMs in specific proteins were enriched among those ranked by the VIPER-inferred protein activity.
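
As an illustration of the final enrichment step, the following is a minimal sketch, assuming a Mann-Whitney rank-sum test with a normal approximation stands in for the enrichment statistic (the abstract does not name the test actually used). The function name `rank_sum_enrichment`, the sample identifiers, and the activity values are all hypothetical.

```python
import math


def rank_sum_enrichment(activity, mutated):
    """Test whether mutated samples sit at one end of an activity ranking.

    activity: dict mapping sample id -> inferred protein activity score.
    mutated:  set of sample ids carrying a mutation in the protein.
    Returns (U, z, two-sided p) from a Mann-Whitney U test using the
    normal approximation without tie correction (an assumption made
    here for brevity, not the abstract's stated method).
    """
    samples = sorted(activity, key=activity.get)

    # Assign 1-based ranks, averaging over tied activity values.
    ranks = {}
    i = 0
    while i < len(samples):
        j = i
        while j + 1 < len(samples) and activity[samples[j + 1]] == activity[samples[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[samples[k]] = average_rank
        i = j + 1

    n_mut = sum(1 for s in samples if s in mutated)
    n_wt = len(samples) - n_mut
    if n_mut == 0 or n_wt == 0:
        raise ValueError("need at least one mutated and one non-mutated sample")

    rank_sum = sum(ranks[s] for s in samples if s in mutated)
    u = rank_sum - n_mut * (n_mut + 1) / 2
    mean_u = n_mut * n_wt / 2
    sd_u = math.sqrt(n_mut * n_wt * (n_mut + n_wt + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, z, p


# Hypothetical toy data: the three mutated samples have the highest activity.
toy_activity = {"s1": -1.2, "s2": -0.5, "s3": 0.1, "s4": 0.3,
                "s5": 1.5, "s6": 2.0, "s7": 2.4, "s8": -0.9}
toy_mutated = {"s5", "s6", "s7"}
u_stat, z_score, p_value = rank_sum_enrichment(toy_activity, toy_mutated)
```

A positive `z` indicates that mutated samples tend to have higher inferred activity, a negative `z` that they tend to have lower activity.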

To illustrate the potential of our approach, we focused the analysis on 147 genes listed by the Catalogue Of Somatic Mutations In Cancer (COSMIC), which were mutated in at least 2 samples within the same tumor type. We find significant association of the mutations with VIPER-inferred protein activity for 75 of the 147 proteins (p < 0.05). Interestingly, NSSMs were also significantly associated with differential expression for 85 of the evaluated genes. To isolate the independent contribution of protein activity (i.e., the purely post-translational effect), we eliminated any transcriptional component from VIPER-inferred activities by removing the transcriptional variance component from the analysis. Remarkably, we found significant association between NSSMs and transcription-independent VIPER activity for 71 of the 147 tested proteins, showing that the effect of mutations is largely independent of transcriptional changes in the corresponding genes and that VIPER analysis can effectively capture these post-transcriptional, NSSM-dependent effects.

Finally, we increased the resolution of the analysis by testing whether different NSSMs within the same host gene (e.g., G12V vs. G12D KRAS mutations) may differentially affect protein activity. In total, we analyzed 648 NSSMs affecting 49 genes across 12 tumor types. Careful analysis of these results showed that VIPER-detected changes in protein activity are both mutation-specific and tumor type-specific.

To our knowledge, this work constitutes the first genome-wide and unbiased approach to catalogue the functional relevance of coding somatic mutations in cancer, with profound implications in the discovery of driver variants at the single patient level, and clear application in precision medicine.

...............................................................................................................................
Friday, November 14
10:25 am – 10:45 am

RG T17 • FULL LENGTH MANUSCRIPT
A validated gene regulatory network and GWAS to identify early transcription factors in T-cell associated diseases


Mika Gustafsson1, Danuta Gawel1, Sandra Hellberg1, Aelita Konstantinell1, Daniel Eklund1, Jan Ernerudh1, Antonio Lentini1, Robert Liljenström1, Johan Mellergård1, Hui Wang2, Colm E. Nestor1, Huan Zhang1, Mikael Benson1

1Linköpings Universitet, 2MD Anderson Cancer Center

The identification of early regulators of disease is important for understanding disease mechanisms, as well as finding candidates for early diagnosis and treatment. Such regulators are difficult to identify because patients generally present when they are symptomatic, after early disease processes. Here, we present an analytical strategy to systematically identify early regulators by combining gene regulatory networks (GRNs) with GWAS. We hypothesized that early regulators of T-cell associated diseases could be found by defining upstream transcription factors (TFs) in T-cell differentiation. Time-series expression profiling identified upstream TFs of T-cell differentiation into Th1/Th2 subsets enriched for disease-associated SNPs identified by GWAS. We constructed a Th1/Th2 GRN based on integration of expression, DNA methylation profiling, and sequence-based prediction data using the LASSO algorithm. The GRN was validated by ChIP-seq and siRNA knockdowns. GATA3, MAF, and MYB were prioritized based on GWAS and the number of GRN-predicted targets. The disease relevance was supported by differential expression of the TFs and their targets in profiling data from six T-cell associated diseases. We tested whether the three TFs or their splice variants changed early in disease by exon profiling of two relapsing diseases, namely multiple sclerosis and seasonal allergic rhinitis. This showed differential expression of splice variants of the TFs during relapse-free asymptomatic stages. Potential targets of the splice variants were validated based on expression profiling and siRNA knockdowns. Those targets changed during symptomatic stages. Our results show that combining construction of GRNs with GWAS can be used to infer early regulators of disease.

...............................................................................................................................
Friday, November 14
11:10 am – 11:30 am

RG T18
Enhancer RNAs reveal widespread chromatin reorganization in prostate cancer cell lines


Ville Kytölä1, Annika Kovakka1

1University of Tampere

Chromatin conformation determines the gene regulatory programs and enables the diversity of cell types. Characterization of chromatin state across different cell lines has been a central focus of major projects such as ENCODE. These studies have revealed a number of insights into cellular programs in cell differentiation and disease related dysregulation. However, the degree of chromatin variation between individuals is less studied and the diversity of chromatin organization in cancer is not known.

In order to gain insight into diversity of chromatin organization in prostate cancer we characterized 11 prostate and prostate cancer cell lines under different culture conditions using Global Run-On sequencing (GRO-seq). This assay allowed us to identify the active enhancer areas from each cell line through detection of nascent transcription of enhancer RNA (eRNA) molecules. To this end, we developed a new computational algorithm to identify eRNA signals in a genome-wide manner by utilizing the unique bi-directional pattern of nascent transcription. Identified eRNA sites show high consistency with areas of open chromatin from DNase I sequencing (DNase-seq) data as over 80% of the sites are covered by open chromatin signals in LNCaP cells.

We present a comparison of eRNA signals across prostate cancer cell lines. Our analysis reveals extensive variation in enhancer activity between prostate cancer models. On average, approximately 3000 active eRNA loci were identified from each cell line, with the number of detected sites varying from 1300 to 8000. Based on the detection results, the cell lines clustered according to androgen receptor (AR) status. When cultured in the presence of androgens, the number of identified eRNA sites in LNCaP and VCaP cells doubled in comparison to cells cultured without androgens. Overall, we identified nearly 25,000 distinct loci, of which only 33% were shared between more than two cell lines. We find a high number of loci for which eRNA activity correlates with the expression of nearby genes. Interestingly, from among these sites we were able to extract a subset of over a hundred extremely highly correlated (correlation > 0.9) connections, strongly indicating that these enhancer regions contribute to the phenotypic diversity of prostate cancer. Taken together, these analyses highlight several new patterns of active enhancer regions that associate with specific prostate cancer subtypes. We are integrating eRNA activities with DNA methylation and transcriptome data from the same cell lines to uncover detailed regulatory programs in prostate cancer.

...............................................................................................................................
Friday, November 14
11:30 am – 11:50 am

RG T19
Identifying differential functions of cancer mutations using a structurally resolved protein interaction network


Hatice Billur Engin1, Matan Hofree1, Hannah Carter1

1University of California, San Diego

Here we present a method for discovering the distinct functional outcomes of different somatic missense mutations in a protein. This is the first attempt, to our knowledge, to explicitly account for diverse structural consequences of mutations for protein activity when extracting altered gene sets from tumor ‘omics data. Until now, efforts to mine large tumor ‘omics datasets have assumed that all damaging amino acid substitutions in a protein have the same consequences for protein activity, and that all of the protein’s interactions are equally impacted. However, disease-causing mutations are frequently observed at interface residues mediating protein interactions[1-4]. These residues are not essential for protein stability but are nontrivial for the binding energies of protein-protein interactions. As a result, mutations at interfaces may cause diverse phenotypes by changing the interaction profile of the mutated protein.

In this study, we generated a hybrid protein-protein interaction network with a subset of edges that include protein structural information for frequently mutated cancer genes. By mapping missense mutations reported by The Cancer Genome Atlas onto the 3D structures of the encoded proteins, we identified core and interface mutations. We used these designations to alter the network by removing 1 or more edges according to whether the mutation is more likely to destabilize specific interaction(s) or the entire protein. Then, using a diffusion-based approach on the altered network, we implicated distinct sets of interacting proteins associated with different mutated residues. The interacting proteins for each mutation were functionally annotated to highlight specific biological processes likely to be affected.
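
The edge-removal and diffusion steps can be sketched as follows. This is a minimal illustration, assuming a random walk with restart as the diffusion procedure (the abstract only says "diffusion-based") and a toy network of hypothetical proteins `P`, `X`, `Y`, `Z`; the function names and the `restart` parameter are likewise assumptions.

```python
def remove_edges(edges, protein, mutation_type, partners=()):
    """Return a copy of the edge set altered for one mutation.

    mutation_type "interface": drop only edges from `protein` to `partners`.
    mutation_type "core":      drop every edge touching `protein`.
    """
    kept = set()
    for a, b in edges:
        if protein in (a, b):
            other = b if a == protein else a
            if mutation_type == "core" or other in partners:
                continue  # this interaction is treated as disrupted
        kept.add((a, b))
    return kept


def diffuse(edges, nodes, seed, restart=0.5, iterations=100):
    """Random walk with restart from `seed` on an undirected network."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    score = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iterations):
        updated = {n: (restart if n == seed else 0.0) for n in nodes}
        for n in nodes:
            moving = (1 - restart) * score[n]
            if neighbors[n]:
                share = moving / len(neighbors[n])
                for m in neighbors[n]:
                    updated[m] += share
            else:
                # An isolated node sends its walker back to the seed,
                # which keeps the scores summing to 1.
                updated[seed] += moving
        score = updated
    return score


nodes = ["P", "X", "Y", "Z"]
edges = {("P", "X"), ("P", "Y"), ("Y", "Z")}

# Interface mutation assumed to disrupt only the P-X interaction.
interface_profile = diffuse(remove_edges(edges, "P", "interface", {"X"}), nodes, "P")
# Core mutation assumed to destabilize P and remove all of its interactions.
core_profile = diffuse(remove_edges(edges, "P", "core"), nodes, "P")
```

Comparing `interface_profile` and `core_profile` shows how two mutations in the same protein can implicate different sets of neighbors.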

Although the number of 3D structures capturing protein-protein interactions is small, our analysis using the existing structures provides strong evidence supporting specific functional consequences of somatic missense mutations at distinct sites within the same protein. We performed an in-depth case study of HRAS, implicating distinct biological processes associated with mutations observed at residues G12, G13 and G61 that may have clinical relevance for patients. We also clustered cancer patients based on diffusion profiles incorporating residue-specific effects to find subgroups of patients that are similar at the level of disrupted biological processes. Our analysis suggests that accounting for mutation-specific perturbations to cancer pathways will be essential for personalized cancer therapy.

References:
1 David, A., Razali, R., Wass, M. N. & Sternberg, M. J. Protein-protein interaction sites are hot spots for disease-associated nonsynonymous SNPs. Human mutation 33, 359-363, doi:10.1002/humu.21656 (2012).
2 Wang, X. et al. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nature biotechnology 30, 159-164, doi:10.1038/nbt.2106 (2012).
3 Yates, C. M. & Sternberg, M. J. The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions. Journal of molecular biology 425, 3949-3963, doi:10.1016/j.jmb.2013.07.012 (2013).
4 Zhong, Q. et al. Edgetic perturbation models of human inherited disorders. Molecular systems biology 5, 321, doi:10.1038/msb.2009.80 (2009).

...............................................................................................................................
Friday, November 14
11:50 am – 12:10 pm

RG T20
Network-based Stratification of Tumor Profiles

Matan Hofree1, John P. Shen2, Hannah Carter3, Andrew Gross3, Trey Ideker1,2,3

1Department of Computer Science and Engineering, University of California San Diego, CA
2Department of Medicine, University of California San Diego, CA
3Department of Bioengineering, University of California San Diego, CA

Classification of cancer is predominantly organ based and fails to account for considerable heterogeneity of clinical outcomes such as survival or response to therapy. Somatic tumor genomes provide a rich new source of data for uncovering subtypes, but have proven difficult to compare, as tumors rarely share the same mutations. Recently we introduced network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into subtypes by clustering together patients with perturbations in similar network regions. We demonstrate NBS in multiple cancer cohorts from The Cancer Genome Atlas. In each case, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or histology. We show that subtypes may be reproduced in an independent cohort and are predictive of chemo-resistance in cell lines. Finally, we show how we can integrate somatic mutations, gene fusion events and copy-number changes to discover subtypes of thyroid cancer and identify network regions characteristic of each type.

...............................................................................................................................
Friday, November 14
1:00 pm – 1:20 pm

RG T21 • FULL LENGTH MANUSCRIPT
Multi-species network inference improves gene regulatory network reconstruction for early embryonic development in Drosophila


Anagha Joshi1, Yvonne Beck1, Tom Michoel1

1The Roslin Institute, The University of Edinburgh

Gene regulatory network inference uses genome-wide transcriptome measurements in response to genetic, environmental, or dynamic perturbations to predict causal regulatory influences between genes. We hypothesized that evolution also acts as a suitable network perturbation and that integration of data from multiple closely related species can lead to improved reconstruction of gene regulatory networks. To test this hypothesis, we predicted networks from temporal gene expression data for 3,610 genes measured during early embryonic development in six Drosophila species and compared predicted networks to gold standard networks of ChIP-chip and ChIP-seq interactions for developmental transcription factors in five species. We found that (i) the performance of single-species networks was independent of the species where the gold standard was measured; (ii) differences between predicted networks reflected the known phylogeny and differences in biology between the species; (iii) an integrative consensus network which minimized the total number of edge gains and losses with respect to all single-species networks performed better than any individual network. Our results show that in an evolutionarily conserved system, integration of data from comparable experiments in multiple species improves the inference of gene regulatory networks. They provide a basis for future studies using the numerous multi-species gene expression datasets for other biological processes available in the literature.
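
To make point (iii) concrete, here is a minimal sketch under a simplifying assumption: each single-species network is an unweighted set of directed (regulator, target) edges, and every edge gain or loss costs the same. Under that assumption, the network minimizing the total number of edge gains and losses is obtained by a per-edge majority vote. The manuscript's actual procedure may differ, and the function names and gene labels below are hypothetical.

```python
from collections import Counter


def consensus_network(networks):
    """Keep each edge that appears in more than half of the networks.

    networks: list of sets of (regulator, target) tuples.
    For unweighted edge sets this majority vote minimizes the summed
    symmetric difference to all input networks.
    """
    counts = Counter(edge for net in networks for edge in net)
    return {edge for edge, count in counts.items() if 2 * count > len(networks)}


def total_edge_changes(candidate, networks):
    """Total edge gains plus losses between `candidate` and every network."""
    return sum(len(candidate ^ net) for net in networks)


# Hypothetical single-species networks over placeholder gene names.
species_networks = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("d", "e")},
]
consensus = consensus_network(species_networks)
cost = total_edge_changes(consensus, species_networks)
```

With an even number of networks, edges present in exactly half of them are tied; this sketch drops them, which is an arbitrary choice.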

...............................................................................................................................
Friday, November 14
1:20 pm – 1:40 pm

RG T22
Comparison of D. melanogaster and C. elegans developmental stages, tissues, and cells by modENCODE RNA-Seq data


Jingyi (Jessica) Li1, Haiyan Huang2, Peter J. Bickel2, Steven Brenner2

1University of California, Los Angeles, 2University of California, Berkeley

We report a statistical study to discover transcriptome similarity of developmental stages from D. melanogaster and C. elegans using modENCODE RNA-seq data. We focus on "stage-associated genes" that capture specific transcriptional activities in each stage and use them to map pairwise stages within and between the two species by a hypergeometric test. Within each species, temporally adjacent stages exhibit high transcriptome similarity, as expected. Additionally, fly female adults and worm adults are mapped with fly and worm embryos, respectively, due to maternal gene expression. Between fly and worm, an unexpected strong collinearity is observed in the time course from early embryos to late larvae. Moreover, a second parallel pattern is found between fly prepupae through adults and worm late embryos through adults, consistent with the second large wave of cell proliferation and differentiation in the fly life cycle. The results indicate a partially duplicated developmental program in fly. Our results constitute the first comprehensive comparison between D. melanogaster and C. elegans developmental time courses and provide new insights into similarities in their development. We use an analogous approach to compare tissues and cells from fly and worm. Findings include strong transcriptome similarity of fly cell lines, clustering of fly adult tissues by origin regardless of sex and age, and clustering of worm tissues and dissected cells by developmental stage. Gene ontology analysis supports our results and gives a detailed functional annotation of different stages, tissues, and cells. Finally, we show that standard correlation analyses could not effectively detect the mappings found by our method.
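
The hypergeometric test used to map pairs of stages can be sketched as below. This is a minimal illustration, assuming the test is applied to the overlap between two sets of stage-associated genes drawn from a shared gene population (for cross-species comparisons, presumably a set of ortholog pairs). The function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(population, size_a, size_b, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    population: number of genes that could be associated with either stage.
    size_a:     number of genes associated with stage A.
    size_b:     number of genes associated with stage B.
    overlap:    number of genes associated with both stages.
    """
    largest_possible = min(size_a, size_b)
    favorable = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, largest_possible + 1)
    )
    return favorable / comb(population, size_b)


# Toy example: 10 genes in total, two stages with 5 associated genes each,
# all 5 shared between the stages.
p_value = hypergeom_overlap_pvalue(population=10, size_a=5, size_b=5, overlap=5)
```

A small p-value means the two stages share more associated genes than expected by chance, which is the basis for calling them similar.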

...............................................................................................................................
Friday, November 14
1:40 pm – 2:00 pm

RG T23
Context-specific regulation by miR-155 through ApA-dependent and independent mechanisms


Gabriel Loeb1,2, Yuheng Lu1, Jing-Ping Hsin1,2, Christina Leslie1, Alexander Rudensky1,2

1Memorial Sloan Kettering Cancer Center, 2Howard Hughes Medical Institute

MicroRNAs (miRNAs) are critical post-transcriptional regulators of gene expression that repress expression of target mRNAs by mediating the interaction between RISC and cognate sites in 3’UTRs. Recent studies have begun to investigate whether miRNAs, like transcription factors, regulate their targets in a cell-type and context dependent manner. For example, alternative polyadenylation (ApA) produces cell-type specific changes in 3’UTR isoform expression, and relative “shortening” or “lengthening” of 3’UTRs can lead to loss or gain of miRNA binding sites. Previous studies, however, have reached conflicting conclusions about the impact of ApA on the cell-type specificity of miRNAs. Moreover, it is plausible that there are ApA-independent mechanisms of miRNA context specificity, such as the tissue-specific expression of particular RNA-binding proteins.

miR-155 is an important regulator in the immune system and is up-regulated upon activation of multiple immune cell types, including macrophages, dendritic cells, and T and B lymphocytes. To examine the cell-type specificity of miR-155 in physiologically relevant cellular contexts, we performed PolyA-seq and RNA-seq in these four activated immune cell populations from WT and miR-155 KO mice. Importantly, PolyA-seq enables the analysis of miR-155 regulation at the level of 3’UTR isoforms as well as detection of ApA between cell types. Quantitative expression analysis revealed that a large fraction of miR-155 targets are significantly differentially regulated between cell types at both the gene and 3’UTR isoform levels. Overall, miR-155 targets are strongly enriched for genes with multiple 3’UTR isoforms. Among the multi-isoform target genes, there is a significant overlap between genes differentially regulated by miR-155 and genes exhibiting ApA between cell types. In-depth analysis suggested that ApA-dependent miRNA specificity may be a combination of two mechanisms: (1) lengthening or shortening of 3’UTRs can result in gain or loss of miRNA target sites; and (2) the regulatory activity of the same miRNA target site may depend on its relative position within 3’UTR isoforms. Meanwhile, we have also found that in many cases the same target 3’UTR isoform can be differentially regulated between cell types, suggesting ApA-independent mechanisms of miRNA specificity.

...............................................................................................................................
Friday, November 14
2:00 pm – 2:20 pm

RG T24 • FULL LENGTH MANUSCRIPT
Systematic study of synthetic transcript features in S. cerevisiae exposes gene-expression determinants


Tuval Ben-Yehezkel1, Shimshi Atar2, Tzipy Marx1, Rafael Cohen1, Alon Diament2, Alexandra Dana2, Anna Feldman2, Ehud Shapiro1, Tamir Tuller2

1Weizmann Institute of Science, 2Tel Aviv University

A major challenge in functional genomics is understanding how different parts of the transcript affect aspects of its expression. Heterologous gene expression can potentially contribute to this research topic, but has rarely been studied systematically, specifically in eukaryotes.

Here, we use a synthetic biology approach to study the distinct and causal effect of different parts of the transcript in the eukaryote S. cerevisiae. We generated three distinct reporter libraries of the viral HRSVgp04 gene for studying the effect of three distinct regions in the transcript: (1) the 5'UTR, (2) the first 40 codons, and (3) codons 42-81 of the ORF. Each of the three libraries contained variants with multiple, rationally designed synonymous mutations, totaling 383 distinct variants tested individually for gene expression.

Our results show that while synonymous mutations in each of the three regions can have a dramatic effect on protein abundance, those closer to the 5’ end of the ORF are the most effective modulators of protein abundance. Additionally, while weaker local mRNA folding at the beginning of the ORF (codons 1–8) increases protein abundance, it decreases protein abundance when present in downstream codons, reinforcing previous evolutionary studies demonstrating the selection of folding strength in different parts of the ORF. Finally, we show that the mean relative codon decoding time, based on ribosomal densities in endogenous genes, significantly correlates with our measured protein abundance (correlation up to r = 0.6175; p = 0.0013). While this report provides an improved understanding of transcript evolution and gene expression regulation, it also suggests relatively simple rules for engineering synthetic gene expression in a eukaryote.

