RSG POSTER ABSTRACTS - 61 and above


...............................................................................................................................

Poster: P61
Identifying sequence-dependent regulators of gene expression from a novel massively parallel reporter assay


Vincent Fitzpatrick, Columbia University, Department of Biological Sciences, United States
Joris van Arensbergen, Netherlands Cancer Institute, Netherlands
Marcel de Haas, Netherlands Cancer Institute, Netherlands
J Omar Yáñez-Cuna, Netherlands Cancer Institute, Netherlands
Ludo Pagie, Netherlands Cancer Institute, Netherlands
Bas van Steensel, Netherlands Cancer Institute, Netherlands
Harmen Bussemaker, Columbia University, Department of Biological Sciences, United States

DNA-binding proteins regulate expression through sequence-specific interactions with gene promoters. These interactions are further mediated by local chromatin context extrinsic to the promoter sequence, making it difficult to separate sequence-dependent regulatory mechanisms from other contextual factors. To this end, our collaborators in the Van Steensel lab (NKI) have developed SuRE-seq, a high-throughput reporter assay that screens for genomic fragments capable of driving expression of a uniform plasmid reporter. SuRE-seq quantifies the relative expression rate of millions of genomic elements in parallel, providing insight into genome-wide mechanisms of transcription regulation. Using a regression-based approach, we have discovered sequence-specific, spatially-dependent mechanisms of gene regulation in Drosophila and human cell lines, including motifs attributable to known transcription factors and low-complexity sequence patterns with strand-dependent contributions to expression. These results allow us to separate sequence-intrinsic regulatory properties of gene promoters and enhancers that are independent of endogenous chromatin context.

...............................................................................................................................

Poster: P62
Characterization of phased, secondary, small interfering RNAs (phasiRNAs) using Machine Learning


Parth Patel, University of Delaware, United States
Sandra Mathioni, University of Delaware, United States
Atul Kakrana, University of Delaware, United States
Hagit Shatkay, University of Delaware, United States
Blake Meyers, University of Delaware, United States

Small RNAs (sRNAs) in plants range in size from 21 to 24 nucleotides and play important roles in biological processes such as development, epigenetic modification, and plant defense. They can be partitioned into three major classes: microRNAs (miRNAs); heterochromatic small interfering RNAs (hc-siRNAs); and phased, secondary, small interfering RNAs (phasiRNAs) (Fei et al., 2013). Our study focuses on phasiRNAs, whose functions remain poorly understood. We (Zhai et al., 2015) and others have shown that maize anthers (male reproductive organs) express two classes of phasiRNAs (21-nt and 24-nt) at different developmental time points (pre-meiotic and meiotic). Other data suggest these phasiRNAs are required for fertility.

Given the important role grasses such as maize and rice play as a prime food source in many countries and as influential factors in the global economy, we aim to identify and understand the function of grass-specific phasiRNAs in maize and rice development. To this end, we use hidden Markov models (HMMs) to model both phasiRNA and non-phasiRNA sequences and to distinguish between these two types of small RNAs. We performed ANOVA with Dunnett's method, demonstrating that the probability assigned by the resulting HMMs to phasiRNAs (21/24-nt) from rice and maize is significantly different from that assigned to other genomic sequences of similar length. Future work will include classification to distinguish phasiRNA sequences from non-phasiRNA sequences using other machine learning classifier(s), aiming to extract patterns (e.g., motifs, GC content) occurring in phasiRNAs to provide further insight into their biological function.
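
The HMM-based scoring described above can be illustrated with a minimal sketch. It computes the log-probability that a discrete HMM assigns to a short sequence (forward algorithm in log space) and compares two models through a log-odds score. All parameter values and the names PHASI_HMM and BACKGROUND_HMM are hypothetical placeholders chosen for illustration; they are not the trained models from this study.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) using the forward algorithm in log space.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][x]  : probability that state s emits symbol x
    All probabilities are assumed to be strictly positive.
    """
    n_states = len(start)
    alpha = [math.log(start[s]) + math.log(emit[s][seq[0]])
             for s in range(n_states)]
    for symbol in seq[1:]:
        alpha = [
            log_sum_exp([alpha[p] + math.log(trans[p][s])
                         for p in range(n_states)])
            + math.log(emit[s][symbol])
            for s in range(n_states)
        ]
    return log_sum_exp(alpha)

def log_odds(seq, model_a, model_b):
    """Log-likelihood ratio of seq under model_a versus model_b."""
    return (forward_log_likelihood(seq, *model_a)
            - forward_log_likelihood(seq, *model_b))

# Hypothetical two-state model with one GC-rich state (illustration only).
PHASI_HMM = (
    [0.5, 0.5],
    [[0.9, 0.1], [0.2, 0.8]],
    [{"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
     {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}],
)
# Hypothetical single-state uniform background model.
BACKGROUND_HMM = (
    [1.0],
    [[1.0]],
    [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}],
)

if __name__ == "__main__":
    print(log_odds("GCGCGCATGCGCGCGCATGCG", PHASI_HMM, BACKGROUND_HMM))
```

In practice the model parameters would be estimated from training sequences (for example with Baum-Welch), and the per-sequence scores would feed the statistical comparison described in the abstract.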

...............................................................................................................................

Poster: P63
The transcription factor GABP selectively binds and activates the mutant TERT promoter in cancer


Robert J.A. Bell, University of California San Francisco, United States
H. Tomas Rube, Columbia University, United States
Alex Kreig, University of Illinois Urbana-Champaign, United States
Andrew Mancini, University of California San Francisco, United States
Shaun F. Fouse, University of California San Francisco, United States
Raman P. Nagarajan, University of California San Francisco, United States
Serah Choi, University of California San Francisco, United States
Chibo Hong, University of California San Francisco, United States
Daniel He, University of California San Francisco, United States
Melike Pekmezci, University of California San Francisco, United States
John K. Wiencke, University of California San Francisco, United States
Margaret R. Wrensch, University of California San Francisco, United States
Susan M. Chang, University of California San Francisco, United States
Kyle M. Walsh, University of California San Francisco, United States
Sua Myong, University of Illinois Urbana-Champaign, United States
Jun S. Song, University of Illinois Urbana-Champaign, United States
Joseph F. Costello, University of California San Francisco, United States

Reactivation of telomerase reverse transcriptase (TERT) expression enables cells to overcome replicative senescence and escape apoptosis, which are fundamental steps in the initiation of human cancer. Multiple cancer types, including up to 83% of glioblastomas (GBMs), harbor highly recurrent TERT promoter mutations of unknown function but specific to two nucleotide positions. We identified the functional consequence of these mutations in GBMs to be recruitment of the multimeric GA-binding protein (GABP) transcription factor specifically to the mutant promoter. Allelic recruitment of GABP is consistently observed across four cancer types, highlighting a shared mechanism underlying TERT reactivation. Tandem flanking native E26 transformation-specific motifs critically cooperate with these mutations to activate TERT, probably by facilitating GABP heterotetramer binding. GABP thus directly links TERT promoter mutations to aberrant expression in multiple cancers.

...............................................................................................................................

Poster: P64
Charting the human genome’s regulatory landscape with transcription factor binding site predictions


Xi Chen, New York University, United States
Richard Bonneau, New York University/Simons Foundation, United States

Transcription factor (TF) binding is an essential step in the regulation of gene expression. Differential binding of multiple TFs at key cis-regulatory loci allows the specification of progenitor cells into various cell types, tissues and organs. ChIP-Seq is a technique that can reveal genome-wide patterns of TF binding. However, it lacks the scalability to cover the range of factors, cell types and dynamic conditions that a multicellular eukaryotic organism encounters. Charting the regulatory landscape spanning multi-lineage differentiation therefore requires computational methods that predict TF binding sites (TFBS) in an efficient and scalable manner.

We develop a method to predict binding sites for over 800 human TFs using a rich collection of DNA binding motifs. We integrate genomic features, including chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and the proximity of TF motifs to transcription start sites, in sparse logistic regression classifiers. We label candidate motif sites with ChIP-Seq data and apply a correlation-based filter and L1 regularization to select relevant features for each trained TF. The resulting logistic regression classifiers accurately predict TFBS and perform favorably in comparison to the current best TFBS prediction methods. Further, we map TFs based on feature distance to a nearest trained TF neighbor. Cross-TF predictions allow us to scale and expand the repertoire of putative TFBS to any TF for which motif data are available and to any cell type for which accessibility data are obtainable. Our method has the potential to be applied in previously intractable domains, such as the identification of cell type-specific cis-regulatory modules, and to reveal key properties underlying the regulatory complexity of multicellular eukaryotes.
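
To illustrate how L1 regularization yields a sparse logistic regression classifier, the following is a minimal sketch using proximal gradient descent (soft-thresholding). The two-feature toy data, the function names and the hyperparameters are assumptions made for illustration; they are not the features, solver or settings used in this work.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.1, lr=0.5, n_iter=500):
    """Fit logistic regression with an L1 penalty on the weights.

    X is a list of feature vectors, y a list of 0/1 labels (e.g. ChIP-Seq
    bound vs. unbound motif sites). The intercept is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding drives
        # uninformative weights exactly to zero.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

if __name__ == "__main__":
    # Toy data: feature 0 tracks the label, feature 1 is uninformative.
    X = [[1, 1], [1, -1], [-1, 1], [-1, -1]] * 5
    y = [1, 1, 0, 0] * 5
    weights, intercept = fit_l1_logistic(X, y)
    print(weights, intercept)
```

Features whose weights are shrunk to exactly zero are effectively deselected, which is the sense in which the penalty performs feature selection for each TF.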

...............................................................................................................................

Poster: P65
Deconvolving discriminative sequence features in overlapping categories of TF binding sites


Akshay Kakumanu, Penn State, United States
Silvia Velasco, New York University, United States
Esteban Mazzoni, New York University, United States
Shaun Mahony, Penn State, United States

Given multiple ChIP-seq experiments, we often seek to define clusters of binding sites that describe site properties across experiments. For example, we may categorize a given transcription factor’s binding sites as condition-independent or condition-specific across multiple condition ChIP-seq experiments. Similarly, we may categorize a transcription factor’s binding sites as being located in active enhancers or not based on overlaps with appropriate histone modification ChIP-seq experiments. Given such binding site categories, it is natural to ask what sequence features are associated with a category label. However, discovering such label-specific sequence features is often confounded by overlaps between binding site categories. For example, if condition-independent transcription factor binding sites are also more likely to be located within promoter regions, any sequence features specific to condition-independent binding behavior will be convolved with sequence features specific to promoters. Therefore, in order to identify sequence signals specifically associated with a given binding label it is necessary to deconvolve discriminative sequence signals from overlapping labels.

To meet this challenge, we developed SeqUnwinder, a principled approach to identifying interpretable discriminative sequence features for overlapping categories of transcription factor binding sites. SeqUnwinder uses local k-mer frequencies as predictors for a multiclass logistic classifier. Class label relationships between clusters are incorporated through an L2-norm regularizer that encourages clusters sharing a label to have similar predictor weights. Our approach yields an integrated framework that identifies discriminative sequence signals for individual TF binding class labels and for all combinations of labels, making it easier to gain insight into TF binding preferences in a given in vivo system.
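
As an illustration of the k-mer frequency predictors mentioned above, the sketch below turns a DNA sequence into a fixed-length frequency vector. It is not SeqUnwinder's implementation: the function name, the choice to skip k-mers containing ambiguous bases, and the absence of reverse-complement collapsing are all simplifying assumptions.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k):
    """Return a dict mapping every possible DNA k-mer to its frequency in seq.

    K-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    Frequencies sum to 1 when at least one valid k-mer is present.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return {kmer: (counts[kmer] / total if total else 0.0)
            for kmer in vocabulary}

if __name__ == "__main__":
    features = kmer_frequencies("ACGTACGTNNACG", 2)
    # Show only the k-mers that actually occur in the sequence.
    print({kmer: round(f, 3) for kmer, f in features.items() if f > 0})
```

Each binding site window would yield one such vector, and the vectors for all sites would form the design matrix for the multiclass classifier.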

We demonstrate SeqUnwinder by using it to characterize TF binding during direct motor neuron programming. In our system, over-expression of Ngn2, Isl1, and Lhx3 (NIL) induces rapid and highly efficient conversion of mouse embryonic stem (ES) cells to spinal motor neurons. However, little is known about how the NIL factor combination achieves direct programming. We used ChIP-seq to profile NIL binding at three intermediate time points during the direct programming process. We then formed overlapping clusters of binding sites according to two criteria: dynamics over the course of programming, and the chromatin context in the initial ES cells. SeqUnwinder enables us to identify several meaningful sequence features associated with each cluster label, and thereby allows us to formulate hypotheses about the mechanisms through which over-expression of NIL can alter the fate of ES cells into induced motor neurons.

...............................................................................................................................

Poster: P66
Implementation of a Deep Learning Framework to Predict De Novo Anticancer Drug Activity


Jose Zamalloa, Princeton University, United States
Mona Singh, Princeton University, United States

Cancer treatment can greatly benefit from highly accurate drug prediction models. Current methods aim to identify key features in genomic data in order to predict known efficacies of a particular drug across cancer cell lines. The Cancer Cell Line Encyclopedia (CCLE) and Cancer Genome Project (CGP) provide the cancer drug panel data to build predictive models based on known drug compounds and genomic backgrounds of cancer cells. However, given that known compounds are currently not sufficient to treat cancer efficiently, there is a pressing need to develop methods that can accurately predict the drug activity of compounds for which we have no prior information. The present method aims to solve this problem by combining chemical information across compounds and cancer genetic backgrounds to predict an unknown drug activity using a Deep Learning framework. We incorporate structural information of endogenous metabolites to describe chemical features of drug compounds and integrate these as features into our predictor along with selected genomic information.

We applied our approach to the CCLE dataset. We train our model on the entire dataset except the compound of interest and test it on that held-out compound in order to simulate the prediction of an unknown drug. Our preliminary results show that our accuracy is on par with or better than that of current methods, suggesting its potential use in predicting the activity of untested cancer drug candidates.
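
The hold-out scheme described above amounts to leave-one-compound-out evaluation. A minimal sketch of the split logic follows; the record layout (compound, cell line, activity) and the example names are hypothetical and do not reflect the actual CCLE file format.

```python
def leave_one_compound_out(records):
    """Yield (held_out, train, test) splits, one per compound.

    records is a list of (compound, cell_line, activity) tuples.
    For each split, every measurement of the held-out compound goes to
    the test set, so the model never sees that compound during training.
    """
    compounds = sorted({compound for compound, _, _ in records})
    for held_out in compounds:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test

if __name__ == "__main__":
    # Made-up measurements for illustration only.
    toy = [
        ("drug_A", "line_1", 0.8), ("drug_A", "line_2", 0.3),
        ("drug_B", "line_1", 0.5), ("drug_B", "line_2", 0.9),
        ("drug_C", "line_1", 0.1),
    ]
    for held_out, train, test in leave_one_compound_out(toy):
        print(held_out, len(train), len(test))
```

Because the held-out compound contributes no activity measurements to training, any prediction for it must come from its chemical features and the genomic features of the cell lines.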

...............................................................................................................................

Poster: P67
Computational Discovery of Transcription Factors Associated with Drug Response


Casey Hanson, University of Illinois at Urbana-Champaign, United States
Junmei Cairns, Mayo Clinic, United States
Liewei Wang, Mayo Clinic, United States
Saurabh Sinha, University of Illinois at Urbana-Champaign, United States

Purpose: Genome wide association studies in pharmacogenomics generally involve associating drug-induced response with biomarkers. While GWAS suffers from sensitivity issues after correction, even signals that survive face problems of functional interpretation. Our study ameliorates this issue by posing the statistical test in the context of gene regulation. Rather than identifying SNPs or genes associated with drug response, we integrate biomarkers with genome-wide transcription factor (TF) binding data to elucidate whether a TF’s regulatory influence is associated with the drug. Our approach (GENMi) integrates gene expression, genotype, and drug-response data with ENCODE TF tracks to quantify the association between TFs and drugs via cis-regulatory eQTLs.

Methods: The GENMi method for testing a (TF, drug) combination consists of the following procedure. First, SNPs located outside of the TF’s ENCODE peaks are discarded. Considering the 50kb upstream region of a gene as a putative cis-regulatory region, the gene is scored by the most significant eQTL under the TF’s peaks. The top 400 eQTL genes are then tested for overlap with all genes correlated with the drug-induced cytotoxicity, using Gene Set Enrichment Analysis.
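
The first steps of this procedure (peak filtering, per-gene scoring by the most significant eQTL, and selection of the top genes) can be sketched as follows. The data layout, function names and toy values are assumptions for illustration; the sketch presumes that the input eQTLs are already restricted to each gene's 50kb upstream region, and it omits the final Gene Set Enrichment Analysis step.

```python
def snp_in_peaks(chrom, pos, peaks):
    """True if the SNP position falls inside any (chrom, start, end) peak."""
    return any(c == chrom and start <= pos <= end for c, start, end in peaks)

def score_genes(eqtls, peaks):
    """Score each gene by its most significant eQTL located under a TF peak.

    eqtls is a list of dicts with keys: chrom, pos, gene, pvalue.
    SNPs outside all peaks are discarded. Returns {gene: best p-value}.
    """
    best = {}
    for e in eqtls:
        if not snp_in_peaks(e["chrom"], e["pos"], peaks):
            continue
        gene = e["gene"]
        if gene not in best or e["pvalue"] < best[gene]:
            best[gene] = e["pvalue"]
    return best

def top_genes(gene_scores, n=400):
    """Return the n genes with the smallest (most significant) scores."""
    return sorted(gene_scores, key=gene_scores.get)[:n]

if __name__ == "__main__":
    # Made-up peaks and eQTLs for illustration only.
    peaks = [("chr1", 100, 200)]
    eqtls = [
        {"chrom": "chr1", "pos": 150, "gene": "G1", "pvalue": 1e-4},
        {"chrom": "chr1", "pos": 180, "gene": "G1", "pvalue": 1e-6},
        {"chrom": "chr1", "pos": 500, "gene": "G2", "pvalue": 1e-9},
        {"chrom": "chr1", "pos": 120, "gene": "G3", "pvalue": 1e-3},
    ]
    scores = score_genes(eqtls, peaks)
    print(scores, top_genes(scores, n=2))
```

The resulting gene list would then serve as the gene set whose overlap with drug-response-correlated genes is tested.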

Results: We analyzed 114 TFs and 24 treatments using GENMi, yielding 334 significantly associated (TF, drug) pairs. The top 20 sparse (TF, drug) pairs yielded literature support for 13 associations, often from studies where perturbation of the TF’s expression changes drug response. We demonstrate the advantage of our approach by contrasting it with a baseline that does not use gene expression data. Our method reports more associations than the baseline approach at identical false positive rates (FPR). We further tested 14 TFs that GENMi associated with an anthracycline (doxorubicin or epirubicin) and 21 TFs that it associated with a taxane (paclitaxel or docetaxel). MTS cytotoxicity assays after TF knockdowns in two triple-negative breast cancer cell lines, BT549 and MDA-MB231, yielded 6 TFs that significantly desensitized the cells to taxane-induced apoptosis and 4 TFs that significantly desensitized the cells to anthracycline-induced apoptosis.

...............................................................................................................................

Poster: P68
Pervasive variation of transcription factor orthologs contributes to regulatory network divergence


Shilpa Nadimpalli, Princeton University, United States
Anton V. Persikov, Princeton University, United States
Mona Singh, Princeton University, United States

Differences in transcriptional regulatory networks underlie much of the phenotypic variation observed across organisms. Changes to cis-regulatory elements are widely believed to be the predominant means by which regulatory networks evolve, yet examples of regulatory network divergence due to transcription factor (TF) variation have also been observed. To systematically ascertain the extent to which TFs contribute to regulatory divergence, we analyzed the evolution of the largest class of metazoan TFs, Cys2-His2 zinc finger (C2H2-ZF) TFs, across 12 Drosophila species spanning ~45 million years of evolution. Remarkably, we uncovered that a significant fraction of all C2H2-ZF 1-to-1 orthologs in flies exhibit variations that can affect their DNA-binding specificities. In addition to loss and recruitment of C2H2-ZF domains, we found diverging DNA-contacting residues in ~44% of domains shared between D. melanogaster and the other fly species. These diverging DNA-contacting residues, present in ~70% of the D. melanogaster C2H2-ZF genes in our analysis and corresponding to ~26% of all annotated D. melanogaster TFs, show evidence of functional constraint: they tend to be conserved across phylogenetic clades and evolve more slowly than other diverging residues. These same variations were rarely found as polymorphisms within a population of D. melanogaster flies, indicating their rapid fixation. The predicted specificities of these dynamic domains gradually change across phylogenetic distances, suggesting stepwise evolutionary trajectories for TF divergence. Further, whereas proteins with conserved C2H2-ZF domains are enriched in developmental functions, those with varying domains exhibit no functional enrichments. Our work suggests that a set of highly dynamic and largely unstudied TFs are a likely source of regulatory variation in Drosophila and other metazoans.

...............................................................................................................................

Poster: P69
A Novel Experimental Model Sheds Light on the Mechanism of Host-Gut Microbiome Interactions


Allison Richards, Wayne State University, United States
Michael Burns, University of Minnesota, United States
Adnan Alazizi, Wayne State University, United States
Roger Pique-Regi, Wayne State University, United States
Ran Blekhman, University of Minnesota, United States
Francesca Luca, Wayne State University, United States

The components of the human gut microbiome vary between physiological and pathological states. It has been shown that the gut microbiome differs in individuals with certain diseases, such as diabetes. However, the question of the cause and effect of the differences in both gut microbiome and host state remains unsolved. In order to delve into the relationship between gut microbiome and host, we treated human primary colonic epithelial cells (colonocytes) with different concentrations of gut microbiome from a healthy donor for varying time intervals. Each experiment was performed in triplicate. We found that a gut microbiome to host ratio of 10:1 is best to simulate the symbiotic environment of the colon. Under these conditions, we performed RNA-seq to determine changes in the host gene expression that comprise a response to microbiome exposure. RNA-sequencing reads were aligned using BWA-mem and differentially expressed genes were identified using DESeq2. We found 2,111 genes and 1,110 genes that change expression in the host colonocytes following exposure to the microbiome for 4 and 6 hours, respectively (FDR = 1%). These genes are enriched for a variety of pathways involved in the interaction of host cells and gut microbiome. Specifically, we found enrichment in pathways involved in cell adhesion and cell surface receptor signaling. In addition, we performed 16S sequencing on bacterial DNA derived from the same co-cultures as the host colonocytes in order to study changes in the composition of the gut microbiome following exposure to the host. We found that after 4 hours of co-culturing, there was a decrease in the proportional abundance of the phylum Firmicutes and a corresponding increase in the phylum Proteobacteria. Furthermore, there was a decrease in overall diversity of the gut microbiome following exposure to the host colonocytes. 
Together, these results help us to identify which pathways are involved in the host response following microbiome exposure and in turn, how the microbiome is changed by host exposure. Study of both of these responses will help us to understand the cause of the differences in gut microbiome composition that have been seen in various pathological states.

...............................................................................................................................

Poster: P70
Experimentally identified gene-environment interactions contribute to heritability of complex traits


Cynthia Kalita, Wayne State University, United States
Gregory Moyerbrailean, Wayne State University, United States
Omar Davis, Wayne State University, Canada
Chris Harvey, Wayne State University, United States
Adnan Alazizi, Wayne State University, United States
Donovan Watza, Wayne State University, United States
Xiaoquan Wen, University of Michigan, United States
Xiang Zhou, University of Michigan, United States
Roger Pique-Regi, Wayne State University, United States
Francesca Luca, Wayne State University, United States

Genome wide association studies (GWAS) have identified thousands of common genetic variants associated with complex traits, including normal traits and common diseases. However, the significant SNPs found in these association studies explain only a small proportion of disease heritability. One possible explanation for this missing heritability is that the effect of the variant on the trait can be detected only under the right environmental conditions.

To test this hypothesis, we used GEMMA to jointly analyze summary statistics from 18 GWAS meta-analysis studies with annotations of regulatory variation. Our annotations are derived from: SNPs with allele specific expression (ASE) in 48 cellular environments, eQTLs (Wen et al 2014), and SNPs with conditional allele specific expression (cASE).

GEMMA (Genome-wide Efficient Mixed Model Association) tests for the proportion of variance in phenotypes explained (PVE) by typed genotypes, for example, “chip heritability”. At the same time, it estimates enrichment of a set of annotations within a GWAS trait. We find a range of enrichments for SNPs in genes with ASE, up to 7.84 for mean platelet volume. In comparison, for this same trait, SNPs in genic regions without ASE show an enrichment value of 1.03. When we consider SNPs in genes with cASE, we observe an enrichment of 5.10 as compared to 3.93 (SNPs in genes with ASE) and 1.04 (SNPs in genic regions). This approach, which integrates regulatory variation and gene-environment interactions into GWAS signals, can provide a much better understanding of the molecular mechanisms underlying inter-individual variation in complex traits.

...............................................................................................................................

Poster: P71
A systematic survey of the Cys2His2 zinc finger DNA-binding landscape


Joshua Wetzel, Princeton University, United States
Anton Persikov, Princeton University, United States
Mona Singh, Princeton University, United States
Marcus Noyes, NYU Institute for Systems Genetics, United States

Cys2His2 zinc fingers (C2H2-ZFs) comprise the largest class of metazoan DNA-binding domains. Despite this domain's well-defined DNA-recognition interface, and its successful use in the design of chimeric proteins capable of targeting genomic regions of interest, much remains unknown about its DNA-binding landscape. To help bridge this gap in fundamental knowledge and to provide a resource for design-oriented applications, we screened large synthetic protein libraries to select binding C2H2-ZF domains for each possible three base pair target. The resulting data consist of >160 000 unique domain–DNA interactions and comprise the most comprehensive investigation of C2H2-ZF DNA-binding interactions to date. An integrated analysis of these independent screens yielded DNA-binding profiles for tens of thousands of domains and led to the successful design and prediction of C2H2-ZF DNA-binding specificities. Computational analyses uncovered important aspects of C2H2-ZF domain–DNA interactions, including the roles of within-finger context and domain position on base recognition. We observed the existence of numerous distinct binding strategies for each possible three base pair target and an apparent balance between affinity and specificity of binding. In sum, our comprehensive data help elucidate the complex binding landscape of C2H2-ZF domains and provide a foundation for efforts to determine, predict and engineer their DNA-binding specificities.

...............................................................................................................................

Poster: P72
Statistical Algorithms for Motif Discovery on SELEX Data


Chaitanya Rastogi, Department of Biological Sciences, Columbia University, United States
Harmen Bussemaker, Department of Biological Sciences, Columbia University, United States

SELEX-seq is an experimental and computational platform that combines biophysical modeling and deep sequencing in order to determine the DNA binding specificity of transcription factor complexes [1]. Recent work has demonstrated the protocol’s ability to elucidate novel recognition properties of the eight Drosophila Hox proteins [2]. SELEX-seq analyses require detailed oligomer count information to infer affinities, a challenging computational task given the size of the data. Efficient implementations of the computational pipeline are required as the adoption of SELEX-seq increases. Following the methodology set out in [1,2], we have developed a suite of R/Bioconductor functions, named "SELEX," to facilitate the analysis of SELEX-seq data. Thanks to efficient algorithms, this software can run on a standard laptop computer. Our package includes functionality for k-mer counting, Markov model construction, and information gain (Kullback-Leibler divergence) calculations, along with integrated solutions for painless annotation and management of SELEX-seq experiments. Significantly, the package forms the basis for advanced feature-based modeling of TF binding sites. These novel statistical models directly infer ΔΔG values for nucleotide, dinucleotide, and DNA shape features without any prior information about the binding factor in question.

[1] T.R. Riley, M. Slattery, N. Abe, C. Rastogi, D. Liu, R.S. Mann†, and H.J. Bussemaker†. (2014) SELEX-seq, a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255-78.

[2] M. Slattery, T.R. Riley, P. Liu, N. Abe, P. Gomez-Alcala, R. Rohs*, B. Honig*, H.J. Bussemaker*, R.S. Mann*. (2011) Cofactor Binding Evokes Latent Differences in DNA Binding Specificity between Hox proteins. Cell 147(6):1270-82.
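
As a rough illustration of the k-mer counting and information gain (Kullback-Leibler divergence) calculations mentioned in the abstract above, here is a minimal Python sketch. It does not reproduce the API of the R/Bioconductor package; the function names, the pseudocount, and the use of raw background counts in place of a Markov-model expectation are all simplifying assumptions.

```python
import math
from collections import Counter

def count_kmers(reads, k):
    """Count every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def information_gain(selected_counts, background_counts, pseudocount=1.0):
    """Kullback-Leibler divergence D(selected || background) in bits.

    Both arguments map k-mers to counts. A pseudocount is added to every
    k-mer in the union of the two tables so that no probability is zero.
    """
    kmers = set(selected_counts) | set(background_counts)
    sel_total = sum(selected_counts.values()) + pseudocount * len(kmers)
    bg_total = sum(background_counts.values()) + pseudocount * len(kmers)
    gain = 0.0
    for kmer in kmers:
        p = (selected_counts.get(kmer, 0) + pseudocount) / sel_total
        q = (background_counts.get(kmer, 0) + pseudocount) / bg_total
        gain += p * math.log2(p / q)
    return gain

if __name__ == "__main__":
    # Made-up reads for illustration only.
    background = count_kmers(["ACGTACGT", "TTGACCAG", "GGCATCGA"], 3)
    selected = count_kmers(["TGATTGAT", "TGATTTAT", "ATGATTGA"], 3)
    print(information_gain(selected, background))
```

Computing this quantity for a range of k is one way to see at which k-mer length the selected library departs most from the background, which is how an information gain curve is typically read.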

...............................................................................................................................

Poster: P73
Interpreting non-coding SNPs and quantifying the value of perturbation experiments using ensembles of biophysical models

Farzaneh Khajouei, University of Illinois Urbana-Champaign, United States
Md. Abul Hassan Samee, University of Illinois Urbana-Champaign, United States
Stanislav Shvartsman, Princeton University, United States
Saurabh Sinha, University of Illinois Urbana-Champaign, United States

Characterizing the mechanisms of gene regulation and predicting the functional effects of single nucleotide polymorphisms in regulatory sequences are two major challenges of the day. Thermodynamics-based models that map an enhancer sequence to its precise expression levels in varying cellular conditions are a promising means to meet these challenges. However, these models often involve several (10-20) free parameters, and the parameter space typically has many local optima. Current approaches rely on the best-fitting model (parameterization) to make predictions about perturbation conditions, but a large collection of models that fit the training data almost equally well remains unexplored as a result. We study this crucial problem with the goals of (1) constructing a probabilistic landscape or ‘ensemble’ of models from training data and (2) using the ensemble of models to analyze non-coding polymorphisms and to systematically design perturbation experiments.

Our approach considers all models with goodness-of-fit above a threshold. A large collection of models is constructed through deep, uniform sampling of the parameter space followed by local optimizations. From this collection we construct a continuous probability landscape over the parameter space, using Gaussian mixture models and adaptive variance estimation. The resulting landscape allows predictions about perturbation conditions to be made with error estimates. Using the thermodynamics-based GEMSTAT model, we constructed probabilistic landscapes for wild-type expression of developmental genes in Drosophila. We also constructed landscapes of models that fit both wild-type data and perturbation data from transcription factor knock-down or site mutagenesis experiments. Finally, we assessed the value of each perturbation experiment by the reduction in entropy of parameter landscape resulting from the experiment. Assigning information-theoretic values to experiments is a first step to systematic experiment design, paving the way to fully resolved models.

We also used the above probability landscape to analyze single nucleotide polymorphisms (SNPs) in a well-studied enhancer of the developmental gene ‘ind’ in Drosophila. We predicted the effects of every possible single nucleotide variation within this enhancer, and shortlisted the variations with the greatest predicted effects on average (over the probability landscape) or whose predicted effects exhibit the most uncertainty. We then focused on SNPs within this enhancer as recorded in the DGRP resource, and compared the spectrum of predicted quantitative effects of these observed SNPs to the spectrum of all possible SNPs. Our analyses pointed to an avoidance of strong-effect variations in general, but also provided strong evidence for compensation between modest-effect SNPs in the same individual.

...............................................................................................................................

Poster: P74
Tissue context improves disease-gene mining from biomedical text


Ruth Dannenfelser, Princeton University, United States
Ran Zhang, Princeton University, United States
Olga Troyanskaya, Princeton University, United States

Identifying the genes associated with disease is an important task in biology that furthers our understanding of disease mechanisms and has important clinical implications. Although it is well known that genetic defects regulate disease manifestation in a tissue-specific manner, tissue contexts are not considered in disease-gene curation, nor are they annotated in current disease-gene databases. In this work we develop the first method to identify potential disease-gene associations in a given tissue by mining PubMed abstracts. Conditioning on tissues, we collect and extract the information contained in biomedical abstracts to fit an unsupervised model, based on conditional mutual information, for prediction. In over 50 diverse tissues, we achieve promising performance with our model-based approach, which also outperforms tissue naive prediction, suggesting that our method can accurately assign disease related genes to their specific tissue contexts. Additionally, our model can help identify the genetic cause of tissue specific dysfunction when a disease affects multiple tissues. We illustrate this by localizing obesity related genes in the hypothalamus and adipocytes. A web server will be made available for users to query disease-gene predictions with tissue labels.

...............................................................................................................................

Poster: P75
H3K4me3 downstream of transcription start sites is responsible for transcriptomic modifications in systemic lupus erythematosus

Zhe Zhang, The Children's Hospital of Philadelphia, United States
Lihua Shi, The Children's Hospital of Philadelphia, United States
Kathleen Sullivan, The Children's Hospital of Philadelphia, United States

The autoimmune disease systemic lupus erythematosus (SLE) has a systematically modified epigenome, according to our previous studies on histone modifications such as tri-methylation of histone H3 lysine 4 (H3K4me3). H3K4me3 is a canonical open chromatin mark of active transcription. Recent studies also suggested that H3K4me3 breadth at the transcription start site (TSS) has an important regulatory role in cell identity. This project examined H3K4me3 breadth at TSSs in primary monocytes and its association with differential gene transcription in SLE. Integrative bioinformatics analysis was applied to ChIP-seq and RNA-seq data generated from the same samples, as well as to public genomic data. We created an online application for this project, which also enables users to explore its data and perform their own analysis (http://awsomics.org/project/sle_h3k4me3_breadth).

Distinctive H3K4me3 patterns of ChIP-seq peaks were identified at 14,217 TSSs in control monocytes. The narrow peaks are mostly related to housekeeping functions. The broader peaks have H3K4me3 extended upstream and/or downstream of the TSS and are often found at immune response genes. Many TSSs have downstream H3K4me3 extending to ~650 bp, where H3K36me3, a transcriptional elongation mark, starts to rise. The H3K4me3 pattern is strongly associated with gene overexpression in SLE. Genes with narrow peaks were less likely (OR = 0.14), while genes with extended downstream H3K4me3 were more likely (OR = 2.4), to be overexpressed in SLE. Since H3K4me3 levels of nearby regions are correlated with each other, we removed the interdependence of the TSS, upstream and downstream regions by fitting a linear model and evaluated the direct correlation between differential transcription and differential H3K4me3 at each region. The downstream region has the strongest association with differential transcription. Of the genes significantly overexpressed in SLE (p < 0.01), 78.8%, 55.0% and 47.1% had increased H3K4me3 at their downstream, TSS and upstream regions, respectively. Gene transcription responded sensitively and consistently to downstream H3K4me3 change, as every one percent increase of H3K4me3 led to a ~1.5% average increase in transcription.
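
As a reading aid for the odds ratios (OR) quoted above, here is a minimal sketch of how an odds ratio is computed from a 2x2 table of gene counts. The function name and the example counts are hypothetical placeholders and are not data from this study.

```python
def odds_ratio(in_class_over, in_class_not, other_over, other_not):
    """Odds ratio of overexpression for genes in a peak class versus all other genes.

    in_class_over: genes in the class that are overexpressed
    in_class_not:  genes in the class that are not overexpressed
    other_over:    genes outside the class that are overexpressed
    other_not:     genes outside the class that are not overexpressed
    """
    if in_class_not == 0 or other_over == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    # (odds of overexpression inside the class) / (odds outside the class)
    return (in_class_over * other_not) / (in_class_not * other_over)


# Made-up counts for illustration only: 10 of 100 class genes overexpressed,
# versus 40 of 100 genes outside the class.
example_or = odds_ratio(10, 90, 40, 60)
```

An odds ratio below 1 means overexpression is less likely inside the class than outside it, and a value above 1 means it is more likely.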

In summary, we identified the region downstream of the TSS as crucial for transcription changes in SLE. Given that many genes undergo the transcriptional initiation-elongation transition in this region, it is plausible to hypothesize that an increase in downstream H3K4me3 facilitates the transition by making the nucleosome more accessible to the elongation machinery. This study applied a unique method to study the effect of H3K4me3 breadth on disease, and revealed new insights about epigenomic modifications in SLE, which can potentially lead to novel treatments.

...............................................................................................................................

Poster: P76
Coupled dynamics of drug synergy, gene expression, and alternative splicing in combination therapies of breast cancer

Bojan Losic, Icahn Institute for Genomics and Multiscale Biology, Mount Sinai, United States
Xintong Chen, Icahn Institute for Genomics and Multiscale Biology, Mount Sinai, United States
Gustavo Stolovitzky, IBM Research and Icahn School of Medicine at Mount Sinai, United States

Drug combination therapies in the cancer setting often succeed where mono-therapies fail, facilitating durable and robust responses that may curtail metastases and even be accompanied by milder side-effects. Predicting synergistic and antagonistic combinations based on the gene expression data of mono-therapy drug-tumor response is an important open problem (see concurrent submission) wherein the role of transcriptional splicing dynamics is often ignored or too poorly correlated with phenotypes to be useful.

In this work we leverage the inherent transcript/exon-level resolution of RNA-seq data to infer gene expression and splicing signatures associated with additive and synergistic drug combinations, as defined by canonical viability measurements in a time-course experiment. Briefly, we used the Illumina HiSeq RNA-seq assay to study the transcriptional response over time (0, 3, 6, 9, 12, and 24 h) for three drugs (A, B and C) and their combinations (AB, AC and BC) in the MCF-7 (ER+) breast cancer cell line. Cell viability measurements show that one of the combinations (AB) is strongly synergistic, whereas the other two (AC and BC) are merely additive. We show via rigorous linear modeling of RNA-seq count data at the exon level that, in addition to a novel transcriptional signature driven by differential expression, the transcriptional landscape of combination AB is characterized by persistent alternative splicing signatures. These signatures are composed mostly of genes that are not differentially expressed with respect to A or B but whose functional role has been dramatically changed by the addition (deletion) of a key regulatory protein domain encoded by the extra (missing) exon. We construct an isoform-level co-expression network to probe the regulatory changes this dynamical splicing induces and show that it crucially contributes to the emergence of extensive transcriptional cascades by creating and removing key gene-gene correlations and altering the modular structure of the network. Our results suggest that any gene-signature-based drug synergy prediction algorithm must take alternative splicing into account in order to effectively characterize the novel pathways being activated in the synergistic drug-tumor interaction.

...............................................................................................................................

Poster: P77
Spatial-temporal gene regulatory network of maize embryo and endosperm development

Wenwei Xiong, Montclair State University, United States
Chunguang Du, Montclair State University, United States

The study of maize embryo and endosperm development has significant agricultural importance, but a full understanding remains elusive because of the large number of genes involved and their complex interactions. To better understand the genetic control of maize seed development, we need to quantitatively reveal the dynamic transcriptional regulatory relationships among transcription factors and their target genes. Here we report our integrated regulatory network study using genome-wide spatiotemporal RNA-Seq transcriptome data of B73 maize seed development. Gene expression intensities at all stages were normalized and discretized into bins defined by B-spline functions. We then calculated the entropy of each gene from its probability distribution over the bins, which is known as the marginal entropy. For each pair of genes, the joint distribution over the previously defined bins was used to measure the joint entropy.

The mutual information between any two genes was defined as the sum of their marginal entropies minus their joint entropy, which indicates the mutual dependence between the genes. Transcription factors are major regulators of gene expression and thus share more information with their target genes than with unrelated genes, and more than non-regulatory genes share with each other. Greater mutual information generally suggests a higher probability of dependence. However, some indirect relationships can also contribute to mutual information. To avoid false positives caused by these indirect relationships in the inferred transcriptional network, we used the relative importance of mutual information, indicated by z-scores among all potential regulators and targets, premised on the sparse nature of biological networks. We compared the inferred gene regulatory network to known, well-studied genes and identified potential transcription factors and target genes. We further conducted motif analysis within the same target gene groups. Since functional domains are often involved in transcriptional events, we searched the Pfam database for hits in our enriched set of genes. Network motifs were discovered from the number of edges connected to each node, as well as from topological patterns such as hierarchical structure and network hubs. There are 91 transcription factors and 1,167 genes present exclusively in seed development among a total of 26,105 investigated genes. This work provides an in-depth dynamic view of the complex regulatory network in maize kernel development.
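
The entropy-based definition above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes hard equal-width bins in place of the B-spline binning described in the abstract, and the function names, bin count and example vectors are hypothetical.

```python
import math
from collections import Counter


def discretize(values, n_bins=4):
    """Assign each value to one of n_bins equal-width bins (simplified stand-in for B-spline bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y, n_bins=4):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) on the discretized expression profiles."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    return entropy(bx) + entropy(by) - entropy(list(zip(bx, by)))


def z_scores(mi_values):
    """Standardize mutual information values across the candidate regulators of one target."""
    n = len(mi_values)
    mean = sum(mi_values) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in mi_values) / n)
    if sd == 0:
        return [0.0] * n
    return [(m - mean) / sd for m in mi_values]


# Made-up expression profiles across eight stages.
profile = [0, 1, 2, 3, 4, 5, 6, 7]
mi_identical = mutual_information(profile, profile)
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1], n_bins=2)
```

Ranking candidate regulators by z-score rather than by raw mutual information keeps only the edges that stand out against a target's background, which is one way to act on the sparsity assumption mentioned above.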

...............................................................................................................................

Poster: P78
Copy Number Variation Analysis with GROM-RD

Sean Smith, Rutgers University, United States
Joseph Kawash, Rutgers University, United States
Andrey Grigoriev, Rutgers University, United States

Copy number variants (CNVs), amplifications or deletions of genome segments, are important contributors to phenotypic variation. The advent of next-generation sequencing (NGS) has made read depth (RD) analysis an essential tool for the detection of CNVs. However, the predictive capabilities of existing algorithms using genome read coverage are frequently hindered by various biases in NGS platforms. Additionally, imprecise breakpoint identification limits the utility of read depth tools. We describe GROM-RD, an algorithm that analyzes multiple biases in read coverage to detect CNVs in NGS data. After applying existing GC bias correction methods, we found lingering non-uniform variance across distinct GC regions and developed a novel approach to normalize such variance. By adjusting for repeat bias and using a two-pipeline masking approach, GROM-RD is able to detect CNVs in complex and repetitive segments that otherwise complicate CNV detection, as well as improve sensitivity in less complicated regions. GROM-RD employs a CNV search using size-varying overlapping windows to improve breakpoint resolution, a typical weakness of RD methods. Compared to two widely used programs based on read depth methods, CNVnator and RDXplorer, GROM-RD showed improvements in CNV detection and breakpoint accuracy.
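
To make the idea of scanning read depth with size-varying overlapping windows concrete, here is a minimal sketch. It is not GROM-RD's algorithm: it omits the GC, variance and repeat corrections described above, and the function name, window sizes, overlap and ratio thresholds are all hypothetical choices.

```python
from statistics import median


def scan_windows(depth, window_sizes=(4, 8), step_fraction=0.5,
                 loss_ratio=0.5, gain_ratio=1.5):
    """Flag windows whose mean read depth deviates from the sample-wide median.

    depth: per-position read depth values along one chromosome.
    Returns a list of (start, end, call, ratio) tuples with end exclusive.
    """
    baseline = median(depth)
    if baseline <= 0:
        raise ValueError("median depth must be positive")

    calls = []
    for size in window_sizes:
        # Consecutive windows overlap because the step is smaller than the window.
        step = max(1, int(size * step_fraction))
        for start in range(0, len(depth) - size + 1, step):
            window = depth[start:start + size]
            ratio = (sum(window) / size) / baseline
            if ratio <= loss_ratio:
                calls.append((start, start + size, "loss", ratio))
            elif ratio >= gain_ratio:
                calls.append((start, start + size, "gain", ratio))
    return calls


# Made-up depth track: a four-position drop in coverage inside a 30x region.
toy_depth = [30] * 8 + [10] * 4 + [30] * 8
toy_calls = scan_windows(toy_depth)
```

In this toy track only the smaller window aligned with the drop is flagged, while the larger windows dilute the signal, which hints at why mixing window sizes can sharpen breakpoint placement.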

...............................................................................................................................

Poster: P79
Probabilistic modeling of multiple ChIP-Seq and open chromatin datasets enables discrimination of direct and indirect binding events

Artur Jaroszewicz, UCLA, United States
Jason Ernst, UCLA, United States

Chromatin ImmunoPrecipitation followed by high-throughput sequencing (ChIP-Seq) is an important assay in the study of gene regulation and epigenetics. Unfortunately, this assay does not discriminate signal associated with a target being in direct contact with the DNA from indirect signal, such as that arising through protein-protein interactions or 3-dimensional looping interactions. We present a method, ChIPs n DIP (ChIP-seq Signal Prediction of Direct and Indirect Peaks), that attempts to discriminate between such cases. This method takes as input ChIP-seq tracks for multiple targets and open chromatin data, such as DNaseI hypersensitivity data, and probabilistically models their joint signal to infer, for each ChIP-seq track, a relative probability of direct binding at each position. Specifically, we model the ChIP-seq signal at each position and each track as belonging to one of three binding types: direct, indirect, or none. The inferred probability of these events in the model depends not only on the signal of the target track, but also on the chromatin and transcription factor binding context in which it occurs. We use the presence of DNA motifs in direct binding predictions to evaluate the method.
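
As an illustration of assigning a position to one of three binding types probabilistically, the sketch below applies Bayes' rule to a single read count. It is not the ChIPs n DIP model: the Poisson emission, the per-state rates and the priors are hypothetical, and the context dependence described above is represented only by the fact that the priors are passed in per position.

```python
import math

STATES = ("direct", "indirect", "none")


def poisson_pmf(k, rate):
    """Probability of observing k reads under a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def binding_posterior(read_count, priors, rates):
    """Posterior probability of each binding type given one position's read count.

    priors: state -> prior probability (could vary with chromatin context).
    rates:  state -> expected read count under that state (hypothetical values).
    """
    joint = {s: priors[s] * poisson_pmf(read_count, rates[s]) for s in STATES}
    total = sum(joint.values())
    return {s: joint[s] / total for s in STATES}


# Made-up parameters for illustration only.
toy_priors = {"direct": 1 / 3, "indirect": 1 / 3, "none": 1 / 3}
toy_rates = {"direct": 20.0, "indirect": 8.0, "none": 1.0}
strong_signal = binding_posterior(18, toy_priors, toy_rates)
no_signal = binding_posterior(0, toy_priors, toy_rates)
```

Raising the prior for the direct state at positions with open chromatin would be one simple way to let context shift the posterior without changing the observed signal.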

