RSG POSTER ABSTRACTS - 1 through 20


...............................................................................................................................

Poster: P01

Distinct specificities of the androgen and glucocorticoid receptors revealed using feature-based recognition model analysis of SELEX data

Liyang Zhang, University of Iowa, United States
Gabriella Martini, Columbia University, United States
H. Tomas Rube, Columbia University, United States
Vincent D. Fitzpatrick, Columbia University, United States
Harmen J. Bussemaker, Columbia University, United States
Miles A. Pufall, University of Iowa, United States

The androgen (AR) and glucocorticoid (GR) nuclear hormone receptors are closely related transcription factors. They are believed to bind to DNA as homodimers with indistinguishable specificity through identical DNA binding surfaces, and yet each occupies distinct genomic loci to drive distinct gene expression programs. How this functional difference ensues is not well understood. Here, by combining SELEX-seq assays on the DNA binding domain of AR and GR with statistical modeling, we show that the intrinsic DNA binding preferences of the two factors differ substantially. We present an iterative algorithm that can accurately quantify the free energy parameters of a biophysically motivated recognition model over DNA footprints of unprecedented length (~30bp) by fitting a feature-based generalized linear model. Use of this algorithm allows us to analyze contributions to the binding specificity well outside the 15bp core region. In these outer flanks AR, but not GR, shows a preference for poly-A sequences. Isothermal titration calorimetry measurements confirm the difference in intrinsic specificity, and point to an AR-specific enthalpy-driven binding mechanism that derives additional binding energy from a narrowed minor groove. Our analysis shows that this mode of recognition restricts AR from binding GR sites, although the converse is not true. This contrast provides a basis for the differential genomic occupancy exhibited by AR and GR in LNCaP cells, helping to explain the finding that GR can functionally substitute for AR in androgen-independent prostate cancers. Taken together, our results demonstrate that differences in the intrinsic DNA binding specificity between closely related steroid hormone receptors exist and are functionally relevant. Our computational approach is general and widely applicable.
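
To make the idea of free energy parameters in a recognition model concrete, the following minimal Python sketch evaluates an additive model in which each base contributes an independent free-energy penalty and relative affinity follows a Boltzmann factor. It only illustrates the general form of such a model; it does not reproduce the iterative fitting algorithm or the feature set described above. The penalty values, the 3-bp toy footprint, the temperature, and all function names are assumptions made for illustration.

```python
import math

# Approximate value of R*T in kcal/mol at 298 K (temperature is an assumption).
RT_KCAL_PER_MOL = 0.593

# Hypothetical free-energy penalties (kcal/mol) relative to the optimal base
# at each position of a toy 3-bp footprint. Real footprints are far longer.
DDG = [
    {"A": 0.0, "C": 1.2, "G": 0.8, "T": 1.5},
    {"A": 0.9, "C": 0.0, "G": 1.1, "T": 0.7},
    {"A": 0.0, "C": 0.6, "G": 1.3, "T": 0.4},
]


def binding_free_energy(seq, ddg=DDG):
    """Sum the per-position penalties for a sequence (additive model)."""
    if len(seq) != len(ddg):
        raise ValueError("sequence length must match the model footprint")
    return sum(ddg[i][base] for i, base in enumerate(seq))


def relative_affinity(seq, ddg=DDG, rt=RT_KCAL_PER_MOL):
    """Affinity relative to the optimal sequence: exp(-ddG / RT)."""
    return math.exp(-binding_free_energy(seq, ddg) / rt)
```

In this toy model the optimal sequence "ACA" has relative affinity 1, and every substitution lowers it multiplicatively.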

...............................................................................................................................

Poster: P02
Quantitative modeling of gene expression from sequence, using DNA shape-based model of binding sites

Pei-Chen Peng, University of Illinois at Urbana Champaign, United States
Saurabh Sinha, University of Illinois at Urbana Champaign, United States

Motivation: Prediction of gene expression levels driven by regulatory sequences is one of the major challenges of genomic biology. A major current focus in transcriptional regulation is sequence-to-expression modeling, which interprets the enhancer sequence in light of transcription factor concentrations and DNA binding specificities and predicts precise gene expression levels in varying cellular contexts. Such models have so far exclusively relied on the position weight matrix (PWM) model for transcription factor (TF)-DNA binding. Several reports have pointed out deficiencies in the PWM model and presented alternative models, including DNA shape-based models, that are claimed to be in greater agreement with TF-DNA binding data. However, it is not known if alternative models of DNA binding, such as DNA shape models, can also improve prediction of gene expression.

Results: Here, we adapted a statistical thermodynamics model to develop a quantitative model of gene expression that interprets enhancer sequences using DNA shape features of binding sites, as opposed to a PWM-based scoring of sites. We used rigorous methods to evaluate the fits of expression readouts of more than 35 enhancers regulating spatial gene expression patterns in the blastoderm-stage Drosophila embryo, and we show that DNA shape-based models perform at least as well as, and arguably better than, PWM-based models. We objectively characterized the relationship between DNA shape-based models and PWM models of binding site affinity, and observed that DNA shape features carry information that is complementary to the PWM and useful for sequence-to-expression modeling. In addition, we combined DNA shape and PWM into a single model and tested if it would achieve better predictions than using either binding model independently. The integrative model did not perform consistently better than either the DNA shape-based or the PWM-based model alone.

Conclusion: Our work shows that quantification of TF binding site affinity using DNA shape is not only justified by binding affinity data, it is also effective in interpreting enhancer sequence to accurately predict gene expression. With the growing availability of data sets describing TF-DNA binding affinities comprehensively, we expect that it will be possible to train such models more accurately and to utilize them to better predict gene expression and the functional effects of single nucleotide polymorphisms in the non-coding genome.
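
For readers unfamiliar with the PWM baseline mentioned above, the following minimal Python sketch shows how a position weight matrix scores candidate sites: counts are converted to log-odds against a background, and a site's score is the sum over positions. The count matrix, pseudocount, uniform background, and function names are hypothetical and are not taken from the abstract; the shape-based model itself is not reproduced here.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per position).
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 1, "T": 8},
    {"A": 1, "C": 0, "G": 8, "T": 1},
    {"A": 1, "C": 8, "G": 0, "T": 1},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds against a uniform background (assumed)."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def score_site(pwm, site):
    """Sum the per-position log-odds for a site of the motif's width."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(pwm[i][base] for i, base in enumerate(site))


def best_site(pwm, sequence):
    """Return (score, start) of the highest-scoring window in a sequence."""
    width = len(pwm)
    windows = range(len(sequence) - width + 1)
    return max((score_site(pwm, sequence[i:i + width]), i) for i in windows)
```

Calling `best_site(log_odds_matrix(COUNTS), "CCATGCAA")` scans every 4-bp window and returns the best score together with its start position.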

...............................................................................................................................

Poster: P03
Estimating the Number and Diversity of Cancer Mutations In the Overall Population from 5,319 Complete Cancer Genomes

Prathik Naidu, Thomas Jefferson High School for Science and Technology, United States
Joseph Kaplinsky, Beth Israel Deaconess Medical Center, United States
Ramy Arnaout, Beth Israel Deaconess Medical Center and Harvard Medical School, United States

Cancer is a genetic disease. To understand the link between cancer and genetics, large-scale efforts such as The Cancer Genome Atlas (TCGA) have begun to catalog cancer-related mutations, using hundreds of samples across many cancer types and subtypes. However, because even this sample size is small compared to the tens of millions of people with cancer worldwide, there is a risk of substantial sampling bias: the number and diversity of cancer mutations in the sample may not reflect their number and diversity in the overall population. To address this issue, we used the Recon (Reconstruction of Estimated Communities from Observed Numbers) algorithm to estimate the overall number and diversity of coding and non-coding cancer mutations for 14 common cancer types, including breast cancer, prostate cancer, and glioblastoma, and four clinically important subtypes of breast cancer (e.g. luminal A, luminal B, Her2, and Basal). Our results suggest that while most common mutations have been discovered, the majority of mutations overall (78,186 – 1,887,539) remain unknown. Interestingly, the number of undiscovered mutations is not obvious from observations for a given cancer. For example, although glioblastoma and prostate-cancer samples exhibit relatively few mutations, our results suggest that overall, glioblastoma, but not prostate cancer, is likely to have as many mutations as all breast cancers. Thus, our algorithm may reveal aspects of cancer that are not obvious from direct observations (e.g., the potential presence of genetically different subtypes).

...............................................................................................................................

Poster: P04
MiSL: a method for mining synthetic lethal partners of recurrent cancer mutations uncovers novel mutation-specific therapeutic targets

Subarna Sinha, Stanford University, United States
Daniel Thomas, Stanford University, United States
Yang Gao, University of California at Berkeley, United States
Steven Chan, Stanford University, United States
Diede Brunen, Netherlands Cancer Institute, Netherlands
Rene Bernards, Netherlands Cancer Institute, Netherlands
Ravindra Majeti, Stanford University, United States
David L. Dill, Stanford University, United States

Synthetic lethality, in which a single gene defect leads to dependency on a second gene that is otherwise not essential, is an attractive paradigm to identify targeted therapies for cancer-specific mutations. Current methods to detect synthetic lethal (SL) partners for somatic mutations rely on large-scale shRNA screens in cell-lines or use human orthologs of yeast SL interactions, neither of which is necessarily representative of primary tumors, and both of which have incomplete coverage.

We have developed MiSL, a novel Boolean implication-based algorithm that utilizes large pan-cancer patient datasets (mutation, copy number and gene expression) to identify SL partners for cancer mutations. The underlying assumption of our approach is that, across multiple cancers, SL partners of a mutation will be amplified more frequently or deleted less frequently, with concordant changes in expression, in primary tumor samples harboring the mutation. Pan-cancer analysis discovers robust biological relationships that are likely to be independent of cancer subtype and increases statistical power.
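
The underlying assumption can be illustrated with a small Python sketch that asks whether a candidate gene is amplified more often in tumors carrying a mutation than in tumors without it. This is not the MiSL algorithm: the Boolean implication analysis is replaced here by a plain one-sided Fisher exact test, and the sample encoding and function names are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for enrichment in a 2x2 table.

    a: mutated and amplified      b: mutated, not amplified
    c: wildtype and amplified     d: wildtype, not amplified
    Returns P(X >= a) under the hypergeometric null.
    """
    n_mut = a + b
    n_amp = a + c
    n = a + b + c + d
    upper = min(n_mut, n_amp)
    tail = sum(
        comb(n_mut, k) * comb(n - n_mut, n_amp - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, n_amp)


def amplification_enrichment(samples):
    """samples: iterable of (is_mutated, is_amplified) booleans per tumor."""
    a = b = c = d = 0
    for mutated, amplified in samples:
        if mutated and amplified:
            a += 1
        elif mutated:
            b += 1
        elif amplified:
            c += 1
        else:
            d += 1
    return fisher_one_sided(a, b, c, d)
```

A small p-value would flag the gene as a candidate under this simplified criterion; the method described above additionally considers deletions and concordant expression changes across multiple cancers.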

First, we sought to validate MiSL using existing knowledge and large-scale shRNA data. Consistent with prior knowledge, MiSL candidates for BRCA1 mutation (mut) in breast cancer were enriched for DNA repair genes (p=0.006). We also found: (1) significant overlap (p=0.002) between leukemia IDH1mut MiSL candidates and essential genes in IDH1mut cells determined by a DECIPHER shRNA screen we performed in doxycycline-inducible IDH1(R132) THP-1 cells, and (2) for multiple mutations in colorectal cancer, MiSL candidates were enriched (p<0.05) with genes that were selectively essential in mutated colorectal cell-lines in Achilles data.

Secondly, we experimentally confirmed novel SL partners that are druggable in acute myeloid leukemia (AML) and breast cancer. MiSL predicted a novel SL interaction in AML between IDH1mut and ACACA, the rate-limiting enzyme that controls lipid biosynthesis. Consistent with our prediction, inhibition of ACACA with shRNA or the small-molecule inhibitor TOFA prevented cell proliferation in IDH1mut (but not wildtype) AML cell-lines and primary blasts. MiSL also predicted that AKT1 is an SL partner of PIK3CAmut in breast cancer, which we experimentally confirmed using 8 breast cancer lines. All four PIK3CAmut (but not wildtype) breast cancers were sensitive to AKT1 inhibition in viability and colony assays.

In conclusion, MiSL is a scalable computational solution that finds novel SL interactions. Using primary patient data allows it to capture in vivo tumor evolution, revealing SL interactions missed by existing methods. It can be widely applicable and can greatly accelerate novel target discovery for precision medicine in cancer.

...............................................................................................................................

Poster: P05
Tracking the Evolution of 3D Gene Organization

Alon Diament, Tel Aviv University, Israel
Tamir Tuller, Tel Aviv University, Israel

One of the most fundamental open biological questions is what determines the eukaryotic genomic organization. It has been shown that the distribution of genes in eukaryotic genomes is not random; however, formerly reported large scale relations between gene function and genomic organization were relatively weak.

Previous studies have demonstrated that codon usage bias is related to all stages of gene expression and to protein function. Here we apply a novel tool for assessing functional relatedness, codon usage frequency similarity (CUFS), which measures similarity between genes in terms of codon and amino acid usage. By analyzing Hi-C data, describing the three dimensional conformation of the DNA, we show that the functional similarity between genes captured by our metric is directly and very strongly correlated with their three dimensional (3D) distance in five eukaryotes (r > 0.74; p<1e-323 in all cases; Diament et al. Nature Commun. 2014).
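
A similarity measure of this kind can be sketched in a few lines of Python. The abstract does not give the CUFS formula, so the sketch below assumes one plausible form: each coding sequence is summarized by its codon frequency vector, and two genes are compared by the square root of the Jensen-Shannon divergence between those vectors (0 for identical usage, 1 for completely disjoint usage). The function names and the choice of distance are assumptions, not the published definition.

```python
import math
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_usage_distance(cds_a, cds_b):
    """Square root of the Jensen-Shannon divergence (base 2) between the
    codon frequency vectors of two sequences (assumed metric)."""
    p = codon_frequencies(cds_a)
    q = codon_frequencies(cds_b)
    divergence = 0.0
    for codon in set(p) | set(q):
        pk, qk = p.get(codon, 0.0), q.get(codon, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0:
            divergence += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            divergence += 0.5 * qk * math.log2(qk / mk)
    return math.sqrt(max(divergence, 0.0))
```

Smaller distances correspond to more similar codon usage, which is the quantity that would then be compared against 3D distances derived from Hi-C data.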

We utilize this result to propose a novel approach for improving the accuracy of 3D genome reconstructions by introducing additional predicted physical interactions to the model, based on orthologous interactions in an evolutionary-related organism and based on predicted functional interactions between genes (e.g. based on CUFS). We demonstrate in the eukaryote S. cerevisiae that this approach indeed leads to the reconstruction of improved models (Diament et al. PLoS Comput. Biol. 2015).

We have previously shown that some level of conservation of genomic organization exists between organisms. However, almost all studies of 3D genomic organization analyzed each organism independently from others. Here we propose a novel approach for inter-organismal analysis of the organization of genes. By utilizing Hi-C data from two fungi – S. cerevisiae and S. pombe – we detect orthologous gene families that underwent changes in their 3D co-localization during evolution. We show that this approach enables identifying various biologically meaningful modules of co-evolving genes with shared function (Diament et al. under review 2015).

Our results emphasize the importance of three-dimensional genomic organization in eukaryotes and suggest that the evolutionary mechanisms that shape the 3D organization of genes are affected by their functionality and expression pattern. In addition, we provide novel algorithms for 3D genome reconstructions and for deciphering gene function and organization.

...............................................................................................................................

Poster: P06
From phenotypic to molecular synergy: A transcriptional study of the dynamics of drug combinations based on single drug responses

Mehmet Eren Ahsen, IBM Research and Icahn School of Medicine at Mount Sinai, United States
Jennifer E. L. Diaz, IBM Research and Icahn School of Medicine at Mount Sinai, United States
Xintong Chen, Icahn School of Medicine at Mount Sinai, United States
Bojan Losic, Icahn School of Medicine at Mount Sinai, United States
Gustavo Stolovitzky, IBM Research, United States

Drug combination therapies have proven to be good strategies in cancer treatment in that they may elicit fewer adverse effects than single drugs while overcoming the resistance to individual drugs that cancer cells tend to develop. Since screening all possible drug pairs is impractical, accurate methods for predicting synergistic drug combinations are needed. Attempts to predict the effect of drug combinations based on the transcriptional response of cells to single drugs have succeeded only partially because not enough data exist on how those transcriptional responses combine in the cellular environment. In this work we study the mechanisms whereby transcriptional responses combine to give rise to synergistic or additive responses to combined therapies. Specifically, we used RNAseq to study the transcriptional response over time (0, 3, 6, 9, 12, and 24 h) and for three drugs (A, B and C) and their combinations (AB, AC and BC) in MCF-7 breast cancer cells. Cell viability measurements show that one of the combinations (AB) is strongly synergistic, whereas the other two (AC and BC) are additive. The number of differentially expressed genes for the synergistic combination AB was at least one order of magnitude larger than the number of the differentially expressed genes resulting from each of the individual drugs A or B, and increased over time. For the additive combinations the number of differentially expressed genes was about the same as for the single drugs, and was dominated by one of the drugs (C). To explain the massive transcriptional response of the synergistic combination, we extended the concept of additivity from the phenotypic to the transcriptional level. We found that most of the genes differentially expressed in AB but in neither A nor B are non-additive. Using this information in the MCF-7-specific gene regulatory network we looked for transcriptional cascades that could explain the transcriptional program in AB based on that in A and B.
We found that the majority of transcription factors that get activated at a given time point remain active at later time points. We studied how the activation of transcriptional regulators in A and B activates synergistic genes, explaining much of the transcriptional response to AB. These analyses can pave the way for the design of algorithms to predict the response of cells to drug combinations based on RNAseq data from single drugs.
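
The abstract does not define additivity at the transcriptional level. One simple reading, assumed here purely for illustration, is that a gene responds additively when its log2 fold change under the combination is close to the sum of its log2 fold changes under each single drug. The Python sketch below classifies a gene under that assumption; the tolerance value and function name are hypothetical.

```python
def classify_additivity(lfc_a, lfc_b, lfc_ab, tolerance=0.5):
    """Compare the combination response with the sum of single-drug responses.

    lfc_a, lfc_b: log2 fold changes under drug A and drug B alone.
    lfc_ab: log2 fold change under the combination AB.
    tolerance: allowed deviation (log2 units) before a gene is called
        non-additive; the default is an arbitrary illustrative value.
    Returns a (label, deviation) tuple.
    """
    expected = lfc_a + lfc_b
    deviation = lfc_ab - expected
    label = "additive" if abs(deviation) <= tolerance else "non-additive"
    return label, deviation
```

For example, a gene barely changed by either drug alone but strongly induced by the combination would be labeled non-additive under this criterion.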

...............................................................................................................................

Poster: P07
Three-dimensional analysis of regulatory features reveals functional enhancer-associated loops

Yao Wang, University of Texas Health Science Center at San Antonio, United States
Junbai Wang, Oslo University Hospital – Norwegian Radium Hospital, Norway
Yufan Zhou, University of Texas Health Science Center at San Antonio, United States
Malaina Gaddis, University of Southern California, United States
Rohit Jadhav, University of Texas Health Science Center at San Antonio, United States
Xun Lan, Stanford University, United States
Tim Huang, University of Texas Health Science Center at San Antonio, United States
Shili Lin, The Ohio State University, United States
Peggy Farnham, University of Southern California, United States
Seth Frietze, University of Vermont, United States
Victor Jin, University of Texas Health Science Center at San Antonio, United States

Several critical gaps remain in our knowledge of the relationship of chromatin structure to gene regulation. These include 1) classifying different types of chromatin interactions (including promoter-enhancer contacts), 2) determining the relationships between classes of chromatin interactions and the epigenomic state, 3) deciphering the functional relevance of chromatin interactions, and 4) determining whether genes associated with different chromatin interaction classes are involved in disease.
To gain insight into the relationship between chromatin structure and gene expression, we conducted chromatin conformation analysis using PANC1 pancreatic cancer and MCF7 breast cancer cells. For PANC1, we carried out Tethered Chromatin Capture (TCC) on two biological replicates, and compared correlation between replicates to validate the data quality. For MCF7, we used both TCC and in situ Hi-C protocols on replicates and also performed correlation analysis. To analyze the 3D conformation in these two cancer cells, we first detected topologically associated domains (TADs) in each chromosome, then applied a novel Hi-C analysis algorithm and identified hundreds of thousands of Interacting Loci Pairs (ILPs) in each of the two cell types. We classified ILPs according to location with respect to gene structure, gene expression, different histone modifications, DNase hypersensitivity, and RNA polymerase II and CTCF binding. Interestingly, we found that a majority of ILPs are within a particular TAD, and only 5% of the ILPs are involved in promoter regions, with even fewer promoter-enhancer loops. To further explore the potential mechanism behind 3D conformation and gene expression, we conducted TCC on PANC1 treated with the drugs ICG001, known as a CBP inhibitor, and C646, known as a CBP/p300 competitor. We examined the changes of TADs and ILPs in the drug-treated PANC1, and the impact of pharmacological inhibition of histone acetylation on genes having promoter-enhancer loops in PANC1 cells. We find that genes associated with promoter-enhancer loops have cell-type-specific functional annotations. We further demonstrated that genes with promoter-enhancer loops showed altered expression in response to drug treatment in PANC1, suggesting that the chromatin loops we identified are functional. Taken together, our study provides insights into the interdependence of three-dimensional chromatin looping and gene expression mediated by enhancer-promoter interactions.

...............................................................................................................................

Poster: P08
Understanding Breast Cancer Heterogeneity through Personalized Drosophila Models

Jennifer Diaz, Icahn School of Medicine at Mount Sinai, United States
Avi Ma'Ayan, Icahn School of Medicine at Mount Sinai, United States
Ross Cagan, Icahn School of Medicine at Mount Sinai, United States

Triple negative breast cancer (TNBC) is a molecularly heterogeneous disease characterized by poor therapeutic response, low survival rates, and few druggable molecular targets. We aim to study this heterogeneity by examining complex genetic patient-specific models of TNBC in Drosophila.

Building genetic models of TNBC requires identifying the genes responsible for tumor progression. TNBCs are largely driven by genes with altered copy number status. Our analysis of breast cancer copy number data from The Cancer Genome Atlas (TCGA) has identified and prioritized putative drivers specific to TNBC from over 8000 genes in amplified and deleted regions.

We will functionally screen these putative driver genes for enhancement of cell migration and tissue expansion in transformed tissue in Drosophila. In preliminary studies, these phenotypes accurately identified several known driver genes. The newly identified driver genes, along with known driver genes harboring mutations, will be used to construct a set of complex, personalized Drosophila models for ten TNBC patients in TCGA. To further guide the selection of key drivers for each model, we have developed novel Drosophila gene set enrichment tools, which identify key genes when applied to expression data from each patient. We will then use gene expression data to quantitatively track the accuracy of each model and select the best fit model for the patient’s actual tumor.

We are developing increasingly accurate personalized models through an iterative experimental-computational workflow. Each final personalized model has the potential to display unique properties, and we will use these models to study TNBC heterogeneity between patients. In each model, we will examine the extent of cell proliferation and migration, determine the signaling pathways responsible for these phenotypes by biochemical analysis and a chemical genetic screen, and measure response to standard-of-care chemotherapies. Our goal is to shed light on the molecular basis for patient-to-patient variability in survival and therapeutic response in TNBC.

...............................................................................................................................

Poster: P09
Creating a library of genome-wide chromatin state patterns during B lymphopoiesis

Mark Maienschein-Cline, University of Illinois at Chicago, United States
Pinal Kanabar, University of Illinois at Chicago, United States
Neil Bahroos, University of Illinois at Chicago, United States
Malay Mandal, University of Chicago, United States
Marcus Clark, University of Chicago, United States

B lymphopoiesis proceeds through several stages, during which the cell undergoes rearrangements in its antibody gene content that change the specificity of its antigen recognition mechanisms. This process forms a crucial underpinning of our adaptive immune system. In addition to changes in the antibody genetic content, these stages are also defined by a number of important epigenetic modulations. However, only a fraction of these are currently understood, often only in the context of specific transcription factors in certain developmental stages. Obtaining more general patterns of chromatin state in a broader, lymphopoiesis-wide context would thus provide an invaluable resource underpinning our understanding of epigenomic changes in B lymphopoiesis.

To this end, we consider nucleosome positioning and chromatin accessibility, which play an important role in determining regions of the genome that regulatory factors can interact with. Methodologies like ATAC-seq, DNase-seq, and FAIRE-seq can be employed to detect loci with open chromatin. In particular, ATAC-seq is a newly developed and particularly powerful tool, as it involves a simpler protocol that can be applied to smaller populations of cells; ATAC-seq can also be used to infer nucleosome positioning (the converse of open chromatin) if paired-end sequencing is used. Additionally, the activity of specific transcription factors can be inferred using bioinformatic techniques like motif enrichment, making open chromatin measurements a valuable basis for additional epigenetic studies.

We have profiled eight B lymphopoietic stages using ATAC-seq and describe the exciting preliminary results and analysis strategy here. We first identified regions of interest genome-wide from open chromatin enrichment, and then used patterns of chromatin state changes to separate the loci into functionally differentiated groups. In particular, we used an unsupervised clustering approach to discover clusters of loci with concordant chromatin state changes, and employed consensus clustering to determine the number of distinct patterns that can be identified reliably. In our data, we discovered 11 distinct patterns that describe changes in chromatin state across >100,000 differentiated loci. In addition to revealing important changes to the regulatory landscape across B lymphopoiesis, we believe that these patterns and the associated loci can be used as a valuable reference “library” of chromatin state for future B lymphopoiesis studies.

...............................................................................................................................

Poster: P10
Bringing big genomic data into focus for studying complex diseases in specific biological contexts

Arjun Krishnan, Princeton University, United States
Ran Zhang, Princeton University, United States
Victoria Yao, Princeton University, United States
Chandra Theesfeld, Princeton University, United States
Aaron Wong, Simons Foundation, United States
Alicja Tadych, Princeton University, United States
Natalia Volfovsky, Simons Foundation, United States
Alan Packer, Simons Foundation, United States
Alex Lash, Simons Foundation, United States
Olga Troyanskaya, Simons Foundation, United States

A big challenge in genomics is characterizing the genetic and functional dysregulation in complex diseases. Addressing this problem requires systematic computational approaches that can harness the explosion of data and bring ever-finer biological contexts into focus, e.g. tissue, cell-type, sex and age. Towards this goal, we recently developed a Bayesian framework that integrates thousands of gene-expression, protein-interaction and regulatory-sequence datasets to predict tissue-specific functional relationships between genes in each of 144 specific human cell-types and tissues.

Here, using autism spectrum disorder (ASD) as an example, we demonstrate how tissue-specific networks provide a valuable apparatus for generating hypotheses about the molecular basis of human diseases. ASD has a strong genetic basis that remains poorly characterized by sequencing and quantitative genetics studies. Using an evidence-weighted machine learning approach that utilizes the human brain-specific functional gene network, we generated the first genome-wide prediction of autism-associated genes. These predictions were validated using an independent large case-control sequencing study. Leveraging these genome-wide predictions and the brain-specific network, our analyses demonstrate that the large set of ASD genes, including a host of novel candidates, converges on a smaller number of key cellular pathways and specific early developmental stages of the brain.

Manifesting in early development and being five times more common among boys than among girls, ASD is also an exemplar of diseases whose incidence or severity varies dramatically across the human lifespan and between the sexes. Therefore, our next goal lies in expanding our genomics toolkit to address age- and sex-specificity in addition to tissue/cell-type-specificity. We will conclude with preliminary results that demonstrate the promise of some of our approaches towards this goal.

...............................................................................................................................

Poster: P11
Nucleotide Sequence Composition Adjacent to Intronic Splice Sites Improves Splicing Efficiency and Reduces Translation Costs in Fungi

Zohar Zafrir, Tel Aviv University, Israel
Tamir Tuller, Tel Aviv University, Israel

RNA splicing is the central process of intron removal in eukaryotes known to regulate various cellular functions. The canonical sequence elements which are essential for intron recognition are well-known. However, the role of various sequence features affecting splicing efficiency, intronic retention, and translation regulation has yet to be thoroughly studied. Focusing on four fungi as model organisms (S. cerevisiae, S. pombe, A. nidulans, and C. albicans) we performed for the first time a comprehensive, high-resolution, large-scale systems biology study, aimed at characterizing how splicing efficiency of introns and the crosstalk between gene splicing and translation are encoded in transcripts and affect their evolution. Our analysis suggests that pre-mRNA local folding strength at intronic boundaries is under selective pressure, as it directly affects splicing efficiency and improves recognition of intronic boundaries (Yofe* and Zafrir* et al., PLoS Genetics, 2014; Zafrir and Tuller, RNA, 2015). In addition, when considering the reading frame of exons upstream and adjacent to introns we find evidence of preference for intronic STOP codons close to the intronic 5’ end and that the beginnings of introns are selected for ‘codons’ with higher translation efficiency, presumably to reduce translation and metabolic costs in cases of non-spliced introns. Ribosomal profiling data analysis in S. cerevisiae supports the conjecture that in this organism intron retention frequently occurs; thus, introns are partially translated, and their translation efficiency affects organismal fitness (Zafrir and Tuller, under revision, 2015). These new discoveries are contributory steps towards a broader understanding of splicing regulation, mRNA translation, intron evolution, and the effect of silent mutations on gene expression and organismal fitness.

...............................................................................................................................

Poster: P12
Multi-omics learning and optimal experimental design for microbial organisms

Minseung Kim, University of California at Davis, United States
Navneet Rai, University of California at Davis, United States
Violeta Zorraquino, University of California at Davis, United States
Xiaokang Wang, University of California at Davis, United States
Ilias Tagkopoulos, University of California at Davis, United States

Accurate prediction of cellular and molecular state in novel environments is one of the grand challenges in modern biology. Despite the availability of omics profiles, it remains unclear how and to what degree their integration can train a predictive model, or how current datasets can guide which new conditions should be investigated. To address these challenges, we developed a framework of omics data integration, predictive modeling and optimal experimental design. We constructed a comprehensive Escherichia coli compendium specifically structured for efficient machine learning. The compendium integrates 4,389 profiles in multiple layers (transcriptome, proteome, metabolome, fluxome, and phenome) with in-depth characterization of profiling conditions by 612 features covering strain genotypes, chemical composition of the medium, stress exposures, and genetic perturbations. The compendium underwent a multi-step preprocessing procedure to correct for gene-level noise, batch effects, and platform biases. We used this resource to train a multi-scale statistical model that integrates four omics layers to predict expression levels of 4096 transcripts, 1001 proteins, 2382 metabolic fluxes and 356 metabolite concentrations, as well as growth dynamics. To guide future experimentation, we developed a methodology to identify experiments that optimally sample the experimental space and simultaneously decrease the uncertainty of the model. The proposed methodology takes into account two types of uncertainty in genome-scale prediction: the prediction interval from bootstrapped RNNs and the entropy estimated by a Gaussian process. The genetic and environmental ontology that was reconstructed from the omics data is substantially different from, and complementary to, the ontologies that are traditionally derived from genetic and chemical information.
Predictive performance (PCC) over novel conditions ranges from 0.54 to 0.87 for the various omics layers, and their integration outperformed any single layer for growth rate prediction. Growth prediction by our model was particularly effective for novel wild-type conditions (PCC=0.76). The efficacy of optimal experimental design was evaluated over 15 rounds of transcriptional profiling in novel conditions, which resulted in a substantial improvement in performance over alternative methods. The performance of genome-wide expression prediction for the condition space close to the newly profiled optimal conditions improved substantially after refinement (PCC from 0.41 to 0.61), and the gradual decrease in uncertainty of the prediction model over the course of 15 rounds was significantly greater than with alternatives (P < 0.005). This work provides an integrative framework of omics-driven predictive modeling and experimentation that can be broadly applied to guide biological discovery.
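
The uncertainty-guided selection of experiments described above can be illustrated with a minimal sketch. It covers only one of the two uncertainty types named in the abstract (the prediction interval from a bootstrapped ensemble) and leaves out the Gaussian-process entropy term. The function names, the 95% coverage, and the toy condition names are assumptions made for illustration, not the authors' implementation.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a pre-sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def interval_width(ensemble_preds, coverage=95.0):
    """Width of the central prediction interval across bootstrapped models."""
    vals = sorted(ensemble_preds)
    tail = (100.0 - coverage) / 2.0
    return percentile(vals, 100.0 - tail) - percentile(vals, tail)


def rank_candidates(candidates):
    """Order candidate conditions from most to least uncertain.

    candidates: dict mapping a (hypothetical) condition name to the list of
    predictions made for it by each bootstrapped model.
    """
    return sorted(candidates,
                  key=lambda name: interval_width(candidates[name]),
                  reverse=True)


# Toy predictions of one quantity by five bootstrapped models.
toy = {
    "condition_A": [1.00, 1.10, 0.90, 1.00, 1.05],
    "condition_B": [0.20, 0.90, 0.50, 1.40, 0.10],
}
ranking = rank_candidates(toy)
```

Under this simplification, the condition whose ensemble predictions disagree most is proposed first for profiling.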

...............................................................................................................................

Poster: P13
Comparison of Methods to Predict Impact of Regulatory Variants


Felix Yu, Johns Hopkins University, United States
Dongwon Lee, Johns Hopkins University, United States
Michael Beer, Johns Hopkins University, United States

The vast majority of sequence variants associated with common human disease are intergenic, enriched in open chromatin regions, and likely regulatory. To identify functional variants within GWAS-associated LD blocks, we have developed a sequence-based model based on our gapped k-mer SVM (gkm-SVM) (Lee et al., Nature Genetics, 2015; Ghandi et al., PLOS Comp Biol 2014). This approach uses cell-type-specific epigenetic data to train a gkm-SVM whose scoring function encodes the relative regulatory importance of individual sequence features in the disease-relevant cell type. The change in sequence feature scores induced by a regulatory variant determines its predicted impact, a score we call deltaSVM. We have shown that deltaSVM is roughly 10x more accurate at predicting dsQTLs than other methods (Kircher et al., Nat Gen 2014; Ritchie et al., Nat Meth 2014) and our previous kmer-SVM (Lee et al., Gen Res 2011). We have also used deltaSVM to predict expression changes in massively parallel reporter assays, and the predictions show good agreement with high-throughput datasets in mouse liver (Patwardhan et al., Nat Biotech 2012), K562 cells, and HepG2 cells (Kheradpour et al., Gen Res 2013). Here, we compare the accuracy of deltaSVM to other computational approaches, including PWMs, other kmer-based approaches, and deep neural networks (Zhou and Troyanskaya, Nat Meth 2015).
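
The idea behind the deltaSVM score, that a variant's impact is the change in summed sequence-feature scores between the two alleles, can be sketched as follows. This is a simplified illustration that assumes ungapped k-mers and a made-up weight table; the actual gkm-SVM uses gapped k-mers and weights learned from epigenetic data, and the function names here are hypothetical.

```python
def kmer_score(seq, weights, k):
    """Sum the weights of every k-mer in seq (unknown k-mers score 0)."""
    return sum(weights.get(seq[i:i + k], 0.0)
               for i in range(len(seq) - k + 1))


def delta_svm(seq, pos, alt_base, weights, k):
    """Score change caused by substituting alt_base at 0-based index pos.

    Only k-mers overlapping the variant can change, so both alleles are
    scored on the window of up to 2k-1 bases centered on the variant.
    """
    start = max(0, pos - k + 1)
    end = min(len(seq), pos + k)
    ref_window = seq[start:end]
    offset = pos - start
    alt_window = ref_window[:offset] + alt_base + ref_window[offset + 1:]
    return kmer_score(alt_window, weights, k) - kmer_score(ref_window, weights, k)


# Toy 3-mer weights (illustrative values only).
toy_weights = {"ACG": 1.0, "CGT": 0.5, "ATG": -2.0}
score = delta_svm("GGAACGTCC", pos=4, alt_base="T", weights=toy_weights, k=3)
```

In this toy case the reference window AACGT scores 1.5 and the alternate window AATGT scores -2.0, so the variant is predicted to reduce regulatory activity.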

...............................................................................................................................

Poster: P14
High-throughput allele-specific expression across 250 environmental conditions

Gregory Moyerbrailean, Wayne State University, United States
Chris Harvey, Wayne State University, United States
Omar Davis, Wayne State University, United States
Adnan Alazizi, Wayne State University, United States
Donovan Watza, Wayne State University, United States
Yoram Sorokin, Wayne State University, United States
Karoline Pruder, Wayne State University, United States
Nancy Hauff, Wayne State University, United States
Xiaoquan Wen, University of Michigan, United States
Roger Pique-Regi, Wayne State University, United States
Francesca Luca, Wayne State University, United States

Adaptations to local environments have played major roles in shaping allele frequency distributions in human populations. Yet, a mismatch between genotype and environment may be responsible for higher disease risk. Recent studies have shown that GxE interactions can be detected when studying molecular phenotypes that are relevant for complex traits (e.g. infection response eQTLs in immune cells). Despite these relevant examples, the extent to which the environment can modulate genetic effects on quantitative phenotypes is still to be defined. Here we have taken a high-throughput approach to achieve a comprehensive characterization of GxE interactions in humans through allele-specific expression (ASE) analysis. To this end we have investigated the transcriptional response to 50 treatments in 5 different cell types (for a total of 250 cellular environments and 3 individuals per cell type). Across 56 cellular environments (cell type/treatment with large changes in gene expression) we discovered 6073 instances of ASE (FDR<10%), corresponding to 4310 unique genes. We found that in an individual sample, on average, 0.5% of genes with heterozygous SNPs are ASE genes. We observe that the majority of ASE is consistent across conditions ("shared" ASE), confirming previous conditional eQTL analyses. Overall, we find 248 loci with evidence for GxE interaction (conditional ASE), 120 with control-only ASE and 128 with treatment-only ASE genes. We used a multinomial generalized linear model with elastic net regularization (glmnet) to assess which factors influence the likelihood of conditional ASE. This model allows us to control for factors that may influence ASE and potential confounders (e.g., gene expression, cell type, treatment). Cell type seems to be an important factor for shared ASE: melanocytes show a 30% increase in the probability of ASE, while LCLs show an 18% reduction.
When we focus on treatment-only ASE, there are significant differences across treatments, but these are largely explained by the changes in gene expression. For genes that are differentially expressed, each 2-fold increase in gene expression response corresponds to a 2.22-fold increase in the probability of treatment-only ASE. Finally, integrating our results with GWAS meta-analysis data for 18 traits revealed enrichments for genes differentially expressed in specific treatments. For example, variants associated with Crohn's disease are enriched in genes that respond to aspirin in PBMCs and HUVECs, thus identifying candidate genes for aspirin's aggravating effects on Crohn's symptoms.
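
The abstract does not say which statistical test was used to call ASE. As a baseline illustration only, allelic imbalance at a heterozygous SNP is often assessed with an exact two-sided binomial test against an expected 50:50 allele ratio. The sketch below assumes that baseline; the function name is hypothetical, and practical pipelines typically also model overdispersion and mapping bias, which are omitted here.

```python
from math import comb


def ase_binomial_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a 50:50 allele ratio.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value
    is twice the probability of the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


balanced = ase_binomial_pvalue(10, 10)    # equal allele counts
imbalanced = ase_binomial_pvalue(20, 0)   # all reads from one allele
```

The p-values from many SNPs would then need multiple-testing correction (the abstract reports an FDR threshold of 10%).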

...............................................................................................................................

Poster: P15
Evaluating Genetic Variation Impact on Transcription Factor Binding Sites


Wenqiang Shi, University of British Columbia, Canada
Oriol Fornes, University of British Columbia, Canada
Wyeth Wasserman, University of British Columbia, Canada

Current clinical sequence analysis focuses on exomes that highlight protein-coding regions, despite awareness that cis-regulatory variations can cause human genetic disorders. Genome-wide association studies have identified thousands of disease-related variations, most of which fall within cis-regulatory regions. Whole genome sequencing is now widely used in clinical genetics research, but the bioinformatics methods for the identification of functional regulatory changes are inadequate. The need to interpret and prioritize regulatory variations is becoming urgent for clinical genome analysis.

In this project, we focus on prioritizing variations likely to disrupt transcription factor binding sites (TFBS) in cis-regulatory elements. In developing the methods for cis-regulatory sequence analysis, we focus on differential transcription factor (TF) binding between two alleles distinguished by single nucleotide alterations. In ChIP-seq data, allele-specific binding (ASB) events, in which a TF selectively binds to one of two alleles at a heterozygous position, directly reveal the impact of cis-regulatory variation on TFBS within the same cellular context. We extracted ASB events from ENCODE ChIP-seq data coupled with available WGS data for the corresponding cells. This key ASB reference collection exhibits a strong relationship between the predicted strength of TF-DNA interactions (as scored with position weight matrices (PWMs)) and observed TF binding in vivo. DNase I accessibility differences between the two alleles are also strongly associated with differences in TF binding across multiple TFs and cells. In a TF-specific manner, cofactors can be quantitatively identified based on the differential overlap of cofactor ChIP-seq peaks between ASB and non-ASB events. Combining the available feature data, a classifier model trained to distinguish between ASB and non-ASB events achieves good accuracy (e.g. 78% for CTCF).
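
The PWM-based comparison of two alleles mentioned above can be sketched as a difference of log-odds scores. The matrix values, the uniform background of 0.25, and the function names are illustrative assumptions; a real analysis would use curated PWMs and would scan all windows overlapping the variant.

```python
from math import log2


def pwm_log_odds(site, pwm, background=0.25):
    """Log-odds score of a site under a PWM.

    pwm is a list with one dict per position, mapping base -> probability
    (probabilities must be non-zero, e.g. after adding pseudocounts).
    """
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM length")
    return sum(log2(pwm[i][base] / background) for i, base in enumerate(site))


def allele_score_difference(ref_site, alt_site, pwm):
    """Predicted change in binding strength: alt score minus ref score."""
    return pwm_log_odds(alt_site, pwm) - pwm_log_odds(ref_site, pwm)


# Toy 3-position PWM favoring the sequence ACG (illustrative values only).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
diff = allele_score_difference("ACG", "ATG", toy_pwm)
```

A negative difference means the alternate allele is predicted to weaken the site, which is the kind of feature a classifier of ASB versus non-ASB events could use.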

...............................................................................................................................

Poster: P16
Network model of normal gene expression predicts gene perturbation fold changes


Sudhir Varma, HiThru Analytics, United States

Gene expression exhibits a network effect whereby perturbations of some genes (in the form of siRNA knockdown or drug treatment in vitro or mutations, changes in methylation or aneuploidy in disease) influence the expression of downstream genes. Thus the downstream genes are simply responding to the dysregulation of the root-cause genes and are themselves not the source of the perturbation. If the expression (or fold change) of a gene is explained by the expression (or change in expression) of other genes, it becomes less likely to be the source of the perturbation. Conversely, genes with a large positive or negative difference between the predicted and actual expression show evidence of being the main drivers in the experimental or disease condition.

Using a set of 4277 normal samples from various organs compiled from public datasets, we have built a network where the expression of each gene (target gene) is modeled as a linear combination of a small number of other genes (source genes). A single model for each gene was fitted to samples from all organs. For each organ we tested the fit of the model using the correlation between the predicted and actual values. We used the network model to predict the fold changes using a set of 658 siRNA knockdown samples.

The network model predicts the expression of a median of 42% (20%-71%) of variable-expression genes (log expression range>0.5) in all of the organs with a correlation>0.80. On the siRNA knockdown samples, the model predicted the resulting fold changes of all genes with a median correlation of 0.31 across the samples (0.13-0.68).

We demonstrate that a single linear regression model (per gene) is sufficient to predict the expression of most genes for multiple organs. The relationships between source genes and target genes define a network that is capable of quantitatively predicting the downstream effects of a perturbation. Conversely, the difference between the predicted and true expression in a disease sample points to possible root causes of the disease.
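
The per-gene model can be sketched as an ordinary least-squares fit of a target gene on a few source genes, followed by a residual check on a new sample. This is a minimal illustration only: the abstract does not state how source genes were selected or how the models were fitted, and the function names and toy data below are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_target_gene(source_expr, target_expr):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_p]."""
    rows = [[1.0] + list(sample) for sample in source_expr]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target_expr)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, sources):
    """Predicted target expression from source-gene expression."""
    return coefs[0] + sum(c * s for c, s in zip(coefs[1:], sources))


def residual(coefs, sources, observed):
    """Observed minus predicted; large values flag candidate driver genes."""
    return observed - predict(coefs, sources)


# Toy "normal" samples: two source genes, target = 0.5 + 2*s1 - 1*s2.
normal_sources = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
normal_target = [0.5, 3.5, 2.5, 5.5, 5.5]
coefs = fit_target_gene(normal_sources, normal_target)
gap = residual(coefs, [2, 2], observed=6.5)
```

Here the perturbed sample is predicted at 2.5 but observed at 6.5, so the target gene is not explained by its source genes and would be a candidate source of the perturbation.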

We have implemented a web tool for exploring the network predictions on a variety of disease samples (www.explainbio.com).

...............................................................................................................................

Poster: P17
Identifying condition specific transcription factor binding with ATAC-seq


Roger Pique-Regi, Wayne State University, United States
Donovan Watza, Wayne State University, United States
Molly Estill, Wayne State University, United States
Sophia Chaudhry, Wayne State University, United States
Francesca Luca, Wayne State University, United States

Specific regulatory sequences control the gene transcription response when a cell is exposed to changes in the cellular environment (e.g. drug treatment). Recent technical advances in functional genomics have facilitated the profiling of regulatory sequences across many cell types and tissues, yet we are still very far from mapping the sequences that control the cellular transcriptional response to many external stimuli. Profiling the binding activity of transcription factors (TFs) across different environmental conditions can be accomplished quickly at a genome-wide scale with the recently developed technique ATAC-seq, which uses the Tn5 transposase to fragment and tag accessible DNA. When ATAC-seq is coupled with a computational method such as CENTIPEDE, footprint models for TFs with known motifs can be generated across the genome to detect binding. To date, there are no methods that efficiently incorporate the information provided by paired-end sequencing, which allows identification of both the library fragment length and the two cleavage locations that generated the fragment. We have extended CENTIPEDE to utilize fragment length information to exploit the joint statistics of cleavage pairs. Our results indicate that paired-end sequencing provides a more informative footprint model for ATAC-seq libraries, which leads to greater accuracy in predicting TF binding. These results were validated with ChIP-seq data (ENCODE Project) for multiple factors including CTCF, NRSF, NRF-1, and NFkB. We then assayed TF activity in lymphoblastoid cell lines (LCLs) across multiple treatments (selenium, copper, retinoic acid and glucocorticoids) for which we previously determined significant differences in gene expression levels. From our initial sequencing results we were able to resolve 383 actively bound motifs across all conditions. We were also able to characterize 5236 regions that have significantly changed chromatin accessibility (FDR < 10%) in response to both copper and selenium.
We have extended the hierarchical prior of the CENTIPEDE model to detect motifs that differ in footprint activity between treatment and control experiments. For both metal ions we have detected a significant increase in binding for ETS and CRE motifs. Our results demonstrate that ATAC-seq together with an improved footprint model is an excellent tool for rapid profiling of transcription factor binding activity to study the cellular regulatory response to the environment.
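
The information that a read pair contributes (fragment length plus two cleavage locations) can be sketched as follows. The coordinate convention (0-based start of the forward-strand read, exclusive end of the reverse-strand read) and all names are assumptions for illustration; any Tn5 offset correction and the CENTIPEDE model itself are not shown.

```python
from collections import namedtuple

# One sequenced fragment: positions of its two cut sites and its length.
Fragment = namedtuple("Fragment", ["left_cut", "right_cut", "length"])


def fragment_from_pair(fwd_start, rev_end):
    """Derive the fragment from a properly paired alignment.

    fwd_start: 0-based leftmost aligned base of the forward-strand read.
    rev_end:   0-based exclusive end of the reverse-strand read.
    """
    if rev_end <= fwd_start:
        raise ValueError("reverse read must end to the right of forward start")
    return Fragment(left_cut=fwd_start,
                    right_cut=rev_end - 1,
                    length=rev_end - fwd_start)


def cuts_relative_to_motif(fragment, motif_center):
    """Offsets of both cut sites from a motif center.

    The resulting pair is the joint observation that a paired-end
    footprint model can use instead of two independent cut counts.
    """
    return (fragment.left_cut - motif_center,
            fragment.right_cut - motif_center)


frag = fragment_from_pair(fwd_start=100, rev_end=180)
offsets = cuts_relative_to_motif(frag, motif_center=140)
```

Single-end data would supply only one of the two offsets and no fragment length, which is the information gap the extended model is described as addressing.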

...............................................................................................................................

Poster: P18
An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes


Bingqiang Liu, Shandong University, China
Qin Ma, South Dakota State University, United States

Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs (motifs for short) in orthologous regulatory regions of query genes from multiple genomes, based on the view that motifs tend to evolve more slowly than their surrounding non-functional sequences. However, the real power of this strategy is yet to be fully realized, as obstacles remain in optimizing the selection of orthologous data and in effectively reducing false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework, named MP3, for prokaryotic genomes, based on a new orthologous data preparation procedure and a novel promoter scoring and pruning method, in support of accurate motif predictions. Specifically, we collected orthologous data broadly from all prokaryotic genera and built the orthologous regulatory regions based on sequence similarity of promoter regions; this not only makes full use of large-scale genomic data and taxonomy information but also filters out promoters with limited contribution, and thus produces a high-quality reference set. The promoter scoring and pruning is implemented through motif voting by a set of complementary prediction tools, which mines as many motif candidates as possible while eliminating the effect of random noise. We applied the framework to the Escherichia coli K-12 genome and evaluated its prediction performance by comparison with seven existing programs. This evaluation was carried out systematically at both the nucleotide level and the binding-site level by adopting the benchmark method proposed by Tompa, along with additional statistical measurements. The results showed that MP3 performs better, with 98% and 88% improvements in Performance Coefficient and Correlation Coefficient, respectively, at the nucleotide level over MDscan, the best of the other tools.
At the binding-site level, MP3 outperforms MDscan by 60% in F-Score and 46% in Average Site Performance. Most importantly, we have integrated this phylogenetic footprinting framework into our motif identification and analysis server DMINDA, through which users can efficiently identify and analyze motifs for any prokaryotic genes.
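
The evaluation measures named above can be computed from confusion counts as sketched below, following the usual definitions in the Tompa benchmark (performance coefficient, correlation coefficient, average site performance) and taking the F-score as the harmonic mean of site-level sensitivity and positive predictive value. The function names and the toy counts are illustrative assumptions, not the study's data.

```python
from math import sqrt


def nucleotide_metrics(tp, fp, fn, tn):
    """Nucleotide-level performance and correlation coefficients."""
    overlap = tp + fn + fp
    pc = tp / overlap if overlap else 0.0
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return pc, cc


def site_metrics(tp, fp, fn):
    """Binding-site-level sensitivity, PPV, ASP, and F-score."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    asp = (sn + ppv) / 2.0
    f_score = 2.0 * sn * ppv / (sn + ppv) if (sn + ppv) else 0.0
    return {"sensitivity": sn, "ppv": ppv, "asp": asp, "f_score": f_score}


# Toy counts (illustrative only).
pc, cc = nucleotide_metrics(tp=50, fp=25, fn=25, tn=900)
sites = site_metrics(tp=8, fp=2, fn=2)
```

A relative improvement such as those quoted above would then be the ratio of one tool's metric to another's, minus one.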

...............................................................................................................................

Poster: P19
The CoGAPS matrix factorization algorithm infers feedback mechanisms from therapeutic inhibition of EGFR that increases expression of growth factor receptors

Elana J. Fertig, Johns Hopkins University, United States
Hiroyuki Ozawa, Department of Otorhinolaryngology-Head and Neck Surgery, Keio University School of Medicine, Japan
Manjusha Thakar, Johns Hopkins University, United States
Jason Howard, Johns Hopkins University, United States
Gabriel Krigsfield, Johns Hopkins University, United States
Alexander V. Favorov, Johns Hopkins University, United States
Daria A. Gaykalova, Johns Hopkins University, United States
Michael F. Ochs, The College of New Jersey, United States
Christine H. Chung, Moffitt Cancer Center & Research Institute, United States

Next generation sequencing technologies open a door to precise, personalized medicine. Thus, patients with oncogene-driven tumors are currently treated with targeted therapeutics such as EGFR inhibitors. However, drug interactions with other activated signaling pathways in treated tumors often alter the predicted therapeutic response. Therefore, bioinformatics algorithms are needed to infer unanticipated molecular interactions from the anticipated molecular response to targeted therapeutics in diverse genetic backgrounds. To model heterogeneous genetic backgrounds in HNSCC, we use HaCaT cells with forced overexpression of EGFR, HRAS, and PIK3CA. Previously, the CoGAPS matrix factorization algorithm was shown to infer the specific signaling pathways that were activated in these HaCaT knock-in constructs from gene expression data. In this study, we evaluated whether CoGAPS could also delineate unanticipated signaling changes from the anticipated cellular signaling response caused by targeted therapeutics in diverse genetic backgrounds. To test this hypothesis, we measured gene expression after treating the modified HaCaT cells with three EGFR-targeted agents (gefitinib, cetuximab and afatinib) for 24 hours. The CoGAPS matrix factorization algorithm distinguished a gene expression signature associated with the anticipated silencing of the EGFR network and a signature associated with unanticipated transcriptional feedback in HaCaT constructs that were sensitive to EGFR inhibitors. Notably, the feedback signature showed that EGFR gene expression itself increased in cells that were responsive to EGFR inhibitors. The CoGAPS algorithm further associated such feedback with increased expression of several growth factor receptors by the AP-2 family of transcription factors. Once transcribed, these growth factor receptors may ultimately compensate for EGFR inhibition in these sensitive cells.
Our data suggest that CoGAPS gene expression signatures delineate on- and off-target effects of drugs related to therapeutic sensitivity in diverse genetic backgrounds.

...............................................................................................................................

Poster: P20
A network approach to monitor progression of treatment in tuberculosis


Awanti Sambarey, Indian Institute of Science, India
Abhinandan Devaprasad, Indian Institute of Science, India
Nagasuma Chandra, Indian Institute of Science, India

Tuberculosis remains one of the leading causes of mortality due to an infectious agent, affecting ~9 million people each year and resulting in about 1.5 million deaths annually. Delayed diagnosis, presence of co-morbidities and emerging drug resistance further compound the problem and underscore the need to gain mechanistic insights into the host response to infection and treatment. Unraveling correlates of successful host response and outcome of therapy are essential for the development of improved therapeutic strategies. The host response to tuberculosis is multifaceted, complex and dynamic, involving intricate cross-talk among several processes occurring simultaneously across different host compartments, and it becomes important to address the underlying complexity in these interdependent molecular networks in order to elucidate the relationship between molecular origins of disease and the manifested phenotype, thereby necessitating a systems approach.

Despite the increased availability of host genome-scale omics data in infection and over the course of treatment in tuberculosis, the molecular end points of therapy have not been clearly elucidated. While transcriptomics data illuminate the differential regulation of genes in individual patients, the causes and consequences of such differential regulation are still not understood. In this study, we have used network-based approaches, together with gene expression data, to capture inter- and intra-cellular communication in the host and identify markers that can predict treatment prognosis. We first construct a comprehensive genome-scale network of host processes comprising 11,017 proteins and 151,645 interactions. We then integrate genome-wide gene expression data to generate dynamic response networks that monitor the progress of therapy in tuberculosis over multiple weeks. Through weighted shortest path analysis we have identified molecular processes that are differentially regulated over the course of treatment, highlighting the importance of host signaling processes and lipid metabolism in governing the outcome of therapy. By shifting the focus from individual genes to pathway-based analysis, network-based studies help illuminate the effects of local changes on the global system, and can aid in modifying therapeutic design for effective tuberculosis treatment.
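
The weighted shortest path analysis can be illustrated with Dijkstra's algorithm on a small directed network. How edge weights were derived from expression data is not described in the abstract, so the sketch takes non-negative weights as given; the node names P1 to P4 and the weights are placeholders, not real proteins or measurements.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a weighted directed graph.

    graph: dict mapping node -> dict of neighbor -> non-negative weight.
    Returns (cost, path), or (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Toy response network; lower weight stands for a more active interaction.
toy_network = {
    "P1": {"P2": 1.0, "P3": 4.0},
    "P2": {"P3": 1.5, "P4": 5.0},
    "P3": {"P4": 1.0},
    "P4": {},
}
cost, path = shortest_path(toy_network, "P1", "P4")
```

Rebuilding the weights at each treatment time point and comparing the resulting paths is one way such an analysis could track which processes change over the course of therapy.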

