Late Breaking Research Presentation Schedule

LBR01 Sunday, July 17: 10:45 a.m. - 11:10 a.m.
Potential function of proteins encoded by chimeric transcripts
Room: Hall F1
Presenting author: Milana Frenkel-Morgenstern, Spanish National Cancer Research Centre (CNIO), Spain
Additional authors: Milana Frenkel-Morgenstern, Spanish National Cancer Research Centre (CNIO), Spain; Iakes Ezkurdia, Spanish National Cancer Research Centre (CNIO), Spain; Michael Tress, Spanish National Cancer Research Centre (CNIO), Spain; Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Spain
Area Session Chair: Curtis Huttenhower
Abstract: Chimeric RNAs are produced by trans-splicing of independent RNA transcripts. These RNAs have been reported in many organisms and have been identified in analyses of EST collections and of high-throughput paired-end RNA-seq data. However, the extent to which the resulting chimeric RNAs have been selected during evolution to give rise to new functional proteins remains unclear. An extensive review of the composition of the proteins translated from such chimeric transcripts in human, mouse and fruit fly highlighted a mixture of complete and missing functional domains from the parent proteins that make up each chimera. In many cases, such chimeric proteins contain transmembrane domains and signal peptides that can potentially change their cellular location, altering the corresponding cellular processes. By reviewing the protein domains preserved in these chimeras, one can hypothesize that those resulting from trans-splicing may produce proteins with novel functions that, in a relevant number of cases, can be associated with dominant negative phenotypes. Indeed, we report intriguing examples of potential chimeric proteins whose existence was confirmed in mass-spectrometry experiments in human, mouse and fly.
The exploration of a number of chimeras encoded by complementary DNA strands in humans provides additional evidence of the potentially complex molecular isoforms that can be generated by alternative transcription.

LBR02 Sunday, July 17: 11:15 a.m. - 11:40 a.m.
Identification, classification and functional role of viral protein kinases
Room: Hall F1
Presenting author: Nidhi Tyagi, Indian Institute of Science, India
Additional authors: Nidhi Tyagi, Indian Institute of Science, India; Narayanaswamy Srinivasan, Indian Institute of Science, India
Area Session Chair: Curtis Huttenhower
Abstract: Protein kinases encoded by viral genomes play a major role in the infection, replication and survival of viruses. Our study focuses on the identification, classification and recognition of the functional role(s) of various protein kinases from viruses. Protein kinases were recognized using traditional sequence homology detection tools, sequence alignment methods and a phylogenetic approach. A total of 982,857 protein sequences from 53,315 viral genomes (including strains) were used in this analysis. Protein kinases were identified using a combination of profile-based search methods, such as PSI-BLAST and RPS-BLAST, and HMMER. We identified 470 protein kinase domains in 466 proteins from 241 viral genomes and classified these viral protein kinases into 30 subfamilies. Remarkably, more than 50% of the subfamilies are reported as virus-specific, since no homologous sequences from eukaryotic or prokaryotic sources could be identified. We also hint at their potential biological roles through functional annotation.
Surprisingly, some of the most populated subfamilies share homology with well-characterized eukaryotic protein kinase groups such as tyrosine kinases, tyrosine kinase-like kinases, casein kinase 1, CAMK (Ca2+/calmodulin-dependent protein kinases) and CMGC (cyclin-dependent kinases (CDKs), mitogen-activated protein kinases, glycogen synthase kinases and CDK-like kinases). Phylogenetic analysis of viral kinases together with members of various eukaryotic protein kinase groups suggests close homologous relationships among them, indicating horizontal gene transfer. Such viral kinases represent the mutated form of proteins from carcinogenic cells. Some viral kinase homologues participate in metabolic pathways, raising concerns about host-pathogen interaction (host protein mimicry).

LBR03 Sunday, July 17: 11:45 a.m. - 12:10 p.m.
A structural and dynamical model of human telomerase
Room: Hall F1
Presenting author: Samuel Flores, Uppsala University, Sweden
Area Session Chair: Curtis Huttenhower
Abstract: Mutations in the telomerase complex disrupt either nucleic acid binding or catalysis, and are the cause of numerous human diseases. Despite its importance, the structure of the human telomerase complex has not been determined crystallographically, nor are its dynamics understood in detail. Fragments of this complex from Tetrahymena thermophila and, more controversially, Tribolium castaneum have been crystallized. Biochemical probes provide important insight into its dynamics. In this work we use the available structural fragments to build a homology model of human TERT, and validate the result with functional assays. We then generate a trajectory of telomere elongation following a “typewriter” mechanism: the RNA template moves to keep the end of the growing telomere in the active site, disengaging after every 6-residue extension to execute a “carriage return” and go back to its starting position. A hairpin can easily form in the telomere from DNA residues leaving the telomere-template duplex. The trajectory is consistent with available experimental evidence and suggests focused biochemical experiments for further validation.

LBR04 Sunday, July 17: 4:00 p.m. - 4:25 p.m.
The Multiple Specificity Landscape of Peptide Recognition Domains
Room: Hall F1
Presenting author: David Gfeller, Swiss Institute of Bioinformatics, Switzerland
Additional authors: David Gfeller, Swiss Institute of Bioinformatics, Switzerland; Frank Butty, University of Toronto, Canada; Marta Wierzbicka, University of Toronto, Canada; Erik Verschueren, CRG-Centre de Regulacio Genomica, Spain; Peter Vanhee, CRG-Centre de Regulacio Genomica, Spain; Haiming Huang, University of Toronto, Canada; Andreas Ernst, University of Toronto, Canada; Nisa Dar, University of Toronto, Canada; Igor Stagljar, University of Toronto, Canada; Luis Serrano, CRG-Centre de Regulacio Genomica, Spain; Sachdev S Sidhu, University of Toronto, Canada; Gary D Bader, University of Toronto, Canada; Philip M Kim, University of Toronto, Canada
Area Session Chair: Curtis Huttenhower
Abstract: Modular protein interaction domains form the building blocks of eukaryotic signaling pathways. Many of them, known as peptide recognition domains, mediate protein interactions by recognizing short, linear amino acid stretches on the surface of their cognate partners with high specificity. Residues in these stretches are usually assumed to contribute independently to binding, which has led to a simplified understanding of protein interactions. In contrast, we observe in large binding peptide datasets that different residue positions display highly significant correlations for many domains in three distinct families (PDZ, SH3 and WW). These correlation patterns reveal a widespread occurrence of multiple binding specificity and give novel structural insights into protein interactions. We show that multiple specificity more accurately predicts protein interactions and experimentally validate some of the predictions for the human proteins DLG1 and SCRIB. Overall, our results reveal a rich specificity landscape in peptide recognition domains, suggesting new ways of encoding specificity in protein interaction networks.
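The abstract does not state which statistic was used to detect correlated residue positions. As a minimal sketch, assuming the binding peptides are already aligned to equal length, mutual information between two positions is one standard way to quantify such covariation; the function name and toy peptides below are hypothetical. A value near 0 bits is what independent contributions would give, while larger values hint at coupled positions of the kind that point to multiple specificity.

```python
from collections import Counter
from math import log2


def positional_mutual_information(peptides, i, j):
    """Mutual information (bits) between residue positions i and j.

    peptides: equal-length, aligned binding peptides (one string each).
    """
    n = len(peptides)
    count_i = Counter(p[i] for p in peptides)
    count_j = Counter(p[j] for p in peptides)
    count_ij = Counter((p[i], p[j]) for p in peptides)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * log2(c * n / (count_i[a] * count_j[b]))
    return mi


# Hypothetical peptides: position 0 (R/K) is perfectly coupled to position 2 (E/D).
coupled = positional_mutual_information(["RAE", "KAD", "RAE", "KAD"], 0, 2)
```

In practice, significance would have to be judged against a null distribution (for example, by shuffling one column relative to the other), because mutual information is biased upward on small peptide sets.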

LBR05 Sunday, July 17: 2:30 p.m. - 2:55 p.m.
Investigating the Evolution of Novel Enzyme Function and Chemistry Within Structurally Defined Protein Superfamilies
Room: Hall F1
Presenting author: Nicholas Furnham, European Bioinformatics Institute, United Kingdom
Additional authors: Nicholas Furnham, European Bioinformatics Institute, United Kingdom; Gemma Holliday, European Bioinformatics Institute, United Kingdom; Ian Sillitoe, University College London, United Kingdom; Alison Cuff, University College London, United Kingdom; Christine Orengo, University College London, United Kingdom; Janet Thornton, European Bioinformatics Institute, United Kingdom
Area Session Chair: Curtis Huttenhower
Abstract: A significant proportion of gene products are annotated as having enzymatic functions, which, as biological catalysts, are essential for life. In addition, many pharmaceutical drugs act by modifying the behavior of enzymes. Thus, an understanding of how enzymes have evolved to perform the wide variety of reactions they catalyze is essential to many studies in biology and medicinal chemistry. Unraveling this problem requires combining three-dimensional protein structure, sequence, phylogenetic and chemical information. We have combined these data in an automatic pipeline for investigating the evolution of enzyme function within structurally defined protein superfamilies. This has allowed us to analyze all enzymatic superfamilies catalogued by the CATH database. In addition to showing relationships between structures and sequences through phylogeny, we are able to show relationships between the small-molecule metabolites the enzymes act on, as well as similarities between the mechanisms by which enzymes perform their overall reactions.
This allows us to demonstrate the power of combining this range of information to reveal features shared across multiple superfamilies as well as qualities unique to specific enzyme superfamilies, providing a means to improve function prediction and to contribute to the design of novel enzyme functions.

LBR06 Sunday, July 17: 3:00 p.m. - 3:25 p.m.
Quantitative analysis of cellular composition in primary breast tumours deconvolutes molecular signatures and elucidates impact of tumour microenvironment
Room: Hall F1
Presenting author: Yinyin Yuan, Cambridge Research Institute, United Kingdom
Additional authors: Yinyin Yuan, Cancer Research UK, United Kingdom
Area Session Chair: Curtis Huttenhower
Abstract: A major obstacle in refining molecular cancer signatures is the heterogeneity of cellular composition in most tumour samples, each of which is a mixture of cancer, stromal and immune cells. These cells form an integral part of the tumour microenvironment, but their contributions may obscure the “pure” cancer signal in genomic and transcriptomic data. We show how to extract cellular composition using an automated image analysis system and integrate it with DNA copy-number and expression data. Quantifying the percentage of tumour cells improves the signal-to-noise ratio of copy-number profiles. We have applied our approach to 290 images of frozen breast tumour samples for which we have also obtained copy-number and gene-expression profiles. Our quantitative results are highly correlated with the qualitative assessments made by human experts on the same samples. In addition, we show that the location of cancer cells allows tumours of different complexity to be distinguished using spatial statistics. Quantifying the density of lymphocytes also distinguishes cases with an active immune system from cases with a less active immune system.
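The abstract does not spell out how the estimated percentage of tumour cells is used to clean up copy-number profiles. A common simplification, assumed here, is a linear two-component mixture in which every non-tumour (stromal or immune) cell is diploid and the tumour cells are clonal, so the observed copy ratio is R = p * C / 2 + (1 - p) for tumour fraction p and tumour copy number C. The helper below simply inverts that relation; its name is hypothetical.

```python
from math import log2


def purity_corrected_copy_number(log2_ratio, purity):
    """Tumour-cell copy number implied by an observed log2 copy ratio.

    Assumes a two-component mixture: a fraction `purity` of clonal tumour
    cells with copy number C and a diploid non-tumour background, so the
    observed ratio is R = purity * C / 2 + (1 - purity).
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    ratio = 2.0 ** log2_ratio
    return 2.0 * (ratio - (1.0 - purity)) / purity


naive = 2.0 * 2.0 ** log2(1.5)  # ignoring purity: about 3 copies
corrected = purity_corrected_copy_number(log2(1.5), 0.5)  # about 4 copies
```

For instance, a segment with an observed log2 ratio of log2(1.5), about 0.585, in a sample estimated to be 50% tumour cells corresponds to roughly 4 copies in the tumour cells, whereas reading the ratio naively would suggest 3.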
We show that lymphocyte density is predictive of prognosis in the ER-negative subtype, which includes both an aggressive form of breast cancer and an often over-treated patient subgroup.

LBR07 Sunday, July 17: 3:30 p.m. - 3:55 p.m.
Modulators of microRNA Activity Regulate Glioblastoma Pathogenesis
Room: Hall F1
Presenting author: Pavel Sumazin, Columbia University, United States
Additional authors: Xuerui Yang, Columbia University, United States
Area Session Chair: Curtis Huttenhower
Abstract: MicroRNAs (miRs) have been shown to drive pathogenesis and to affect prognosis in cancer, including glioblastoma, the most common and most aggressive type of primary human brain tumor. MiR activity is modulated by miR-target abundance and by post-transcriptional factors that regulate miRISC-mediated mRNA degradation. MiR-activity modulators are potential key regulators of cancer, but the extent to which they regulate cancer remains unknown. We developed an algorithm to help assemble the repertoire of tumor-specific miR-activity modulators as a step towards elucidating their regulatory effect on cancer pathogenesis. We identified over a hundred glioblastoma miR-activity modulators, which either activate or suppress miRISC-mediated regulation through protein interactions or act as miR decoys by titrating miRs away from other mRNAs. We identified dozens of oncogenes and tumor suppressors that are regulated by miR-activity modulators in glioblastoma, including the master regulators RUNX1 and PTEN. We experimentally validated WIPF2 as a miRISC-mediated modulator of RUNX1, and PALB2 and WNT7A as miRISC-mediated modulators of PTEN. Moreover, a number of genes, including RUNX1, were identified as decoy modulators of PTEN. Interestingly, PTEN and its 7 modulators bear genetic alterations in 60% of 462 GBM samples.
Both modulator silencing and PTEN 3’UTR transfection confirmed these 7 modulators as decoy modulators of PTEN, suggesting that genetic alterations at these genes post-transcriptionally regulate PTEN. In summary, our results suggest that miR modulators post-transcriptionally regulate the expression of master regulators of glioblastoma and thus play a significant role in its tumorigenesis and progression.

LBR08 Sunday, July 17: 4:00 p.m. - 4:25 p.m.
Post-transcriptional regulators of microRNA biogenesis regulate pathogenesis of cancer
Room: Hall F1
Presenting author: Wei-Jen Chung, Columbia University, United States
Additional authors: Pavel Sumazin, Columbia University, United States
Area Session Chair: Curtis Huttenhower
Abstract: MicroRNAs (miRs) have emerged as key regulators of both normal and pathologic phenotypes, including cancer, but the fine-grained regulation of their biogenesis is still poorly understood. To understand the extent and specificity of miR-biogenesis control, as well as the role of miR-biogenesis regulators in tumorigenesis and cancer progression, we set out to identify these regulators and profile their targets. We developed an algorithm for genome-wide inference of miR-biogenesis regulators and identified regulators that are specific to individual miRs or miR families in glioma and ovarian cancer. Our algorithm identified known biogenesis regulators, including DGCR8, HNRNPA1, DDX5, LIN28 and SMAD family proteins, and predicted new miR-biogenesis regulators that are common to both cancers or specific to one tumor type. We validated miR-biogenesis regulators that are common to both cancers and target tumor- and prognosis-specific miRs, including the oncomirs miR-218, miR-23b and miR-155. We showed that miR-biogenesis regulators can act before or after cropping by DROSHA, and that they can alter the expression of large sets of microRNAs.
Our results suggest that miR biogenesis is a complex, context-specific and finely regulated process, and that miR-biogenesis regulators may influence tumor initiation and progression by altering the expression of specific tumor-suppressor miRs and oncomirs or by modifying large miR programs.

LBR09 Monday, July 18: 10:45 a.m. - 11:10 a.m.
SpliceGrapher: Predicting Splice Graphs from Diverse Evidence
Room: Hall F1
Presenting author: Asa Ben-Hur, Colorado State University, United States
Additional authors: Asa Ben-Hur, Colorado State University, United States; Mark Rogers, Colorado State University, United States; Anireddy S.N. Reddy, Colorado State University, United States
Area Session Chair: Predrag Radivojac
Abstract: Deep transcriptome sequencing with next-generation sequencing (NGS) provides unprecedented opportunities for researchers to assess the extent of alternative splicing in many species. Although whole-transcriptome data are inexpensive and easy to obtain in this manner, one limitation has been the lack of versatile methods to analyze the data. We present a new method, SpliceGrapher, which is designed to enhance existing gene annotations on the basis of NGS reads and EST alignments. SpliceGrapher predicts splice graphs, which are compact representations of all the ways in which a gene's exons may be assembled. We demonstrate our approach using NGS read data from Arabidopsis and grape, and find that SpliceGrapher's predictions are better aligned with the existing annotations than those of other tools.

LBR10 Monday, July 18: 11:15 a.m. - 11:40 a.m.
A Comprehensive Computational Model for Analyzing Gene Translation
Room: Hall F1
Presenting author: Tamir Tuller, Weizmann Institute of Science, Israel
Area Session Chair: Predrag Radivojac
Abstract: We describe a model of gene translation that is based on all the physical and dynamical aspects of this process.
The Whole-cell Simulation Translation Model (WSTM) predicts fundamental features of the translation process, including translation rates, protein abundance levels, ribosomal densities and the relations among these variables, better than alternative ('non-physical') approaches. In addition, we show that the WSTM can be used to accurately infer variables that could not be inferred by previous predictors, such as genes’ initiation rates and the cost of translation. We find that increasing the number of available ribosomes (or, equivalently, the initiation rate) increases the genomic translation rate and the mean ribosome density only up to a certain point, beyond which both saturate. Strikingly, assuming that the translation system is tuned to work at the pre-saturation point leads to the best predictions of experimental data. This result may suggest that, in all the organisms analyzed, ribosome allocation is optimized to the pre-saturation point. Remarkably, the performance gap between the WSTM and the alternative predictors is especially large for heterologous genes, pointing to the model’s promising biotechnological value in predicting the abundance of heterologous proteins before expressing them in the desired host. We find that different local features of a gene's coding sequence (codon bias, amino acid charge and mRNA folding energy) affect its translation elongation (partial correlations with ribosomal density of -0.47, -0.22 and 0.31, respectively; all p-values < 0.0015).

LBR11 Monday, July 18: 11:45 a.m. - 12:10 p.m.
Differential expression with RNA-seq: a matter of depth
Room: Hall F1
Presenting author: Ana Conesa, CIPF, Spain
Additional authors: Ana Conesa, CIPF, Spain
Area Session Chair: Predrag Radivojac
Abstract: We investigate the relationship between sequencing depth and differential expression detection in RNA-seq experiments.
We show that the differential expression calls of current statistical approaches depend strongly on the number of available reads, and that this leads to a high number of false discoveries, especially for short genes with small fold-change differences, including off-target non-coding genes. We present a novel methodology, NOISeq, which is robust to this kind of bias. By adopting an empirical approach to modeling the null distribution of differential expression, NOISeq better captures the shape of noise in RNA-seq data and obtains more accurate and consistent results.

LBR12 Monday, July 18: 12:15 p.m. - 12:40 p.m.
Inferring Gene Regulatory Networks from Expression Data using Tree-based Methods
Room: Hall F1
Presenting author: Van Anh Huynh-Thu, University of Liege, Belgium
Additional authors: Van Anh Huynh-Thu, University of Liege, Belgium
Area Session Chair: Predrag Radivojac
Abstract: One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high-throughput genomic data, in particular microarray gene expression data. In this work, we present GENIE3, an original algorithm for the inference of GRNs. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each regression problem, the expression pattern of one gene (the target gene) is predicted from the expression patterns of all the other genes (the input genes) using tree-based ensemble methods. The importance of an input gene in predicting the target gene's expression pattern is taken as an indication of a putative regulatory link. The whole network is then reconstructed by aggregating putative links over all genes.
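The per-gene regression decomposition described above can be sketched in a few dozen lines. This is a dependency-free approximation, not the GENIE3 implementation: instead of full Random Forest or Extra-Trees models it uses bagged depth-1 regression trees (stumps), each choosing its split from a random subset of input genes, and it scores an input gene by the total reduction in squared error its splits achieve. Normalizing each target's scores to sum to 1 is a simplification assumed here to make links comparable across targets; all function names are hypothetical.

```python
import random


def best_stump(X, y, features):
    """Best single split (depth-1 regression tree) over candidate features.

    Returns (feature, reduction in sum of squared errors); feature is None
    if no split is possible.
    """
    n = len(y)
    total_sum, total_sq = sum(y), sum(v * v for v in y)
    total_sse = total_sq - total_sum ** 2 / n
    best_feature, best_gain = None, 0.0
    for f in features:
        pairs = sorted(zip((row[f] for row in X), y))
        left_sum = left_sq = 0.0
        for k in range(1, n):
            v = pairs[k - 1][1]
            left_sum += v
            left_sq += v * v
            if pairs[k - 1][0] == pairs[k][0]:
                continue  # no threshold separates equal values
            right_sum, right_sq = total_sum - left_sum, total_sq - left_sq
            sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
            if total_sse - sse > best_gain:
                best_feature, best_gain = f, total_sse - sse
    return best_feature, best_gain


def genie3_like(expr, n_trees=200, seed=0):
    """Rank putative regulator -> target links from an expression matrix.

    expr: list of samples, each a list of p gene expression values.
    Returns (regulator, target, score) tuples sorted by decreasing score.
    """
    rng = random.Random(seed)
    n, p = len(expr), len(expr[0])
    links = []
    for target in range(p):
        inputs = [g for g in range(p) if g != target]
        m = max(1, int(len(inputs) ** 0.5))  # input genes tried per tree
        importance = dict.fromkeys(inputs, 0.0)
        for _ in range(n_trees):
            rows = [expr[rng.randrange(n)] for _ in range(n)]  # bootstrap sample
            y = [row[target] for row in rows]
            f, gain = best_stump(rows, y, rng.sample(inputs, m))
            if f is not None:
                importance[f] += gain
        total = sum(importance.values()) or 1.0
        links.extend((g, target, importance[g] / total) for g in inputs)
    return sorted(links, key=lambda link: link[2], reverse=True)
```

Reading the network off the result is then a matter of keeping the top of the ranked list, for example `genie3_like(expr)[:10]` for the ten highest-scoring putative regulator-target links.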
Our method was evaluated in the context of the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge, an annual international competition that evaluates GRN inference algorithms on benchmarks of simulated and real data. GENIE3 was the best performer in the DREAM4 In Silico Multifactorial challenge in 2009 and in the DREAM5 Network Inference challenge in 2010. In addition, the method is simple and generic, making it adaptable to other types of genomic data and interactions.

LBR13 Monday, July 18: 2:30 p.m. - 2:55 p.m.
Age matters: Functional annotation tailored to human development improves our understanding of fetal transcriptional profiles
Room: Hall F1
Presenting author: Donna Slonim, Tufts University, United States
Area Session Chair: Predrag Radivojac
Abstract: Recent successes in detecting fetal RNA in amniotic fluid and in profiling both fetal DNA and RNA from maternal blood suggest novel, comprehensive and minimally invasive approaches to prenatal diagnosis and treatment. However, our ability to interpret transcriptional profiles meaningfully depends increasingly on prior knowledge of functionally related gene sets relevant to the subject of the study. We established the DFLAT (Developmental FunctionaL Annotation at Tufts) project to identify developmentally relevant gene sets that may help us better interpret transcriptional profiles of the living human fetus. To assess the novel contributions of the current DFLAT collection of gene sets, we re-analyzed three of our previously published data sets: one that identifies fetal RNA in blood from pregnant mothers carrying healthy, full-term fetuses, and two comparing amniotic fluid from pregnancies with common autosomal aneuploidies (Trisomy 21 and Trisomy 18) to controls. In all three cases, the DFLAT data has facilitated novel, valuable observations.
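The abstract does not describe the statistical test applied to the DFLAT gene sets. One common way curated gene sets are used, assumed here purely for illustration, is an over-representation test: given the genes flagged in a fetal expression profile, ask whether a developmental gene set contains more of them than chance would predict under a hypergeometric model. The function name and gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(gene_set, hits, universe):
    """One-sided hypergeometric p-value for over-representation.

    gene_set: genes in one annotation set; hits: genes flagged in a profile;
    universe: all genes measured. All arguments are sets of gene identifiers.
    """
    gene_set, hits = gene_set & universe, hits & universe
    n_universe, n_set, n_hits = len(universe), len(gene_set), len(hits)
    overlap = len(gene_set & hits)
    # P(X >= overlap) where X ~ Hypergeometric(n_universe, n_set, n_hits)
    tail = sum(comb(n_set, i) * comb(n_universe - n_set, n_hits - i)
               for i in range(overlap, min(n_set, n_hits) + 1))
    return tail / comb(n_universe, n_hits)


universe = {f"G{i}" for i in range(20)}
developmental_set = {f"G{i}" for i in range(5)}
p_full_overlap = enrichment_p_value(developmental_set, developmental_set, universe)
```

A small p-value flags a gene set whose members are over-represented among the flagged genes; when many gene sets are tested, the p-values also need multiple-testing correction.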
In some cases the new results clarify and refine existing knowledge; in others, they suggest new, early molecular causes for well-studied consequences of common aneuploidies. These new results demonstrate that functional annotation focused on developmentally active proteins can indeed improve our ability to interpret developmental expression profiles.

LBR14 Monday, July 18: 3:00 p.m. - 3:25 p.m.
Analysis of bacterial genes with disrupted ORFs reveals frequent utilization of Programmed Ribosomal Frameshifting (PRF) and Programmed Transcriptional Realignment (PTR)
Room: Hall F1
Presenting author: Pavel Baranov, University College Cork, Ireland
Additional authors: Pavel Baranov, University College Cork, Ireland; Virag Sharma, University College Cork, Ireland
Area Session Chair: Predrag Radivojac
Abstract: Bacterial genome annotations contain a number of CDSs that, in spite of disruption(s) of the initial reading frame, encode a single continuous polypeptide. Such disruptions have different origins: sequencing errors, frameshift or stop codon mutations, and instances of unconventional decoding (recoding). We extracted over one thousand CDSs with annotated disruptions and found that about 75% of them can be clustered into 64 groups based on sequence similarity. Analysis of the clusters revealed deep phylogenetic conservation of ORF organization as well as the presence of conserved sequence patterns that indicate likely utilization of programmed ribosomal frameshifting (PRF) and programmed transcriptional realignment (PTR). Further enrichment of these clusters with homologous nucleotide sequences revealed over six thousand candidate genes utilizing PRF or PTR. Analysis of the conservation patterns apparently associated with non-triplet decoding revealed both previously characterized frameshift-prone sequences and a few novel ones.
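The abstract mentions conserved frameshift-prone sequences without listing them. As an illustrative sketch only, the scan below looks for a simplified form of the widely cited -1 PRF slippery heptamer, X XXY YYZ: three identical nucleotides, then three identical A or T nucleotides, then any nucleotide (for example, A AAA AAG). Candidate detection as described above relies on phylogenetic conservation across clusters, so a bare motif match is only a starting point, and PTR sites would need a different pattern; the regular expression and function name are assumptions.

```python
import re

# Simplified -1 PRF slippery heptamer X XXY YYZ: three identical nucleotides,
# then three identical A/T nucleotides, then any nucleotide.
# The lookahead lets overlapping candidate sites all be reported.
SLIPPERY_HEPTAMER = re.compile(r"(?=(([ACGT])\2\2([AT])\3\3[ACGT]))")


def find_slippery_sites(seq):
    """Return (0-based position, heptamer) for each candidate slippery site."""
    dna = seq.upper().replace("U", "T")  # accept RNA or DNA input
    return [(m.start(), m.group(1)) for m in SLIPPERY_HEPTAMER.finditer(dna)]


sites = find_slippery_sites("GCAAAAAAGCC")
```

Here `sites` holds a single hit, the heptamer AAAAAAG starting at position 2.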
Since the starting point of our analysis was a set of genes with already annotated disruptions, it is highly plausible that we have identified only a fraction of all bacterial genes that utilize PRF and PTR. In addition to identifying a large number of alternatively decoded genes, we made the surprising observation that about half of them are expressed via PTR, a mechanism that, in contrast with PRF, has not yet received substantial attention.

LBR15 Monday, July 18: 3:30 p.m. - 3:55 p.m.
Metagenomic Biomarker Discovery and Explanation
Room: Hall F1
Presenting author: Nicola Segata, Harvard School of Public Health, United States
Additional authors: Jacques Izard, The Forsyth Institute, United States; Levi Waldron, Harvard School of Public Health, United States; Dirk Gevers, Broad Institute, United States; Larisa Miropolsky, Harvard School of Public Health, United States; Wendy Garrett, Harvard School of Public Health, United States; Curtis Huttenhower, Harvard School of Public Health, United States
Area Session Chair: Predrag Radivojac
Abstract: Understanding how and why biomolecular activity differs among environmental conditions or disease phenotypes is one of the central questions addressed by high-throughput biology. Biomarker discovery, the process of finding and explaining these differences, has proven to be both a methodological challenge in high-dimensional statistics and biologically challenging to interpret. Metagenomics provides a new avenue for biomarker discovery, since changes in the composition and functional activity of microbial communities can provide insight into ecological differences among communities or, when applied to the human microbiome, diagnostic or prognostic power. We propose the LDA Effect Size (LEfSe) algorithm to discover and explain microbial and functional biomarkers in the human microbiota and other microbiomes.
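The abstract names only the LDA effect-size step, but the published LEfSe method first tests each taxon or function for differential abundance across classes with the non-parametric Kruskal-Wallis rank-sum test, and only features passing that test go on to LDA effect-size estimation. The sketch below computes just the tie-corrected Kruskal-Wallis H statistic; the LDA step, the optional pairwise Wilcoxon tests on subclasses and the p-value lookup are omitted, and the function name is hypothetical.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for k groups of abundances."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    if n_total < 2:
        raise ValueError("need at least two observations")
    # Average 1-based rank for each distinct value, so ties share a rank.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    s = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 * s / (n_total * (n_total + 1)) - 3 * (n_total + 1)
    # Correct for ties; if every value is tied there is no signal at all.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    return h / correction if correction > 0 else 0.0


# Hypothetical relative abundances of one taxon in two classes of samples.
h_separated = kruskal_wallis_h([1, 2, 3], [4, 5, 6])
```

Under the null hypothesis, H is approximately chi-square distributed with k - 1 degrees of freedom for k classes; for two classes, H above about 3.84 corresponds to p < 0.05.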
We demonstrate that this method is effective in mining human microbiomes for metagenomic biomarkers associated with mucosal tissues and with different levels of oxygen availability. Similarly, when applied to 16S rRNA gene data describing a murine ulcerative colitis gut community, LEfSe confirms the key role played by Bifidobacterium in this disease and suggests the involvement of additional clades, including the Clostridia and Metascardovia. Finally, we provide characterizations of microbial functional activity from metagenomic community sequencing, comparing environmental bacterial and viral microbiomes and distinguishing the infant gut microbiota from that of adults. A quantitative validation of LEfSe against existing microbial biomarker discovery methods and standard statistical approaches (including on synthetic data) highlights a lower false positive rate, consistent ranking of biomarker relevance, and concise representations of taxonomic and functional shifts in microbial communities associated with environmental conditions or disease phenotypes.

LBR16 Monday, July 18: 4:00 p.m. - 4:25 p.m.
The Human Gut Ecosystem: Gut Microbiome and Host Transcriptome in Breast-fed vs. Formula-fed Infants
Room: Hall F1
Presenting author: Iddo Friedberg, Miami University, United States
Additional authors: Iddo Friedberg, Miami University, United States; Scott Schwartz, Texas A&M University, United States; Ivan Ivanov, Texas A&M University, United States; Laurie Davidson, Texas A&M University, United States; Jennifer Goldsby, Texas A&M University, United States; David Dahl, Texas A&M University, United States; Edward Dougherty, Texas A&M University, United States; Damir Herman, University of Arkansas, United States; Sharon Donovan, University of Illinois Urbana-Champaign, United States; Robert Chapkin, Texas A&M University, United States
Area Session Chair: Predrag Radivojac
Abstract: We present a study of the gut microbiome and gut transcriptome of breast-fed (BF) and formula-fed (FF) babies at the age of three months. We have established the feasibility of non-invasive data collection to evaluate the impact of nutritional and other environmental exposures on the developing infant gut-microbial ecosystem. We have also developed a robust computational pipeline for revealing significant correlations between the commensal microbiome and the host transcriptome. Two interesting findings are that Firmicutes and Actinobacteria are bacterial phyla that differentially colonize FF and BF babies, and that the gut microbiome of FF babies contains more matches to virulence genes. At the same time, FF babies seem to have a more activated innate immune system, as evidenced by their gut transcriptome. In the future, the statistical methodology presented here can be used to study other conditions associated with the gut microbiome in both infants and adults, including obesity, inflammatory bowel diseases and colon cancer.

LBR17 Tuesday, July 19: 10:45 a.m. - 11:10 a.m.
STAGR: Software To Annotate Genome Rearrangement
Room: Hall F1
Presenting author: Egor Dolzhenko, Princeton University, United States
Area Session Chair: Predrag Radivojac
Abstract: Non-evolutionary sequence rearrangements arise in a number of different biological contexts, including somatic recombination, chromatin diminution, chromothripsis, trans-splicing of RNA transcripts, and development of the product somatic genome from the precursor germline genome in binucleate ciliates. The algorithm that we have developed is capable of finding and annotating both the rearranged and non-rearranged pieces in the precursor genome and the product genome or transcriptome for all of these cases, and may be useful for identifying novel rearrangements where no specialized software exists. The rearranged sequence segments in the scrambled precursor genome of the ciliate Oxytricha present the most challenging case for annotating DNA rearrangements. In Oxytricha, our lab's model organism, long non-coding RNA templates help piece together and rearrange DNA segments derived from the precursor genome, while deleting intervening regions and as much as 95% of the precursor genome. In the product genome, the product segments contain short overlapping regions called pointers. The ability to map and annotate the locations of product and precursor segments is critical to understanding the complex developmental processes through which the precursor germline genome develops into the mature product genome. We have developed and implemented a method for annotating the precursor and product genomes that reduces the problem to finding a maximum-weight matching in a bipartite graph.

LBR18 Tuesday, July 19: 11:15 a.m. - 11:40 a.m.
Methods for Phylogenetic Inference of Multidomain Evolution
Room: Hall F1
Presenting author: Maureen Stolzer, Carnegie Mellon University, United States
Additional authors: Maureen Stolzer, Carnegie Mellon University, United States; Minli Xu, Carnegie Mellon University, United States; Katherine Siewert, Carnegie Mellon University, United States; Benjamin Vernot, University of Washington, United States; Ravi Chinoy, Carnegie Mellon University, United States; Dannie Durand, Carnegie Mellon University, United States
Area Session Chair: Chad Myers
Abstract: We present algorithms and software for the phylogenetic analysis of multidomain protein families. These families evolve via domain shuffling, i.e., domain duplication, insertion and deletion, processes not captured by current phylogenetic models. Phylogenetics is fundamental to evolutionary analysis, as well as to biomedical applications such as function annotation, drug design and model organism research. Phylogenetic methods for multidomain proteins are urgently needed because of the prevalence of these proteins and their evolutionary and functional importance. Current phylogenetic methods cannot be applied to multidomain families with varied architectures, because they rely on the implicit assumption that the entire sequence shares the same evolutionary history. However, domains co-occurring in the same protein can have different histories, so the phylogenetic trees for these domains will have different topologies. Our novel method exploits this phylogenetic incongruence to infer a most parsimonious history of domain duplications, losses and insertions; the timing of these events; and the ancestral domain architectures. Key features of our approach include (1) explicit representation of domain shuffling events; (2) a model that captures sequence variation within domain families; and (3) algorithms to infer historical events and ancestral domain architectures by comparing domain phylogenies.
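The method above infers duplications, losses and insertions from incongruence between trees. Its full model is beyond a short example, but the underlying idea of reading events off tree incongruence can be illustrated with classic LCA (least common ancestor) reconciliation, sketched here under the assumption that a rooted binary domain tree is compared against a rooted binary species tree. Each domain-tree node maps to the smallest species clade containing its descendants, and a node mapped to the same clade as one of its children is counted as a duplication. This is the textbook gene-tree/species-tree reconciliation, not the authors' algorithm: it ignores losses, insertions and sequence variation, and all names are hypothetical.

```python
def leaf_set(tree):
    """Leaves of a tree given as nested 2-tuples with string leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def species_clades(species_tree):
    """All clades (as leaf sets) of the species tree, leaves included."""
    clades = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            clades.extend(species_clades(child))
    return clades


def count_duplications(domain_tree, species_tree, species_of):
    """LCA reconciliation: count domain-tree nodes inferred as duplications.

    species_of maps each domain-tree leaf to the species it occurs in.
    """
    clades = species_clades(species_tree)

    def lca(names):
        # Smallest species clade containing every name (clades are nested).
        return min((c for c in clades if names <= c), key=len)

    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]))
        left, right = walk(node[0]), walk(node[1])
        mapped = lca(left | right)
        # A node mapped to the same clade as a child implies a duplication.
        if mapped == left or mapped == right:
            duplications += 1
        return mapped

    walk(domain_tree)
    return duplications


species_tree = (("A", "B"), "C")
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
```

For example, the domain tree `(("a1", "b1"), ("a2", "b2"))` contains two copies of the A/B ancestor and is reconciled with one duplication at its root.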
We demonstrate the utility of our method with in-depth analyses of well-studied multidomain families, and we further present a genome-scale analysis of all domain families in human. Our results suggest that considerably more domain shuffling may have occurred than predicted by models that do not consider sequence variation, and they underscore the importance of accurate domain architecture reconstruction for homology-based function prediction. LBR19 Tuesday, July 19: 11:45 a.m. - 12:10 p.m. Epistasis Detection in Complex Traits: An Application to Bipolar Disorder. Room: Hall F1 Presenting author: Michael Mooney, Oregon Health & Science University, United States Additional authors: Michael Mooney, Oregon Health & Science University, United States Beth Wilmot, Oregon Health & Science University, United States Shannon McWeeney, Oregon Health & Science University, United States Area Session Chair: Chad Myers Abstract: In recent years, large genome-wide association studies have revealed a number of genes thought to play a role in complex diseases. Yet in most cases, the genetic associations discovered account for only a small portion of the “genetic component” of these diseases, as the analyses focused primarily on the effects of individual SNPs. To reveal a greater portion of the genetic component of one such disease, bipolar disorder, we adapted a machine learning technique known as a genetic algorithm (GA) to search for multi-locus associations in a large-scale genome-wide association study (GWAS). The GA is guided by the structure of a gene interaction network and can find statistically significant multi-locus associations without performing a prohibitively large number of statistical tests.
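A genetic algorithm guided by a gene interaction network might look like the sketch below. It is a simplified illustration, not the authors' GA. It uses truncation selection and mutation only, with no crossover. Its hypothetical mutation operator proposes only SNP pairs whose genes are identical or adjacent in the network, which is one way such guidance could shrink the search space. The caller supplies the fitness function, which could be, for example, the -log10 p-value of a joint association test. The `tested` cache records how many distinct pairs were actually evaluated.

```python
import random
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def network_guided_ga(
    snp_gene: Dict[str, str],          # hypothetical: SNP -> gene it maps to
    network: Dict[str, List[str]],     # hypothetical: gene -> interacting genes (symmetric)
    fitness: Callable[[Pair], float],  # assumed: e.g. -log10 p of a joint test
    pop_size: int = 20,
    generations: int = 50,
    seed: int = 0,
) -> Tuple[Pair, float, Dict[Pair, float]]:
    """Selection-plus-mutation GA over SNP pairs restricted to network neighbours.

    Assumes every SNP's own gene or an interacting gene carries another SNP.
    Returns the best pair, its fitness, and every distinct pair tested.
    """
    rng = random.Random(seed)
    gene_snps: Dict[str, List[str]] = {}
    for snp, gene in snp_gene.items():
        gene_snps.setdefault(gene, []).append(snp)

    def partner(snp: str) -> str:
        # Candidate partners: other SNPs in the same gene or in an interacting gene.
        genes = [snp_gene[snp]] + network.get(snp_gene[snp], [])
        choices = [s for g in genes for s in gene_snps.get(g, []) if s != snp]
        return rng.choice(choices) if choices else snp

    def canonical(pair: Pair) -> Pair:
        return tuple(sorted(pair))  # type: ignore[return-value]

    tested: Dict[Pair, float] = {}

    def score(pair: Pair) -> float:
        # Each distinct pair is tested once; the cache doubles as a test counter.
        key = canonical(pair)
        if key not in tested:
            tested[key] = fitness(key)
        return tested[key]

    def mutate(pair: Pair) -> Pair:
        a, b = pair
        # Replace one SNP with a network partner of the other, keeping the pair linked.
        new = (a, partner(a)) if rng.random() < 0.5 else (partner(b), b)
        return new if new[0] != new[1] else pair

    snps = list(snp_gene)
    population = []
    for _ in range(pop_size):
        a = rng.choice(snps)
        population.append((a, partner(a)))

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: max(1, pop_size // 2)]  # truncation selection, elitist
        children = [mutate(rng.choice(survivors)) for _ in range(pop_size - len(survivors))]
        population = survivors + children

    best = max(population, key=score)
    return canonical(best), score(best), tested
```

On a toy path network of four genes carrying five SNPs, `tested` can never exceed the five network-adjacent pairs, whereas exhaustive testing would need all ten pairs.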
In a GWAS data set from the NIMH-sponsored Bipolar Genome Study, consisting of 2033 cases and 1420 controls, we found three SNP pairs that were jointly associated with the disease (p-values: 7.48E-10, 1.17E-7 and 3.26E-7). We are attempting to replicate these results in other data sets. LBR20 Tuesday, July 19: 12:15 p.m. - 12:40 p.m. A rational approach for aiding directed evolution towards DNA specificity of site-specific recombinases. Room: Hall F1 Presenting author: Josephine Abi Ghanem, Biotechnology Center of the Technische Universität Dresden, Germany Area Session Chair: Chad Myers Abstract: Cre recombinase has recently been evolved, using substrate-linked protein evolution, to recognize a sequence present in the 5′-LTR and 3′-LTR (loxLTR) of an integrated HIV-1 provirus. In vitro evolution approaches are generally tedious and time-consuming. In addition, they tend to yield a diversity of binding profiles with considerable plasticity and relaxed specificity. On the other hand, the Cre/loxP system has been studied intensively at the structural level. This wealth of information makes the Cre/loxP system an ideal case for dissecting the molecular mechanisms responsible for specificity in Cre/DNA complexes and for accelerating engineering processes. In this work we combine computational approaches with large-scale sequencing data from in vitro evolution studies to engineer DNA specificity into a recombinase with relaxed specificity. From a large set of sequenced clones we first generated a “consensus recombinase”, C16, which contains 16 conserved mutations. C16 shows relaxed specificity, recognizing both loxP and loxLTR. By applying molecular modelling and simulation techniques, we generated a new recombinase, C18, which contains two additional mutated residues in the N-terminal domain that are predicted to switch target specificity.
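The consensus step can be pictured as a column-wise majority vote over aligned clone sequences. In this sketch, a "conserved" mutation means a non-wild-type residue that is the most common residue at its position and occurs in at least `min_freq` of the clones. That threshold, the function name, and the toy 7-residue sequences are assumptions; they are neither Cre data nor the authors' criterion.

```python
from collections import Counter
from typing import List, Tuple


def consensus_mutations(
    wild_type: str, clones: List[str], min_freq: float = 0.5
) -> Tuple[str, List[str]]:
    """Column-wise majority consensus of aligned, equal-length protein clones.

    Hypothetical criterion: a position counts as a consensus mutation when its
    most frequent residue differs from wild type and occurs in >= min_freq of clones.
    """
    if not clones or any(len(c) != len(wild_type) for c in clones):
        raise ValueError("need clones aligned to the wild-type length")
    consensus, mutations = [], []
    for pos, wt in enumerate(wild_type):
        residue, count = Counter(c[pos] for c in clones).most_common(1)[0]
        if residue != wt and count / len(clones) >= min_freq:
            consensus.append(residue)
            mutations.append(f"{wt}{pos + 1}{residue}")  # e.g. "N3D", 1-based position
        else:
            consensus.append(wt)
    return "".join(consensus), mutations
```

For example, `consensus_mutations("MSNLLTV", ["MSDLLKV", "MSDLLKV", "MSNLLKA", "MSDLLTV"])` returns `("MSDLLKV", ["N3D", "T6K"])`. The V7A change seen in one clone does not reach the threshold.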
Strikingly, C18 showed virtually no recombination on loxP while fully recombining the loxLTR sequence, demonstrating that rational design combined with substrate-linked protein evolution can be used to engineer DNA specificity into evolved recombinases. LBR21 Tuesday, July 19: 2:30 p.m. - 2:55 p.m. Co-expression network analysis reveals rewiring of maize transcriptome by domestication. Room: Hall F1 Presenting author: Roman Briskine, University of Minnesota, United States Additional authors: Roman Briskine, University of Minnesota, United States Ruth Swanson-Wagner, University of Minnesota, United States Robert Schaefer, University of Minnesota, United States Peter Tiffin, University of Minnesota, United States Nathan Springer, University of Minnesota, United States Chad Myers, University of Minnesota, United States Area Session Chair: Chad Myers Abstract: Traditional methods of comparative genomics exploit the abundance of sequence data to extend our knowledge of gene functions and evolutionary processes. However, gene expression profiling offers an attractive basis for developing complementary methods. One particularly interesting question addressed by comparative genomics concerns the effects of domestication on wild progenitors. Standard sequence-based methods have identified several thousand candidate genes underlying the domestication of maize from its wild ancestor, teosinte. Nevertheless, further research is needed to narrow the list of candidates and extend the characterization of targeted genes. In this study, we built co-expression networks from high-quality expression data for diverse varieties of maize and teosinte. Global comparisons of the co-expression networks revealed evidence of transcriptome rewiring due to domestication.
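A common way to build and compare co-expression networks is to threshold pairwise Pearson correlations and then score how much each gene's neighbourhood changes between two networks. The sketch below does exactly that. The absolute-correlation threshold of 0.8 and the Jaccard-distance rewiring score are assumptions, and the study's actual network construction and bootstrapping procedure are not reproduced.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_network(expr: Dict[str, List[float]], threshold: float = 0.8) -> Dict[str, Set[str]]:
    """Undirected network: edge when |r| >= threshold (threshold is an assumption)."""
    net: Dict[str, Set[str]] = {g: set() for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            net[g1].add(g2)
            net[g2].add(g1)
    return net


def rewiring(net_a: Dict[str, Set[str]], net_b: Dict[str, Set[str]]) -> Dict[str, float]:
    """Hypothetical per-gene score: Jaccard distance between the two neighbourhoods."""
    scores = {}
    for g in net_a.keys() & net_b.keys():
        union = net_a[g] | net_b[g]
        scores[g] = 1 - len(net_a[g] & net_b[g]) / len(union) if union else 0.0
    return scores
```

Consider toy profiles in which gene g3 is perfectly anti-correlated with g1 and g2 in one network and only weakly correlated (|r| = 0.4) in the other. `rewiring` gives g3 a score of 1.0, and g1 and g2 a score of 0.5 each.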
We also developed a novel bootstrapping approach to identify specific candidate genes that were presumably direct targets of selection during domestication or downstream effects of it. Our candidate list overlaps considerably with domestication genes identified through previous independent sequence-based analyses, an enrichment that existing approaches for comparative expression analysis did not achieve. Our study demonstrates the utility of expression analysis for studying maize domestication and provides new insights into robust methods for comparing expression patterns across species. LBR22 Tuesday, July 19: 3:00 p.m. - 3:25 p.m. Scalable metabolic reconstruction for metagenomic data and the human microbiome. Room: Hall F1 Presenting author: Curtis Huttenhower, Harvard School of Public Health, United States Additional authors: Sahar Abubucker, Washington University, United States Nicola Segata, Harvard School of Public Health, United States Alyxandria Schubert, University of Michigan, United States Beltran Rodriguez-Mueller, San Diego State University, United States Jeremy Zucker, Broad Institute, United States Human Microbiome Project Consortium, United States Patrick Schloss, University of Michigan, United States Dirk Gevers, Broad Institute, United States Makedonka Mitreva, Washington University, United States Curtis Huttenhower, Harvard School of Public Health, United States Area Session Chair: Chad Myers Abstract: Microbial communities carry out the majority of biochemical activity on the planet, and they play integral roles in metabolism and immune homeostasis in the human microbiome. Here, we describe HUMAnN, a methodology for determining the functional and metabolic potential of a microbial community by inferring which pathways are present or absent, and their relative abundances, directly from short metagenomic reads.
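Once reads have been assigned to enzyme families (an upstream step assumed here), pathway presence and abundance can be summarized per pathway. The following is a simplified, hypothetical scheme: coverage is the fraction of a pathway's enzymes detected, and abundance is the median enzyme abundance. HUMAnN's own procedure involves additional steps that are not reproduced here, and the enzyme and pathway identifiers are made up.

```python
from statistics import median
from typing import Dict, List, Tuple


def pathway_summary(
    enzyme_abundance: Dict[str, float],
    pathways: Dict[str, List[str]],
    min_abundance: float = 0.0,
) -> Dict[str, Tuple[float, float]]:
    """Return {pathway: (coverage, abundance)} under a simplified, hypothetical scheme.

    coverage  = fraction of the pathway's enzymes with abundance > min_abundance
    abundance = median abundance over the pathway's enzymes (missing enzymes count as 0)
    """
    out = {}
    for name, enzymes in pathways.items():
        if not enzymes:  # skip empty pathway definitions to avoid dividing by zero
            continue
        values = [enzyme_abundance.get(e, 0.0) for e in enzymes]
        coverage = sum(v > min_abundance for v in values) / len(values)
        out[name] = (coverage, median(values))
    return out
```

For instance, a pathway whose three enzymes have abundances 4, 2 and 6 gets coverage 1.0 and abundance 4.0. A pathway whose enzymes are all missing gets (0.0, 0.0).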
We validated this methodology on a collection of four synthetic metagenomes, accurately determining the presence and abundance of pathways and outperforming standard best-hit approaches. Finally, we analyzed 741 samples drawn from 7 body sites in 103 individuals as part of the Human Microbiome Project (HMP), demonstrating the scalability of our methodology and the critical importance of microbial metabolism in the human microbiome. Previous studies have found that no organism is present in all body sites or individuals; conversely, we find that 19 of 220 pathways are confidently present in every HMP community, and 53 are present in at least 90% of samples. This demonstrates a degree of functional consistency that is lacking at the organismal level: who's there varies, but what they're doing is more constant. However, the relative abundances of most pathways vary among body sites but not among individuals, suggesting that community function is dictated mainly by the microbial environment and less strongly by the host. This does not yet address host genetics, environment, or disease, as the HMP comprises a normal baseline of healthy individuals; each of these represents an additional area for future studies of microbial community function.
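The two prevalence figures reported above (pathways present in every community, and pathways present in at least 90% of samples) come down to a simple count over a sample-by-pathway presence table. A minimal sketch follows, assuming presence calls have already been made upstream. Sample and pathway names are hypothetical.

```python
from typing import Dict, Set, Tuple


def core_pathways(
    presence: Dict[str, Set[str]], all_pathways: Set[str], min_fraction: float = 0.9
) -> Tuple[Set[str], Set[str]]:
    """Split pathways by prevalence across samples.

    presence maps a (hypothetical) sample ID to the set of pathways already
    called present in that sample. Returns (present in every sample,
    present in >= min_fraction of samples).
    """
    n = len(presence)
    if n == 0:
        return set(), set()
    counts = {p: sum(p in called for called in presence.values()) for p in all_pathways}
    universal = {p for p, c in counts.items() if c == n}
    prevalent = {p for p, c in counts.items() if c / n >= min_fraction}
    return universal, prevalent
```

In a toy set of ten samples where one pathway is missing from a single sample, that pathway is excluded from the universal set but still counts as prevalent at the 90% cutoff.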