ISMB/ECCB 2011 Late Breaking Research

19th Annual International Conference on
Intelligent Systems for Molecular Biology and
10th European Conference on Computational Biology

Late Breaking Research Presentation Schedule

LBR01 Sunday, July 17: 10:45 a.m. - 11:10 a.m.

Potential function of proteins encoded by chimeric transcripts
Room: Hall F1
Presenting author: Milana Frenkel-Morgenstern, Spanish National Cancer Research Centre (CNIO), Spain

Additional authors:
Milana Frenkel-Morgenstern, Spanish National Cancer Research Centre (CNIO), Spain
Iakes Ezkurdia, Spanish National Cancer Research Centre (CNIO), Spain
Michael Tress, Spanish National Cancer Research Centre (CNIO), Spain
Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Spain

Area Session Chair: Curtis Huttenhower

Abstract:
Chimeric RNAs are produced by trans-splicing of independent RNA transcripts. These RNAs have been reported in many organisms, and they have been identified when analyzing ensembles of ESTs and data from high-throughput paired-end RNA-seq. However, to what extent the resulting chimeric RNAs have been selected during evolution in order to design new functional proteins currently remains unclear.
An extensive review of the composition of the proteins translated from such chimeric transcripts in human, mouse and fruit fly emphasized the mixture of missing and complete functional domains of the proteins assembling the chimera. In many cases, such chimeric proteins contain transmembrane domains and signal peptides that can potentially change their cellular location, altering the corresponding cellular processes. By reviewing the protein domains preserved in these chimeras, one can hypothesize that those resulting from trans-splicing may produce proteins with novel functions that in a relevant number of cases can be associated with dominant negative phenotypes.
Indeed, we report intriguing examples of potential chimeric proteins the existence of which was confirmed in mass-spectrometry experiments in human, mouse and fly. The exploration of a number of chimeras encoded by complementary DNA strands in humans provides additional evidence of the potentially complex molecular isoforms that can be generated by alternative transcription.

LBR02 Sunday, July 17: 11:15 a.m. - 11:40 a.m.

Identification, classification and functional role of viral protein kinases.
Room: Hall F1
Presenting author: Nidhi Tyagi, Indian Institute of Science, India

Additional authors:
Nidhi Tyagi, Indian Institute of Science, India
Narayanaswamy Srinivasan, Indian Institute of Science, India

Area Session Chair: Curtis Huttenhower

Abstract:
Protein kinases encoded by viral genomes play a major role in infection, replication and survival of viruses. Our study focuses on identification, classification and recognition of functional role(s) of various protein kinases from viruses. Protein kinases were recognized using traditional sequence homology detection tools, sequence alignment methods and a phylogenetic approach. A total of 982,857 protein sequences from 53,315 viral genomes (including strains) were used in this analysis. Protein kinases were identified using a combination of profile-based search methods such as PSI-BLAST and RPS-BLAST and the HMMER approach. We identified 470 protein kinase domains in 466 proteins from 241 viral genomes and classified these viral protein kinases into 30 subfamilies. Remarkably, more than 50% of the subfamilies are reported as viral-specific, because sequences from eukaryotic or prokaryotic sources sharing homology with them could not be identified. We also hint at their potential biological roles by functional annotation. Surprisingly, some of the largely populated subfamilies share homology with well-characterized eukaryotic protein kinase groups such as tyrosine kinase, tyrosine kinase-like, casein kinase 1, CAMK (Ca2+/calmodulin-dependent protein kinase) and CMGC (cyclin-dependent kinases (CDK), mitogen-activated kinases, glycogen synthase kinases and CDK-like kinases). Phylogenetic analysis of viral kinases along with members from various eukaryotic protein kinase groups suggests a close homologous relationship among them, indicating horizontal gene transfer. Such viral kinases represent the mutated form of proteins from carcinogenic cells. Some viral kinase homologues participate in metabolic pathways, raising concerns for host-pathogen interaction (host protein mimicking).

LBR03 Sunday, July 17: 11:45 a.m. - 12:10 p.m.

A structural and dynamical model of human telomerase
Room: Hall F1
Presenting author: Samuel Flores, Uppsala University, Sweden

Area Session Chair: Curtis Huttenhower

Abstract:
Mutations in the telomerase complex disrupt either nucleic acid binding or catalysis, and are the cause of numerous human diseases. Despite its importance, the structure of the human telomerase complex has not been observed crystallographically, nor are its dynamics understood in detail. Fragments of this complex from Tetrahymena thermophila and, more controversially, Tribolium castaneum have been crystallized. Biochemical probes provide important insight into dynamics. In this work we use available structural fragments to build a homology model of human TERT, and validate the result with functional assays. We then generate a trajectory of telomere elongation following a “typewriter” mechanism: the RNA template moves to keep the end of the growing telomere in the active site, disengaging after every 6-residue extension to execute a “carriage return” and go back to its starting position. A hairpin can easily form in the telomere, from DNA residues leaving the telomere-template duplex. The trajectory is consistent with available experimental evidence and suggests focused biochemical experiments for further validation.

LBR04 Sunday, July 17: 4:00 p.m. - 4:25 p.m.

The multiple Specificity Landscape of Peptide Recognition Domains
Room: Hall F1
Presenting author: David Gfeller, Swiss Institute of Bioinformatics, Switzerland

Additional authors:
David Gfeller, Swiss Institute of Bioinformatics, Switzerland
Frank Butty, University of Toronto, Canada
Marta Wierzbicka, University of Toronto, Canada
Erik Verschueren, CRG-Centre de Regulacio Genomica, Spain
Peter Vanhee, CRG-Centre de Regulacio Genomica, Spain
Haiming Huang, University of Toronto, Canada
Andreas Ernst, University of Toronto, Canada
Nisa Dar, University of Toronto, Canada
Igor Stagljar, University of Toronto, Canada
Luis Serrano, CRG-Centre de Regulacio Genomica, Spain
Sachdev S Sidhu, University of Toronto, Canada
Gary D Bader, University of Toronto, Canada
Philip M Kim, University of Toronto, Canada

Area Session Chair: Curtis Huttenhower

Abstract:
Modular protein interaction domains form the building blocks of eukaryotic signaling pathways. Many of them, known as Peptide Recognition Domains, mediate protein interactions by recognizing short, linear amino acid stretches on the surface of their cognate partners with high specificity. Residues in these stretches are usually assumed to contribute independently to binding, which has led to a simplified understanding of protein interactions. Conversely, we observe in large binding peptide datasets that different residue positions display highly significant correlations for many domains in three distinct families (PDZ, SH3 and WW). These correlation patterns reveal a widespread occurrence of multiple binding specificity and give novel structural insights into protein interactions. We show that multiple specificity more accurately predicts protein interactions and experimentally validate some of the predictions for the human proteins DLG1 and SCRIB. Overall, our results reveal a rich specificity landscape in peptide recognition domains, suggesting new ways of encoding specificity in protein interaction networks.
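
As an illustration of what a correlation between residue positions looks like in a binding-peptide dataset, the sketch below computes the mutual information between two positions. Mutual information is an assumed choice of statistic here, since the abstract does not name the measure the authors used, and the peptides are hypothetical toy data.

```python
import math
from collections import Counter

def mutual_information(peptides, i, j):
    """Mutual information (in bits) between residue positions i and j."""
    n = len(peptides)
    count_i = Counter(p[i] for p in peptides)
    count_j = Counter(p[j] for p in peptides)
    count_ij = Counter((p[i], p[j]) for p in peptides)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi

# Hypothetical peptide sets (not from the study).
correlated = ["ETWV", "ETWV", "KSFL", "KSFL"]   # positions 0 and 2 co-vary
independent = ["ETWV", "ESFV", "KTWV", "KSFV"]  # positions 0 and 2 vary freely

mi_correlated = mutual_information(correlated, 0, 2)
mi_independent = mutual_information(independent, 0, 2)
```

A model that treats positions independently cannot distinguish these two sets by their per-position frequencies at positions 0 and 2, whereas the pairwise statistic separates them.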

LBR05 Sunday, July 17: 2:30 p.m. - 2:55 p.m.

Investigating The Evolution of Novel Enzyme Function And Chemistry Within Structurally Defined Protein Superfamilies
Room: Hall F1
Presenting author: Nicholas Furnham, European Bioinformatics Institute, United Kingdom

Additional authors:
Nicholas Furnham, European Bioinformatics Institute, United Kingdom
Gemma Holliday, European Bioinformatics Institute, United Kingdom
Ian Sillitoe, University College London, United Kingdom
Alison Cuff, University College London, United Kingdom
Christine Orengo, University College London, United Kingdom
Janet Thornton, European Bioinformatics Institute, United Kingdom

Area Session Chair: Curtis Huttenhower

Abstract:
A significant proportion of gene products are annotated as enzymes, which, as biological catalysts, are essential for life. In addition, many pharmaceutical drugs act by modifying the behavior of enzymes. Thus, an understanding of how enzymes have evolved to undertake the wide variety of reactions they perform is essential to many studies in biology and medicinal chemistry. Unraveling this problem requires combining protein three-dimensional structure, sequence, phylogenetic and chemistry information. We have combined these data in an automatic pipeline for investigating enzyme functional evolution within structurally defined protein superfamilies. This has permitted us to analyze all enzymatic superfamilies cataloged by the CATH database. In addition to showing relationships between structures and sequences through phylogeny, we are able to show relationships between the small-molecule metabolites the enzymes act on, as well as similarities between the mechanisms by which an enzyme performs its overall reaction. This allows us to demonstrate the power of combining the range of information to show features across multiple superfamilies as well as unique qualities of specific enzyme superfamilies, thus providing a means to improve function prediction and contribute to the design of novel enzyme functions.

LBR06 Sunday, July 17: 3:00 p.m. - 3:25 p.m.

Quantitative analysis of cellular composition in primary breast tumours deconvolutes molecular signatures and elucidates impact of tumour microenvironment
Room: Hall F1
Presenting author: Yinyin Yuan, Cambridge Research Institute, United Kingdom

Additional authors:
Yinyin Yuan, Cancer Research UK, United Kingdom

Area Session Chair: Curtis Huttenhower

Abstract:
A major obstacle in refining molecular cancer signatures is the heterogeneity of cellular composition in most tumour samples, which is a mixture of cancer, stromal, and immune cells. These cells form an integral part of the tumour microenvironment, but their contributions may obscure the “pure” cancer signal in genomic and transcriptomic data. We show how to extract cellular composition using an automated image analysis system and integrate it with DNA copy number and expression data. Quantifying the percentage of tumour cells improves the signal-to-noise ratio in copy-number profiles. We have applied our approach to 290 images of frozen breast tumour samples for which we have also obtained copy-number and gene-expression profiles. Our quantitative results are highly correlated with the qualitative results obtained by human experts on the same samples. In addition, we show that the location of cancer cells distinguishes tumours of different complexity using spatial statistics. Quantifying the density of lymphocytes also distinguishes cases with an active immune system from cases with a less active immune system. We show that lymphocyte density is predictive of prognosis in the ER-negative subtype, which includes both an aggressive form of breast cancer and an often over-treated patient subgroup.

LBR07 Sunday, July 17: 3:30 p.m. - 3:55 p.m.

Modulators of microRNA Activity Regulate Glioblastoma Pathogenesis
Room: Hall F1
Presenting author: Pavel Sumazin, Columbia University, United States

Additional authors:
Xuerui Yang, Columbia University, United States

Area Session Chair: Curtis Huttenhower

Abstract:
MicroRNAs (miRs) have been shown to drive pathogenesis and prognosis in cancer tumors, including glioblastoma, the most common and the most aggressive type of primary human brain tumor. MiR activity is modulated by miR-target abundance and by post-transcriptional factors that regulate miRISC-mediated mRNA degradation. MiR-activity modulators are potential key regulators of cancer, but the extent to which they regulate cancer remains unknown. We developed an algorithm to help assemble the repertory of tumor-specific miR-activity modulators as a step towards elucidating their regulatory effect on pathogenesis of cancer. We identified over a hundred glioblastoma miR-activity modulators, which activate or suppress miRISC-mediated regulation through protein interaction or act as miR decoys by titrating miRs away from other mRNAs. We identified dozens of oncogenes and tumor suppressors that are regulated by miR-activity regulators in glioblastoma, including master regulators RUNX1 and PTEN. We experimentally validated WIPF2 as a miRISC-mediated modulator of RUNX1, and PALB2 and WNT7A as miRISC-mediated modulators of PTEN. Moreover, a number of genes, including RUNX1, were identified as decoy modulators of PTEN. Interestingly, PTEN and its 7 modulators bear genetic alterations in 60% of 462 GBM samples. Both modulator silencing and PTEN-3’UTR transfection confirmed these 7 modulators as decoy modulators of PTEN, suggesting that genetic alterations at these genes post-transcriptionally regulate PTEN. In summary, our results suggest that miR modulators post-transcriptionally regulate the expression of master regulators of glioblastoma, thus playing a significant role in its tumorigenesis and progression.

LBR08 Sunday, July 17: 4:00 p.m. - 4:25 p.m.

Post-transcriptional regulators of microRNA biogenesis regulate pathogenesis of cancer
Room: Hall F1
Presenting author: Wei-Jen Chung, Columbia University, United States

Additional authors:
Pavel Sumazin, Columbia University, United States

Area Session Chair: Curtis Huttenhower

Abstract:
MicroRNAs (miRs) have emerged as key regulators of both normal and pathologic phenotypes, including cancer, but fine-grained regulation of their biogenesis is still poorly understood. In order to understand the extent and specificity of miR-biogenesis control, as well as the role of miR-biogenesis regulators in tumorigenesis and cancer progression, we set out to identify these regulators and profile their targets. We developed an algorithm for genome-wide inference of miR-biogenesis regulators, and identified regulators that are specific to individual miRs or miR families in glioma and ovarian cancer. Our algorithm identified known biogenesis regulators, including DGCR8, HNRNPA1, DDX5, LIN28 and SMAD family proteins, and predicted new miR-biogenesis regulators that are common to both cancers or specific to one tumor type. We validated miR-biogenesis regulators that are common to both cancers and target tumor- and prognosis-specific miRs, including oncomirs miR-218, miR-23b, and miR-155. We showed that miR-biogenesis regulators can act before or after cropping by DROSHA, and that they can alter expression of large sets of microRNAs. Our results suggest that miR biogenesis is a complex, context-specific, and finely regulated process, and that miR-biogenesis regulators may influence tumor initiation and progression by altering the expression of specific tumor-suppressor miRs and oncomirs or by modifying large miR programs.

LBR09 Monday, July 18: 10:45 a.m. - 11:10 a.m.

SpliceGrapher: Predicting Splice Graphs from Diverse Evidence
Room: Hall F1
Presenting author: Asa Ben-Hur, Colorado State University, United States

Additional authors:
Asa Ben-Hur, Colorado State University, United States
Mark Rogers, Colorado State University, United States
Anireddy S.N. Reddy, Colorado State University, United States

Area Session Chair: Predrag Radivojac

Abstract:
Deep transcriptome sequencing with next-generation sequencing (NGS) provides unprecedented opportunities for researchers to assess the extent of alternative splicing in many species. Although it is inexpensive and easy to obtain whole transcriptome data in this manner, one limitation has been the lack of versatile methods to analyze the data. We present a new method called SpliceGrapher, which is designed to enhance existing gene annotations on the basis of NGS reads and EST alignments. SpliceGrapher predicts splice graphs which are a compact representation of all the ways in which a gene's exons may be assembled. We demonstrate our approach using NGS read data from Arabidopsis and grape, and find that SpliceGrapher's predictions are better aligned with the existing annotations than those of other tools.
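
To make the idea of a splice graph concrete, the sketch below stores a gene as a directed acyclic graph and enumerates every path through it. This is not SpliceGrapher's API: the representation (exons as nodes, splice junctions as edges), the function name `enumerate_isoforms` and the four-exon gene are all assumptions chosen for illustration.

```python
def enumerate_isoforms(graph, source, sink):
    """Return every source-to-sink path in a directed acyclic splice graph."""
    if source == sink:
        return [[sink]]
    paths = []
    for nxt in graph.get(source, []):
        for tail in enumerate_isoforms(graph, nxt, sink):
            paths.append([source] + tail)
    return paths

# Hypothetical gene: exon2 and exon3 can each be skipped.
splice_graph = {
    "exon1": ["exon2", "exon3"],
    "exon2": ["exon3", "exon4"],
    "exon3": ["exon4"],
}

isoforms = enumerate_isoforms(splice_graph, "exon1", "exon4")
for path in isoforms:
    print(" -> ".join(path))
```

Three adjacency lists are enough to encode all three exon assemblies of this toy gene, which is the sense in which the graph is a compact representation.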

LBR10 Monday, July 18: 11:15 a.m. - 11:40 a.m.

A Comprehensive Computational Model for Analyzing Gene Translation
Room: Hall F1
Presenting author: Tamir Tuller, Weizmann Institute of Science, Israel

Area Session Chair: Predrag Radivojac

Abstract:
We describe a model of gene translation that is based on all the physical and dynamical aspects of this process. The Whole-cell Simulation Translation Model (WSTM) predicts fundamental features of the translation process, including translation rates, protein abundance-levels, ribosomal densities and the relation between all these variables, better than alternative ('non-physical') approaches. In addition, we show that the WSTM model can be used for accurately inferring various variables that could not be inferred by previous predictors, such as genes’ initiation rates and cost of translation.
We find that increasing the number of available ribosomes (or equivalently the initiation rate) increases the genomic translation rate and the mean ribosome density only up to a certain point, beyond which both saturate. Strikingly, assuming that the translation system is tuned to work at the pre-saturation point leads to the best predictions of experimental data. This result may suggest that in all the organisms analyzed, ribosome allocation is optimized to the pre-saturation point. Remarkably, the gap between the performance of the WSTM and that of the alternative predictors is strikingly large in the case of heterologous genes, testifying to the model’s promising biotechnological value in predicting the protein abundance of heterologous proteins before expressing them in the desired host. We find that the different local features of the coding sequence of a gene (codon bias, amino acid charge, and mRNA folding energy) affect its translation elongation (partial correlations of -0.47, -0.22, and 0.31 respectively with ribosomal density; all p-values < 0.0015).
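
The saturation behaviour described above can be illustrated with a textbook idealization of ribosome traffic, the totally asymmetric simple exclusion process (TASEP) with open boundaries. This is not the WSTM itself; it is a sketch assuming a uniform hop rate of 1, a long lattice, and the standard steady-state results, with `alpha` standing in for the initiation rate and `beta` for the termination rate.

```python
def tasep_steady_state(alpha, beta=1.0):
    """Steady-state (current, bulk density) of the open-boundary TASEP.

    Assumes hop rate 1 and the long-lattice limit.
    """
    if alpha < 0.5 and alpha < beta:
        return alpha * (1 - alpha), alpha   # initiation-limited (low density)
    if beta < 0.5 and beta <= alpha:
        return beta * (1 - beta), 1 - beta  # termination-limited (high density)
    return 0.25, 0.5                        # saturated (maximal current)

# Sweep the initiation rate: both outputs rise, then plateau at alpha = 0.5.
for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    current, density = tasep_steady_state(alpha)
    print(f"alpha={alpha:.1f}  current={current:.3f}  density={density:.3f}")
```

In this idealization the plateau begins at `alpha = 0.5`, which plays the role of the pre-saturation point discussed in the abstract.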

LBR11 Monday, July 18: 11:45 a.m. - 12:10 p.m.

Differential expression with RNA-seq: a matter of Depth
Room: Hall F1
Presenting author: Ana Conesa, CIPF, Spain

Additional authors:
Ana Conesa, CIPF, Spain

Area Session Chair: Predrag Radivojac

Abstract:
We investigate the relationship between sequencing depth and differential expression detection in RNA-seq experiments. We show that the differential expression calls of current statistical approaches depend strongly on the number of available reads, and that this leads to a high number of false discoveries, especially for genes of short length, with small fold-change differences, including off-target non-coding genes. We present a novel methodology, NOISeq, which is robust to this kind of bias. By adopting an empirical approach to model the null distribution of differential expression, NOISeq better captures the shape of noise in RNA-seq data and obtains more accurate and consistent results.
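
A minimal sketch of what an empirical null distribution can look like is given below. It is not the NOISeq implementation: the statistics (log2 fold change M and absolute difference D), the pseudocount, the function names and the counts are assumptions made for illustration. A gene is scored by the fraction of replicate-versus-replicate noise points that it exceeds in both statistics.

```python
import math

def md(x, y, pseudo=0.5):
    """Log2 fold change M and absolute difference D for one pair of counts."""
    return math.log2((x + pseudo) / (y + pseudo)), abs(x - y)

def empirical_de_probability(a, b, noise_pairs):
    """Fraction of noise (M, D) points exceeded by the gene in both |M| and D."""
    m, d = md(a, b)
    noise = [md(x, y) for x, y in noise_pairs]
    beaten = sum(1 for nm, nd in noise if abs(nm) < abs(m) and nd < d)
    return beaten / len(noise)

# Hypothetical counts for the same genes in two replicates of one condition;
# their differences define the empirical noise distribution.
noise_pairs = [(10, 12), (100, 95), (5, 7), (50, 55), (200, 210)]

strong = empirical_de_probability(400, 50, noise_pairs)  # large change
weak = empirical_de_probability(11, 10, noise_pairs)     # within the noise
```

Because the null comes from the data rather than from a parametric model, adding reads does not by itself turn a small difference into a significant call.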

LBR12 Monday, July 18: 12:15 p.m. - 12:40 p.m.

Inferring Gene Regulatory Networks from Expression Data using Tree-based Methods
Room: Hall F1
Presenting author: Van Anh Huynh-Thu, University of Liege, Belgium

Additional authors:
Van Anh Huynh-Thu, University of Liege, Belgium

Area Session Chair: Predrag Radivojac

Abstract:
One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. In this work, we present GENIE3, an original algorithm for the inference of GRNs. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using tree-based ensemble methods. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. The whole network is then reconstructed by aggregating putative links over all genes. Our method was evaluated in the context of the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge, which is an annual international competition aiming at the evaluation of GRN inference algorithms on benchmarks of simulated and real data. The GENIE3 method was best performer on the DREAM4 In Silico Multifactorial challenge in 2009 and on the DREAM5 Network Inference challenge in 2010. In addition, the resulting method is simple and generic, making it adaptable to other types of genomic data and interactions.
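
The decomposition described above can be sketched in a few lines. This is a simplified illustration, not the GENIE3 code: each "tree" is reduced to a single-split regression stump fitted on a bootstrap sample, importance is the reduction in squared error credited to the chosen input gene, and the names `best_stump` and `rank_links` and the expression data are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_stump(rows, target, candidates):
    """Best one-threshold split of the target; returns (SSE reduction, gene)."""
    total = sse([r[target] for r in rows])
    best = (0.0, None)
    for gene in candidates:
        for threshold in sorted({r[gene] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[gene] <= threshold]
            right = [r[target] for r in rows if r[gene] > threshold]
            gain = total - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, gene)
    return best

def rank_links(expr, n_trees=100, seed=0):
    """Rank putative (regulator, target, score) links, best first."""
    rng = random.Random(seed)
    genes = sorted(expr[0])
    links = []
    for target in genes:                       # one regression problem per gene
        inputs = [g for g in genes if g != target]
        importance = dict.fromkeys(inputs, 0.0)
        for _ in range(n_trees):
            sample = [rng.choice(expr) for _ in expr]  # bootstrap sample
            order = rng.sample(inputs, len(inputs))    # random tie-breaking
            gain, gene = best_stump(sample, target, order)
            if gene is not None:
                importance[gene] += gain
        total = sum(importance.values()) or 1.0
        links += [(g, target, s / total) for g, s in importance.items()]
    return sorted(links, key=lambda link: link[2], reverse=True)

# Hypothetical data: B tracks A exactly, C is unrelated to both.
expr = [{"A": float(a), "B": 2.0 * a, "C": float((7 * a) % 5)}
        for a in range(1, 13)]
links = rank_links(expr)
top = links[0]
```

On this toy data the highest-ranked link connects A and B. Note that the ranking alone does not say which of the two is the regulator, since the scores are derived from predictive importance in each direction.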

LBR13 Monday, July 18: 2:30 p.m. - 2:55 p.m.

Age matters: Functional annotation tailored to human development improves our understanding of fetal transcriptional profiles
Room: Hall F1
Presenting author: Donna Slonim, Tufts University, United States

Area Session Chair: Predrag Radivojac

Abstract:
Recent successes in detecting fetal RNA in amniotic fluid and in profiling both fetal DNA and RNA from maternal blood suggest novel, comprehensive, and minimally-invasive approaches to prenatal diagnosis and treatment. However, our ability to interpret transcriptional profiles meaningfully depends increasingly on prior knowledge of functionally related gene sets relevant to the subject of the study. We established the DFLAT (Developmental FunctionaL Annotation at Tufts) project to identify developmentally-relevant gene sets that may help us better interpret transcriptional profiles of the living human fetus. To assess the novel contributions of the current DFLAT collection of gene sets, we re-analyzed three of our previously-published data sets: one that identifies fetal RNA in blood from pregnant mothers carrying healthy, full-term fetuses, and two comparing amniotic fluid of common autosomal aneuploid pregnancies to controls (in Trisomy 21 and Trisomy 18). In all three cases, the DFLAT data has facilitated novel, valuable observations. In some cases the new results clarify and refine existing knowledge; in others, they suggest new, early molecular causes for well-studied consequences of common aneuploidies. These new results demonstrate that functional annotation focused on developmentally active proteins can indeed improve our ability to interpret developmental expression profiles.

LBR14 Monday, July 18: 3:00 p.m. - 3:25 p.m.

Analysis of bacterial genes with disrupted ORFs reveals frequent utilization of Programmed Ribosomal Frameshifting (PRF) and Programmed Transcriptional Realignment (PTR).
Room: Hall F1
Presenting author: Pavel Baranov, University College Cork, Ireland

Additional authors:
Pavel Baranov, University College Cork, Ireland
Virag Sharma, University College Cork, Ireland

Area Session Chair: Predrag Radivojac

Abstract:
Bacterial genome annotations contain a number of CDSs that, in spite of disruption(s) of the initial reading frame, encode a single continuous polypeptide. Such disruptions have different origins: sequencing errors, frameshift or stop codon mutations, as well as instances of unconventional decoding utilization (Recoding). We have extracted over one thousand CDSs with annotated disruptions and found that about 75% of them can be clustered into 64 groups based on sequence similarity. Analysis of the clusters revealed deep phylogenetic conservation of ORF organization as well as the presence of conserved sequence patterns that indicate likely utilization of programmed ribosomal frameshifting (PRF) and programmed transcriptional realignment (PTR). Further enrichment of these clusters with homologous nucleotide sequences revealed over six thousand candidate genes utilizing PRF or PTR. Analysis of the patterns of conservation apparently associated with non-triplet decoding revealed the presence of both previously characterized frameshift-prone sequences and a few novel ones. Since the starting point of our analysis was a set of genes with already annotated disruptions, it is highly plausible that in this study we have identified only a fraction of all bacterial genes that utilize PRF and PTR. In addition to the identification of a large number of alternatively decoded genes, a surprising observation was that about half of them are expressed via PTR - a mechanism that, in contrast with PRF, has not yet received substantial attention.

LBR15 Monday, July 18: 3:30 p.m. - 3:55 p.m.

Metagenomic Biomarker Discovery and Explanation
Room: Hall F1
Presenting author: Nicola Segata, Harvard School of Public Health, United States

Additional authors:
Jacques Izard, The Forsyth Institute, United States
Levi Waldron, Harvard School of Public Health, United States
Dirk Gevers, Broad Institute, United States
Larisa Miropolsky, Harvard School of Public Health, United States
Wendy Garrett, Harvard School of Public Health, United States
Curtis Huttenhower, Harvard School of Public Health, United States

Area Session Chair: Predrag Radivojac

Abstract:
Understanding how and why biomolecular activity differs among environmental conditions or disease phenotypes is one of the central questions addressed by high-throughput biology. Biomarker discovery, the process of finding and explaining these differences, has proven to be both a methodological challenge in high-dimensional statistics and biologically challenging to interpret. Metagenomics provides a new avenue for biomarker discovery, since changes in the composition and functional activity of microbial communities can provide insight into ecological differences among communities or diagnostic or prognostic power when applied to the human microbiome. We propose the LDA Effect Size (LEfSe) algorithm to discover and explain microbial and functional biomarkers in the human microbiota and other microbiomes. We demonstrate this method to be effective in mining human microbiomes for metagenomic biomarkers associated with mucosal tissues and with different levels of oxygen availability. Similarly, when applied to 16S rRNA gene data describing a murine ulcerative colitis gut community, LEfSe confirms the key role played by Bifidobacterium in this disease and suggests the involvement of additional clades including the Clostridia and Metascardovia. Finally, we provide characterizations of microbial functional activity from metagenomic community sequencing, comparing environmental bacterial and viral microbiomes and distinguishing the infant gut microbiota from adult. A quantitative validation of LEfSe in comparison to existing microbial biomarker discovery methods and to standard statistical approaches (including synthetic data) highlights a lower false positive rate, consistent ranking of biomarkers’ relevance, and concise representations of taxonomic and functional shifts in microbial communities associated with environmental conditions or disease phenotypes.

LBR16 Monday, July 18: 4:00 p.m. - 4:25 p.m.

The Human gut Ecosystem: Gut Microbiome and Host Transcriptome in Breast-fed vs. Formula-fed Infants
Room: Hall F1
Presenting author: Iddo Friedberg, Miami University, United States

Additional authors:
Iddo Friedberg, Miami University, United States
Scott Schwartz, Texas A&M University, United States
Ivan Ivanov, Texas A&M University, United States
Laurie Davidson, Texas A&M University, United States
Jennifer Goldsby, Texas A&M University, United States
David Dahl, Texas A&M University, United States
Edward Dougherty, Texas A&M University, United States
Damir Herman, University of Arkansas, United States
Sharon Donovan, University of Illinois Urbana-Champaign, United States
Robert Chapkin, Texas A&M University, United States

Area Session Chair: Predrag Radivojac

Abstract:
We present a study of the gut microbiome and gut transcriptome of breast-fed (BF) and formula-fed (FF) babies, at the age of three months. Here we have established the feasibility of non-invasive data collection to evaluate the impact of nutritional and other environmental exposures on the developing infant gut-microbial ecosystem. At the same time, we have developed a robust computational pipeline for revealing significant correlations between commensal microbiome and the host transcriptome. Two interesting findings are that Firmicutes and Actinobacter are bacterial phyla that differentially colonize FF and BF babies. Also, the gut microbiome of FF babies contains more matches to virulence genes. At the same time, FF babies seem to have a more activated innate immune system as evidenced by their gut transcriptome.
In the future, the statistical methodology presented here can be used to study other conditions that are associated with the gut microbiome both in infants and in adults, including obesity, inflammatory bowel diseases and colon cancer.

LBR17 Tuesday, July 19: 10:45 a.m. - 11:10 a.m.

STAGR: Software To Annotate Genome Rearrangement
Room: Hall F1
Presenting author: Egor Dolzhenko, Princeton University, United States

Area Session Chair: Predrag Radivojac

Abstract:
Non-evolutionary sequence rearrangements arise in a number of different biological contexts, including somatic recombination, chromatin diminution, chromothripsis, trans-splicing of RNA transcripts, and development of the product somatic genome from the precursor germline genome in the binucleate ciliates. The algorithm that we have developed is capable of finding and annotating both the rearranged and nonrearranged pieces in the precursor genome and product genome or transcriptome for all of these cases, and may be useful for identifying novel rearrangements where no specialized software exists. The rearranged sequence segments in the scrambled precursor genome of the ciliate Oxytricha present the most challenging case for annotating DNA rearrangements. In Oxytricha, our lab model organism, long noncoding RNA templates help piece together and rearrange DNA segments derived from the precursor genome, while deleting intervening regions and as much as 95% of the precursor genome.

In the product genome, the product segments contain short overlapping regions called pointers. The ability to map and annotate the locations of product and precursor segments is critical in understanding the complex developmental processes through which the precursor germline genome develops into the mature, product genome. We have developed and implemented a method for annotating the precursor and product genomes that reduces the problem to finding a maximal weight matching in a bipartite graph.
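
The reduction to bipartite matching can be illustrated on a tiny instance. The sketch below is not STAGR's algorithm: it assumes a complete bipartite graph in which every product segment must be paired with a distinct precursor segment, uses exhaustive search (practical only for a handful of segments), and the weights are hypothetical alignment scores.

```python
from itertools import permutations

def max_weight_matching(weights):
    """Exhaustive maximum-weight matching.

    weights[i][j] is the score for pairing product segment i with
    precursor segment j; requires len(weights) <= len(weights[0]).
    """
    n_left, n_right = len(weights), len(weights[0])
    best_score, best_pairs = float("-inf"), []
    for cols in permutations(range(n_right), n_left):
        total = sum(weights[i][j] for i, j in enumerate(cols))
        if total > best_score:
            best_score, best_pairs = total, list(enumerate(cols))
    return best_score, best_pairs

# Hypothetical scores: rows are product segments, columns precursor segments.
weights = [
    [10, 9, 1],
    [9, 1, 1],
    [1, 1, 5],
]

score, pairs = max_weight_matching(weights)
print(score, pairs)
```

Taking the best-scoring pair first (product 0 with precursor 0) would give a total of 16 here, while the optimal matching reaches 23, which is why the problem is posed as a global matching rather than a greedy assignment. Realistic inputs would need a polynomial-time method such as the Hungarian algorithm.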

LBR18 Tuesday, July 19: 11:15 a.m. - 11:40 a.m.

Methods for Phylogenetic Inference of Multidomain Evolution
Room: Hall F1
Presenting author: Maureen Stolzer, Carnegie Mellon University, United States

Additional authors:
Maureen Stolzer, Carnegie Mellon University, United States
Minli Xu, Carnegie Mellon University, United States
Katherine Siewert, Carnegie Mellon University, United States
Benjamin Vernot, University of Washington, United States
Ravi Chinoy, Carnegie Mellon University, United States
Dannie Durand, Carnegie Mellon University, United States

Area Session Chair: Chad Myers

Abstract:
We present algorithms and software for phylogenetic analysis of multidomain protein families. These families evolve via domain shuffling; i.e., domain duplication, insertion, and deletion - processes not captured by current phylogenetic models. Phylogenetics is fundamental to evolutionary analysis, as well as biomedical applications such as function annotation, drug design, and model organism research. Phylogenetic methods for multidomain proteins are urgently needed because of their prevalence and their evolutionary and functional importance. Current phylogenetic methods cannot be applied to multidomain families with varied architectures. These methods rely on the implicit assumption that the entire sequence has the same evolutionary history. However, domains co-occurring in the same protein can have different histories; phylogenetic trees for these domains will have different topologies.

Our novel method exploits this phylogenetic incongruence to infer a most parsimonious history of domain duplications, losses, and insertions; the timing of these events; and the ancestral domain architectures. Key features of our approach include (1) explicit representation of domain shuffling events; (2) a model that captures sequence variation within domain families; and (3) algorithms to infer historical events and ancestral domain architectures from comparison of domain phylogenies.

We demonstrate the utility of our method with in-depth analyses of well-studied multidomain families. We further present a genome-scale analysis of all the domain families in human. Our results suggest that a remarkably greater amount of domain shuffling may have occurred than predicted by models that do not consider sequence variation and underscore the importance of accurate domain architecture reconstruction for homology-based function prediction.


LBR19 Tuesday, July 19: 11:45 a.m. - 12:10 p.m.

Epistasis Detection in Complex Traits: An Application to Bipolar Disorder
Room: Hall F1
Presenting author: Michael Mooney, Oregon Health & Science University, United States

Additional authors:
Michael Mooney, Oregon Health & Science University, United States
Beth Wilmot, Oregon Health & Science University, United States
Shannon McWeeney, Oregon Health & Science University, United States

Area Session Chair: Chad Myers

Abstract:
In recent years, large genome-wide association studies have been successful in revealing a number of genes thought to play a role in complex diseases. Yet, in most cases, the genetic associations discovered have accounted for only a small portion of the “genetic component” of these diseases, as the analyses were focused primarily on the effects of individual SNPs. In an attempt to reveal a greater portion of the genetic component of one such disease, bipolar disorder, we have adapted a machine learning technique, known as a genetic algorithm (GA), to search for multi-locus associations in the context of a large-scale genome-wide association study (GWAS). The GA is guided by the structure of a gene interaction network and is able to find statistically significant multi-locus associations without performing a prohibitively high number of statistical tests. In a GWAS data set from the NIMH-sponsored Bipolar Genome Study, consisting of 2033 cases and 1420 controls, we found three SNP pairs that were jointly associated with the disease (p-values: 7.48E-10, 1.17E-7 and 3.26E-7). We are in the process of attempting to replicate these results in other data sets.
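
The idea of a network-guided search can be illustrated with a stripped-down sketch. This is not the authors' algorithm: it is a mutation-only evolutionary search with elitist selection, the SNP labels, the network, and the association scores are all hypothetical, and a real analysis would score each pair with a statistical test on genotype data.

```python
import random


def evolve_pairs(network, fitness, generations=30, pop_size=8, seed=0):
    """Search for a high-scoring SNP pair, restricted to network edges.

    network: symmetric adjacency dict {snp: set_of_neighbouring_snps}.
    fitness: function mapping a sorted (snp_a, snp_b) tuple to a score.
    """
    rng = random.Random(seed)
    edges = sorted({tuple(sorted((a, b))) for a in network for b in network[a]})
    population = [rng.choice(edges) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        for a, b in survivors:
            # Mutation: keep one SNP and swap its partner for a network neighbour,
            # so every candidate that gets scored is an edge of the network.
            keep = rng.choice((a, b))
            partner = rng.choice(sorted(network[keep]))
            children.append(tuple(sorted((keep, partner))))
        population = survivors + children
    return max(population, key=fitness)


# Hypothetical network linking SNPs whose genes interact.
network = {
    "snp1": {"snp2", "snp3"},
    "snp2": {"snp1", "snp3", "snp4"},
    "snp3": {"snp1", "snp2"},
    "snp4": {"snp2"},
}
# Hypothetical association scores for each linked pair (higher is stronger).
scores = {
    ("snp1", "snp2"): 1.0,
    ("snp1", "snp3"): 0.5,
    ("snp2", "snp3"): 2.0,
    ("snp2", "snp4"): 3.5,
}
best = evolve_pairs(network, lambda pair: scores[pair])
print(best, scores[best])
```

Because candidates are only ever drawn from network edges, the number of pairs scored grows with the size of the network rather than with the square of the number of SNPs, which is the sense in which such a search avoids a prohibitive number of tests.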


LBR20 Tuesday, July 19: 12:15 p.m. - 12:40 p.m.

A rational approach for aiding directed evolution towards DNA specificity of site-specific recombinases
Room: Hall F1
Presenting author: Josephine Abi Ghanem, Biotechnology Center of the Technische Universität Dresden, Germany

Area Session Chair: Chad Myers

Abstract:
Cre recombinase has recently been evolved, by using substrate-linked protein evolution, to recognize a sequence present in the 5′-LTR and 3′-LTR (loxLTR) of an integrated HIV-1 provirus. In vitro evolution approaches are generally tedious and time-consuming. In addition, they are prone to lead to a diversity of binding profiles with considerable plasticity and relaxed specificity. On the other hand, the Cre/loxP system has been intensively studied structurally. The bulk of information makes the Cre/loxP system an ideal case to dissect the molecular mechanisms responsible for the specificity in Cre/DNA complexes and to be able to accelerate engineering processes. In this work we make use of computational approaches together with large-scale sequencing data obtained from in vitro evolution studies to engineer DNA specificity into a relaxed-specificity recombinase. From a large set of sequenced clones we first generated a “consensus recombinase”, C16, which contains 16 conserved mutations. C16 shows a relaxed specificity, recognizing both loxP and loxLTR. By applying molecular modelling and simulation techniques, we generated a new recombinase, C18, which contains two additional residues in the N-terminal domain predicted to switch target specificity. Strikingly, C18 showed virtually no recombination on loxP, while fully recombining the loxLTR sequence, demonstrating that rational design linked to substrate-linked protein evolution can be used to design DNA specificity into evolved recombinases.


LBR21 Tuesday, July 19: 2:30 p.m. - 2:55 p.m.

Co-expression network analysis reveals rewiring of maize transcriptome by domestication
Room: Hall F1
Presenting author: Roman Briskine, University of Minnesota, United States

Additional authors:
Roman Briskine, University of Minnesota, United States
Ruth Swanson-Wagner, University of Minnesota, United States
Robert Schaefer, University of Minnesota, United States
Peter Tiffin, University of Minnesota, United States
Nathan Springer, University of Minnesota, United States
Chad Myers, University of Minnesota, United States

Area Session Chair: Chad Myers

Abstract:
Traditional methods of comparative genomics exploit the abundance of sequence data to extend our knowledge about gene functions and evolutionary processes. However, gene expression profiling represents an attractive alternative for the development of complementary methods. One of the particularly interesting questions addressed by comparative genomics concerns the effects of domestication on wild progenitors. Standard sequence-based methods identified several thousand candidate genes that underlie the process of maize domestication from its wild ancestor, teosinte. Nevertheless, further research is necessary to narrow the list of candidates and extend the characterization of targeted genes. In this study, we built co-expression networks based on high-quality expression data from the diverse varieties of maize and teosinte. Global comparisons of the co-expression networks revealed evidence for transcriptome rewiring due to domestication. We also developed a novel bootstrapping approach to identify specific candidate genes that presumably were direct targets or downstream effects of selection during domestication. We show that our candidate list overlaps considerably with domestication genes identified through previous independent sequence-based analyses, an enrichment that was not obtained using existing approaches for comparative expression analysis. Our study demonstrates the utility of expression analysis applied to the question of maize domestication and provides new insights into robust methods for comparing expression patterns across species.
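
A global comparison of two co-expression networks can be sketched as follows. The abstract does not state which similarity measure or cutoff was used, so Pearson correlation with a fixed threshold is an assumption here, and the gene names and expression values are made up for illustration. The bootstrapping approach mentioned above is not reproduced.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expr, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    return {
        (g, h)
        for g, h in combinations(sorted(expr), 2)
        if pearson(expr[g], expr[h]) >= threshold
    }


# Hypothetical expression profiles (one value per sampled variety).
teosinte = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [1, 3, 2, 4],
}
maize = {
    "geneA": [1, 2, 3, 4],
    "geneB": [4, 1, 3, 2],
    "geneC": [2, 4, 6, 8],
}
teosinte_edges = coexpression_edges(teosinte)
maize_edges = coexpression_edges(maize)
lost = teosinte_edges - maize_edges    # co-expression seen only in teosinte
gained = maize_edges - teosinte_edges  # co-expression seen only in maize
print(lost, gained)
```

Edges that appear in only one of the two networks are the simplest signal of rewiring. A gene that participates in many such edges would be a natural candidate for closer inspection.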


LBR22 Tuesday, July 19: 3:00 p.m. - 3:25 p.m.

Scalable metabolic reconstruction for metagenomic data and the human microbiome
Room: Hall F1
Presenting author: Curtis Huttenhower, Harvard School of Public Health, United States

Additional authors:
Sahar Abubucker, Washington University, United States
Nicola Segata, Harvard School of Public Health, United States
Alyxandria Schubert, University of Michigan, United States
Beltran Rodriguez-Mueller, San Diego State University, United States
Jeremy Zucker, Broad Institute, United States
Human Microbiome Project consortium, United States
Patrick Schloss, University of Michigan, United States
Dirk Gevers, Broad Institute, United States
Makedonka Mitreva, Washington University, United States
Curtis Huttenhower, Harvard School of Public Health, United States

Area Session Chair: Chad Myers

Abstract:
Microbial communities carry out the majority of biochemical activity on the planet, and they play integral roles in metabolism and immune homeostasis in the human microbiome. Here, we describe HUMAnN, a methodology for determining the functional and metabolic potential of a microbial community by inferring pathways present or absent and their relative abundances directly from short metagenomic reads. We validated this methodology using a collection of four synthetic metagenomes, accurately determining the presence and abundance of pathways and outperforming standard best-hit approaches. Finally, we analyzed 741 samples drawn from 7 body sites on 103 individuals as part of the Human Microbiome Project (HMP), demonstrating the scalability of our methodology and the critical importance of microbial metabolism in the human microbiome. Previous studies have found that no organisms are present in all body sites or individuals; conversely, we find that 19 of 220 pathways are confidently present in every HMP community, and 53 are present in at least 90% of samples. This demonstrates a degree of functional consistency that is lacking at the organismal level - who's there varies, but what they're doing is more constant. Conversely, the relative abundances of most pathways vary among body sites but not individuals, suggesting that community function is dictated by microbial environment and less strongly by the host. This does not yet speak to host genetics, environment, or disease, as the HMP comprises a normal baseline of healthy individuals; each of these represents an additional area for future studies of microbial community function.
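
The prevalence figures quoted above (pathways present in every community, or in at least 90% of samples) come down to a simple count once presence calls are available. The sketch below assumes a hypothetical presence table and is not HUMAnN's own output format or code.

```python
def prevalent_pathways(presence, min_percent):
    """Return pathways present in at least min_percent of samples.

    presence: dict mapping pathway name to a list of booleans, one per sample.
    Integer arithmetic avoids floating-point issues at the cutoff.
    """
    return sorted(
        name
        for name, calls in presence.items()
        if sum(calls) * 100 >= min_percent * len(calls)
    )


# Hypothetical presence/absence calls for three pathways across ten samples.
presence = {
    "pathway1": [True] * 10,
    "pathway2": [True] * 9 + [False],
    "pathway3": [True] * 5 + [False] * 5,
}
core = prevalent_pathways(presence, 100)   # present in every sample
common = prevalent_pathways(presence, 90)  # present in at least 90% of samples
print(core, common)
```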
