Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide


Evolution and Comparative Genomics

COSI Track Presentations

Attention Presenters - please review the Speaker Information Page
Schedule subject to change
Monday, July 9th
10:15 AM-10:20 AM
EvolCompGen: Introduction
Room: Columbus CD
10:20 AM-10:40 AM
Proceedings Presentation: An evolutionary model motivated by physico-chemical properties of amino acids reveals variation among proteins
Room: Columbus CD
  • Edward L. Braun, University of Florida, United States

Presentation Overview:

Motivation: The relative rates of amino acid interchanges over evolutionary time are likely to vary among proteins. Variation in those rates has the potential to reveal information about constraints on proteins. However, the most straightforward model that could be used to estimate relative rates of amino acid substitution is parameter-rich and it is therefore impractical to use for this purpose.
Results: A six-parameter model of amino acid substitution that incorporates information about the physicochemical properties of amino acids was developed. It showed that amino acid side chain volume, polarity, and aromaticity have major impacts on protein evolution. It also revealed variation among proteins in the relative importance of those properties. The same general approach can be used to improve the fit of empirical models, like the commonly used PAM and LG models.
Availability: Perl code and test data are available from https://github.com/ebraun68/sixparam
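To illustrate how a handful of physicochemical parameters can stand in for a parameter-rich rate matrix, the sketch below scores each amino acid pair with an assumed exchangeability, exp(-Σ w·|Δproperty|), where each weight w acts as one free parameter. The functional form, the property names and the toy values are assumptions for this example, not the published six-parameter model (which is distributed as Perl code); in a real analysis the weights would be fitted to alignments by maximum likelihood.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical property table: amino acid -> {property name -> value}
Properties = Dict[str, Dict[str, float]]


def exchangeability(a: str, b: str, props: Properties,
                    weights: Dict[str, float]) -> float:
    """Relative exchangeability of a and b under an ASSUMED form.

    exp(-sum_p w_p * |x_a,p - x_b,p|): a larger weight means differences in
    that property (e.g. volume, polarity, aromaticity) suppress the
    interchange more strongly.
    """
    if a == b:
        raise ValueError("defined only for distinct amino acids")
    penalty = sum(w * abs(props[a][p] - props[b][p]) for p, w in weights.items())
    return math.exp(-penalty)


def relative_rates(aas: Iterable[str], props: Properties,
                   weights: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Symmetric relative rates for all ordered pairs, scaled so the maximum is 1."""
    aas = list(aas)
    raw = {(a, b): exchangeability(a, b, props, weights)
           for a in aas for b in aas if a != b}
    top = max(raw.values())
    return {pair: r / top for pair, r in raw.items()}


# Toy property values for three placeholder residues (not real measurements).
toy = {"X": {"volume": 0.0, "polarity": 0.0},
       "Y": {"volume": 1.0, "polarity": 0.0},
       "Z": {"volume": 1.0, "polarity": 2.0}}
rates = relative_rates("XYZ", toy, {"volume": 1.0, "polarity": 0.5})
```

Raising one property's weight lowers the relative rate of every pair that differs in that property, so comparing fitted weights between proteins is one way to express the among-protein variation described above.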

10:40 AM-10:50 AM
Spatial Patterns of Substitutions in Bacterial Genomes
Room: Columbus CD
  • Daniella F Lato, McMaster University, Canada

Presentation Overview:

Increasing evidence supports the notion that different regions of a genome can have distinct molecular properties. This variation is abundant in bacterial genomes, where gene expression and essentiality decrease, and mutation rate increases, as genomic positions lie further from the origin of replication. There is limited research on how molecular trends, like substitution rate, vary across different genome structures, such as linear, circular, and multi-replicon genomes. Here, we mapped extant and ancestral substitutions onto the phylogenies of Escherichia coli, Bacillus subtilis, Streptomyces, and Sinorhizobium meliloti, quantifying how many substitutions occurred at each extant and ancestral position of the genome. Previous work indicates that the number of substitutions should increase with distance from the origin. Our analysis, instead, demonstrates that the number of substitutions decreases with distance from the origin of replication in most of the bacteria analyzed. The exceptions were the pSymB replicon of S. meliloti and the chromosome of Streptomyces, in which the number of substitutions increased with distance from the origin. We aim to explore how these trends affect the functional categories of genes, their placement within the genome, and how substitutions may affect the fitness of an individual.
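Trends like these rest on mapping each substitution to its distance from the origin. On a circular replicon that distance is the shorter of the two arcs around the chromosome; for a linear replicon such as the Streptomyces chromosome, |pos − origin| would be used instead. The sketch below computes the circular distance, bins substitution positions by it, and fits a least-squares slope of counts against distance. The origin coordinate, bin width and input format are assumptions, and it does not reproduce the authors' ancestral mapping.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def distance_from_origin(pos: int, origin: int, genome_len: int) -> int:
    """Shortest distance (bp) from pos to the origin on a circular replicon."""
    d = abs(pos - origin) % genome_len
    return min(d, genome_len - d)


def binned_counts(positions: Iterable[int], origin: int, genome_len: int,
                  bin_size: int) -> List[Tuple[int, int]]:
    """Substitution counts per distance bin, as (bin start, count) pairs."""
    n_bins = (genome_len // 2) // bin_size + 1
    counts = Counter(distance_from_origin(p, origin, genome_len) // bin_size
                     for p in positions)
    return [(i * bin_size, counts[i]) for i in range(n_bins)]


def slope(points: List[Tuple[int, int]]) -> float:
    """Ordinary least-squares slope of count versus distance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

A negative slope corresponds to fewer substitutions farther from the origin. Note that the last bin may be narrower than the others, which a careful analysis would normalize for.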

10:50 AM-11:00 AM
Tracing the Ancestry of Operons in Bacteria
Room: Columbus CD
  • Huy Nguyen, Iowa State University, United States
  • Ashish Jain, Iowa State University, United States
  • Oliver Eulenstein, Iowa State University, United States
  • Iddo Friedberg, Iowa State University, United States

Presentation Overview:

Complexity is a fundamental attribute of life. Complex systems are made of parts that together perform functions that a single component, or most subsets containing individual components, cannot. Examples of complex molecular systems in bacteria include protein structures such as the F1F0-ATPase, the ribosome, and the flagellar motor: each of these structures requires most or all of its components to function properly. At the molecular level, operons are a classic example of a complex system. An operon's genes are co-transcribed under the control of a single promoter into a polycistronic mRNA molecule, and their gene products form molecular complexes or metabolic pathways. With the large number of complete bacterial genomes available, we now have the opportunity to examine the evolution of operons and identify possible intermediate states. In this work, we develop a simple vertical evolution model of how operons evolve from individual component genes and orthologous gene blocks (orthoblocks). Based on this model, we present two maximum-parsimony algorithms to reconstruct ancestral operon states. Having reconstructed ancestral states, we identify intermediate functional forms and possible exaptations in reconstructed ancestors of operons.
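The abstract does not spell out its two parsimony algorithms. As a generic illustration of maximum-parsimony ancestral reconstruction, the sketch below applies Fitch's algorithm to a single presence/absence character (for example, whether a given gene belongs to an orthoblock) on a rooted binary tree. The nested-tuple tree encoding and the tie-breaking rule in the top-down pass are choices made for this example.

```python
from typing import Dict, Optional, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[int, Dict[Tree, str]]:
    """Fitch small parsimony for one character.

    Returns the minimum number of state changes and one most-parsimonious
    assignment of states to every node (leaf names must be unique).
    """
    candidate: Dict[Tree, Set[str]] = {}
    changes = 0

    def up(node: Tree) -> Set[str]:
        # Bottom-up pass: intersect child sets, or take the union at a cost of 1.
        nonlocal changes
        if isinstance(node, str):
            s = {states[node]}
        else:
            left, right = up(node[0]), up(node[1])
            s = left & right
            if not s:
                s = left | right
                changes += 1
        candidate[node] = s
        return s

    assignment: Dict[Tree, str] = {}

    def down(node: Tree, parent: Optional[str]) -> None:
        # Top-down pass: keep the parent's state when allowed, else pick one.
        s = candidate[node]
        state = parent if parent in s else min(s)
        assignment[node] = state
        if not isinstance(node, str):
            down(node[0], state)
            down(node[1], state)

    up(tree)
    down(tree, None)
    return changes, assignment


tree = (("A", "B"), ("C", "D"))
cost, ancestral = fitch(tree, {"A": "1", "B": "1", "C": "0", "D": "0"})
```

Running the same function on each gene of an orthoblock gives per-node presence/absence states from which partial (intermediate) gene blocks in ancestors could be read off.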

11:00 AM-11:10 AM
A novel MMR pathway in prokaryotes
Room: Columbus CD
  • Ana Maria Rojas, CSIC-IBIS, Spain
  • Jesus Blazquez, CNB-CSIC, Spain

Presentation Overview:

The mismatch repair (MMR) pathway is essential for maintaining genome stability. While MutS and MutL are essential for performing the initial steps of the pathway, these proteins are missing in many Archaea, most Actinobacteria, and other prokaryotes. However, these organisms exhibit spontaneous mutation rates similar to those of organisms bearing the MMR proteins.

We have reported NucS as an endonuclease involved in MMR with no structural homology to known MMR factors. Through genetic screens, we found [1] that this protein is required for mutation avoidance and anti-recombination, hallmarks of canonical MMR, in the surrogate model Mycobacterium smegmatis, which lacks the classical MutS-MutL factors.

Structural bioinformatics and evolutionary studies of NucS indicate a complex assembly of the pathway, involving at least two horizontal gene transfers, that led to a dispersed distribution pattern in prokaryotes.
Together, these findings indicate that distinct pathways for MMR have evolved at least twice in nature.

11:10 AM-11:20 AM
Generalist species drive microbial dispersion and evolution.
Room: Columbus CD
  • Sira Sriswasdi, The University of Tokyo, Japan
  • Ching-Chia Yang, The University of Tokyo, Japan
  • Wataru Iwasaki, The University of Tokyo, Japan

Presentation Overview:

Microbes form the fundamental basis of every ecosystem on Earth. As key survival strategies, some microbes adapt to broad ranges of environments, while others specialize in certain habitats. While the ecological roles and properties of such "generalists" and "specialists" have been examined in individual ecosystems, the general principles that govern their distribution patterns and evolutionary processes have not been characterized. Here, we systematically identified microbial generalists and specialists across 61 environments via meta-analysis of community sequencing data sets and reconstructed their evolutionary histories across diverse microbial groups using the Binary-State Speciation and Extinction (BiSSE) model. This revealed that generalist lineages have 19-fold higher speciation rates and a significant persistence advantage over specialists. Yet, we also detected three-fold more frequent generalist-to-specialist transformations than the reverse transformations. These results support a model of microbial evolution in which generalists play key roles in introducing new species and maintaining taxonomic diversity.

11:20 AM-11:30 AM
Evolution of the pangenome and core genome in prokaryotes
Room: Columbus CD
  • Itamar Sela, NCBI/NIH, United States
  • Yuri I Wolf, NCBI/NIH, United States
  • Eugene Koonin, NIH, United States

Presentation Overview:

Comparative analyses of complete genomes reveal variations in genomic content, even among closely related organisms. Notably, genome similarity decays with evolutionary distance. Any cluster of genomes is therefore associated with a pangenome and a core genome, which are, respectively, the total repertoire of genes in the cluster and the set of genes common to all genomes in the cluster. The distribution of gene frequencies typically follows an asymmetric U shape, consisting of a "cloud" of accessory genes, a "shell" of genes with intermediate frequencies, and the core of essential genes that are present in (almost) all genomes. Here, we study a minimal mathematical model for prokaryotic genome content evolution and analyze 34 clusters of closely related genomes. We relate the decay of genome similarity to the gene frequency distribution, and show that the latter can be reconstructed using the model. We find that selection plays a role in maintaining genes in the core genome.
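The quantities defined above can be computed directly from a presence/absence table. The sketch below assumes each genome is given as a set of gene-family identifiers (a hypothetical input format) and returns the pangenome, the core genome and the gene-frequency spectrum whose U shape is described above; it does not implement the authors' evolutionary model.

```python
from collections import Counter
from typing import Dict, Set


def pangenome_summary(genomes: Dict[str, Set[str]]):
    """Pangenome, core genome and gene-frequency spectrum of a genome cluster.

    genomes maps a genome name to the set of gene families it contains.
    """
    pangenome = set().union(*genomes.values())
    core = set.intersection(*genomes.values())
    # Number of genomes in which each gene family occurs
    freq = Counter(g for genes in genomes.values() for g in genes)
    # Spectrum: k -> number of families present in exactly k genomes
    spectrum = Counter(freq.values())
    return pangenome, core, dict(sorted(spectrum.items()))


genomes = {"g1": {"a", "b", "c"}, "g2": {"a", "b"}, "g3": {"a", "d"}}
pan, core, spectrum = pangenome_summary(genomes)
```

In the spectrum, the low-k end corresponds to the "cloud", intermediate k to the "shell", and k equal to the number of genomes to the core.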

11:30 AM-11:40 AM
A package for computing distance metrics for phylogenetic networks
Room: Columbus CD
  • Louxin Zhang

Presentation Overview:

Invited Talk

11:40 AM-12:00 PM
Proceedings Presentation: Deconvolution and phylogeny inference of structural variations in tumor genomic samples
Room: Columbus CD
  • Jesse Eaton, Carnegie Mellon University, United States
  • Jingyi Wang, Carnegie Mellon University, United States
  • Russell Schwartz, Carnegie Mellon University, United States

Presentation Overview:

Motivation: Phylogenetic reconstruction of tumor evolution has emerged as a crucial tool for making sense of the complexity of emerging cancer genomic data sets. Despite the growing use of phylogenetics in cancer studies, though, the field has only slowly adapted to the many ways in which tumor evolution differs from classic species evolution. One crucial question in that regard is how to handle inference of structural variations (SVs), which are a major mechanism of evolution in cancers but have been largely neglected in tumor phylogenetics to date, in part due to the challenges of reliably detecting and typing SVs and interpreting them phylogenetically.

Results: We present a novel method for reconstructing evolutionary trajectories of SVs from bulk whole-genome sequence data via joint deconvolution and phylogenetics, to infer clonal subpopulations and reconstruct their ancestry. We establish a novel likelihood model for joint deconvolution and phylogenetic inference on bulk SV data and formulate an associated optimization algorithm. We demonstrate the approach to be efficient and accurate for realistic scenarios of SV mutation on simulated data. Application to breast cancer genomic data from The Cancer Genome Atlas (TCGA) shows it to be practical and effective at reconstructing features of SV-driven evolution in single tumors.
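Deconvolution methods of this kind commonly assume that each bulk sample is a mixture of clones, so the observed copy number of a genomic segment is the fraction-weighted sum of the clones' copy numbers. The forward model below illustrates that assumption with hypothetical function names; it is not the TUSV likelihood or its optimization algorithm, which would search over clone copy numbers, SVs and fractions jointly.

```python
from typing import List, Sequence


def mix_copy_numbers(fractions: Sequence[float],
                     clone_cn: Sequence[Sequence[float]]) -> List[float]:
    """Expected bulk copy number per segment under a linear mixture.

    fractions[k]   -- proportion of clone k in the sample (must sum to 1)
    clone_cn[k][s] -- copy number of segment s in clone k
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("clone fractions must sum to 1")
    n_segments = len(clone_cn[0])
    return [sum(f * cn[s] for f, cn in zip(fractions, clone_cn))
            for s in range(n_segments)]


def squared_error(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Simple misfit between observed and model-predicted copy numbers."""
    return sum((o - e) ** 2 for o, e in zip(observed, expected))
```

For example, a sample that is 25% normal cells (copy number 2 everywhere) and 75% a tumor clone with an amplified second segment (copy number 4) is expected to show copy numbers [2.0, 3.5]; inference runs this model in reverse.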

Availability and Implementation: Python source code and associated documentation are available at https://github.com/jaebird123/tusv

Contact: Russell Schwartz (russells@andrew.cmu.edu)

12:00 PM-12:10 PM
Inferring Parsimonious Migration Histories for Metastatic Cancers
Room: Columbus CD
  • Mohammed El-Kebir, University of Illinois at Urbana-Champaign, United States
  • Gryte Satas, Brown University, United States
  • Ben Raphael, Princeton University, United States

Presentation Overview:

Metastasis is the migration of cancerous cells from a primary tumor to other anatomical sites. While metastasis was long thought to result from monoclonal seeding, or single cellular migrations, recent phylogenetic analyses of metastatic cancers have reported complex patterns of cellular migrations between sites, including polyclonal migrations and reseeding. However, accurate determination of migration patterns from somatic mutation data is complicated by intra-tumor heterogeneity and discordance between clonal lineage and cellular migration. We introduce MACHINA, a multi-objective optimization algorithm that jointly infers clonal lineages and parsimonious migration histories of metastatic cancers from DNA sequencing data. MACHINA analysis of data from multiple cancers reveals that migration patterns are often not uniquely determined from sequencing data alone, and that complicated migration patterns among primary tumors and metastases may be less prevalent than previously reported. MACHINA’s rigorous analysis of migration histories will aid in studies of the drivers of metastasis.
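Migration parsimony is measured on a clone tree whose vertices are labeled with anatomical sites: every edge whose endpoints carry different sites implies a migration. The sketch below counts migrations and lists the distinct (source, target) site pairs for one given labeling. The edge-list input and the function name are assumptions; MACHINA also searches over labelings and clonal lineages and scores further criteria (such as comigrations and seeding sites), which this does not.

```python
from typing import Dict, List, Set, Tuple


def migration_summary(edges: List[Tuple[str, str]], site: Dict[str, str]):
    """Count migrations implied by a site labeling of a clone tree.

    edges -- (parent clone, child clone) pairs
    site  -- clone -> anatomical site label
    """
    migrations = [(site[u], site[v]) for u, v in edges if site[u] != site[v]]
    seeding_sites: Set[str] = {src for src, _ in migrations}
    return len(migrations), sorted(set(migrations)), sorted(seeding_sites)


edges = [("r", "a"), ("r", "b"), ("a", "c")]
labels = {"r": "P", "a": "P", "b": "M1", "c": "M1"}
n_migrations, site_pairs, seeding = migration_summary(edges, labels)
```

Here two separate edges carry cells from the primary site P to M1, which is the signature of polyclonal seeding; a single P-to-M1 edge would indicate monoclonal seeding.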

12:10 PM-12:20 PM
Genetic and transcriptional instability alters cancer cell line drug response
Room: Columbus CD
  • Benjamin Siranosian, Harvard University, United States
  • Uri Ben-David, Harvard University, United States
  • Rameen Beroukhim, Harvard University, United States
  • Todd Golub, Harvard University, United States

Presentation Overview:

Inconsistencies in cell line-based studies jeopardize reproducibility of cancer research. Natural evolution leading to genetic and transcriptional heterogeneity within cancer cell lines may contribute to such inconsistencies. We therefore performed genetic and transcriptomic characterization of 27 strains of the common breast cancer cell line MCF7 and assessed the response of those strains to 321 anti-cancer compounds. We found extensive genomic variation across strains, resulting in disparate drug responses. Genetic variation occurred at all levels – point mutations, rearrangements, and copy number changes – and affected multiple oncogenes and tumor suppressor genes. Similar observations were obtained across 23 strains of the lung cancer cell line A549, as well as in multiple other cell lines, indicating that genomic variation is a general property of cancer cell lines. Genetic changes resulted in substantial differences in gene expression programs, cell morphology and proliferation, and strikingly high variability in drug response. Over 75% of the compounds that exhibited strong activity in some of the strains were completely inactive in others. Genomic analyses of single cell-derived clones showed that ongoing instability quickly translated into cell line heterogeneity, even in cell populations originating from a single cell. These findings have broad practical implications for cell line-based research.

12:20 PM-12:30 PM
Regional transmission and antibiotic resistance evolution of the hospital pathogen Klebsiella pneumoniae
Room: Columbus CD
  • Zena Lapp, University of Michigan, United States
  • Jennifer Han, University of Pennsylvania, United States
  • Evan Snitkin, University of Michigan, United States

Presentation Overview:

Transmission and evolution of multi-drug resistant organisms (MDROs) represent a global public health threat. The CDC considers the MDRO carbapenem-resistant Klebsiella pneumoniae (CRKP) an urgent threat due to limited treatment options. We aim to understand how this nosocomial pathogen spreads between hospitals and evolves antibiotic resistance. Whole genome sequences (WGS) of ~400 isolates from 11 long-term acute care hospitals in the Los Angeles area were supplemented with geographically diverse isolates to understand how CRKP entered and spread through the region. Variants associated with antibiotic resistance were overlaid on the transmission network to discern where resistance emerged and how it spread between facilities. Using WGS, we elucidated a regional pathogen transmission network in an endemic setting. Transmission into the area occurred multiple times and although inter-facility transmission is common, intra-facility transmission drives disease prevalence. Resistance to the last-line drug colistin arose and disseminated through single nucleotide variants, indels, and large insertions in known resistance genes. This research demonstrates that WGS of pathogen isolates provides sufficient information to reconstruct transmission networks in endemic areas and thus could guide infection prevention efforts. Additionally, the enhanced understanding of antibiotic resistance evolution provided by WGS may inform antibiotic stewardship to reduce MDRO emergence and prevalence.

12:30 PM-12:40 PM
A non-parametric statistical test for determining fine-scale temporal variation patterns in evolving populations
Room: Columbus CD
  • Minjung Kwak, Yeungnam University, South Korea
  • Seokwoo Kang, Pusan National University, South Korea
  • Giltae Song, Pusan National University, South Korea

Presentation Overview:

Abnormal variations are frequent in the clonal genome evolution of cancers. Such aberrant variations often act as drivers of cancer cell growth. However, the fundamental evolutionary dynamics underlying these variations in tumor metastasis remain understudied owing to their genetic complexity.

Whole genome sequencing now makes it possible to determine genome variations in the short-term evolution of cell populations. This approach has been applied to evolving populations of unicellular organisms, including yeast. Examining sequence changes at such fine-scale resolution represents substantial progress in evolutionary genomics. However, existing statistical tests for analyzing temporal changes in variation across multiple time points are limited in their ability to identify the full spectrum of intermediate changes.

We designed a new statistical approach based on the Kolmogorov-Smirnov test and integrated it into a software tool for determining variation patterns at fine-scale temporal resolution in experimental evolution studies. We validated our method using simulated data and used our software tool to analyze genomes of the yeast (Saccharomyces cerevisiae) W303 strain from 40 populations at 12 time points.
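The core quantity in a Kolmogorov-Smirnov comparison is the largest gap between two empirical distribution functions, for example the variant allele frequencies of one population at two time points. A minimal standard-library computation of the two-sample statistic follows. How the authors combine such comparisons across 12 time points and assess significance is not described in the abstract, so the p-value step is omitted.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # Both empirical CDFs are step functions, so the supremum is attained
    # at one of the observed values.
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d
```

A D close to 0 means the two frequency distributions are nearly identical; a D of 1 means they do not overlap at all.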

12:40 PM-2:00 PM
Lunch Break
2:00 PM-2:20 PM
Proceedings Presentation: Accurate prediction of orthologs in the presence of divergence after duplication
Room: Columbus CD
  • Manuel Lafond, University of Ottawa, Canada
  • Mona Meghdari Miardan, University of Ottawa, Canada
  • David Sankoff, University of Ottawa, Canada

Presentation Overview:

Motivation: When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences for the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we make the distinction between the primary orthologs, which have not been affected by accelerated mutation rates on their evolutionary path, and the secondary orthologs, which have.
Similarity-based prediction methods will tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary and secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types.
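For contrast with a hybrid approach, a common similarity-based baseline is the reciprocal best hit (RBH) criterion: genes a and b are called orthologs when each is the other's highest-scoring match. The sketch below assumes precomputed pairwise similarity scores between two genomes (for example, alignment bit scores) and is not part of HyPPO. It shows how a fast-evolving secondary ortholog, whose scores drop after divergence, can be missed.

```python
from typing import Dict, List, Tuple


def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """scores[(a, b)]: similarity of gene a (genome A) to gene b (genome B)."""
    best_a: Dict[str, Tuple[float, str]] = {}
    best_b: Dict[str, Tuple[float, str]] = {}
    for (a, b), s in scores.items():
        if a not in best_a or s > best_a[a][0]:
            best_a[a] = (s, b)
        if b not in best_b or s > best_b[b][0]:
            best_b[b] = (s, a)
    # Keep pairs where the best hit is mutual.
    return sorted((a, b) for a, (_, b) in best_a.items() if best_b[b][1] == a)


scores = {("a1", "b1"): 100, ("a1", "b2"): 90,
          ("a2", "b2"): 40, ("a2", "b1"): 60}
rbh = reciprocal_best_hits(scores)
```

In this toy input, a2 and b2 are left unpaired because each scores higher against the slowly evolving copies, which is the failure mode for secondary orthologs described above.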

Results: We formalize the notion of divergence after duplication, and provide a theoretical basis for the inference of primary and secondary orthologs.
We then put these ideas into practice with the HyPPO (Hybrid Prediction of Paralogs and Orthologs) framework, which combines ideas from both similarity and phylogeny approaches. We apply our method to simulated and empirical datasets, and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs.
Availability: HyPPO is a modular framework with a core developed in Python, and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO.

2:20 PM-2:30 PM
Testing the “ortholog conjecture” with phylogenetic and pairwise methods under realistic simulations of evolution after duplication, and after fish whole genome duplication
Room: Columbus CD
  • Tina Begum, University of Lausanne, Switzerland
  • Martha Liliana Serrano-Serrano, University of Lausanne, Switzerland
  • Marc Robinson-Rechavi, University of Lausanne, Switzerland

Presentation Overview:

In comparative genomics, phylogenetic analyses of gene expression have great potential for addressing a wide range of biological questions. A notable example is the recently controversial “Ortholog Conjecture” (OC), which posits that paralogs evolve functionally faster than orthologs. Using pairwise comparisons of tissue specificity, we previously reported support for the OC [1]. Using the phylogenetic comparative method (PCM) on time-calibrated empirical gene trees, Dunn et al. [2] found that support for the OC was lost. We performed simulations under different models of evolution overlooked by Dunn et al., notably changes in sequence evolutionary rate and models of functional shift. Although PCM is more sensitive, concomitant increases in the evolutionary rates of sequences and of the trait after duplication could lead to a false rejection of the OC. Interestingly, the pairwise method outperforms PCM in this scenario for all models of evolution. To revalidate our result empirically, we used gene trees of fish whole-genome duplicates and transcriptomes from 11 fish species. Contra Dunn et al. [2], we found that the OC holds true for fish ohnologs.

[1] Kryuchkova-Mostacci N, Robinson-Rechavi M (2016). PLoS Comput Biol. 12: e1005274.
[2] Dunn CW, Zapata F, Munro C, Siebert S, Hejnol A (2018). Proc Natl Acad Sci U S A. http://www.pnas.org/cgi/doi/10.1073/pnas.1707515115
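Pairwise tests of the conjecture compare a tissue-specificity score between members of a gene pair. A widely used score is the τ index introduced by Yanai et al.: τ = Σ_i (1 − x_i / max_j x_j) / (n − 1) over n tissues. Assuming τ is the kind of score meant here, it can be computed as below; expression values are assumed to be already normalized (for example, log-transformed), and the pair comparison function is a hypothetical helper.

```python
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index: 0 = uniform expression, 1 = one tissue only."""
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - x / peak for x in expression) / (n - 1)


def tau_difference(gene1: Sequence[float], gene2: Sequence[float]) -> float:
    """Absolute specificity difference for a gene pair (ortholog or paralog)."""
    return abs(tau(gene1) - tau(gene2))
```

Under the conjecture, paralog pairs would be expected to show larger τ differences than ortholog pairs at comparable divergence.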

2:30 PM-2:40 PM
Reconstructing the history of syntenies through SuperReconciliation
Room: Columbus CD
  • Nadia El-Mabrouk, University of Montreal, Canada
  • Katharina T. Huber, University of East Anglia, Norwich, United Kingdom
  • Manuel Lafond, University of Ottawa, Canada
  • Vincent Moulton, University of East Anglia, Norwich, United Kingdom
  • Miguel Sautie Castellanos, University of Montreal, Canada

Presentation Overview:

Gene and species tree reconciliation is used to recover the history of gene gain and loss explaining the evolution of gene families. It consists of an embedding of a gene tree, usually obtained from a gene sequence alignment, into a species tree.

However, reconciliation considers each gene family individually, assuming that each family evolves independently. While this assumption is reasonable for genes that are far apart in the genome, it is clearly not suited for genes appearing grouped in syntenic blocks, such as operons in bacteria, which are more plausibly the result of segmental gene gain and loss events.

In this presentation, we will introduce the SuperReconciliation concept for inferring the most parsimonious history of segmental gain and loss events leading to a set of present-day syntenies from a single ancestral one. In other words, we will extend the Duplication-Loss reconciliation model to the reconciliation of a set of trees, accounting for segmental duplications and losses.

From a complexity point of view, the problem is shown to be NP-hard. We will present an exact algorithm that solves the problem, and a proof of concept on the opioid receptor genes.
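The classical duplication-loss reconciliation that SuperReconciliation generalizes maps each gene-tree node to the lowest common ancestor (LCA) of the species below it; a node mapped to the same species-tree node as one of its children is a duplication. The sketch below implements that LCA mapping for a single gene tree encoded as nested tuples. Labeling gene leaves directly by species name is an assumption of the example, and losses and segmental events are not handled.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def species_clades(species_tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of all nodes of the species tree."""
    out: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = walk(node[0]) | walk(node[1])
        out.add(leaves)
        return leaves

    walk(species_tree)
    return out


def lca(clades: Set[FrozenSet[str]], species: FrozenSet[str]) -> FrozenSet[str]:
    """Smallest species-tree clade containing every species in `species`."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Duplication nodes under the LCA mapping; gene leaves are species names."""
    clades = species_clades(species_tree)
    dups = 0

    def walk(node: Tree) -> FrozenSet[str]:
        nonlocal dups
        if isinstance(node, str):
            return lca(clades, frozenset([node]))
        left, right = walk(node[0]), walk(node[1])
        here = lca(clades, left | right)
        if here in (left, right):  # same species node as a child: duplication
            dups += 1
        return here

    walk(gene_tree)
    return dups
```

For the species tree ((A, B), C), the gene tree ((A, B), (A, C)) contains one duplication at its root, since the root and its right child both map to the species-tree root.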

2:40 PM-2:50 PM
OMA standalone: orthology inference among public and custom genomes and transcriptomes
Room: Columbus CD
  • Christophe Dessimoz, University of Lausanne, Switzerland

Presentation Overview:

Genomes and transcriptomes are now typically sequenced by individual labs, but their analysis often remains challenging. One essential step in many analyses lies in identifying orthologs (corresponding genes across multiple species), but this is far from trivial computationally. The OMA (Orthologous MAtrix) database is a leading resource for identifying orthologs among publicly available complete genomes. In this talk, I will describe the OMA pipeline available as a stand-alone program for Linux and Mac. When run on a cluster, it has native support for the LSF, SGE, PBS Pro, and Slurm job schedulers and can scale up to thousands of parallel processes. Another key feature of OMA standalone is that users can combine custom data with existing public data by exporting precomputed genomes from the OMA database, which currently contains over 2000 complete genomes. I will showcase typical applications of OMA standalone. OMA Standalone is available at http://omabrowser.org/standalone under the permissive open-source Mozilla Public License Version 2.0.

2:50 PM-3:00 PM
Characterization of human orphan genes and their functions using annotated and unannotated genomic and massive RNA-seq data and metadata
Room: Columbus CD
  • Urminder Singh, Iowa State University, United States
  • Arun Seetharam, Iowa State University, United States
  • Zebulun Arendsee, Iowa State University, United States
  • Eve Wurtele, Iowa State University, United States

Presentation Overview:

How do new gene functions originate? This is one of the most intriguing questions in biology. Ancient genes change their functionality over time through gene duplication and divergence. More recently, it has become apparent that new genes (orphan genes) arise de novo from within the genome. To decipher the origin of new genes and understand how they may function, we conducted an in-depth analysis of the human genome and transcriptome. We compiled a list of putative orphan genes in the human genome using genes from Ensembl (v. 91), orphan genes from the literature, and 10,000 randomly selected ORFs longer than 200 nt, some of which might be unannotated orphan genes. We created a pipeline to download 8,619 runs of public RNA-seq data (~15 TB) and metadata from the NCBI SRA/GEO databases, and mapped the human transcriptome from Ensembl and our list of putative orphan genes to these data to obtain a comprehensive expression profile for human transcripts. Meta-analysis of these data showed that orphan genes and some unannotated ORFs are transcribed at rates similar to those of ancient genes. Our work uses massive amounts of RNA-seq data and metadata to provide a powerful approach to identifying potential new genes and developing hypotheses about their functions.
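Selecting ORFs above a length cutoff is the first step in building a random ORF set like the one described above. The sketch below scans the three forward reading frames of a DNA string for ATG-initiated ORFs that end at an in-frame stop codon and keeps those longer than a minimum length. Scanning only the forward strand, requiring ATG starts, taking the first ATG before each stop, and counting length including the stop codon are simplifying assumptions, not the authors' stated criteria.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 200) -> List[Tuple[int, int]]:
    """(start, end) 0-based, half-open coordinates of forward-strand ORFs.

    Each ORF runs from the first ATG in a frame to the next in-frame stop
    codon (inclusive); only ORFs longer than min_len nucleotides are kept.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start > min_len:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

Random sampling from the resulting list (for example with `random.sample`) would then give a set of candidate ORFs to profile alongside annotated genes.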

3:00 PM-3:20 PM
Proceedings Presentation: Inference of Species Phylogenies from Bi-allelic Markers Using Pseudo-likelihood
Room: Columbus CD
  • Jiafan Zhu, Rice University, United States
  • Luay Nakhleh, Rice University, United States

Presentation Overview:

Phylogenetic networks represent reticulate evolutionary histories. Statistical methods for their inference under the multispecies coalescent have recently been developed. A particularly powerful approach uses data that consist of bi-allelic markers (e.g., single nucleotide polymorphism data) and allows for exact likelihood computations of phylogenetic networks while numerically integrating over all possible gene trees per marker. While the approach has good accuracy in terms of estimating the network and its parameters, likelihood computations remain a major computational bottleneck and limit the method's applicability. In this paper, we first demonstrate why likelihood computations of networks take orders of magnitude more time when compared to trees. We then propose an approach for inference of phylogenetic networks based on pseudo-likelihood using bi-allelic markers. We demonstrate the scalability and accuracy of phylogenetic network inference via pseudo-likelihood computations on simulated data. Furthermore, we demonstrate aspects of robustness of the method to violations in the underlying assumptions of the employed statistical model. Finally, we demonstrate the application of the method to biological data. The proposed method allows for analyzing larger data sets in terms of the numbers of taxa and reticulation events. While pseudo-likelihood had been proposed before for data consisting of gene trees, the work here uses sequence data directly, offering several advantages as we discuss.
The methods have been implemented in PhyloNet (http://bioinfocs.rice.edu/phylonet).
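Pseudo-likelihood replaces the full likelihood over all taxa with a product of likelihoods over small subsets of taxa, so each term involves only a few lineages. The sketch below uses taxon triplets; the subset size and the per-subset likelihood function are assumptions of this example, and the constant function in the usage line is a placeholder, not PhyloNet's model.

```python
import math
from itertools import combinations
from typing import Callable, Sequence, Tuple


def log_pseudo_likelihood(taxa: Sequence[str],
                          subset_likelihood: Callable[[Tuple[str, ...]], float],
                          k: int = 3) -> float:
    """Sum of log-likelihoods over all k-taxon subsets (composite likelihood).

    subset_likelihood is assumed to return the likelihood of the marker data
    restricted to one subset of taxa under a candidate network.
    """
    total = 0.0
    for subset in combinations(taxa, k):
        total += math.log(subset_likelihood(subset))
    return total


# Placeholder per-triplet likelihood, for illustration only.
score = log_pseudo_likelihood(["A", "B", "C", "D"], lambda s: 0.5)
```

The number of terms grows polynomially with the number of taxa (n choose 3 for triplets), whereas the cost of each exact likelihood term stays bounded, which is where the scalability gain comes from.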

3:20 PM-3:30 PM
A unified maximum likelihood approach to B cell repertoire phylogenetics
Room: Columbus CD
  • Kenneth B. Hoehn, Yale University, United States
  • Steven H. Kleinstein, Yale University, United States

Presentation Overview:

Affinity maturation in B cells is an evolutionary process of descent from a germline sequence through somatic hypermutation (SHM) and clonal selection. Because of the similarity between affinity maturation and evolution in natural populations, phylogenetics has a long history of use in studying B cell clonal lineages. Common applications include constructing B cell clonal lineage trees, quantifying SHM and clonal selection, and reconstructing unobserved intermediate sequences and germline alleles. However, affinity maturation presents problems for model-based phylogenetic analysis. SHM is biased by sequence context, violating important assumptions in most substitution models. We previously introduced the HLP17 substitution model, which accounted for this by explicitly modelling SHM bias. Unfortunately, this model is highly parametric, making parameter estimation imprecise on small lineages. To address this, we allow certain parameters to be shared among lineages within a repertoire, allowing for far greater power in parameter estimation, and for joint reconstruction of lineage trees using all available sequence information in the repertoire. We show how confidence intervals are estimated for these parameters, and how they may be used to detect processes such as dysregulation of clonal selection and SHM. All of these features are implemented within the program IgPhyML.
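Context dependence of SHM is often summarized by hotspot motifs such as WRC and its reverse complement GYW, where W = A/T, R = A/G and Y = C/T. The sketch below locates the targeted C (in WRC) or G (in GYW) in a nucleotide sequence, using regular-expression lookaheads so that overlapping motifs are all found. It only illustrates the kind of sequence-context bias that HLP17 models and is not part of IgPhyML.

```python
import re
from typing import List

# Zero-width lookaheads allow overlapping matches.
WRC = re.compile(r"(?=[AT][AG]C)")  # targeted C is at match start + 2
GYW = re.compile(r"(?=G[CT][AT])")  # targeted G is at match start


def hotspot_positions(seq: str) -> List[int]:
    """0-based positions of the targeted base in WRC/GYW hotspot motifs."""
    seq = seq.upper()
    pos = {m.start() + 2 for m in WRC.finditer(seq)}
    pos |= {m.start() for m in GYW.finditer(seq)}
    return sorted(pos)
```

A context-independent substitution model treats every position alike; a model like HLP17 instead lets the rate at positions such as these differ from the rest.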

3:30 PM-3:40 PM
Mirage: Splice-Aware Multiple Sequence Alignment of Protein Isoforms
Room: Columbus CD
  • Alex Nord, University of Montana, United States
  • Peter Hornbeck, Cell Signaling Technology, United States
  • Travis Wheeler, University of Montana, United States

Presentation Overview:

Alternative splicing presents a challenge for traditional multiple sequence alignment (MSA) tools when aligning protein isoforms. Mirage is a novel tool that accurately aligns isoforms by first mapping proteins to their encoding genomic sequence, and then aligning proteins to one another based on the genomic coordinates of their constitutive codons. Mirage's resulting MSAs display the underlying exonic structures of individual isoforms. Mirage combines original implementations of alignment and graph algorithms with existing software tools to maximize the number of protein sequences that successfully map back to their species' genomes. The memory overhead of Mirage is low enough to run on a standard desktop computer and the runtime of Mirage is competitive with popular MSA tools. The isoform MSAs produced by Mirage are significantly more accurate than those produced by existing MSA tools. Mirage is now being used for sequence alignment in phosphosite.org, a web service for understanding post-translational modification of protein sequences. Mirage alignments can help identify annotation errors and have revealed the ubiquity of "alternative reading frames" (ARFs) in which discrete exons encode multiple open reading frames as overlapping spliced segments of genomic sequence that are frameshifted. Mirage has identified putative ARFs in 7% of human genes.
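The central idea, aligning isoforms by the genomic coordinates of their codons rather than by residue similarity, can be sketched as follows. Each isoform is assumed to be already mapped to the genome as a list of (codon start coordinate, residue) pairs; the mapping step, where Mirage does most of its work, is not shown, and the input format is hypothetical.

```python
from typing import Dict, List, Tuple


def coordinate_msa(isoforms: Dict[str, List[Tuple[int, str]]]) -> Dict[str, str]:
    """Build an MSA whose columns are genomic codon coordinates.

    isoforms maps an isoform name to (codon start coordinate, residue) pairs.
    Residues encoded by the same codon land in the same column; '-' marks
    coordinates that an isoform does not use (e.g. a skipped exon).
    """
    columns = sorted({coord for pairs in isoforms.values() for coord, _ in pairs})
    rows = {}
    for name, pairs in isoforms.items():
        by_coord = dict(pairs)
        rows[name] = "".join(by_coord.get(c, "-") for c in columns)
    return rows


msa = coordinate_msa({"iso1": [(100, "M"), (103, "K"), (200, "L")],
                      "iso2": [(100, "M"), (200, "L")]})
```

In this sketch, a codon read in a shifted frame starts at a different coordinate and therefore falls into its own column, which is one way overlapping reading frames become visible in a coordinate-based alignment.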

3:40 PM-3:50 PM
Protein-Protein Interactions of Stress-Response Genes are Conserved in Subterranean and Fossorial Animals and Cluster Unambiguously to their Shared Ecology
Room: Columbus CD
  • Gon Carmi, Bar-Ilan University, Israel
  • Somnath Tagore, Bar-Ilan University, Israel
  • Alessandro Gorohovski, Bar-Ilan University, Israel
  • Aviad Sivan, Bar-Ilan University, Israel
  • Dorith Raviv-Shay, Bar-Ilan University, Israel
  • Milana Frenkel-Morgenstern, The Azrieli Faculty of Medicine, Bar-Ilan University, Israel

Presentation Overview:

Subterranean ecology provides a unique natural experiment in the convergent evolution of unrelated taxa due to shared physical conditions and/or stresses. A principal example is the evolution of subterranean mammals in comparison to fossorial animals. In this study, we present a comparative analysis of protein-protein interactions between stress-response genes among 14 subterranean and fossorial animals. We found complete sub-clustering of these animals according to their ecologies based on protein-protein interaction network properties. We developed and applied a novel ‘evolution of protein domains’ model of protein evolution, based on the ‘mix and merge’ of protein domains. We define an evolutionary event, translocation, as the reciprocal exchange of a protein domain between two groups of orthologues from different organisms, and applied this definition to an assembled comprehensive collection of 76 organisms. We found translocations of key protein domains that are involved in pathways for sensing and regulating responses to environmental cues such as light and oxygen. Proteins containing these domains were found to be hubs in their PPI sub-networks. The distribution of hubs in ecology-specific protein-protein sub-networks accounts for the sub-clustering by ecology.
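Hubs in a PPI network are usually defined by degree, but the abstract does not state the criterion used. A minimal sketch, assuming an undirected edge list and a hypothetical cutoff that flags the top fraction of proteins by number of distinct interaction partners:

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def hubs(edges: Iterable[Tuple[str, str]], top_fraction: float = 0.1) -> List[str]:
    """Proteins whose degree lies in the top `top_fraction` of the network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Sort by decreasing degree, breaking ties by name for reproducibility.
    ranked = sorted(neighbors, key=lambda p: (-len(neighbors[p]), p))
    n_hubs = max(1, int(len(ranked) * top_fraction))
    return ranked[:n_hubs]


edges = [("h", "a"), ("h", "b"), ("h", "c"), ("a", "b")]
```

Applied separately to each ecology-specific sub-network, the resulting hub lists could then be compared between subterranean and fossorial species.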

3:50 PM-4:00 PM
The evolution, conservation, and covariation of the Psp envelope-stress-response system
Room: Columbus CD
  • Janani Ravi, Michigan State University, United States
  • Vivek Anantharaman, NCBI, United States
  • L Aravind, NCBI, United States
  • Maria Laura Gennaro, Rutgers University, United States

Presentation Overview:

The phage shock protein (Psp) stress-response system protects bacteria from envelope stress and stabilizes the cell membrane. The key effector protein, PspA, is found in diverse bacterial, archaeal and plant phyla. Despite the prevalence of the functional Psp system, the various genomic contexts of Psp proteins, as well as their evolution across the kingdoms of life, have not been characterized.
We developed a computational pipeline for comparative genomics and protein sequence-structure-function analyses to identify sequence homologs, phyletic patterns, domain architectures, gene neighborhoods, sequence conservation and evolution of the proteins and domains of interest across the tree of life (~6000 completed genomes).
We first determined the PspA-containing species across the tree of life, followed by the other known cognate partners of PspA (pspBC, pspMN, liaIGF). Using contextual information from conserved gene neighborhoods and their domain architectures, we delineated the phyletic patterns of all the Psp members. In addition to systematically identifying all possible ‘flavors’ and neighborhoods of the known Psp systems, we could also trace their evolution (e.g., back to LUCA for PspA). This led to several interesting observations about their occurrence and co-migration, suggesting roles in stress-response systems, both dependent on and independent of PspA, that are often lineage-specific.
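Conserved gene neighborhoods are typically extracted by taking a fixed window of genes on either side of the gene of interest in each genome. The sketch below does this for an ordered list of gene labels on one contig. The window size, the label format and the treatment of contigs as linear are assumptions, and the gene labels in the usage example are placeholders.

```python
from typing import List, Sequence


def neighborhood(genes: Sequence[str], target: str, window: int = 3) -> List[List[str]]:
    """Genes within `window` positions of each occurrence of `target`.

    genes is the ordered list of gene (or domain-architecture) labels on a contig.
    """
    hits = []
    for i, g in enumerate(genes):
        if g == target:
            lo, hi = max(0, i - window), min(len(genes), i + window + 1)
            hits.append(list(genes[lo:hi]))
    return hits


contig = ["x", "pspB", "pspA", "pspC", "y", "z"]
```

Collecting these windows across genomes and counting recurring labels is one simple way to see which partners consistently travel with a gene of interest.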

4:00 PM-4:40 PM
Coffee Break