Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on the link and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category B - 'Comparative Genomics'
B01 - Using bioinformatics to elucidate the structure, function and evolution of an enigmatic class of multifunctional eukaryotic proteins – the caleosins
Negusse Kitaba, University of South Wales,
Farzana Rahman, University of South Wales,
Denis Murphy, University of South Wales,
Short Abstract: The caleosin group is a major family of proteins that occur in two major eukaryotic clades, namely Viridiplantae and Fungi. This pattern of occurrence is not consistent with the evolution of caleosin genes from a common ancestor because the Fungi, along with animals and many protists, are members of the Opisthokonta, while the Viridiplantae are derived from green algal predecessors. This suggests that caleosin genes may have been acquired in one of the current clades by horizontal gene transfer from the other.
We have studied the variation in caleosin-related gene and protein sequences across several hundred species and have developed de novo structural information for the protein using a combination of computational prediction and experimental work. Protein structure prediction suggests that the calcium-binding domain is widely conserved across species, while there is large variation in the loop region of the structure. While the biological functions of caleosins have yet to be determined in detail, it is clear that these proteins have several subcellular locations and participate in a range of physiological processes in both plants and fungi. Our biochemical and modelling analyses demonstrate that two forms of caleosins are present in plants, a lipid-droplet form and a bilayer membrane form. Both caleosin forms have canonical calcium binding and phosphorylation domains and appear to have peroxygenase activity. In this presentation we describe further modelling and bioinformatics studies that are beginning to shed light on the origin and functions of this intriguing group of proteins.
B02 - Applying Bioinformatic tools to explore the evolution of the mRNA processing machinery in the fungal kingdom
Alejandro Rodríguez-Iglesias, CBGP-UPM, Spain
Marco Marconi, CBGP-UPM, Spain
Ane Sesma-Galarraga, CBGP-UPM, Spain
Mark Wilkinson, CBGP-UPM, Spain
Short Abstract: 1. Background:
The maturation of eukaryotic mRNA consists of a series of co-transcriptional stages, including processes such as splicing and 3'-end polyadenylation. These two machineries have already been characterized in Saccharomyces cerevisiae, Schizosaccharomyces pombe and Homo sapiens, but knowledge of them in filamentous fungi is scarce. Our goal is to undertake a complete bioinformatic analysis and characterization of the evolutionary history of both the polyadenylation and the splicing machinery in the major branches (~30 species) of the fungal kingdom.

2. Results:
Our analysis suggests that both machineries present a moderate conservation pattern throughout the species studied. Interestingly, we found that the degree of conservation varies with the phylogenetic classification of the fungi. In some cases, the protein architecture indicated a species-specific conservation of certain domains. Differences were also found at the genomic and transcriptomic levels when analyzing both machineries, regarding intron/UTR architecture, cut-site selection and protein-binding motifs.
B03 - A Comparative Analysis of the Genetic Relationships Between the Pathogens of Ebola Hemorrhagic Fever, Marburg Virus, HIV, Hepatitis A, Hepatitis B, Hepatitis C, Hepatitis D, and Hepatitis E
Olaitan Awe, University of Ibadan, Nigeria
Segun Fatumo, University of Ibadan, Nigeria
Olugbenga Oluwagbemi, Covenant University, Nigeria
Angela Makolo, University of Ibadan, Nigeria
Short Abstract: Ebola is a public health problem and a global monster currently ravaging many nations of the world, especially in West Africa. Ebola viruses are highly pathogenic, exotic agents that can cause severe hemorrhagic fever disease in humans and/or nonhuman primates. Ebola is a member of the negative-stranded RNA virus family Filoviridae and is a very deadly virus. A comparison of the gene content and genome architecture of the pathogens of Ebola Hemorrhagic Fever, Marburg virus, HIV, Hepatitis A, Hepatitis B, Hepatitis C, Hepatitis D and Hepatitis E, eight major, related pathogens with different life cycles and disease pathology, revealed a conserved core protein sequence of genes in large syntenic polycistronic gene clusters. In this paper, we compare the genetics of the Ebola genome with the genomes of seven other viruses to identify points of significant similarity and disparity.
The basic structure of Ebola is long and filamentous, essentially bacilliform, but the virus often takes on a "U" shape, and the particles can be up to 14,000 nm in length and average 80 nm in diameter. Genomics provides an unprecedented opportunity to probe in minute detail into the genomes of West Africa's most recent deadly disease, Ebola Hemorrhagic Fever. Here we report comparative genomics of the Ebola strain Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM095, complete genome. Knowledge gained from this comparative analysis can help provide innovative methods for solving the Ebola menace. An integrative knowledge of genetics and skills in bioinformatics can form a formidable tool in fighting the Ebola menace.
B04 - Entropy-driven partitioning of the hierarchical protein space
Nadav Rappoport, Hebrew University of Jerusalem, Israel
Michal Linial, Hebrew University of Jerusalem, Israel
Short Abstract: Modern protein sequencing techniques have led to the determination of >90 million protein sequences. ProtoNet is a clustering system that provides a continuous hierarchical agglomerative clustering tree for all proteins. While ProtoNet performs unsupervised classification of all included proteins, finding an optimal level of granularity for the purpose of focusing on protein functional groups remains elusive. Here, we ask whether knowledge-based annotations on protein families can support the automatic unsupervised methods for identifying high-quality protein families.
We present a method that yields within the ProtoNet hierarchy an optimal partition of clusters, relative to manual annotation schemes. The method’s principle is to minimize the entropy-derived distance between annotation-based partitions and all available hierarchical partitions. We describe the “Best Front” (BF) partition of 2,478,328 proteins from UniRef50. Of 4,929,553 ProtoNet tree clusters, the BF based on Pfam annotations contains 26,891 clusters. The high quality of the partition is validated by the close correspondence with the set of clusters that best describe thousands of keywords of Pfam. The BF is shown to be superior to a naïve cut in the ProtoNet tree that yields a similar number of clusters. Finally, we used parameters intrinsic to the clustering process to enrich a priori the BF’s clusters. We present the entropy-based method’s benefit in overcoming the unavoidable limitations of nested clusters in ProtoNet.
We suggest that this automatic information-based cluster selection can be useful for other large-scale annotation schemes, as well as for systematically testing and comparing putative families derived from alternative clustering methods.
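The idea of scoring a candidate partition by an entropy-derived distance to an annotation-based partition can be sketched as follows. This is an illustration only: the abstract does not state which distance is used, so the sketch assumes the variation of information, one standard entropy-based distance between partitions, and it compares whole flat partitions rather than searching over cuts of a tree. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the cluster-size distribution of a labeling."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def variation_of_information(part_a, part_b):
    """VI(A, B) = H(A|B) + H(B|A) = 2*H(A, B) - H(A) - H(B).

    part_a and part_b give one cluster label per protein, in the same order.
    VI is 0 when the two partitions are identical up to relabeling.
    """
    if len(part_a) != len(part_b):
        raise ValueError("partitions must label the same proteins")
    joint = list(zip(part_a, part_b))
    return 2 * entropy(joint) - entropy(part_a) - entropy(part_b)


def closest_partition(annotation, candidates):
    """Return the name of the candidate partition with the smallest VI to the annotation.

    candidates maps a (hypothetical) partition name to its list of labels.
    """
    return min(
        candidates,
        key=lambda name: variation_of_information(annotation, candidates[name]),
    )
```

With four proteins annotated as two families, a candidate that groups them the same way has distance 0, while both a single all-inclusive cluster and four singletons are one bit away, so the matching granularity is selected.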
B05 - Protein Structure Novelty has Regressed 20 Years
John-Marc Chandonia, Lawrence Berkeley National Laboratory, United States
Steven Brenner, University of California, Berkeley, United States
Short Abstract: The number of new protein structures deposited every month in the PDB has steadily increased, and is now at over 750 structures per month. On average, fewer than 15 of these structures (i.e., 2%) represent the first solved structure from a Pfam protein family. Fifteen families per month is the lowest rate at which families have been structurally characterized in nearly 20 years, despite vastly more efficient technology. Today, less than half as many families are newly structurally characterized every month as during the heyday of Structural Genomics, between 2003 and 2007. Because the rate of sequencing has outpaced the rate of structural characterization of families, the fraction of large protein families with a known structure peaked 7 years ago, and is 10% lower today than it was at its peak. This makes curation of protein structure classification databases easier, but interpretation of sequence variation is more challenging than would otherwise be the case.
B06 - Convergent losses of decay mechanisms and rapid turnover of symbiosis genes in mycorrhizal mutualists
Emmanuelle Morin, INRA UMR 1136, France
Francis Martin, INRA UMR 1136, France
Annegret Kohler, INRA UMR 1136, France
Alan Kuo, Joint Genome Institute, United States
László Nagy, MTA Szegedi Biológiai Központ, Hungary
Igor Grigoriev, Joint Genome Institute, United States
David Hibbett, Clark University, United States
Short Abstract: Ectomycorrhizal (ECM) fungi provide crucial ecological services in interacting with most forest trees. They are portrayed as mutualists trading plant host photoassimilates for nutrients. A major goal of mycorrhizal studies is to identify the ‘symbiosis genes’ that encode the molecules that mediate and regulate symbiosis development and the coordinated symbiotic metabolic pathways.
To identify the genetic innovations that led to the mycorrhizal lifestyle, a comparative genomics project has been implemented by the MGI consortium. We have conducted the first broad, comparative phylogenomic analysis of mycorrhizal fungi, drawing on 49 fungal genomes, 18 of which were sequenced for this study. The 18 new fungal sequences included 13 mycorrhizal genomes. The analyses of these genomes and fossils suggested that, in comparison to wood decayers, which evolved over 300 million years ago, ECM fungi emerged more recently from several species of wood and litter decayers, and then spread out across lineages less than 200 million years ago. It appears that mushroom-forming fungi evolved a complex mechanism for breakdown of plant cell walls in ‘white rot’, which was then cast aside by those lineages that evolved ECM associations. Transcript profiling showed that in mycorrhizal lineages there is a huge turnover in genes that are involved in the symbiosis. Many of these have no homologs in even closely related species, suggesting that the evolution of mutualism is associated with striking genetic innovation. We now have a better understanding of how plants and fungi developed symbiotic relationships.
B07 - Nomenclature of the olfactory receptor gene family
Tsviya Olender, The Weizmann Institute of Science, Israel
Elspeth Bruford, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus,
Doron Lancet, The Weizmann Institute of Science, Israel
Short Abstract: Olfactory Receptors (ORs) are G protein-coupled receptors with a crucial role in odor detection. There are ~1000 OR genes and pseudogenes in a typical mammalian genome; however, the number of functional ORs varies among species, reflecting their adaptation to different environments, a process which involves gene duplication/deletion events. While for human the current OR nomenclature is based on sequence similarity classification, for other mammals such a nomenclature has not yet been adopted, thus concealing important structural and functional insights. The difficulty stems from the complex orthology relationships among the ORs. We developed the Mutual Maximum Similarity (MMS) algorithm, a systematic classifier for assigning a human-based nomenclature to any OR gene based on detecting hierarchical similarity relationships between any two species. We used the MMS algorithm to compare mouse and rat OR repertoires to human, dog and opossum, and assigned a symbol to each rodent gene. In mouse, 31% of the symbols assigned were identical to human symbols, reflecting orthology. An additional 63% of the symbols were classified into pre-defined OR subfamilies; the remainder (6%) were classified into novel OR subfamilies. In rat, 86% of the genes were assigned the same symbol as their mouse ortholog. The suggested nomenclature was further supported by synteny and phylogenetic analyses. Using symbol comparison only, we identified species-specific expansions in mouse, rat and human, demonstrating the power of this unified nomenclature system in generating a framework for studying mammalian OR evolution. This nomenclature will be expanded to other mammals in due course.
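As a rough illustration of what "mutual maximum similarity" between two species could mean at its simplest level, the sketch below pairs genes that are each other's highest-scoring match. It is not the MMS algorithm itself, whose hierarchical classification steps are not described in the abstract; the function name, the gene identifiers and the similarity scores are all hypothetical.

```python
def mutual_best_matches(similarity):
    """Return (gene_a, gene_b) pairs that are each other's top-scoring match.

    similarity maps (gene_in_species_a, gene_in_species_b) to a score where
    higher means more similar. Ties are resolved in favour of the pair seen first.
    """
    best_for_a = {}  # gene in species A -> (best partner in B, score)
    best_for_b = {}  # gene in species B -> (best partner in A, score)
    for (gene_a, gene_b), score in similarity.items():
        if gene_a not in best_for_a or score > best_for_a[gene_a][1]:
            best_for_a[gene_a] = (gene_b, score)
        if gene_b not in best_for_b or score > best_for_b[gene_b][1]:
            best_for_b[gene_b] = (gene_a, score)
    # Keep a pair only when the preference is reciprocal.
    return sorted(
        (gene_a, gene_b)
        for gene_a, (gene_b, _) in best_for_a.items()
        if best_for_b[gene_b][0] == gene_a
    )
```

A gene whose best match prefers a different partner is left unpaired here; in a nomenclature setting, such genes would be the ones that need a subfamily-level rather than an ortholog-level symbol.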
B08 - The transcriptome evolution of spatial organization in primate brains
Qian Li, CAS-MPG Partner Institute for Computational Biology, China
Yasuhiro Go, Department of Brain Sciences, Center for Novel Science Initiatives, National Institutes of Natural Sciences, Japan
Philipp Khaitovich, CAS-MPG Partner Institute for Computational Biology, China
Short Abstract: The human brain is a remarkably complex organ providing our species with a set of unique cognitive abilities. It is known to be composed of different regions that are responsible for different functions. However, the mechanism underlying the different human brain regions and their specific functions remains unclear. Here, we sequenced the transcriptome of 8 brain regions, including 5 cortical regions and 3 non-cortical regions, in humans and 3 non-human primates (chimpanzees, gorillas and gibbons) in order to explore the molecular features underlying the phenotypic differences. At the gene expression level, we detected more genes showing specificity on the human lineage than on the chimpanzee lineage in 7 of the 8 brain regions (excluding hippocampus), using gorillas and gibbons as evolutionary outgroups. Further analysis revealed genes whose expression is specific to a certain brain region and to the human lineage, and which are involved in neuropsychiatric disorders. Our analysis provides insights into the evolution of brain regions in humans.
B09 - The Proteome Quality Index: 10 reasons why comparative genomics may be flawed
Jan Zaucha, University of Bristol,
Jonathan Stalhacke, University of Bristol,
Matt Oates, University of Bristol,
Natalie Thurlby, University of Bristol,
Owen Rackham, University of Bristol,
Hai Fang, University of Bristol,
Ben Smithers, University of Bristol,
Julian Gough, University of Bristol,
Short Abstract: Previously published work, especially in the field of comparative genomics, may have reached spurious conclusions. Proteomes are the end products of genome sequencing projects, but de novo assembly is still a challenging task and detailed investigations have demonstrated that the quality of the results is not consistent across all projects. PQI, the recently published Proteome Quality Index, is a web resource that provides a quality measure for 3,200 proteomes from complete sequencing projects. Researchers in comparative genomics can use PQI to easily select high-quality proteomes for their work, which can then also be downloaded directly from the website. Additionally, we encourage data providers and reviewers of novel assembly projects to use PQI to assess the quality of sequencing efforts.

PQI is a constantly updated comprehensive database of proteomes that includes entries from multiple providers including NCBI and ENSEMBL. For each proteome PQI provides a human-readable 1-5 star rating determined by numerous scoring metrics taking into account the protein composition and phylogenetic placement. Additionally, PQI features an ever-growing compendium of manually curated information including user scores as well as comments on select proteomes from experts in the field.

Proteome quality assessment is not a straightforward task, but with PQI we hope to seed discussion and draw attention to its importance.
B10 - What sequence information can reveal: The functional evolution of arrestins in deuterostomes
Henrike Indrischek, University of Leipzig, Germany
Sonja Prohaska, University of Leipzig, Germany
Peter Stadler, University of Leipzig, Germany
Short Abstract: The cytosolic arrestin proteins mediate desensitization of activated G-protein coupled receptors via competitive binding of the receptor or via receptor internalization mediated by clathrin binding motifs. As different arrestin conformations can result in specific signaling outcomes, this protein family is a possible target in drug therapeutics. The aim of the current study was to improve the existing incomplete and error-prone annotations of arrestin genes in order to reveal details about the functional evolution of these proteins.
Identity and number of arrestin paralogs were determined searching vertebrate genomes or, alternatively, gene expression data with individual profile Hidden Markov models for each exon and paralog. Unlike standard gene prediction methods, our pipeline can detect exons situated on different scaffolds and assign them to the same gene, increasing completeness of the annotation.
We uncovered the interesting duplication and deletion history of arrestin paralogs in deuterostomes, including tandem duplications, pseudogenization and retrogene formation. At the root of vertebrates, two whole genome duplications have given rise to four arrestin paralogs from a single vertebrate arrestin as it is found in Ciona today. An additional clathrin binding motif was gained after the duplications. The precursor of visual arrestins lost the first clathrin binding motif, establishing a functional difference in arresting G-protein coupled signaling by non-visual and visual arrestins.
The current work shows how an improved annotation of a multi-exon gene family can result in a detailed understanding of the link between gene architecture and functional evolution.
B11 - Superbugs in the Supermarket? Evolution of antibiotic resistance on American factory farms
Abigail McGuire, Broad Institute, United States
Daria Van Tyne, Harvard Medical School, Mass Eye and Ear Infirmary, Broad Institute, United States
Francois Lebreton, Harvard Medical School, Mass Eye and Ear Infirmary, Broad Institute, United States
Gustavo Cerqueira, Broad Institute, United States
Margaret Priest, Broad Institute, United States
Jose Saavedra, Harvard Medical School, Mass Eye and Ear Infirmary, United States
Sarah Clock, Consumer Reports, United States
Michael Crupain, Consumer Reports, United States
Urvashi Rangan, Consumer Reports, United States
Michael Gilmore, Harvard Medical School, Mass Eye and Ear Infirmary, Broad Institute, United States
Ashlee Earl, Broad Institute, United States
Short Abstract: Factory farms are unique, human-created ecosystems that provide the perfect setting for development of antibiotic resistance. Agricultural bacterial strains, routinely exposed to antibiotics, can serve as a conduit for movement of resistance genes from soil ecologies, where they exist naturally, to hospital endemic strains. Our previous work showed that epidemic, multidrug-resistant, hospital-adapted Enterococcus faecium evolved from animal strains, suggesting a direct link between farm and clinic (Lebreton et al., 2013). We whole genome sequenced 92 strains of enterococci isolated from chicken breasts purchased at supermarkets by Consumer Reports (CR) in 2013 to analyze their gene content and relatedness to previously sequenced enterococci. Based on gene content, we observed striking levels of resistance to medically important classes of antibiotics, particularly those approved by the FDA for use as growth promoters in chicken production. Twenty-nine percent of CR strains contained recognizable bacteriophages encoding putative resistance genes, indicating that bacteriophage may represent an underappreciated mode of antibiotic resistance gene transmission. Based on their phylogenetic relatedness to previously sequenced enterococci (112 strains in 23 species), we determined that the CR strains represented six enterococcal species, and were enriched in the clinically important species E. faecium and E. faecalis. Poultry-associated E. faecium were very closely related to multidrug-resistant, hospital-adapted E. faecium, indicating that bacteria evolving within the factory farm ecosystem are just a few steps removed from hospital-adapted, antibiotic-resistant “superbugs”.
B12 - Large-scale modular comparative genomics: the Grid approach
Fotis Psomopoulos, Center for Research and Technology Hellas, Greece
Olga Vrousgou, Aristotle University of Thessaloniki, Greece
Pericles Mitkas, Aristotle University of Thessaloniki, Greece
Short Abstract: Motivation
==========
In the era of Big Data in Life Sciences, efficient processing and analysis of vast amounts of sequence data is becoming an ever more daunting challenge. Among such analyses, sequence alignment is one of the most commonly used procedures, as it provides useful insights into the functionality and relationship of the involved entities. At the same time, however, it is one of the most common computational bottlenecks in several bioinformatics workflows, especially when combined with the construction of families and phylogenetic profiles.

Methods
=======
We have designed and implemented a time-efficient distributed modular application for sequence alignment, phylogenetic profiling and clustering of protein sequences, by utilizing the European Grid Infrastructure. Specifically, the application comprises three main components: (a) BLAST alignment, (b) construction of phylogenetic profiles based on the produced alignment scores and (c) clustering of entities using the MCL algorithm. These modules have been selected as they represent a common aspect of a vast majority of bioinformatics workflows. It is important to note that the modules can be combined independently, and ultimately provide 4 different modes of operation.

Results
=======
We have evaluated the application through several different scenarios, ranging from targeted investigations of enzymes participating in selected pathways against a custom database to produce functional groups, to large-scale comparisons at the pangenome level. In all cases, the optimal utilization of the Grid with regard to the respective modules allowed us to achieve significant speedup, on the order of 3.5x with respect to traditional approaches.
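Component (b), turning alignment results into phylogenetic profiles, can be illustrated with a minimal single-machine sketch. It assumes that each alignment hit has already been reduced to a (query protein, target genome, E-value) triple and that a protein counts as present in a genome when its best hit passes an E-value cutoff. That input layout, the cutoff value and the function name are assumptions made for illustration and are not taken from the application described above.

```python
def build_profiles(hits, genomes, max_evalue=1e-5):
    """Build binary phylogenetic profiles from alignment hits.

    hits: iterable of (query, genome, evalue) triples (hypothetical layout).
    genomes: ordered list of genome names defining the profile columns.
    Returns a dict mapping each query to a tuple of 0/1 flags, one per genome.
    """
    column = {genome: i for i, genome in enumerate(genomes)}
    profiles = {}
    for query, genome, evalue in hits:
        row = profiles.setdefault(query, [0] * len(genomes))
        if evalue <= max_evalue:
            row[column[genome]] = 1  # a sufficiently strong hit marks presence
    return {query: tuple(row) for query, row in profiles.items()}
```

Proteins with identical or highly similar profiles are natural candidates for the subsequent clustering step; the clustering itself (MCL in the abstract) is not reproduced here.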
B13 - Enabling genotype associations of large insertions and deletions by canonicalizing variants across collections of bacterial strains
Alex Salazar, Broad Institute of MIT and Harvard, United States
Christopher Desjardins, Broad Institute of MIT and Harvard, United States
Ashlee Earl, Broad Institute of MIT and Harvard, United States
Thomas Abeel, Broad Institute of MIT and Harvard, Netherlands
Short Abstract: Variant-focused comparative genomics allows researchers to study the evolution of genetic characteristics in bacterial populations, without the difficulty of whole genome assembly and alignment. Unfortunately, this method has largely been limited to the comparison of small chromosomal variants, such as single nucleotide variants and small insertions and deletions (indels), due to the challenges of predicting larger variants with short-read sequencing technology. Two recent variant callers (Pilon and MindTheGap) that can predict indels greater than 50nt allow comparative investigations of large variants across multiple genomes. However, because read and assembly quality often vary, predictions of the same large indel in multiple genomes will often differ from sample to sample, making downstream comparative analysis difficult. To address this challenge, we present Emu, an algorithm that can identify alternate representations of the same predicted large indel and canonicalize them to a single representation. In a benchmark analysis, we simulated 200 large indels (ranging from 50nt to 10322nt) across previously published genomes of Mycobacterium tuberculosis (Mtb) and used Pilon and MindTheGap to identify them. We found that both variant callers predicted up to 5751 different representations of the simulated indels and that Emu managed to effectively reduce them by up to 96% with high accuracy. We extended our analysis to two real sequencing data sets that included a previously published collection of 1,017 sequenced genomes of Mtb. By applying Emu to raw variant predictions, the number of unique large variants was reduced by 92%, resulting in a threefold increase of large variants associated with major Mtb phylogenetic
B14 - Methodological choices in phylogenetic analysis
Kiran Battula, National Institute of Nutrition, India
Short Abstract: With the development of novel technologies, increasing data and the availability of many analysis tools, it becomes necessary to define the choice of methods that are relevant to the study goals when testing a specific biological hypothesis. The present study reviews some of the freely available phylogenetic tools that implement different methods to analyze various kinds of genomic data, and discusses how to choose among them to address specific biological questions.
B15 - EUPAN: a large-scale pan-genome analysis pipeline for large eukaryotic genomes
Chaochun Wei, , China
Zhiqiang Hu, Shanghai Jiao Tong University, China
Short Abstract: Single nucleotide variations (SNVs) are routinely studied to explain within-species differences. However, recent studies in bacteria, plants, Drosophila and human suggest that gene presence-absence variation is also of great importance and has unique roles in species differentiation. Thanks to the rapid decrease in sequencing cost, pan-genome analysis, which aims to reveal gene presence-absence variation within a species, has recently become popular. However, pan-genome analysis for animals and plants is still limited due to the large sizes and high complexities of their genomes. Existing studies assembled each genome completely, which required high sequencing depths (>100x) and allowed only a limited number of individual genomes.
Here we present a pipeline, EUPAN, especially developed for large-scale eukaryotic pan-genome analysis based on a “map-to-pan” strategy. The comprehensive sequence dataset of the species was generated first, and then gene presence-absence was decided for each individual by mapping raw reads to the comprehensive sequence dataset.
We applied this pipeline to a pan-genome from >400 rice genomes, which, to our knowledge, is the largest pan-genome analysis for large genomes to date. Our analysis suggested that within-species or even within-subspecies variation can be well studied with reasonable sequencing depths of ~20x. The phylogenetic tree built from distributed/dispensable gene presence/absence was highly consistent with that built from SNPs. To conclude, our pipeline enables large-scale eukaryotic pan-genome analysis at a relatively low sequencing depth. Moreover, our pipeline can be applied to any large-scale whole genome re-sequencing project with reasonable sequencing depth.
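The final step of a "map-to-pan" strategy, deciding gene presence-absence per individual from reads mapped to the pan-genome, can be sketched as below. The sketch assumes that read mapping has already been summarized as the fraction of each gene's length covered by reads, and it calls a gene present when that fraction reaches a cutoff. The 0.95 cutoff, the input layout and the function names are illustrative assumptions, not EUPAN's actual criteria or interface.

```python
def call_presence(coverage, min_fraction=0.95):
    """Call gene presence per sample from per-gene coverage fractions.

    coverage: dict sample -> dict gene -> fraction of gene length covered (0..1).
    Genes missing from a sample's dict are treated as having zero coverage.
    Returns dict sample -> dict gene -> bool.
    """
    genes = sorted({gene for per_gene in coverage.values() for gene in per_gene})
    return {
        sample: {gene: per_gene.get(gene, 0.0) >= min_fraction for gene in genes}
        for sample, per_gene in coverage.items()
    }


def split_core_dispensable(presence):
    """Split genes into core (present in every sample) and dispensable
    (present in at least one sample but not all)."""
    samples = list(presence)
    genes = sorted({gene for calls in presence.values() for gene in calls})
    core, dispensable = [], []
    for gene in genes:
        calls = [presence[sample].get(gene, False) for sample in samples]
        if all(calls):
            core.append(gene)
        elif any(calls):
            dispensable.append(gene)
    return core, dispensable
```

The resulting presence-absence matrix of dispensable genes is the kind of input from which a tree could be built and compared with a SNP-based tree, as the abstract describes.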
B16 - Evolutionary histories of phospho-motifs in the human genome
Shujiro Okuda, Niigata University, Japan
Hisayoshi Yoshizaki, Kanazawa Medical University, Japan
Short Abstract: Protein phosphorylation is a post-translational modification that is essential for a wide range of eukaryotic physiological processes, such as transcription, cytoskeletal regulation, cell metabolism, and signal transduction. Although more than 200,000 phosphorylation sites have been reported in the human genome, the physiological roles of most remain unknown. In this study, we assessed functional phosphorylation signaling using a comparative genome analysis of phosphorylation motifs. We describe the evolutionary conservation patterns of these motifs, together with comparative genomic data for 93,101 phosphosites and 1,003,756 potential phosphosites in human phosphomotifs, using 178 phosphomotifs identified in a previous study that account for 69% of known phosphosites in public databases. From comparative genomic analyses using genomes from nine species from yeast to humans, we describe an overview of the evolutionary patterns of phosphomotif acquisition and indicate the dependence on motif structures. Our characterizations of phosphorylation motif structures and assessments of evolutionary conservation of phosphosites reveal physiological roles of unreported phosphosites. In addition, we show that interactions between protein groups that share motifs are likely to be helpful for inferring kinase-substrate interaction networks. Our computational methods can be used to elucidate the relationships between phosphorylation signaling and cellular functions.
B17 - Multi-species integration of miRNA binding site data to improve target prediction accuracy for synaptic miRNAs
Samuel Heron, Edinburgh University,
Ian Simpson, Edinburgh University,
Short Abstract: Micro-RNAs (miRNAs) are thought to play an important role in the modulation of synaptic strength during activity-dependent synaptic plasticity by regulating protein synthesis. Identifying the target mRNAs of the miRNAs involved is a crucial part of understanding how this regulation takes place, but is confounded by the small size and sequence variability of miRNA binding sites (typically as few as 6-8 nucleotides). Recent studies have attempted to improve the identification of miRNA targets by isolating miRNAs and their mRNA targets as part of the RISC complex using antibodies raised against the Ago2 protein in both human and mouse and sequencing the resulting RNA pools. We have taken miRNA binding site predictions from a recent study investigating the role of NMDA receptors in miRNA expression regulation in the rat and filtered them using known binding locations in human and mouse to improve miRNA binding prediction accuracy for functional studies in the rat. We present our semi-supervised classification procedure to refine predictions based on integrated inter-species data, as well as functional, pathway and disease over-representation analyses to aid biological interpretation.
B18 - Variational Bayes algorithms for inferring the structure of pre-WGD ancestral genomes
Yoichiro Nakatani, University of Dublin, Trinity College, Ireland
Aoife McLysaght, University of Dublin, Trinity College, Ireland
Short Abstract: Background:
Ohnologs, or genes duplicated by whole-genome duplication (WGD), are often associated with human diseases, and it is therefore important to make a comprehensive catalog of ohnologs. High-confidence identification of ohnologs hinges on synteny analysis and inference of pre- and post-WGD ancestral genome structures, but the ancient timing of teleost and vertebrate WGD events impedes high accuracy inference. Because of this difficulty, previous studies resulted in low-coverage reconstructions, excluding a large part of the human genome with ambiguous synteny.

Methods:
With the aim of explicitly dealing with reconstruction uncertainty, we developed a probabilistic model of macro-synteny conservation and devised algorithms for inferring the structure of pre-WGD genomes. In this model, non-WGD genomes are composed of conserved synteny blocks, and each block is associated with a distribution over pre-WGD chromosomes. The pre-WGD chromosomes are characterized by distinct gene distributions over post-WGD chromosomes in present-day genomes. These probability distributions then determine the likelihood of observed orthologs between non- and post-WGD genomes. Finally, the posterior distributions can be computed from observed ortholog distributions by variational Bayes inference algorithms.

Results:
By applying the macro-synteny model to vertebrate genomes, we obtained a high-coverage reconstruction of pre-WGD ancestral genomes, in which regions of modern genomes were assigned to pre-WGD chromosomes with a reconstruction probability. The probability represents confidence or uncertainty in the reconstruction due to incomplete genome assembly, erroneous ortholog annotation, intensive local rearrangements, etc. This method is expected to perform effectively especially when gene order has been shuffled extensively but macro-synteny has been preserved relatively strongly.
B19 - Creating a Multi Genome Graph by Minimizing Shannon Information
Leily Rabbani, Max Planck Institute for Developmental Biology, D-72076 Tübingen, Germany
Short Abstract: The amount of sequence data has increased exponentially during the last decade. This applies especially to genome (re)sequencing data, and the challenge is how to efficiently represent and interrogate such data. To address this challenge, we are developing an algorithm that builds a simplified data structure and returns a graph as a multi-genome reference. It aids in representing genomes that differ over a range of diversities and tackles the bias introduced by using only one reference genome.
A model that is specific for individual sequences is used for training on the entire set of genome sequences and obtaining pairwise alignments. It enables the algorithm to capture the data structure and analyze genomes with a range of differences together. One of the algorithm's goals is to cluster similar regions of genomes and use a representative of each cluster rather than all its members. This removes unimportant variation and saves a noticeable amount of memory. These representative elements are later used as nodes of the graph. Nodes are connected by edges that reflect their order in the original genome sequences. The graph represents all the data in a way that minimizes Shannon information. Thus, there is no parameter to adjust.
Using arithmetic encoding, we can apply the multi-genome graph to compress the input sequences. A superior compression performance shows that our model fits sequence properties better than existing DNA compression programs. In addition, it allows for global genome comparison by computing the common information content.
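
To make the link between Shannon information and compression concrete, here is a minimal Python sketch that computes the total Shannon information of a sequence in bits. It assumes a deliberately simple order-0 model, in which symbol probabilities are the empirical frequencies, and not the sequence-specific model of the abstract. The function name is hypothetical. An ideal arithmetic coder driven by the same model would emit roughly this many bits, plus the cost of transmitting the model itself.

```python
import math
from collections import Counter

def shannon_bits(sequence):
    """Total Shannon information (bits) of a sequence under an order-0 model
    whose symbol probabilities are the sequence's own empirical frequencies."""
    counts = Counter(sequence)
    n = len(sequence)
    # Each occurrence of a symbol with probability p costs -log2(p) bits.
    return -sum(c * math.log2(c / n) for c in counts.values())

uniform = shannon_bits("ACGT" * 4)   # 16 symbols at 2 bits each -> 32.0
skewed = shannon_bits("AAAC")        # a skewed composition costs fewer bits per symbol
```

A model that fits the data better assigns higher probabilities to the observed symbols and therefore yields a smaller total, which is why compression performance can serve as a measure of model fit.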
B20 - A Systematic Approach for the Identification of Gene Losses from Genome Alignments
Virag Sharma, Max Planck Institute of Molecular Cell Biology and Genetics, Germany
Bjoern Langer, Max Planck Institute of Molecular Cell Biology and Genetics, Germany
Leo Foerster, Technical University, Dresden, Germany
Pradeep Kiruvale, Technical University, Dresden, Germany
Anas Elghafari, Technical University, Dresden, Germany
Michael Hiller, Max Planck Institute of Molecular Cell Biology and Genetics, Germany
Short Abstract: Losses of ancestral genes in descendant species are not uncommon, and some are associated with altered or completely lost phenotypes in such species. For instance, the loss of the ability to synthesize Vitamin C in certain mammals is attributed to the non-functional Gulo gene in their genomes. We have developed a computational pipeline that systematically searches for gene losses across different species. Given a reference species, a list of ancestral genes and a genome alignment of the reference species with other species, our pipeline is able to identify the different types of gene-inactivating mutations that occur in other species’ orthologous genes. We strictly control for assembly gaps, low-quality genomic sequences, alignment artifacts and changes in gene structures to avoid mistaking such artifacts for inactivating mutations. Our pipeline is highly specific and sensitive. In order to associate gene losses with phenotypic changes, we ran the pipeline on a multiple genome alignment of 29 species with mouse as the reference and focused on gene losses in mammals that are either completely blind or have reduced vision. Our pipeline reports inactivation of several known and potentially novel genes involved in vision-related functions, eye development and structural components of the eye in these species. Therefore, our pipeline will be a valuable tool for the compilation of a gene-loss catalogue for genomes that will be sequenced in the future. Additionally, our pipeline will provide the basis to systematically link phenotypic changes to genomic changes using approaches like Forward Genomics.
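
The kind of check such a pipeline performs can be sketched in a few lines of Python. The example below is a simplified, hypothetical illustration rather than the authors' pipeline. It assumes a gapped pairwise alignment of one coding sequence (reference versus query, starting in frame, with `-` for gaps) and reports only two classes of inactivating mutation: frameshifting indels and premature stop codons. It ignores the controls for assembly gaps, alignment artifacts and gene-structure changes that the abstract describes.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_inactivating_mutations(ref_aln, query_aln):
    """Return (event, alignment_column) pairs found in the query sequence."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    # Frameshifts: gap runs whose length is not a multiple of three.
    # A gap in the reference is an insertion in the query, and vice versa.
    for label, seq in (("insertion", ref_aln), ("deletion", query_aln)):
        for match in re.finditer(r"-+", seq):
            if len(match.group()) % 3 != 0:
                events.append(("frameshift_" + label, match.start()))
    # Premature stops: read the query in the reference's codon frame.
    ref_cols = [i for i, base in enumerate(ref_aln) if base != "-"]
    n_codons = len(ref_cols) // 3
    for c in range(n_codons - 1):  # skip the final codon (the true stop)
        cols = ref_cols[3 * c: 3 * c + 3]
        codon = "".join(query_aln[i] for i in cols).upper()
        if codon in STOP_CODONS:
            events.append(("premature_stop", cols[0]))
    return sorted(events, key=lambda event: event[1])

# Hypothetical example: the second codon of the query is a stop codon.
hits = find_inactivating_mutations("ATGAAACCCGGGTAA", "ATGTAACCCGGGTAA")
# hits == [("premature_stop", 3)]
```

Reading the query in the reference frame keeps the sketch short, but it means that codons downstream of a frameshift are not re-evaluated in the shifted frame.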
B22 - Nonribosomal peptide synthetase A-domain – substrate interaction investigation
Candice Ryan, Rhodes University, South Africa
Ozlem Tastan Bishop, Rhodes University, South Africa
Kevin Lobb, Rhodes University, South Africa
Short Abstract: Nonribosomal peptides (NRPs), synthesized by bacterial nonribosomal peptide synthetases (NRPSs), have important properties that make them useful against phytopathogens. Resistance has led to NRPS substrate specificity being investigated to generate novel natural compounds. A-domains of NRPSs are responsible for amino acid activation and substrate specificity and were therefore the focus of the study. Initial investigation was conducted on a few representative NRPSs for which there is a crystal structure. This was then expanded to include a larger subset of sequences known to contain NRPS modules. Substrate specificity of the sequences was determined, followed by a phylogenetic study to determine the evolutionary relationships of the sequences. A docking investigation was conducted on 5 known crystal structures with 39 ligands, consisting of the 20 standard amino acids in both L and D configurations, where available, and a few non-standard amino acids. The ability of the synthetases to take up these substrates was analyzed in silico, and residues that were found to interact regularly with differing substrates were considered to be critical in binding. Future work will include in silico mutation of identified interacting residues and homology modeling of the altered structure, followed by re-docking to give insight into the activity of these synthetases. Molecular dynamics will be used to obtain accurate binding energies and to probe the longer-term stability of these ligands within the binding sites. These residues could ultimately be modified in vitro to alter substrate uptake by NRPSs and aid in the development of novel natural products for use against phytopathogens.
B23 - Functional basis of microorganism classification
Yana Bromberg, Rutgers University, United States
Chengsheng Zhu, Rutgers University, United States
Tom Delmont, Marine Biological Laboratory, United States
Timothy Vogel, Université de Lyon, France
Short Abstract: Correctly identifying nearest “neighbors” of a given microorganism is important in industrial and clinical applications, where close relationships imply similar uses and/or treatments. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity of the members of the same taxa or the lack of similarity between the members of different taxa. Using over thirteen hundred sequenced bacterial genomes we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal and quantify both (1) the inconsistency of levels of functional diversity among the different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor organism traits of particular interest to humans. Here we show that our network-based organism classification is more robust in handling organism diversity than the traditional pairwise comparison-based metrics. FuSiON highlights the environmental impact as a major driving force of microorganism diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of the microbial lifestyles. FuSiON is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment, and are less concerned with phylogenetic descent.
B24 - Using inter- and intra-species gene copy number variation to identify genes involved in xenobiotic metabolism in parasitic nematodes.
David Curran, University of Calgary, Canada
John Gilleard, University of Calgary, Canada
James Wasmuth, University of Calgary, Canada
Short Abstract: Parasitic nematodes are estimated to infect one third of the human population, causing death and morbidity, while others that infect livestock cause billions of dollars of economic loss annually. They are traditionally controlled using drugs, but resistance has arisen rapidly and has spread across the globe. Studies on the mechanisms of nematode drug resistance have primarily focused on mutations to the drug target, while much less is known about how they metabolize and excrete drugs. In insects, virtually all resistance to DDT is due to an up-regulation in a single cytochrome P450 (cyp), the first step in the drug metabolism pathway (DMP). Historically it has been thought that parasitic nematodes had lost their DMP genes, but recently sequenced genomes challenge this and show the reality is more complex.
We are focusing on six species: four free-living species, the livestock parasite Haemonchus contortus, and the human hookworm Necator americanus. Our goal is to identify the DMP genes in these species and use phylogenomic approaches to distinguish between those that metabolise endogenous and xenobiotic biochemicals. Enzymes with xenobiotic substrates tend to be under distinctive selective pressure; we hypothesize that those genes stable across species will be important for cellular processes, and as such constitute good therapeutic targets, while those exhibiting higher gain/loss or sequence divergence might be important DMP genes, and so play a role in drug resistance. This analysis is applied to all of the relevant DMP gene families: cyp, fmo, dhs, gst, ugt, ssu, and various ABC-transporters.
B25 - Past their primates: comparative evolutionary genomics of great ape Y chromosomes
Samarth Rangavittal, The Pennsylvania State University, United States
Short Abstract: The male-specific regions of the Y chromosomes (MSY) of chimpanzee and human have been found to be highly divergent, with more than 30% non-homologous sequence. In contrast, the female genomes of the four sequenced great ape species - human, chimpanzee, gorilla, and orangutan - have diverged from each other by less than 3%. To resolve this dichotomy, the gorilla and orangutan Y chromosomes need to be assembled. This would complement the existing reference sequences for the human and chimpanzee euchromatic Y, thus enabling comprehensive great ape MSY comparative analysis.

In this study, we sequenced whole genome amplified flow-sorted DNA from the gorilla and orangutan Y chromosomes with both short-read (Illumina) and long-read (PacBio) technologies. We developed in silico strategies for increasing the amount of available Y chromosomal data, by utilizing differential coverage-based information. The Y-specific sequence data was then assembled with a variety of short read assemblers (SPAdes, DISCOVAR de novo), scaffolding software (SSPACE), hybrid tools (PBJelly, SSPACE-LR) and long-read assemblers (HGAP+Celera, Falcon).

This process led us to obtain novel insights into the optimal recipe for primate Y chromosome assembly, combining Illumina and PacBio technologies. Utilizing the generated draft gorilla and orangutan Y assemblies, we estimate the divergence level, evaluate the copy number of ampliconic genes, and detect rearrangements between great ape Y chromosomes. Our results indicate that great ape Y chromosomes are remarkably different in size, repeat content, and gene variation. We also demonstrate the utility of the novel Y chromosome sequences to conservation genetics.
B26 - Identification of complex functional relations among genes through association rule learning
Riccardo Percudani, Dipartimento di Biochimica, Italy
Nicola Doniselli, Dipartimento di Bioscienze, Università di Parma, Italy
Alessandro Dal Palù, Dipartimento di Matematica, Università di Parma, Italy
Pietro Cravedi, Università di Parma, Italy
Short Abstract: The analysis of correlated and anti-correlated phylogenetic profiles is a widely used method to predict functional associations among genes. As previously pointed out, these methods correspond to pairwise logic implications between genes of the type A → B or A → not B, respectively, i.e. the presence of gene A in a genome implies the presence of B, or the presence of gene A implies the absence of B. However, owing to the presence of analogous proteins, alternative pathways, and pathway branching points, the evolutionary and functional associations among genes can be far more complex than pairwise relations (1). No general algorithms have been implemented for the discovery of complex logic implications involving three or more genes. Here we describe the adaptation of the widely used Apriori algorithm of association rule learning (2,3) to the identification of complex patterns of gene presence/absence across genomes. As a proof of principle of the application of this data mining technique, we report on the identification of a missing gene in purine catabolism through the identification of an association rule relating six different genes involved in the pathway.

1. Bowers, et al. Science 306, 2246–2249 (2004).
2. Agrawal, R. & Srikant, R. in Proc. 20th Int. Conf. Very Large Data Bases 487–499 (1994).
3. Borgelt, C. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2, 437–456 (2012).
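
To illustrate what an association rule over phylogenetic profiles measures, the following Python sketch scores a single candidate rule of the form {A, B} → C by its support and confidence across genomes. It is a minimal sketch under stated assumptions: the genome names, gene names and the function `rule_stats` are hypothetical, and the code only evaluates one rule. The Apriori algorithm itself additionally enumerates frequent gene sets level by level and prunes infrequent candidates.

```python
def rule_stats(profiles, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    profiles   -- dict mapping a genome name to the set of genes it contains
    antecedent -- set of genes that must all be present
    consequent -- gene whose presence the rule implies
    """
    n_total = len(profiles)
    with_antecedent = [genes for genes in profiles.values() if antecedent <= genes]
    with_both = [genes for genes in with_antecedent if consequent in genes]
    support = len(with_both) / n_total if n_total else 0.0
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical presence/absence profiles for four genomes.
profiles = {
    "genome1": {"A", "B", "C"},
    "genome2": {"A", "B", "C"},
    "genome3": {"A"},
    "genome4": {"B"},
}
pairwise = rule_stats(profiles, {"A"}, "C")       # (0.5, 2/3)
three_way = rule_stats(profiles, {"A", "B"}, "C")  # (0.5, 1.0)
```

In this toy data set neither A nor B alone implies C with full confidence, but their joint presence does, which is the type of relation that pairwise profile comparison cannot express.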
B27 - On the limits of computational functional genomics for bacterial lifestyle prediction
Eudes Vieira Barbosa, University of Southern Denmark, Denmark
Richard Röttger, University of Southern Denmark, Denmark
Anne-Christin Hauschild, Max Planck Institute, Denmark
Vasco Azevedo, Federal University of Minas Gerais, Brazil
Jan Baumbach, University of Southern Denmark, Denmark
Short Abstract: As actinobacteria occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We analyzed 240 actinobacteria classified into four pathogenicity classes: human pathogens, broad-spectrum pathogens, opportunistic pathogens, and non-pathogenic. To identify lifestyle-specific gene signatures, we combined evolutionary sequence analysis approaches with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). In summary, we show that we indeed find signature genes that differentiate pathogens from non-pathogens. When trying to classify the different pathogenicity lifestyles though, it appears that too many confounding factors unbalance our data sets such that we cannot differentiate, for instance, a strain-specific from a lifestyle-specific gene. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited.
B28 - Understanding operon evolution using an event-driven model and phylogenetic visualizations
Iddo Friedberg, Miami University, United States
David Ream, Miami University, United States
Asma Bankapur, Miami University, United States
Short Abstract: Gene blocks are genes co-located on the chromosome. In many cases, gene blocks are conserved between bacterial species, sometimes as operons, when genes are co-transcribed. The conservation is rarely absolute: gene loss, gain, duplication, block splitting, and block fusion are frequently observed. An open question in bacterial molecular evolution is that of the formation and breakup of gene blocks, for which several models have been proposed. These models, however, are not generally applicable to all types of gene blocks, and consequently cannot be used to broadly compare and study gene block evolution. To address this problem we introduce an event-based method for tracking gene block evolution in bacteria.

In my talk I will explain this method and demonstrate a new visualization technique we call phylomatrices. I will show how we can easily gauge operon conservation and discover interesting clade-based aberrations as well as horizontal gene transfers.