Posters

Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance, please contact submissions@iscb.org and provide your poster title or submission ID.

Category C - 'Education'
C01 - Identification of small RNA pathway genes using patterns of phylogenetic conservation and divergence.
Short Abstract: Small RNAs such as microRNAs and small interfering RNAs (siRNAs) require protein cofactors to promote their biogenesis and mediate their silencing functions. Even though small RNA pathways are widely distributed among animal, plant, fungal, and protist phyla, these pathways diverge or are lost in particular taxonomic clades. We used phylogenetic conservation patterns to identify new small RNA cofactor genes. We compared 86 divergent eukaryotic genome sequences to discern the sets of genes whose phylogenetic profiles are similar to those of known small RNA cofactor genes. The top predictions from this phylogenetic screen were tested for defects in RNA interference, and a large fraction of the candidate genes showed defects as strong as those of validated small RNA cofactor genes, revealing new components of the pathway. RNA splicing components were the most enriched class of new small RNA cofactors identified, suggesting a deep connection between the mechanism of RNA splicing and small RNA-mediated gene silencing.
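The screen rests on phylogenetic profiling. A minimal sketch, assuming each gene is summarized as a 0/1 presence/absence vector across the compared genomes and scored by the Jaccard index against known cofactors; the metric, function names and gene names are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List, Sequence, Tuple

def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity of two 0/1 presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def rank_candidates(
    profiles: Dict[str, Sequence[int]], known: List[str]
) -> List[Tuple[str, float]]:
    """Score every gene not in `known` by its best profile match to a known cofactor."""
    scored = [
        (gene, max(jaccard(profile, profiles[k]) for k in known))
        for gene, profile in profiles.items()
        if gene not in known
    ]
    # Highest similarity first: these would be the candidates sent for RNAi testing.
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical profiles over five genomes (1 = ortholog present).
profiles = {
    "known_cofactor": (1, 1, 0, 1, 0),
    "candidate_a": (1, 1, 0, 1, 0),
    "candidate_b": (0, 0, 1, 0, 1),
    "candidate_c": (1, 1, 0, 0, 0),
}
print(rank_candidates(profiles, ["known_cofactor"]))
```

Taking the best match over all known cofactors lets a candidate be linked to any branch of the pathway; a real screen over many genomes would also need to account for the non-independence of closely related genomes, which this plain index ignores.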
C02 - Isolation and Cloning of PAP-I Encoding The Anti-viral Protein From Pokeweed
Short Abstract: Pokeweed antiviral protein (PAP), isolated from Phytolacca americana and P. acinosa plants, inhibits translation by catalytically removing a specific adenine residue from the large rRNA of the 60S subunit of eukaryotic ribosomes. This study compares the genomic structure of PAP-I and analyses its molecular relationship with other ribosome-inactivating proteins (RIPs) in GenBank. Total DNA was extracted from late-summer leaves of P. americana, and PAP-I was amplified by polymerase chain reaction (PCR) using gene-specific primers. A PCR product of 868 bp was selected based on the size of the P. americana PAP-I protein. After elution, the product was purified, cloned into the pTZ57R/T vector, introduced into E. coli strain DH5α and sequenced. Sequence analyses showed that PAP-I shares 82–98% nucleotide homology and 26–94% amino acid homology with the sequences of other RIPs in GenBank. Phylogenetic analysis confirmed that the PAP-I under study belongs to the single-chain type I RIPs.
C03 - Emerging methods in protein co‐evolution
Short Abstract: Co‐evolution is an essential component of evolution that helps maintain the structure of ecological and molecular networks while allowing species, proteins and genes to change and adapt over time. A wide range of co‐evolution‐inspired computational methods has been designed for protein modeling, detection of binding sites, deciphering protein mechanisms of action, prediction of protein–protein interaction partners, and reconstruction of protein complexes and interaction networks. Recent breakthroughs in the field have markedly improved the capacity to predict interactions between proteins and contacts between protein residues. While co‐evolution‐based approaches have been developed independently over the last several decades, we propose that unifying them under a common framework would be a major step forward in understanding the molecular basis of co‐evolution.
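Many co-evolution methods start from a multiple sequence alignment and score pairs of columns for correlated variation. The sketch below computes plain mutual information (MI) between two columns, one of the simplest such scores. It is an illustration chosen here, not the unified framework proposed above, and it does not correct for phylogenetic bias or for indirect couplings; the toy alignment is hypothetical.

```python
import math
from collections import Counter
from typing import List

def mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        # Sum of p(a,b) * log(p(a,b) / (p(a) * p(b))) over observed residue pairs.
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

# Hypothetical 3-column alignment: columns 0 and 1 co-vary, column 2 does not.
msa = ["AKL", "AKV", "DEL", "DEV"]
print(mutual_information(msa, 0, 1), mutual_information(msa, 0, 2))
```

A fully conserved column always scores zero against any other column, which is why MI-style scores only become informative when both positions vary.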
C04 - Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function
Short Abstract: As part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases, we provide a detailed overview of the population of alternatively spliced protein isoforms detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered.

Alternative isoforms generated from interchangeable homologous exons and from short indels were significantly enriched, both in human experiments and parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts.
The evidence of a strong bias towards subtle differences in coding sequence and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.
C05 - Determinants of protein evolutionary rates in light of ENCODE functional genomics
Short Abstract: The aim is to understand how the complex anatomy and developmental processes of animals influence the evolution of protein-coding genes.

The influence of different parameters, from gene size to expression levels, on protein evolution has previously been studied in yeast, Drosophila and mammals. Here we investigate these relations further, taking into account gene expression and chromatin organization in different organs and at different developmental stages. For expression, we used a microarray experiment covering zebrafish development as well as ENCODE RNA-seq data for 22 mouse tissues. We also used chromatin accessibility in mouse tissues, and we used ENCODE data to define which transcript serves as the reference for computing gene length, intron number, etc. We find strong differences between tissues and between developmental stages in the impact of expression on evolutionary rate. Across all tissues, an interesting result is that evolutionary rate is better correlated with maximal expression in one tissue than with average expression over all tissues.
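The comparison of maximal versus average expression can be sketched as two rank correlations against evolutionary rate. This assumes per-gene rates and a genes × tissues expression matrix are already computed and that Spearman correlation is the statistic; the function names and numbers are made up for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple

def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def max_vs_mean(rates: Sequence[float], expression: List[List[float]]) -> Tuple[float, float]:
    """Correlate rate with per-gene maximal and with mean expression across tissues."""
    max_expr = [max(row) for row in expression]
    mean_expr = [sum(row) / len(row) for row in expression]
    return spearman(rates, max_expr), spearman(rates, mean_expr)

# Hypothetical data: 4 genes x 3 tissues.
rates = [0.30, 0.10, 0.20, 0.05]
expression = [[1, 2, 1], [5, 90, 1], [3, 3, 4], [50, 60, 55]]
rho_max, rho_mean = max_vs_mean(rates, expression)
print(f"rho(max) = {rho_max:.2f}, rho(mean) = {rho_mean:.2f}")
```

Rank correlation is used here because expression values are heavily skewed; the toy numbers are not meant to reproduce the reported result.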
C06 - Integration of NGS Data with LC-MS Proteomic Data in Microbial Studies
Short Abstract: Next Generation Sequencing (NGS) platforms offer a rapid way to sequence the genomes of microbial pathogens. This is important for science agencies conducting microbial trace-back studies and, from a global perspective, for understanding the transmission of communicable pathogens, both clinically and in regulatory food safety. Harnessing NGS data together with mass spectrometry (MS) for the detection of expressed proteins offers a powerful set of tools to understand the aetiology of pathogens in relation to public health.
We have developed novel bioinformatics methods that use NGS data in a hypothesis-driven manner, comparing genomic predictions against MS protein masses to test for the presence or absence of novel genes. We have also developed methods for screening nonsynonymous SNPs whose predicted effects can be compared to protein mass shifts to detect protein variants. We implemented whole-genome analysis of bacteria by constructing phylogenies of core genes to predict nonsynonymous SNPs and the presence of novel genes. These predictions have been compared with unique protein MS data to correlate gene expression. Expressed proteins were identified from the MS data with an in-house mass/retention-time binning algorithm and correlated with the presence of novel genes and gene variants predicted from comparative genomics of the NGS data.
Using NGS genomic data as a hypothesis-driven framework for analytical methods enables integration with data from other technologies, such as proteomics. This improves biological understanding, provides an important toolset, and opens the future possibility of comparing heterogeneous datasets from other sources, such as transcriptomic and metabolic data.
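Linking a nonsynonymous SNP to an observable protein mass shift is arithmetic on residue masses. The sketch below uses standard monoisotopic residue masses; the function names and the 0.01 Da tolerance are illustrative assumptions, not the authors' in-house method. Leu/Ile substitutions are isobaric and cannot be detected this way.

```python
# Standard monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per intact, unmodified chain

def chain_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified peptide or protein."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

def snp_mass_shift(ref_aa: str, alt_aa: str) -> float:
    """Mass change caused by a single amino acid substitution."""
    return RESIDUE_MASS[alt_aa] - RESIDUE_MASS[ref_aa]

def explains_shift(observed: float, ref_aa: str, alt_aa: str, tol: float = 0.01) -> bool:
    """True if an observed mass difference matches the predicted substitution."""
    return abs(observed - snp_mass_shift(ref_aa, alt_aa)) <= tol

print(round(snp_mass_shift("A", "V"), 4))
```

In practice the tolerance would be set by the instrument's mass accuracy, and substitutions with near-identical masses (for example Gln/Lys) need high resolution to tell apart.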
C07 - Deep sequencing survey of alternative splicing in human and mouse
Short Abstract: We perform a comparative analysis of alternative splicing using RNA-seq data from 16 human cell lines and 30 mouse tissues. Approximately 75% of internal exons of human protein-coding genes have orthologs in mouse, and vice versa. The annotation alone provides evidence for conservation of alternative splicing: exons that are annotated as alternative in one species tend to be annotated as alternative in the other. There is a negative correlation between the strength of the consensus sequence at a splice junction and the variability of its splicing ratio (percent-spliced-in, PSI) as measured by RNA-seq. This correlation is more prominent for donor sites than for acceptor sites, because the latter have more degenerate consensus sequences. Despite the biologically different samples, the variability of splice junction usage is to a large extent conserved between human and mouse. At the same time, splice junctions that are annotated as alternative are used more variably in the cognate species than in the non-cognate species. There is evidence for conservation of the association between splicing factor expression levels and the splicing ratios of alternative splice junctions. However, splicing regulatory networks inferred from these associations have different modular structures, presumably reflecting different pathways activated in tissues and cell lines. Despite different annotation depths, the relative contributions of the major types of elementary splicing events in human and mouse are very similar. The most frequent in human are exon skipping (50%), alternative acceptor site usage (13%), alternative donor site usage (9%), multiple-exon skipping (6%) and intron retention (5%).
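PSI is usually computed from junction read counts. The sketch below uses one common convention, averaging inclusion reads over the two inclusion junctions of a cassette exon, and measures variability as the standard deviation of PSI across samples; these definitions, the minimum-coverage filter and the function names are assumptions, not taken from the abstract.

```python
from statistics import pstdev
from typing import List, Optional, Tuple

def psi(inclusion_reads: int, exclusion_reads: int, inclusion_junctions: int = 2) -> Optional[float]:
    """Percent-spliced-in (as a fraction) of a cassette exon from junction reads.

    Inclusion reads are averaged over the exon's inclusion junctions so that
    they are comparable with the single skipping junction.
    """
    inclusion = inclusion_reads / inclusion_junctions
    total = inclusion + exclusion_reads
    return inclusion / total if total > 0 else None

def psi_variability(samples: List[Tuple[int, int]], min_reads: int = 10) -> Optional[float]:
    """Population standard deviation of PSI across sufficiently covered samples."""
    values = [psi(inc, exc) for inc, exc in samples if inc + exc >= min_reads]
    present = [v for v in values if v is not None]
    return pstdev(present) if len(present) >= 2 else None

# Hypothetical (inclusion, exclusion) read counts for one exon in three samples;
# the third sample falls below the coverage threshold and is ignored.
print(psi_variability([(20, 10), (40, 0), (3, 2)]))
```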
C08 - Comparison of D. melanogaster and C. elegans Developmental Stages by modENCODE RNA-Seq data
Short Abstract: Drosophila melanogaster and Caenorhabditis elegans are two well-studied model organisms in developmental biology. Their morphological development differs greatly, yet we postulated that there may nonetheless be underlying shared developmental programs employing orthologous genes. To address this question, we used modENCODE RNA-Seq data to perform a transcriptome-wide comparison of their developmental time courses. Our approach uses stage-associated orthologous genes to link the two organisms. For every stage in each organism, we select stage-associated genes, defined as genes that are relatively highly expressed at that stage compared with other stages. We then tested the dependence between a pair of D. melanogaster and C. elegans stages in terms of orthologous gene expression, measured as the number of orthologous gene pairs associated with both stages.
We first carried out the test on pairs of stages within D. melanogaster and within C. elegans, and found that temporally adjacent stages in both species exhibit high dependence in gene expression, supporting the validity of this approach. When comparing fly with worm, we observed strong collinearity of their developmental time courses from early embryos to late larvae. A second, parallel collinear pattern was found between fly white prepupae through adults and worm late embryos through adults. Analysis of the stage-associated genes shared between stages shows that many-to-one fly-worm orthologs are key contributors to the two collinear patterns. Some of these orthologs are known to play similar roles in both organisms, and their mapping in this study may help inform their functions in the development of D. melanogaster and C. elegans.
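The abstract does not name the statistical test. One natural choice, assumed here, is a one-sided hypergeometric test on the number of orthologous pairs whose fly member is associated with the fly stage and whose worm member is associated with the worm stage. The function names and pair identifiers are hypothetical.

```python
from math import comb
from typing import Set

def hypergeom_sf(x: int, total: int, marked: int, drawn: int) -> float:
    """P(X >= x) for X ~ Hypergeometric(total, marked, drawn)."""
    upper = min(marked, drawn)
    hits = sum(comb(marked, k) * comb(total - marked, drawn - k) for k in range(x, upper + 1))
    return hits / comb(total, drawn)

def stage_dependence(all_pairs: Set[str], fly_stage: Set[str], worm_stage: Set[str]) -> float:
    """One-sided p-value that two stages share more ortholog pairs than expected by chance.

    `fly_stage` / `worm_stage` hold the ortholog pairs whose fly / worm member is
    associated with that stage; both are subsets of `all_pairs`.
    """
    overlap = len(fly_stage & worm_stage)
    return hypergeom_sf(overlap, len(all_pairs), len(fly_stage), len(worm_stage))

pairs = {f"pair{i}" for i in range(10)}
print(stage_dependence(pairs, {"pair0", "pair1", "pair2"}, {"pair0", "pair1", "pair2"}))
```

Counting ortholog pairs rather than genes keeps many-to-one orthologs in the test. Because every fly stage is tested against every worm stage, the resulting p-values would normally be corrected for multiple testing.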
C09 - Harnessing related species and samples data to create and optimise a draft genome sequence for Leishmania colombiensis
Short Abstract: Advances in DNA sequencing technologies call for genome assembly strategies that exploit data from related species and samples to develop and improve new draft sequences. Most current DNA sequencing methods produce large volumes of short overlapping sequences; for de novo assembly, these require either long DNA sequences or guidance (contiguation) using information from related species. Contiguation uses related genomes to align, order and orient genome assemblies in development; here we apply this approach to a paired-end short-read library to create a draft genome sequence for Leishmania colombiensis. Initial de novo assemblies and sequence alignments with other Leishmania species indicated that this new species was genetically distinct and approximately equidistant from L. tarentolae and the L. braziliensis complex. The L. colombiensis genome sequence was obtained from an isolate sampled from a dog in Colombia (MCAN/CO/1986/CL223) and was improved with a sample from the same species (MCAN/CO/1985/CL085). The L. tarentolae and L. braziliensis genome sequences, as well as a de novo assembly of CL085, were used to contiguate the CL223 de novo assembly. Adding a sample from within the species identified chromosomal differences unique to L. colombiensis that would have been incorrectly aligned using related-species data alone. This efficient scheme produces draft genome sequences that more accurately reflect intraspecies variation and allows improved comparison of nucleotide and structural diversity between species. Discovering the genetic basis of phenotypic variability is crucial for tackling the acute global disease burden caused by leishmaniasis.
C10 - The genome and transcriptome of Pyronema confluens: a window into fungal evolution
Short Abstract: In the last decade, the genomes of many filamentous ascomycetes have been sequenced; they are invaluable for analyzing the evolution of species and for understanding their physiological and morphological properties. However, while at least ten genome sequences are available for each of the more derived groups of filamentous ascomycetes, only one genome from the basal group of Pezizomycetes has been sequenced, namely that of the black truffle, a fungus with a specialized lifestyle and fruiting body. Therefore, we sequenced the genome and transcriptome of the Pezizomycete Pyronema confluens, a saprobe with typical apothecia as fruiting bodies. The genome was assembled from a combination of Roche/454 and Illumina/Solexa reads. It has a size of 50 Mb and 13,369 predicted protein-coding genes. The intron content is higher than in most derived ascomycetes but lower than in the truffle genome, confirming a tendency towards higher intron content in the basal fungal lineages. There is no macrosynteny with other sequenced fungal genomes; however, there is significant microsynteny with the truffle genome. We analyzed expression levels by RNA-seq for genes with different degrees of evolutionary conservation to find out whether genes with different lineage specificities are preferentially expressed during sexual development or during vegetative growth. Interestingly, the highest percentage of genes upregulated during sexual development is found among the P. confluens orphan genes. This might indicate that, similar to the situation in animals, genes associated with sexual reproduction evolve more rapidly than genes with other functions.
C11 - The evolution of cis-regulatory features in yeast de novo genes
Short Abstract: De novo genes are new genes that originate from non-coding DNA rather than being duplicated from parent genes. Their short evolutionary history and lack of parent genes provide a chance to understand the evolution of cis-regulatory elements in the initial stage of gene emergence. Although a few reports have discussed cis-regulatory elements in new genes, knowledge of the characteristics of these elements in de novo genes is lacking. We conducted a comprehensive investigation of the emergence and establishment of cis-regulatory elements in de novo yeast genes. We found that the number of transcription factor binding sites (TFBSs) in de novo genes rapidly increased and became comparable to the number in established genes. This might have resulted from certain characteristics of de novo genes, namely relatively frequent TFBS gain events, an unexpectedly high number of preexisting TFBSs, and lower selection pressure in the promoter regions of these genes. Furthermore, differences in promoter architecture between de novo genes and duplicated new genes suggest that new genes of different origins might employ distinct regulatory strategies. Finally, our functional analyses suggested that de novo genes might be related to reproduction. Our observations suggest that de novo genes and duplicated new genes possess mutually distinct regulatory characteristics and might play different roles in evolution.
C12 - Quantitative evolution of alternative splicing in closely related Drosophila species
Short Abstract: At least 20% of Drosophila genes are alternatively spliced. The patterns of evolution of alternatively spliced genes in mammals and insects differ, and alternative splicing in insects is less studied than in mammals.
We used transcriptomes of D. melanogaster, D. simulans and D. yakuba (Illumina 36 bp + 36 bp paired reads, six replicates per species) to study changes in the expression level of whole genes and changes in the inclusion level of alternatively spliced segments. More than 139 million reads were mapped to the reference genomes. 45% of genes changed their expression since the D. yakuba clade branched off from the common ancestor of D. simulans and D. melanogaster. This list is enriched in genes involved in alternative splicing, post-translational modification of proteins, signal transduction, transmembrane transport, perception of visible light and eye development. Among alternatively spliced segments at least 30 codons long and with good coverage in all three species, 21% had significantly different inclusion levels among the species. The rate of nonsynonymous substitutions was higher in alternative segments with changed inclusion levels than in those with conserved inclusion levels. There were no significant differences in synonymous substitution rates between these classes of alternative segments.
Evolution of vision played an important role in the recent evolution of Drosophila spp. In Drosophila, changes in the inclusion level of alternative segments are correlated with a higher rate of amino acid changes in their coding sequence, but not with changes in the rate of synonymous substitutions.
C13 - Are changes in lifestyle associated with bursts of HGT in proteobacteria?
Short Abstract: Their genetic configuration enables strains of the Enterobacteriaceae family to survive not only in extreme environments but also in freshwater and soil. Living as either commensals or pathogens of mammals, they can adapt to changing living conditions. Lateral gene transfer (LGT) has occurred often in the evolution of prokaryotes and is assumed to spread newly acquired capabilities between bacterial species. Evidence suggests that bacteria can adapt to new or changed environments by acquiring external genes through LGT. This suggests that changes in lifestyle may coincide with bursts of LGT events.

To test this assumption, we compiled data for several lifestyle traits across the Escherichia coli, Shigella and Salmonella strains in our data set. We obtained orthologous groups for all proteins using the program ProteinOrtho and built phyletic patterns based on gene presence and absence. From them, we reconstructed ancestral lifestyles with GLOOME (Gain Loss Mapping Engine) under a maximum likelihood framework. We also identified putative lateral gene transfers in the same species. We then compared the distribution of lateral gene transfer events to the distribution of lifestyle changes across the branches of the phylogeny. Our fully resolved phylogeny encompasses 25 Salmonella and 61 Escherichia coli/Shigella strains.
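Phyletic patterns and per-tree change counts can be illustrated with a small sketch. It builds 0/1 patterns from orthologous groups and, as a simpler stand-in for GLOOME's maximum likelihood mapping, counts the minimum number of gain/loss events with Fitch parsimony on a rooted binary tree. The input layout (a dict of group to strains, a nested-tuple tree) and the strain names are hypothetical; this does not parse ProteinOrtho output.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree as nested 2-tuples with strain names at the leaves.
Tree = Union[str, Tuple["Tree", "Tree"]]

def phyletic_patterns(groups: Dict[str, Set[str]], strains: List[str]) -> Dict[str, List[int]]:
    """0/1 presence/absence vector per orthologous group, in `strains` order."""
    return {g: [int(s in members) for s in strains] for g, members in groups.items()}

def fitch_changes(tree: Tree, state: Dict[str, int]) -> int:
    """Minimum number of state changes (gains + losses) under Fitch parsimony."""
    def visit(node: Tree) -> Tuple[Set[int], int]:
        if isinstance(node, str):
            return {state[node]}, 0
        (left, lc), (right, rc) = visit(node[0]), visit(node[1])
        shared = left & right
        if shared:
            return shared, lc + rc
        return left | right, lc + rc + 1  # children disagree: one change needed
    return visit(tree)[1]

strains = ["E1", "E2", "S1", "S2"]
tree: Tree = (("E1", "E2"), ("S1", "S2"))
groups = {"og1": {"S1", "S2"}, "og2": {"E1", "S1"}}
for group, pattern in phyletic_patterns(groups, strains).items():
    print(group, pattern, fitch_changes(tree, dict(zip(strains, pattern))))
```

Parsimony only gives the minimum number of events per group; assigning them to branches and weighting gains versus losses is what a likelihood-based tool such as GLOOME adds.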
C14 - Finding alternative splicing events in human mesenchymal cells using the new ESLiM algorithm
Short Abstract: Alternative splicing transforms pre-mRNAs into mature mRNAs through the inclusion or exclusion of different exonic regions, generating different isoforms of a protein. Consequently, the same gene produces a variable number of transcripts, which considerably expands the proteome (Graveley et al., 2001). Organisms may also express different isoforms in a tissue-specific manner. Moreover, it is thought that mRNA splicing emerged in evolution in connection with the onset of multicellular organisms (Irimia et al., 2007).
However, the study of alternative splicing events has been hampered by its combinatorial complexity and by the technical limitations of available experimental platforms. Affymetrix Human Exon microarrays emerged as a solution, with probes designed to span known coding regions widely. Nevertheless, the dynamic regulation of specific splice variants in human cells has not been much explored. We tackle such a study here in a highly plastic cell population, mesenchymal stromal cells (MSCs), using Exon Arrays as the base technology. These cells, widely used in cell therapies for their strong tissue-regenerative and immunomodulatory capabilities, can exert their adaptive phenotypes through differentially spliced transcripts. We applied ESLiM, an ad hoc algorithm developed in R, to a set of 15 samples from separate tissues. Based on linear regression models, ESLiM accurately assesses the inclusion of each exon in the final transcript for each analysed tissue, using a robust abundance measure of the whole gene as a baseline that avoids undesired probe-effect biases. A list of significant exon:gene pairs has been identified for hMSCs.
C15 - Rate of Operon Evolution in Proteobacteria
Short Abstract: How operons evolve is an open question in genome evolution, and several models have been proposed to explain the observed evolutionary patterns. However, the community lacks a method to describe operon evolution, which is necessary for a large-scale examination of the forces shaping this genomic feature. We propose a computational method to classify operons by their evolutionary trajectory. This method will enable us to examine how operons evolve on a case-by-case basis and to classify their evolutionary parameters.

We propose that the construction and/or destruction of clusters of co-transcribed genes can be described as a sequential series of defined events. By examining these events, we can employ statistical learning to classify the evolutionary paths of operons and connect these paths with biological function. At the most basic level, clusters of genes can undergo only a few operations: clusters can be broken into smaller groups, and constituent genes can be duplicated, deleted, rearranged or fused.

Using a set of 36 proteobacterial species and 46 experimentally verified E. coli operons, each containing at least five genes, we have examined the frequency of events in operon evolution. Event tracking has allowed us to compare the relative rate of evolution against evolutionary time. Some operons appear to have a consistently high or low frequency of events, making them fast- or slow-evolving, respectively. Slow evolvers comprise essential gene complexes, such as ATP synthase and components of the transcriptome. Fast-evolving operons tend to be non-essential, such as those for the utilization of alternative catabolites and certain transporters.
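The event vocabulary described here (splitting, duplication, deletion, rearrangement) can be made concrete with a small counter for one operon in one genome. This is a hypothetical sketch, not the authors' classification method: genes are represented by gene-family labels in chromosomal order, the `max_gap` parameter is an assumption, and fusions are not modelled.

```python
from collections import Counter
from typing import Dict, List

def operon_events(operon: List[str], genome: List[str], max_gap: int = 0) -> Dict[str, int]:
    """Count simple events for one reference operon in one genome.

    `operon` lists gene families in reference order; `genome` lists gene
    families in chromosomal order. Up to `max_gap` unrelated genes may sit
    between two operon genes without splitting the cluster.
    """
    members = set(operon)
    positions = [i for i, fam in enumerate(genome) if fam in members]
    counts = Counter(genome[i] for i in positions)

    # Splits: number of contiguous blocks the operon genes fall into, minus one.
    blocks, previous = 0, None
    for p in positions:
        if previous is None or p - previous > max_gap + 1:
            blocks += 1
        previous = p

    # Rearrangement: first-occurrence order differs from the reference order
    # in both orientations (a whole-cluster inversion is not counted).
    seen: List[str] = []
    for p in positions:
        if genome[p] not in seen:
            seen.append(genome[p])
    expected = [fam for fam in operon if fam in counts]

    return {
        "deletions": sum(1 for fam in operon if fam not in counts),
        "duplications": sum(c - 1 for c in counts.values()),
        "splits": max(blocks - 1, 0),
        "rearranged": int(seen not in (expected, expected[::-1])),
    }

print(operon_events(["a", "b", "c", "d", "e"], ["x", "a", "b", "c", "y", "d", "e", "z"]))
```

Summing such counts per operon and dividing by divergence time (for example, species-tree branch lengths) would give a rate of the kind compared here; the normalisation the authors use is not stated in the abstract.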
C16 - Multiple genome comparison based on overlap regions of pairwise local alignments
Short Abstract: Comparative approaches are an important source of information in the analysis of newly sequenced genomes. On the level of genes, the use of reciprocal BLAST hits is the most widely accepted approach used in gene annotation and the inference of homologies. However, it is a notoriously slow process, especially when it comes to all-against-all comparisons of a large number of genomes.

Recently, Mancheron et al. [1] pointed out that for many goals in multiple genome comparison, all-against-all BLAST comparisons are too involved. Instead, they suggest identifying regions of strong overlap in a set of pairwise local alignments between one target genome and any number of reference genomes.
A strong overlap is a region in the target genome that aligns to at least one segment in every reference genome. Any such region not contained in a bigger one is called a maximum overlapping interval (MOI).

In [2] we re-visit the above problem and introduce a series of new algorithms for MOI computation that improve the asymptotic time complexity as well as the practical performance over the approach of Mancheron et al. We also study several generalizations of the problem including the identification of MOIs that align only to some of the specified reference genomes.
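The MOI definition can be made concrete with a naive sketch. It assumes that "aligns to at least one segment in every reference genome" means the region lies inside a single alignment interval for each reference, represented as half-open intervals on the target; it simply intersects candidate regions reference by reference and drops contained ones. It is not one of the algorithms of [2] and makes no attempt at their improved complexity.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the target genome

def keep_maximal(intervals: Iterable[Interval]) -> List[Interval]:
    """Drop empty, duplicate and contained intervals."""
    # Sorting by start, then by decreasing end, puts every container before
    # the intervals it contains.
    ordered = sorted({iv for iv in intervals if iv[0] < iv[1]}, key=lambda iv: (iv[0], -iv[1]))
    kept: List[Interval] = []
    for s, e in ordered:
        if not any(ks <= s and e <= ke for ks, ke in kept):
            kept.append((s, e))
    return kept

def maximal_overlapping_intervals(alignments: Dict[str, List[Interval]]) -> List[Interval]:
    """MOIs of a target genome, given local-alignment intervals per reference genome."""
    candidates: Optional[List[Interval]] = None
    for intervals in alignments.values():
        if candidates is None:
            candidates = keep_maximal(intervals)
        else:
            # Intersect every surviving region with every alignment of the next reference.
            candidates = keep_maximal(
                (max(cs, s), min(ce, e)) for cs, ce in candidates for s, e in intervals
            )
    return sorted(candidates or [])

print(maximal_overlapping_intervals(
    {"ref1": [(0, 50)], "ref2": [(10, 30), (20, 60)], "ref3": [(0, 25)]}
))
```

Pruning contained regions after each reference is safe: if one region lies inside another, every later intersection of the smaller one lies inside the corresponding intersection of the larger one.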


[1] Mancheron A., Uricaru R., Rivals E.: An alternative approach to multiple genome comparison. Nucleic Acids Res. 2011, 39:e101.
[2] Jahn K., Sudek H., Stoye J.: Multiple genome comparison based on overlap regions of pairwise local alignments. BMC Bioinformatics 2012, 13(Suppl 19):S7.
C17 - Reconstruction of Tree-child Phylogenetic Networks from Softwired Clusters
Short Abstract: Phylogenetic networks are models of evolutionary histories that allow for the representation of reticulate evolutionary events like recombinations, hybridizations, or horizontal gene transfers, where one species does not derive directly from a single species, but from the interaction of several (usually, two) species.

In phylogenetic analysis, it is common to compute phylogenetic trees from more than one dataset. For example, a phylogenetic tree can be constructed for each gene separately, or several phylogenetic trees can be constructed using different methods. To reconstruct the evolutionary history of all considered taxa more accurately, it is useful to take into account the set C of all clusters represented by at least one of these phylogenetic trees. In general, however, some of the clusters of the different trees will be incompatible, which means that no single phylogenetic tree represents C. Therefore, several recent publications have studied the construction of a phylogenetic network representing C in the softwired sense, that is, in such a way that C is the set of clusters of all phylogenetic trees embedded in the phylogenetic network.

In this work we solve the following problem: given a set of clusters C, is there a tree-child phylogenetic network N whose set of softwired clusters is C? We present an efficient algorithm that obtains, whenever one exists, all tree-child phylogenetic networks whose set of softwired clusters is C.
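The input side of this problem can be illustrated by computing the cluster set C of several rooted trees and listing the incompatible cluster pairs (neither nested nor disjoint), which are exactly what rule out a single tree representing C. This hypothetical sketch gives trees as nested tuples of leaf names; it does not construct tree-child networks.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # rooted tree as nested tuples of leaf names
Cluster = FrozenSet[str]

def clusters(tree: Tree) -> Set[Cluster]:
    """Clusters (leaf sets below each node, including leaves and root) of a rooted tree."""
    found: Set[Cluster] = set()

    def visit(node: Tree) -> Cluster:
        if isinstance(node, str):
            leaves: Cluster = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found

def compatible(a: Cluster, b: Cluster) -> bool:
    """Two clusters are compatible if one contains the other or they are disjoint."""
    return a <= b or b <= a or not (a & b)

def incompatible_pairs(cluster_set: Set[Cluster]) -> List[Tuple[Cluster, Cluster]]:
    return [(a, b) for a, b in combinations(cluster_set, 2) if not compatible(a, b)]

gene_tree_1 = (("a", "b"), "c")
gene_tree_2 = ("a", ("b", "c"))
C = clusters(gene_tree_1) | clusters(gene_tree_2)
print(len(C), incompatible_pairs(C))
```

Here {a, b} and {b, c} cannot both be clusters of one tree, so representing C requires a network with at least one reticulation.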
C18 - Identifying pathogenic strains of Escherichia based on local and global amino acid compositional genome signatures
Short Abstract: Genome-wide compositional properties of bacterial genomes have already been shown to be adequate tools for classifying bacterial organisms. Information such as codon and oligonucleotide usage has been observed to be species-dependent, while global amino acid composition has been used to identify thermophilic species. Furthermore, compositional features of individual genes or genomic regions have served as criteria for identifying putative horizontal gene transfer events in bacterial species.

In this work, we investigate whether compositional features of the protein sequences encoded in the complete genomic sequences of 25 species/strains of the proteobacterial genus Escherichia can be used to accurately discriminate between pathogenic and non-pathogenic strains. Importantly, a commonly used phylogeny-based approach (based on 16S rDNA sequences) fails to discriminate species/strains with respect to pathogenicity. We construct local (based on local compositional bias as detected by CAST) and global (based on amino acid composition) signatures computed from complete Escherichia genomes, which are provided as input to supervised and unsupervised machine learning methods. Our results indicate that both signature types contain relevant information and can accurately discriminate species/strains according to their pathogenicity.

Moreover, by simulating random partial collections of these genomes we assess the minimum number of genes/features needed to accurately predict the pathogenicity of Escherichia species, thus establishing a framework for predicting pathogenicity even with incomplete genomes as input, as for example in metagenomics studies.
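The global signature can be sketched as a 20-dimensional amino acid composition vector pooled over all encoded proteins. As an illustrative stand-in for the supervised methods, which the abstract does not name, the sketch classifies by nearest class centroid; the CAST-based local signatures are not reproduced, and the labels and sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # fixed order of the 20 signature dimensions

def global_signature(proteins: Iterable[str]) -> List[float]:
    """Relative amino acid frequencies pooled over all proteins of a genome."""
    counts: Counter = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]

def centroid(signatures: List[List[float]]) -> List[float]:
    """Component-wise mean of several signatures."""
    return [sum(column) / len(signatures) for column in zip(*signatures)]

def nearest_centroid(signature: List[float], centroids: Dict[str, List[float]]) -> str:
    """Label of the closest class centroid by squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((x - y) ** 2 for x, y in zip(signature, centroids[label])),
    )

# Hypothetical toy proteomes; real signatures would pool every encoded protein.
training = {
    "pathogenic": [global_signature(["MKKLLA", "AAKKR"])],
    "non-pathogenic": [global_signature(["MGGSTE", "SSGDE"])],
}
centroids = {label: centroid(sigs) for label, sigs in training.items()}
print(nearest_centroid(global_signature(["MKKAL"]), centroids))
```

Because the signature is a normalized frequency vector, it can be computed from any subset of proteins, which is what makes the partial-genome simulations described above possible.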
C19 - DeEP: Deconstructive Evolutionary Pipeline
Short Abstract: Species in the genus Xanthomonas are mostly plant-pathogenic bacteria. They are capable of infecting more than 100 plant species, with each pathovar infecting a specific host and having a defined lifestyle within the host. Recent studies have found trends of gene loss and gain in this genus; to date, however, these gain and loss events have not been linked with pathogenic behavior, virulence or host specificity. We developed a comparative genomics pipeline that can lead to the identification of candidate genes responsible for certain pathogenic characteristics. The pipeline, named DeEP (Deconstructive Evolutionary Pipeline), answers two main questions: which genes were most likely gained and/or lost in the evolutionary history of this genus, and how are they correlated with the characteristics of interest? To accomplish this, DeEP has two steps. The first step detects evidence of micro-evolutionary events to produce an ancestral reconstruction of the genus. This makes it possible to find gained and lost genes through a statistical analysis of a continuous-time Markov chain with five states: presence, absence, duplication, and evidence of two other micro-evolutionary events (lateral gene transfer and homologous recombination). The second step uses Phylogenetically Independent Contrasts (PIC) to correlate the gained and lost genes with pathogenic characteristics such as host specificity and lifestyle within the plant. The genes and evolutionary events identified by the pipeline will help researchers develop hypotheses that can be tested experimentally.
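The second step relies on Felsenstein's phylogenetically independent contrasts. A minimal sketch for a rooted, fully bifurcating tree with positive branch lengths below the root; the `Node` class, tree and trait values are hypothetical. PIC assumes continuous traits evolving under Brownian motion, and how DeEP encodes gene gain/loss or host range as trait values is not specified in the abstract.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    length: float                      # branch length to the parent (> 0 below the root)
    name: Optional[str] = None         # leaf label; None for internal nodes
    children: Tuple["Node", ...] = ()  # exactly two children for internal nodes

def _contrasts(node: Node, trait: Dict[str, float], out: List[float]) -> Tuple[float, float]:
    """Return (node value, adjusted branch length) and append contrasts to `out`."""
    if not node.children:
        return trait[node.name], node.length
    (x1, v1), (x2, v2) = (_contrasts(child, trait, out) for child in node.children)
    out.append((x1 - x2) / sqrt(v1 + v2))            # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted ancestral estimate
    return value, node.length + v1 * v2 / (v1 + v2)  # lengthen branch for estimation error

def pic(root: Node, trait: Dict[str, float]) -> List[float]:
    """Independent contrasts, one per internal node, in post-order."""
    out: List[float] = []
    _contrasts(root, trait, out)
    return out

def contrast_correlation(cx: List[float], cy: List[float]) -> float:
    """Correlation of two traits' contrasts, computed through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

# Hypothetical tree ((A:1, B:1):1, C:2) and trait values.
a, b, c = Node(1.0, "A"), Node(1.0, "B"), Node(2.0, "C")
root = Node(0.0, children=(Node(1.0, children=(a, b)), c))
print(pic(root, {"A": 1.0, "B": 3.0, "C": 5.0}))
```

Both traits must be contrasted over the same tree with the same child order, so that each pair of contrasts refers to the same node before they are correlated.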
C20 - Beyond 1:1 Orthology Assertions
Short Abstract: The Mouse Genome Informatics (MGI) resource, the model organism database for the laboratory mouse, has for 20 years curated 1:1 orthology assertions between mouse, human and rat protein-coding genes in collaboration with the human and rat genome annotation teams and gene nomenclature committees. With comprehensively sequenced genomes available for comparative analysis, phylogenetic analysis clearly identifies cases where descent from a common ancestor does not define a 1:1 relationship; rather, gene duplication following an ancestral speciation event results in M:N gene sets. This has implications for the study of human biology in the mouse system and for the presentation of inferred functional and disease assertions. MGI has restructured its database to accommodate such homology classes, with concurrent changes in the presentation of data and in the representation of human diseases associated with mouse genes through curation of comparative or experimental data. We incorporate all homology classes for the mammalian species mouse, human, rat, chimpanzee, cattle, dog and rhesus macaque, and will next extend our data to include chicken and zebrafish protein-coding gene classes. While 1:1 assertions predominate (>90%), we now more clearly represent cases such as the Serpina1 gene class (1 human, 5 mouse and 1 rat gene) and provide better cross-referencing among related genes, the diseases that have been studied with respect to those genes, and the relationships between genomic features in related genomes. I will present the work process and data summary for this project. This work is supported by NIH NHGRI grant HG000330. http://www.informatics.jax.org
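Moving from 1:1 assertions to M:N homology classes amounts to grouping genes into connected components of a pairwise homology graph. A minimal union-find sketch; the input is illustrative (shaped like the Serpina1 class mentioned in the abstract, plus one 1:1 pair) and is not MGI's data or curation pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Gene = Tuple[str, str]  # (species, gene symbol)

def homology_classes(pairs: Iterable[Tuple[Gene, Gene]]) -> List[List[Gene]]:
    """Group genes into classes: connected components of the pairwise homology graph."""
    parent: Dict[Gene, Gene] = {}

    def find(g: Gene) -> Gene:
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps the trees shallow
            g = parent[g]
        return g

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[Gene, List[Gene]] = defaultdict(list)
    for g in list(parent):
        groups[find(g)].append(g)
    return [sorted(members) for members in groups.values()]

def class_shape(members: List[Gene]) -> Dict[str, int]:
    """Number of genes per species in one class."""
    shape: Dict[str, int] = defaultdict(int)
    for species, _symbol in members:
        shape[species] += 1
    return dict(shape)

human = ("human", "SERPINA1")
pairs = [(human, ("mouse", f"Serpina1{suffix}")) for suffix in "abcde"]
pairs.append((human, ("rat", "Serpina1")))
pairs.append((("human", "TP53"), ("mouse", "Trp53")))
for members in homology_classes(pairs):
    print(class_shape(members))
```

A class whose shape has exactly one gene per species corresponds to a classic 1:1 assertion; any other shape is an M:N class.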
C21 - Scythe - Selection of Conserved Transcripts by Homology Evaluation
Short Abstract: Genomic resources for comparative analyses between species are growing rapidly. In addition, high-throughput sequencing of transcriptomes has greatly increased the number of annotated transcripts. However, not all genomes are equally well explored, and they can vary drastically in the quality of their gene model annotation. It is common practice to use the longest splice variant of a gene as its representative form, but this is not always the form that is most similar between species. Choosing non-matching isoforms can severely affect downstream analyses and may result in artefacts, leading to problems during multiple sequence alignment and phylogeny reconstruction, and potentially to false positives during detection of selective pressures.
Scythe (Selection of Conserved Transcripts by Homology Evaluation) selects the best-fitting gene models for orthologous genes between two or more species via a global alignment method. The algorithm starts with the best-matching pair of sequences between species for a gene locus. From the remaining species in the pool, it then iteratively adds the sequences that best match any of the already selected gene models.
The algorithm is implemented in Python. To establish relations between loci and gene model identifiers, Scythe currently accepts GFF3 (Generic Feature Format Version 3) files and tab-separated files as available from the ENSEMBL database. For ortholog relations, it accepts input from popular clustering tools such as OrthoMCL or Proteinortho, as well as data from ENSEMBL.
Currently, we are applying this algorithm to the analysis of genomes in the family of true grasses (Poaceae) as provided by Phytozome.
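The greedy selection described above can be sketched in Python. This is a hypothetical illustration, not Scythe's code: the function names, the input dictionary layout and the plain Needleman-Wunsch scoring (match +1, mismatch -1, linear gap penalty -1) are all assumptions, and Scythe's actual alignment scoring may differ. For protein models, a substitution matrix such as BLOSUM62 would be more usual.

```python
from itertools import combinations

# Hypothetical sketch; names and scoring scheme are assumptions, not Scythe's API.

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def select_models(models, score=nw_score):
    """Greedily pick one gene model per species for a single locus.

    models: {species: {model_id: sequence}}, with at least two species.
    Returns {species: model_id}.
    """
    # 1. Seed with the best-scoring pair of models from two different species.
    _, (spa, ma), (spb, mb) = max(
        ((score(sa, sb), (spa, ma), (spb, mb))
         for spa, spb in combinations(models, 2)
         for ma, sa in models[spa].items()
         for mb, sb in models[spb].items()),
        key=lambda t: t[0],
    )
    selected = {spa: ma, spb: mb}
    # 2. Repeatedly add the model that best matches any already selected model.
    while len(selected) < len(models):
        _, sp, mid = max(
            ((score(seq, models[s][m]), sp, mid)
             for sp in models if sp not in selected
             for mid, seq in models[sp].items()
             for s, m in selected.items()),
            key=lambda t: t[0],
        )
        selected[sp] = mid
    return selected


# Toy isoforms for one locus in three hypothetical species.
models = {
    "sp1": {"a1": "ACGTACGT", "a2": "ACG"},
    "sp2": {"b1": "ACGTACGT", "b2": "TTTT"},
    "sp3": {"c1": "ACGTACGA", "c2": "GG"},
}
print(select_models(models))  # {'sp1': 'a1', 'sp2': 'b1', 'sp3': 'c1'}
```

In this toy run the two identical models seed the selection, and the third species contributes whichever of its models aligns best to either of them. The alignment keeps only one previous DP row, so memory grows linearly with sequence length.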
C22 - Comparative genomics and transcriptome structure of the psychrotrophic bacterium Exiguobacterium antarcticum B7
Short Abstract: The bacterium Exiguobacterium antarcticum is an extremophile able to survive temperatures ranging from -3°C to 42°C. To date, three isolates have been recorded: two from biofilms in Antarctic lakes and one from a cold desert in India. Micro-organisms that grow and survive in the Antarctic environment are exposed to high UV radiation, variations in thermal regime, water availability and salinity, and low nutrient levels. Elucidating the genomic attributes of this organism is therefore important for understanding the gene repertoire related to adaptation to extreme environments. In this study, we conducted a comparative analysis of E. antarcticum B7 and two other species of the genus Exiguobacterium, as well as an analysis of the structural transcriptome to validate and correct the sRNA annotation. Functional annotation of the E. antarcticum B7 genome identified 2772 CDSs, 760 of which encode proteins of unknown function. The three species share a large number of genes (2029), and the genes related to heat, osmotic and oxidative stress responses and to DNA repair are more conserved with homologs in E. sibiricum 255-15 than with those of the thermophilic bacterium Exiguobacterium AT1b. The transcriptome analyses revealed the expression of 51 sRNAs, identified 4 false-positive genes, which were deleted, and 5 genes that were added to the annotation because automatic annotation had not predicted them. The de novo assembly identified 6 new transcripts not represented in the genomic sequence, demonstrating that RNA-Seq, besides being an important tool for correcting annotation, can also be used to refine the genome assembly.
C23 - KOMODO: a web tool for detecting and visualizing biased distribution of groups of homologous genes in monophyletic taxa
Short Abstract: Enrichment analysis is now a standard procedure for interpreting ‘omics’ experiments that generate large gene lists as outputs, such as transcriptomics and proteomics. However, despite its success in these classes of experiments, there is a surprising lack of application of this methodology to other categories of large-scale biological data, such as genomes annotated to a common controlled vocabulary. To allow scientists to start exploring this data space, we developed Kegg Orthology enrichMent-Online DetectiOn (KOMODO), a web tool that surveys groups of taxa and detects significantly enriched groups of homologous genes in one taxon compared with another. The results are displayed in their proper biochemical roles in a visual, exploratory way, allowing users to easily formulate and investigate biological hypotheses regarding the taxonomic distribution of homologous genes. We validated KOMODO by analyzing portions of central carbon metabolism in two taxa whose carbon metabolism has been extensively studied (Enterobacteriaceae and Lactobacillales). We detected several significantly enriched enzymatic activities related to known key metabolic traits in these taxa, such as the distinct fates of pyruvate (lactate production in Lactobacillales and its complete oxidation in Enterobacteriaceae), demonstrating that KOMODO can detect biologically meaningful differences in the frequencies of shared genomic elements among taxa. We are now using KOMODO to select groups of homologous genes enriched in fungal genomes, compared with other Eukarya, as potential therapeutic targets, with interesting in silico results. KOMODO was published in the Nucleic Acids Research Web Server Issue and is freely available at http://komodotool.org.
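The abstract does not specify KOMODO's statistics. One standard way to test whether a group of homologous genes is over-represented in one taxon is a one-sided Fisher's exact test on presence/absence counts across genomes, followed by a multiple-testing correction over all groups tested. The sketch below assumes exactly that; the function names are hypothetical and this is not KOMODO's implementation.

```python
from math import comb

# Hypothetical sketch of a presence/absence enrichment test; not KOMODO's code.

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for over-representation in the target taxon.

    2x2 table of genome counts:
                          group present   group absent
        target taxon            a               b
        comparison taxon        c               d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# A group present in all 3 target genomes and in none of 3 comparison genomes.
print(fisher_enrichment_p(3, 0, 0, 3))  # 0.05
```

Because many groups of homologous genes are tested at once, the raw p-values would be passed through `benjamini_hochberg` (or a similar correction) before calling any group enriched.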
C24 - Functional gene network prediction based on conservation of gene expression patterns
Short Abstract: The number of sequenced genomes is increasing, yet the functions of half of all genes remain unknown. Functional gene relationships are useful for understanding gene function, because higher-level biological functions arise from genetic interactions. For example, comprehensive gene expression data enable computational prediction of gene regulatory networks. However, gene relationships inferred from genome-wide data tend to be noisy. Sequence conservation across species is another informative source, and one of the most powerful, but it requires the function of the orthologous gene to be known. We combined conservation with gene expression patterns to predict an expression-based functional gene network that is more reliable and more comprehensive. When a functional gene relationship is conserved between two species, it should be detectable in the genome-wide, noisy data of both species. We propose a new quantitative definition of gene relationship conservation.
Here we present a more reliable functional gene network based on conservation of gene relationships. We used COXPRESdb as the database of single-species functional gene relationships based on gene expression patterns, and compared human and mouse. By introducing conservation, only relationships reproduced in both species are used and noisy relationships are omitted. We applied a community detection algorithm to the network to understand its structure and reduce noise. Because the member genes of a community in the network are related to each other, each community is expected to share some functions. Without any prior knowledge, we detected 90 highly enriched functional gene communities, such as ribosomal proteins, immune system genes and cell cycle genes.
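The quantitative conservation measure proposed in the abstract is not given here. As a much simpler stand-in, the sketch below treats a relationship as conserved if two genes are linked in one species' coexpression network and their 1:1 orthologs are also linked in the other species' network. The function name, the edge-list input and the 1:1 ortholog map are assumptions, and the gene identifiers are placeholders.

```python
# Hypothetical helper; a simplified stand-in for the conservation measure
# proposed in the abstract, which is not reproduced here.

def conserved_edges(edges_a, edges_b, orthologs):
    """Return species-A gene pairs whose orthologs are also linked in species B.

    edges_a, edges_b: iterables of (gene, gene) pairs, treated as undirected,
        e.g. top coexpression partners taken from a coexpression database.
    orthologs: dict mapping species-A genes to their 1:1 species-B orthologs.
    """
    b_pairs = {frozenset(pair) for pair in edges_b}
    kept = set()
    for g1, g2 in edges_a:
        o1, o2 = orthologs.get(g1), orthologs.get(g2)
        if o1 is not None and o2 is not None and frozenset((o1, o2)) in b_pairs:
            kept.add(frozenset((g1, g2)))
    return kept


# Placeholder gene identifiers (hypothetical).
human_edges = [("H1", "H2"), ("H2", "H3")]
mouse_edges = [("M2", "M1")]
orthologs = {"H1": "M1", "H2": "M2", "H3": "M3"}
conserved = conserved_edges(human_edges, mouse_edges, orthologs)
```

Only the H1-H2 relationship survives, because H2-H3 has no counterpart edge in mouse. The surviving edges would then go to a community detection step, which is omitted because the abstract does not name the algorithm used.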
C25 - The genomes of four tapeworm species reveal adaptations to parasitism
Short Abstract: Tapeworms (Cestoda) were the only major group of human helminth parasites for which no genome sequence was available. Here we present the first genome sequences of four tapeworm species: Echinococcus multilocularis (the fox tapeworm), E. granulosus (the hydatid tapeworm), Taenia solium (the pork tapeworm) and Hymenolepis microstoma (the rodent tapeworm). The genome of E. multilocularis is highly finished, with 89% of the sequence contained in 9 chromosome scaffolds, and more than half of the ~10,200 gene models are manually curated. We find extreme losses of genes and pathways that are ubiquitous in other animals, including 34 homeobox families, several determinants of stem cell fate, immune system kinases and metabolic enzymes. RNA-seq reveals 308 polycistronic transcripts and spliced-leader trans-splicing for 13% of E. multilocularis genes. We identify potential drug targets, including some with existing pharmaceutical compounds, providing a rich resource for drug development.
C26 - Fractionation of flowering plant genomes
Short Abstract: Evolutionary patterns throughout the 150 million years of flowering plant history are strikingly different from those prevalent in mammals, insects and other groups. In flowering plants, whole genome duplication (WGD) followed by massive loss of duplicate genes (fractionation) occurs cyclically in all lineages. WGD and fractionation greatly complicate the task of reconstructing the evolution of these plants, especially the evolution of gene content and gene order. Rearrangement analyses based on inversion, transposition and reciprocal translocation will mistakenly interpret a fractionated genome as the result of many more reciprocal translocations than actually occurred. More important still is the difficulty fractionation causes in the algorithmic inference of the ancestral genomes of WGD descendants. The wholesale disruption of adjacencies by fractionation drastically shortens the chromosomal fragments that can be reconstructed on the basis of common adjacencies in two or more genomes, or in the "subgenomes" of a WGD descendant. We have been developing a comprehensive suite of "consolidation algorithms" to analyze plant genomes according to models in which fractionation plays as important a role as rearrangement, rather than being treated as an annoying source of error. We apply this suite to the genomic sequences of flowering plants that have recently become available.
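To make the adjacency problem concrete, here is a toy Python illustration. It is not one of the consolidation algorithms, which the abstract does not detail. Two duplicate copies of a hypothetical six-gene ancestral segment are fractionated in complementary ways: every ancestral gene survives in at least one copy, yet the two copies share no adjacency at all. Gene names and the helper functions are made up for the example, and gene orientation is ignored.

```python
# Toy illustration only; gene names and helpers are hypothetical.

def adjacencies(order):
    """Unordered pairs of neighbouring genes on a linear chromosome
    (orientation ignored for simplicity)."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def common_adjacencies(*orders):
    """Adjacencies shared by all of the given gene orders."""
    return set.intersection(*(adjacencies(o) for o in orders))


ancestor = ["a", "b", "c", "d", "e", "f"]
# After WGD, each copy loses a different subset of genes (complementary
# fractionation); together the two copies still retain every ancestral gene.
copy1 = ["a", "c", "d", "f"]
copy2 = ["b", "c", "e", "f"]
print(len(adjacencies(ancestor)))        # 5
print(common_adjacencies(copy1, copy2))  # set()
```

An adjacency-based reconstruction that relies only on shared adjacencies would recover nothing from these two copies, even though interleaving them along the ancestral order restores the full segment.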
C27 - Analysis of the extended kinome of zebrafish with an immensely overrepresented PIM kinase subfamily
Short Abstract: In recent times, zebrafish has garnered popularity as a model organism for studying human cancers. The present study aimed at functional annotation of the zebrafish proteome to provide insights into components of various cellular processes, which may shed light on the mechanism of tumor development. Annotation using highly sensitive remote homology detection methods revealed a “substantial expansion” of the protein kinase family in zebrafish compared to humans: with 1200 members, it constitutes over 3% of the entire proteome. Subsequent classification of kinases into subfamilies indicated the presence of a large number of CAMK kinases, occupying about one third of the kinome. This increase in the CAMK subfamily was attributed to a massive increase in the group of PIM kinases, which play crucial roles in cell cycle regulation and growth. The identities of the PIM kinases were further ascertained by a two-way BLAST search against the SWISS-PROT database, which contains well-annotated PIM kinases. Extensive sequence comparison between human and zebrafish PIM kinases revealed high conservation of the residues that maintain the enzyme in a constitutively active conformation. Substrate-binding sites were found to be poorly conserved, indicating alternate substrate-binding modes in zebrafish. A plethora of PIM kinase substrates was detected in zebrafish, matching the increase in enzymes. Regulatory elements in 3’UTRs were found to be conserved in 10% of the PIM kinases, suggesting that other modes of regulation are active. In short, the study of PIM kinases, one of the preferred cancer therapy targets, in zebrafish has opened up new avenues of research to verify the model organism status of zebrafish.
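A two-way BLAST search is commonly implemented as a reciprocal-best-hit (RBH) check; we assume that reading here, as the abstract does not give details. The sketch takes (query, subject, bitscore) tuples, such as those that could be parsed from tabular BLAST+ output (`-outfmt 6`), for both search directions. The function names and the protein identifiers are hypothetical.

```python
# Hypothetical reciprocal-best-hit sketch; identifiers below are placeholders.

def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _score) in best.items()}


def reciprocal_best_hits(hits_ab, hits_ba):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab, ba = best_hits(hits_ab), best_hits(hits_ba)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


# zebrafish -> Swiss-Prot and Swiss-Prot -> zebrafish searches (toy scores).
hits_zf_sp = [("zf1", "spA", 250.0), ("zf1", "spB", 180.0), ("zf2", "spA", 200.0)]
hits_sp_zf = [("spA", "zf1", 245.0), ("spA", "zf2", 190.0), ("spB", "zf1", 170.0)]
print(reciprocal_best_hits(hits_zf_sp, hits_sp_zf))  # {('zf1', 'spA')}
```

Here `zf2` also hits `spA` best, but `spA` points back to `zf1`, so only the `zf1`/`spA` pair is kept. In an expanded family, many paralogs can share one best hit in this way, which is why a simple RBH filter can be stricter than intended.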
C28 - Biopython Developments for Genome Visualisation, Comparative Genomics, and Metabolic Pathways
Short Abstract: Biopython (Cock et al 2009) includes a range of modules for specific biological visualisation needs, producing both vector-based output (e.g. PDF, SVG) and bitmap images (e.g. PNG), which will be illustrated with real biological applications.

Originally a separate library, GenomeDiagram (Pritchard et al 2006) was integrated into Biopython in 2009. It offers both linear and circular diagrams and is particularly suitable for bacterial- and viral-scale genomes. Additional feature sigils have been added, along with enhancements such as cross-links between genome tracks, which allow complex comparative figures to be generated (e.g. Swanson et al 2012), including figures that mimic the output of the Artemis Comparison Tool (ACT, Carver et al 2005).

For larger genomes, Biopython's chromosome diagram module is more suitable. It is intended for displaying SNPs or other loci, and has been used in published work describing resistance genes in potato (Jupe et al 2012).

From our latest work, we will illustrate customised visualisations of metabolic pathways from KEGG, where the pathways are loaded from their KGML descriptions and then overlaid on the manually laid-out KEGG pathway images. We will demonstrate this using comparative metabolic pathways in the bacterial genus Dickeya (Pritchard et al, unpublished).
C29 - Pan-genome and phylogenomics of the genus Corynebacterium
Short Abstract: Pan-genome analysis is a recent area of genomics whose objective is the genomic comparison of different strains of the same bacterial species. However, this concept can be extended to other taxa, such as the genus level. With the advent of new sequencing technologies, genome sequences have been deposited in public databases on a massive scale. Alongside this accelerated deposition of prokaryotic genomes, there has been great interest in generating data for the Actinobacteria, which form the phylogenetic branch with the third-largest number of sequencing projects. The genus Corynebacterium belongs to the taxon Actinobacteria and currently comprises 89 species of medical, veterinary and biotechnological importance. It includes C. diphtheriae, the causative agent of diphtheria; opportunistic pathogens such as C. jeikeium, which is responsible for severe nosocomial infections in humans; and non-pathogenic species such as C. glutamicum, which is widely used in industrial amino acid production. To better understand the different lifestyles represented by the members of this genus, we performed a pan-genomic comparative analysis of 44 Corynebacterium genomes. We used the EDGAR software to predict the pan-genome, the core genome and singletons, which were calculated based on 10,000 permutations of all the sequenced genomes. We found that the Corynebacterium pan-genome currently consists of approximately 22,000 genes, with a core genome of 580 genes and 235 singletons per genome. A phylogenomic analysis combining single-gene and whole-genome approaches was performed to obtain a highly resolved phylogenetic tree.
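EDGAR's internals are not described here. The sketch below only illustrates the general permutation idea behind pan- and core-genome estimates: genomes are added in random order, and the sizes of their union (pan) and intersection (core) are averaged over many orderings. Gene families are assumed to be precomputed (for example by an ortholog clustering step), and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical sketch of permutation-based pan/core curves; not EDGAR's code.

def pan_core_curves(genomes, permutations=100, seed=0):
    """Mean pan- and core-genome sizes after adding 1..n genomes in random order.

    genomes: list of sets of gene-family identifiers (one set per genome).
    """
    rng = random.Random(seed)
    n = len(genomes)
    pan_sum, core_sum = [0] * n, [0] * n
    for _ in range(permutations):
        order = genomes[:]
        rng.shuffle(order)
        pan, core = set(), None
        for k, families in enumerate(order):
            pan |= families                     # union grows
            core = set(families) if core is None else core & families
            pan_sum[k] += len(pan)
            core_sum[k] += len(core)
    return ([s / permutations for s in pan_sum],
            [s / permutations for s in core_sum])


def singletons(genomes):
    """Gene families present in exactly one genome, listed per genome."""
    counts = Counter(f for families in genomes for f in families)
    return [{f for f in families if counts[f] == 1} for families in genomes]


toy = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}]
pan, core = pan_core_curves(toy, permutations=50)
print(pan[-1], core[-1])  # 5.0 1.0
```

The final points of the curves do not depend on the ordering (they are the full union and intersection); the permutations matter for the intermediate points, which are what pan-genome growth models are fitted to.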
C30 - Environmental Pressure Imprinted in Genomes through Protein Disorder
Short Abstract: Many prokaryotes have adapted to extreme habitats. These organisms, referred to as extremophiles, must adapt their genomes relative to their non-extremophile relatives to survive in the extreme. Here, we reveal that differences between organisms from distinct habitats are imprinted upon a single feature of protein structure, namely the fraction of proteins with long regions predicted to be disordered. We ran various prediction methods on 46 completely sequenced genomes representing organisms from diverse habitats and taxa, and found that the overall composition of proteins with long disordered regions is linked to extreme conditions. In fact, the overall percentage of proteins with disordered regions was more similar between organisms from similar habitats than between organisms of similar taxonomy. For example, proteins from archaeal and bacterial halophiles, which survive high-salt conditions, tend to have substantially more disordered regions than those of their taxonomic neighbours. More generally, our finding that a feature as microscopic and coarse-grained as the overall content of proteins with disordered regions correlates with a variable as complex and macroscopic as the environment remains surprising, and will have to be investigated through future case-by-case studies of the underlying molecular mechanisms.
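The per-genome statistic described above can be sketched as follows, assuming each protein has already been given a per-residue disorder prediction by some predictor. The 30-residue cutoff for a "long" region is a common convention but an assumption here, since the abstract does not state the threshold; the function names are hypothetical.

```python
# Hypothetical sketch; the 30-residue cutoff is an assumed convention.

def longest_run(mask):
    """Length of the longest stretch of consecutive True values."""
    best = run = 0
    for disordered in mask:
        run = run + 1 if disordered else 0
        best = max(best, run)
    return best


def fraction_with_long_disorder(proteome_masks, min_length=30):
    """Fraction of proteins with a predicted disordered region of at least
    `min_length` consecutive residues.

    proteome_masks: one per-residue boolean list (True = predicted
    disordered) per protein.
    """
    if not proteome_masks:
        return 0.0
    hits = sum(longest_run(mask) >= min_length for mask in proteome_masks)
    return hits / len(proteome_masks)


# Toy proteome of three proteins (invented predictions).
masks = [
    [True] * 30 + [False] * 10,          # one 30-residue disordered region
    [True] * 29 + [False] + [True] * 5,  # longest region only 29 residues
    [False] * 50,                        # fully ordered
]
print(fraction_with_long_disorder(masks))  # 0.3333333333333333
```

Comparing this single fraction across genomes, grouped by habitat versus by taxonomy, is the kind of coarse-grained comparison the abstract describes.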
C31 - Functions of giant proteins in Bacterial Genomes
Short Abstract: Large proteins in bacterial genomes have long been a field of interest. In a study of 1636 published bacterial genomes, a total of 5.3 million genes were identified using the Prodigal algorithm. The average protein length was 312 amino acids, and, surprisingly, 1.5% of the proteins (80502) were longer than 1000 amino acids. Such giant proteins were found in 1635 of the 1636 genomes, making them relatively common. Each protein was compared to the Pfam-A protein family database, which describes the function or structure of conserved domains, and the matches were used to deduce potential functions of the giant proteins. Of these giant proteins, 68341 (84.9%) had a match to the Pfam-A database using default cutoffs. A total of 730 proteins match only one domain in Pfam-A, while the rest have 2-11 matches to the database models. The 68341 proteins with matches make up 4741 unique domain combinations, 2396 of which are found in only one protein. The most common domain (found in 6037 proteins) is an integral membrane protein motif (PF00873), found alone or together with the motif that forms a beta-barrel allowing export of a variety of substrates in Gram-negative bacteria (65 proteins; outer membrane efflux motif, PF02321). Other frequent architectures include RNA polymerase domains and TonB-dependent receptors, which transmit signals from the surroundings into the cell and activate transcription of target genes. The analysis presented here illustrates how giant proteins serve key functions in signaling and regulation in bacteria.
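The counting steps above (length filter, Pfam-A match rate, unique domain combinations and combinations seen only once) can be sketched in Python. This is an illustration under stated assumptions: each protein is represented as its length plus an N- to C-terminal list of Pfam accessions, and a "combination" is taken to be that ordered tuple; the study may have defined combinations differently. The function name is hypothetical and the lengths in the example are invented.

```python
from collections import Counter

# Hypothetical helper; the input layout and the definition of a
# "domain combination" as an ordered tuple are assumptions.

def giant_protein_summary(proteins, min_length=1000):
    """Summarize domain architectures of giant proteins.

    proteins: iterable of (length_in_aa, [Pfam accessions, N- to C-terminal]).
    Returns (number of giant proteins, number with >= 1 Pfam match,
             Counter mapping each architecture tuple to its protein count).
    """
    giants = [domains for length, domains in proteins if length > min_length]
    matched = [tuple(d) for d in giants if d]
    return len(giants), len(matched), Counter(matched)


# Toy input: lengths are invented; accessions are those named above.
proteins = [
    (1200, ["PF00873"]),
    (1050, ["PF00873", "PF02321"]),
    (1500, ["PF00873"]),
    (300, ["PF00873"]),   # too short to count as giant
    (2000, []),           # giant, but no Pfam-A match
]
n_giant, n_matched, architectures = giant_protein_summary(proteins)
singleton_archs = sum(1 for count in architectures.values() if count == 1)
print(n_giant, n_matched, len(architectures), singleton_archs)  # 4 3 2 1
```

Counting with a `Counter` keyed on architecture tuples gives both the number of unique combinations and how many occur in a single protein in one pass.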
C32 - Genome-wide identification, characterization and expression analysis of calcium-dependent protein kinase genes in barley (Hordeum vulgare L.)
Short Abstract: Calcium-dependent protein kinases (CDPKs) are important sensors of Ca2+ flux in plant cells in response to various environmental stresses such as cold, drought or salt stress. Genome sequence analysis of Arabidopsis and rice has led to the identification and comparative study of multigene families of these calcium-signaling protein kinases.
In this study we identify and characterize the CDPK complement of the model cereal barley (Hordeum vulgare L.). Comparative analysis encompassed phylogeny reconstruction based on the newly available barley genome as well as established model genomes (among others: Oryza sativa, Arabidopsis thaliana, Brachypodium distachyon). The presence of functional gene copies was verified based on the characteristic CDPK architecture: a serine-threonine kinase domain and four regulatory EF-hand motifs. In silico verification was followed by measurement of transcript expression via qRT-PCR. Relative expression of CDPK genes was analyzed during the vegetative growth stage under intensifying drought stress. The majority of the studied genes (16) showed distinct changes in expression patterns during exposure to stress.
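The abstract does not state which quantification model was applied to the qRT-PCR data. A widely used choice for relative expression is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumption of roughly 100% amplification efficiency for both the CDPK target and a reference gene. The function name is hypothetical and the Ct values in the example are invented.

```python
# Hypothetical sketch of the comparative Ct (2^-ddCt) method; the abstract
# does not confirm that this model was used.

def fold_change_ddct(ct_target_stress, ct_ref_stress, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (stress vs. control) by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for target and reference genes.
    """
    d_ct_stress = ct_target_stress - ct_ref_stress  # normalize to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_stress - d_ct_ctrl
    return 2.0 ** -dd_ct


# Invented Ct values: after normalization, the target crosses the threshold
# 3 cycles earlier under drought, i.e. about 8-fold up-regulation.
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))  # 8.0
```

If the measured efficiencies differ noticeably from 100%, an efficiency-corrected model would be needed instead of the plain power of 2.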
Our study underscores the involvement of the CDPK complement in drought adaptation and the need for a coordinated systems biology approach to unravel the physiological roles of differentially expressed genes in the regulation of multiple processes.
The work was supported by the European Regional Development Fund through the Innovative Economy Program for Poland 2007-2013, project POLAPGEN-BD no. WND-POIG.01.03.01-00-101/08.
C33 - Chaperone Machinery of the Coelacanth
Short Abstract: The response to stress has been investigated in humans and other organisms, but little is known about the stress response of the coelacanth. The coelacanth is a fish that, from the fossil record, was thought to have become extinct about 80 million years ago, until a living specimen was found in 1938. The coelacanth did not undergo the whole genome duplication of teleost fish, so it can be used to investigate the evolution of the stress response in tetrapods and other lineages.
Here, we analysed the molecular chaperones of the Indonesian coelacanth, Latimeria menadoensis, based on recent transcriptome data. Coelacanth homologues were retrieved using human Hsp40, Hsp90 and selected co-chaperones. The results showed that most of the Hsp40s, Hsp90s and Hsp90 co-chaperones are encoded in the coelacanth genome. In humans, there are 49 different Hsp40 proteins. The J domain, which is the main signature of all Hsp40s, is necessary for co-chaperone activity. A phylogenetic analysis was performed using the J domains of human and coelacanth Hsp40s, and sequence and motif analyses were used to investigate the differences between the two organisms. Further, the 3D structures of some coelacanth proteins were modelled and compared to their human homologues. Some interesting differences were identified: for example, DnaJB13 is predicted to be a non-functional Hsp40 in humans due to a corrupted HPD motif, while the coelacanth homologue has an intact HPD. Overall, our analyses suggest that the chaperone machinery is highly conserved between human and coelacanth.
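The HPD comparison described above amounts to a motif check on J-domain sequences. A minimal sketch, assuming the J domains have already been extracted (for example from a domain annotation); the function name is hypothetical, and the sequences in the example are made-up fragments, not real human or coelacanth DnaJB13 sequences.

```python
import re

# Hypothetical helper; example sequences are invented fragments.
HPD = re.compile("HPD")


def hpd_status(j_domains):
    """Map protein name -> position of the first HPD motif in its J domain,
    or None if the motif is absent (e.g. corrupted to another tripeptide)."""
    status = {}
    for name, seq in j_domains.items():
        match = HPD.search(seq.upper())
        status[name] = match.start() if match else None
    return status


toy_j_domains = {"toy_intact": "KRAYHPDKN", "toy_corrupted": "KRAYQPDKN"}
print(hpd_status(toy_j_domains))  # {'toy_intact': 4, 'toy_corrupted': None}
```

In practice the check is more reliable on a multiple alignment of J domains, so that an HPD found elsewhere in the sequence is not mistaken for the canonical motif position.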