ISMB/ECCB 2011 Posters

19th Annual International Conference on
Intelligent Systems for Molecular Biology and
10th European Conference on Computational Biology

Accepted Posters

Category 'D' - 'Comparative Genomics'

Poster D01

Characterization of a mannose-binding Insecticidal lectin gene from Allium sativum (Garlic)

Nusrah Afolabi-Balogun Agriculture Research Council-Vegetable and Ornamental Plant Institute

Hajia Inuwa (Ahmadu Bello University, Biochemistry); Lynelle Van Emmenes (Agriculture Research Council, Plant Breeding); Patrick Adeboola (Agriculture Research Council, Plant Breeding); Ibrahim Sanni (Ahmadu Bello University, Biochemistry); Andrew Nok (Ahmadu Bello University, Biochemistry); Ishiyaku Mohammed (Ahmadu Bello University, Plant Science);

Short Abstract: The full-length cDNA of a monocot mannose-binding insecticidal agglutinin (ASAi) was synthesized using the reverse primer 5’-TTGTAGAAACTAGTAGGAACCCTAT-3’ and the forward primer 5’-ATGGCCAGGAACCTACTGACGAACG-3’. Nucleotide-nucleotide BLAST of the ASAi full-length cDNA sequence on the NCBI website (http://www.ncbi.nlm.nih.gov) showed that the sequence is 362 bp long and contains open reading frames (ORFs) of 327, 216 and 162 bp in frames +1, -3 and -1, at positions 25 to 351, 85 to 300 and 18 to 182, encoding 108 aa, 54 aa and 71 aa respectively. Multiple alignment of the ASAi amino acid sequence with those of seven other monocot mannose-binding lectins (MMBLs) revealed three highly conserved sugar-binding domains [Gln (Q), Asp (D), Asn (N) and Tyr (Y)], namely QDNY.
Signal peptide prediction (http://www.cbs.dtu.dk/services/singalP) showed that ASAi, like most other mannose-binding lectins from Allium species (Van Damme et al., 1995; Yao et al., 2003; Zhao et al., 2003), encodes a lectin precursor with two signal peptides, of 12 aa and 1-22 aa, and a most likely cleavage site between positions 22 and 23.
Phylogenetic analysis indicated that 20 MMBLs, including ASAi, belonged to an extended superfamily. Gene Ontology analysis indicated one biological process, GO:0006952 (defense response), with secondary IDs GO:0002217 and GO:0042829.
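
The ORF coordinates reported above can be illustrated with a minimal six-frame scan. This is only a sketch under stated assumptions, not the procedure used by the authors: it assumes ORFs start at ATG and end at a standard stop codon, and the frame labels (+1 to +3 on the given strand, -1 to -3 on the reverse complement, numbered by offset) are a convention chosen here. Note that a 327 bp span such as positions 25 to 351 covers 109 codons, which is 108 residues plus the stop codon.

```python
# Illustrative six-frame ORF scan; names and conventions are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=1):
    """Return (frame, start, end, n_aa) for every ATG...stop ORF.

    start and end are 1-based positions on the strand being read,
    and end includes the stop codon. ORFs without a stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3  # residues, stop excluded
                    if n_aa >= min_aa:
                        orfs.append((strand * (offset + 1), start + 1, i + 3, n_aa))
                    start = None
    return orfs
```

For the toy sequence CCATGAAATTTTAGGG the scan reports a single ORF in frame +3, spanning positions 3 to 14 and encoding 3 residues.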

Poster D02

Bacterial Phylogenomics

Robert Stones Food and Environment Research Agency

Short Abstract: Foodborne pathogens such as Escherichia coli and Salmonella strains are of serious concern to public health. An estimated 2 to 4 million cases of salmonellosis occur in the U.S. annually, and the number is rising. Many methods have been developed for the detection and analysis of foodborne pathogens. Conventional culture methods require many days for presumptive results; several rapid methods are available but still require days to confirm results. With these methods, new diagnostics for closely related bacterial strains are slow to develop. There is also an increasing need for a greater understanding of the evolution and biology of pathogens, which are becoming more and more of a concern to public health.

However, advances in both rapid next-generation DNA sequencing technology and phylogenetic methods for analysing large datasets across many taxa have made it possible to compare whole bacterial genomes, which is important to both clinical and regulatory science agencies. Such comparisons enable a much more detailed understanding of the biology and evolution of bacterial strains, and of how they are increasingly becoming a danger to public health.

The ability to construct phylogenies from complete bacterial genomes, through advances in rapid genome annotation, is becoming more apparent. The development of novel gene clustering and analytical pipelines, coupled with the rapid phylogenetic analysis presented in this work, will enable a better understanding of bacterial genomics and help protect public health. This work presents recent results and methods for constructing phylogenies from entire genomes of foodborne bacteria, namely Bacterial Phylogenomics.

Poster D03

Detecting rare copy number variations (CNVs) with sparse coding

Andreas Mitterecker Johannes Kepler University

Djork-Arné Clevert (Johannes Kepler University, Institute of Bioinformatics); Andreas Mayr (Johannes Kepler University, Institute of Bioinformatics); An De Bondt (Janssen Pharmaceutica, Research & Development); Willem Talloen (Janssen Pharmaceutica, Research & Development); Hinrich Göhlmann (Janssen Pharmaceutica, Research & Development); Sepp Hochreiter (Johannes Kepler University, Institute of Bioinformatics);

Short Abstract: High-density oligonucleotide genotyping microarrays, especially Affymetrix SNP6 chips, are widely used for high-resolution copy number analysis. In order to identify CNVs more reliably, we have proposed a maximum a posteriori factor analysis model called cn.FARMS. The latent variable, the factor, captures the simultaneous increase or decrease of DNA amount at neighboring chromosome locations, as measured by the intensity of oligonucleotide probes. This increase or decrease indicates amplification or deletion of a DNA region, that is, a CNV. cn.FARMS considerably reduces the false discovery rate (FDR) by combining adjacent chromosome locations into an ensemble vote (agreement of multiple measurements) instead of relying on a single measurement as other methods do.

Standard factor analysis assumes a Gaussian factor distribution, which is a wrong assumption for CNVs. Redon et al. (2006) showed that most CNVs affect fewer than three individuals out of the 270 HapMap samples. These rare events are hard to detect with cn.FARMS, as they would be interpreted as noise. We therefore propose a factor analysis model with a Laplacian prior, which leads to a sparse factor distribution. This raises another problem: the likelihood becomes analytically intractable. We tackled this problem with a variational expectation maximization algorithm for the sparse prior, which optimizes a lower bound on the likelihood.

We applied the Laplacian cn.FARMS model to the HapMap dataset to detect CNVs. We could verify most of the published copy number variable regions and found new ones. However, many known CNVs seem to be false positives.
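
The variational step mentioned in this abstract can be illustrated with a standard lower bound for a Laplace prior. This is a generic sketch and not necessarily the bound used in cn.FARMS; the symbols x (probe intensities), \lambda (loadings), z (factor), \Psi (noise covariance) and the variational parameter \xi are notation assumed here for illustration.

```latex
% Factor model with one sparse factor (notation assumed for illustration);
% the Laplace prior is scaled to unit variance.
x = \lambda z + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \Psi), \qquad
p(z) = \tfrac{1}{\sqrt{2}} \exp\!\left(-\sqrt{2}\,|z|\right)

% For any \xi > 0 the AM-GM inequality gives
% |z| \le \tfrac{1}{2}\left(\tfrac{z^2}{\xi} + \xi\right), with equality at \xi = |z|,
% so the prior is bounded below by a Gaussian-shaped function of z:
p(z) \;\ge\; \tfrac{1}{\sqrt{2}}
  \exp\!\left(-\tfrac{\sqrt{2}}{2}\left(\tfrac{z^2}{\xi} + \xi\right)\right)

% Substituting the bound makes the integral over z Gaussian and hence tractable:
p(x) = \int p(x \mid z)\, p(z)\, dz \;\ge\;
  \int p(x \mid z)\, \tfrac{1}{\sqrt{2}}
  \exp\!\left(-\tfrac{\sqrt{2}}{2}\left(\tfrac{z^2}{\xi} + \xi\right)\right) dz
```

In a variational EM scheme of this kind, \xi would be updated to tighten the bound and the model parameters would then be updated with \xi held fixed.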

Poster D04

Tools for comparative metagenome analysis

Kathrin Aßhauer University of Göttingen

Thomas Lingner (University of Göttingen, Department of Bioinformatics); Peter Meinicke (University of Göttingen, Department of Bioinformatics);

Short Abstract: Metagenomics provides an approach to the analysis of microbial communities from environmental and clinical samples. In this context, advances in next-generation sequencing technologies allow insights into the enormous taxonomic and functional diversity of even complex microbial communities (Daniel, Nat Rev Microbiol, 2005). However, this development demands new bioinformatics tools that can deal efficiently with metagenomic data sets on a large scale.

We developed Taxy and CoMet, two freely available tools for quickly determining the phylogenetic and functional composition of large metagenomic data sets. In contrast to existing approaches for taxonomic profiling, which rely on the classification of sequencing reads, Taxy is based on mixture modeling of the overall nucleotide distribution of a metagenomic sample. This completely novel approach is insensitive to read length variation and thus provides better comparability of samples across different sequencing platforms. Taxy is available as a Matlab/Octave toolbox as well as an interactive tool for Windows.
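
The idea of modeling a sample's overall nucleotide distribution as a mixture can be sketched as follows. This abstract does not describe Taxy's actual estimation procedure, so a plain expectation-maximization (EM) update for mixture weights is used here as a stand-in. The function name, the two reference profiles and the counts are all hypothetical.

```python
# Sketch: estimate how much each reference profile contributes to a sample.
# EM is an illustrative stand-in, not Taxy's documented method.

def estimate_mixture(sample_counts, reference_profiles, n_iter=200):
    """Estimate weights w so that sum_k w[k] * reference_profiles[k]
    approximates the relative frequencies in sample_counts.

    sample_counts: oligonucleotide counts pooled over the whole sample.
    reference_profiles: one frequency vector per reference genome,
    each summing to 1 and aligned with sample_counts.
    """
    n_ref = len(reference_profiles)
    weights = [1.0 / n_ref] * n_ref
    for _ in range(n_iter):
        new = [0.0] * n_ref
        for i, count in enumerate(sample_counts):
            mix = sum(weights[k] * reference_profiles[k][i] for k in range(n_ref))
            if count == 0 or mix == 0.0:
                continue
            for k in range(n_ref):
                # Expected share of this count explained by reference k.
                new[k] += count * weights[k] * reference_profiles[k][i] / mix
        total = sum(new)
        weights = [v / total for v in new]
    return weights


# Hypothetical single-nucleotide profiles (A, C, G, T) of two references.
profile_a = [0.7, 0.1, 0.1, 0.1]
profile_b = [0.1, 0.1, 0.1, 0.7]
# Counts equal to a 25% / 75% blend of the two profiles over 1000 bases.
sample = [250, 100, 100, 550]
weights = estimate_mixture(sample, [profile_a, profile_b])
```

Because the estimate uses only pooled counts, it does not depend on how the bases were split into reads, which is the property the abstract highlights.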

The web server CoMet focuses on comparative functional profiling based on assignments of metagenomic reads to Pfam protein domains. CoMet combines a speed-optimized implementation of the Orphelia gene prediction engine (Hoff et al., Nucleic Acids Res, 2009) with Pfam domain detection using the UFO approach (Meinicke, BMC Genomics, 2009). Based on the resulting domain frequency profiles, comparative statistical analyses are performed to identify and visualize functional differences between metagenomic samples. The CoMet pipeline is accessible via an easy-to-use web interface at http://comet.gobics.de.

Poster D05

Identification of transcription factors and their correlation with the high diversity of Stramenopiles.

Francisco Javier Buitrago Florez Universidad de los Andes

Francisco Buitrago (Universidad de los Andes); Diego Mauricio Riaño-Pachon (Universidad de los Andes, Biological Sciences); Silvia Restrepo (Universidad de los Andes, Biological Sciences);

Short Abstract: Transcription factors (TFs) regulate spatial and temporal gene expression by binding to DNA and either activating or repressing the action of RNA polymerases; in addition to TFs, other transcriptional regulators (TRs) participate in transcriptional modulation. With the availability of genome sequences for several organisms and computational strategies for gene functional annotation, the entire set of TFs and TRs can be identified, described, and compared between species and lineages. The diversity among Stramenopiles is striking: they range from large multicellular seaweeds to tiny unicellular species, they are present in freshwater, marine and terrestrial habitats, and they embrace many ecologically important algal (e.g., diatoms, brown algae, chrysophytes) and heterotrophic (e.g., Oomycetes) groups. In order to find TF and TR genes in the deduced proteomes of Stramenopiles, we followed and extended the approach developed in Pérez-Rodríguez et al. (2010). Briefly, it exploits the presence of protein domains and their combinations, in the form of boolean rules, that are specific for different families of TFs and TRs. We applied an enlarged set of rules to the deduced proteomes of 9 different Stramenopiles, identifying more than 400 different regulatory genes in each species, belonging to up to 126 different gene families. The identification of this class of regulatory genes will constitute an important resource that could be exploited in gene functional characterization and evolutionary analyses. All TF/TR families found will be publicly released via a web database.
References.
Pérez-Rodríguez P, Riaño-Pachón DM, Corrêa LG, Rensing SA, Kersten B, Mueller-Roeber B. 2010. Nucleic Acids Res. 38(Database issue):D822-7
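
The boolean domain rules described in this abstract can be sketched as sets of required and forbidden domains per family. The family and domain names below are placeholders invented for illustration; they are not the rules used by the authors.

```python
# Hypothetical rule set: a protein is assigned to a family when it carries
# all required domains and none of the forbidden ones.
RULES = {
    "FamilyA": {"required": {"DomX"}, "forbidden": {"DomY"}},
    "FamilyB": {"required": {"DomX", "DomY"}, "forbidden": set()},
}


def classify(protein_domains, rules=RULES):
    """Return the sorted list of families whose rule the protein satisfies."""
    domains = set(protein_domains)
    return sorted(
        family
        for family, rule in rules.items()
        if rule["required"] <= domains and not (rule["forbidden"] & domains)
    )
```

The forbidden set is what lets two families that share a domain be told apart: here a protein with only DomX is assigned to FamilyA, while one with both DomX and DomY is assigned to FamilyB.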

Poster D06

SeqXML and OrthoXML: standards for sequence and orthology information

Thomas Schmitt Stockholm University

David Messina (Stockholm University, Department of Biochemistry and Biophysics); Fabian Schreiber (Stockholm University, Department of Biochemistry and Biophysics); Erik LL Sonnhammer (Stockholm University, Department of Biochemistry and Biophysics);

Short Abstract: Today’s orthology databases lack standards. Users must contend with different ortholog representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems.

SeqXML is a lightweight format for sequence records – the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad, and we provide read/write functions for BioJava, BioPerl, and Biopython.

OrthoXML was designed to represent orthologs from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta information. A unified format is particularly valuable for ortholog consumers that want to combine lots of sources, e.g. for gene annotation projects.

Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.

Please visit http://SeqXML.org or http://OrthoXML.org for more details.

Poster D07

A Practical Algorithm for Ancestral Rearrangement Reconstruction

Jakub Kovac Comenius University

Brona Brejova (Comenius University, Department of Computer Science); Tomas Vinar (Comenius University, Department of Applied Informatics);

Short Abstract: This poster is based on Proceedings Submission 85

Motivation: Genome rearrangements are a valuable source of information about early evolution, as well as an important factor in speciation processes. Reconstruction of ancestral gene orders on a phylogeny is thus one of the crucial tools contributing to understanding of evolution of genome organization. For most models of evolution, this problem is NP-hard. Existing heuristic approaches impose restrictions on the used model or input genome architectures.
Results: We have developed a universal method for reconstruction of ancestral gene orders by parsimony (PIVO) using an iterative local optimization procedure. Our method can be applied to different rearrangement models. Combined with a sufficiently rich model, such as the double cut and join (DCJ), it can support a mixture of different chromosomal architectures in the same tree. We show that PIVO can outperform the previously used steinerization framework and achieves better results on real data than previously published methods.
Availability and Implementation: Datasets, reconstructed histories, and the software can be downloaded at http://compbio.fmph.uniba.sk/pivo/.

Poster D08

Curiosities from the Drosophila proteome

Michael Tress Centro Nacional de Investigaciones Oncologicas

Bernd Bodenmiller (ETH Zurich, Institute of Molecular Systems Biology); Ruedi Aebersold (ETH Zurich, Institute of Molecular Systems Biology); Alfonso Valencia (Centro Nacional de Investigaciones Oncologicas, Structural and Computational Biology Programme);

Short Abstract: Mature mRNA is generated from primary precursor mRNA transcripts (pre-mRNA) by the joining of exonic sequences. In the standard procedure (cis-splicing) exons from the same pre-mRNA molecule are joined. However, splicing can also occur between two separately transcribed pre-mRNA molecules. This process is termed “trans-splicing”.
There is concrete evidence that trans-splicing is important to the function of at least two Drosophila genes, modifier of mdg4 (mod(mdg4)) and longitudinals lacking (lola). A recent study has confirmed that the majority of lola and mod(mdg4) splice variants are indeed generated through trans-splicing.
We have analysed two large-scale studies of the Drosophila melanogaster proteome. The experiments showed that a surprising number of the variants from lola and mod(mdg4) are indeed expressed as stable proteins in vivo. The transcripts that gave rise to these isoforms had all been shown to be generated almost entirely by trans-splicing. The large numbers of expressed protein isoforms detected for lola and mod(mdg4) suggest that trans-splicing may play an even greater role in Drosophila alternative splicing than previously thought.
Transcripts that generate proteins with premature stop codons are generally believed to be degraded by the nonsense-mediated decay (NMD) pathway. Another recent study investigated down-regulation by NMD in Drosophila. As part of the paper, the authors provided a high-confidence set of more than 50 NMD targets. Analysis of the two large-scale Drosophila proteomics studies showed that at least four of the isoforms tagged as targeted for NMD were also expressed as proteins.

Poster D09

A new Perl toolkit for the detection and analysis of polymorphic loci and their application for bacterial typing

Luis M Rodriguez-R Institut de Recherche pour le Développement

Ralf Koebnik (Institut de Recherche pour le Développement); Luis M Rodriguez-R (Institut de Recherche pour le Développement, UMR Résistance des Plantes aux Bioagresseurs); Boris Szurek (Institut de Recherche pour le Développement, UMR Résistance des Plantes aux Bioagresseurs); Christine Pourcel (Université Paris-Sud, Institut de Génétique et Microbiologie); Ralf Koebnik (Institut de Recherche pour le Développement, UMR Résistance des Plantes aux Bioagresseurs);

Short Abstract: Since the introduction of DNA fingerprinting during the 1990s, molecular typing has been extensively employed in human medicine, veterinary medicine and plant pathology for a wide range of pathogens, based on distinct polymorphic loci. The explosion of genomic data during the last decade has empowered the rational design and development of typing techniques, enabling the prediction and detection of specific loci with different key features, such as sequence similarity, repetitive regions or domain architecture. Loci employed for bacterial typing include Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), Variable Number of Tandem Repeats (VNTR) and Simple Sequence Repeats (SSR), among others. Several tools have been developed for the detection and comparison of such loci, and some locus-specific or organism-specific information systems have been developed. Here, we present a Perl library and a set of scripts for the analysis of variable loci used for bacterial typing, based on highly customizable rules. The library allows the detection, grouping, group extension and comparison of any locus definable with the supported rules. As a test case, we successfully employed the library for the analysis of VNTRs on genomes of different qualities from the genus Xanthomonas, a group of plant pathogens belonging to the gamma division of Proteobacteria. We used the library to reproduce and automate a previous workflow that required substantial human intervention, without loss of result quality and with a reduced risk of human error. The tool is accessible at http://bionfo-prod.mpl.ird.fr/xantho/ and is being used for the development of laboratory genotyping tools for xanthomonads.

Poster D10

Protein Intrinsic Disorder in Viruses and Virus-Host Interactions

Yu-An Dong Technische Universität München

Burkhard Rost (Technische Universität München, Informatics); Markus Schmidberger (Technische Universität München, Informatics);

Short Abstract: Protein intrinsic disorder plays an important role in signaling and regulation and thus is more prevalent in higher eukaryotes than in bacteria. Many viral genomes, however, exhibit surprisingly high disorder content. Furthermore, there is significant variation in disorder content among viruses. Several herpesviruses, which are large DNA viruses, are compared with several common RNA viruses, with respect to both their own disorder content and that of their human interaction partners. RNA viruses are less disordered, but their human protein interaction targets are more disordered. The distribution of disorder is correlated with phylogeny and viral life cycle. Patterns of disorder-order interactions are investigated in both human and virus-human contexts.

Poster D11

GECKO2: A software tool for de novo gene cluster detection

Katharina Jahn Universität Bielefeld

Leon Kuchenbecker (Universität Bielefeld, Technische Fakultät); Sebastian Böcker (Friedrich-Schiller-Universitaet Jena, Institut fuer Informatik); Jens Stoye (Universität Bielefeld, Technische Fakultät);

Short Abstract: Co-localized genes that remain clustered across multiple genomes are known to provide highly informative signals for functional analysis. Incomplete conservation caused by micro-rearrangements, gene losses, gene insertions, as well as difficulties in establishing gene homologies turn the detection of such gene clusters into a computationally hard problem.

We developed a flexible platform for de novo detection of conserved gene clusters in multiple genomes. The underlying models substantially increase robustness against incomplete conservation patterns compared with the widely used max-gap model. Our software constitutes a reasoned trade-off between algorithmic efficiency and model complexity that allows for an efficient analysis of large genome sets. The significance of the predicted clusters is evaluated in a statistical framework integrated into our software. To facilitate the analysis, we offer different search scenarios and provide interactive visualization of the results. The combination of these features presents a valuable improvement over existing approaches to the systematic analysis of gene cluster conservation in multiple genomes.

Poster D12

Automated Segmentation of DNA Sequences

Brona Brejova Comenius University Bratislava

Michal Burger (Comenius University, Computer Science); Tomas Vinar (Comenius University, Applied Informatics);

Short Abstract: This poster is based on Proceedings Submission 143.

Most algorithms for reconstruction of evolutionary histories involving
large-scale events such as duplications, deletions or rearrangements,
work on sequences of predetermined markers. Typically, protein coding
genes or other functional elements are used as markers. However,
markers defined in this way ignore information included in non-coding
sequences, are prone to errors in annotation, and may even introduce
artifacts due to partial gene copies or chimeric genes.

We propose the problem of sequence segmentation where the goal is to
automatically select suitable markers based on sequence homology alone.
We design an algorithm for this problem which can tolerate a certain
amount of inaccuracy in the input alignments and still produce a
segmentation of the sequence into markers with high coverage and accuracy.
We test our algorithm on several artificial and real data sets
representing complex clusters of segmental duplications.
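
As a rough illustration of homology-based segmentation, the sketch below cuts a sequence at the endpoints of its aligned intervals and merges endpoints that lie within a small tolerance of each other. It is a deliberately simplified stand-in for the algorithm in the proceedings paper, which is not described in this abstract; the function name, the interval representation and the tolerance parameter are assumptions made here.

```python
def segment(seq_len, aligned_intervals, tol=0):
    """Split [0, seq_len) into segments at alignment endpoints.

    aligned_intervals: (start, end) pairs, 0-based and half-open, marking
    regions of the sequence covered by some homology alignment.
    tol: endpoints no more than tol bases after the last kept breakpoint
    are treated as the same breakpoint, which absorbs small inaccuracies
    in alignment ends.
    """
    points = {0, seq_len}
    for start, end in aligned_intervals:
        points.update((start, end))
    kept = [0]
    for p in sorted(points)[1:]:
        if p - kept[-1] > tol:
            kept.append(p)
    if kept[-1] != seq_len:
        # The sequence end must always close the last segment.
        if len(kept) > 1:
            kept[-1] = seq_len
        else:
            kept.append(seq_len)
    return list(zip(kept, kept[1:]))
```

With a tolerance of 3, the slightly disagreeing alignments (10, 40) and (12, 41) on a sequence of length 100 yield a single marker boundary at 10 and another at 40, instead of two spurious segments of 2 bases and 1 base.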

Poster D13

Antimicrobial Resistance and susceptibility profiling of four pathogenic organisms using UV irradiation.

Olugbenga Taiwo Covenant University

Solomon Oranusi (Covenant University, Biological Sciences); Enejo Abah (Covenant University, Biological Sciences); Conrad Omonhinmin (Covenant University, Biological Sciences);

Short Abstract: Taiwo, O.S. (1), Oranusi, U.S. (1), Omonhinmin, A.C. (2) and Abah, S.E. (1)
(1) Microbiology Unit, Department of Biological Sciences, College of Science & Technology, Covenant University, Ota.
(2) Applied Biology & Biotechnology Unit, Department of Biological Sciences, College of Science & Technology, Covenant University, Ota.
This poster is based on a proceedings submission. Studies on the antimicrobial resistance and susceptibility profile, and on the effects of UV irradiation in conferring susceptibility and resistance, were carried out on two Gram-positive organisms (Staphylococcus aureus, Enterococcus faecalis ATCC29212) and two Gram-negative organisms (Escherichia coli ATCC25922, Shigella dysenteriae 006). The test organisms were subjected to UV irradiation for 0 h, 1 h, 3 h, 5 h and 7 h, and checked afterwards for antimicrobial sensitivity. Antimicrobial sensitivity tests showed that E. coli ATCC25922, initially susceptible to COT and COL, acquired resistance to both with increased exposure to UV irradiation. S. aureus, initially resistant to COT, became susceptible after 5 h of UV exposure. Statistical analyses showed a marked difference in the susceptibility of E. coli ATCC25922, E. faecalis ATCC29212 and S. aureus to antibiotics, but not with the exposure time regime (α = 0.05). Conversely, S. dysenteriae 006 recorded a significant change (α = 0.05) with exposure time, without a noticeable change amongst the antibiotics. The trend line analyses of the mean susceptibility of the various organisms against exposure time showed a possibility of a high increase in susceptibility in S. dysenteriae 006, a slim possibility of an increase in E. faecalis ATCC29212 and S. aureus, and a decrease for E. coli.
Keywords: UV irradiation, resistance, susceptibility and induced mutation.

Poster D14

Structure-based Whole Genome Realignment Reveals Putative Non-coding RNAs

Sebastian Will Massachusetts Institute of Technology

Michael Yu (Massachusetts Institute of Technology, CSAIL); Bonnie Berger (Massachusetts Institute of Technology, Mathematics, CSAIL);

Short Abstract: This poster is based on Proceedings Submission 233

Motivation: Current whole genome alignment approaches align according
to sequence-similarity but do not explicitly consider molecular
structure. Therefore, non-coding RNAs (ncRNAs) with high structural
but low sequence conservation are typically poorly aligned. Such
non-coding RNAs often remain hidden from genome-wide de novo
prediction screens that require visible structural conservation in the
whole genome alignment. In consequence, ncRNA prediction by recent
genomic screens is strongly biased towards high sequence conservation.
Even when the alignment quality is sufficient for the identification
of an ncRNA, improving the alignment can be advantageous for
further comparative analysis.

Results: We present an original approach for realigning whole genomes
to reveal previously undetected ncRNAs. Our realignment pipeline
combines de-novo prediction of structural RNAs and a novel
structure-based realignment algorithm. For the latter we introduce a
novel banding technique in the dynamic programming algorithm of the
established RNA alignment method LocARNA. This limits the deviation
from the original alignment. The benefits of this strategy are
two-fold. First, the runtime of this otherwise costly computation on a
genomic scale is significantly reduced. Second, controlling the
deviation from the original alignment ensures a conservative
realignment, increasing the confidence in predictions from such
alignments.

We apply the pipeline to a whole genome alignment of the twelve
assembled Drosophilid genomes. Due to our realignment strategy, we
predict thousands of high confidence putative ncRNAs that could not be
identified from the whole genome alignment. Particularly we improve
the prediction of ncRNAs with low sequence conservation which are
intrinsically more challenging for de novo prediction.

Availability: The deviation-limited realignment algorithm is freely
available as part of the open-source package LocARNA at
http://www.bioinf.uni-freiburg.de/Software/LocARNA/.

Poster D15

Degree of synteny between the genomes of radiation-resistant species of Deinococcus does not reflect their extreme dsDNA break repair capabilities

Jelena Repar Rudjer Boskovic Institute

Ksenija Zahradka (Rudjer Boskovic Institute, Division of Molecular Biology); Davor Zahradka (Rudjer Boskovic Institute, Division of Molecular Biology);

Short Abstract: Deinococcus radiodurans, a highly radiation-resistant bacterium, is able to accurately reassemble its genome after it has been shattered into hundreds of pieces. Extreme radiation resistance is a typical attribute of members of the genus Deinococcus, implying similar DNA repair capabilities. Such fidelity of dsDNA break repair would imply a high degree of gene order conservation across the genomes of deinococci. We investigated the degree of synteny between the 5 currently available completely sequenced genomes of the genus Deinococcus. Whole genome alignments were performed with the open source MUMmer software, which implements a suffix tree alignment algorithm. Maximal unique matching subsequences were detected for each genome pair and visualized as dot plots. Notably, the dot plot analysis revealed a very low degree of synteny in all genome comparisons, showing that the ability of these deinococci to accurately reassemble DNA after extreme doses of gamma radiation is not mirrored in the degree of synteny between their genomes.

Poster D16

Protein sequences of the human genome compared to those of several model organisms

Edda Kloppmann TU Munich

Marco Punta (TU Munich and IAS, Informatics, Bioinformatics); Burkhard Rost (TU Munich, Bioinformatics and Computational Biology);

Short Abstract: In the last decade, the (nearly) complete DNA sequences of the human genome and numerous other organisms have become available. With this information in hand, large-scale comparative analysis has become feasible. Here, we present a sequence level comparison (PSI-Blast) between the annotated human protein sequences and those of well-known model organisms such as, among others, mouse, zebrafish, rice and yeast.

For assessing alignment similarity, we used different combinations of E-value, alignment length and sequence identity. We discuss differences and similarities in relation to the occurrence of several protein structural features, including transmembrane, coiled-coil and disordered regions. We found that even closely related organisms show significant diversity at the protein sequence level.

Poster D17

Integrating multiple species to identify essential promoter elements

Julia Lasserre Max-Planck Institute for Molecular Genetics

Ho-Ryun Chung (Max-Planck Institute for Molecular Genetics, Computational Biology); Klaus-Robert Müller (Technische Universität, Machine Learning); Martin Vingron (Max-Planck Institute for Molecular Genetics, Computational Biology);

Short Abstract: Regulation of gene expression encoded in the promoter is an essential characteristic of eukaryotes. Existing methods for extracting promoter features traditionally involve motif finding along a sequence in a species-specific manner. However, some of the detectable features arise from transcriptional mutational biases. Such biases occur as a consequence of the transcriptional process and differ between species. They do not constitute regulatory signals, yet they can overshadow them. This project aims to identify generic regulatory sequence signals of polymerase II promoters across metazoan genomes. We therefore study promoters of several evolutionarily distant species together. The goal is to emphasize the regulatory signals and to isolate them from a background of process-induced features. We use multi-task feature learning, a method that extracts features common to related datasets. We identify a set of words that are useful for the detection of promoters in all species.

Poster D18

OPE - Optimal Pairwise Epistasis

Benjamin Goudey NICTA and University of Melbourne

Armita Zarnegar (NICTA VRL, Australian Mathematical Society); Eder Kikianty (NICTA VRL); Dave Rawlinson (NICTA VRL); Qiao Wang (NICTA); John Markham (NICTA); Richard Campbell (NICTA, University of Melbourne, Department of Electrical and Electronic Engineering); Michael Inouye (The Walter and Eliza Hall Institute of Medical Research); Geoff Macintyre (NICTA, University of Melbourne, Computer Science and Software Engineering); Gad Abraham (NICTA, University of Melbourne, Computer Science and Software Engineering); Izhak Haviv (Baker IDI Heart & Diabetes Institute); Adam Kowalczyk (NICTA);

Short Abstract: Motivation: Most genome-wide association studies (GWAS) to date use a single-locus analysis strategy, essentially ignoring the existence of interaction between SNPs. Previously published methods that account for interactions do not scale to real datasets, typically taking days or weeks of computation on non-commodity systems. In addition, most interaction methods are limited by only considering SNPs with strong marginal effects (statistically significant 1st order association with a phenotype).

We have developed a novel approach to assess all possible pairwise SNP interactions using rigorous computation of extreme tails of dedicated statistics. Analysis of a dataset with 300K SNPs takes <6 hours on a PC, with further improvements anticipated.
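
To give a sense of scale for the figures in this paragraph, the short calculation below counts the unordered SNP pairs in a 300K-SNP dataset and the average rate needed to cover them within six hours. It assumes, purely for illustration, that every pair is evaluated explicitly; the abstract does not say whether the method does so.

```python
import math

n_snps = 300_000
# Number of unordered SNP pairs: n * (n - 1) / 2.
n_pairs = math.comb(n_snps, 2)
# The "<6 hours" figure from the abstract, in seconds.
budget_seconds = 6 * 60 * 60
# Average throughput required if every pair were tested explicitly.
pairs_per_second = n_pairs / budget_seconds
```

This gives 44,999,850,000 pairs, or roughly 2.08 million pairs per second on average over the six-hour budget.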

Results: We tested our approach on two independent GWAS of Celiac disease. In one study, one million statistically significant interactions were found, and over 98% were independently replicated in the second. The major histocompatibility complex (MHC) and chromosome X were significantly enriched for interactions. Among replicated interactions, 1,000 had no significant marginal effect. Of autosomal, interacting SNPs with no marginal effect, 145 (from 151 pairs) were expression quantitative trait loci. Top putative interactions supported known mechanisms for T-cell attenuation and clinical targets for immunological control. Many interactions included SNPs not described previously as affecting Celiac disease risk.

Conclusion: We show that the calculation of higher order SNP interactions is feasible in reasonable time and that interaction is both pervasive in the genome and biologically relevant. In particular, this makes a systematic exploration of higher order interactions practical, especially with cluster or cloud computing.
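To give a sense of the scale behind "all possible pairwise SNP interactions", the short sketch below counts the SNP pairs in a 300K-SNP dataset and the average throughput needed to finish in 6 hours. It is only back-of-the-envelope arithmetic under the assumption that every unordered pair is tested exactly once; the function names are illustrative and do not come from the authors' software.

```python
from math import comb


def pairwise_tests(n_snps: int) -> int:
    """Number of unordered SNP pairs, i.e. n choose 2."""
    return comb(n_snps, 2)


def required_rate(n_snps: int, hours: float) -> float:
    """Average pairs per second needed to test every pair within `hours`."""
    return pairwise_tests(n_snps) / (hours * 3600)


if __name__ == "__main__":
    n = 300_000  # dataset size quoted in the abstract
    print(f"pairs to test: {pairwise_tests(n):,}")
    print(f"pairs per second for a 6 h run: {required_rate(n, 6):,.0f}")
```

Under these assumptions the exhaustive scan covers about 4.5 x 10^10 pairs, so a 6-hour run has to sustain roughly two million pair tests per second.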

Poster D19

QuartetS: a fast and accurate program for large-scale orthology detection

Jaques Reifman US Army Medical Research and Materiel Command

Chenggang Yu (Biotechnology HPC Software Applications Institute); Nela Zavaljevski (Biotechnology HPC Software Applications Institute); Valmik Desai (Biotechnology HPC Software Applications Institute);

Short Abstract: The unparalleled growth in the availability of genomic data offers both a challenge to develop orthology detection methods that are simultaneously accurate and high throughput and an opportunity to improve orthology detection by leveraging evolutionary evidence in the accumulated sequenced genomes. Here, we report a novel orthology detection method, termed QuartetS, that exploits evolutionary evidence in a computationally efficient manner. Based on the well-established evolutionary concept that gene duplication events can be used to discriminate homologous genes, QuartetS uses an approximate phylogenetic analysis of quartet gene trees to infer the occurrence of duplication events and discriminate paralogous from orthologous genes. We used function- and phylogeny-based metrics to perform a large-scale, systematic comparison of the orthology predictions of QuartetS with those of four other methods [bi-directional best hit (BBH), outgroup, OMA, and QuartetS-C (QuartetS followed by clustering)], involving 624 bacterial genomes and >2 million genes. We found that QuartetS slightly, but consistently, outperformed the highly specific OMA method and that, while consuming only 0.5% additional computational time, QuartetS predicted 50% more orthologs with a 50% lower false positive rate than the widely used BBH method. We conclude that, for large-scale phylogenetic and functional analysis, QuartetS and QuartetS-C should be preferred, respectively, in applications where high accuracy and high throughput are required.
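For readers unfamiliar with the bi-directional best hit (BBH) baseline mentioned above, the following minimal sketch shows the usual idea: two genes are called orthologs when each is the other's top-scoring hit. This illustrates the baseline only, not QuartetS itself; the score dictionaries, gene names and function names are hypothetical, and real pipelines would derive the scores from sequence similarity searches.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query_gene, subject_gene) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return gene pairs (a, b) that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores between genomes A and B (made-up values).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 198, ("b1", "a3"): 118, ("b2", "a1"): 152, ("b2", "a2"): 88}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this toy input only a1 and b1 are reciprocal best hits; a2's best hit is b2, but b2's best hit is a1, so that pair is rejected.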

Poster D20

Computational analysis of metagenomes and metatranscriptomes: challenges and opportunities.

Larisa Kiseleva Okinawa Institute of Science and Technology

Igor Goryanin (Okinawa Institute of Science and Technology)

Short Abstract: Recent advances in ultra-high throughput sequencing technologies, which do not require cloning or PCR amplification and can produce huge numbers of DNA reads at an affordable cost, have boosted the number and scope of metagenomic sequencing projects. Now we can read DNA directly from environmental samples and compare the biological diversity and the functional activity of different microbial communities. While metagenomics provides information on gene content, metatranscriptomics aims at understanding the gene expression patterns of microbial populations. Such meta-data has enormous potential in a variety of areas: revealing novel sequences, genes and pathways; discovering potentially important enzymes; studying early events in the evolution of gene families, etc.

Along with the progress in molecular sequencing methods, powerful new tools for computational analysis of the resulting sequence data have been developed. This review will describe new methods for interactively exploring, analyzing and comparing multiple metagenomic datasets. We also run our experimental data through several publicly available annotation pipelines and sequence analysis tools (1,2,3) and compare the results.

References:
1. Gerlach et al., WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads. BMC Bioinformatics 2009, 10:430.
2. Huson et al., MEGAN analysis of metagenomic data. Genome Research 2007, 17(3):377-386.
3. Meyer et al., The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 2008, 9:386.

Poster D21

A strategy integrating genomic and metabolic contexts to propose candidate genes for orphan enzymes

Adam SMITH CEA/Genoscope

Marcel SALANOUBAT (CNRS/UMR8030, CEA/Genoscope); Claudine MEDIGUE (CNRS/UMR8030, CEA/Genoscope); David VALLENET (CNRS/UMR8030, CEA/Genoscope); Alain VIARI (INRIA, Grenoble);

Short Abstract: Despite an ongoing experimental and bibliographical effort, approximately 27% of all enzymatic activities recognised by the IUBMB are still orphan enzymes (i.e. metabolic reactions with no known protein/nucleic sequences associated) in the UniProt database. Sequence homology-based methods are traditionally used to transfer functional annotations to newly sequenced genes, and are obviously of no help in this case. Only integrative, context-based, similarity-free methods can address this problem. Here, we present a new two-step method explicitly combining genomic and metabolic contextual information in order to propose candidate genes for orphan enzymes. The first step locates Genomic Metabolons (i.e. groups of co-localised genes coding for proteins that catalyse reactions belonging to the same metabolic process, allowing for gene and reaction gaps) using an exact algorithm, genome by genome. The second step integrates metabolons over 860 available prokaryotic genomes, in order to propose the best candidates for orphan enzymes. We have benchmarked our method computationally and validated it by locating a candidate Escherichia coli K12 gene for an orphan enzyme involved in allantoin degradation. A web interface is under development to offer these results to bio-analyst users of MicroScope, the prokaryotic comparative genomics platform. The method will thus help bio-analysts to discover new genes corresponding to orphan enzymes.
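The notion of co-localised genes "allowing for gene gaps" can be illustrated with a simple scan along one genome. The sketch below is a simplified greedy illustration, not the exact algorithm the abstract refers to: it assumes genes are given in chromosomal order, that membership in a single metabolic process is already known, and it ignores reaction gaps. The names `colocalised_groups`, `max_gap` and `min_size` are hypothetical.

```python
def colocalised_groups(gene_order, pathway_genes, max_gap=1, min_size=2):
    """Group pathway genes that lie close together on the genome.

    gene_order    -- gene identifiers in chromosomal order
    pathway_genes -- set of genes assigned to one metabolic process
    max_gap       -- max number of non-pathway genes allowed between
                     two consecutive pathway genes of the same group
    min_size      -- smallest group that is reported
    """
    groups, current, last_idx = [], [], None
    for idx, gene in enumerate(gene_order):
        if gene not in pathway_genes:
            continue
        # Close the current group when the gap to the previous
        # pathway gene is too large.
        if last_idx is not None and idx - last_idx - 1 > max_gap:
            if len(current) >= min_size:
                groups.append(current)
            current = []
        current.append(gene)
        last_idx = idx
    if len(current) >= min_size:
        groups.append(current)
    return groups


# Toy genome: genes a..h in order; a, c, g and h belong to the pathway.
genome = ["a", "b", "c", "d", "e", "f", "g", "h"]
groups = colocalised_groups(genome, {"a", "c", "g", "h"}, max_gap=1)
```

With a gap tolerance of one gene, a and c form one group (b is the tolerated gap), while g and h form a second group because three non-pathway genes separate c from g.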

Poster D22

Species-specific open chromatin has evidence of branch-specific selection.

Nathan Sheffield Duke

Yoichiro Shibata (Duke, Institute for Genome Sciences and Policy); Olivier Fedrigo (Duke, Institute for Genome Sciences and Policy); Greg Wray (Duke, Institute for Genome Sciences and Policy); Greg Crawford (Duke, Institute for Genome Sciences and Policy); Terrence Furey (UNC, Genetics);

Short Abstract: There is considerable evidence for differential gene expression among humans and other primates. These expression differences are probably caused in part by unknown differences in gene regulation. Mapping DNaseI hypersensitive sites provides a general measure of chromatin accessibility or openness, and is used to identify all types of regulatory elements, including promoters, enhancers, silencers, and insulators. We used DNase-Seq to identify all DNaseI hypersensitive sites in the same cell type (fibroblasts) isolated from human, chimp, and macaque. We identified approximately 1000 human- and chimp-specific increases and decreases in chromatin openness, which we polarize using the macaque outgroup. For the same samples, we also quantitated expression using DGE-seq. We find examples of chromatin differences that correspond to expression differences more often than expected. For example, presence of a species-specific DNase hypersensitive site is often correlated with species-specific expression of a nearby gene. We also tested regions of differential chromatin for sequence conservation and evidence of branch-specific positive selection; we found that DNase hypersensitive sites with species differences more often have evidence for positive selection on the differing branch. This is true for both human-specific and chimp-specific differential open chromatin. Similarly, open chromatin that is shared across species contains more conserved elements than open chromatin that differs across species. These results suggest that sequence evolution corresponds with chromatin structure differences, which in turn may lead to altered gene expression. Our results also highlight specific cases with the testable hypothesis that differential gene expression is caused by functional mutations in regulatory sequence.
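Polarizing a human-chimp difference with an outgroup can be sketched as a simple parsimony rule: the species whose state differs from macaque is taken to carry the derived change. The sketch below assumes a binary open/closed call per site, which is a simplification of the quantitative increases and decreases described above; the function name and labels are hypothetical.

```python
def polarize(human: bool, chimp: bool, macaque: bool) -> str:
    """Assign a chromatin difference to a branch using macaque as outgroup.

    Each argument is True if the site is open in that species.
    """
    if human == chimp:
        # No human-chimp difference; if macaque differs, the change
        # cannot be placed on the human or the chimp branch.
        return "shared" if human == macaque else "unpolarized"
    # The species that disagrees with the outgroup carries the change.
    species, state = ("human", human) if human != macaque else ("chimp", chimp)
    change = "gain" if state else "loss"
    return f"{species}-specific {change}"


calls = {
    "site1": polarize(True, False, False),
    "site2": polarize(True, False, True),
    "site3": polarize(True, True, True),
}
```

Here site1 is open only in human and is called a human-specific gain, whereas site2 is closed only in chimp and is called a chimp-specific loss.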

Poster D23

Comparative Genomics for expression QTL

Steffen Moeller University of Lübeck

Steffen Möller (University of Lübeck); Michael Brehler (University of Lübeck, Department of Dermatology); Georg Zeplin (University of Lübeck, Department of Dermatology); Saleh Ibrahim (University of Lübeck, Department of Dermatology);

Short Abstract: In a series of projects the genotype information of individuals is now combined with molecular phenotypes. When applied with transcriptomics or proteomics data, this allows one to determine the chromosomal regions that are associated with variation in a gene's expression level. Such expression QTL, especially those located within QTL for clinical phenotypes, help formulate hypotheses for the etiology of polygenic diseases.

We have seen comparisons of expression QTL for different tissues of the same individuals and with the advent of recombinant inbred strains we also have the data to compare genetically identical individuals in different environments. This work provides the technical infrastructure for allowing such comparative analyses and, thinking this further, also supports the comparison across species. This was motivated by earlier work showing the strong connections between Multiple Sclerosis and its rodent animal model EAE.

Firstly, we were interested in orthologous genes that have expression QTL in both species. This is similar to the direct comparison of differentially expressed genes between projects, but here we also have the direct notion of a location within a disease-associated chromosomal region in at least one of the species. Furthermore, we are interested in controlling regions that are syntenic to each other. This would be indicative of the location of a molecular mechanism conserved between two animal models. Gene homologies and genomic syntenies are contributed by Ensembl Compara. The system, an extension to the Interactive QTL System (TISQ), provides searches for syntenic regions controlling orthologous genes both graphically and with statistical summaries.

Poster D24

Comparative genome analysis of completely sequenced Thermus scotoductus SA-01 and Thermus thermophilus HB27 and HB8

Benjamin Kumwenda University of Pretoria

Derrick Litthauer (University of Free State, Biochemistry);

Short Abstract: Although studies of thermophilic organisms date back to the 1960s, with over 50 species isolated worldwide, only three strains, Thermus scotoductus SA-01 and Thermus thermophilus HB8 and HB27, have been completely sequenced so far. Until now it was impossible to carry out a comprehensive comparative genomic study of Thermus species to better understand their natural transformation, which is considered to be their major means of survival in high-temperature environments. This work presents such a study and further investigates genome rearrangement and its effect on metabolic network clustering.

Complete chromosome global alignment was analysed using M-GCAT software. Metabolic pathways were reconstructed using Pathway Tools software. Genome organisation was investigated by identifying breakpoints in SA-01 in comparison with HB8 and HB27. A cross-clustering coefficient was used to analyse levels of metabolic network clustering. Genomic islands were predicted by SeqWord Sniffer, IslandViewer, IslandPick and IslandPath. Statistical parameters of the accumulation of mutations in homologs, as determined by BLAST, were analysed.

Genes homologous to those of T. scotoductus were found in both the chromosomes and the plasmids of HB8 and HB27. The three genomes are closely related and highly conserved, with around 2-3 mutations per 100 bp. The differences in the number of breakpoints in SA-01 relative to HB8 and HB27 were insignificant, with HB27 having fewer breakpoints. This finding agrees with the clustering results, where HB27 is well clustered, followed by HB8. SA-01 was found to have the fewest genomic islands.

Attention Poster Authors: Posters should be no larger than 1.30 m (130 cm) high x 0.90 m (90 cm) wide. Fasteners (Velcro / double-sided tape) will be provided at the site; please DO NOT bring tape, tacks or pins.

Posters Display Schedule:

Odd Numbered posters:

Set-up timeframe: Sunday, July 17, 7:30 a.m. - 10:00 a.m.
Author poster presentations: Monday, July 18, 12:40 p.m. - 2:30 p.m.
Removal timeframe: Monday, July 18, 2:30 p.m. - 3:30 p.m.*

Even Numbered posters:

Set-up timeframe: Monday, July 18, 3:30 p.m. - 4:30 p.m.
Author poster presentations: Tuesday, July 19, 12:40 p.m. - 2:30 p.m.
Removal timeframe: Tuesday, July 19, 2:30 p.m. - 4:00 p.m.*

* Posters that are not removed by the designated time may be taken down by the organizers and discarded. Please be sure to remove your poster within the stated timeframe.

Delegate Posters Viewing Schedule

Odd Numbered posters:
On display Sunday, July 17, 10:00 a.m. through Monday, July 18, 2:30 p.m.
Author presentations will take place Monday, July 18: 12:40 p.m.-2:30 p.m.

Even Numbered posters:
On display Monday, July 18, 4:30 p.m. through Tuesday, July 19, 2:30 p.m.
Author presentations will take place Tuesday, July 19: 12:40 p.m.-2:30 p.m.

Want to print a poster in Vienna? Try these options:

Repacopy, next to the congress venue

Also at Karlsplatz, in the Ring Center, Kärntner Str. 42

If you need your poster on a thicker material, you may also use a plotter service next to Karlsplatz: http://schiessling.at/portfolio/
