ISMB 2008 ISCB

Accepted Posters
Category 'U' - Sequence analysis
Poster U01
TATA-variant identification, characterization and functional classification in plant genomes
Virginie BERNARD- URGV
Véronique BRUNAUD (URGV, bioinformatics); Alain LECHARNY (URGV, bioinformatics);
Short Abstract: Taking advantage of the TATA-box topological constraints, we identified TATA-variants that share the same constraints and are conserved in Arabidopsis thaliana and Oryza sativa. This work led to a characterization of TATA-variants that distinguishes some motifs by the specific function, structure and expression of their related genes.
Long Abstract: Click Here

Poster U02
Significance of hidden Markov model results
Lee Newberg- Wadsworth Center, New York State Department of Health
No additional authors
Short Abstract: For hidden Markov model / dynamic programming algorithm scans of large databases of sequence data, the ability to quickly estimate p-values at the 1e-12 level or smaller is necessary. We present a general approach that has quickly estimated p-values as low as 1e-4000.
Long Abstract: Click Here

Poster U03
SNPlexViewer - a solution towards a cost-effective traceability system
Eyal Seroussi- The Agricultural Research Organization (ARO), Volcani Center
Yanir Seroussi (Monash University, Clayton School of Information Technology); Andrey Shirak (The Agricultural Research Organization (ARO), Volcani Center, Institute of Animal Science); Baruch Karniol (The Agricultural Research Organization (ARO), Volcani Center, Institute of Animal Science);
Short Abstract: DNA-based traceability uses the animal's own DNA code for identity control. For this purpose we multiplexed 25 SNPs and, to further decrease SNaPshot genotyping expenses, introduced software that facilitates the analysis of trace files without size standards. SNPlexViewer improves genotyping performance by aligning two trace chromatograms while embedding the reference size standards within a normalized target trace file.
Long Abstract: Click Here

Poster U04
Investigation of a simple heuristic improving the speed of statistical alignments
Jesper Nielsen- Aarhus University
Rune Lyngsø (University of Oxford, Department of Statistics); Christian Pedersen (Aarhus University, Bioinformatics Research Center); Jotun Hein (University of Oxford, Department of Statistics);
Short Abstract: We investigate a simple heuristic for speeding up statistical alignments. The idea is to use the results from pairwise alignments to estimate which multiple alignments are likely before actually computing them. We investigate several ways to do this and achieve a significant speed-up.
Long Abstract: Click Here

Poster U05
Refinement of structure-based sequence alignments by Seed Extension
Chin-Hsien (Emily) Tai- National Cancer Institute, NIH
Changhoon Kim (National Cancer Institute, NIH, Center for Cancer Research); Byungkook Lee (National Cancer Institute, NIH, Center for Cancer Research);
Short Abstract: Refinement with Seed Extension (RSE) is a new procedure for refining a structure-based sequence alignment using a Seed Extension algorithm. With negligible increases in computation time, it improved the average accuracy of sequence alignments from all nine popular structure comparison/alignment programs, when tested against NCBI’s CDD alignments.
Long Abstract: Click Here

Poster U06
Software tool for bulk annotation of genomic loci
Mali Salmon- EMBL-EBI
No additional authors
Short Abstract: Here we describe a collection of software tools developed for the efficient annotation of genomic loci. The programs automatically identify key features of interest, such as the location of experimental peaks within genes, their proximity to up- or downstream transcription start sites, and the presence of binding site motifs.
Long Abstract: Click Here

Poster U07
Multiple Motif Scanning to Identify Methyltransferases
Tanya Petrossian- University of California, Los Angeles
Steve Clarke (UCLA, Department of Chemistry and Biochemistry and the Molecular Biology Institute);
Short Abstract: This study seeks to refine the methyltransferase database by using the novel “Multiple Motif Scanning” program. HMM profiles and secondary structures were utilized to identify AdoMet-binding motifs. Statistical examination of these sequences allowed for motif refinement. Additionally, clustering analysis revealed probable substrates for the putative methyltransferases.
Long Abstract: Click Here

Poster U08
Sequence context-specific profiles for homology searching
Andreas Biegert- Gene Center, LMU Munich
Johannes Soeding (Gene Center, LMU Munich, Computational Biology);
Short Abstract: In standard sequence searches, amino acids are compared one by one. We derive context-specific amino acid similarities from short windows centered on each query sequence residue. By employing our context-specific similarities in combination with NCBI BLAST, CS-BLAST achieves two-fold increased sensitivity at the same specificity and speed.
Long Abstract: Click Here

Poster U09
Optimized pipeline for the analysis of microRNA sequences obtained from next-generation sequencing technologies
Marcel Grunert- Max Planck Institute for Molecular Genetics
Markus Schueler (Max Planck Institute for Molecular Genetics, Vertebrate Genomics / Computational Molecular Biology); Ilona Dunkel (Max Planck Institute for Molecular Genetics, Vertebrate Genomics); Silke Sperling (Max Planck Institute for Molecular Genetics, Vertebrate Genomics);
Short Abstract: We present an optimized data analysis pipeline for the processing and analysis of small RNA sequences obtained from Solexa next-generation sequencing data. The pipeline includes quality control, statistical reporting, whole genome mapping, several filtering steps, profiling of small RNAs by database annotations, and finally, the prediction of novel microRNAs.
Long Abstract: Click Here

Poster U10
WebLab: a data-centric, knowledge-sharing bioinformatic platform
Ge Gao- Peking University
Xiaoqiao Liu (Peking University, Center for Bioinformatics, School of Life Sciences); Jianmin Wu (Peking University, Center for Bioinformatics, School of Life Sciences); Jun Wang (Peking University, Center for Bioinformatics, School of Life Sciences); Xiaochun Liu (Peking University, Center for Bioinformatics, School of Life Sciences); Shuqi Zhao (Peking University, Center for Bioinformatics, School of Life Sciences); Zhe Li (Peking University, Center for Bioinformatics, School of Life Sciences); Lei Kong (Peking University, Center for Bioinformatics, School of Life Sciences); Xiaocheng Gu (Peking University, Center for Bioinformatics, School of Life Sciences); Jingchu Luo (Peking University, Center for Bioinformatics, School of Life Sciences);
Short Abstract: To support biological research, we have developed WebLab, a data-centric, knowledge-sharing bioinformatic platform. Besides a wide range of analysis tools, WebLab provides powerful data management functions for both experimental data and scientific literature. A flexible sharing mechanism and group strategy are also provided to facilitate collaborative teamwork.
Long Abstract: Click Here

Poster U11
ENTROPIC PROFILER – efficient whole genome analysis using information theory and statistical concepts
Susana Vinga- INESC-ID
Francisco Fernandes (INESC-ID, KDBIO); Ana T Freitas (INESC-ID/IST, KDBIO); Jonas S Almeida (MD Anderson Cancer Center, Biostat Appl Math);
Short Abstract: Entropic Profiles (EP) are local information plots that indicate overall conservation of motifs in genomes. They are based on Information Theory concepts, in particular the Renyi entropy of biological sequences. The present tool implementation, based on new data structures and algorithmic simplifications, allows whole genomes to be processed in a few minutes.
Long Abstract: Click Here

Poster U12
ADAPTdb/ADAPT - A Framework for the Analysis of ARISA Data Sets
Robert Schmieder- San Diego State University
Matthew Haynes (San Diego State University, Biology); Elizabeth Dinsdale (San Diego State University, Biology); Forest Rohwer (San Diego State University, Biology); Robert Edwards (San Diego State University, Computer Science);
Short Abstract: ADAPTdb/ADAPT presents a web-based system for the automatic analysis of ARISA data sets. The database ADAPTdb stores and maintains ITS regions along with information about their source organisms. ADAPT uses ADAPTdb to taxonomically characterize ARISA data sets. Additionally, ADAPT performs pathogenic and autotrophic/heterotrophic comparisons of organisms among different ARISA samples.
Long Abstract: Click Here

Poster U13
Mining unique-m substrings from genomes
Kai Ye- European Bioinformatics Institute
Qilan Li (Leiden/Amsterdam Centre for Drug Research, Medicinal Chemistry); Ad IJzerman (Leiden/Amsterdam Centre for Drug Research, Medicinal Chemistry); Zhenyu Jia (University of California, Irvine, Department of Pathology and Laboratory Medicine); Paul Flicek (European Bioinformatics Institute, PANDA); Rolf Apweiler (European Bioinformatics Institute, PANDA);
Short Abstract: Information about unique substrings of genomes is fundamental but not sufficient for many genetic investigations. We propose an efficient (time and space) pattern growth approach to systematically mine all unique-m substrings, which have exactly one perfect match in the genome while all approximate matches must have more than m mismatches.
Long Abstract: Click Here
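
The unique-m definition in the abstract above can be illustrated with a small brute-force sketch. This is not the authors' pattern growth approach, which the abstract does not detail; it is a quadratic-time check of the definition only. It assumes a fixed substring length k, a single strand (no reverse complement), and mismatches counted as Hamming distance. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def unique_m_substrings(genome, k, m):
    """Return (position, substring) pairs for length-k substrings that are unique-m.

    A substring counts as unique-m here if every other length-k window in
    the genome differs from it by more than m mismatches. With m = 0 this
    reduces to plain uniqueness (exactly one perfect match).
    Brute force for illustration only: O(n^2 * k) time.
    """
    n = len(genome) - k + 1
    result = []
    for i in range(n):
        s = genome[i:i + k]
        # Compare against every other window; all must exceed m mismatches.
        if all(hamming(s, genome[j:j + k]) > m for j in range(n) if j != i):
            result.append((i, s))
    return result


if __name__ == "__main__":
    toy = "AAAACGTT"  # toy sequence, not real data
    for m in (0, 1, 2):
        print(m, unique_m_substrings(toy, 3, m))
```

Raising m can only shrink the result, since a substring that tolerates m + 1 mismatches without a second hit also tolerates m.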

Poster U14
LOCAS - a new lowest coverage assembler to support resequencing with ultra-short reads
Juliane Klein- University of Tuebingen
Korbinian Schneeberger (Max Planck Institute for Developmental Biology, Department of Molecular Biology); Stephan Ossowski (Max Planck Institute for Developmental Biology, Department of Molecular Biology); Detlef Weigel (Max Planck Institute for Developmental Biology, Department of Molecular Biology); Daniel H. Huson (University of Tuebingen, Faculty of Computer Science);
Short Abstract: We present LOCAS, a new assembly tool for short read sequence data. In contrast to existing short read assemblers, which assume high coverage of reads, LOCAS is aimed at assembling low-coverage datasets. LOCAS is particularly suited for resequencing projects. We are using it in an Arabidopsis resequencing project (1001 genomes).
Long Abstract: Click Here

Poster U15
Evaluation of Association Measures for Motif Discovery
Pedro Ferreira- Centre for Genomic Regulation
Roderic Guigó (Centre for Genomic Regulation, Genome Bioinformatics Lab);
Short Abstract: Combinatorial motif discovery algorithms rely on association measures to assess the strength of co-occurrence between simple motifs. We surveyed 14 association measures previously applied in bioinformatics, data mining and language processing and performed an empirical evaluation in artificially generated datasets in order to better understand their similarities and differences.
Long Abstract: Click Here

Poster U16
Command-line-based integration of online bioinformatics resources
Kazuki Oshita- Institute for Advanced Biosciences, Keio University
Masaru Tomita (Institute for Advanced Biosciences, Keio University, Environment and Information Studies); Kazuharu Arakawa (Institute for Advanced Biosciences, Keio University, Graduate School of Media and Governance);
Short Abstract: Here we present a software package that maps online bioinformatics resources as UNIX command-line tools that can be pipelined using EMBOSS Ajax Command Definition ontologies. The software package currently contains more than 50 tools, and is freely available from http://www.g-language.org/.
Long Abstract: Click Here

Poster U17
Biological sequence motif discovery using feature selection in Conditional Random Field
Thanh Hai Dang- University of Antwerp
Alain Verschoren (University of Antwerp, Department of Mathematics and Computer Science); Kris Laukens (University of Antwerp, Department of Mathematics and Computer Science);
Short Abstract: Motif discovery plays an important role in molecular biology. Most computational methods developed so far are limited to gapless motifs and assume independence between positions within sequences. We hereby introduce a motif discovery method using feature selection in Conditional Random Field (CRF), which overcomes the above-mentioned limitations.
Long Abstract: Click Here

Poster U18
Computational motif discovery using extreme-valued tuples from mutual information profiles
Sara Garcia- Signal Processing Laboratory, IEETA
Armando Pinho (Signal Processing Laboratory, IEETA, University of Aveiro); Holger Kantz (Max Planck Institute for the Physics of Complex Systems, Nonlinear Time Series Analysis);
Short Abstract: We propose a new methodology for computational motif discovery based on extreme-valued tuples, using information theory for assessing optimal tuple information measures based on the formalism of Shannon's entropy, and extreme value statistics for providing a framework for threshold-based selection criteria.
Long Abstract: Click Here

Poster U19
Application of VAMSAS enabled tools for the investigation of protein evolution.
James Procter- University of Dundee
Iain Milne (Scottish Crop Research Institute, Bioinformatics); Frank Wright (Biomathematics and Statistics, Scotland, Genetics); Pierre Marguerite (European Bioinformatics Institute, MSD); Andrew Waterhouse (Riken, Genome Sciences Centre); Dominik Lindner (Scottish Crop Research Institute, Bioinformatics); David Martin (University of Dundee, School of Life Sciences Research); Tom Oldfield (European Bioinformatics Institute, MSD); David Marshall (Scottish Crop Research Institute, Bioinformatics); Geoff Barton (University of Dundee, School of Life Sciences Research);
Short Abstract: Protein evolutionary analysis often involves the use of many programs. We demonstrate how it can be performed effectively using applications that have been modified to dynamically exchange data via the 'Visualization and Analysis of Molecular Sequences, Alignments, and Structures' (VAMSAS) framework.
Long Abstract: Click Here

Poster U20
Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features
Oliver Frings- Stockholm University
Timo Lassmann (Stockholm University, Stockholm Bioinformatics Center); Erik Sonnhammer (Stockholm University, Stockholm Bioinformatics Center);
Short Abstract: To make Kalign a versatile tool for large-scale alignment studies, we have dramatically improved its computational properties, while maintaining its high accuracy. Kalign 2 now supports the alignment of nucleotide sequences, and a newly introduced extension allows sequence annotation to be included in the alignment process to improve alignment accuracy.
Long Abstract: Click Here

Poster U21
A novel ab initio method finding microRNA clusters
Anthony Mathelier- CNRS-UPMC
Alessandra Carbone (CNRS-UPMC, Informatique);
Short Abstract: MicroRNAs are a class of endogenes whose expression profiles reflect origin and differentiation state of human cancers and tumours. We propose a novel ab initio approach searching for clustered paralogous microRNAs in highly dense palindromic regions. Discrimination of miRNA precursors is based on only 5 (physical and combinatorial) conditions.
Long Abstract: Click Here

Poster U22
G-language Genome Analysis Environment Version 2: Integrated workbench for computational genome sequence analysis
Kazuharu Arakawa- Institute for Advanced Biosciences, Keio University
Masaru Tomita (Institute for Advanced Biosciences, Keio University, Department of Environment and Information Studies);
Short Abstract: G-language Genome Analysis Environment is a software package written in Perl for genome sequence analysis, compatible with BioPerl and especially focused on bacterial genomes. Here we present the second version of the software, implemented with an interactive shell and more than 200 analysis programs. The software is freely available at http://www.g-language.org/.
Long Abstract: Click Here

Poster U23
The CluSTr database in 2009
Craig McAnulla- EMBL - European Bioinformatics Institute
John Maslen (EMBL - European Bioinformatics Institute, InterPro team); Antony Quinn (EMBL - European Bioinformatics Institute, InterPro team); Sarah Hunter (EMBL - European Bioinformatics Institute, InterPro team);
Short Abstract: The CluSTr database offers an automatic classification of proteins from UniProtKB and other databases into groups of related proteins. The clustering is based on analysis of all pairwise similarities between protein sequences. New developments in CluSTr will be presented, including increased coverage, new protein datasets, and extended website functionality.
Long Abstract: Click Here

Poster U24
Novel miRNA identification and target gene prediction in Glycine max
Trupti Joshi- University of Missouri-Columbia
No additional authors
Short Abstract: We identified over 50 novel miRNAs from Illumina SBS sequencing of seven tissues in soybean, and validated some computationally predicted putative target genes. We also developed a soybean genome browser and incorporated the small RNA libraries, along with Solexa transcriptome sequencing data. The genome browser can be accessed at http://genomebrowser.missouri.edu/cgi-bin/hgGateway.
Long Abstract: Click Here

Poster U25
Extraction of transcription factor binding sites from ChIP-Seq data through de novo TFBS motif identification. Application for EWS-Fli1 oncogenic transcription factor.
Valentina Boeva- Institut Curie
Noëlle Guillon (Institut Curie, Genetics and Biology of Cancers); Franck Tirode (Institut Curie, Genetics and Biology of Cancers); Olivier Delattre (Institut Curie, Genetics and Biology of Cancers); Emmanuel Barillot (Institut Curie, Bioinformatics, biostatistics, epidemiology and computational systems biology of cancer);
Short Abstract: We propose a new algorithm for ChIP-Seq data analysis. It enables binding site extraction without the setting of an explicit threshold on the DNA fragment coverage. On EWS-Fli1 data, the algorithm showed significantly increased peak selection sensitivity with a very minor increase in the expected number of false positive hits.
Long Abstract: Click Here

Poster U26
Revealing the Density-based Clustering Structure of the SwissProt database
Gabor Ivan- PhD Student
Vince Grolmusz (Eotvos Lorand University, Department of Computer Science);
Short Abstract: We classified 389046 sequences occurring in SwissProt using the OPTICS algorithm. We proposed a colouring scheme that is based on taxonomy information and helps analyze the composition of clusters. We validated our results with the Pfam database using an OPTICS-specific quality measure and concluded that we obtained clusters of high quality.
Long Abstract: Click Here

Poster U27
Identification of double coding regions in papillomaviruses based on nucleotide frequencies
Sten Ilmjärv- University of Tartu
Aare Abroi (Estonian Biocentre); Jaak Vilo (Quretec Ltd); Hedi Peterson (Quretec Ltd);
Short Abstract: Double coding regions in mammalian genomes have been widely studied due to frequent alternative splicing. We have identified double coding regions in papillomaviruses using DNA conservation scoring based on amino acid sequence alignment. By comparing theoretical and real nucleotide frequencies we identified overlapping coding sequences for various papilloma types.
Long Abstract: Click Here

Poster U28
Assessing Differences among Next-Generation Sequencing Software for Genomic Resequencing Alignment and Detection of Variation
James Cavalcoli- University of Michigan
Edgar Otto (University of Michigan, Pediatric Nephrology); James MacDonald (University of Michigan, Human Genetics); Friedhelm Hildebrandt (University of Michigan, Pediatrics); Gilbert Omenn (University of Michigan, Internal Medicine);
Short Abstract: While resequencing a portion of Human Chr19 to identify disease-causing variants, we assessed the capacity and variability of a number of next-generation sequence analysis software tools for their ability to align, assemble, and detect genomic variations (polymorphisms and indels) compared to the reference genome (hg18 build 36.3).
Long Abstract: Click Here

Poster U29
An approach to subfamily assignment for large protein families
Yaoqing Shen- Universite de Montreal
Gertraud Burger (Universite de Montreal, Biochemistry); Franz Lang (Universite de Montreal, Biochemistry);
Short Abstract: We report a comprehensive bioinformatics analysis of the acyl-CoA dehydrogenase (ACAD) family. We identified over 800 ACAD homologs from 250 species, recognized the subfamilies they belong to, compiled their taxonomic profiles, and traced back the evolution of the ACAD family.
Long Abstract: Click Here

Poster U30
Characterization of transcriptome splicing structure using high-throughput RNA-seq
Jinze Liu- University of Kentucky
Kai Wang (University of Kentucky, Computer Science); Stephen Coleman (University of Kentucky, Veterinary Science); James Macleod (University of Kentucky, Veterinary Science); Jan Prins (University of North Carolina at Chapel Hill, Computer Science);
Short Abstract: MAPSPAN robustly identifies splices in the transcriptome sampled via RNA-seq short reads. A novel unsupervised algorithm maps spliced reads onto the reference genome. Compared with existing approaches, MAPSPAN demonstrates higher sensitivity and selectivity in identifying splices and their coverage on known datasets.
Long Abstract: Click Here

Poster U31
Does average viral genome size covary with that of microbes? A novel method applied to 150 metagenomes
Florent Angly- San Diego State University
Dana Willner (San Diego State University, Biology); Robert Schmieder (San Diego State University, Computer Science); Rebecca Vega-Thurber (Florida International University, Biology Department); Rob Edwards (San Diego State University, Computer Science); Forest Rohwer (San Diego State University, Biology);
Short Abstract: Viral genomes vary in length by 1000X and are subject to different environmental pressures than microbes. We developed a method to estimate viral average genome size and applied it to 150 metagenomes to produce estimates for different biomes and verify whether it covaries with microbial average genome size.
Long Abstract: Click Here

Poster U32
Increasing Short Read Mapping Speed by Masking of Residues Sequence Reads
Stefan Henz- Max Planck Institute for Developmental Biology
Fabio de Bona (Friedrich Miescher Laboratory, Machine Learning in Biology); Stefan R. Henz (Max Planck Institute for Developmental Biology, Molecular Biology); Korbinian Schneeberger (Max Planck Institute for Developmental Biology, Molecular Biology); Stephan Ossowski (Max Planck Institute for Developmental Biology, Molecular Biology); Detlef Weigel (Max Planck Institute for Developmental Biology, Molecular Biology); Gunnar Rätsch (Friedrich Miescher Laboratory, Machine Learning in Biology);
Short Abstract: Next generation sequencing technologies produce massive amounts of short sequence reads with varying quality across their positions, which slows down the alignment because many possible mismatches have to be considered. We employ a machine-learning-based algorithm, RTrim, that segments reads into mappable and unmappable regions. By appropriately masking low-quality positions we can map these reads more quickly, since fewer mismatches are required.
Long Abstract: Click Here

Poster U33
Phylogeny in vertebrates of PEDF
Shivam Sidana- JMIT
Niket Ladha (JMIT, Chemical);
Short Abstract: The PEDF gene first appears in vertebrates and our studies suggest that the regulation and biological actions of this gene are preserved across vertebrates. This analysis of the PEDF gene across phyla provides new information that will aid further characterization of common functional motifs of this serpin in biological processes.
Long Abstract: Click Here

Poster U34
Algebraic approach to DNA sequence homology assessment
Andrzej Brodzik- The MITRE Corporation
No additional authors
Short Abstract: We investigate difference sets and related combinatorial objects as models for novel DNA sequence homology markers. We construct representations of DNA sequences in the difference set space, and compute their alignment. This procedure permits identification of homologous DNA sequences in a small fraction of the time required by standard methods.
Long Abstract: Click Here

Poster U35
DistanceScan and Nash: Two novel tools for promoter analysis
Ekaterina Shelest- HKI, Hans Knoell Institute
Eugen Fazius (HKI, Hans Knoell Institute, Bioinformatics and systems biology); Vladimir Shelest (HKI, Hans Knoell Institute, Bioinformatics and systems biology); Reinhard Guthke (HKI, Hans Knoell Institute, Bioinformatics and systems biology);
Short Abstract: Nash is a motif-discovery tool based on a novel approach to the prediction of transcription factor binding sites (TFBS), offering an alternative to the widely used PWMs and HMMs. DistanceScan utilizes the method of distance distributions of TFBS pairs. It allows the selection of functional combinations of motifs at non-random distances.
Long Abstract: Click Here

Poster U36
Promoters and The Transcription Factors - a Simple Relation via The miRNA
Chanchal Mitra- University of Hyderabad
Padmavathi Putta (University of Hyderabad, Biochemistry); Luciano Milanesi (Institute of Biomedical Technology, Bioinformatics);
Short Abstract: We have identified a relatively small set of GC-rich 6-nucleotide and 7-nucleotide sequences around the TSS in human promoter sequences. These sequences are distributed on both sides of the TSS and are likely to be involved in recognition and binding of various factors. They are relatively uncommon elsewhere.
Long Abstract: Click Here

Poster U37
ncSOLID: an R package for non-coding RNA digital sequencing
Raffaele Calogero- University of Torino
Cristina Della Beffa (University of Torino, Dipartimento di Scienze Cliniche e Biologiche); Francesca Cordero (University of Torino, Dipartimento di Scienze Cliniche e Biologiche);
Short Abstract: ncSOLID is a package for quantitative secondary analysis of non-coding transcriptome sequencing data generated with the SOLID next-gen sequencing platform. The philosophy of the package is to organize RNA-seq data in a structure that allows the statistical detection of differential expression for ncRNAs, e.g. microRNAs, within the Bioconductor framework.
Long Abstract: Click Here

Poster U38
SeqAn - An efficient C++ library for sequence analysis
David Weese- Free University of Berlin
Tobias Rausch (International Max Planck Research School for Computational Biology and Scientific Computing, Molecular Genetics); Marcel Schulz (International Max Planck Research School for Computational Biology and Scientific Computing, Molecular Genetics); Anne-Katrin Emde (Free University of Berlin, Computer Science); Andreas Döring (Free University of Berlin, Computer Science); Knut Reinert (Free University of Berlin, Computer Science);
Short Abstract: SeqAn is an open source C++ library of efficient algorithms and data structures for the analysis of biological sequences. Using a template-based library design, SeqAn aims at providing (1) algorithms that are generic, fast and extensible and (2) data structures that allow the rapid prototyping of novel sequence analysis methods.
Long Abstract: Click Here

Poster U39
Efficient computation of good neighbor seeds
LUCIAN ILIE- University of Western Ontario
SILVANA ILIE (Ryerson University, Mathematics);
Short Abstract: The current state of the art of homology search involves the use of (multiple) spaced seeds. Particularly important are the neighbor seeds which combine high sensitivity with reduced space. We give the only polynomial-time algorithm that computes better neighbor seeds than previous ones while being several orders of magnitude faster.
Long Abstract: Click Here

Poster U40
Genome-wide computational analysis of eukaryotic core promoters
Holger Hartmann- Gene Center Munich
Claudia Gugenmus (Gene Center Munich, AG Soeding); Johannes Soeding (Gene Center Munich, AG Soeding);
Short Abstract: We have developed a sensitive method for core promoter analysis and detected all currently known motifs as well as several previously unknown ones in yeast, fly and human. For yeast our results show that the core promoter is aligned to the +1 nucleosome rather than to the TSS.
Long Abstract: Click Here

Poster U41
A new bioinformatics analysis tools framework at EMBL-EBI
Mickaël Goujon- European Bioinformatics Institute
Hamish McWilliam (European Bioinformatics Institute, External Services); Franck Valentin (European Bioinformatics Institute, External Services); Weizhong Li (European Bioinformatics Institute, External Services); Robert Langlois (European Bioinformatics Institute, External Services); Rodrigo Lopez (European Bioinformatics Institute, External Services);
Short Abstract: The popular framework to run analytical tools at the European Bioinformatics Institute has been redesigned to improve the user experience. The existing web interface and web services API have been reviewed and simplified to accommodate a larger audience and provide new and unique features that will greatly benefit the end-user.
Long Abstract: Click Here

Poster U42
Development of SOLiD SAGE and tag counting and identification software
Xiequn Xu- Life Technologies
Patrick Gilles (Life Technologies, R&D ASA); Jennifer Kilzer (Life Technologies, R&D ASA); Kevin Clancy (Life Technologies, R&D ASA); Adam Harris (Life Technologies, R&D ASA); Rob Bennett (Life Technologies, R&D ASA);
Short Abstract: We developed the SOLiD™ SAGE kit by modifying serial analysis of gene expression (SAGE) to produce longer tags and adapting it to the SOLiD™ sequencing platform. Easy-to-use software with a graphical user interface has also been developed for post-sequencing data analysis in labs with moderate computational resources.
Long Abstract: Click Here

Poster U43
EMBOSS: European Molecular Biology Open Software Suite
Peter Rice- European Bioinformatics Institute
Alan Bleasby (European Bioinformatics Institute, Rice Group); Jon Ison (European Bioinformatics Institute, Rice Group); Mahmut Uludag (European Bioinformatics Institute, Rice Group);
Short Abstract: EMBOSS is a mature package of software tools developed for the molecular biology community. It includes a comprehensive set of applications and C libraries for molecular sequence analysis and other tasks and integrates popular third-party software packages under consistent interfaces.
Long Abstract: Click Here

Poster U44
Base-pairing profile local alignment kernels for functional RNA analyses
Kengo Sato- Japan Biological Informatics Consortium (JBIC)
Yutaka Saito (Keio University, Department of Biosciences and Informatics); Yasubumi Sakakibara (Keio University, Department of Biosciences and Informatics);
Short Abstract: We developed base-pairing profile local alignment (BPLA) kernels for discrimination and detection of functional RNA sequences using SVMs, and confirmed the effectiveness of our method by not only computational experiments but also expression analysis via qRT-PCR.
Long Abstract: Click Here

Poster U45
Comparison of assembly strategies for high throughput de novo sequencing of bacterial genomes
Frank Panitz- University of Aarhus
Pernille Andersen (Faculty of Agricultural Sciences, Aarhus University, Department of Genetics and Biotechnology); Jakob Hedegaard (Faculty of Agricultural Sciences, Aarhus University, Department of Genetics and Biotechnology); Christian Bendixen (Faculty of Agricultural Sciences, Aarhus University, Department of Genetics and Biotechnology); Frank Panitz (Faculty of Agricultural Sciences, Aarhus University, Department of Genetics and Biotechnology);
Short Abstract: Different strategies were applied to generate optimal assemblies for two bacterial genome sequences based on de-novo sequencing using high-throughput 454 and Solexa paired-end reads. The quality of the hybrid assembly was assessed by the longest average contig size and also supported by gene prediction and comparative analysis to related genomes.
Long Abstract: Click Here

Poster U46
Determining the Reading Frame in Short DNA Fragments
Hochul Lee- San Diego State University
Peter Salamon (San Diego State University, Mathematics); Rob Edwards (San Diego State University, Center for Microbial Sciences); Forest Rohwer (San Diego State University, Biology); Ben Felts (San Diego State University, Computational Science Research Center); Sajia Akhter (San Diego State University, Computational Science Research Center);
Short Abstract: We describe an implementation and preliminary tests for an intelligent algorithm to select the protein-encoding reading frame in short fragments of DNA without relying on extrinsic information. The system will speed up current computational analyses and apply many new analytical methods to metagenomic datasets.
Long Abstract: Click Here

Poster U47
The GNUMAP Algorithm: Probabilistic Mapping of Oligonucleotides from Next-Generation Sequencing
Nathan Clement- Brigham Young University
Mark Clement (Brigham Young University, Computer Science); Quinn Snell (Brigham Young University, Computer Science); Evan Johnson (Brigham Young University, Statistics);
Short Abstract: GNUMAP addresses the analysis problems presented by an increase in the quantity of sequence data from next-generation sequencing technologies. The probabilistic nature of the mapping algorithm implemented in GNUMAP provides an accurate and efficient method for mapping large numbers of short sequences to a genome.
Long Abstract: Click Here

Poster U48
A novel predictor of mucin-type O-glycosylation sites
Yong-Zi Chen- China Agricultural University
No additional authors
Short Abstract: We attempted to improve the prediction of O-glycosylation sites in mammalian proteins by introducing a new encoding scheme, named CKSAAP. The encoding reflects characteristics of the sequences surrounding mucin-type O-glycosylation sites and, combined with a Support Vector Machine (SVM), proved more powerful than the other methods.
Long Abstract: Click Here

Poster U49
Raccess: A tool for genome-scale computation of structural accessibility of RNA transcripts.
Hisanori Kiryu- Computational Biology Research Center, AIST
Toutai Mituyama (Computational Biology Research Center, AIST, RNA Informatics Team); Kiyoshi Asai (University of Tokyo, Department of Computational Biology);
Short Abstract: We have developed a tool called Raccess for computing the accessibility of potential transcripts based on the Turner energy model of secondary structures. We have applied our tool to the entire human genome, and have analyzed the structural constraints imposed on the messenger RNAs and ancestral repeats.
Long Abstract: Click Here

Poster U50
Conserved Sequences of West Nile Viral Proteins as candidate targets for vaccine design
TinWee Tan- National University of Singapore
QiYing Koo (National University of Singapore, Biochemistry); M. Asif Khan (National University of Singapore, Biochemistry); Shweta Ramdas (National University of Singapore, Biochemistry); Keun-Ok Jung (Johns Hopkins University School of Medicine, Pharmacology and Molecular Sciences); Jerome Salmon (Johns Hopkins University School of Medicine, Pharmacology and Molecular Sciences); Olivo Miotto (University of Oxford, MRC Centre for Genomics and Global Health); Vladimir Brusic (Dana-Farber Cancer Institute, Cancer Vaccine Center); J Thomas August (Johns Hopkins University School of Medicine, Pharmacology and Molecular Sciences);
Short Abstract: The focus of this study is to identify and characterize WNV protein regions that have exhibited strong conservation throughout the recorded history of the virus, and that are potential targets of T-cell immune responses, using various bioinformatics-based methods and correlation with available experimental data.
Long Abstract: Click Here

Poster U51
Evolution of antigenic variant gene families within Plasmodium species
Diego Diez- Kyoto University
Nelson Hayes (Kyoto University, Kanehisa Laboratory); Susumu Goto (Kyoto University, Kanehisa Laboratory);
Short Abstract: We retrieved sequences involved in antigenic variation from different Apicomplexa and performed Pfam domain analysis. We found a gene family, which has undergone differential expansion in five Plasmodium species. We describe sequence and phylogenetic analyses on these families, revealing clues about the evolution of antigenic multi-gene families in other pathogens.
Long Abstract: Click Here

Poster U52
CARMA: Correction and Reference Morphing Algorithm
Thomas Otto- Wellcome Trust Sanger Institute
Mandy Sanders (Wellcome Trust Sanger Institute, Pathogen Genomics); Matt Berriman (Wellcome Trust Sanger Institute, Pathogen Genomics); Chris Newbold (John Radcliffe Hospital, Institute of Molecular Medicine);
Short Abstract: Second generation sequencing technology enables deep, low cost resequencing across multiple strains and species. We have developed an algorithm that iteratively maps short reads to a reference sequence. At each cycle, CARMA attempts to correct errors, or morph the reference into a new strain, and then evaluates its success.
Long Abstract: Click Here

Poster U53
CentroidFold: Predictions of RNA Secondary Structure for Estimating Accurate Base-pairs
Michiaki Hamada- Mizuho Information & Research Institute, Inc
Kengo Sato (Japan Biological Informatics Consortium, JBIC); Hisanori Kiryu (National Institute of Advanced Industrial Science and Technology (AIST), Computational Biology Research Center); Toutai Mituyama (National Institute of Advanced Industrial Science and Technology (AIST), Computational Biology Research Center); Kiyoshi Asai (University of Tokyo, Graduate School of Frontier Sciences);
Short Abstract: We developed software called CentroidFold for secondary structure prediction of RNA sequences, which includes the centroid estimator used in Sfold as a special case and is theoretically superior to the MEA estimator used in CONTRAfold. A web server and stand-alone software are freely available at http://www.ncrna.org/centroidfold/
Long Abstract: Click Here

Poster U54
On the quality of established datasets for benchmarking sequence database search and low-complexity handling tools: the ASTRAL compendium test case
Ioannis Kirmitzoglou- University of Cyprus
Vasilis Promponas (University of Cyprus, Department of Biological Sciences);
Short Abstract: Benchmarking of sequence database search tools serves to establish protocols for routine or more elaborate searches. Low complexity regions (LCRs) complicate this procedure, requiring special handling. We provide new insights on validating the performance of relevant methods, taking into account LCRs, based on the widely used ASTRAL compendium datasets.
Long Abstract: Click Here

Poster U55
The SSAHA2 software pipeline for the mapping of DNA sequencing reads and genotype calling
Hannes Ponstingl- Wellcome Trust Sanger Institute
Yong Gu (Wellcome Trust Sanger Institute, Sequencing Informatics); Zemin Ning (Wellcome Trust Sanger Institute, Sequencing Informatics);
Short Abstract: The SSAHA2 software pipeline efficiently maps DNA sequencing reads onto a genomic reference sequence. A genotype call of the consensus sequence can be produced taking into account a heuristic score of the mapping quality. Reads from most types of sequencing platforms are supported including paired-end sequencing reads.
Long Abstract: Click Here

Poster U56
EMLIB: a C++ library to manage transcripts and genomic variations
Matteo Cereda- Scientific Institute IRCCS E.Medea
Manuela Sironi (Scientific Institute IRCCS E.Medea, Bioinformatics Laboratory); Uberto Pozzoli (Scientific Institute IRCCS E.Medea, Bioinformatics Laboratory);
Short Abstract: EMLIB is a C++ library containing a novel hierarchy of classes useful to define transcripts, to manage sequence variations and to calculate position-specific quantitative features in a “variation dependent” way. EMLIB provides an intuitive and powerful environment to gain insights about the effect of genomic variations.
Long Abstract: Click Here

Poster U57
BioHDF: Open binary file formats for large-scale data management - Project Update
Mark Welsh- Geospiza, Inc.
Todd Smith (Geospiza, Inc., CEO); N. Eric Olson (Geospiza, Inc., Product Development); Mike Folk (The HDF Group, -);
Short Abstract: BioHDF extends a mature Open Source technology for the storage of scientific data, Hierarchical Data Format, with features specific to Next Generation Sequencing. Initial prototyping of BioHDF has demonstrated clear benefits to storing sequences and their reference alignments in this structured binary format, including file compression and fast data retrieval.
Long Abstract: Click Here

Poster U58
Sequence analysis scale-up and acceleration using Grid and Cloud Computing yield efficient analyses of HIV-1 variants and other viruses
TinWee Tan- National University of Singapore
Yongli Hu (National University of Singapore, Biochemistry); Shen Jean Lim (National University of Singapore, Biochemistry); M Asif Khan (National University of Singapore, Biochemistry); Mark De Silva (National University of Singapore, Biochemistry); Kuan Siong Lim (National University of Singapore, Biochemistry); Martti Tammi (National University of Singapore, Biochemistry); J Thomas August (Johns Hopkins University School of Medicine, Pharmacology and Molecular Sciences);
Short Abstract: Sequence inundation in the current paradigm affects the speed and scalability of immunoinformatics-driven sequence analysis of infectious agents such as HIV-1. To overcome these restrictions, we have customized and benchmarked bioinformatics analyses on Grid and Cloud computing and obtained enhanced efficacy and scalability.
Long Abstract: Click Here

Poster U59
Comparative analysis of local compositional complexity in plant-encoded proteins
Vasilis Promponas- University of Cyprus
Eleni Mytilineou (University of Cyprus, Department of Biological Sciences); Ioannis Kirmitzoglou (University of Cyprus, Department of Biological Sciences);
Short Abstract: Different approaches have identified numerous low complexity regions (LCRs) in protein sequences. However, few systematic studies exist for elucidating their possible biological significance. We take advantage of the completion of two model plant organism genomes to investigate their functional roles and the mechanisms responsible for LCR appearance, maintenance or modification.
Long Abstract: Click Here

Poster U60
Detecting biases in Next Generation Sequence data
Rudiger Brauning- AgResearch
Anar Khan (AgResearch, Bioinformatics, Mathematics and Statistics); Ken Dodds (AgResearch, Bioinformatics, Mathematics and Statistics); Jo-Ann Stanton (Anatomy and Structural Biology, University of Otago); Chris Mason (Anatomy and Structural Biology, University of Otago);
Short Abstract: A recent study (1) looked at systematic bias in amplicon sequencing by NGS platforms. We look at WGS data generated for the Watson genome project using the 454 platform. After filtering for identical reads we specifically analyse bias at the beginning of each sequence read. (1) Harismendy et al., Genome Biology 2009,
Long Abstract: Click Here

Poster U61
VARiD: Variation Detection in Color-Space and Letter-Space
Adrian Dalca- University of Toronto
Michael Brudno (University of Toronto, Computer Science);
Short Abstract: We present VARiD - a Hidden Markov Model for SNP and Indel identification with AB-SOLiD color-space and regular letter-space reads. VARiD combines both types of data in a single framework which allows for homozygous and heterozygous calls. On both simulated and real datasets VARiD demonstrates very high specificity and sensitivity.
Long Abstract: Click Here

Poster U62
Statistically Significant Ranking of NGS Differential Peaks
Bryan Beresford-Smith- NICTA
Adam Kowalczyk (NICTA, VRL); Thomas Conway (NICTA, VRL); Izhak Haviv (Baker IDI Heart and Diabetes Institute, The Blood and DNA Profiling Facility); Richard Tothill (Baker IDI Heart and Diabetes Institute, The Blood and DNA Profiling Facility);
Short Abstract: Several statistics are presented for replacing ad hoc heuristics such as fold-ratio for the identification of differential peaks in NGS data. The statistical framework leads naturally to a power law for peak significance versus number of reads. The tests have been applied to ChIP-Seq data sets to demonstrate their usefulness.
Long Abstract: Click Here

Poster U63
Prediction of Protein Disordered and Ordered Region
Meijing Li- Chungbuk National University
Yoon Kyeong Lee (Chungbuk National University, Signal Transduction and Systems Biology Laboratory); Jin Hyoung Park (Chungbuk National University, Database/Bioinformatics Laboratory); Heon Gyu Lee (Electronics and Telecommunications Research Institute, Postal & Logistics Research Dep.); Hak Yong Kim (Chungbuk National University, Signal Transduction and Systems Biology Laboratory); Keun Ho Ryu (Chungbuk National University, Database/Bioinformatics Laboratory);
Short Abstract: We propose an emerging-sequence-based prediction method for identifying disordered and ordered regions from protein sequence. In the experiments, disordered sequences from DisProt and ordered sequences from PDB were used as training data, and the test data came from CASP7. The results were better than those of published prediction methods.
Long Abstract: Click Here

Poster U64
Positive selection contributes to the emergence of new HIV-1 lineages while high substitution rates determines viral pathogenesis in epidemically linked patients
Elcio Leal- Federal University of Sao Paulo
No additional authors
Short Abstract: The contribution of HIV-1 diversity to AIDS was evaluated in a group of epidemically linked individuals composed of one blood donor and two blood recipients. The same HIV-1 source, transmitted during blood transfusion, indicated positive selection as a key factor in the emergence of lineages, while substitution rates determine the disease outcome.
Long Abstract: Click Here

Poster U65
Benchmarking promoter prediction software
Thomas Abeel- VIB-UGent
Yvan Saeys (VIB-UGent, Plant Systems Biology); Yves Van de Peer (VIB-UGent, Plant Systems Biology);
Short Abstract: Recently many new promoter prediction programs (PPPs) have emerged, but a common benchmarking strategy is lacking. We propose a multi-faceted protocol as a gold standard for PPP evaluation. We benchmarked 17 PPPs and further investigated the best four. The importance of PPPs will only increase, as more genomes are sequenced.
Long Abstract: Click Here


