Posters

20th Annual International Conference on
Intelligent Systems for Molecular Biology

Poster numbers will be assigned May 30th.
If you cannot find your poster below, you have probably not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on the link and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category I - 'Open Science and Citizen Science'

I01 - Analysis of gene structure evolution in Aspergillus fungi using RNASeq data

Short Abstract: The NCBI multi-genome Gnomon approach for the parallel annotation of closely related genomes produces a consistent annotation across all the genomes involved in the process. Experimental evidence for any of the genomes contributes to the annotations of the others. RNA-Seq has tremendous potential to improve the accuracy of gene prediction by providing additional experimental evidence for splice positions. This study uses RNA-Seq data in the analysis of the gene structure of 8 Aspergillus genomes. Orthologous genes are grouped together by protein clustering and analyzed further for the conservation of intron positions. 99% of conserved introns are supported by RNA-Seq data. The results of the analysis confirm the previously reported unusually high rate of intron gain compared to the rate of intron loss. They also provide additional evidence supporting an mRNA-mediated model of intron loss.

I02 - Automated method for annotation of genes involved in the degradation of polycyclic aromatic hydrocarbons (PAH) from genomic, metagenomic and metatranscriptomic sequence reads.

Short Abstract: This poster presents a bioinformatics tool that annotates and classifies genes involved in the degradation of polycyclic aromatic hydrocarbons (PAHs). Microbial genomics and its more complex counterparts, such as metagenomics and metatranscriptomics, are powerful means to understand how microbial community metabolism adapts to strong selective pressures. Communities living in environments contaminated with PAHs are under strong selective pressure. Enzymatic pathways for the degradation of monocyclic or bicyclic aromatic compounds have been studied in detail in pure bacterial cultures. However, most catabolic pathways for the degradation of high molecular weight PAHs have not been extensively described, and thus their environmental distribution and relevance remain unknown. We have developed a bioinformatics tool to precisely identify the aerobic and anaerobic catabolic potential to degrade aliphatic and aromatic hydrocarbons using genomic information, thereby automating the annotation of genes belonging to key degradation families. This required careful curation of databases, phylogenetic reconstruction and clustering of sequences at the subfamily level in order to build specific probabilistic HMM profiles, which can be applied to the screening of translated metagenomic reads. Screening with our tool would allow researchers to quickly and precisely annotate the hydrocarbon catabolome for xenobiotic compounds, which is currently annotated only roughly by standard automated programs for peripheral pathways, with low discriminatory power. With our work we aim to improve identification of the environmental conditions, compounds and degradation routes, in order to develop a better understanding of the long-term consequences of pollution and of ways to mitigate its unfortunate effects.

I03 - Dizeez and GenESP: online games for human gene annotation.

Short Abstract: Structured gene annotations are a foundation on which many bioinformatics and statistical analyses are built; however, their representation is quite sparse. For example, only 57% of human protein-coding genes have two or more human-curated GO annotations. Structured data for diseases are even less complete. As centralized biocuration efforts struggle to keep up with the rate of biomedical data generation, new models for gene annotation need to be explored.

Recently, online games have emerged as an effective way to recruit, engage and organize contributors to help address difficult challenges like online image tagging (ESP Game), protein folding (Foldit), or multiple sequence alignment (Phylo).

We present here two online games - Dizeez and GenESP - aimed at identifying novel gene-disease annotations, i.e. gene-disease links well established in the literature, but not yet reflected as structured annotations. Both designs allow players to select a specific area of biology (for example, by disease or protein family) that best matches their expertise. Dynamically generated summaries at the end of the game show supporting evidence for each gene-disease association played. Users can review the game log and even suggest new supporting evidence.

We provide preliminary results from game play online and at scientific conferences. These data suggest that even after limited game play, novel gene-disease annotations can be mined from game playing logs.

Both games are available at http://genegames.org.

I04 - TSRchitect: A tool for defining promoter architectures using Transcription Start Site (TSS) data

Short Abstract: Rigorous analyses of gene expression profiles have proven capable of yielding important insights into the nature of cis-regulation of genes. Owing to rapid developments in sequencing technologies, large amounts of global transcriptome data in eukaryotes have been generated and made available, including data from approaches that precisely define the Transcription Start Site (TSS) of mRNAs. These studies demonstrate that eukaryotic genes typically utilize diverse positions of 5’ ends. Recent studies in human, mouse and D. melanogaster provide evidence that the shape of transcription initiation for a gene is associated with its promoter architecture and expression plasticity, but this remains incompletely understood. As such, identifying and annotating the Transcription Start Region (TSR), the genomic region that gives rise to the mRNAs produced for a gene, is of biological utility.
However, the complexity of TSS data, which includes cases of alternative promoter usage and transcripts of dubious function, makes defining TSRs challenging. Here we present a tool, TSRchitect, that annotates TSRs at genome scale from multiple sources of TSS data using a clustering algorithm. As validation for this method, we show evidence, from budding yeast and other eukaryotic taxa, that the TSR annotations generated by TSRchitect are consistent with the current genomic understanding of transcription initiation and promoter architecture, and that the method represents an improvement relative to conventional numerical approaches.
TSRchitect is presently in the beta phase of development; we intend to release it as a freely available analysis package in mid-to-late 2012.
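
The abstract above does not say which clustering algorithm TSRchitect uses. As a rough illustration only, the general idea of grouping TSS positions into TSRs can be sketched with a simple distance-based rule: TSSs on the same strand that lie within a fixed gap of each other are merged into one region. The function name cluster_tss, the max_gap parameter and its default of 20 bp are hypothetical and are not taken from TSRchitect.

```python
def cluster_tss(positions, max_gap=20):
    """Group TSS positions on one chromosome strand into regions.

    positions: iterable of (coordinate, tag_count) pairs.
    max_gap:   hypothetical threshold (bp); a TSS joins the current
               region if it lies within max_gap of the region's end.

    This is a generic distance-based rule used for illustration,
    NOT the algorithm implemented in TSRchitect.
    """
    clusters = []
    for pos, count in sorted(positions):
        if clusters and pos - clusters[-1]["end"] <= max_gap:
            # Close enough to the previous TSS: extend the current region.
            current = clusters[-1]
            current["end"] = pos
            current["tags"] += count
        else:
            # Too far away (or first TSS): start a new region.
            clusters.append({"start": pos, "end": pos, "tags": count})
    return clusters


if __name__ == "__main__":
    example = [(300, 1), (100, 5), (105, 2)]
    for region in cluster_tss(example):
        print(region)
```

With such a rule the choice of gap threshold drives the result, which is one reason more careful clustering methods are of interest for TSS data.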

I05 - Enhancing Gene Annotation in Castor bean through RNA-Seq

Short Abstract: Castor bean (Ricinus communis) seeds are the source of castor oil, a triglyceride in which 90% of the fatty acid chains are ricinoleic acid. Ricinoleic acid has important industrial applications derived from the production of Nylon-11, a widely utilized polymer for piping of hydraulic fluids in automotive engines. While a draft castor bean genome is currently available, its gene annotation is still far from complete. In this study we revise the castor bean gene annotation and investigate alternative splicing in five tissues shown to exhibit differences in lipid metabolism. Transcripts were reconstructed through the reference-based assembly of RNA-Seq reads and subsequently enhanced with additional data sources available from public repositories, including ESTs, cDNAs and 454 data. We present the revised set of annotated genes with extensive data for alternative splicing, intergenic expression and fusion transcripts. Split-merge events and other relevant features were assessed, including summary statistics of newly discovered exons, complete ORFs and splice variants.

I06 - The Ruby UCSC API: accessing the UCSC Genome Database using Ruby

Short Abstract: Background
The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. A simple application programming interface (API) in a scripting language aimed at the biologist was, however, not yet available. Here, we present the Ruby UCSC API, a library to access the UCSC genome database using Ruby.
Results
The API is designed as a BioRuby plug-in (Biogem) and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast.
The API uses the bin index—if available—when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby).
Conclusions
Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will help biologists query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/
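
The bin index mentioned in the Results can be illustrated with a short sketch. It assumes the standard UCSC binning scheme (five levels, smallest bins of 128 kb, each level eight times coarser than the one below, covering up to 512 Mb); the abstract does not describe how the Ruby UCSC API implements this internally, and the function names below are hypothetical. The sketch is in Python rather than Ruby and is not part of the API.

```python
# Offsets of the first bin at each level, finest (128 kb) to coarsest (512 Mb).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17  # 2**17 = 128 kb, size of the finest bins
NEXT_SHIFT = 3    # each level is 2**3 = 8 times coarser


def bin_from_range(start, end):
    """Smallest bin that fully contains the half-open interval [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval exceeds the 512 Mb range of the standard scheme")


def overlapping_bins(start, end):
    """All bins, at every level, that may hold features overlapping [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    bins = []
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins


if __name__ == "__main__":
    print(bin_from_range(0, 1))
    print(overlapping_bins(0, 131073))
```

An interval query then restricts the SQL search to rows whose bin column is in the list returned by overlapping_bins before comparing start and end coordinates, which avoids scanning the whole table.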

I07 - Microbial Genome Annotation using EcoGene 3.0 and EcoGene-RefSeq

Short Abstract: EcoGene is an annotation database for E. coli K-12 that utilizes the genome sequence as a framework for organizing the large amount of new information continuously published about the structure and function of this well-studied model bacterium. The manually updated EcoGene database tables are used as the source files for updating the GenBank U00096 genome record in collaboration with staff annotators at the NCBI, EcoCyc and UniProt. A comprehensive review of gene models revised over 700 ORF start codons, identified and categorized pseudogenes, and found numerous novel small proteins, leading to a more precise representation of the E. coli K-12 proteome. Both experimental evidence and predictions are indicated for start codons, lipoproteins, signal peptides, biochemical functions, etc. For example, the Verified Set is a collection of over 900 mature protein N-termini determined by Edman protein sequencing and collated from the biomedical literature.
EcoGene.org 3.0 is a complete renovation of EcoGene.org utilizing the Drupal content management system. New functionality includes Boolean queries, with interactive Venn diagrams, among (1) genesets organized as transcriptionally regulated genes in EcoArray, (2) genesets clustered within hundreds of EcoTopics, including a variety of omics datasets, and (3) user-specified genesets. Data integration includes importing some data, such as the experimentally supported TFBSs in RegulonDB. A database of gene identifiers from many sources enables hyperlinks to a large number of external resources from GenePages. Ongoing developments include representation of the E. coli pangenome within EcoGene, and EcoGene-RefSeq, a platform for organizing and viewing over 1500 bacterial genomes in NCBI’s RefSeq.

I08 - Nonparametric Bayesian Method of Transcription Start Site Prediction

Short Abstract: In this presentation we introduce the basic ideas behind the nonparametric Bayesian approach to estimating an unknown probability distribution and demonstrate its utility for biological data analysis. The general statistical problem is to estimate an unknown probability distribution F of a population of nonlinear models from noisy data. There are many important traditional applications of this problem: clinical trials, pharmacokinetics, pharmacodynamics. We extend the framework to the reliable prediction of transcription start sites. Since F|Data is infinite dimensional, its calculation requires some approximation. A very elegant approach to this problem, called "Stick-Breaking", was developed by Sethuraman. It represents F as an infinite series of delta functions with random support points and random weights. We used the Stick-Breaking approach to find the optimal number of support points for the probability distribution and to determine the number and location of peaks in the distribution. We used JAGS and OpenBUGS to implement the Gibbs sampler.
Peaks in the probability distribution correspond to the alternative positions of the Transcription Start Site (TSS). We applied this approach to the analysis of Arabidopsis thaliana and Oryza sativa promoters, classified all genes by the number of alternative TSSs, and demonstrated a relationship between functional properties of genes (using GO annotation, gene length, number of exons, nucleotide composition and gene methylation) and the number of alternative TSSs. We have developed a parallel implementation of our algorithm and demonstrated the utility of High Performance Computing for this analysis. We have also developed a database of plant transcription start sites.
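
The stick-breaking representation described above can be sketched in a few lines. The example draws a truncated approximation of F from the prior only: weights come from repeatedly breaking off a Beta(1, alpha) fraction of the remaining stick, and support points come from a base distribution. It does not reproduce the authors' JAGS/OpenBUGS Gibbs sampler or any posterior inference; the truncation level, the concentration parameter alpha, the function names and the uniform base distribution over a promoter window are all assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Draw the first n_atoms stick-breaking weights.

    Returns (weights, leftover), where leftover is the mass of the
    unbroken remainder of the stick (the truncation error).
    """
    weights = []
    remaining = 1.0
    for _ in range(n_atoms):
        fraction = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
        weights.append(remaining * fraction)    # w_k = V_k * prod_{j<k} (1 - V_j)
        remaining *= 1.0 - fraction
    return weights, remaining


def sample_truncated_dp(alpha, n_atoms, base_sampler, seed=0):
    """Truncated draw F ~ sum_k w_k * delta(theta_k), theta_k from the base."""
    rng = random.Random(seed)
    weights, leftover = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(n_atoms)]
    return atoms, weights, leftover


if __name__ == "__main__":
    # Hypothetical base distribution: positions uniform in a +/-500 bp window.
    atoms, weights, leftover = sample_truncated_dp(
        alpha=2.0, n_atoms=50, base_sampler=lambda rng: rng.uniform(-500, 500)
    )
    heaviest = sorted(zip(weights, atoms), reverse=True)[:3]
    print("three heaviest support points:", heaviest)
    print("mass left outside the truncation:", leftover)
```

Smaller values of alpha concentrate the mass on a few support points, which is the behaviour one would want when only a handful of alternative TSS positions are expected per gene.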
