Posters

20th Annual International Conference on
Intelligent Systems for Molecular Biology


Poster numbers will be assigned May 30th.
If you cannot find your poster below, that probably means you have not yet confirmed that you will be attending ISMB/ECCB 2015. To confirm your poster, find the poster acceptance email, which contains a confirmation link. Click on it and follow the instructions.

If you need further assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category D - 'Epigenetics'

D01 - RegPrecise: a knowledge base of transcriptional regulatory networks in Bacteria

Short Abstract: We have developed the RegPrecise database (http://regprecise.lbl.gov) for capturing, visualizing, and analyzing computationally predicted regulons in microbial genomes. The primary object of the database is a single regulon in a particular genome. Each regulon description contains a regulator, its effector, a set of target genes and operons, and their associated cis-regulatory sites. Each transcription factor regulon also has a DNA-binding site model (profile) represented as a nucleotide logo. Bacterial regulatory networks are highly flexible during the evolution of microbial genomes. The central strategy for regulon analysis implemented in RegPrecise is therefore based on subdividing all microbial species into small taxonomic groups that are analyzed independently. The current version of RegPrecise contains 13 taxonomic collections of regulons covering major phyla of Bacteria. The total number of regulons in the current release exceeds 7000. RegPrecise provides three classifications of regulons implemented as controlled vocabularies: (i) biological processes/metabolic pathways; (ii) effectors/environmental signals; (iii) regulator families. The biological processes attributed to regulons in the database cover a wide spectrum of cellular metabolism. The current list of effectors of analyzed regulons includes more than 200 metabolites from the following major classes: amino acids, carbohydrates, nucleotides, lipids and fatty acids, co-enzymes, peptides and antibiotics, secondary metabolites, and inorganic chemicals. We are also planning to conduct a large-scale assignment of confidence levels to the predicted regulons based on available experimental evidence. Experimental evidence on known regulatory interactions is being integrated from other web resources such as EcoCyc, RegulonDB, CoryneRegNet, DBTBS, and RegTransBase.

D02 - Hidden rearrangement breakpoints in genomes

Short Abstract: Genome rearrangement plays a fundamental role in biological processes including cancer, gene regulation, and development. A better understanding of genome rearrangement is expected to lend insight into these biological processes. Genome alignment can identify the conserved segments and breakpoints of rearrangement among two or more genomes. When gene gain and loss occur in addition to rearrangement, breakpoints of rearrangement can exist that are only detectable by comparison of three or more genomes. An arbitrarily large number of these "hidden" breakpoints can exist among genomes that exhibit no rearrangements in pairwise comparisons. By extending a recent solution to the breakpoint median problem, we demonstrate that it is possible to count the number of these hidden breakpoints in alignments of three or more genomes. Using simulated genome evolution, we measure the abundance of hidden breakpoints. We apply current multiple genome aligners to the simulated genomes and find that all aligners introduce a high degree of error in hidden breakpoint counts and that this error grows with evolutionary distance in the simulation. Our results suggest that hidden breakpoint error may be pervasive in genome alignments, and that studies of the relationship between genome rearrangement history and biological processes could be improved by considering this error.
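
As a rough illustration of the pairwise comparison the abstract starts from, the sketch below counts breakpoints between two gene orders after projecting both onto their shared genes. It is not the authors' hidden-breakpoint method: it assumes unsigned genes, one linear chromosome per genome, and each gene occurring once, and the function names are made up for this example.

```python
def adjacencies(order):
    """Return the set of unordered neighbor pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def pairwise_breakpoints(genome_a, genome_b):
    """Count adjacencies of genome_a that are absent from genome_b.

    Both orders are first restricted to the genes they share, so gene
    gain and loss alone does not register as a breakpoint.
    """
    shared = set(genome_a) & set(genome_b)
    proj_a = [g for g in genome_a if g in shared]
    proj_b = [g for g in genome_b if g in shared]
    return len(adjacencies(proj_a) - adjacencies(proj_b))


# Toy gene orders (invented for illustration).
reference = ["a", "b", "c", "d", "e"]
rearranged = ["a", "c", "b", "d", "e"]   # b and c swapped
gain_loss = ["a", "x", "b", "d", "e"]    # gained x, lost c, same order

print(pairwise_breakpoints(reference, rearranged))
print(pairwise_breakpoints(reference, gain_loss))
```

In this toy case the swapped order yields two breakpoints while the gain/loss-only order yields none, which is the kind of pairwise view that, according to the abstract, can miss breakpoints visible only across three or more genomes.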

D03 - Towards a model of transcriptional regulation of energy metabolism in Desulfovibrio vulgaris

Short Abstract: The ultimate goal of reconstructing a transcriptional regulatory network is to gain insight into the regulatory program of a cell and to build a predictive model of cellular response to different environmental conditions. In this study we made a first attempt to transition from the inference of individual regulons to building a regulatory model of a specific metabolic process. We used the existing genome annotation of D. vulgaris str. Hildenborough in conjunction with published experimental data as a starting point for regulon inference. Using a comparative genomics approach, we reconstructed ten novel regulons controlling energy metabolism: i) global regulon Rex controlling sulfate reduction, ATP biosynthesis and electron transport genes; ii) Rrf2, TmcR and OhcR regulons of electron transport systems; iii) LldR, DVU2275, DVU2827 regulons of lactate oxidation; iv) HrsM and ModE regulons containing metalloenzymes of energy metabolism (regulated by availability of selenium and molybdenum/tungsten); and v) putative regulon of formate metabolism. The resulting transcriptional regulatory model of energy metabolism includes ten new and two previously described regulons (CooA and DVU3023) and covers a majority of known energy metabolism genes.

D04 - Very Large Scale Operon Predictions via Comparative Genomics

Short Abstract: Operons are polycistronic transcripts in prokaryotes in which multiple genes are arranged in tandem on the same strand of DNA and are involved in the same biological pathway. Knowing the operon structures in a genome can provide insight into the functions and cis-regulatory elements of genes and aid the reconstruction of biological pathways in the cell. Therefore, successful prediction of operons is very important for improving the functional annotation of genomes.
We aimed to predict operons in the 1499 published prokaryotic genomes deposited in the NCBI database as of January 2012. Processing this amount of data is nearly impossible with traditional computing resources and requires new high-performance computing models. We will illustrate the use of Microsoft Azure to predict operons in all sequenced prokaryotic genomes with improved prediction accuracy. The genomic features we analyzed across all prokaryotic genomes include intergenic distance, phylogenetic profile, conservation of gene pairs, functional annotation similarity, expression pattern correlation, and shared transcription factors (TFs). To investigate these features, we first needed to find orthologous genes among all genomes using the bidirectional best BLAST hit method.
We used an HPC cluster deployed on Microsoft Azure, with 290 nodes and several terabytes of shared storage, to run parametric sweep workloads in multiple sequential steps: obtaining and creating BLAST databases for each of the 1499 genomes, running more than 2.2 million pairwise BLASTs, creating orthologous gene pairs, and calculating the features. We present the results of the analysis in detail in the poster.
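
The bidirectional best BLAST hit step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the BLAST results have already been parsed into (query, subject, bit score) tuples, ranks hits by bit score only, and uses invented gene identifiers and function names.

```python
def best_hits(hits):
    """Map each query gene to its highest-scoring subject gene.

    `hits` is an iterable of (query, subject, bit_score) tuples, for
    example parsed from tabular BLAST output (parsing is omitted here).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy hit tables for two hypothetical genomes A and B (scores are made up).
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b1", 60.0)]
b_vs_a = [("b1", "a1", 245.0), ("b2", "a2", 175.0),
          ("b2", "a1", 88.0), ("b3", "a3", 40.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here a3 is dropped because its best hit b1 points back to a1, so only the mutually best pairs are kept as putative orthologs.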

D05 - Comparative analysis of Dermatophyte genomes reveals the expansion of effector genes

Short Abstract: Dermatophytes are the most common human fungal skin pathogens. An estimated ¼ of the world's population has experienced skin infections caused by these phylogenetically related fungi; however, little is known about the cellular or biological nature of these fungi. We have sequenced and analyzed the genomes of seven of the most common dermatophytes that infect humans and animals. This analysis revealed a novel family of genes that have expanded in copy number in dermatophytes. Members of this gene family are predicted to be secreted and contain the LysM domain, which usually binds to peptidoglycans. In plant pathogenic fungi, LysM proteins may mask the fungus from the host immune system by binding chitin; these proteins could perform an analogous role in dermatophytes. The LysM domain is dynamic, appearing in combination with other domains as well as in tandem copies. Traditional comparative approaches can miss this mode of gene evolution, as most do not consider domain duplication at the sub-gene level. Here we present the findings of our comparative analysis from an evolutionary perspective and show how gene-level comparisons can be extended by consideration of domain order and multiplicity.

D06 - Detecting rare copy number variations (CNVs) with sparse coding on FF and FFPE

Short Abstract: High-density oligonucleotide genotyping microarrays, especially Affymetrix SNP6 chips, are widely used for high-resolution copy number analysis. In order to identify CNVs more reliably, we have proposed a maximum a posteriori factor analysis model called cn.FARMS. The latent variable, the factor, captures the simultaneous increase or decrease of DNA amount at neighboring chromosome locations measured by the intensity of oligonucleotide probes. This increase or decrease indicates amplification or deletion of a DNA region, that is, a CNV. cn.FARMS considerably reduces the false discovery rate (FDR) by combining adjacent chromosome locations into an ensemble vote (agreement of multiple measurements) instead of relying on a single measurement as other methods do.

Standard factor analysis assumes a Gaussian factor distribution, which is the wrong assumption for CNVs. Redon et al. 2006 showed that most CNVs affect fewer than three individuals out of 270 HapMap samples. These rare events are hard to detect with cn.FARMS, as they would be interpreted as noise. Therefore, we propose a factor analysis model with a Laplacian prior, which leads to a sparse factor distribution.

We have applied the Laplacian cn.FARMS model to the HapMap dataset to detect CNVs. We could verify most of the published copy number variable regions and found new ones. However, many known CNVs seem to be false positives. Furthermore, we have tested the algorithm on FFPE samples and present a new analysis pipeline for the preprocessing.

D07 - miRNA SNiPer: Tool for identification of genetic variations in microRNA genes

Short Abstract: MicroRNAs (miRNAs) are a class of non-coding RNAs that play an important role in posttranscriptional regulation of mRNAs. Evidence has shown that miRNA gene variability might interfere with miRNA function, resulting in phenotypic variation and disease susceptibility. A major role in miRNA target recognition is ascribed to complementarity with the miRNA seed region, which can be affected by polymorphisms. In the present study, we developed an online tool for the detection of miRNA polymorphisms (miRNA SNiPer) in animals. miRNA SNiPer was developed using data assembled from miRBase, TargetScan and the Ensembl Variation database. The assemblies of the selected databases were downloaded and inserted into a local MySQL database. The tool is implemented as a CGI (Common Gateway Interface) script written in Perl. The script issues SQL commands to the MySQL database to search for variations within miRNA genes. Display settings enable the miR-SNPs to be arranged according to their location in miRNA precursor, mature or seed regions. The search for miR-SNPs in the seed region (miR-seed-SNPs) was performed in sixteen species: human, macaque, mouse, rat, horse, pig, opossum, platypus, anole lizard, chicken, zebra finch, medaka, tetraodon, zebrafish, Ciona savignyi, and fruitfly. In six species (human, mouse, rat, chicken, zebra finch, and fruitfly) polymorphisms overlapped with miRNA seed regions. These polymorphisms included SNPs, double nucleotide polymorphisms (DNPs), and insertions/deletions (indels). The tool miRNA SNiPer is available at http://www.integratomics-time.com/miRNA-SNiPer/. This web-based public resource will enable faster and more targeted studies on miR-SNP biology.
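
The seed-region overlap search described above might look roughly like the following sketch. It is not the tool's Perl/SQL implementation: it assumes the seed spans nucleotides 2 to 8 from the 5' end of the mature miRNA (a common convention, not stated in the abstract), uses 1-based inclusive genomic coordinates, and all record layouts, names and positions are hypothetical.

```python
def seed_interval(mature_start, mature_end, strand, first=2, last=8):
    """Genomic interval (1-based, inclusive) of the seed in a mature miRNA.

    `first` and `last` are seed positions counted from the 5' end of the
    mature sequence; 2..8 is an assumed convention.
    """
    if strand == "+":
        return mature_start + first - 1, mature_start + last - 1
    # On the minus strand the 5' end lies at the higher genomic coordinate.
    return mature_end - last + 1, mature_end - first + 1


def variants_in_seed(mirna, variants):
    """Return names of variants overlapping the seed region of `mirna`.

    `mirna` is (chrom, mature_start, mature_end, strand); each variant is
    (name, chrom, start, end), so indels spanning several bases also work.
    """
    chrom, start, end, strand = mirna
    seed_start, seed_end = seed_interval(start, end, strand)
    return [name for name, v_chrom, v_start, v_end in variants
            if v_chrom == chrom and v_start <= seed_end and v_end >= seed_start]


# Hypothetical mature miRNA and variants (coordinates are invented).
mirna = ("chr1", 1000, 1021, "+")
variants = [
    ("snp_a", "chr1", 1004, 1004),
    ("snp_b", "chr1", 1015, 1015),
    ("indel_c", "chr1", 1007, 1009),
    ("snp_d", "chr2", 1004, 1004),
]
print(variants_in_seed(mirna, variants))
```

Treating every variant as an interval rather than a single position lets the same overlap test cover SNPs, DNPs and indels.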

D08 - Exploring Gene Environments with GeneRiViT

Short Abstract: Rearrangement of genes in prokaryotes is an important indicator of evolutionary origin and functional potential. The ability to easily compare gene neighborhoods is useful and necessary for comparative genomics. The Genomic Ring Visualization Tool (GeneRiViT) provides a high speed, intuitive visualization tool for investigating sequence environments of conserved genes among related genomes.
GeneRiViT allows researchers to interact with interconnected global and local visualizations of gene neighborhoods and gene order, through a web-based interface that is easily accessible in any browser. The primary visualization is a wheel of nested rotating circles, each of which represents a single genome. By focusing on a local region of interest in a specific organism and using orthology connections to highlight corresponding structures between genomes, this view provides insight into gene context and preservation of neighbor relationships as genomes evolve. Visualizations are linked into a coordinated multiple view interface to provide multiple selection methods and entry points into the data. These approaches make GeneRiViT a flexible, unique tool for examining gene neighborhoods that improves on existing methods.
GeneRiViT uses cutting-edge technologies that enable quick, interactive data analysis. The database uses an OLAP model for fast execution of complex analytical queries and easy incorporation of new data, the lack of which is a major limitation of existing gene neighborhood tools. The database's efficiency is complemented by fast, asynchronous Node.js middleware that facilitates data processing and communication between the database and the HTML5 Canvas/JavaScript interface. These technologies let GeneRiViT handle large datasets and run new analyses instantly upon request.

D09 - Comparative analysis of 46 Brucella genomes using the GenoSets analysis platform

Short Abstract: Comparative analysis of several microbial genomes provides insight into pathogenicity, functional diversity, and genes required for different environmental niches. The GenoSets software system combines genome annotation, ortholog clustering, Gene Ontology (GO) classification, and gene set enrichment analysis into a single system. The system automates the process of gathering the information and presents the data in several multi-genome visualizations. The system is flexible enough to incorporate information on complete and draft genomes at several levels of the assembly and annotation process and is not limited to pre-selected genomes.

The GenoSets software system has the primary objective to aid in the discovery of key features that are potentially important in differentiating related microbial organisms. The rich visualizations provide novel entry points into the database that extend beyond keyword searches. There are several visualizations that provide both high-level and detail views in a multi-genome context. The interactive capabilities allow the user to identify sets of interest and perform ad-hoc analysis on these subsets.

GenoSets was used to analyze 46 complete and draft genomes of the Brucella genus. Although the species and subgroups within this genus are closely related at the gene content level, the analysis identified key genes that may be associated with host preferences and symptoms of infection among species and subgroups.

The software and open-source code can be downloaded from http://genosets.uncc.edu.

D10 - Comparative Study of miRNA Regulatory Elements in Cereal Genomes

Short Abstract: Studies concerning transcriptional regulation of miRNA genes in plants have mostly focused on locating their promoters. Little is known about the regulatory elements in their upstream regions. Identifying these elements gives us a better understanding of the role of miRNAs in regulatory networks. Here we have studied the existence and abundance of putative miRNA-specific regulatory motifs that are conserved in cereal genomes. We first identified the orthologous miRNA genes among maize, sorghum and rice using both sequence homology and synteny data. Next, we searched for motifs occurring in the 500 bp upstream of the orthologous miRNA precursors in maize, sorghum and rice. To test the statistical significance of these motifs, we used the promoter regions of rice protein-coding genes as the background. We then calculated the frequency of the significant motifs. Overall, the motif frequencies in maize-sorghum orthologous pairs were closer to each other than the corresponding numbers in maize-rice pairs. We also searched for regulatory elements in the upstream regions of maize and sorghum cold-associated miRNAs and found 2 significant motifs in both. We then mapped these motifs against the AGRIS motif database and found that they have previously been reported in the promoters of light- and stress-responsive genes. To conclude, our results show that some miRNA regulatory elements are conserved in phylogenetically distant species and might have a critical role in miRNA regulatory networks. These data also provide important knowledge about how stress might affect miRNAs in cereals.
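
A minimal sketch of the upstream-region step described above: extracting the sequence upstream of a precursor in a strand-aware way and counting occurrences of a candidate motif. The coordinate convention (1-based, inclusive), the function names and the toy sequence are assumptions made for illustration; the abstract does not describe how the authors implemented this step or which motif finder they used.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=500):
    """Return up to `length` bases upstream of a feature, read 5'->3'.

    `start` and `end` are 1-based inclusive coordinates on the forward
    strand; the region is truncated at the chromosome ends.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return chrom_seq[lo:start - 1]
    # For minus-strand features, upstream lies past `end` on the forward strand.
    hi = min(len(chrom_seq), end + length)
    return reverse_complement(chrom_seq[end:hi])


def count_motif(seq, motif):
    """Count possibly overlapping exact occurrences of `motif` in `seq`."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


# Toy chromosome with a hypothetical feature at positions 7..9.
chrom = "AAACCCGGGTTT"
plus_upstream = upstream_region(chrom, 7, 9, "+", length=3)
minus_upstream = upstream_region(chrom, 7, 9, "-", length=3)
print(plus_upstream, minus_upstream, count_motif(plus_upstream, "CC"))
```

With the default `length=500`, the same function would return the 500 bp window used in the abstract; the short window here only keeps the toy example readable.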

D11 - RNA Structural Homology Searches in Bacterial Genomes using Single Sequences and Experimentally Determined Secondary Structures

Short Abstract: Increasingly, biological information is stored in large online sequence and annotation databases. These databases provide excellent accessibility to primary sequence data and also serve as a collective memory where genomic annotations summarize knowledge gained from a large community of researchers. While this system has been very successful for distributing knowledge concerning protein-coding sequences, annotation of functional RNA sequences within bacterial genomes is still lacking due to the specialized tools required for RNA secondary structure-based homology searches. The Rfam (RNA families) database provides the necessary structural models for many biologically important RNAs; however, the creation of such models significantly lags discovery, especially for the many biologically relevant RNA structures identified before the advent of comparative genomics. Many such RNAs risk being lost from the collective memory, as they are currently unrepresented in Rfam and thus in other online sequence databases. Due to our interest in the regulation of ribosomal proteins, we have created RNA alignments corresponding to the nine regulatory RNA structures associated with ribosomal proteins in E. coli. Starting from single sequences and experimentally determined secondary structures, we built RNA sequence alignments and models to allow us to annotate these RNAs in bacterial genomes and to track their phylogenetic distribution. To accomplish this, we used a number of different strategies to both decrease computational search time and increase the sensitivity of our searches, allowing identification of weakly conserved homologues.

D12 - Finding Transcription Factor Binding Sites in the Twilight Zone

Short Abstract: Identification of transcription factor binding sites is an important problem in molecular biology. Transcription factor binding sites play an important role in transcriptional regulation, and existing research has found that they can vary significantly between related genomes. Zheng et al. (2010) performed ChIP-seq experiments on two strains of S. cerevisiae, and found many variations between transcription factor binding sites. We found additional conserved binding sites by analyzing the data from multiple experiments jointly.

Our algorithm compares binding site peaks found with Zhang et al.’s (2008) Model-based Analysis of ChIP-Seq (MACS) tool in multiple ChIP-seq datasets using a local alignment of binding sites from each genome. We use the binding sites that are found in only one genome as prior information to identify weakly conserved binding sites in the other genome. We analyzed the ChIP-seq data to find additional signal in the noise by performing a statistical analysis of ChIP-seq reads and by adjusting the MACS parameters to allow a higher false discovery rate. We were able to find evidence of new binding sites that are not significant enough to be found by traditional algorithms.

D13 - Utilising domain architecture content to analyse the evolution of cell types using transcriptome data

Short Abstract: A great deal of focus has been placed on the nucleotide and gene level analysis of next generation sequencing data as it has become more widely available. Whilst this has revealed many interesting features of the transcriptomes of various organisms, the wealth of information that is already known about protein structure and sequence has not been fully exploited.
We present 'TraP', an analysis pipeline for studying the protein domain content of any transcriptome. This pipeline uses SUPERFAMILY HMMs to produce a set of expressed protein domain architectures for any transcriptome data set. By studying transcriptomes at the domain architecture level, we are able to comment on the evolutionary past of different cell types from the same organism and to identify domain architecture expression specific to both cell type and evolutionary time point. In addition, by making use of both Gene Ontology and the SUPERFAMILY domain-centric gene and phenotype ontologies, we can track the evolutionary history of known processes in a way that has not previously been possible.
This resource allows us to comment on how different cell types have evolved in humans and identify cell types that are more ancient in their domain architecture use than others. Details such as these are critical in order to fully understand how cell types have evolved as well as their relationships to each other in terms of their lineage.

D14 - The apicomplexan kinome: Characterizing the protein kinase superfamily in the malaria parasite and its evolutionary relatives

Short Abstract: The parasitic protozoans that constitute the Apicomplexa cause devastating human and veterinary diseases, including malaria. We characterized the eukaryotic protein kinases, an enzyme superfamily already successfully targeted in cancer and diabetes, in 17 apicomplexan proteomes to identify novel, parasite-specific kinase subfamilies and structural features which can be new targets.

D15 - Visualizing comparative genomic data by functional grouping across taxa to facilitate biological hypothesis building

Short Abstract: Many comparative genomics analyses compare repertoires of genes between multiple genomes to infer evolution/function. When gene products are clustered into functional categories, e.g. complexes/pathways, and the taxonomic patterns of retention/loss compared, the results are highly informative. However, to rapidly comprehend such datasets for hypothesis testing, it is necessary to display these data in an abstract visual format that represents these relationships. We devised the Coulson plot, and an application to generate the plots from text file inputs, to represent protein complexes or pathways as a pie chart matrix. Each pie represents the components of a complex or pathway, and each complex has its own complement of genes represented by colored (present) or open (absent/not found) pie segments. Small pie charts arranged as a matrix allow the investigator to visualize multiple complexes, which can be compared across many taxa. Organization by taxa of functionally related complexes can then reveal informative patterns of retention or loss at any taxonomic level. We present several examples of the use of this format and show how it facilitates rapid understanding of the underlying evolutionary histories of specific cellular pathways, the production of novel biological hypotheses and laboratory-based experimental analysis. Examples that will be discussed include the ESCRT (endosomal-sorting complex required for transport) system, important in protein turnover and signal transduction modulation, the clathrin-associated adaptin complexes and the vesicle tethering systems. We suggest that this new graphical format will accelerate discovery of patterns in large comparative genomic datasets.

D16 - Comparative analysis and visualization of bacterial pan-genomes

Short Abstract: After the first complete genome of a free-living organism, Haemophilus influenzae, was sequenced, whole-genome sequencing became a standard method for genome analysis. At the beginning of the genomic era, it was thought that a single representative isolate was sufficient to describe the genetic complexity of a species. More recently, multiple isolates of the same species have been sequenced and analyzed. It is now known that intraspecies variation can be as significant as interspecies diversity. Bacterial genomes from various strains of the same species can vary considerably in genome size, nucleotide composition and gene content. The pan-genome has been defined as a superset of all genes in all the strains of a species that includes the "core genes" present in all strains, "accessory genes" present in two or more strains, and "unique genes" specific to single strains.

NCBI is working on tools for comparative analysis and visualization of bacterial genomes at various levels of diversity, organizing bacterial strains into robust phylogenetic clades, and calculating protein clusters within clades for pan-genome analysis. The interactive tools we have developed allow comparison of selected genomes using data from multiple databases and offer various displays for data visualization at different levels of detail.
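
The core/accessory/unique partition of a pan-genome defined above can be sketched as follows. This is only an illustration of the definition, not NCBI's tooling: it assumes genes have already been grouped into families (for example via protein clusters), and the strain names and family identifiers are invented.

```python
def classify_pan_genome(strain_genes):
    """Split gene families into core, accessory and unique sets.

    `strain_genes` maps a strain name to the set of gene-family ids it
    carries. Core families occur in every strain, unique families in
    exactly one strain, and accessory families in two or more strains
    but not all of them.
    """
    n_strains = len(strain_genes)
    counts = {}
    for genes in strain_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    core = {g for g, c in counts.items() if c == n_strains}
    unique = {g for g, c in counts.items() if c == 1 and g not in core}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical strains and gene families.
strains = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famE"},
}
core, accessory, unique = classify_pan_genome(strains)
print(sorted(core), sorted(accessory), sorted(unique))
```

The pan-genome itself is simply the union of the three sets; core membership takes precedence so that the degenerate single-strain case does not place a family in two categories.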

D17 - The Kinome as a model for exploring gene evolution, orthology, and improved gene predictions

Short Abstract: Protein Kinases constitute ~2% of genes in most eukaryotes, and are critical regulators of almost all biological processes. They offer a focused but substantial model family to explore gene evolution, and to exploit comparative genomics for biological understanding and therapeutic use. We have curated the kinomes of diverse eukaryotes, using our findings to improve automated methods to predict genes, determine orthology, and identify the sequence changes that underlie functional changes between species.

Over 90% of human kinases have distinct homologs across all vertebrates, providing a rich catalog to examine the evolutionary constraints at every residue of each protein kinase, how they evolve with time, and how they can help in evaluating coding SNPs and cancer mutations. Using orthology, we can substantially improve upon current gene predictions: within 5 fish genomes, we detect 12% more kinases than public catalogs, and estimate that we extend current predictions by over 30%, while removing about 8% of likely falsely-predicted sequence. We are automating this methodology and applying it to whole-genome gene predictions across 15 nematode species.

We are exploring patterns of kinase gain and loss within metazoans, querying the biological and genomic consequences of losing a gene in one lineage that is essential in many other lineages. While many losses remain cryptic, we are finding cases of losses of interaction motifs, phosphorylation sites and entire pathways, and expect that such ‘evolutionary knockouts’ will be an important tool to understand gene function.

D18 - A tree construction of all 18 insects complete proteomes reveals extensive expansion and shrinking of proteins families

Short Abstract: Insects are among the most diverse groups of animals, including more than a million described species. They represent over 90% of metazoan species and contribute to many ecosystems. The insect genomes that have been sequenced include Diptera (16, e.g., fruitfly and mosquito) and Hymenoptera (9, e.g., ants). Proteomes were downloaded from the Hymenoptera Genome Database and from UniProtKB. Functional assignment was performed with the PFAM, Phobius and Clantox classifiers.
The ProtoNet (www.protonet.cs.huji.ac.il) platform provides an unsupervised hierarchical clustering of all proteins. A total of 300K proteins from 17 complete insect proteomes, with Daphnia pulex as an outgroup, were clustered using ProtoNet. All analyzed proteins were clustered into 11K stable families. We developed a systematic methodology to evaluate the imbalance among species, as represented by their protein families. Such an imbalance reflects an evolutionary history of massive gene loss and duplication. We identified protein families that had gone through extensive expansion in only some of the species. We noted that the level of imbalance is remarkably high for membranous proteins (with a transmembrane domain) and secreted proteins (with signal peptides).
Apis mellifera (honey bee) was exceptional among the insects in having a high number of isolated proteins in the tree. This observation is in accord with the high sequence diversity of the honey bee. Our methodology can be used to identify functional families that are specialized for specific species only.
ProtoBug (www.protobug.cs.huji.ac.il) is a resource and querying system that supports a species view of protein families and functions.

D19 - Detecting copy-number aberrations with a low false discovery rate

Short Abstract: A low false discovery rate (FDR) in the detection of copy-number aberrations (CNAs) in microarray data ensures sufficient detection power and prevents failures in CNA-disease association studies. We obtain a low FDR in the detection of CNAs in microarray data with a probabilistic latent variable model called 'cn.FARMS'.
