Accepted Posters

Attention Conference Presenters - please review the Speaker Information Page available here.

If you need assistance please contact submissions@iscb.org and provide your poster title or submission ID.

Category C - 'Education'

C01 - Phylogeny-wide Discovery of Bacterial Transcription Factor Binding Motifs by Protein Family-based Approach

Maxim Shatsky, Lawrence Berkeley Lab, United States

Short Abstract: Reconstruction of high quality genome-scale gene regulatory networks (GRNs) remains challenging even when gene expression data are available. We developed an automated method for phylogeny-wide discovery of transcription factor (TF) binding motifs across all bacterial genomes to produce starting points for regulon identification. Our approach, SEFMA (Simultaneous Entire-Family Motif Analysis), is based on the simultaneous analysis of all regulators from a given TF family. SEFMA clusters TFs into orthologous groups and applies de novo motif discovery in the TF upstream regions; thus, the discovered motifs may be represented by species that are evolutionarily very distant. This is in contrast to existing methodologies, which require a collection of sequenced genomes neighboring the genome of interest. Thus, we were able to identify TF binding sites across many microbial genomes. We demonstrate an improvement in discovery of local TF binding sites over current state-of-the-art approaches. We tested SEFMA on experimentally obtained TetR binding sites as well as on binding sites of many TF families from the RegPrecise database. We applied SEFMA to identify binding sites of metal-sensing regulators in Pseudomonas stutzeri and to discover a novel fatty acid metabolism regulon in Gammaproteobacteria.

C02 - Common organizational and regulatory features of the human and mouse transcriptomes

Dmitri Pervouchine, Centre for Genomic Regulation and UPF, Spain

Short Abstract: Recent comparisons of human and mouse genomes revealed significant levels of sequence and organizational similarity. As part of a continuing effort to expand the Encyclopedia of DNA Elements (ENCODE), we have extensively studied the transcriptomes of the two species by RNA-seq using a panel of diverse tissues and cell lines and extended the existing catalogues of orthologous transcriptional elements (long non-coding RNAs, pseudogenes, exons, and splice junctions). We have found that many features associated with RNA production exhibit significant similarity between human and mouse. Specifically, we report strong statistical evidence for conservation of: transcriptional levels at orthologous loci; the inclusion and processing of splice junctions at orthologous sites; and transcription at intergenic regions, antisense transcription, and chimeric transcription. Notably, the degree of conservation of many transcriptional features is independent of the underlying degree of sequence similarity at regulatory regions, as well as of the biological origin of RNA samples. We also found significant associations between the differences in histone modification levels and the differences in gene expression levels at orthologous loci. These results suggest that the epigenetic layer of regulation has contributed to keeping mammalian transcriptional programs robust to sequence changes in regulatory regions, making shared transcriptional features not easily traceable to genomic sequence conservation.

C03 - Prediction of a new protein family involved in sulfur metabolism in hyperthermophilic sulfur-reducing archaea

Yan Zhang, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, China

Short Abstract: Selenium (Se) and sulfur (S) are closely related elements that exhibit similar chemical properties. Some genes related to S metabolism are also involved in Se utilization in many organisms. However, the evolutionary relationship between the two utilization traits is unclear. In this study, we conducted a comparative analysis of the selenophosphate synthetase (SelD) family, a key protein for all known Se utilization traits, in all sequenced archaea. Our search showed a very limited distribution of SelD and Se utilization in this kingdom. Interestingly, a SelD-like protein was detected in two orders of Crenarchaeota: Sulfolobales and Thermoproteales. Sequence and genome context analyses revealed that the SelD-like protein newly evolved from SelD and might be involved in S metabolism in these S-reducing organisms. Further genome-wide analysis of gene occurrence patterns in different Thermoproteales suggested that several genes, including SirA-like, Prx-like and adenylylsulfate reductase, are strongly related to the SelD-like protein. Finally, we propose a simple model wherein the SelD-like protein may play an important role in the biosynthesis of a certain thiophosphate compound. Our data suggest new genes involved in S metabolism in hyperthermophilic S-reducing archaea, and provide a new window for understanding the complex relationship between Se and S metabolism in archaea.

C04 - ProtoBug: Automatic Classification of Insects’ Complete Proteomes

Nadav Rappoport, The Hebrew University of Jerusalem, Israel

Short Abstract: Insects are the most diverse clade among animals. There are 1 million species of insects according to a conservative estimate. At present, 40 insect genomes have already been completely sequenced; many belong to the Drosophilidae family. We collected 300,000 proteins from 17 fully sequenced representatives that capture the diversity of the clade. We included the arthropod Daphnia pulex as an outgroup for the analysis. A classification procedure was applied based on similarity distance measurements among all proteins. Using the hierarchical method of ProtoNet, all the sequences were clustered to produce 20,000 protein families. We took advantage of the completeness of the 18 species to estimate the degree of protein gain and loss along speciation. We found that when comparing the Hymenoptera (i.e., ants, honeybee, wasp) to Diptera (i.e., fruitfly, mosquitoes), the former had a higher rate of protein gain and loss. In addition, we identified 650 families that significantly expanded or shrank in at least one of the species. For a better understanding of the functional relevance of our findings, we associated each sequence with the appropriate Pfam keywords. Each protein family was named according to the dominant annotation of its proteins. We found that the families that changed most during speciation are enriched with DNA binding domains and surface receptors. We conclude that innovation in transcriptional regulation and cell communication functions underlies the diversity of the insect clade. We present ProtoBug as a resource and as an unbiased approach for classification and automatic annotation of insect proteomes.

C05 - MLGO: Maximum likelihood phylogeny reconstruction and ancestral genome inference from Gene-Order Data

Jijun Tang, University of South Carolina, United States

Short Abstract: As more and more whole genomes are sequenced, using gene order data for phylogenetic analyses and ancestral reconstruction is attracting increasing interest. Comparative genomics, evolutionary biology, and cancer research all require tools to elucidate the history and consequences of large-scale genomic changes, such as rearrangements, duplications, and losses. However, using gene-order data has proved far more challenging than using sequence data, as existing methods have problems with scaling and accuracy, as well as difficulties in handling complex events such as indels and duplications.

MLGO (Maximum Likelihood for Gene-Order Analysis) is the first web tool suitable for large-scale genomic analysis that can cope with gene insertions, deletions and duplications. It relies on two methods we have developed: MLWD for phylogenetic reconstruction and PMAG+ for ancestral genome reconstruction. Our tool takes advantage of binary encoding of gene-order data, supports a fairly general model of genomic evolution (rearrangements plus duplications, insertions, and losses of genomic regions), and fits naturally into the maximum likelihood framework.

The results of extensive testing on both simulated and real data show that both MLWD and PMAG+ achieve good performance, scalability and flexibility, suggesting that MLGO is a suitable tool for large-scale analysis of high-resolution data.

MLGO is available at http://www.geneorder.org/server.php
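
The binary encoding of gene-order data mentioned above can be illustrated with a minimal sketch. It assumes one common encoding, presence or absence of each gene adjacency, which the abstract does not confirm as the encoding used by MLWD or PMAG+. The function names and the toy genomes are hypothetical, and the sketch ignores duplicated genes, which MLGO is stated to handle.

```python
def adjacencies(genome):
    """Return the set of adjacencies of a linear chromosome.

    `genome` is a list of signed integers; the sign gives gene orientation.
    Each gene has a tail ("t") and a head ("h"); an adjacency is the
    unordered pair of extremities that touch between neighbouring genes.
    """
    def ends(gene):
        # (left extremity, right extremity) as read along the chromosome
        name = abs(gene)
        return ((name, "t"), (name, "h")) if gene > 0 else ((name, "h"), (name, "t"))

    found = set()
    for left, right in zip(genome, genome[1:]):
        found.add(frozenset((ends(left)[1], ends(right)[0])))
    return found


def binary_encode(genomes):
    """Encode each genome as a 0/1 vector over all observed adjacencies."""
    per_genome = {name: adjacencies(order) for name, order in genomes.items()}
    all_adjacencies = sorted(set().union(*per_genome.values()), key=sorted)
    matrix = {
        name: [1 if adj in adjs else 0 for adj in all_adjacencies]
        for name, adjs in per_genome.items()
    }
    return all_adjacencies, matrix


# Hypothetical toy data: genome B differs from A by an inversion of gene 2.
genomes = {"A": [1, 2, 3], "B": [1, -2, 3]}
columns, matrix = binary_encode(genomes)
```

Under this assumed encoding, the resulting 0/1 matrix has the same shape as a binary character alignment, which is what would let standard likelihood machinery be applied to it.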

C06 - Real-Time Viral Genome Comparison with Hausdorff Distance

Hsin-Hsiung Huang, University of Illinois at Chicago, United States

Short Abstract: Genome comparison plays an important role in classifying and clustering various kinds of organisms, including viruses. While alignment-based methods such as Multiple Sequence Alignment are improving, alignment-free methods remain useful for preliminary studies because of their lower computational cost. Among them, the Natural Vector method has been successfully used for classifying single-segmented viruses in Yu et al., 2013. How to classify multiple-segmented viruses using Natural Vectors remains an open issue. We propose the Hausdorff distance as a measure for comparing different sets of vectors. For example, Influenza A virus has 8 segments, but scientists sometimes recover only some of these segments. If we only found 6 segments from a new Influenza A virus strain, how could we compare it with other viruses? One solution is comparing them segment by segment. However, in order to use full viral genome information, we would like to compare all segments simultaneously. Currently, the consensus tree method is a widely used tool that can combine phylogenetic trees of segments. The correctness rates of our predictions based on cross-validation are as high as 96.5%, 95.4%, 99.7%, and 95.6% for Baltimore class, family, subfamily, and genus, respectively, which are comparable to the rates for single-segmented viruses only. We compare the classification results with other alignment-free methods. We find that the natural graphical representation based on our Hausdorff distance is more reasonable than the consensus tree for the influenza A H7N9 viruses.
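
The comparison of two sets of vectors described above can be sketched with the standard Hausdorff distance. This is a minimal illustration, assuming Euclidean distance between vectors; the two-dimensional vectors stand in for per-segment Natural Vectors and are made up, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directed_hausdorff(set_a, set_b):
    """Largest distance from a vector in set_a to its nearest vector in set_b."""
    return max(min(euclidean(a, b) for b in set_b) for a in set_a)


def hausdorff(set_a, set_b):
    """Symmetric Hausdorff distance between two non-empty sets of vectors."""
    return max(directed_hausdorff(set_a, set_b), directed_hausdorff(set_b, set_a))


# Hypothetical data: one vector per genome segment. The second strain has
# fewer segments, yet the two sets can still be compared directly.
strain_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
strain_y = [(0.0, 0.0), (1.0, 0.0)]
distance = hausdorff(strain_x, strain_y)
```

Because the distance is defined on sets, the two strains need not have the same number of segments, which is the situation the abstract raises for partially sequenced Influenza A strains.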

C07 - A Systematic Approach for the Identification of Gene Losses from Genome Wide Alignments

Michael Hiller, Max Planck Institute - Physics of Complex Systems and Molecular Cell Biology and Genetics, Germany

Short Abstract: Losses of ancestral genes in descendant species are not uncommon, and some of them are associated with altered or completely lost phenotypes in such species. For instance, the loss of the ability to synthesize Vitamin C in certain mammals is attributed to the non-functional Gulo gene in their genomes. Pseudogenization of a gene involves the slow accumulation of gene-inactivating mutations over millions of years and/or deletion of parts of the gene or the entire gene in the descendant species. We have developed a computational pipeline that systematically searches for gene losses across different species. Given a reference species, a list of its functional genes and a genome-wide alignment of the reference species with other species, our pipeline is able to identify the different types of gene-inactivating mutations that occur in other species’ orthologous genes. We strictly control for assembly gaps, low quality genomic sequences, alignment artifacts and changes in gene structures to avoid mistaking such artifacts for inactivating mutations. Our pipeline is able to correctly identify most of the gene-inactivating mutations in well-known gene loss cases like Gulo and Abcb4 while also finding many new, hitherto unreported candidate gene losses. The design of this pipeline is general, so that it can be applied to other gene sets and the corresponding genomic alignments. Therefore, our pipeline will be a valuable tool for the compilation of a gene-loss catalogue for genomes that will be sequenced in the future.
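
Two of the mutation types such a scan looks for, frameshifting indels and premature stop codons, can be sketched on a single pairwise alignment of a coding sequence. This is a simplified illustration, not the pipeline itself: the function names and sequences are hypothetical, and it omits the controls for assembly gaps, sequence quality, alignment artifacts and gene-structure changes that the abstract describes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gap_runs(aligned):
    """Return (start, length) for every run of '-' in an aligned sequence."""
    runs, i = [], 0
    while i < len(aligned):
        if aligned[i] == "-":
            j = i
            while j < len(aligned) and aligned[j] == "-":
                j += 1
            runs.append((i, j - i))
            i = j
        else:
            i += 1
    return runs


def inactivating_mutations(ref_aln, qry_aln):
    """List (event, alignment column) pairs for an aligned coding sequence.

    ref_aln is the reference CDS and qry_aln the orthologous query sequence,
    both gapped with '-' and of equal length.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    # Gaps in the query are deletions; gaps in the reference are insertions.
    for kind, seq in (("deletion", qry_aln), ("insertion", ref_aln)):
        for start, length in gap_runs(seq):
            if length % 3 != 0:
                events.append(("frameshift_" + kind, start))
    # Read the query in the reference frame; skip the final (normal stop) codon.
    ref_cols = [i for i, base in enumerate(ref_aln) if base != "-"]
    for c in range(len(ref_cols) // 3 - 1):
        cols = ref_cols[3 * c: 3 * c + 3]
        codon = "".join(qry_aln[i] for i in cols).upper()
        if codon in STOP_CODONS:
            events.append(("premature_stop", cols[0]))
    return sorted(events, key=lambda event: event[1])


reference = "ATGGCTCAAGGATAA"
with_stop = "ATGGCTTAAGGATAA"   # CAA -> TAA in the third codon
with_shift = "ATGG-TCAAGGATAA"  # 1-bp deletion in the second codon
```

For example, `inactivating_mutations(reference, with_stop)` reports a premature stop at column 6, and `inactivating_mutations(reference, with_shift)` reports a frameshifting deletion at column 4.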

C08 - The predicted protein-protein interactions of Chlamydomonas reinhardtii: an interolog study

Sheng-An Lee, Kainan University, Taiwan

Short Abstract: Chlamydomonas reinhardtii (a green alga) has been intensively studied in recent years due to its biofuel-producing potential. For the first time, we predicted the protein-protein interactions (PPIs) of this important alga using its 313714 protein sequences available in NCBI.
We used an interolog approach for the prediction. Using BLASTP, the protein sequences were searched against the RefSeq sequences of human, Arabidopsis thaliana, mouse, yeast (both Schizosaccharomyces pombe 972h- and Saccharomyces cerevisiae S288c), Caenorhabditis elegans, and Drosophila melanogaster. When two algal proteins were both found to be highly homologous (with E-values smaller than 1e-10) to a pair of interacting proteins in one of the above model organisms, they were assumed to be interacting proteins. The PPI databases for the above model organisms were downloaded from BIOGRID, BIND and HPRD. We obtained 2820184 PPIs (without homodimers) for the green alga. When we placed the proteins of these PPIs into HomoloGene groups, we obtained 3456100 PPIs, of which 2258368 PPIs were identical to the above. Delta-BLAST search results and Gene Ontology data were also used to evaluate the accuracy of the prediction. The PPIs of important genes were visualized and their topological characteristics were analyzed.
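
The interolog rule described above can be sketched as follows. This is a minimal illustration, assuming BLASTP hits are already available as (query, subject, E-value) triples and known interactions as pairs of model-organism protein identifiers. All identifiers and function names are hypothetical; only the 1e-10 threshold and the exclusion of homodimers come from the abstract.

```python
from itertools import product


def homologs_from_hits(hits, max_evalue=1e-10):
    """Map each query protein to the subjects it hits below the E-value cutoff."""
    homologs = {}
    for query, subject, evalue in hits:
        if evalue < max_evalue:
            homologs.setdefault(query, set()).add(subject)
    return homologs


def predict_interologs(homologs, known_ppis):
    """Transfer known interactions to pairs of query proteins.

    Two query proteins are predicted to interact when one is homologous to
    each partner of a known interacting pair. Pairs of a protein with itself
    (homodimers) are skipped.
    """
    by_model = {}
    for query, subjects in homologs.items():
        for subject in subjects:
            by_model.setdefault(subject, set()).add(query)
    predicted = set()
    for p, q in known_ppis:
        for a, b in product(by_model.get(p, ()), by_model.get(q, ())):
            if a != b:
                predicted.add(frozenset((a, b)))
    return predicted


# Hypothetical BLASTP hits and known interactions.
hits = [
    ("cre1", "hsA", 1e-50),
    ("cre2", "hsB", 1e-30),
    ("cre3", "hsA", 1e-12),
    ("cre3", "scC", 1e-20),
    ("cre4", "hsB", 1e-5),   # fails the E-value cutoff
]
known_ppis = [("hsA", "hsB"), ("scC", "scC")]
predicted = predict_interologs(homologs_from_hits(hits), known_ppis)
```

Indexing the homologs by model-organism protein keeps the transfer step proportional to the number of known interactions rather than to all possible pairs of query proteins, which matters at the scale of millions of predicted PPIs.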

C09 - Towards a Universal model of Gene Block Evolution in Bacteria

Iddo Friedberg, Miami University, United States

Short Abstract: Gene blocks are proximal genes in the bacterial genome that operate together. Special known cases of gene blocks are operons, where the genes are typically under the control of an operator and are co-transcribed as polycistronic mRNA. Several models have been proposed to explain observed patterns of gene block evolution. However, there is no universal method to describe gene block evolution, which is necessary for an overarching examination of the forces shaping this ubiquitous genomic feature. Furthermore, there is no known method to infer the ancestral states of observed gene blocks. Here we provide a method that allows for the interrogation of the evolutionary events that are associated with gene blocks, enabling us to infer their evolutionary conservation and relate it to their function. We also combine information from extant gene blocks to provide a detailed history of their evolution across proteobacteria.

C10 - Comparative genomics study of the synonymous codons correlations using an alignment-free ISSCOR method

Dariusz Plewczynski, University of Warsaw, Poland

Short Abstract: Living organisms very often have quite biased preferences for some synonymous codons coding for the same amino acids. These differences and their variation have been extensively studied; however, no decisive governing rules have yet been discovered. Possible factors and forces driving synonymous codon usage postulated so far include, among many others: translational optimization, mRNA structural effects, protein composition and protein structure, gene expression levels, tRNA abundance differences between genomes and tRNA optimization, and different mutation rates and patterns. Other possibilities have also been hypothesized, such as local compositional bias, and even gene lengths might play a role. It is clear that many interesting biological mechanisms underlie the basic phenomenon of genetic code degeneracy. One aspect, however, has not been studied until very recently: the question of possible regularities present in the sequential order of occurrence of synonymous codons (SC).
In an effort to analyze such sequential orders of SCs we have devised a novel in silico method called ISSCOR (Intragenic, Stochastic Synonymous Codon Occurrence Replacement). There were two basic tenets we designed into this approach: (a) that the Monte Carlo shuffling of synonymous codons must preserve the overall codon usage profile of each sequence, and simultaneously (b) that any such shuffling must not change the amino acid order of a gene. Together, (a) and (b) make this technique particularly well suited for assessing possible fluctuations of, and the outcomes from, e.g. the codon bias between genes, groups of genes, or entire genomes.
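
Tenets (a) and (b) can be satisfied together by permuting, within one gene, the codons that encode the same amino acid among the positions where that amino acid occurs. The sketch below illustrates that idea only; it is not the ISSCOR implementation, the function names and the example gene are hypothetical, and it assumes the standard genetic code and an uppercase coding sequence containing only A, C, G and T.

```python
import random
from collections import Counter, defaultdict

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(cds):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def shuffle_synonymous(cds, rng):
    """Randomly permute synonymous codons within one gene.

    Codons are only exchanged between positions that encode the same amino
    acid, so the protein sequence and the gene's codon counts are unchanged.
    """
    codons = split_codons(cds)
    positions = defaultdict(list)
    for index, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(index)
    shuffled = list(codons)
    for indices in positions.values():
        pool = [codons[i] for i in indices]
        rng.shuffle(pool)
        for i, codon in zip(indices, pool):
            shuffled[i] = codon
    return "".join(shuffled)


gene = "ATGCTGTTACTCGCTGCAGCCTAA"  # hypothetical gene encoding M L L L A A A stop
shuffled = shuffle_synonymous(gene, random.Random(0))
```

Repeating the shuffle many times yields a null distribution against which the observed sequential order of synonymous codons can be compared.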

C11 - Role of MORN2 domain in evolution of invasion in Fusobacterium

Abigail McGuire, Broad Institute, United States

Short Abstract: The Fusobacterium genus is a diverse bacterial group representing species implicated in a range of human disorders including periodontal disease, ulcers, Lemierre’s disease, preterm birth, inflammatory bowel disease, and colorectal cancer. Though the virulence mechanisms are not well understood, adherence to and invasion of host cells are key to pathogenicity. Fusobacterium nucleatum actively invades host cells partly through the action of a surface associated adhesin, FadA. While the fadA gene is conserved across other Fusobacterium species capable of active invasion, it is absent from species thought to enter host cells through a passive mechanism, involving synergism with other microbes.

To identify additional genes important in active and passive forms of invasion, we compared the genomes of Fusobacterium species exhibiting differing invasion strategies. We discovered distinct differences in gene content between actively and passively invading Fusobacterium, including differing complements of adhesins, and a striking expansion of MORN2 (Membrane Occupation and Recognition Nexus) domains in Fusobacterium species capable of active invasion. The MORN2 domain represents a previously uncharacterized domain found in predicted extracellular proteins of unknown function. MORN2 genes are distributed throughout the genomes of Fusobacterium species capable of active invasion, but tend to cluster and are often hotspots for genomic variation, varying greatly even among closely related strains. MORN2 genes are also in close proximity to other known virulence-related adhesins, including fadA and radD. Since FadA is an important virulence determinant of F. nucleatum, this suggests a possible association between MORN2 genes and pathogenesis.

C12 - Multi-scale approach to the analysis and presentation of bacterial genome data

Leonid Zaslavsky, National Institutes of Health, United States

Short Abstract: The NCBI bacterial genome collection contains a large number of strains with different levels of sequence and assembly quality and sampling density. We have developed a set of tools for navigation through this complex data. First, the genomes are organized in a hierarchical tree; then, analysis of protein similarity and variability information is performed at different phylogenomic levels. The genome tree is calculated by a distance method using single-copy, universally conserved ribosomal proteins for the distance measure. Clades are defined as groups of related genomes approximately mapping to the species level as defined by NCBI Taxonomy.

In-clade and global protein clusters are built and links are established between related clusters. The in-clade clustering method combines sequence similarity with genome context to separate paralogs. At the global level, each in-clade cluster is compressed to a single protein representative. For computational efficiency, protein redundancy and near-redundancy are eliminated. Links between related clusters are organized and kept as indexes.

We propose and evaluate several heuristics for presentation of the data at different levels of resolution, eliminating noise and data redundancy while preserving necessary variation details.

C13 - Pan-genome analysis of Saccharomyces cerevisiae

Giltae Song, Stanford University, United States

Short Abstract: With the widespread use of next-generation sequencing, it’s now possible to study population genomics by examining variation between genomes to identify evolutionary patterns. However, the traditional comparative approach is limited by the availability of reference annotations. Using genomes from 25 Saccharomyces cerevisiae strains, we developed an automated process for analyzing sequences for core, dispensable and unique genomic features.

For accurate annotations, we combined Maker ab-initio gene predictions (http://www.yandell-lab.org/software/maker.html) and our own homology-based method using LASTZ (http://www.bx.psu.edu/~rsharris/lastz) and AUGUSTUS (http://augustus.gobics.de). Predicted ORFs were refined by checking for alternative start codons and the highly conserved splicing branchpoint consensus of yeast multi-exonic ORFs, and validated with protein match metrics using yeast and fungal protein databases.

To assign strain-specific functional annotations, we identified unique and dispensable genes that were not present in the reference genome. We classified these according to their presence or absence across strains using a statistical clustering algorithm. We characterized each group of genes with the known functional and phenotypic features. Functional roles associated with strains or groups of strains revealed evolutionary patterns that appear consistent with adaptations in specific lineages.

As more S. cerevisiae strain genomes are released, we will be able to use our automated analysis to reveal further details about population genomics. We have developed a new tool set to enhance our understanding of genomic and functional evolution in S. cerevisiae, which will be available to the yeast genetics and molecular biology community.
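
The distinction between core, dispensable and unique genes can be illustrated with a simple presence/absence count. This sketch only shows the three definitions; it does not reproduce the statistical clustering algorithm the abstract mentions, and the gene names, strain names and function name are hypothetical.

```python
def classify_genes(presence):
    """Label each gene by how many strains carry it.

    `presence` maps a gene name to the set of strains in which it was found.
    A gene is "core" if every strain carries it, "unique" if exactly one
    strain does, and "dispensable" otherwise.
    """
    all_strains = set().union(*presence.values())
    classes = {}
    for gene, carriers in presence.items():
        if set(carriers) == all_strains:
            classes[gene] = "core"
        elif len(carriers) == 1:
            classes[gene] = "unique"
        else:
            classes[gene] = "dispensable"
    return classes


# Hypothetical presence/absence data for three strains.
presence = {
    "geneA": {"strain1", "strain2", "strain3"},
    "geneB": {"strain2", "strain3"},
    "geneC": {"strain3"},
}
classes = classify_genes(presence)
```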

C14 - Efficient Analysis of RAD-seq Data of Heterozygous Non-model Species

Olivia Choudhury, University of Notre Dame, United States

Short Abstract: RAD (Restriction-site associated DNA) sequencing has emerged as an efficient and economical technique for discovering SNPs and genotyping in non-model and heterozygous organisms. It is particularly useful in phylogenetics, population genetics, and the development of linkage maps because it can generate thousands of markers at a lower cost than traditional technologies, including whole genome sequencing (WGS). Although genome alignment and variant discovery are prerequisites for several comparative genomics applications, little work has been done on the highly heterogeneous samples typical of field collections. Here, we begin a systematic study to ascertain the most efficient and accurate workflow for RAD data using difficult oak data as a test case. Based on very high depth samples of two red oak parents, we report false positives and negatives using a highly customizable bioinformatics framework available from the Galaxy instance. We also leverage genetic data obtained from over 250 progeny of these parents to validate the accuracy and efficiency of potential approaches. This experiment is made possible by an underlying Makeflow and Work Queue framework that can efficiently parallelize the required tasks and reduce the run time for analyzing a population of 252 oak individuals from 5 weeks to 5 days. We conclude with the characteristics of an efficient RAD workflow that leverages our past work in cloud computing, and identify the specific features required of a variant discovery framework as future work.

C15 - Programmatic access of semantically linked ENCODE data

Venkat Malladi, Stanford University, United States

Short Abstract: The Encyclopedia of DNA Elements (ENCODE) Project is a large collaborative project. Now in its 9th year, ENCODE has grown to include more than 40 experimental techniques in 400+ cell lines and tissues. All experimental data and computational analyses of these data are submitted to the Data Coordination Center (DCC) for storage, and distribution to community resources and the scientific community. In addition, the DCC is currently integrating metadata from the modENCODE and Roadmap Epigenomics (REMC) projects. We have built a semantically enriched database to store the results of these projects, and an interface to search and retrieve metadata and data. Here, we describe the tools to organize, search and access the data.

The ENCODE database is accessible through an interactive web browser and programmatically through an API. The site features faceted browsing of experiments and biosamples, full-text searching of metadata, and direct access to relevant data. The same search functionality is available through a REST API, a standard programmatic interface to the database. To further enhance the analysis that can be performed within and across these projects, the DCC has made use of a structured data model integrated with annotations to multiple ontologies. The implementation of ontologies helps researchers identify experiments within the ENCODE project and across databases (e.g. the EBI Biosamples database: http://www.ebi.ac.uk/biosamples/).

Here we present our implementation and give examples of how researchers can use these tools to gain access to ENCODE. These technologies can be previewed at http://www.encodedcc.org/.

C16 - Conserved transcriptional regulatory modules in mouse, chicken and zebrafish somitogenesis networks.

Bernard Fongang, University of Texas Medical Branch, United States

Short Abstract: The metameric segmentation of the vertebrate body is established during somitogenesis, when a cyclic spatial pattern of gene expression is created within the mesoderm of the developing embryo. Although several studies have been devoted to the subject, the mechanism controlling vertebrate somitogenesis is still poorly understood. A recent study shows that the generally accepted “clock and wavefront” model involving Notch, Fgf and Wnt signaling pathways is partially redundant and some of its components may be nonessential. To identify the essential genes and interactions in the somitogenesis network, we combined two approaches. First, we use model-based timing of transcriptional regulation to precisely identify the moments in time when particular genes are active. This method, based on maximum entropy deconvolution, allows us to select potential causal dependencies within the underlying genetic, signaling and transcriptional networks. Second, we use a maximum-likelihood approach to compare results of such analyses from three different vertebrate species: mouse, chick and zebrafish. As a result, we obtained a list of causal interactions in the somitogenesis network that are evolutionarily conserved and are thus the primary candidates for the most essential, core modules of the regulatory network. We expect that our results will lead to the identification of regulatory interactions between somitogenesis and other conserved developmental processes, including morphogenesis and regulation of genes in Hox clusters.

C17 - Integrative and Comparative Analyses of Blood Transcriptomic Signatures in Disease: A Cell-Centric View

Naisha Shah, National Institutes of Health, United States

Short Abstract: It is increasingly clear that the immune system and inflammation contribute not only to the pathogenesis of autoimmune and infectious disease, but also to a host of pathologies, including cancer, diabetes, neurodegeneration and other chronic illnesses. Thus, a more comprehensive and comparative characterization of immune signatures of diseases could lead to a better understanding of etiology, biomarker identification, and ultimately, disease prevention and treatment. Blood transcriptomic profiling is a powerful approach to assess the status of the immune system, and it has been widely applied to examine diseases. Due to the myriad immune-cell subsets in blood, changes in transcript abundance could reflect alterations in the composition of cell populations, transcriptional changes within cell subsets, or both. While flow cytometry could be used for assessing cell population abundances, such data is often unavailable. Thus, interpretations of blood profiling results tend to focus on the genomics level and much less on a cellular perspective.

Here, by leveraging natural variations within a cohort of healthy subjects where blood transcriptomics and 100+ immune-cell subset abundances were measured simultaneously, we derived machine-learning models for predicting cell frequencies using gene-expression information alone. Our approach involves cross-platform normalization of gene-expression data as well as training and cross-validation of Elastic Net models. We next applied these models to predict and compare immune-cell subset alterations across 112 diseases, including lupus, sepsis and autism. We identified a number of immune-cell associations, including ones shared across multiple diseases. Our approach can be applied in other settings to dissect the cellular origin of transcriptomic signatures.

C18 - Genome informatics platform of lepidopteran insects

Masaaki Kotera, Tokyo Institute of Technology, Japan

Short Abstract: Researchers have studied a great many insect specimens, but few of these studies have been carried out in cooperation with genome biology. Linking a wide variety of insect genomic information to the knowledge accumulated through long-term insect research is essential to advance that research and to benefit society. This study aims to develop a cross-search database for the vast number of insect genes, covering public complete genomes, draft genomes, cDNA libraries, RNA-seq data, and cloned genes, together with novel sequences obtained using next-generation sequencers.
Considering insect taxonomy as a whole, the species with complete genomes are limited to a small number of specific groups, and the sequence information for other species is scattered across the Internet. It is not yet possible to cross-search insect genes. To better understand the biodiversity of insects, it is beneficial to enable cross-search of the vast number of insect genes even when they do not come from complete genome sequences. We also intend to add other related information such as the taxonomic classification of insects, their feeding habits and their symbionts, with the aim of developing a valuable tool for converting entomological knowledge into genomic-level understanding.

C19 - A Visual Spreadsheet using HTML5 for Whole Genome Display

Nada Alhirabi, Concordia University, Canada & King Saud University, Saudi Arabia, Canada

Short Abstract: By Nada Alhirabi and Greg Butler.

Modern sequencing technology has enabled the cheap, rapid production of whole genomes. There is a need for tools for the visualization of the data collected about a whole genome such as genes, proteins, annotations, and expression data. Two common approaches are the genome browser where sequence features are displayed as visual elements in tracks and features are aligned with their genome coordinates, and the spreadsheet where each row captures the information about a gene and the information is textual in nature, such as identifiers, descriptions, or sequences.

We have developed CGene, pronounced See-Gene, as a HTML5 web-based spreadsheet that can incorporate visual displays, as well as text, within the spreadsheet cells. Current displays use Scalable Vector Graphics (SVG) to present gene models and protein domain architectures in the spreadsheet. These are generated from standard GFF3 files and standard output files from InterProScan.

The spreadsheet displays one row for each gene in the given genome. The user can select rows of the spreadsheet in order to create a dataset. The user can also configure the display of the various datatypes, such as the shape, size, and colour to be used for specific protein domains.

Our research group studies fungal genomes, so CGene is tested by displaying each of the Aspergilli genomes in the AspGD database (www.aspgd.org).

C20 - Analysis of protein coevolution in different species provides complementary perspectives of protein functional interactions

David A Juan Sopeña, Spanish National Cancer Research Centre, Spain

Short Abstract: Recent developments in coevolution-based methods have improved their predictive performance by efficiently controlling for the effect of indirect spurious evolutionary similarities. These advances have made it possible, for the first time, to detect high-quality coevolutionary constraints that are excellent predictors of different levels of molecular interactions [1-3]. Specifically, we have previously shown that protein-protein coevolution can provide high-quality predictions of protein functional associations [2].

We have performed a comparative analysis of species-centered protein coevolutionary networks detected for about 20 different bacterial species. We obtained species-focused protein coevolutionary networks based on specifically selected sets of evolutionarily related species. This approach aims to detect protein coevolution implicated in the evolutionary challenges faced by specific sets of related species.

Our results show that our coevolutionary networks provide accurate predictions of protein functional interactions for very different species. Interestingly, we observed that species-centered networks point to very different functional associations. We have explored how this information can help us to understand species-specific associations among different functional processes. Finally, integration of these complementary species-specific networks provides a more complete view of pan-bacterial protein functional relationships.

1 Juan D, Pazos F, Valencia A. 2013. Emerging methods in protein co-evolution. Nat Rev Genet 14:249–261.
2 Juan D, Pazos F, Valencia A. 2008. High-confidence prediction of global interactomes based on genome-wide coevolutionary networks. Proc Natl Acad Sci USA 105:934–939.
3 Weigt M, White RA, Szurmant H, et al. 2009. Identification of direct residue contacts in protein-protein interaction by message passing. Proc Natl Acad Sci USA 106:67–72.

C21 - Transcriptome Profiling of Rattus norvegicus Embryonic Stem Cells by RNA-sequencing

Elizabeth Bryda, University of Missouri, United States

Short Abstract: The first Rattus norvegicus (rat) embryonic stem cells (rESCs) were isolated in 2008, and they promise to become an important tool for producing genetically engineered rat models for biomedical research. Despite their usefulness, little characterization of rESCs has been done and the transcriptome has not been defined. Deep RNA sequencing (RNA-seq) analysis was performed on mRNA from DAc8, the first male germline-competent rat ESC line to be described and the first to be used to generate a knockout rat model. Furthermore, Homo sapiens and Mus musculus ESC transcriptomes were determined to gain insight into ESC expression patterns across species. The Avadis and Tuxedo pipelines were used to quantify all three species’ transcriptomes. Using orthologues of all three species, ToppFun was used to determine significant gene ontology (GO) terms and pathways. Oct4 expression has been shown to be essential for maintaining the ESC state. Using an experimentally determined mESC Oct4 interaction network, all three species’ transcriptomes were compared. To attempt to understand the species’ differences in the Oct4 interaction network, DOMMINO was used to explore protein binding differences. The gene expression profile of these rESCs was determined, novel isoforms were identified, and expression profiles for human, mouse, and rat ESCs were compared. In summary, mouse and human ESCs expressed ~50% more transcripts than rESCs, and several genes associated with maintaining the ESC state were not expressed in rESCs.

C22 - GenSpec, genome-based species identification for archaea

Chengsheng Zhu, Rutgers University, United States

Short Abstract: Identifying the precise species-level taxonomic placement of a given microorganism has always been difficult. Classification is often accomplished via comparison to known organisms, on the basis of shared morphology, physiology, and molecular biomarker similarity. While recent developments in high-throughput sequencing facilitate whole-genome comparison for higher-resolution taxonomic assignment, this approach is computationally expensive. Thus, research has instead focused on selecting sets of taxonomy-representative genes/phylogenetic markers. Here, using a novel data-driven approach, we identify 70 gene/protein families that are sufficient to train GenSpec, a machine-learning model that accurately assigns archaeal species. We additionally apply feature selection to reduce the number of selected gene families without impacting classifier performance. Both the full and the reduced sets of genes/proteins cover a wide range of functions, including DNA replication/repair, transcription/translation, protein processing, and central metabolism (cofactor, carbon, cell wall/membrane, etc.). Our work thus identifies new archaeal molecular biomarkers and offers a novel way to select genes/functions that hold critical information for any taxonomy assignment.
