Oral Poster Presentations
ISMB/ECCB 2011 introduces oral poster presentations for a select group of outstanding posters.
Works deemed exceptional in original review were passed on to three other reviewers each for further consideration.
24 posters were selected for oral presentations (8-minute talk) on the first day of the conference.
These Oral Poster Presentations will also be considered for "best poster" awards.
OPT01 Sunday, July 17: 10:45 a.m. - 11:10 a.m. ALISSA - An Automated Live Cell Imaging System for Signal Transduction Analyses
Room: Hall E1
Presenting author: Heinrich Huber, Royal College of Surgeons in Ireland, Ireland
Heiko Dussmann, RCSI, Ireland
Perrine Paul, Institut Curie, France
Dimitris Kalamatianos, NUIM, Ireland
Endl Martina, Siemens Austria, Ireland
Jakub Wenus, NUIM, Ireland
Peter Wellstead, NUIM, Ireland
Jochen Prehn, RCSI, Ireland
Probe photobleaching and the specimen’s sensitivity to phototoxicity severely limit the frequency and duration of exposures during time-lapse fluorescence microscopy experiments. Consequently, when a study of cellular processes requires measurements over hours or days, temporal resolution is limited, and spontaneous or rapid events may be missed. We have developed ALISSA, an automated live-cell imaging system for signal-transduction analysis. It allows imaging modalities and laser resources to be adapted to the biological process, and extends temporal resolution from minutes to seconds. ALISSA employs online image analysis to detect cellular events, which are then used to control the microscope. Our system can be integrated into standard fluorescence microscopes and applied to a large range of single-cell experiments by using a graphical language.
OPT02 Sunday, July 17: 10:45 a.m. - 11:10 a.m. GenomeRNAi: A Phenotype Database for Large-scale RNAi Screens
Room: Hall E1
Presenting author: Esther Schmidt, German Cancer Research Center, Germany
RNA interference (RNAi) is a powerful method to generate loss-of-function phenotypes. It enables systematic genetic screens for every annotated gene in the genome. Genome-wide RNAi screens have been performed in human, mouse, Drosophila and C. elegans, and many RNAi libraries have become available to study phenotypes on a large scale.
With more and more datasets of RNAi-induced phenotypes becoming available, the systematic integration of functional information remains an important task. RNAi screens are performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation. Large-scale in vivo RNAi screens are now emerging, representing a new challenge in data representation and annotation, due to the complex nature of in vivo assays and phenotypes.
Currently, the GenomeRNAi database provides access to cell-based phenotypes in Drosophila and human cells. It holds 97600 entries for human and 99700 for Drosophila phenotypes extracted from 136 published RNAi screens. It also contains sequence and efficiency data for 300000 and 118000 RNAi reagents for human and Drosophila, respectively. The database can be searched by phenotype, gene or RNAi probe and is publicly accessible at http://www.genomernai.org.
Our aim is comprehensive coverage of all published genome-wide RNAi screening data in GenomeRNAi, including large-scale data from Drosophila in vivo screens. We have developed an annotation guide to ensure consistent data collection. Work to adapt it for in vivo data is ongoing. We will report on curation progress and present an update on new features on the GenomeRNAi website.
OPT03 Sunday, July 17: 10:45 a.m. - 11:10 a.m. Annotare – a tool for annotating high-throughput biomedical investigations and resulting data
Room: Hall E1
Presenting author: Tony Burdett, European Bioinformatics Institute, United Kingdom
Ravi Shankar, Stanford University, United States
Helen Parkinson, European Bioinformatics Institute, United Kingdom
Tony Burdett, European Bioinformatics Institute, United Kingdom
Emma Hastings, European Bioinformatics Institute, United Kingdom
Alvis Brazma, European Bioinformatics Institute, United Kingdom
Junmin Liu, IBM, United States
Michael Miller, Institute for Systems Biology, United States
Chris Stoeckart, University of Pennsylvania School of Medicine, United States
Rashmi Srinivasa, 5am Solutions, United States
Joseph White, Dana Farber Cancer Institute, United States
Gavin Sherlock, Stanford University School of Medicine, United States
Catherine Ball, Stanford University School of Medicine, United States
Metadata describing high-throughput investigations enable unambiguous interpretation of experiments, experiment reproducibility, and meaningful searching and analysis of the resulting data. The microarray community has developed MAGE-TAB, an annotation format for microarray data acquisition and communication, and several public repositories of microarray data support microarray data submissions with MAGE-TAB annotations. In order to improve the volume, quality, and granularity of annotations, there is a compelling need for software that enables biologists to easily annotate such data. We have developed Annotare, a software tool that facilitates annotation of gene expression data in MAGE-TAB format. Using the tool, bench biologists can describe their investigations with the investigators’ contact details, experimental design, protocols that were employed and references to publications, and can also report on the relationships between biological materials, arrays, and experimental data produced in the investigation. Annotare features 1) a set of intuitive graphical user interface forms to create and modify annotations, 2) access to different biomedical ontologies that can be used as controlled vocabularies in the annotations, 3) a set of standard annotation templates indexed by different investigation elements, 4) a design wizard that solicits investigation details from the user and creates partial annotations that the user can then complete, and 5) a MAGE-TAB validator that checks for syntactic and semantic violations in the annotations. Annotare is a collaborative open-source software development effort involving different institutions. A version of the tool is available from Annotare’s project website: http://code.google.com/p/annotare/
OPT04 Sunday, July 17: 11:15 a.m. - 11:40 a.m. Mapping the spread of HIV in Germany
Room: Hall E1
Presenting author: Glenn Lawyer, Max Planck Institute for Informatics, Germany
The Resina Project tracks the spread of HIV-1 genotypes in German patients. To date, 35 participating clinics have gathered over 2700 genotypes, along with demographics including patient risk group and region of residence.
Genotypic similarity between viral samples can be converted into a graph linking related infections. This allows graph theoretical investigations of the data, such as measuring transmissions between various risk groups, the degree of interconnectivity within each group, and the structure and size of connected components. Additionally, the geographical information can be used to condense the graph to connections between regions. Measures on the condensed graph are also of interest, and the geographical structure allows the graph to be plotted over a map of Germany.
A conservative similarity threshold led to the following observations. Connected components in the full graph were mostly pairs or triplets. Only 5% of transmission chains had more than 10 patients. The homosexual risk group continues to be at the center of the epidemic, with all risk groups showing many connections to this group. This was surprising, given that two of the risk groups (intravenous drug users; people from endemic regions) were not expected to have strong overlap with the homosexual community. The reduced graph showed several interesting patterns. For example, two distinct networks of non-B HIV were found in the intravenous drug user risk group.
Graphical representation of genetic relationships reveals deep structure in the spread of HIV, both through mathematical analysis and visual display.
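The graph construction described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the Resina Project: the distance measure (fraction of mismatched positions between aligned sequences), the threshold value, the toy sequences, and the region labels. The sketch links samples whose distance falls under the threshold, extracts connected components, and condenses the graph to counts of links between regions.

```python
from collections import Counter
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def similarity_graph(samples, max_distance):
    """Return an edge for every pair of samples at distance <= max_distance."""
    edges = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(samples.items(), 2):
        if hamming_fraction(seq_a, seq_b) <= max_distance:
            edges.append((id_a, id_b))
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components using a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def condense_by_region(edges, region_of):
    """Count links per unordered region pair; (r, r) holds within-region links."""
    counts = Counter()
    for a, b in edges:
        counts[tuple(sorted((region_of[a], region_of[b])))] += 1
    return counts


# Hypothetical toy data: sample id -> aligned sequence, sample id -> region.
samples = {
    "p1": "ACGTACGT",
    "p2": "ACGTACGA",
    "p3": "TTTTCCCC",
    "p4": "TTTTCCCG",
    "p5": "GGGGGGGG",
}
region_of = {"p1": "Berlin", "p2": "Hamburg", "p3": "Berlin", "p4": "Berlin", "p5": "Munich"}

edges = similarity_graph(samples, max_distance=0.125)
components = connected_components(samples, edges)
region_links = condense_by_region(edges, region_of)
```

A stricter (more conservative) threshold removes edges, so components shrink toward pairs and singletons, which is the kind of effect the observations above depend on.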
OPT05 Sunday, July 17: 11:15 a.m. - 11:40 a.m. Evolutionary meta-analysis reveals ancient constraints affecting missing heritability and reproducibility in disease association studies
Room: Hall E1
Presenting author: Joel Dudley, Stanford University, United States
Rong Chen, Stanford University, United States
Atul Butte, Stanford University, United States
Sudhir Kumar, Arizona State University, United States
Genome-wide disease association studies contrast genetic variation between disease cohorts and healthy populations to discover single nucleotide polymorphisms (SNPs) and other genetic markers revealing underlying genetic architectures of human diseases. Despite many large efforts over the past decade, these studies have yet to identify many reproducible genetic variants that explain significant proportions of the heritable risk of common human diseases. Here, we report results from a multi-species comparative genomic meta-analysis of 6,720 risk variants for more than 420 disease phenotypes reported in 1,814 studies, which is aimed at investigating the role of evolutionary histories of genomic positions in the discovery, reproducibility, and missing heritability of disease-associated SNPs (dSNPs) identified in association studies. We show that dSNPs are disproportionately discovered at conserved genomic loci in both coding and non-coding regions, as the effect size (odds ratio) of dSNPs relates strongly to the evolutionary conservation of their genomic positions. Our findings indicate that association studies are biased towards discovering rare variants, because strongly conserved positions only permit minor alleles with the lowest frequencies. Using published data from a large case-control study, we demonstrate that the use of a straightforward multi-species evolutionary prior improves the power of association statistics to discover SNPs with reproducible genetic disease associations. Therefore, long-term evolutionary histories of genomic positions are poised to play a key role in reassessing data from existing disease association studies and in the design and analysis of future studies aimed at revealing the genetic basis of common human diseases.
OPT06 Sunday, July 17: 11:15 a.m. - 11:40 a.m. Practical NGS Analysis: A Bioinformatician’s Perspective
Room: Hall E1
Presenting author: Abhishek Pratap, University of Maryland, Baltimore County, USA; DOE-Joint Genome Institute, United States
The motivation for this poster comes from the questions I see across bioinformatics mailing lists on how to handle NGS (Next Generation Sequencing) data. I plan to present various analysis methods and techniques that I have used or developed over the last three years or so for analyzing data from Illumina-class sequencers.
The focus remains on asking biologically pertinent questions of the raw data efficiently and in a timely manner and, more importantly, on making sure the conclusions make biological sense. I plan to address the following questions via this poster and engage in exciting discussions with my peers at ISMB.
1. Data quality conversions: Why should you convert Illumina QScores to SangerQ?
2. Error corrections: Dynamic vs. static trims
3. Can I sequence more from the same library?: Evaluating library complexity
4. Alignment: Which aligner is right for your data and question, and why?
5. Exome capture: Knowing you captured the right region
6. RNA-Seq: Issues with read count and expression analysis
7. SNP calling: Specificity vs. sensitivity
8. How to identify disease-causing mutations with cases and controls?
9. Data exchange formats: How should I store and transfer my NGS data?
Many of the techniques shown will use existing open source tools. My goal is to review those tools and describe how I threaded different pipelines together to answer the questions that principal investigators asked of the data.
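As a small illustration of the first question in the list (quality score conversion), the sketch below re-encodes a quality string from the Phred+64 ASCII offset to the Phred+33 offset used by Sanger-style FASTQ. It assumes the input uses the Phred+64 encoding of the Illumina 1.3 to 1.7 pipelines; the poster does not say which encoding or tools the author uses, and the function name is hypothetical. Older Solexa-scaled scores need a different formula and are not handled here.

```python
ILLUMINA_OFFSET = 64  # assumed input encoding: Phred+64 (Illumina 1.3 to 1.7)
SANGER_OFFSET = 33    # Sanger-style FASTQ encoding: Phred+33


def illumina64_to_sanger(quality):
    """Re-encode a Phred+64 quality string as a Phred+33 quality string."""
    converted = []
    for ch in quality:
        score = ord(ch) - ILLUMINA_OFFSET
        if score < 0:
            # A character below the offset means the input is not Phred+64.
            raise ValueError(f"character {ch!r} is not valid Phred+64 quality")
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)


example = illumina64_to_sanger("hhB")  # Phred scores 40, 40, 2
```

The conversion matters because most downstream tools assume Phred+33; reading Phred+64 strings as Phred+33 inflates every score by 31.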
OPT07 Sunday, July 17: 11:45 a.m. - 12:10 p.m. A Novel Bioinformatics Pipeline for Identification and Characterization of Fusion Transcripts in 31 Breast Cancer and Normal Cell Lines
Room: Hall E1
Presenting author: Yan Asmann, Mayo Clinic, United States
Asif Hossain, Mayo Clinic, United States
Brian Necela, Mayo Clinic, United States
Sumit Middha, Mayo Clinic, United States
Krishna Kalari, Mayo Clinic, United States
Zhifu Sun, Mayo Clinic, United States
High-Seng Chai, Mayo Clinic, United States
David Williamson, Illumina, United States
Derek Radisky, Mayo Clinic, United States
Gary Schroth, Illumina, United States
Jean-Pierre Kocher, Mayo Clinic, United States
Edith Perez, Mayo Clinic, United States
Aubrey Thompson, Mayo Clinic, United States
SnowShoes-FD, developed for fusion transcript detection in paired-end mRNA-Seq data, employs multiple steps of false positive filtering to nominate fusion transcripts with near 100% confidence. Unique features include: (i) identification of multiple fusion isoforms from two gene partners; (ii) prediction of genomic rearrangements; (iii) identification of exon fusion boundaries; (iv) generation of a 5’ to 3’ fusion spanning sequence for PCR validation; (v) prediction of the protein sequences, including frameshifts and amino acid insertions. We applied SnowShoes-FD to identify 50 fusion candidates in 22 breast cancer and 9 non-transformed cell lines. Five additional fusion candidates with two isoforms were confirmed. Thirty out of 55 fusion candidates had in-frame protein products. No fusion transcripts were detected in non-transformed cells. Consideration of the possible functions of a subset of predicted fusion proteins suggests several potentially important functions in transformation, including a possible new mechanism for overexpression of ERBB2 in a HER positive cell line. The source code of SnowShoes-FD is provided in two formats: one configured to run on the Sun Grid Engine for parallelization, and the other formatted to run on a single Linux node. Executables in Perl are available for download from our website: http://mayoresearch.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.
OPT08 (Withdrawn) Sunday, July 17: 11:45 a.m. - 12:10 p.m. Discovering Novel Patterns of DNA Methylation Landscape of Human Cancer Genome
Room: Hall E1
Presenting author: G Reshmi, Rajiv Gandhi Centre for Biotechnology, India
Priyanka James, Rajiv Gandhi Centre for Biotechnology, India
M. Radhakrishna Pillai, Rajiv Gandhi Centre for Biotechnology, India
Differential DNA methylation is an essential epigenetic signal for gene regulation, development, and disease processes. Aberrant methylation of CpG islands in promoter regions is associated with tumor suppressor gene silencing in cancer cells. It is not known why some CpG islands are susceptible to aberrant methylation during tumor progression while others are not. To address this question, we describe a computational pattern recognition method that predicts the methylation landscape of the human cancer genome. We also explore sequence-based rules derived from cancer cell methylation data as a way to predict the pattern of aberrant methylation genome-wide in cancer. To gain more insight into how such epigenetic mechanisms work in human cells, we apply the best-performing classifier, the support vector machine. These studies have identified consensus sequences, proximity to repetitive elements and chromosomal location as potential factors influencing the likelihood of a CpG island becoming aberrantly methylated. Our results also show that some gene promoters are highly co-methylated, providing evidence that genes interact strongly at the epigenetic level in human cells. Our data shed new light on the nature of methylation patterns in human cells, the sequence dependence of DNA methylation, and its function as an epigenetic signal in gene regulation.
OPT09 Sunday, July 17: 11:45 a.m. - 12:10 p.m. CNAmet: an R package for integrating copy number, methylation and expression data
Room: Hall E1
Presenting author: Riku Louhimo, University of Helsinki, Finland
Sampsa Hautaniemi, University of Helsinki, Finland
Genomic instability is a key enabling characteristic of tumorigenesis. Identification of regions with copy number alterations has uncovered deregulated tumor suppressors and oncogenes that play key roles in tumor progression and drug response. In addition to copy number changes, gene expression is regulated by DNA methylation. Gene copy number, methylation and expression can all be measured with microarrays, which enables their computational integration. The goal of integrating copy number and expression data is to characterize genes essential to cancer progression. This is achieved by identifying genes that are both amplified and upregulated or deleted and downregulated. However, gene upregulation can also be caused by hypomethylation (decrease in methylation of cytosine and adenosine residues in DNA) and downregulation by hypermethylation. Since both copy number and methylation alterations can deregulate genes, integration of these three types of data should improve the characterization of essential genes in cancer. We introduce the CNAmet R package, which integrates high-throughput copy number, methylation and expression data. Our primary goal is to identify genes that are simultaneously amplified, hypomethylated and upregulated, or deleted, hypermethylated and downregulated. To our knowledge, CNAmet is the only software package for the three-way integration of copy number, methylation and expression. We illustrate the usefulness of CNAmet by analyzing copy number, methylation and expression data from 174 patients with glioblastoma multiforme, which is the most aggressive type of brain cancer, as well as 196 ovarian cancer patients.
OPT10 Sunday, July 17: 12:15 p.m. - 12:40 p.m. Penalized regression elucidates aberration hotspots mediating subtype-specific transcriptional responses in breast cancer
Room: Hall E1
Presenting author: Yinyin Yuan, Cambridge Research Institute, United Kingdom
Yinyin Yuan, Cancer Research UK, United Kingdom
Oscar Rueda, Cancer Research UK, United Kingdom
Christina Curtis, Cancer Research UK, United Kingdom
Florian Markowetz, Cancer Research UK, United Kingdom
Copy-number alterations (CNAs) associated with cancer are known to contribute to genomic instability and gene deregulation. Integrating copy-number data with gene expression helps to elucidate the mechanisms by which CNAs act and to identify the transcriptional downstream targets of copy-number changes. Such analyses can help to sort functional driver events from the many accompanying passenger alterations. However, the way CNAs affect gene expression can vary between different cellular contexts, for example between healthy tissue and tumour or between different subtypes of the same cancer. Thus it is important to develop computational approaches capable of inferring differential connectivity of regulatory networks in different cellular contexts. We propose a statistical deregulation model that integrates copy-number and expression data of different disease subtypes to jointly model common and differential regulatory relationships. Our model not only identifies copy-number alterations driving gene expression changes, but at the same time also predicts differences in regulation that distinguish one cancer subtype from the other. We implement our model in a penalized regression framework and demonstrate in a simulation study the feasibility and accuracy of our approach. On a real breast cancer dataset we show that we can identify both known and novel aspects of pathway deregulation in ER positive versus negative disease as well as crosstalk between pathways. Availability: The Bioconductor-compliant R package DANCE is available from www.markowetzlab.org/software/
OPT11 Sunday, July 17: 12:15 p.m. - 12:40 p.m. The functional importance and detection of regulatory sequence variants
Room: Hall E1
Presenting author: Virginie Bernard, University of British Columbia, Canada
Wyeth Wasserman, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Canada
David Arenillas, Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Canada
The convergence of high-throughput technologies for sequencing individual exomes and genomes and rapid advances in genome annotation are driving a neo-revolution in human genetics. This wave of family-based genetics analysis is revealing causal mutations responsible for striking phenotypes. By mapping the reads to the human genome reference and by searching for variations relative to the reference, a list of small nucleotide variations and structural variations is obtained. Analysis is required to reveal those variations most likely to contribute to a disease phenotype within a family. Existing software scores the severity of changes that arise in protein-encoding exons. However, most mutations within a family are situated in the 98% of the genome that controls the developmental and physiological profile of gene activity - the sequences that control when and where a gene will be active.
Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. With full genome sequencing becoming accessible to medical researchers, the need to identify potential causal mutations in regulatory DNA is becoming imperative. We are implementing a software system to enable genetics researchers to characterize regulatory DNA changes within individual genome sequences. We are combining reference databases of known regulatory elements, experimental archives of protein-DNA interactions and computational predictions within an integrated analysis package. With our software, researchers will have greater capacity to identify variations potentially causal for disease.
The poster introduces the challenges and approaches of regulatory sequence variation analysis.
OPT12 Sunday, July 17: 12:15 p.m. - 12:40 p.m. SNPs&GO: predicting the deleterious effect of human mutations using functional annotation.
Room: Hall E1
Presenting author: Emidio Capriotti, Stanford University, United States
Piero Fariselli, University of Bologna, Italy
Pier Luigi Martelli, University of Bologna, Italy
Rita Casadio, University of Bologna, Italy
High-throughput data from large-scale sequencing and genotyping techniques make it possible to analyze a huge amount of genetic variation across the whole human genome. Single Nucleotide Polymorphisms (SNPs), which are the main cause of human genome variability, can also be involved in the onset of many diseases. In particular, missense SNPs, occurring in coding regions and causing single amino acid polymorphisms (SAPs), can affect protein function and lead to genetic pathologies.
In this work, we present SNPs&GO (Calabrese et al., Human Mutation 2009), a new web server for the prediction of deleterious SAPs using protein functional annotation. We implemented two different SVM-based methods relying either on protein sequence or structure information. Both algorithms have been extensively tested on a large set of mutations extracted from the SwissVar database (Mottaz et al., Bioinformatics 2010). On a balanced dataset of SAPs, the sequence-based approach reaches 81% overall accuracy, a correlation coefficient of 0.63 and an area under the receiver operating characteristic curve (AUC) of 0.89. For the subset of mutations that can be mapped on protein structures known with atomic resolution (at the Protein Data Bank), the structure-based method results in 85% overall accuracy, a correlation coefficient of 0.70, and an AUC of 0.92. In conclusion, SNPs&GO is a valuable tool that combines, in a single framework, information derived from protein sequence, structure, evolutionary profile, and protein function. In a recent study (Thusberg et al., Human Mutation 2011), SNPs&GO was scored as one of the best algorithms for prediction of deleterious SAPs.
OPT13 Sunday, July 17: 2:30 p.m. - 2:55 p.m. Large-scale protein flexibility analysis of single nucleotide polymorphisms, using molecular dynamics simulations.
Room: Hall E1
Presenting author: Marc Offman, Technical University of Munich, Germany
Burkhard Rost, Technical University of Munich, Germany
Proteins are intrinsically flexible molecules, and function is often associated with flexibility. Experimental methods to determine protein flexibility are expensive and often time-consuming. Over the past few years, an efficient complementary method, molecular dynamics (MD) simulation, has increasingly proved to be a powerful tool for obtaining information on protein dynamics. We have recently proven that careful biology-driven MD simulations can be used to predict the impact of single amino acid mutations on protein flexibility and function, at a level of accuracy comparable to experimental techniques. The question remains whether it will be possible to fully automate this process in the context of a large-scale analysis, and to what extent additional structural information, beyond that derived by sequence analysis of single nucleotide polymorphisms (SNPs) only, is useful. For this we created several comprehensive datasets of non-synonymous SNPs mapped to high-resolution, above-average-quality crystal structures from the PDB. In the context of the European SCALALIFE project, up to 28,000 different mutations found in 1,600 individual crystal structures are simulated in duplicate for 10 ns each, using the GROMACS package. A comprehensive analysis pipeline has been established, investigating protein flexibility and stability, alteration of hydrogen-bond networks, active site integrity, changes in global and local energy and other structural effects. This pipeline has previously been successfully applied in the context of clinically relevant proteins. The results of this study, the automatic protocol and the set of analysis tools will help in the future to understand individual phenotypes in clinical contexts.
OPT14 Sunday, July 17: 2:30 p.m. - 2:55 p.m. A method to separate function and fold specific residues in a protein.
Room: Hall E1
Presenting author: Mohd Rehan, Jawaharlal Nehru University, India
Andrew Lynn, Jawaharlal Nehru University, India
According to the neutral theory of molecular evolution, once a protein has evolved to a useful level of functionality, the majority of mutations are selectively neutral at the molecular level and do not affect the function and folding of the protein, whereas deleterious mutations provide selection pressure for residue conservation. Thus, conservation indicates the importance of a residue for the structure and function of the protein. Often, mutations at a conserved site after gene duplication lead to functional divergence. The residues at these sites are called function-specific residues; when mutated, they change the protein's function. Within a class of proteins, sequences can be further grouped into subtypes that contain subtype-conserved residues indicative of functional differences among the subtypes. Here we present a method for finding function-specific residues and fold-specific residues in a protein family pre-classified into subtypes based on function. Our method is based on RE (Relative Entropy) and a newly derived MI (Mutual Information) score. RE measures conservation relative to the background distribution of amino acids, whereas the derived MI captures differential conservation among subtypes. The newly derived MI and the RE are combined into a new score, MIRE, to rank the residues for fold and function. The methodology is implemented using HMMs, validated on AGC protein kinases and G-protein-coupled receptors (GPCRs), and compared with other existing methods. The kinases have different peptide substrates but a common protein kinase function, whereas GPCRs have a similar scaffold with different substrate binding sites.
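To make the RE component concrete, the following minimal sketch computes the relative entropy (Kullback-Leibler divergence) of one alignment column against a background amino acid distribution. It is an illustration only: the uniform background, the pseudocount, and the function name are assumptions, and the abstract's newly derived MI score and the combined MIRE score are not specified in enough detail to reproduce, so they are not attempted here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_entropy(column, background=None, pseudocount=1.0):
    """Relative entropy (in bits) of an alignment column versus a background.

    column: string of residues observed at one alignment position
            (gaps and unknown characters are ignored).
    background: dict of amino acid -> probability; uniform if omitted
                (an assumption made for this sketch).
    """
    residues = [r for r in column if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no standard amino acids")
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(residues)
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    score = 0.0
    for aa in AMINO_ACIDS:
        # The pseudocount keeps every probability positive, so log2 is defined.
        p = (counts[aa] + pseudocount) / total
        score += p * math.log2(p / background[aa])
    return score


conserved = relative_entropy("LLLLLLLL")  # fully conserved column
variable = relative_entropy("LIVMFAGS")   # eight different residues
flat = relative_entropy(AMINO_ACIDS)      # matches the uniform background
```

A fully conserved column scores higher than a variable one, and a column whose composition equals the background scores zero, which is the behaviour a conservation measure of this kind is expected to show.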
OPT15 Sunday, July 17: 2:30 p.m. - 2:55 p.m. Structural Biology Meets Systems Biology: A Structural Systems Biology Approach for Gauging the Systemic Effect of Single Nucleotide Polymorphisms
Room: Hall E1
Presenting author: Tammy Cheng, Cancer Research UK, United Kingdom
Linda Jeffery, Cancer Research UK London Research Institute, United Kingdom
Lucas Goehring, Max Planck Institute, Germany
Yu-En Lu, University of Cambridge, United Kingdom
Jacqueline Hayles, Cancer Research UK London Research Institute, United Kingdom
Bela Novak, University of Oxford, United Kingdom
Paul Bates, Cancer Research UK London Research Institute, United Kingdom
Small changes in protein structure, such as non-synonymous single nucleotide polymorphisms, can have a large impact on cellular behaviour. To understand how a change at the protein structure level eventually affects a cell's phenotypic outcome is, however, not trivial. This is because complex, multi-scale, information needs to be considered in order to obtain analytical results with physiological meanings. To tackle this issue, we have developed a structural systems biology approach, PEPP (Phenotype Extrapolation via Pathway and Protein information), to effectively integrate protein structural information with pathway dynamics.
Here we demonstrate the new method by studying point mutations in two biological systems: (1) regulation of the G2 to mitosis transition in Schizosaccharomyces pombe (fission yeast) and (2) the human ERK pathway. In the first system, we used mitosis-promoting factor as a reporter protein and studied 11 mutations that result in varying extents of growth before yeast cells enter mitosis. In the second system, we used ERK as a downstream reference to study 40 mutations that lead to phenotypically overlapping symptoms under the broad term 'neuro-cardio-facial-cutaneous syndrome'. By combining information at both the pathway and protein structure levels, we are able to gauge the phenotypic effect of the mutations: in the yeast model, PEPP reflects the general trend of cell sizes measured experimentally; in the human model, PEPP clusters the point mutations into subgroups that reflect their clinical symptoms.
OPT16 Sunday, July 17: 3:00 p.m. - 3:25 p.m. Network-based gene prioritization from expression data by diffusing through protein interaction networks
Room: Hall E1
Presenting author: Daniela Nitsch, KU Leuven, Belgium
Léon-Charles Tranchevent, KU Leuven, Belgium
Joana Gonçalves, INESC-ID, Portugal
Yves Moreau, KU Leuven, Belgium
Discovering novel disease genes is challenging for diseases for which no prior knowledge is available. Genetic studies frequently result in large lists of candidate genes, of which only a few can be followed up for further investigation. In the past couple of years, several gene prioritization methods have been proposed. Most of them use a guilt-by-association concept and are therefore not applicable when little is known about the phenotype or no disease genes are available.
We have proposed a method that overcomes this limitation by replacing prior knowledge about the biological process with experimental data on differential gene expression between affected and healthy individuals. At the core of the method are a protein interaction network and disease-specific expression data. Our approach propagates the expression data over the network using an extended Random Walk approach based on kernel methods, as the inclusion of indirect associations compensates for network sparsity and small-world effects. It relies on the assumption that strong candidate genes tend to be surrounded by many differentially expressed neighboring genes in a protein interaction network.
We have benchmarked our approach, and results showed that it clearly outperforms other gene prioritization approaches with an average ranking position of 8 out of 100 genes, and an AUC value of 92.3%.
Recently, we have developed the web server PINTA implementing our gene prioritization approach to make it available for clinicians and other researchers.
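The propagation idea in this abstract can be illustrated with a minimal sketch. It uses a plain random walk with restart, which is a simplification and not the authors' extended kernel-based method; the function name, the restart value, and the toy network are all assumptions made for illustration.

```python
import numpy as np

def propagate_scores(adjacency, scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Spread per-gene scores over a network by random walk with restart.

    adjacency: symmetric 0/1 matrix of a (hypothetical) interaction network.
    scores:    non-negative differential-expression scores, not all zero.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    scores = np.asarray(scores, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid dividing by zero for isolated nodes
    transition = adjacency / col_sums      # column-normalized transition matrix
    p0 = scores / scores.sum()             # restart distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy network with edges 0-1, 0-2, 2-3, 3-4; genes 1 and 2 are differentially expressed.
toy_adjacency = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
toy_scores = [0, 1, 1, 0, 0]
ranking = propagate_scores(toy_adjacency, toy_scores)
```

In this toy case, candidate gene 0 sits next to both differentially expressed genes and therefore receives a higher propagated score than the distant gene 4, which mirrors the assumption stated in the abstract.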
OPT17 Sunday, July 17: 3:00 p.m. - 3:25 p.m. Protein interaction partners revealed by their dynamical properties
Room: Hall E1
Presenting author: Jau-Ji Lin, National Chiao Tung University, Taiwan
Jenn-Kang Hwang, National Chiao Tung University, Taiwan
The biological processes in cells are carried out by various kinds of protein molecules that perform their functions by interacting with other proteins. Identifying the interacting partners of proteins is crucial to constructing the biological pathways in a cell. We recently developed a method called WCN (Lin et al., 2008) to calculate a protein's dynamical properties directly and efficiently from its folded structure. In this work, we analyze the dynamical properties of interacting proteins. We find that these proteins display a “complementary” relationship in their profiles. We therefore analyze the dynamical profiles of all protein structures in the PDB and try to elucidate the relationships between them. The dynamical profiles reveal that proteins do have a specific “disposition” to interact with certain kinds of protein molecules. By inspecting the dynamical profiles of each pair of protein structures in the PDB, we can construct a network whose nodes and edges represent the proteins and the interactions between them. The resulting network is composed of several subnetworks, and most of them agree with the available protein-protein interaction data. In addition, we use CELLO (Yu et al., 2006) to predict the subcellular localization of the proteins proposed as interacting partners and find that most of them share the same subcellular localization. This finding confirms the spatial requirement for proteins to interact with each other. This study constitutes a novel methodology for constructing protein-protein interaction networks and provides a promising way to identify interacting partners of proteins of unknown function from structural information alone.
OPT18 Sunday, July 17: 3:00 p.m. - 3:25 p.m. Detection of nonlinear effects in gene expression pathways
Room: Hall E1
Presenting author: Andreas Mayr, Johannes Kepler University Linz, Austria
Djork-Arné Clevert, Johannes Kepler University Linz, Austria
Sepp Hochreiter, Johannes Kepler University Linz, Austria
One of the main topics in systems biology is the modelling of genetic pathways. Genes of a pathway whose expression values show linear dependencies are easy to identify as belonging to the pathway. However, if feedback loops or signal cascades are present, the expression values of pathway genes can depend nonlinearly on the expression values of other genes in the pathway. In this situation, such genes are hard to detect as belonging to the pathway because nonlinearity and noise must be distinguished.
We propose an algorithm to infer nonlinear network elements in pathways from microarray data. Our model assumes that the gene expression values belonging to one pathway are mainly driven by a single latent factor. We expect two groups of genes in a pathway: genes in the first group depend linearly on the hidden factor, while genes in the other group show a nonlinear dependence on the latent variable. The goal is to identify the kind of dependence on the hidden factor.
Our algorithm for detecting nonlinear effects is an extension of linear Gaussian factor analysis. Nonlinearities are modelled by the square of the latent variable weighted by specific coefficients. We derived a novel model selection method for this generalization of factor analysis. To avoid the interpretation of noise as nonlinearity, we determine p-values that measure the probability of a linear gene being detected by chance as nonlinear.
We apply our algorithm to microarray data of breast cancer samples, where we identified nonlinear dependencies of gene expression values in the p53 pathway.
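One way to write down the model described in this abstract is given below. The notation is ours rather than the authors': the symbols $\lambda_i$, $\beta_i$ and $\psi_i$, the standard-normal latent factor, and the Gaussian noise term are assumptions borrowed from ordinary linear Gaussian factor analysis.

```latex
x_i = \lambda_i z + \beta_i z^2 + \epsilon_i,
\qquad z \sim \mathcal{N}(0, 1),
\qquad \epsilon_i \sim \mathcal{N}(0, \psi_i)
```

Under this reading, $x_i$ is the expression value of gene $i$, $z$ is the single latent factor, and a gene with $\beta_i = 0$ belongs to the linear group.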
OPT19 Sunday, July 17: 3:30 p.m. - 3:55 p.m. Systematically mapping the druggable pathways of Saccharomyces cerevisiae
Room: Hall E1
Presenting author: Kristen Fortney, University of Toronto, Canada
Wing Xie, University of Toronto, Canada
Max Kotlyar, University of Toronto, Canada
Igor Jurisica, University of Toronto, Canada
Drug modes of action are complex and still poorly understood. The set of known drug targets is widely acknowledged to be biased and incomplete, and so gives only limited insight into the system-wide effects of drugs. But a high-throughput assay unique to yeast – barcode-based chemogenomic screens – can measure the individual drug response of every yeast deletion mutant in parallel.
We integrate the four largest S. cerevisiae chemogenomic experiments, which together comprise the responses of thousands of gene knockout strains to over 500 drug treatments, and develop a data-mining approach to investigate drug effects at the systems level. We apply our method to identify yeast pathways, functions, and phenotypes that are targeted by particular drugs. To demonstrate relevance to human disease, we collect groups of disease-associated human genes, map them to their homologs in yeast and apply our method; we recover drugs already prescribed for those diseases and propose several new drug candidates. We also develop methods for modeling the set of all significant pathway-drug connections as bipartite interaction networks. Our analyses of the structure of these networks reveal that while most pathways are targeted by few drugs, some are extremely druggable. Finally, we build YEDr, the YEast Drug database, a searchable interface to our data, methods, and results. Users can query YEDr with new gene groups (yeast, mouse, or human) and YEDr will retrieve the drugs that target them. Human targets are integrated with GeneCards, I2D, mirDIP and other resources.
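The observation that most pathways are targeted by few drugs while some are extremely druggable corresponds to the degree of each pathway node in a bipartite pathway-drug network. The following minimal sketch shows that count; the function name and the pathway and drug labels are hypothetical placeholders, not data from YEDr.

```python
from collections import defaultdict

def pathway_degrees(connections):
    """Count the distinct drugs linked to each pathway in a bipartite edge list."""
    drugs_by_pathway = defaultdict(set)
    for pathway, drug in connections:
        drugs_by_pathway[pathway].add(drug)
    return {pathway: len(drugs) for pathway, drugs in drugs_by_pathway.items()}

# Hypothetical pathway-drug connections (placeholder names only).
toy_connections = [
    ("pathway_A", "drug_1"),
    ("pathway_A", "drug_2"),
    ("pathway_A", "drug_3"),
    ("pathway_B", "drug_1"),
    ("pathway_B", "drug_1"),  # duplicate edges are counted once
]
degrees = pathway_degrees(toy_connections)
```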
OPT20 Sunday, July 17: 3:30 p.m. - 3:55 p.m. Gene expression evolution on the emergence of pathogenicity in Ascomycetes
Room: Hall E1
Presenting author: Aminael Sanchez-Rodriguez, Catholic University of Leuven, Belgium
Kristof Engelen, Catholic University of Leuven, Belgium
Riet De Smet, Ghent University, Belgium
Qiang Fu, Catholic University of Leuven, Belgium
Yan Wu, Catholic University of Leuven, Belgium
Kathleen Marchal, Catholic University of Leuven, Belgium
The Ascomycetes form the largest phylum in the fungal kingdom. They are of special interest due to their broad spectrum of lifestyles, which includes both plant and human pathogens. In a previous study we showed that most of the protein-coding genes needed for pathogenicity were already present in an ancestor common to both pathogenic and non-pathogenic Ascomycetes. Based on this finding that at least some of the coding potential related to pathogenicity is common to both pathogens and non-pathogens, we studied whether alterations in expression behavior could be correlated with a pathogenic versus a non-pathogenic lifestyle. Using coexpression networks derived from large-scale expression compendia for the non-pathogen N. crassa and the pathogens M. grisea and F. graminearum, we compared the coexpression behavior of true orthologs from gene families related to pathogenicity and occurring in each of the species. We found that the direct neighbours of the respective orthologs in the coexpression network of the pathogen were largely different from their neighbours in the non-pathogenic species, implying a considerable rewiring of the coexpression network of these common orthologs. In addition, we observed that the expression evolution of paralogs in gene families that are common to both pathogens and non-pathogens is different. More specifically, we detected that the expression behavior of paralogs belonging to gene families involved in pathogen-host interaction tends to evolve faster in pathogens than in non-pathogens.
OPT21 Sunday, July 17: 3:30 p.m. - 3:55 p.m. The Darwinian Tree Of Life In Light Of Horizontal Gene Transfer (Is Still Sound)
Room: Hall E1
Presenting author: Adam Sardar, University of Bristol, United Kingdom
Horizontal gene transfer (HGT) is the process whereby nucleic acids are transmitted from the genome of one species to another without inheritance by descent (i.e. parent to child). Some people believe that there is no prokaryotic tree of life, but in fact a ‘thicket’ of shared genetic material. It is therefore extremely important to understand HGT, as it diverges so strongly from assumptions made in some phylogenetic analyses, such as a direct ancestry of homologous proteins. Previous studies have focused on trying to predict precisely which genes have been horizontally transferred and where, based on multiple sequence alignments and searches for a ‘footprint’ or biochemical signal peculiar to the parent genome. This is very difficult, and the signal is quickly lost with evolutionary time. Rather than attempting to predict every horizontal transfer event, we propose taking a global view. We look for proteins composed of sets of structural domains (domain architectures, as identified in the SUPERFAMILY database) that show disparity in their deletion rates across the tree of life, such that an overwhelmingly improbable series of events would be required to explain their current distribution in observed genomes. This does not provide concrete identification of individual HGT events, but creates an enriched set of possible contenders, acting as an indicator of HGT across all organisms.
Our research suggests an upper bound of 30% for the percentage of domain architectures in bacteria involved in HGT, suggesting that bacteria do indeed possess an underlying Darwinian tree.
OPT22 Sunday, July 17: 4:00 p.m. - 4:25 p.m. Quality or Quantity of Structural Dynamics Information: That is the Question! (when improving performance of structure-based function prediction methods)
Room: Hall E1
Presenting author: Dariya Glazer, Stanford University, United States
Grace Tang, Stanford University, United States
Vijay Pande, Stanford University, United States
Russ Altman, Stanford University, United States
Previously, we showed that incorporating structural dynamics information from molecular simulations improved performance of structure-based function prediction methods. This raised new intriguing questions. Does the choice of force field, the length of the simulation trajectories, or the presence of a ligand in the simulation systems have any effect? Are there any meaningful descriptors of the true positive results that can help discriminate those from the false positive ones?
To this end, we scaled up our efforts in molecular dynamics simulations with Folding@Home distributed computing. Using 5 force fields (Amber '96, '99sb, '03, Gromos53a6, and OPLS-AA), we generated 14 trajectories of 10 nanoseconds each (a total of 140 nanoseconds), with and without calcium ions present, for 11 HOLO/APO pairs of calcium-binding structures. The resulting structural ensembles were evaluated for calcium binding sites using FEATURE.
Our results indicate that different force fields explore somewhat different conformational space of the molecules with respect to the sensitivities of FEATURE. Longer simulation trajectories and inclusion of calcium ions in the simulation systems did not yield a greater improvement in FEATURE’s performance. We propose 6 descriptors of sites identified by FEATURE that can be intelligently combined by machine learning algorithms in order to maximize true positive and minimize false positive results.
Future efforts in structure-based function prediction, especially identifying calcium binding sites, could benefit from applying function prediction methods to structural ensembles of the molecules of interest obtained from several copies of short-scale molecular dynamics simulations employing at least two general force fields.
OPT23 Sunday, July 17: 4:00 p.m. - 4:25 p.m. Nearest-Neighbor Approaches to Predict Protein Function by Sequence Homology Alone
Room: Hall E1
Presenting author: Tobias Hamp, TU Muenchen, Germany
Rebecca Kassner, TU Muenchen, Germany
Stefan Seemayer, TU Muenchen, Germany
Esmeralda Vicedo, TU Muenchen, Germany
Christian Schaefer, TU Muenchen, Germany
Dominik Achten, TU Muenchen, Germany
Florian Auer, TU Muenchen, Germany
Ariane Boehm, TU Muenchen, Germany
Tatjana Braun, TU Muenchen, Germany
Maximilian Hecht, TU Muenchen, Germany
Mark Heron, TU Muenchen, Germany
Peter Hoenigschmid, TU Muenchen, Germany
Thomas Hopf, TU Muenchen, Germany
Stefanie Kaufmann, TU Muenchen, Germany
Michael Kiening, TU Muenchen, Germany
Denis Krompass, TU Muenchen, Germany
Cedric Landerer, TU Muenchen, Germany
Yannick Mahlich, TU Muenchen, Germany
Manfred Roos, TU Muenchen, Germany
Burkhard Rost, TU Muenchen, Germany
This year's Automated Function Prediction SIG meeting at ISMB/ECCB 2011 features, for the first time, the so-called CAFA challenge.
Participants were asked to computationally predict and submit Gene Ontology (GO) terms for over 40,000 targets that so far lack functional annotation.
In this context, we have developed three different Nearest-Neighbor-based methods to derive the functions of a protein by sequence homology alone. Although conceptually the same, they differ considerably in their technical details, ranging from an emphasis on superior ad-hoc performance to sophisticated and well-defined scoring schemes that allow either recall or precision to be steered into the desired reliability range.
All three methods were applied to all of the CAFA targets mentioned above and are, in a way, meant to act as controls for the expected wealth of more sophisticated function prediction machinery at the CAFA meeting.
In addition, we have created a meta-classifier to combine the three approaches and established new ways to generally capture the performance of function prediction tools with GO terms. The latter now enable us to not only estimate precision and recall as described in the official CAFA rules, but also to use the shapes of the GO subgraphs of a target protein to create an assessment that considers its biologically distinct functions.
Our own pre-CAFA evaluation of our methods with 10,000 random targets from the SwissProt database already showed remarkable performance of up to 87% precision at 84% recall using the official standard protein-centric measure, greatly outperforming a class of random prediction models.
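The two ideas in this abstract, annotation transfer from the nearest neighbor and a protein-centric evaluation, can be sketched as follows. This is an illustrative reading only: it is not one of the three submitted methods, the averaging scheme is one common interpretation of a protein-centric measure rather than the official CAFA definition, and all function names, protein names, scores, and term labels are hypothetical.

```python
def transfer_annotations(hits, annotations):
    """Copy GO terms from the best-scoring annotated hit (1-nearest neighbor).

    hits:        dict mapping hit name -> similarity score (higher is better).
    annotations: dict mapping annotated protein name -> set of GO terms.
    """
    annotated = [(score, name) for name, score in hits.items() if name in annotations]
    if not annotated:
        return set()
    _, best = max(annotated)
    return set(annotations[best])

def protein_centric_scores(predicted, truth):
    """Average per-protein precision and recall over proteins with known terms."""
    precisions, recalls = [], []
    for protein, true_terms in truth.items():
        if not true_terms:
            continue  # skip proteins without reference annotation
        pred_terms = predicted.get(protein, set())
        overlap = len(pred_terms & true_terms)
        if pred_terms:  # precision is only defined when something was predicted
            precisions.append(overlap / len(pred_terms))
        recalls.append(overlap / len(true_terms))
    precision = sum(precisions) / len(precisions) if precisions else 0.0
    recall = sum(recalls) / len(recalls) if recalls else 0.0
    return precision, recall

# Hypothetical search hits for one target and hypothetical annotations.
toy_hits = {"Q1": 50.0, "Q2": 120.0, "Q3": 300.0}
toy_annotations = {"Q1": {"term_a"}, "Q2": {"term_b", "term_c"}}
transferred = transfer_annotations(toy_hits, toy_annotations)

toy_predicted = {"P1": {"term_a", "term_b"}, "P2": {"term_c"}}
toy_truth = {"P1": {"term_a"}, "P2": {"term_c", "term_d"}}
precision, recall = protein_centric_scores(toy_predicted, toy_truth)
```

In the toy data, the highest-scoring hit Q3 has no annotation, so the terms of Q2 are transferred instead.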
OPT24 Sunday, July 17: 4:00 p.m. - 4:25 p.m. Graph-based queries of Semantic-Web integrated biological data
Room: Hall E1
Presenting author: Marco Moretto, Edmund Mach Foundation, Italy
Alessandro Cestaro, Edmund Mach Foundation, Italy
Riccardo Velasco, Edmund Mach Foundation, Italy
In the post-genomic era, life science researchers are faced with the need to manage and inspect a growing abundance of data and information. Data from different sources, both public and proprietary, have the most value when considered in the context of each other, as together they give more information. In order to answer questions that span multiple fields in the biology domain without an integrated approach, a biologist needs to visit all data sources related to the problem and find relevant data. In recent years we have witnessed a growing interest in Semantic Web technologies to integrate and query biological data. Semantic Web technologies were designed to meet the challenges of reducing the complexity of combining data from multiple sources, resolving the lack of widely accepted standards, and managing highly distributed and mutable resources. However, standard Semantic Web technologies do not provide any tools to query integrated knowledge bases from a graph perspective, that is, by defining graph traversal patterns. For example, it is not possible to ask a query like "are enzyme A and compound B related?" without knowing the complete structure of the knowledge base. After exploring different alternatives, we settled on using a graph traversal programming language on top of a triplestore in order to perform several path traversal queries on an integrated graph. We tested the feasibility of the approach by integrating the UniProt, Gene Ontology, ChEBI and KEGG resources and posing queries of varying complexity.
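The question "are enzyme A and compound B related?" amounts to asking whether any path connects the two nodes in the integrated graph. The sketch below illustrates this with a breadth-first search over an in-memory list of triples. It is a plain Python illustration, not the graph traversal language or triplestore used by the authors, and the function name, entity names, and predicates are hypothetical.

```python
from collections import defaultdict, deque

def find_path(triples, start, goal):
    """Return a shortest chain of nodes linking start to goal, or None.

    triples: iterable of (subject, predicate, object); edges are followed
    in both directions, so no knowledge of the schema is required.
    """
    neighbors = defaultdict(set)
    for subject, _predicate, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    came_from = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk back to the start node
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors[node]:
            if nxt not in came_from:
                came_from[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical integrated knowledge base (placeholder names only).
toy_triples = [
    ("enzyme_A", "catalyzes", "reaction_R"),
    ("reaction_R", "has_product", "compound_B"),
    ("enzyme_C", "annotated_with", "term_T"),
]
path = find_path(toy_triples, "enzyme_A", "compound_B")
```

Here the query succeeds through the intermediate node reaction_R, whereas enzyme_A and term_T are not connected and the function returns None.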