ISMB/ECCB 2011 Posters

19th Annual International Conference on
Intelligent Systems for Molecular Biology and
10th European Conference on Computational Biology

Accepted Posters

Category 'A' - 'Bioinformatics of Health and Disease'

Poster A01

Clustering disease connections using DMDM: Domain Mapping of Disease Mutations

Maricel Kann UMBC

Short Abstract: Domain mapping of disease mutations (DMDM) is a database in which each disease mutation can be displayed by its gene, protein or domain location. By aggregating disease mutations and polymorphisms from all proteins containing a given domain, DMDM’s unique domain view highlights molecular relationships between different diseases that might not be observed with traditional gene-centric visualization tools. To examine the relationships between diseases connected at varying levels of specificity (e.g., between Parkinson disease and Severe Combined Immunodeficiency, or more generally between neurological and immunological disorders), we are now developing methods for clustering diseases based on their protein domain associations. To benchmark these methods, we performed an initial clustering of disease names. We are currently evaluating methods to cluster diseases by their phenotype similarity in order to group diseases with similar molecular mechanisms and to assess our domain-based cluster methodology. Our website, DMDM, is available at http://bioinf.umbc.edu/dmdm and can be used to identify protein domain sites with high incidence of disease mutations.

Poster A02

Bayesian Networks to Decision Making in Lymph Node Metastasis of Colorectal Cancer

Md. Aminul Hoque Niigata University of Medical & Dental Hospital

Toshifumi Wakai (Niigata University of Medical & Dental Hospital, Division of Digestive and General Surgery); Kohei Akazawa (Niigata University of Medical & Dental Hospital, Medical Informatics);

Short Abstract: Background: The quality of medical care has always been a key issue for both practitioners and patients and the highest standards and practice guidelines are expected in all fields of medicine. The diagnosis of cancer metastasis is often difficult because of atypical clinical histories, clinical signs, and the results of laboratory tests. Recently, the Bayesian network (BN) model has been used in a variety of research fields for decision support.
Methods: This study was conducted on patients with colorectal cancer at Niigata University Hospital in Japan. A total of 111 patients with colorectal cancer (62 men and 49 women) were analyzed. The optimal structure of the BN model was determined using the R package deal, together with the Bayesian score and expert knowledge.
Results: Over the course of illness, only 12 of the 111 patients (10.81%) were diagnosed with regional lymph node metastasis. The network structure showed a complex relationship among the graph nodes, and the metastasis node is directly connected to five other nodes. The conditional probabilities of bump (27.6% to 75.8%), permeation of lymphatic vessels (41.8% to 97.3%), degree of differentiation SM layer (34.7% to 81.9%), and blood vessel invasion (39.5% to 69.0%) changed remarkably when the patient was found to have nodal disease. These results indicate that these variables are directly influenced by regional lymph node metastasis.
Conclusion: We constructed a BN model for the diagnosis of cancer metastasis and found that the BN model provided the best prognostic prediction of colorectal cancer in clinical practice.

Poster A03

Possibilities and limitations for the prediction of disease related mutations in the human kinome

Jose Izarzugaza CNIO - Spanish National Cancer Research Institute

Angela del Pozo (CNIO - Spanish National Cancer Research Institute, Structural Biology and Biocomputing); Alfonso Valencia (CNIO - Spanish National Cancer Research Institute, Structural Biology and Biocomputing);

Short Abstract: Human Protein Kinases are involved in a wide variety of physiological functions. Most of the many mutations described in this protein family are tolerated without significant disruption of their structure or function. Interestingly, a number of them are associated with human diseases, including cancer, and deserve particular attention.
Here we present the basis for the development of a computational method for the prediction of the impact of mutations on the function of protein kinases. The study was carried out on a set of 3492 well-characterized disease and neutral kinase mutations extracted from Uniprot.
We explored the significance of disease-associated kinase mutations in terms of sequence-derived characteristics at different levels, including: a) at the gene level, the membership to a Kinbase group and Gene Ontology terms; b) at the domain level, the occurrence of the mutation inside a PFAM domain; and c) at the residue level, several properties including amino acid types, functional annotations from Swissprot and FireDB, specificity-determining positions, etc. We analyzed the independent significance of these properties and their combination with a Support Vector Machine (SVM). Interestingly, the family-specific features appear among the most discriminative information sources, which justifies the development of a kinase-specific predictor.
Our study aims to broaden the knowledge of the mechanisms by which mutations in the human kinome contribute to disease, with a particular focus on cancer. In addition, we discuss the benefits and pitfalls of using the information available for the development of a kinase-specific predictor with regard to other current prediction methods.

Poster A04

A large scale analysis in the human proteome detects correlation among disease associated mutations and perturbation of protein stability

Rita Casadio University of Bologna

Valentina Indio (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & “Giorgio Prodi” Center (CIRC)); Pier Luigi Martelli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network); Marco Vassura (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science ); Piero Fariselli (University of Bologna, Laboratory of Biocomputing, Computational Biology Network & Department of Computer Science );

Short Abstract: Technological advancements constantly increase the number of mutations that need annotation in translated regions of the human genome. Single residue mutations in proteins are known to affect protein stability and function. As a consequence they can be disease associated. Available computational methods starting from protein sequence/structure predict whether residue mutations are conducive to disease or alternatively to instability of the protein folded structure. However, the relationship between stability changes in proteins and their involvement in human diseases still needs to be established. Here we try to rationalize in a nutshell the complexity of the question by generalizing over information already stored in public databases. For this we derive for each Single Amino Acid Polymorphism (SAP) type the probability of being disease-related (Pd) and compute from thermodynamic data three indexes indicating the probability that it is conducive to decreasing (P-), increasing (P+) and perturbing the protein structure stability (Pp). Statistically validated analysis of the different P/Pd correlations indicates that Pd best correlates with Pp. Pp/Pd correlation values are as high as 0.49, and increase up to 0.67 when data variability is taken into consideration. This is indicative of a medium/good correlation between Pd and Pp and corroborates the assumption that protein stability changes can be associated with disease at the proteome level.
All the probabilities are listed in a feature table that can be used to label SAPs as frequently or less frequently associated with disease or protein perturbation in current databases.
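The Pd/Pp comparison described above amounts to a correlation computed across SAP types. The following is a minimal sketch, assuming a plain Pearson coefficient (the abstract does not name the statistic) and using made-up probabilities for a few hypothetical SAP types; none of the numbers come from the poster.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature table: SAP type -> (Pd, Pp). Values are invented
# for illustration only and are NOT taken from the poster.
feature_table = {
    "A->V": (0.20, 0.35),
    "G->R": (0.80, 0.70),
    "L->P": (0.75, 0.85),
    "S->T": (0.15, 0.40),
    "C->Y": (0.90, 0.60),
}

pd_values = [pd for pd, _ in feature_table.values()]
pp_values = [pp for _, pp in feature_table.values()]
print(f"Pd/Pp correlation: {pearson(pd_values, pp_values):.2f}")
```

With real data, the same function could be applied to Pd against P- and P+ as well, to compare which stability index tracks disease association most closely.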

Poster A05

SNPs&GO: predicting the deleterious effect of human mutations using functional annotation.

Emidio Capriotti Stanford University

Piero Fariselli (University of Bologna, Department of Computer Science); Pier Luigi Martelli (University of Bologna, Department of Biology); Rita Casadio (University of Bologna, Department of Biology);

Short Abstract: High-throughput data from large-scale sequencing and genotyping techniques make it possible to analyze a huge amount of genetic variation across the whole human genome. Single Nucleotide Polymorphisms (SNPs), which are the main cause of human genome variability, can also be involved in the onset of many diseases. In particular, missense SNPs, occurring in coding regions and causing single amino acid polymorphisms (SAPs), can affect protein function and lead to genetic pathologies.
In this work, we present SNPs&GO (Calabrese et al., Human Mutation 2009), a new web server for the prediction of deleterious SAPs using protein functional annotation. We implemented two different SVM-based methods relying either on protein sequence or structure information. Both algorithms have been extensively tested on a large set of mutations extracted from the SwissVar database (Mottaz et al., Bioinformatics 2010). On a balanced dataset of SAPs, the sequence-based approach reaches 81% overall accuracy, a correlation coefficient of 0.63 and an area under the receiver operating characteristic curve (AUC) of 0.89. For the subset of mutations that can be mapped onto protein structures known at atomic resolution (in the Protein Data Bank), the structure-based method results in 85% overall accuracy, a correlation coefficient of 0.70, and an AUC of 0.92. In conclusion, SNPs&GO is a valuable tool that combines in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. In a recent study (Thusberg et al., Human Mutation 2011), SNPs&GO was scored as one of the best algorithms for prediction of deleterious SAPs.
Availability: http://snps.uib.es/snps-and-go
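The accuracy and correlation figures quoted above are the kind of summary obtained from a binary confusion matrix. As a sketch only, and assuming that "correlation coefficient" refers to the Matthews correlation coefficient (an assumption, since the abstract does not say so), the two measures can be computed as follows. The counts are hypothetical and do not reproduce the poster's results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts on a balanced set of 100 SAPs (50 deleterious, 50 neutral).
tp, tn, fp, fn = 40, 40, 10, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.2f}")
```

On a balanced dataset, accuracy and the correlation coefficient move together; reporting both matters more when the classes are unbalanced.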

Poster A06

The functional importance and detection of regulatory sequence variants

Virginie Bernard Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute

Wyeth Wasserman (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia); David Arenillas (Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics - University of British Columbia);

Short Abstract: The convergence of high-throughput technologies for sequencing individual exomes and genomes and rapid advances in genome annotation are driving a neo-revolution in human genetics. This wave of family-based genetics analysis is revealing causal mutations responsible for striking phenotypes. By mapping the reads to the human genome reference and by searching for variations relative to the reference, a list of small nucleotide variations and structural variations is obtained. Analysis is required to reveal those variations most likely to contribute to a disease phenotype within a family. Existing software scores the severity of changes that arise in protein encoding exons. However, most mutations within a family are situated in the 98% of the genome that controls the developmental and physiological profile of gene activity - the sequences that control when and where a gene will be active.

Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. With full genome sequencing becoming accessible to medical researchers, the need to identify potential causal mutations in regulatory DNA is becoming imperative. We are implementing a software system to enable genetics researchers to characterize regulatory DNA changes within individual genome sequences. We are combining reference databases of known regulatory elements, experimental archives of protein-DNA interactions and computational predictions within an integrated analysis package. With our software, researchers will have greater capacity to identify variations potentially causal for disease.

The poster introduces the challenges and approaches of regulatory sequence variation analysis.

Poster A07

Uncovering Network Control Mechanisms in Induced Oncogenic Transformation of Human Mammary Epithelial Cells through Whole Transcriptome Analyses

Chen Rubinstein Bar Ilan University

Sol Efroni (Bar Ilan University); Rotem Ben-Hamo (Bar Ilan University, The Mina and Everard Goodman Faculty of Life Sciences); Helit Cohen (Bar Ilan University, The Mina and Everard Goodman Faculty of Life Sciences);

Short Abstract: Breast cancer affects one in eight women during their lives, and is the second leading cause of cancer-related deaths among women in the USA. Malignant transformation is a complex multistep process, in which genetic, environmental, and dietary factors together alter critical cell growth regulatory pathways, resulting in uncontrolled proliferation, which is a hallmark of tumorigenesis. A multistep in vitro model established by Weinberg and colleagues involves serial introduction of defined genetic elements to achieve full tumorigenic transformation of human mammary epithelial cells. Using this in vitro model to dissect the tumorigenic process into step-wise controlled processes allows us to identify transcriptional alterations at defined “molecular time points” during the transformation process. To explore mRNA abundance in this cell transformation system we used high-throughput RNA sequencing, specifically, the SOLiD system. This method allows an accurate broad view of gene expression by sequencing the whole transcriptome. Using computational methods we were able to find evidence for transcriptome-wide modifications and their correlations with induced molecular modification. We were also able to quantify significant changes in unexpected pathways that are the result of this induced transformation. The analysis suggests a specific tumorigenesis mechanism that controls the oncogenic transformation via specific genes. This study may provide a basis for the identification of novel mechanisms involved in breast cancer and potentially important targets for diagnosis, prognosis and therapy.

Poster A08

Exact and heuristic algorithms for analysis of genome-wide association studies

Peng Sun Max-Planck-Institut für Informatik

Jiong Guo (Saarland University, Computer Science); Jan Baumbach (Max-Planck-Institut für Informatik, Computational System Biology);

Short Abstract: Genome-wide association studies (GWAS) are dedicated to examining genes of different individuals of one species in order to associate genetic variations with phenotypic traits, usually diseases. Typical assays allow for testing disease associations of hundreds and thousands of single-nucleotide polymorphisms (SNPs), which has revolutionized the search for relations between genetic features and complex traits. However, the traditional analysis methodology, concentrating on the association between one SNP and one phenotype, neglects the potential of GWAS to identify multiple variants that predict the risk of more than one phenotype. Therefore, we propose a new approach for discovering associations between groups of genetic variants and groups of phenotypes. First, we model GWAS data as a bipartite graph. Afterwards, we propose to compute clusters of DNA loci that are responsible for clusters of phenotypes based on solving the bi-cluster editing problem. Here, we introduce first results of our studies.
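To make the bi-cluster editing objective concrete, the sketch below scores a candidate clustering of a small SNP-phenotype bipartite graph. It assumes the standard unweighted formulation: the cost is the number of edges that must be added inside clusters plus the number that must be deleted between clusters so that every cluster becomes a complete bipartite subgraph. The SNP and phenotype identifiers are hypothetical, and this is only a cost evaluator, not the exact or heuristic solver the poster refers to.

```python
def bicluster_editing_cost(snps, phenotypes, edges, cluster_of):
    """Count edge insertions and deletions needed to turn the bipartite graph
    into disjoint bicliques under the given cluster assignment."""
    cost = 0
    for snp in snps:
        for phenotype in phenotypes:
            same_cluster = cluster_of[snp] == cluster_of[phenotype]
            has_edge = (snp, phenotype) in edges
            # A missing edge inside a cluster must be inserted;
            # an existing edge between clusters must be deleted.
            if same_cluster != has_edge:
                cost += 1
    return cost


# Hypothetical toy data: edges are observed SNP-phenotype associations.
snps = ["rs1", "rs2", "rs3"]
phenotypes = ["T1", "T2"]
edges = {("rs1", "T1"), ("rs2", "T1"), ("rs2", "T2"), ("rs3", "T2")}

two_clusters = {"rs1": 0, "rs2": 0, "T1": 0, "rs3": 1, "T2": 1}
one_cluster = {node: 0 for node in snps + phenotypes}

print(bicluster_editing_cost(snps, phenotypes, edges, two_clusters))  # prints 1
print(bicluster_editing_cost(snps, phenotypes, edges, one_cluster))   # prints 2
```

A solver for the editing problem searches over assignments such as `two_clusters` for one with minimum cost; here the two-cluster assignment needs a single deletion (rs2-T2), whereas merging everything needs two insertions.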

Poster A09

siRna – A Comprehensive Analysis Workflow Towards Better Understanding of High-throughput Screening Data.

Vidal Fey VTT Technical Research Centre of Finland

Pekka Kohonen (VTT Technical Research Centre of Finland, Medical Biotechnology); Pekka Tiikkainen (VTT Technical Research Centre of Finland, Medical Biotechnology); Elmar Bucher (VTT Technical Research Centre of Finland, Medical Biotechnology); Saija Haapa-Paananen (VTT Technical Research Centre of Finland, Medical Biotechnology); Merja Perälä (VTT Technical Research Centre of Finland, Medical Biotechnology);

Short Abstract: Today’s high-throughput screening (HTS) approaches are based on RNA interference (RNAi) technologies such as small interfering RNAs (siRNAs) or microRNAs (miRNAs) in addition to a wide range of small bioactive compounds. HTS is performed on microtiter plates, generally in a 384-well format. We have developed an automated analysis workflow that takes in raw data files from the plate reader along with plate-specific annotation files and generates comprehensive, annotated textual and graphical output. Numerous normalisation techniques are applied combining well-established transformation and smoothing algorithms such as B-score, time-drift correction or loess normalization with and without log2 transformation and mad-score variance correction. An extensive set of quality control (QC) plots enables researchers to objectively inspect their data and evaluate different normalisation methods. For instance, plate-series plots using different colors representing different well contents visualize data distributions before and after normalization while plate-picture plots allow for step-by-step inspection of loess-based normalizations. Hit finding is performed utilizing the three-sigma rule or rank-based methods implemented in the RankProd R package. Since results will be generated for all available ways of normalization with one command, users can immediately use the output data after visually determining the appropriate normalization. Interactive approaches for QC and selecting appropriate normalizations are under development.
The function is implemented entirely in R, the widely used software environment for statistical computing. Initial output comprises a separate text file for each of the normalizations and a log file providing supplementary statistics and verbose command line output. Plots are saved in PDF or PNG format.
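As an illustration of the hit-finding step, the following sketch combines a MAD-based robust score with the three-sigma rule. It is written in Python for consistency with the other examples here and is not the R workflow itself. The well signals are hypothetical, and the scaling constant 1.4826 assumes approximately normal data.

```python
import statistics

# Scales the median absolute deviation (MAD) so that it estimates the
# standard deviation when the data are approximately normal.
MAD_TO_SIGMA = 1.4826


def robust_z_scores(values):
    """Centre on the median and scale by the MAD instead of mean and SD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / (MAD_TO_SIGMA * mad) for v in values]


def three_sigma_hits(values, threshold=3.0):
    """Return indices of wells whose robust score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Hypothetical normalized signals for eight wells; the last one is a strong hit.
well_signals = [100, 102, 98, 101, 99, 103, 97, 40]
print(three_sigma_hits(well_signals))  # prints [7]
```

Using the median and MAD rather than the mean and standard deviation keeps a few strong hits from inflating the spread estimate and masking themselves.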

Poster A10

miRNA levels are associated with immune network activity and associate with disease outcome in Ovarian Cancer. A possible miRNA – network control mechanism.

Rotem Ben-Hamo Bar Ilan University

Sol Efroni (Bar Ilan University)

Short Abstract: Ovarian cancer is the leading cause of death from gynecologic cancer, with five-year survival rates <30%. miRNAs are a class of short single-stranded RNA molecules that target mRNAs and trigger either translation repression or mRNA degradation, and they are strongly connected with most, if not all, human malignancies due to their important role in controlling many biologic processes. miRNA may provide a potential immune-based therapy for cancer. The Cancer Genome Atlas, a multi-center coordinated effort, has made available the molecular characteristics of more than 500 patients. Pathway Consistency and Activity metrics, which overlay expression data over curated network knowledge, were calculated according to (Efroni et al., 2007), and the connection between network modifications and miRNA abundance was determined. Two miRNAs were found to be associated with pathway behavior in a highly positively correlated manner. The three pathways associated with miRNA behavior are all immune-response related pathways and share a set of genes. Perhaps surprisingly, this set of genes does not correlate with miRNA when out of pathway context. Only the combined, non-linear, integrative culmination of genes into pathways reveals this immune-miRNA associated behavior. Previous works regarding the two miRNAs show significant positive association with the genes related to inflammatory response. Furthermore, Kaplan-Meier survival analysis revealed one of the miRNAs as highly significant (p-value=0.001) in prognosis stratification and showed a significant separation in expression levels between groups (Mann-Whitney p-value=1.812 x 10^-84), where high levels of the miRNA predict poor prognosis and vice versa. The work presented here suggests biomarkers, based on miRNA abundance levels, for prognosis and as a possible therapeutic target for ovarian cancer.

Poster A11

A Novel Bioinformatics Pipeline for Identification and Characterization of Fusion Transcripts in 31 Breast Cancer and Normal Cell Lines

Yan Asmann Mayo Clinic

Asif Hossain (Mayo Clinic, Health Sciences Research); Brian Necela (Mayo Clinic, Cancer Biology); Sumit Middha (Mayo Clinic, Health Sciences Research); Krishna Kalari (Mayo Clinic, Cancer Biology); Zhifu Sun (Mayo Clinic, Health Sciences Research); High-Seng Chai (Mayo Clinic, Health Sciences Research); David Williamson (Illumina, Bioinformatics Scientist); Derek Radisky (Mayo Clinic, Cancer Biology); Gary Schroth (Illumina, Expression R&D); Jean-Pierre Kocher (Mayo Clinic, Health Sciences Research); Edith Perez (Mayo Clinic, Department of Medicine); Aubrey Thompson (Mayo Clinic, Cancer Biology);

Short Abstract: SnowShoes-FD, developed for fusion transcript detection in paired end mRNA-Seq data, employs multiple steps of false positive filtering to nominate fusion transcripts with near 100% confidence. Unique features include: (i) identification of multiple fusion isoforms from two gene partners; (ii) prediction of genomic rearrangements; (iii) identification of exon fusion boundaries; (iv) generation of a 5’ to 3’ fusion spanning sequence for PCR validation; (v) prediction of the protein sequences, including frame shift and amino acid insertions. We applied SnowShoes-FD to identify 50 fusion candidates in 22 breast cancer and 9 non-transformed cell lines. Five additional fusion candidates with two isoforms were confirmed. Thirty out of 55 fusion candidates had in-frame protein products. No fusion transcripts were detected in non-transformed cells. Consideration of the possible functions of a subset of predicted fusion proteins suggests several potentially important functions in transformation, including a possible new mechanism for overexpression of ERBB2 in a HER positive cell line. The source code of SnowShoes-FD is provided in two formats: one configured to run on the Sun Grid Engine for parallelization, and the other formatted to run on a single LINUX node. Executables in PERL are available for download from our website: http://mayoresearch.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.

Poster A12

Feature extraction from Spinal cord images

Bunheang Tay Dankook University

Sejong Oh (Dankook University)

Short Abstract: The aim of this work is to improve classification efficiency for diffusion tensor image (DTI) datasets of the spinal cord. By applying a mapping area (atlas) to Regions of Interest (ROIs), we can extract features more efficiently and with less effort than by studying each voxel in the whole DTI. This yields more specific voxels while avoiding noise caused by neighboring gray matter or bone voxels. The mapping area includes four regions: posterior, anterior, left and right of the spine. These four regions are the hot spots for calculating the fractional anisotropy (FA) and apparent diffusion coefficient (ADC) parameters, which are used to distinguish normal controls from patients. The clinical motor and sensory functions of all patients in this work were examined according to the International Standards for the Neurological Classification of Spinal Cord Injury (ISCSCI).
After selecting regions, we adopt a feature extraction method and a feature ranking method to evaluate each region. Feature selection eliminates low-accuracy and ineffective features and, conversely, retains those that give good accuracy as determined by the k-nearest neighbor algorithm (KNN). Analyzing the values of voxels within white matter has the potential to improve the diagnosis of a variety of diseases that occur in the human spinal cord.

Poster A13

Integrated Molecular Biology and Bioinformatics approaches in Identification of Putative Reservoirs/Hosts of Rift Valley Fever Virus (RVF) in Kenya

David Ouma icipe-African Insect Science for Food and Health

Daniel Masiga (icipe-African Insect Science for Food and Health, Molecular Biology and Biotechnology); Marion Burugu (Kenyatta University, Biochemistry); Paul Mireji (Egerton University, Biochemistry and Molecular biology); Rosemary Sang (Kenya Medical Research Institute, Arbovirology/Viral hemorrhagic Fever);

Short Abstract: Rift Valley Fever (RVF) is a mosquito-borne viral infection, first reported in Kenya in 1931. In the 2006/2007 outbreak in Kenya, Garissa, Baringo and Kilifi districts were major hot spots with a total mortality of approximately 300 humans and 40,000 domestic livestock. Blood-fed mosquitoes sampled during the outbreak were cryopreserved. Heads and abdomens of single cryopreserved blood-fed mosquitoes (n = 216) (Aedes, Culex, Anopheles and Mansonia genera) were screened for RVF by cell culture and RT-PCR. Putative vertebrate hosts of the vectors were determined by amplification and sequencing of bloodmeal cytochrome C oxidase I (COI). The resultant sequences were annotated through a bioinformatic pipeline suite comprising 1) BioEdit for initial cleaning and development of consensus sequences, 2) Basic Local Alignment Search Tool (BLAST) searches against the GenBank nr database to identify putative homologues of the sequences and 3) Barcode of Life Data Systems (BOLD) for COI. The results of the in silico analyses implicated Mansonia uniformis (Baringo), and Aedes ochraceous and Aedes mcintoshi (Garissa) as putative vectors of the virus. Bloodmeal analyses of RVFV-positive mosquitoes implicated goats and sheep as putative hosts in Baringo, and goat, human, sheep and donkey as putative hosts in Garissa. The analyses demonstrate that bioinformatics approaches, when integrated with wet lab tools, are accurate and effective in identifying the vertebrate hosts of RVF, with potential application in public health initiatives for controlling and understanding vector-borne pathogen epidemiology.

Poster A14

Cross-species candidate gene prioritization with MerKator

Leon-Charles Tranchevent Katholieke Universiteit Leuven

Shi Yu (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Sonia Leach (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Bart De Moor (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Yves Moreau (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD);

Short Abstract: In biology, there is often the need to prioritize large lists of candidate genes in order to further test only the most promising candidate genes with respect to a biological process of interest. In recent years, many computational approaches have been developed to tackle this problem efficiently by merging multiple genomic data sources. We have previously described a gene prioritization method based on the use of kernel methods and showed that it outperforms our previous method based on order statistics. In the present poster, we report the extension of the method to support data integration over multiple related species and the development of a web-based interface termed MerKator that implements this strategy and proposes candidate gene prioritization for 5 species. Our cross-species approach has been benchmarked, and case studies demonstrate that human prioritizations can benefit from model organism data.

Poster A15

A guide to web tools to prioritize candidate genes

Yves Moreau Katholieke Universiteit Leuven

Leon-Charles Tranchevent (Katholieke Universiteit Leuven); Francisco Bonachela Capdevila (Katholieke Universiteit Leuven, Department of Computer Science); Daniela Nitsch (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Bart De Moor (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD); Patrick De Causmaecker (Katholieke Universiteit Leuven, Department of Computer Science); Yves Moreau (Katholieke Universiteit Leuven, Department of Electrical Engineering ESAT-SCD);

Short Abstract: Finding the most promising genes among large lists of candidate genes has been defined as the gene prioritization problem. It is a recurrent problem in genetics in which genetic conditions are reported to be associated with chromosomal regions. In the last decade, several different computational approaches have been developed to tackle this challenging task. In this study, we review 19 computational solutions for human gene prioritization that are freely accessible as web tools and illustrate their differences. We summarize the various biological problems to which they have been successfully applied. Ultimately, we describe several research directions that could increase the quality and applicability of the tools. In addition, we developed a website (http://www.esat.kuleuven.be/gpp) containing detailed information about these and other tools, which is regularly updated. This review and the associated website together constitute a guide to help users select the gene prioritization strategy that best suits their needs.

Poster A16

History-aware models for HIV therapy screening

Jasmina Bogojeska Max-Planck Institute for Informatics

Thomas Lengauer (Max-Planck Institute for Informatics, Computational Biology and Applied Algorithmics);

Short Abstract: HIV patients are treated by repeated administration of different combinations of several antiretroviral drugs. Finding an effective combination therapy for patients in the mid to late stages of antiretroviral therapy is particularly challenging because of the accumulated drug resistance from all previous therapies - the latent virus population. Moreover, the HIV clinical data sets are biased in many ways such as: uneven therapy representation and uneven representation with respect to the level of therapy experience. We developed an approach that tackles the aforementioned problems by utilizing information from previous therapies the patient has received when predicting the outcome of a given HIV therapy. For this purpose, we introduce a quantitative notion of pairwise similarity of therapy sequences that incorporates information about the latent virus population, the specific therapies given to a patient and the order in which they were administered. We use this similarity measure to derive sample weights for training sample-specific history-aware linear models for predicting outcomes to HIV combination therapies.

The computational experiments on the EuResist clinical data demonstrated that, compared to the most commonly used approach that encodes therapy history information only by specific input features, our approach has the advantage of delivering significantly better results for therapy-experienced patients. Additionally, the sample-specific linear models are more interpretable because for each target sample the information on which similar therapy sequences were most informative for the prediction, or which viral mutations have the largest influence on therapy effectiveness, is available.
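The idea of deriving sample weights from therapy-history similarity and then fitting a sample-specific linear model can be sketched as below. Two things here are stand-ins rather than the poster's method: the similarity is a plain Jaccard overlap of drug sets (the abstract's measure also uses therapy order and the latent virus population), and the model is a one-feature weighted least-squares fit. All training values are hypothetical.

```python
def jaccard(history_a, history_b):
    """Crude stand-in similarity: overlap of the drugs in two therapy histories."""
    a, b = set(history_a), set(history_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def weighted_linear_fit(xs, ys, weights):
    """Weighted least squares for y = slope * x + intercept."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("weighted variance of x is zero")
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training samples: (drug history, feature value, observed outcome).
training = [
    (["AZT", "3TC", "EFV"], 1.0, 0.9),
    (["AZT", "3TC", "LPV"], 2.0, 0.7),
    (["TDF", "FTC", "EFV"], 3.0, 0.6),
    (["ABC", "3TC", "ATV"], 4.0, 0.2),
]
target_history = ["AZT", "3TC", "EFV"]

# Training samples with histories closer to the target get larger weights.
weights = [jaccard(target_history, history) for history, _, _ in training]
slope, intercept = weighted_linear_fit(
    [x for _, x, _ in training],
    [y for _, _, y in training],
    weights,
)
print(weights, slope, intercept)
```

Because the weights are recomputed for each target sample, inspecting them shows directly which training histories drove a given prediction, which is the interpretability benefit the abstract describes.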

Poster A17

Network-based gene prioritization from expression data by diffusing through protein interaction networks

Daniela Nitsch KU Leuven

Léon-Charles Tranchevent (KU Leuven, ESAT-SCD); Joana Gonçalves (INESC-ID, Knowledge Discovery and Bioinformatics (KDBIO) group); Yves Moreau (KU Leuven, ESAT-SCD);

Short Abstract: Discovering novel disease genes is challenging for diseases for which no prior knowledge is available. Genetic studies frequently result in large lists of candidate genes, of which only a few can be followed up for further investigation. In the past couple of years, several gene prioritization methods have been proposed. Most of them use a guilt-by-association concept, and are therefore not applicable when little is known about the phenotype or no disease genes are available.

We have proposed a method that overcomes this limitation by replacing prior knowledge about the biological process with experimental data on differential gene expression between affected and healthy individuals. At the core of the method are a protein interaction network and disease-specific expression data. Our approach propagates the expression data over the network using an extended Random Walk approach based on kernel methods, as the inclusion of indirect associations compensates for network sparsity and small-world effect issues. It relies on the assumption that strong candidate genes tend to be surrounded by many differentially expressed neighboring genes in a protein interaction network.
We have benchmarked our approach, and results showed that it clearly outperforms other gene prioritization approaches with an average ranking position of 8 out of 100 genes, and an AUC value of 92.3%.
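
The propagation idea can be illustrated with a plain random walk with restart. This is a simpler stand-in for the kernel-based extended random walk mentioned above, not the authors' method. The toy network, the expression scores, and the function name `propagate_scores` are hypothetical.

```python
def propagate_scores(adjacency, scores, restart=0.5, iterations=100):
    """Random walk with restart over an undirected network.

    adjacency: dict mapping each gene to a non-empty list of neighbouring genes.
    scores: dict mapping each gene to a non-negative differential-expression score.
    """
    total = sum(scores[gene] for gene in adjacency)
    # Normalise the initial scores so they form a probability distribution.
    start = {gene: scores[gene] / total for gene in adjacency}
    current = dict(start)
    for _ in range(iterations):
        updated = {}
        for gene in adjacency:
            # Each neighbour passes on its score, split evenly among its own neighbours.
            incoming = sum(current[n] / len(adjacency[n]) for n in adjacency[gene])
            updated[gene] = (1 - restart) * incoming + restart * start[gene]
        current = updated
    return current


# Hypothetical toy network: candidate "C" sits next to two strongly
# differentially expressed genes, candidate "D" next to a weakly expressed one.
network = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A", "B"],
    "D": ["E"],
    "E": ["D"],
}
expression = {"A": 3.0, "B": 2.5, "C": 0.1, "D": 0.1, "E": 0.2}
result = propagate_scores(network, expression)
ranking = sorted(result, key=result.get, reverse=True)
```

In this toy case, "C" and "D" start with the same low score, but "C" ends up ranked higher because its neighbours are strongly differentially expressed, which is the assumption the abstract relies on.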

Recently, we have developed the web server PINTA implementing our gene prioritization approach to make it available for clinicians and other researchers.

Poster A18

Detecting cancer driving mutations by whole genome sequencing in the ICGC PedBrain project

Barbara Hutter German Cancer Research Center

Natalie Jäger (German Cancer Research Center, Theoretical Bioinformatics); Qi Wang (German Cancer Research Center, Theoretical Bioinformatics); Benedikt Brors (German Cancer Research Center, Theoretical Bioinformatics); Marc Zapatka (German Cancer Research Center, Molecular Genetics);

Short Abstract: As a part of the International Cancer Genome Consortium (ICGC), the German ICGC PedBrain project aims at identifying genomic mutations that drive the development of the two most common pediatric brain tumors, medulloblastomas and astrocytomas. For each tumor type, 300 whole genomes as well as their matched blood samples will be analyzed by next-generation sequencing.
To efficiently address the resulting computational challenges, we developed a bioinformatics pipeline that integrates the most up-to-date publicly available software. After short read alignment and quality control, the pipeline reports mutations such as copy number variations, single nucleotide polymorphisms (SNPs), insertions and deletions in protein coding and regulatory regions. Mutations are assigned germ-line or somatic status by comparison to the matched blood sample. Automated functional annotation is followed by evaluation of recurrent mutations and the application of tools to distinguish drivers from passengers. In addition, we align reads to viral genomes to assess the potential role of virus infection in tumor development.
Several medulloblastoma and blood sample pairs sequenced at over 30x coverage each have already been analyzed. We found that a common source of error is sample swapping, which we address by using insert size information of the paired end reads and in silico SNP genotyping. Furthermore, overlap with the SNP database dbSNP132 serves as a quality control. Experimental validation demonstrated high specificity of our results. Mutations with predicted high relevance will be subject to a large-scale prevalence screen. The whole genome analyses will be complemented by transcriptome (RNA-Seq) and methylome (RRBS) data.
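
The germ-line versus somatic assignment described above can be sketched as a simple set comparison. This is a minimal illustration, assuming variants are represented as (chromosome, position, reference, alternate) tuples. The function name `classify_variants` and the coordinates are made up, and a real pipeline would also need to account for sequencing depth and call quality in the blood sample.

```python
def classify_variants(tumor_calls, blood_calls):
    """Label each tumor variant by whether the matched blood sample also carries it."""
    blood = set(blood_calls)
    return {
        variant: "germline" if variant in blood else "somatic"
        for variant in tumor_calls
    }


# Hypothetical variant calls; the positions are illustrative only.
tumor = [("chr1", 1000, "A", "G"), ("chr2", 2000, "G", "C")]
blood = [("chr1", 1000, "A", "G")]
labels = classify_variants(tumor, blood)
```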

Poster A19

Chromosomal clustering of tissue restricted antigens

Maria Dinkelacker German Cancer Research Center, Heidelberg

Sheena Pinto (German Cancer Research Center, Developmental Immunology); Roland Eils (German Cancer Research Center, Theoretical Bioinformatics); Bruno Kyewski (German Cancer Research Center, Developmental Immunology); Benedikt Brors (German Cancer Research Center, Theoretical Bioinformatics);

Short Abstract: Every day the immune system has to distinguish between self and non-self antigens presented to the receptors of the adaptive immune system. T cells are therefore trained in the thymus on self antigens so that they do not recognize the body's own antigens and cause autoimmune diseases. We determined the set of tissue restricted antigens (TRAs) involved in this process of negative selection and could show that they are significantly clustered on the chromosome. This could explain how their gene expression can be regulated simultaneously in one cell type, while representing all different types of tissues in the body at once. Understanding the molecular mechanism of this regulation will give more insight into the prevention of autoimmune diseases. In case of dysregulation, many TRAs are not displayed to the T cells, and autoreactive T cells are released into the peripheral blood, resulting in multiple autoimmune diseases in mouse and human. Furthermore, we could show that the TRA clusters found are highly conserved between different species. Finding a molecular regulator of TRAs would help to treat and cure autoimmune diseases.

Poster A20

Mapping the spread of HIV in Germany

Glenn Lawyer Max Planck Institute for Informatics

Short Abstract: The Resina Project tracks the spread of HIV-1 genotypes in German patients. To date, 35 participating clinics have gathered over 2700 genotypes, along with demographics including patient risk group and region of residence.

Genotypic similarity between viral samples can be converted into a graph linking related infections. This allows graph theoretical investigations of the data, such as measuring transmissions between various risk groups, the degree of interconnectivity within each group, and the structure and size of connected components. Additionally, the geographical information can be used to condense the graph to connections between regions. Measures on the condensed graph are also of interest, and the geographical structure allows the graph to be plotted over a map of Germany.
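
The graph construction described above can be sketched as follows: samples whose pairwise distance falls at or below a threshold are linked, and connected components are then read off with a union-find structure. The sequences, the Hamming-fraction distance, and the threshold are illustrative assumptions, not the Resina Project's actual similarity measure.

```python
from itertools import combinations


def connected_components(samples, distance, threshold):
    """Group samples into connected components of the thresholded similarity graph."""
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        if distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in samples:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values(), key=len, reverse=True)


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy sequences keyed by patient identifier.
sequences = {
    "p1": "ACGTACGTAC",
    "p2": "ACGTACGTAT",
    "p3": "ACGTACGAAT",
    "p4": "TTGTCCGAGG",
}
components = connected_components(
    list(sequences),
    lambda a, b: hamming_fraction(sequences[a], sequences[b]),
    threshold=0.15,
)
```

Here "p1" and "p3" are not directly linked, but both are linked to "p2", so all three fall into one component, while "p4" remains a singleton.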

A conservative similarity threshold led to the following observations. Connected components in the full graph were mostly pairs or triplets. Only 5% of transmission chains had more than 10 patients. The homosexual risk group continues to be at the center of the epidemic, with all risk groups showing many connections to this group. This was surprising, given that two of the risk groups (intravenous drug users; people from endemic regions) were not expected to have strong overlap with the homosexual community. The reduced graph showed several interesting patterns. For example, two distinct networks of non-B HIV were found in the intravenous drug user risk group.

Graphical representation of genetic relationships reveals deep structure in the spread of HIV, both through mathematical analysis and visual display.

Poster A21

SNP4Disease: a knowledge database to detect disease associated SNPs

David John Max Planck Institute Bad Nauheim

Pascal Gellert (Max Planck Institute Bad Nauheim, Cardiac Development and Remodelling); Shizuka Uchida (Max Planck Institute Bad Nauheim, Cardiac Development and Remodelling); Thomas Braun (Max Planck Institute Bad Nauheim, Cardiac Development and Remodelling);

Short Abstract: In recent years, the number of genome-wide association studies (GWAS) conducted worldwide has increased. The aim of GWAS is to find gene variations that may cause diseases or increase the susceptibility to certain disorders. Through GWAS, many single nucleotide polymorphisms (SNPs) have been found to be associated with different diseases.
Here we propose our database “SNPer”, which enables scientists and medical doctors to find a wide range of potentially interesting SNPs for a disease of interest. SNPer is a knowledge database constructed using data and literature mining techniques to incorporate data from various open databases. It provides information about disease-associated SNPs and their potential biological impacts. The easy-to-use web interface of SNPer displays the calculated SNPs and provides further information in a clearly arranged table. Furthermore, it offers different filters to restrict the number of resulting SNPs as well as many links to other databases for further investigation. SNPer is freely available at http://SNP4Disease.mpi-bn.mpg.de

Poster A22

PEPTIDE DRUG DESIGN FOR INTERLEUKIN-1β

ECE BULUT Koc University

Ece Bulut (Koc University); Burak Erman (Koc University, Chemical and Biological Engineering);

Short Abstract: Interleukin 1 beta (IL-1β) is an important pro-inflammatory cytokine related to diseases such as rheumatoid arthritis and intestinal inflammation. IL-1β is activated intracellularly by Caspase-1 (Interleukin converting enzyme, ICE). Active IL-1β migrates to the extracellular region and binds to its embedded receptor (IL-1R) and an accessory protein (IL-1RAcP) at the cell surface. This movement brings the intracellular domains of its receptors into proximity and causes further expression of IL-1β.

In this study, inhibition of active IL-1β has been studied extensively using computational docking tools and techniques. The residues of IL-1β that bind its receptor were taken as the docking regions. Several trials were made to determine the best site for a drug candidate. The Viterbi algorithm, based on a hidden Markov model, was used to obtain the best results. This is a stochastic model that consists of observed and hidden probabilities. In this case, the observed probabilities are chosen from the docking scores, while the hidden probabilities are selected as the secondary structure states based on phi/psi Ramachandran angles. The SER-LEU-ASP tripeptide was found to be the most potent inhibitor, and the quality of the peptide was tested with the GOLD and AutoDock programs.
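
For readers unfamiliar with the Viterbi algorithm mentioned above, a generic sketch is given here. The abstract does not describe how docking scores and Ramachandran-based states were turned into probabilities, so the two hidden states, the two observation bins, and all probabilities below are purely hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a sequence of observations."""
    # best[t][s] = (log-probability of the best path ending in s at step t, predecessor)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Trace back from the best final state to recover the path.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical model: hidden secondary-structure states, observed docking-score bins.
states = ("helix", "coil")
start_p = {"helix": 0.5, "coil": 0.5}
trans_p = {"helix": {"helix": 0.8, "coil": 0.2}, "coil": {"helix": 0.2, "coil": 0.8}}
emit_p = {"helix": {"high": 0.9, "low": 0.1}, "coil": {"high": 0.2, "low": 0.8}}
path = viterbi(["high", "high", "low", "low"], states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on longer sequences; all probabilities must be strictly positive for the logarithms to be defined.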

Poster A23

Comparison of RNA-Seq analysis tools to detect alternative splicing events in pancreatic tumour transcriptomes

Emilie Chautard Ontario Institute for Cancer Research

Francis Ouellette (Ontario Institute for Cancer Research, Informatics and Bio-computing);

Short Abstract: Pancreatic cancer (PC) has one of the poorest five-year relative survival rates (6%), a rate which has not increased substantially over the past 30 years. Our objective is to develop algorithms to identify potential prognostic markers that can accurately predict the outcome of PC and help in treatment decisions for PC patients. We are particularly interested in the discovery of alternative splicing markers in PC. We are comparing different methods of extracting isoform and expression data from RNA-Seq experiments. Different strategies for reconstructing a transcriptome using RNA-Seq reads exist, either without prior knowledge (Trans-ABySS) or using a reference genome (Cufflinks) or transcriptome (ALEXA-SEQ). We are comparing these methods to evaluate which one(s) would be best suited for analyzing cancer transcriptomes, as tumor genomes often undergo a high number of alterations that are difficult to map on a reference genome or transcriptome. We are comparing the transcripts obtained with these methods to the transcript annotations provided by different databases, using two different data sets of normal mouse liver and human cancer cell lines. We are measuring the percentage of reconstruction of the transcripts and the read coverage across junctions for each method and using them to evaluate their best parameters. We are also analyzing the reads that are not assembled to understand why they have been discarded. The result of this work will be the quantification of the abundance and the presence, absence or identification of new isoforms in the PC transcriptome compared to the normal matched control.

Poster A24

Ion mobility spectrometry for real time metabolomics in clinical diagnostics

Arthur Gollmer Max Planck Institut - Informatik

Short Abstract: Ion mobility spectrometry (IMS), normally used at airports for the detection of explosives or drugs, can be coupled with a multi-capillary column and used for metabolic profiling, biomarker finding and analysis of human breath. This approach is thus a new analytical method for exhaled humid air, with the advantages of being very fast (one measurement takes only a few minutes; sample collection takes only a few seconds) and very sensitive (down to the pptv range). Arising from these advantages, the complexity space exceeds the pure static question by focusing on dynamic processes. The goal is to introduce a new quality in the early prediction of bronchial carcinoma, tumor stages, rejection prediction for lung transplantations, bacterial infection detection, COPD types, medical control and anesthetic monitoring. Some of the results from these applications are illustrated on the poster. Finally, this powerful diagnostic approach should lead towards personalized and individual medication with a significantly increased chance of recovery and a decreased number of treatment-related side effects.

Poster A25

Transcriptomic profiles and gene networks of human Mesenchymal Stem Cells (MSCs)

Beatriz Rosón Consejo Superior de Investigaciones Cientificas (CSIC)

Alberto Risueño (Consejo Superior de Investigaciones Cientificas (CSIC), Bioinformatics and Functional Genomics Group); Fermín Sánchez-Guijo (Hospital Universitario de Salamanca, Haematology Department); Consuelo Del Cañizo (Hospital Universitario de Salamanca, Haematology Department); Javier De Las Rivas (Consejo Superior de Investigaciones Cientificas (CSIC), Bioinformatics and Functional Genomics Group);

Short Abstract: Stem cells, by definition, have the potential for self-renewal and the capacity to differentiate into specialized cell types. Recently discovered Mesenchymal Stromal/Stem Cells (MSCs) represent a highly flexible cell type, capable of differentiating into bone, cartilage or adipose, among other tissues. However, many questions about the transcriptomic events and regulatory machinery that govern MSC biology remain unanswered. In order to characterize human MSCs at the genome-wide level, we isolated them from several tissues: placenta, bone marrow, and adipose tissue. After in vitro culture expansion, the cells were analysed based on specific CD markers, and their capacity to differentiate was tested. Thereafter we isolated mRNA from the primary cultures and hybridized it to high-density Exon Arrays. Expression profiling and alternative splicing mapping were obtained from the analysis of this data set. To obtain a wider gene expression pattern, we simultaneously performed a meta-analysis of more than 300 microarrays mined from public databases (GEO, ArrayExpress, StemBase) using 12 different cell types, classified as stem/progenitor cells or differentiated cells from two main lineages: Mesenchymal and Haematopoietic. Here we outline a bioinformatic approach in which hierarchical clustering, differential expression, co-expression, and gene networks were uncovered from the data sets described. Several regulatory elements, pathways and key genes have been deciphered, expanding the knowledge about human MSCs, which could be used to improve handling protocols and trigger future tissue engineering achievements.

Poster A26

Classification of Mismatch Repair Gene Missense Variants

Heidi Ali University of Tampere

Ayodeji Olatubosun (University of Tampere, Institute of Biomedical Technology); Mauno Vihinen (University of Tampere, Institute of Biomedical Technology);

Short Abstract: Lynch syndrome accounts for approximately 2 to 5% of colorectal cancers. The syndrome is caused by germline mutations in mismatch repair (MMR) genes, MLH1, MLH3, MSH2, MSH6, PMS1, PMS2 and TGFBR2. MMR is a DNA repair system that recognizes and repairs base-base mispairs and insertion-deletion loops arising in DNA replication and recombination. Thousands of MMR variants have been discovered, but their relevance to cancer is usually unknown. Here, we utilize bioinformatics prediction methods to classify MMR variants.
We identified from the literature 168 functionally tested MMR missense variants, of which 82 were pathogenic. The InSiGHT database for Lynch syndrome data contains over 600 variants with unknown effect. We used the Pathogenic-Or-Not-Pipeline (http://bioinf.uta.fi/PON-P) for the prediction and analysis of these variants. Since the performance of the individual predictors was not as good as we wanted, we developed a consensus predictor based on several tolerance prediction methods. With this predictor, we were able to classify over 200 previously unknown MMR missense variants as pathogenic or neutral. The results can be used to prioritize variants for further experimental validation and may help in the diagnosis of Lynch syndrome and other gastric cancers.
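
One simple way to build a consensus from several individual predictors is a majority vote. The abstract does not state how the authors combined the tolerance predictors, so the rule below, the function name `consensus_call`, and the predictor names are assumptions for illustration only.

```python
from collections import Counter


def consensus_call(predictions, min_agreement=0.5):
    """Return the majority label, or 'unclassified' if agreement is not above min_agreement.

    predictions: dict mapping a predictor name to its label for one variant.
    """
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    if votes / len(predictions) > min_agreement:
        return label
    return "unclassified"


# Hypothetical predictor outputs for two variants.
clear_case = {"tool_a": "pathogenic", "tool_b": "pathogenic", "tool_c": "neutral"}
split_case = {"tool_a": "pathogenic", "tool_b": "neutral"}
calls = [consensus_call(clear_case), consensus_call(split_case)]
```

Leaving split votes unclassified, rather than forcing a label, reflects the fact that only part of the variants of unknown effect could be assigned to a class.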

Poster A27

Association Rule Mining with Prior Knowledge for Alzheimer's Disease

Peter Li Mayo Clinic

Gyorgy Simon (Mayo Clinic, Health Sciences Research);

Short Abstract: As we migrate to modeling diseases as a multi-factorial problem, the ability to analyze any given genomic data set is limited by the combinatorial explosion of false discoveries. The statistical solution is to require increased significance (e.g. Bonferroni correction), but this increases false negatives. Another approach is to use prior knowledge, such as pathways and networks. Most methods fail to account for population heterogeneity. In this work, we present a novel approach integrating prior knowledge and population heterogeneity with a two-stage association rule mining technique, whose behavior is different from traditional testing.

We evaluated this method using GWAS from the Joint Aging, Addiction and Mental Health (JAAMH) Alzheimer's Disease (AD) data set of 1237 cases and 1254 controls. A combined interaction network was built from Reactome, BioGRID, IntAct, MINT, DIP and HPRD. In the first stage, we generate haplotype blocks and then apply predictive association rule mining for each block. In the second stage, we discover combinations of predictive haplotypes, whose corresponding genes are on average ≤k hops away from the nearest known AlzGene gene on the network.

We found that at k=1, we discovered 50% fewer patterns than we would have without the use of prior knowledge, yet we recovered 93% of the significant patterns and 89% of the unique genes. The lower total number of patterns allows for a less stringent Bonferroni correction, leading to a 10% increase in the number of significant patterns. The predictive capability of the discovered genes is higher than that of individual SNPs or haplotypes.
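
The effect of the pattern count on the Bonferroni correction can be made concrete with a short calculation. The pattern counts and p-values below are made up for illustration; only the rule that the per-test threshold equals the significance level divided by the number of tests is standard.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def count_significant(p_values, threshold):
    """Number of p-values at or below the threshold."""
    return sum(p <= threshold for p in p_values)


alpha = 0.05
full = bonferroni_threshold(alpha, 10_000)    # hypothetical: all patterns tested
reduced = bonferroni_threshold(alpha, 5_000)  # hypothetical: 50% fewer patterns

# Hypothetical p-values for patterns present in both analyses.
p_values = [1e-6, 7e-6, 2e-5, 0.01]
significant_full = count_significant(p_values, full)
significant_reduced = count_significant(p_values, reduced)
```

Halving the number of tested patterns doubles the per-test threshold, so a pattern with a borderline p-value can become significant even though the data are unchanged.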

Poster A28

Receptor and ligand based virtual screening for beta 3 adrenergic receptor agonists

Parul Tewatia Amity University

BK Malik (Amity University, Amity Institute of Biotechnology); Shakti Sahi (Gautam Buddha University, School of Biotechnology);

Short Abstract: Beta3-adrenergic receptors (AR) are located in the plasma membrane of both white and brown adipocytes, where they mediate metabolic effects such as lipolysis and thermogenesis. Recently, we reported an active conformation model of the beta3 AR obtained through molecular modeling techniques. The model was optimized by flexible docking and long-term MD simulations. To gain insight into the structure-activity relationship of the beta3 AR agonists, we validated and successfully applied different techniques of virtual screening (VS), pharmacophore modeling and QSAR to identify promising new agonists for beta3 AR. In a stepwise filtering protocol, structure-based VS of 1.0 million compounds from various publicly available databases of small molecules was performed. These molecules were docked into the active site of the receptor at three levels of accuracy: ligands passing the HTVS (high-throughput VS) step were subsequently analyzed in Glide SP and finally in Glide XP to estimate the receptor-ligand binding affinities. Structure-based VS was conducted in order to discover structurally diverse agonists for beta3 AR. In the second step, a total of 300 pharmacophore hypotheses were generated from a set of known and diverse beta3 AR agonists using PHASE. The best hypothesis showed six features: three hydrogen bond acceptors, one positively charged group, and two aromatic rings. A predictive QSAR model was further established. To cross-validate, pharmacophore filtering was done on the set of shortlisted compounds from structure-based VS. All calculations were performed on a Linux cluster of 8 computers with 16 processors.

Poster A29

An Integrative Bioinformatic Predictor of Protein Sub-Cellular Localisation in Malaria

Ben Woodcroft The University of Melbourne

Robert Radloff (The University of Melbourne, Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute); Lee Yeoh (The University of Melbourne, Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute); Kristie-Lee Scanlon (The University of Melbourne, Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute); Maria Doyle (The University of Melbourne, Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute); Giel van Dooren (The University of Melbourne, School of Botany); Geoff McFadden (The University of Melbourne, School of Botany); Chris Tonkin (The Walter and Eliza Hall Institute of Medical Research, Division of Infection and Immunity); Terry Speed (The Walter and Eliza Hall Institute of Medical Research, Division of Bioinformatics); Stuart Ralph (The University of Melbourne, Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute);

Short Abstract: The malarial parasite Plasmodium falciparum remains a leading international cause of mortality, with almost a million deaths each year. Determination of protein sub-cellular localisation remains a challenge in Plasmodium parasites due to their evolutionary distance from well-studied model organisms, and limited efficiency of appropriate molecular tools. However, abundant large scale systems biology information exists for several Plasmodium species as well as other apicomplexan parasites, including full genomic DNA sequences, plus data sets relating to the transcriptome, protein expression and interactions, polymorphisms and phyletic profiles. To date, most bioinformatic predictors of sub-cellular localisation use sequence information exclusively without consideration for other data sets. We have now developed the first global bioinformatic predictor of sub-cellular localisation in Plasmodium falciparum (called Plasmarithm) that predicts localisation for multiple cellular compartments using a variety of post-genomic information types. We have identified several non-sequence data types that are predictive of localisation, including phyletic distribution and transcript abundance at specific life stages. We performed a comprehensive literature survey of the phylum Apicomplexa to construct a database of ~850 recorded protein localisations curated from ~700 separate publications. The database, called ApiLoc, is freely available at http://apiloc.bio21.unimelb.edu.au, and we are using this to improve the accuracy of our predictor. We achieved an overall accuracy of ~60% on a seven class problem, where a number of the classes have not previously been predicted. To further validate these in-silico analyses, we have experimentally verified localisations of a number of hypothetical proteins in the related apicomplexan Toxoplasma gondii.

Poster A30

Semantic data integration to represent biochemical network relationships for finding molecular signatures of trace chemical exposure

Yeon-Kyung Kang Insilicogen LTD

Byeong-Chul Kang (Insilicogen Inc., R&D center); Ga-Hee Shin (Insilicogen Inc., R&D center); Seung-Yong Hwang (Hanyang University, Biochemistry); Jae-Chun Ryu (Korea Institute of Science and Technology, Cellular and Molecular Toxicology Laboratory);

Short Abstract: Toxicogenomics is very effective for the risk assessment of environmental hazards caused by long-term exposure to trace chemicals. Toxicogenomics data such as genome sequence, genotype, gene expression, phenotype and disease information contribute to each step of environmental risk assessment. However, these varied and heterogeneous data must be restructured under a proper data model, a critical component for building an information system for risk assessment. This study proposes a semantic modeling approach to organize heterogeneous data types and introduces techniques and concepts (such as ontologies, semantic objects, typed relationships, contexts, graphs, and information layers) that are used to represent complex biochemical network relationships. The semantic modeling tool is used as an example to demonstrate how a domain such as risk assessment is represented and how this representation is utilized for research. In addition, we present comprehensive databases touching on experimental, computational, and regulatory aspects. In particular, databases and tools for interactions and pathways beyond '-omics' data reveal biochemical network relationships with the molecular signature of trace chemical exposure. This work ultimately aims to provide an open platform for managing a user's knowledge base for risk assessment of environmental hazards, which supports building chemical-gene-disease relationships like those in CTD.

Poster A31

Epicluster for high-throughput GWA epistasis analysis

Attila Gyenesei Turku Centre for Biotechnology, University of Turku and Åbo Akademi University

Jonathan Moody (MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine); Asta Laiho (University of Turku and Åbo Akademi University, Turku Centre for Biotechnology); Colin Semple (MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine); Chris Haley (MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine); Wen-hua Wei (MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine);

Short Abstract: Gene-gene interactions (epistasis) are thought to be a potential source of so called hidden genetic variation but they remain largely unexplored in human genome-wide association (GWA) studies conducted so far. One major hurdle for studying epistasis in GWA studies is the lack of widely accepted algorithms that are fast enough to effectively handle high density SNPs and can map different forms of epistasis while keeping false positive rates under control. Numerous new algorithms have been developed recently (e.g. Bayesian marker partitioning, fastEpistasis and BOOST) but still require substantial computation to analyze epistasis in one trait in a GWA dataset.
We have developed a novel tool named Epicluster based on a pattern discovery algorithm that has been successfully applied in high-throughput gene expression data analyses. The method has been tested on two real GWA datasets that have been examined previously by other algorithms. The tests showed that Epicluster successfully reproduced the previously reported epistasis results and completed the analyses very quickly: for the GWA dataset with 300,000 SNPs and 1000 individuals, Epicluster took 5 CPU hours on a single computer to finish a pair-wise search. With this speed, Epicluster can make GWA epistasis analysis routinely available and help investigate questions such as power of detection, false positive control and eventually the role of epistasis in complex trait genetics.
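
To put the reported run time in perspective, the size of an exhaustive pair-wise search can be computed directly from the figures above. The calculation assumes that every unordered SNP pair is tested exactly once, which the abstract does not state explicitly.

```python
n_snps = 300_000
cpu_hours = 5

# Number of unordered SNP pairs: n choose 2.
n_pairs = n_snps * (n_snps - 1) // 2

# Average throughput implied by the reported run time.
pairs_per_second = n_pairs / (cpu_hours * 3600)

print(f"{n_pairs:,} pairs, about {pairs_per_second:,.0f} pairs per CPU second")
```

Under this assumption, the search covers roughly 45 billion pairs, which corresponds to about 2.5 million pair tests per CPU second.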

Poster A32

Deconvoluting Cancer Genomes using an Optimal Scaffolder

Gao Song Genome Institute of Singapore

Niranjan Nagarajan (Genome Institute of Singapore)

Short Abstract: Cancer genomes are marked by large-scale amplifications and complex structural variations, and such regions often contain the driver events linked to tumorigenesis. De novo reconstruction of such regions can be extremely challenging and has not been systematically attempted before. Using large insert paired-end libraries and a recently developed optimal scaffolder (Opera), we describe the de novo assembly of several cancer genomes. We show how our approach provides a more complete picture of complex regions in the genome compared to reference-based paired-end mapping analysis. The long-range genomic structure provided by a de novo assembly also serves as an ideal template for studying gene regulation, and we provide several examples to illustrate this.

Poster A33

Classification of diseases and construction of gene-disease networks using transcriptomic data from clinical samples

Sara Aibar Consejo Superior de Investigaciones Cientificas (CSIC)

Sara Aibar Santos (Consejo Superior de Investigaciones Cientificas (CSIC)); Celia Fontanillo (Consejo Superior de Investigaciones Cientificas (CSIC), Bioinformatics and Functional Genomics Group); Javier de las Rivas (Consejo Superior de Investigaciones Cientificas (CSIC), Bioinformatics and Functional Genomics Group);

Short Abstract: Finding disease gene-markers is nowadays one of the main focuses of molecular medicine. However, looking at single genes individually is no longer enough. Complex diseases are normally caused by a combination of multiple altered genes. Therefore, studying the genes in a comprehensive way that reflects the relations among them has become a must in order to get a global view of a biological state. Recent approaches show that combining classifiers and networks can help to improve the identification of key elements in such diseases. Most of these studies incorporate prior information from gene network databases into the classifiers. Seeking a different approach, we have developed a multi-step algorithm that generates gene networks associated with a given pathology. This algorithm analyzes genome-wide expression profiles, selects specific gene-markers from each pathological state and integrates them in a co-expression and mutual information network. Moreover, the algorithm builds a multi-class classifier and extracts information about the most discriminant genes for each class. By using the transcriptomic analysis and the classifier to create disease gene networks, we provide a comprehensive view of the compared diseases and a clear ground to identify key genes as putative biomarkers of a given pathological state.

Poster A34

Chipster 2.0: User-friendly analysis software for next generation sequencing data with interactive genome viewer.

Massimiliano Gentile CSC - Finnish IT Center for Science

Massimiliano Gentile (CSC - Finnish IT Center for Science, Application Services); Aleksi Kallio (CSC - Finnish IT Center for Science, Application Services); Taavi Hupponen (CSC - Finnish IT Center for Science, Application Services); Petri Klemelä (CSC - Finnish IT Center for Science, Application Services); Eija Korpelainen (CSC - Finnish IT Center for Science, Application Services);

Short Abstract: Chipster (http://chipster.csc.fi/) is a user-friendly analysis software with a rich collection of analysis tools for microarray and next generation sequencing data, interactive visualizations and workflow functionality. Version 2.0 includes functionality for analysis of ChIP-seq, RNA-seq and miRNA-seq data, and its integrated genome browser allows interactive visualization of reads and results in their genomic context.

Powerful preprocessing tools allow users to perform quality control with the FASTX toolkit and align reads using Bowtie or BWA. Integration of the SAMtools package enables format conversion from SAM to BAM, as well as sorting and indexing. In addition, Chipster features capabilities to find, remove, fuse and combine overlapping genomic regions.
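
Fusing overlapping genomic regions, one of the operations listed above, can be sketched in a few lines. This is a generic interval-merging illustration, not Chipster's implementation. The function name `merge_regions` and the (chromosome, start, end) tuple layout are assumptions.

```python
def merge_regions(regions):
    """Fuse overlapping or touching (chrom, start, end) regions on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of starting a new one.
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peak regions.
peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80), ("chr1", 400, 500)]
fused = merge_regions(peaks)
```

Sorting first means each region only needs to be compared with the most recently emitted one, so the merge itself is a single pass.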

The ChIP-seq analysis tools enable users to detect peaks with MACS, filter them based on p-value, number of reads, etc., and scan them for common sequence motifs to be matched against the JASPAR database. Identification of nearby genes and filtering with regard to peak distance, location and various genomic features are also possible. Interpretation of results is aided by incorporating biologically relevant information through pathway analysis.

Data from both single and multi factor RNA-seq and miRNA-seq experiments can be normalized and analyzed for differential expression through integration of the edgeR Bioconductor package. Identification of predicted miRNA gene targets is provided via a number of databases and gene sets can be tested for overrepresentation of biologically relevant classifications with pathway analysis tools.

Technically, Chipster is a Java-based client-server system. It is open source and new tools can be easily added using a simple mark-up language.

Poster A35

MUTALYZER 2: Improved Sequence Variant Descriptions from next generation sequencing data and locus-specific mutation databases

Peter Taschner LUMC

Jeroen Laros (LUMC, Human Genetics); Martijn Vermaat (LUMC, Human Genetics); Gerben Stouten (LUMC, Human Genetics); Johan den Dunnen (LUMC, Human Genetics);

Short Abstract: Unambiguous and correct sequence variation descriptions are of utmost importance, not least because mistakes and uncertainties may lead to undesired errors in clinical diagnosis. The Mutalyzer sequence variation nomenclature checker (www.mutalyzer.nl/) names all sequence variants following the Human Genome Variation Society sequence variant nomenclature recommendations (www.hgvs.org/mutnomen), using a GenBank or Locus Reference Genomic (LRG) accession number, an HGNC gene symbol and the mutation as input. Mutalyzer 2 has new functionality lacking in the commercial Alamut tool used by many DNA diagnostic labs. Mutalyzer generates an output containing a description of the sequence variant at DNA level, the effect on all annotated transcripts, its deduced outcome at protein level and gains or losses of restriction enzyme recognition sites. Mutalyzer facilitates batch-wise conversion from dbSNP rsIDs or chromosomal position numbering used in next generation sequencing data to transcript position numbering, as well as checking of sequence variants in locus-specific mutation databases (LSDBs). Mutalyzer is also used to quickly check new variant submissions in LSDBs based on LOVD software (www.LOVD.nl/). The new Name Generator can be used to train yourself to generate correct HGVS descriptions. New web services also facilitate the use of Mutalyzer’s functionality from other computer programs.

Funded in part by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement nº 200754 - the GEN2PHEN project.
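
As a minimal illustration of the kind of description such a checker handles, the sketch below splits a simple HGVS coding-DNA substitution into its parts. It is not Mutalyzer's parser or web-service API: the regular expression is a deliberately narrow assumption covering only substitutions of the form accession:c.positionRef>Alt, and the accession in the example is a placeholder, not a real record.

```python
import re

# Hypothetical, deliberately narrow pattern: simple coding-DNA substitutions only.
HGVS_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):c\."
    r"(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(description):
    """Return the parts of a simple HGVS substitution, or None if it does not match."""
    match = HGVS_SUBSTITUTION.match(description)
    if match is None:
        return None
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    return parts


# Placeholder accession, used for illustration only.
example = parse_substitution("NM_000000.1:c.274G>T")
```

A description without a reference sequence, such as "c.274G>T", is rejected by this sketch, which mirrors the point made above that an accession number is part of the required input.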

Poster A36

Penalized regression elucidates aberration hotspots mediating subtype-specific transcriptional responses in breast cancer

Yinyin Yuan Cambridge Research Institute

Yinyin Yuan (Cancer Research UK, Cambridge Research Institute); Oscar Rueda (Cancer Research UK, Cambridge Research Institute); Christina Curtis (Cancer Research UK, Cambridge Research Institute); Florian Markowetz (Cancer Research UK, Cambridge Research Institute);

Short Abstract: This poster is based on Proceedings Submission 91.
Copy number alterations (CNAs) associated with cancer are known to contribute to genomic instability and gene deregulation. Integrating copy-number data with gene expression helps to elucidate the mechanisms by which CNAs act and to identify the transcriptional downstream targets of copy-number changes. Such analyses can help to sort functional driver events from the many accompanying passenger alterations. However, the way CNAs affect gene expression can vary between different cellular contexts, for example between healthy tissue and tumour or between different subtypes of the same cancer. Thus it is important to develop computational approaches capable of inferring differential connectivity of regulatory networks in different cellular contexts.
We propose a statistical deregulation model that integrates copy-number and expression data of different disease subtypes to jointly model common and differential regulatory relationships. Our model not only identifies copy-number alterations driving gene expression changes, but at the same time also predicts differences in regulation that distinguish one cancer subtype from the other. We implement our model in a penalized regression framework and demonstrate in a simulation study the feasibility and accuracy of our approach. On a real breast cancer dataset we show that we can identify both known and novel aspects of pathway deregulation in ER positive versus negative disease as well as crosstalk between pathways.
Availability: The Bioconductor-compliant R package DANCE is available from www.markowetzlab.org/software/
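
The idea of jointly modelling common and subtype-specific effects in a penalized regression can be sketched as follows. This is not the DANCE implementation: it assumes a plain L1 (lasso) penalty fitted by coordinate descent, one copy-number predictor, and a 0/1 subtype indicator whose product with copy number carries the differential effect. All names and the toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty (the lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(features, response, penalty, sweeps=200):
    """Coordinate descent for (1/2n) * ||y - Xb||^2 + penalty * ||b||_1."""
    n, p = len(features), len(features[0])
    columns = [[row[j] for row in features] for j in range(p)]
    coef = [0.0] * p
    residual = list(response)
    for _ in range(sweeps):
        for j, column in enumerate(columns):
            norm = sum(v * v for v in column) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the residual that excludes j's own fit.
            rho = sum(column[i] * (residual[i] + column[i] * coef[j]) for i in range(n)) / n
            new = soft_threshold(rho, penalty) / norm
            delta = new - coef[j]
            for i in range(n):
                residual[i] -= column[i] * delta
            coef[j] = new
    return coef


# Toy data (illustrative only): expression = 1 * copy_number + 2 * copy_number * subtype.
copy_number = [-1, 0, 1, 2, -1, 0, 1, 2]
subtype = [0, 0, 0, 0, 1, 1, 1, 1]
features = [[c, c * s] for c, s in zip(copy_number, subtype)]
response = [c + 2 * c * s for c, s in zip(copy_number, subtype)]

# coefficients[0] is the common effect, coefficients[1] the subtype-specific effect.
coefficients = lasso(features, response, penalty=0.01)
```

With a small penalty the two coefficients land close to the generating values, while a large penalty drives both to zero, which is how the penalty selects a sparse set of regulators.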

Poster A37

AllergenNet: a database applied to the study of allergenic proteins

Helen Arcuri University of Sao Paulo

Jorge Kalil (University of Sao Paulo, Division of Clinical Immunology and Allergy); Mario Palma (Sao Paulo State University, CEIS - Department Biology); Fabio Castro (University of Sao Paulo, Division of Clinical Immunology and Allergy);

Short Abstract: AllergenNet is a relational database for the study of allergenic proteins, freely available on the Web. Extensive information is recorded for each allergen, including a description of the allergen (source, symptom, etc.), biochemical properties, clinical data, information about IgE epitopes, primary sequence, structure and functional studies, and literature references. The source of allergens is the IUIS list (http://www.allergen.org); several files are available for download, with links to reference databases (PDB, UniprotKB, GenBank, NEWT, PROSITE, PFAM). The current database is still in its pilot version; the data are updated regularly with new allergens or information, and the database will be completed in June 2011 with about 686 allergens and their respective isoallergens. For allergenic proteins whose structures had not been resolved experimentally, we used large-scale homology modeling with the MODELLER program to construct 3D models. The BLASTP program was used to perform the alignments, and all 3D models were subjected to molecular dynamics using the Gromacs program. The overall stereochemical quality of the 3D models was assessed with the program PROCHECK, the G-factor from the CCP4 package and the RMSD from ideal geometry from the XPLOR program. The 3D models can be viewed with Jmol. All optimization was performed on a Beowulf cluster with 16 nodes, using CUDA GPU technology from Nvidia when possible. Plans for further development include adding computational tools for the study and characterization of new allergenic proteins.

Poster A38

Integrative analysis of expression and copy number profiles to find causal genes in cancer

Celia Fontanillo Consejo Superior de Investigaciones Cientificas (CSIC)

Maria Ortiz-Estevez (CEIT and TECNUN (University of Navarra), Department of Electronics and Communication); Angel Rubio (CEIT and TECNUN (University of Navarra), Department of Electronics and Communication); Javier De Las Rivas (Consejo Superior de Investigaciones Cientificas (CSIC), Bioinformatics and Functional Genomics Group);

Short Abstract: Traditionally, DNA copy number aberrations and gene expression profiles have been used to find and study potential target genes in complex diseases such as cancer. Recent reports have combined these two types of data using different strategies, but have focused mainly on finding gene-based relationships. By contrast, we undertake an integrative analysis of expression and copy number profiles based on location along the whole genome. Our approach takes genomic loci into account not only for the analysis of copy number, but also for gene expression. To achieve this we apply segmentation algorithms in a similar way to both types of data. The method searches for optimal correlation between segmented chromosomal copy number regions that present a significant copy number aberration (CNA) (gain/loss) and segmented gene expression with significant regulation (up/down). In this way, it determines which CNAs have a strong influence on expression patterns. We propose that these regions constitute a key starting point for finding causal genes that drive the alteration of other genes with similar expression profiles. To test our hypothesis we have applied the method to a data set of genomic alterations and gene expression profiles of 158 Glioblastoma Multiforme (GBM) samples from patients. We use a reference set of known genes in GBM to test the approach, and we analyse the functional associations among the causal genes found and the pathological features reported for this disease.

Poster A39

GUILDify: A web server for phenotypic characterization of genes by means of protein-protein interactions and graph algorithms

Emre Guney Pompeu Fabra University

Baldo Oliva (Pompeu Fabra University, Experimental and Health Sciences); Javier García-García (Pompeu Fabra University, Experimental and Health Sciences);

Short Abstract: Our knowledge of sequence variation in human DNA has expanded substantially over the past few decades, as genome sequencing has become increasingly available at lower cost. Although these sequence variations boost genetic diversity, some of them have severe consequences for human health. Recently, genome-wide association studies have proved useful for identifying a handful of genes involved in the pathologies of various diseases (such as Breast Cancer and Alzheimer Disease). Nevertheless, determining such genetic factors is hindered by the complex nature of these disorders, which involve products of multiple genes acting cooperatively. Following the emergence of high-throughput interaction detection experiments, approaches based on protein interactions have recently been developed to prioritize candidate disease-genes. These methods rely on the proximity of a gene in the interaction network to other known disease-genes. GUILD (Genes Underlying Inheritance Linked Disorders) is a novel framework that implements 5 different algorithms (NetScore, NetZcore, NetShort, Functional Flow and PageRank with Priors) to prioritize genes potentially involved in diseases using a priori gene-disease associations and protein-protein interactions. Considering the lack of convenient interfaces that bridge many of these algorithms to end users, we present GUILDify, an easy-to-use web server that assigns genes likelihood scores of involvement for a given keyword (e.g. disease phenotype, functional annotation or, in broader terms, any phenotypic association) using integrated data from major publicly available biological data repositories. Using GUILDify we investigate genes involved in neurodegenerative disorders and identify genetic determinants shared in the pathways of these disorders.
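
Of the algorithms listed above, PageRank with Priors is the simplest to sketch. The code below is a generic power-iteration version on an undirected toy network, not GUILD's or GUILDify's implementation; the damping factor, iteration count and node labels (G1 to G4) are assumptions chosen for illustration, and every node is assumed to have at least one neighbour.

```python
def pagerank_with_priors(neighbors, seeds, damping=0.85, iterations=100):
    """Score nodes of an undirected network by proximity to the seed nodes.

    neighbors maps each node to the list of nodes it interacts with;
    seeds is the set of nodes with prior disease association.
    """
    nodes = sorted(neighbors)
    prior = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in nodes}
    score = dict(prior)
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour passes on an equal share of its current score.
            incoming = sum(score[other] / len(neighbors[other]) for other in neighbors[node])
            # Restart at the seeds with probability (1 - damping).
            updated[node] = (1.0 - damping) * prior[node] + damping * incoming
        score = updated
    return score


# Hypothetical chain of interactions: G1 - G2 - G3 - G4, with G1 a known disease gene.
network = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2", "G4"], "G4": ["G3"]}
scores = pagerank_with_priors(network, seeds={"G1"})
candidates = sorted((n for n in network if n != "G1"), key=scores.get, reverse=True)
```

In this toy chain the candidate ranking follows network distance from the seed, which is the proximity principle these prioritization methods rely on.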

Poster A40

Connection patterns in protein-protein interaction network reveal the phenotypic relationship among human diseases

Young-Eun Shin Pohang University of Science and Technology

Solip Park (Pohang University of Science and Technology, School of Interdisciplinary Bioscience and Bioengineering); Jae-Seong Yang (Pohang University of Science and Technology, School of Interdisciplinary Bioscience and Bioengineering); Jinho Kim (Pohang University of Science and Technology, Division of Molecular and Life Sciences); Sanguk Kim (Pohang University of Science and Technology, Division of IT Convergence Engineering);

Short Abstract: Human diseases are often associated with each other through a similar phenotype, or co-occur in the same individual as comorbid diseases. Molecular connections between human diseases have been under intense investigation and have been found to take the form of shared genes, protein-protein interactions (PPIs), co-expression, or co-localization of disease-associated proteins. These connections form disease modules composed of proteins associated with phenotypically similar or identical diseases. Thus, the network connections of disease modules are crucial for establishing the genotype-to-phenotype relationships of human diseases. Here, we investigated the connection patterns occurring in the network of disease-associated proteins and found that the weight of the molecular connection explained the phenotypic relationships of disease-associated protein pairs. Furthermore, connection patterns represent characteristics of each disease module. For example, disease-associated protein pairs in the dermatological disease class tend to have tight connections, whereas pairs in the metabolic disease class are often linked by loose connections. As a consequence, dermatological diseases tend to have high comorbid relationships, while metabolic diseases show relatively low comorbid relationships within the disease module. Thus, connection patterns in the network provide insights into phenotypic relationships among disease modules, which is important for the understanding of disease progression, diagnosis and prevention.

Poster A41

Bioinformatics for large scale characterization and tracking of gene-modified hematopoietic stem cells

Uwe Appelt National Center for Tumor Diseases and German Cancer Research Center

Manfred Schmidt (National Center for Tumor Diseases and German Cancer Research Center); Anne Arens (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology); Cynthia Bartholomae (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology); Frank Giordano (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology); Richard Gabriel (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology); Derek Gustafson (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology); Annette Deichmann (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology); Hanno Glimm (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology); Stephanie Laufs (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology); Christof von Kalle (National Center for Tumor Diseases and German Cancer Research Center, Department of Translational Oncology);

Short Abstract: An emerging option in the treatment of monogenetic immune diseases is the use of integrating viral vectors to transfer “healthy” genes into hematopoietic stem cells. The resulting gene-corrected cell populations are able to correct CGD, Thalassemia, SCIDs, and WAS. In this context, clonal tracking of viral integration sites (IS) represents a convincing approach to assess vector biosafety and to monitor the fate of individual gene-corrected cells. Today, massive amounts of genome-vector junctions can be enriched and sequenced in a high-throughput fashion by LM-/LAM-PCR and Roche's GS FLX Titanium technology. The resulting sequences have to be processed such that i) PCR- and vector-specific sequences are removed, ii) the remainders are aligned to a target genome and iii) the obtained IS are characterized by identification of surrounding genomic features. We provide web-based access to two of our strategies for comprehensive bioinformatic analysis of NGS-derived IS and the corresponding clonality aspects.
QuickMap (http://www.gtsg.org) identifies IS with BLAT, then employs Ensembl annotations to very rapidly process even huge numbers of input sequences and finally derive insertion site profiles. To reveal insertion site preferences, the resulting profiles are then statistically compared to those derived from a random set consisting of 106 in-silico IS. HISAP (http://hisap.nct-heidelberg.de/HISAP) incorporates BLAT to locate the IS and annotations obtained from the UCSC genome browser. It can process numerous samples in parallel by condensing the size of the resulting table to one entry per IS. This is achieved by clustering sequences and IS at different stages.
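
Condensing mapped reads to one entry per IS can be pictured with the sketch below. It is not the HISAP or QuickMap algorithm: it assumes that aligned reads are given as (chromosome, position) pairs and that positions within a small, hypothetical window of the previous read on the same chromosome belong to the same IS.

```python
def cluster_integration_sites(hits, window=5):
    """Merge nearby alignment positions into one entry per integration site.

    hits is an iterable of (chromosome, position) pairs. The result is a list
    of (chromosome, representative_position, read_count) tuples, where the
    representative is the smallest position in the cluster.
    """
    clusters = []
    for chromosome, position in sorted(hits):
        last = clusters[-1] if clusters else None
        if last and last["chromosome"] == chromosome and position - last["end"] <= window:
            last["positions"].append(position)
            last["end"] = position
        else:
            clusters.append({"chromosome": chromosome, "positions": [position], "end": position})
    return [(c["chromosome"], min(c["positions"]), len(c["positions"])) for c in clusters]


# Illustrative alignment hits, not real data.
hits = [("chr1", 102), ("chr1", 100), ("chr2", 101), ("chr1", 500)]
sites = cluster_integration_sites(hits)
```

The read count per site gives a rough handle on clonal contribution, though a real pipeline would also have to account for strand, sample and PCR amplification bias.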

Poster A42

Detecting Intrahost Emergence of Recombinant Viruses within HIV-1 infections using Ultra-Deep Sequencing

Felix Feyertag University of Manchester

John Archer (University of Manchester, Faculty of Life Sciences); Andrew Rambaut (University of Edinburgh, Institute of Evolutionary Biology); David L. Robertson (University of Manchester, Faculty of Life Sciences);

Short Abstract: Sequencing platforms, such as those developed by 454 Life Sciences and Illumina, have opened up unprecedented opportunities to investigate viral diversity and evolution within hosts. Of particular interest is the detection of intrapatient viral recombination within temporally sampled data subjected to differing selection pressures. Such recombination may play an important role in changes in viral phenotype, which ultimately determines viral fitness within particular selective environments, e.g., in the presence of drugs. Using 454 data, amplified from the gp120 region of the HIV-1 genome, we previously characterized HIV-1 populations sampled over five time points from two individuals who had undergone therapy with a CCR5 antagonist (PLoS Comput Biol 6(12): e1001022). In these infections, low-frequency CXCR4-using variants (2.5-15%) present prior to therapy resulted in the emergence of a distinct drug-resistant lineage. Here we investigate the extent to which this lineage is maintained outside of the V3 region and the role that recombination plays in its emergence. Our analysis framework, Segminator, has been extended to facilitate the detection of intrapatient recombinants through the implementation of a dynamic programming algorithm based on Recco. Results suggest that polymorphisms defining the lineage extend either side of the V3 region and that recombination events are detectable. In conclusion, we present a framework for the identification of recombination breakpoints within next generation data and for the mapping of these breakpoints onto lineages. This will have implications for the understanding of intrahost evolution and the emergence of drug resistance.

Poster A44

Large-scale protein flexibility analysis of single nucleotide polymorphisms, using molecular dynamics simulations.

Marc Offman Technical University of Munich

Burkhard Rost (Technical University of Munich, Fakultät für Informatik, I12, Chair of Prof. Rost);

Short Abstract: Proteins are intrinsically flexible molecules, and function is therefore often associated with flexibility. Experimental methods to determine protein flexibility are expensive and often time consuming. Over the past few years, molecular dynamics (MD) simulations have increasingly proved to be an efficient complementary method and a powerful tool for obtaining information on protein dynamics. We have recently shown that careful biology-driven MD simulations can be used to predict the impact of single amino acid mutations on protein flexibility and function, at a level of accuracy comparable to experimental techniques. The question remains whether it will be possible to fully automate this process in the context of a large-scale analysis, and to what extent additional structural information, beyond that derived by sequence analysis of single nucleotide polymorphisms (SNPs) only, is useful. For this we created several comprehensive datasets of non-synonymous SNPs mapped to high-resolution, above-average-quality crystal structures from the PDB. In the context of the European SCALALIFE project, up to 28,000 different mutations found in 1,600 individual crystal structures are simulated in duplicate for 10 ns each, using the GROMACS package. A comprehensive analysis pipeline has been established, investigating protein flexibility and stability, alteration of hydrogen-bond networks, active site integrity, changes in global and local energy and other structural effects. This pipeline has previously been successfully applied in the context of clinically relevant proteins. The results of this study, the automatic protocol and the set of analysis tools will help in the future to understand individual phenotypes in clinical contexts.

Poster A45

Copy Number Variant for Coronary Artery Disease and Hyperlipidemia

Tienhsiung Ku Changhua Christian Hospital

Wei-Chung Shia (Feng Chia University, Department of computer science); FangRong Hsu (Feng Chia University, Department of computer science); Yung-Ming Chang (Changhua Christian Hospital, Department of cardiology); Chien-Hsun Hsia (Changhua Christian Hospital, Department of cardiology); Chin-Hui Hung (Changhua Christian Hospital, Department of cardiology); Yeh-Ching Chung (National Tsing Hua University, Department of computer science); Chuan Yi Tang (Providence University, Department of computer science);

Short Abstract: Copy number variants (CNVs) have been found to be relevant to several diseases, such as schizophrenia. Coronary artery disease is one of the major causes of mortality and is thought to be of multifactorial origin. Hyperlipidemia has been related to coronary artery disease and is also thought to be a genetic disorder. Several studies of genetic factors for coronary artery disease and hyperlipidemia have been done, but no study has focused on patients who suffer from both coronary artery disease and hyperlipidemia. We seek copy number variants associated with coronary artery disease and hyperlipidemia, and hope to elucidate the mechanism of this specific type of disease.

With IRB approval and patients’ informed consent, genomic studies using the Affymetrix Human Genome 6.0 SNP array were conducted on 31 patients with coronary artery disease confirmed by coronary artery catheterization. Those patients also have a history of hyperlipidemia. Common copy number variant regions were screened from the Database of Genomic Variants (DGV). Array data from HAPMAP phase II (CHB+JPT) were used as the control group. Some significant CNV regions were validated with the ABI qPCR CNV assay. Statistical significance was assessed with Fisher's exact test, with a P-value < 0.05 considered significant.
We found disease-specific CNVRs in 5 genes and 9 exons. Duplication polymorphisms are on chromosomes 1, 10 and 21. Deletion polymorphisms are on chromosomes 6 and 12. We hope to further investigate the relationship between disease and CNV in the future.
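
The significance test named above can be written out directly for a 2x2 table of CNV carriers versus non-carriers in cases and controls. The sketch below is a standard two-sided Fisher's exact test using only the Python standard library; the counts in the example are invented for illustration and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be cases and controls; columns CNV carriers and non-carriers.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def probability(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    return sum(
        probability(x)
        for x in range(low, high + 1)
        if probability(x) <= observed * (1 + 1e-9)
    )


# Invented counts: 8 of 10 cases and 1 of 6 controls carry the variant.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
significant = p_value < 0.05
```

The small tolerance factor in the comparison guards against floating-point ties between equally probable tables.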

Poster A46

Structural comparison of Erwinase and E. coli L-Asparaginase to facilitate rational engineering of a cancer drug

Mainá Bitar Technische Universität München

Edda Kloppmann (Technische Universität München, Department for Bioinformatics and Computational Biology); Marc Offman (Technische Universität München, Department for Bioinformatics and Computational Biology); Burkhard Rost (Technische Universität München, Department for Bioinformatics and Computational Biology);

Short Abstract: L-asparaginase (L-ASNase) is an enzyme with antileukemic properties that has been widely employed in the treatment of acute lymphoblastic leukemia (ALL) since the 1970s. It hydrolyzes L-asparagine to L-aspartate and ammonia. Malignant lymphoblasts have low levels of asparagine synthetase and therefore undergo apoptosis once exogenous asparagine is depleted from the blood. Three different L-ASNases are clinically available: one from Erwinia chrysanthemi (Era), and one native and one pegylated enzyme from Escherichia coli (Eca). Although treatment is effective, some patients fail to maintain therapeutic levels of the Eca drug due to silent inactivation. Furthermore, the bacterial origin of the protein can lead to hypersensitivity. Previously, Eca mutant proteins with the potential to reduce antigenicity and toxicity have been successfully generated. Since Era has decreased antigenicity, it is a good candidate for further protein engineering. This work aims to assess similarities and differences between Eca and Era that can assist the engineering of a protein with increased activity and decreased toxicity. Two additional targets for protein engineering are the residues related to glutaminase activity, which is responsible for neurotoxic side effects, and features involved in protein half-life. Molecular dynamics simulations are performed and coupled with consensus modes analysis of protein structures. In addition, a previously developed molecular engineering protocol is employed. This is a step towards addressing the molecular mechanisms of L-ASNase protein dynamics and function, elucidating the basis of specific side effects arising from its use in ALL treatment.

Poster A47

Inhaled Toxicant Interactions with Transcriptional Regulatory Networks in Frontline Lung Cells

George Acquaah-Mensah Massachusetts College of Pharmacy and Health Sciences

Short Abstract: The incidence of lung diseases has been rising in recent times. Lung epithelial cells are frontline cells for inhaled toxicants. Thus an examination of transcriptional regulatory relationships therein could provide insights regarding the perturbations introduced by toxicants. In this study, transcriptional regulatory networks were reverse-engineered from lung epithelium microarrays GDS534, GDS999, GDS2604, and GDS2486 obtained from the Gene Expression Omnibus repository. Algorithms employing mutual information, including the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) and Context Likelihood of Relatedness (CLR), were used. The networks were subsequently merged with information mined from the Comparative Toxicogenomics Database, a manually curated resource that highlights established interactions between chemicals, genes, and diseases. Furthermore, data from ACToR, the United States Environmental Protection Agency’s online chemical toxicity warehouse, were used to superimpose publicly available toxicity data. Interactions involving inhaled toxicants and highly connected nodes within the transcriptional regulatory networks (e.g. CDKN2A, NME1, NPM1, TBX5, TGFB1 and TP53) indicated that specific signaling pathways are impacted by contact with these toxicants. Asbestos, an inhaled toxicant, increases the methylation of the CDKN2A promoter, increases the expression and activity of TGFB1 and TP53, and alters the expression of TBX5. Another inhaled toxicant, arsenic, alters methylation of the CDKN2A and TP53 promoters and increases the expression of CDKN2A, NME1 and NPM1 while decreasing the expression of TGFB1 and TP53. Toxicant effects on these regulatory networks elucidate their associations not only with lung cancer but also with other lung diseases such as Chronic Obstructive Pulmonary Disease.

Poster A48

Multi-Harmony: detecting functional specificity in HIV differential replication rates

K. Anton Feenstra IBIVU/Free University Amsterdam

Esther F. Gijsbers (Amsterdam Medical Center (AMC), Department of Experimental Immunology); Ad C. van Nuenen (Amsterdam Medical Center (AMC), Department of Experimental Immunology); Neeltje A. Kootstra (Amsterdam Medical Center (AMC), Department of Experimental Immunology); Jaap Heringa (IBIVU/Free University Amsterdam, Computer Science);

Short Abstract: Motivation: Immunotype-dependent 'escape' mutations and subsequent compensatory mutations account for a large part of the variability in disease progression of HIV-infected patients. We attempt to track the appearance of these mutations in the course of disease progression by comparing HIV capsid sequences between progressing and non-progressing (HLA-B57 immunotype) patients. A small number of amino acids generally determine this functional diversity. The identification of these residues can aid the understanding of HIV-related disease progression and help in finding target sites for experimental analysis.
Materials and Methods: Using our multi-Harmony method, we compared B57 with non-B57 HIV capsid protein sequences, and early-stage with late-stage B57 patients, without and with disease progression. Of the 14 sites thus identified as related to HLA B57-specific disease progression, seven were selected for in-vitro testing. Viral replication was measured over 18 days in PBMC and U87 cell-lines; in addition, competition assays were performed for selected pairs of viruses in PBMC.
Results: The general trend in the replication rates is that the virus containing the mutations that are found in all B57 patients has the lowest replication rates. This is consistent with escape mutations, which are a trade-off between evading immune pressure and maintaining viability; a few of these sites reside in known B57 epitopes. Additional mutations, which are specific to the B57 sequences as a group, confer improved replication rates, consistent with subsequent compensatory mutations.
Conclusion: We show that, using the multi-Harmony specificity detection tool, we can identify known HLA-B57 epitopes as well as previously unknown sites in the HIV capsid protein that, when mutated, significantly influence in-vitro replication rates.

Poster A49

Data Based Identification of Estimation Models for Tumor Markers Using Evolutionary Computation

Stephan Winkler Upper Austria University of Applied Sciences

Michael Affenzeller (Upper Austria University of Applied Sciences, Heuristic and Evolutionary Algorithms Laboratory); Gabriel Kronberger (Upper Austria University of Applied Sciences, Heuristic and Evolutionary Algorithms Laboratory); Michael Kommenda (Upper Austria University of Applied Sciences, Heuristic and Evolutionary Algorithms Laboratory); Stefan Wagner (Upper Austria University of Applied Sciences, Heuristic and Evolutionary Algorithms Laboratory); Witold Jacak (Upper Austria University of Applied Sciences, Software Engineering); Herbert Stekel (General Hospital Linz, Central Laboratory);

Short Abstract: As measuring tumor markers is often expensive, we have applied several data based approaches for identifying mathematical models for estimating selected tumor markers on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified. The documented tumor marker values are classified as “normal” or “elevated” based on limits known from the literature. Our goal has been to design data based binary classifiers, i.e., to use machine learning for creating mathematical models that classify blood measurement samples correctly (as “normal” or “elevated” with respect to the selected tumor marker) using only standard blood parameters as input.
Blood data measured for more than 20,000 patients in the years 2005 - 2008 have been compiled in a database storing basic information, standard blood parameters, and tumor marker values. For each tumor marker we have preprocessed the data, selected samples with available tumor marker target values and thus compiled 6 independent data sets (consisting of around 400 to 5000 samples) with differing rates of samples representing class “normal” (48.39% - 77.89%).
The following test accuracies could be achieved using several machine learning methods (linear regression, k-nearest-neighbor, artificial neural networks, support vector machines, genetic programming): AFP 85.48%, CA 125 68.10%, CA 15-3 73.02%, CEA 68.28%, CYFRA 73.03%, PSA 67.46%.
For executing machine learning tests and optimizing variable selections and modeling parameters we have used genetic algorithms with strict offspring selection implemented in HeuristicLab, an open source framework for heuristic optimization (http://dev.heuristiclab.com/).

Poster A50

Models for reverse engineering of transcriptional networks in Yersinia enterocolitica infection

Olivia Prazeres da Costa Technical University of Munich

Short Abstract: Understanding the molecular mechanisms of a transcriptional network is a process under research and requires the construction of detailed circuit diagrams. Reverse engineering of a transcriptional network aims at revealing the underlying structure of a biological system by reasoning backwards from microarray data. Reverse engineering of networks identifies transcriptional interactions between gene products and has successfully been applied to knock-out and time series experiments (He et al., 2009). Here, we calculate an interaction score of bacterial infections and stimulations as single conditions and the combination of two conditions for each gene, to investigate the interaction state of the conditions. Two conditions can act either independently or cooperatively on the expression of genes.
This method defines for interacting conditions A and B three possible states of the genes: 1) the activation from A alone (termed as A component); 2) the activation from B alone (termed as B component); 3) the activation that results from the interaction between A and B (termed as A+B component). We calculate the component values for each gene by linear regression using the expression components measured in each microarray. The two conditions A and B act fully cooperatively, independently or partially cooperatively to regulate gene expression.
Thus, this method will provide new insights into the transcriptional response of the host system to bacterial infection through the identification of statistical dependencies between potential functional associations. The application of this analysis corroborates in silico results and hence guarantees the provision of appropriate candidate genes for further experimental characterization.
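
The decomposition into A, B and A+B components can be illustrated with a small sketch. The abstract does not give the design matrix, so this assumes a saturated linear model per gene (baseline, A effect, B effect, interaction) over four groups of arrays, for which the least-squares fit reduces to differences of group means. The function name and expression values are hypothetical.

```python
from statistics import mean


def interaction_components(control, only_a, only_b, both):
    """Least-squares components of the model: base + A + B + interaction.

    Each argument is a list of expression values for one gene under one
    condition (untreated, A alone, B alone, A and B combined).
    """
    base = mean(control)
    a_component = mean(only_a) - base
    b_component = mean(only_b) - base
    # What remains under the combined condition after adding the single effects.
    ab_component = mean(both) - base - a_component - b_component
    return a_component, b_component, ab_component


# Hypothetical log-expression values for two genes.
independent_gene = interaction_components([1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0])
cooperative_gene = interaction_components([1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [6.0, 6.0])
```

An A+B component of zero corresponds to the two conditions acting independently on that gene, while a non-zero value indicates some degree of cooperation.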

Poster A51

A framework for whole-exome deep sequencing data management and analysis

Angelo Nuzzo University of Pavia

Ivan Limongelli (University of Pavia, Computer Engineering and System Science); Annalisa Vetro (University of Pavia, Biology and Medical Genetics); Roberto Ciccone (University of Pavia, Biology and Medical Genetics); Orsetta Zuffardi (University of Pavia, Biology and Medical Genetics); Riccardo Bellazzi (University of Pavia, Computer Engineering and Systems Science);

Short Abstract: Using next generation sequencing (NGS) technologies for targeted sequencing of all protein-coding regions (“exomes”) has been shown to reduce costs while enriching for discovery of highly penetrant non-synonymous variants. The overwhelming amount of data produced by each experiment needs a suitable IT infrastructure for effective data management and analysis. We are developing a comprehensive framework to collect whole-exome experimental data and implementing an effective way of managing, exploring and analyzing data across the main steps of such studies, which typically consist of: i) extraction of sequence reads from raw data, ii) read alignment and variant (SNP/indel) calling, iii) variant annotation exploiting publicly available knowledge bases and iv) evaluation of results according to the specific experiment. We have already developed and validated the automated pipelines and modules needed to accomplish the first three tasks of this workflow. We are currently designing and developing a database infrastructure to collect data from several different experiments. This module allows the user both to explore data across individuals within an experiment, in order to identify case/control discriminant variants, and to merge cross-experimental data, in order to exploit results of a previous experiment as an additional control (or independent) dataset for newer studies. A web-browsing interface is provided for easier usability. The availability of such an integrated framework is crucial for better usability of the computationally demanding software required by NGS technologies and effectively supports researchers in focusing their reasoning on biological interpretation of sequencing results.

Poster A52

Genome-Wide Association Study of TB in the South African Colored Population: Comprehensive Gene and pathway-based Association Study

Emile RUGAMIKA CHIMUSA University of Cape Town

Eileen Hoal (Stellenbosch University, Molecular Biology and Human Genetics, DST/NRF Centre of Excellence for Biomedical TB Research, Faculty of Health Sciences); Marlo Möller (Stellenbosch University, Molecular Biology and Human Genetics, MRC Centre for Molecular and Cellular Biology, DST/NRF Centre of Excellence for Biomedical TB Research, Faculty of Health Sciences); Nicola Mulder (University of Cape Town, Clinical Laboratory Sciences, Computational Biology Group);

Short Abstract: The Coloured population in South Africa is uniquely admixed, with ancestry from various populations including African, European and Malaysian descent; this mixed population also has the highest incidence of tuberculosis (TB) in sub-Saharan Africa. Understanding the genetic basis of TB susceptibility is critical for informing the development of novel interventions. Because of the complex nature of the immune system and the polygenic nature of TB, incorporating both the association signal from a genome-wide association study (GWAS) and the available human protein-protein interaction information, to test the combined effects of SNPs and to search for significantly enriched sub-networks, provides increased evidence to elucidate genetic susceptibility to such a complex disease. We conducted the first SNP-based, comprehensive gene- and pathway-based genome-wide association study for TB risk on 692 cases and 91 controls from the South African Coloured population. Through this new paradigm for GWAS, we found evidence of significant association of the CYP2C9 (cytochrome P450, family 2, sub-family C; 10q23.33, p=9.0018e-09) and DGKH (diacylglycerol kinase eta; 13q14.11, p=2.7852e-08) genes with susceptibility to TB. Our comprehensive gene- and pathway-based association analysis revealed 52 highly significant TB-related candidate genes and 11 TB-related canonical pathways. These results demonstrate that combining SNP-based, gene-based and pathway-based association in complex diseases such as TB provides increased evidence for identifying novel genes in which each single SNP confers a small disease risk. Our results also show convergence of the genetic signals on novel sub-networks of the human interactome, enriched with interesting TB-related biological pathways.

Poster A53

International Cancer Genome Consortium Data Portal: one-stop shop for cancer genomic data

Arek Kasprzyk Ontario Institute for Cancer Research

Junjun Zhang (Ontario Institute for Cancer Research, Informatics and Bio-computing); Joachim Baran (Ontario Institute for Cancer Research, Informatics and Bio-computing); Anthony Cros (Ontario Institute for Cancer Research, Informatics and Bio-computing); Jonathan Guberman (Ontario Institute for Cancer Research, Informatics and Bio-computing); Syed Haider (University of Cambridge, Computer Laboratory); Jack Hsu (Ontario Institute for Cancer Research, Informatics and Bio-computing); Yong Liang (Ontario Institute for Cancer Research, Informatics and Bio-computing); Elena Rivkin (Ontario Institute for Cancer Research, Informatics and Bio-computing); Jianxin Wang (Ontario Institute for Cancer Research, Informatics and Bio-computing); Long Yao (Ontario Institute for Cancer Research, Informatics and Bio-computing);

Short Abstract: The International Cancer Genome Consortium (ICGC) Data Portal (http://dcc.icgc.org) provides access to genomic, transcriptomic, epigenomic, and clinical data generated by the major cancer sequencing projects including the ICGC, The Cancer Genome Atlas (TCGA), Tumor Sequencing Project (TSP), and Johns Hopkins University.
The numerous ICGC centers around the globe generate their data independently and deposit it into their local databases. These databases are federated using the BioMart technology (http://www.biomart.org) and made available to the public through the web-based ICGC Data Portal. To facilitate integrative functional analyses, experimental data are presented in the context of diverse annotations federated from a variety of public resources including Ensembl, KEGG, Reactome, COSMIC, Breast Cancer Expression Database, and the Pancreatic Expression Database.
Users can query the data by genes, samples, mutations, expression, or methylation. In all cases, users can refine their searches by selecting from diverse query conditions (filters) and output options (attributes), thereby retrieving data based on different molecular and clinical covariates. Data can also be accessed programmatically using several methods, including Java, REST, and SOAP APIs. Available tools allow users to view the most commonly affected genes or pathways, view key findings about selected genes, or compare mutation patterns between patients with different clinicopathologic characteristics.
The portal allows users to conveniently retrieve, analyze, and compare cancer data from different sources through a single point of access. As more ICGC data are added and the portal's functionality is further enhanced, it is expected to become an increasingly important resource for the scientific community.

Poster A54

Hunting cancer predisposition genes by whole exome sequencing in relatives from affected family

Tatiana Popova Institute Curie

Virginie Jaquemin (Institute Curie, Centre de Recherche, INSERM U830); Severine Lair (Institute Curie, Centre de Recherche, INSERM U900); Claire Jubin (Institute Curie, Centre de Recherche); Romain Daveau (Institute Curie, Centre de Recherche, INSERM U830); Emmanuel Barillot (Institute Curie, Centre de Recherche, INSERM U900); Alain Nicolas (Institute Curie, Centre de Recherche, CNRS, UMR3244); Dominique Stoppa-Lyonnet (Institute Curie, Department of Tumor Biology, INSERM U830); Marc-Henri Stern (Institute Curie, Centre de Recherche);

Short Abstract: Families with a high incidence of cancer in which no mutations are found in the known cancer predisposition genes are good candidates in which to look for new cancer-related genes. Next-generation sequencing now gives a unique opportunity to scan all exons, while the availability of a valuable description of normal variation (dbSNP, 1000 Genomes) provides the basis for filtering out common variants. The general strategy of the study consists of extracting the non-synonymous exonic variation shared by several relatives with a similar cancer phenotype and narrowing the localization of possible causative gene variants to genomic regions that are (1) identical by descent (IBD) and (2) show loss of heterozygosity in the tumor genomes. This strategy follows the classical picture of a tumor suppressor gene. Bioinformatics challenges accompany each stage of the data analysis workflow. We present a case study with special emphasis on the possible problems and limitations that arise at each step.
A family with multiple incidences of cancer at various sites was considered. Whole-exome sequencing (SOLiD, paired-end, v.4) was performed on two affected first-degree cousins; constitutional Affymetrix SNP 6.0 data were also available. The data analysis workflow includes alignment to the whole genome; exon coverage estimation; calibration of SNP calls and locus coverage to reproduce IBD regions detected previously by SNP arrays; small indel calling; scanning exonic multi-alignment regions for false positive SNP and indel calls; analysis of shared rare variants present in non-IBD regions; classification of SNPs according to predicted deleterious effects; and validation of some SNPs by Sanger sequencing.
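
As a rough illustration of the region-based narrowing described above, the following Python sketch keeps variants shared by two relatives, drops those listed as common, and retains only those that fall inside IBD intervals. The variant tuples, the `known_common` set and the interval list are hypothetical toy inputs, not data or code from the study, and the loss-of-heterozygosity criterion is left out for brevity.

```python
from typing import List, Set, Tuple

Variant = Tuple[str, int, str, str]   # (chromosome, position, ref, alt)
Region = Tuple[str, int, int]         # (chromosome, start, end), inclusive


def in_any_region(variant: Variant, regions: List[Region]) -> bool:
    """Return True if the variant position lies inside one of the regions."""
    chrom, pos = variant[0], variant[1]
    return any(c == chrom and start <= pos <= end for c, start, end in regions)


def shared_rare_in_ibd(rel_a: Set[Variant], rel_b: Set[Variant],
                       known_common: Set[Variant],
                       ibd_regions: List[Region]) -> List[Variant]:
    """Variants seen in both relatives, absent from the common set, inside IBD regions."""
    shared_rare = (rel_a & rel_b) - known_common
    return sorted(v for v in shared_rare if in_any_region(v, ibd_regions))


# Toy inputs (hypothetical, for illustration only).
cousin_1 = {("chr1", 150, "A", "G"), ("chr1", 900, "C", "T"), ("chr2", 40, "G", "A")}
cousin_2 = {("chr1", 150, "A", "G"), ("chr1", 900, "C", "T"), ("chr2", 77, "T", "C")}
known_common = {("chr1", 900, "C", "T")}
ibd_regions = [("chr1", 100, 500)]

candidates = shared_rare_in_ibd(cousin_1, cousin_2, known_common, ibd_regions)
print(candidates)
```

In practice the common-variant set would come from resources such as dbSNP or the 1000 Genomes data, and the IBD intervals from the SNP array analysis.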

Poster A55

deepBlockAlign: aligning profiles from short blocks of reads

Sachin Pundhir University of Copenhagen

Jan Gorodkin (University of Copenhagen); David Langenberger (University of Leipzig, Department of Computer Science, and Interdisciplinary Center of Bioinformatics); Claus Ekstrom (University of Copenhagen, Center for applied Bioinformatics); Peter F. Stadler (University of Leipzig, Department of Computer Science, and Interdisciplinary Center of Bioinformatics); Steve Hoffmann (University of Leipzig, Department of Computer Science, and Interdisciplinary Center of Bioinformatics);

Short Abstract: The emergence of high-throughput sequencing methods makes it possible to sequence whole transcriptomes quickly and cost-effectively. It has previously been shown that short RNA sequencing data helps to identify new coding and non-coding RNAs (Jung et al. 2010). RNAs often undergo post-transcriptional processes such as maturation and degradation, leaving specific short RNA sequence fragments. When these fragments are mapped to the reference sequence they form distinct patterns. Well-known patterns, such as the miR-miR* pattern in miRNAs, are indicative of the type of RNA. Here we present an algorithm, deepBlockAlign, for the alignment of RNA-seq read patterns to quickly find classes of similarly processed RNAs. As its primary input, deepBlockAlign takes closely spaced (<=30 nt) blocks of reads mapped to the reference genome (a read pattern). The algorithm works in two steps, first computing the optimal alignment score between all read blocks, then constructing an optimal alignment of two read patterns based on the alignment scores and the distances between the read blocks. On hierarchical clustering based on alignment scores, distinct clusters for miRNAs and tRNAs were observed, along with ~30 tRNAs clustering together with miRNAs. Most of these putative Dicer-processed tRNAs have read blocks marked by precise start positions of reads, and 4 of the tRNAs have also been reported in the literature to be processed by Dicer.
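
To make the input definition concrete, the sketch below groups read blocks into read patterns whenever neighbouring blocks are separated by at most 30 nt. It is an illustrative reading of the abstract, not the deepBlockAlign implementation; the function name, the block representation and the way the gap is measured (start of the next block minus the furthest end seen so far) are assumptions.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end) of a read block on one strand of one chromosome


def group_blocks(blocks: List[Block], max_gap: int = 30) -> List[List[Block]]:
    """Group blocks into read patterns; a block joins the current pattern
    if it starts at most max_gap nt after the furthest end seen so far."""
    patterns: List[List[Block]] = []
    current_end = None
    for start, end in sorted(blocks):
        if current_end is not None and start - current_end <= max_gap:
            patterns[-1].append((start, end))
            current_end = max(current_end, end)
        else:
            patterns.append([(start, end)])
            current_end = end
    return patterns


# Toy blocks (hypothetical coordinates).
blocks = [(400, 421), (100, 122), (430, 452), (140, 161)]
patterns = group_blocks(blocks)
print(patterns)
```

Each resulting pattern would then be the unit that is compared against other patterns in the two-step alignment.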

Poster A56

Integrating network co-expression analysis and gene expression in the analysis of coronary artery disease GWAS: application to CARDIoGRAM data

Seraya Maouche University of Luebeck

Xia Yang (Sage Bionetworks, Seattle, Washington, USA, Bionetworks); CARDIoGRAM consortium (University of Luebeck, Cardiovascular); Thomas Quertermous (Department of Medicine, Stanford University School of Medicine, Stanford, California, Cardiovascular Medicine); Nilesh J Samani (Department of Health Sciences, University of Leicester, Leicester, United Kingdom, Health Sciences); Themistocles L. Assimes (Department of Medicine, Stanford University School of Medicine, Stanford, California, Health Sciences); Jeanette Erdmann (University of Luebeck, Department of Medicine Clinic II); Patrick Diemert (University of Luebeck, Department of Medicine Clinic II); Heribert Schunkert (University of Luebeck, Department of Medicine Clinic II);

Short Abstract: Genome-wide association studies have been very successful in identifying risk variant loci for complex diseases, including coronary artery disease (CAD) and myocardial infarction (MI). Recently, to increase power and identify additional CAD/MI risk loci, we founded the CARDIoGRAM (Coronary ARtery DIsease Genome-Wide Replication and Meta-Analysis) consortium and meta-analyzed 14 GWAS datasets including > 22,000 cases and > 64,000 controls. We identified 13 new susceptibility loci. However, the mechanisms by which these loci lead to CAD/MI remain elusive. To provide insights into the mechanisms driving the disease, we propose a systems genetics approach that integrates biological knowledge and network co-expression analysis of gene expression profiles of monocytes from patients following myocardial infarction into the analysis of CARDIoGRAM CAD/MI GWAS data.

Poster A57

A Novel Algorithm for Identifying Non-canonical Splicing Regions Using RNA-Seq Data

Yongsheng Bai University of Michigan

Zach Wright (University of Michigan, Center for Computational Medicine and Bioinformatics); Justin Hassler (University of Michigan, Biological Chemistry); Randal Kaufman (University of Michigan, Biological Chemistry); Maureen Sartor (University of Michigan, Center for Computational Medicine and Bioinformatics); Jim Cavalcoli (University of Michigan, Center for Computational Medicine and Bioinformatics);

Short Abstract: During ER (endoplasmic reticulum) stress, the RNase Ire1α is known to splice out a short 26 nt region from the mRNA of the transcription factor Xbp1, causing an open reading frame shift that leads to the alteration of many downstream genes in reaction to ER stress. With the aim of identifying additional Ire1 targets, we developed an algorithm/pipeline to identify non-canonical splicing regions using RNA-Seq data from ER stress-induced (using 2 alternative ER stress-inducing treatments) Ire1α heterozygote (Het) and knockout (KO) mouse embryonic fibroblast cell lines.
Our algorithm first creates the splice junction file using the UCSC mouse “known genes” model and ERANGE custom Python scripts. The expanded genome is then built, and the original read sequences generated by the Illumina GAII pipeline are aligned to it with Bowtie. The original reads deposited in Bowtie’s unmapped read data set are split and realigned to the same expanded genome. Our algorithm only chooses split read halves if they map onto the same gene structure and/or within a certain distance of each other and in the same strand direction. Finally, all of the selected read halves for each single read are consolidated into a unique “splice region”. Our algorithm additionally reports whether the detected splice site falls into a known junction region or represents a novel candidate.
Xbp1’s 26 bp non-conventional splice site was detected in heterozygote samples from both treatment cases but not in the negative control Ire1α knockout samples. We believe our algorithm/pipeline can detect novel splicing sites in RNA-seq data generated under similar conditions.
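
The selection rule for split read halves can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline: the `HalfHit` record, the `max_distance` default and the 1-based inclusive coordinates are hypothetical, and only the distance and strand criteria are modelled (the gene-structure check is omitted).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HalfHit:
    """Alignment of one half of a split read (1-based, inclusive coordinates)."""
    chrom: str
    strand: str
    start: int
    end: int


def is_candidate(left: HalfHit, right: HalfHit, max_distance: int = 10_000) -> bool:
    """Accept a pair of halves on the same chromosome and strand whose
    skipped stretch is non-empty and no longer than max_distance."""
    if left.chrom != right.chrom or left.strand != right.strand:
        return False
    skipped = right.start - left.end - 1
    return 0 < skipped <= max_distance


def splice_region(left: HalfHit, right: HalfHit) -> Tuple[str, int, int]:
    """Return the region skipped between the two halves."""
    return (left.chrom, left.end + 1, right.start - 1)


# Toy example (hypothetical coordinates): the halves flank a 26 nt gap.
left = HalfHit("chr1", "+", 980, 1000)
right = HalfHit("chr1", "+", 1027, 1050)
if is_candidate(left, right):
    chrom, start, end = splice_region(left, right)
    print(chrom, start, end, end - start + 1)
```

The halves are assumed to be ordered by genomic coordinate; a real implementation would also need to handle reads whose halves align in the opposite order.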

Poster A59

Personalized Medicine for Cancer

Miguel García Spanish National Cancer Research Centre

Victor de la Torre (Spanish National Cancer Research Centre, Structural biology and biocomputing); Alfonso Valencia (Spanish National Cancer Research Centre, Structural biology and biocomputing);

Short Abstract: We developed a cancer-focused Personalized Medicine Web application. The goals are: i) to prioritize somatic mutations according to their role in tumor onset and progression, and ii) to provide a list of candidate drugs for the patient's treatment. Rather than relying only on genotyping experiments to predict a patient’s drug response and disease risks, the application goes one step further by taking advantage of NGS technology.

In the first prototype, for each somatic mutation obtained from the full-exon sequencing experiment, the system automatically collects information about the mutated gene, the corresponding protein, and the pathways in which the gene is involved. The application also collects information on the plausible functional effect of the mutations as predicted by the SIFT, PolyPhen-2, and SNPs&GO servers. The drug analysis is based on the direct relations between mutated genes and drugs (extracted from PharmGKB, NCI and Matador) and indirect relations between drugs and the affected pathways (extracted from KEGG MEDICUS). The system also displays information about the expression of the mutated genes in different tissues. Other information sources will be added in the next versions.

The personalized medicine application is freely accessible at: http://permed.bioinfo.cnio.es/. Users are able to browse the information compiled by the system through a user-friendly web interface. Users must enter a username and password to access their experiment’s data. A demo example is available to illustrate the application's functionality. The Epithelial Carcinogenesis Group and the Gastrointestinal Cancer Clinical Research Unit at the CNIO are using the system for lung and bladder cancer studies.

Poster A60

CONCEPT AND APPLICATION OF A COMPUTATIONAL WORKFLOW FOR EPITOPE PREDICTION IN BACTERIA

Daniela Resende Universidade Federal de Ouro Preto

Nesley Jesus Daher (Centro de Pesquisas René Rachou, Laboratório de Parasitologia Celular e Molecular); Antônio Mauro (Centro de Pesquisas René Rachou, Laboratório de Parasitologia Celular e Molecular); Alexandre Barbosa (Universidade Federal de Ouro Preto, Núcleo de Pesquisas em Ciências Biológicas); Jeronimo Conceição (Centro de Pesquisas René Rachou, Laboratório de Parasitologia Celular e Molecular);

Short Abstract: Peptide-based vaccines, in which small peptides derived from target protein epitopes are used to induce an immune reaction, have attracted substantial attention as a potential means of preventing infectious diseases. With the availability of genome datasets, epitope prediction by computational methods has become one of the most promising approaches to vaccine development.
Corynebacterium pseudotuberculosis is an important animal pathogen and the etiological agent of a disease called caseous lymphadenitis. In this work, we developed a structural database approach to identify epitopes that could be experimentally tested for vaccine development. Predictions from nine algorithms, including MHCI, MHCII and B-cell epitope predictors, together with protein subcellular localization from 13 bacterial genomes, were integrated in a database. The specific conceptual schema developed was important in establishing an environment that made it possible to accommodate the data from the diverse prediction approaches and to manipulate and extract the relationships between entity classes.
Results show that 11 proteins from C. pseudotuberculosis have all the characteristics of a good vaccine candidate, namely: peptides with binding affinity for 19 MHCI alleles and 15 MHCII alleles (human and mouse); peptides predicted to bind surface immunoglobulins of B cells; and localization to the plasma membrane or secretion. Of these 11 proteins, four have functions related to transport, three are related to metabolism, one is a collagen adhesin and the others are hypothetical proteins.
The resource developed represents an important tool that can be used to drive vaccine development in these bacteria.
FINANCIAL SUPPORT: FAPEMIG, CAPES, FIOCRUZ, CNPq, UFOP

Poster A61

Structural analysis of the functional effects of perforin mutations and association with clinical data of FHL2 patients

Omer AN Koc University

Short Abstract: Perforin is a cytotoxic protein secreted by T lymphocytes and NK cells that plays a key role in the immune system's elimination of virus-infected and malignant cells, in synergy with granzyme B, which induces apoptosis. The essential role of perforin in cytotoxic lymphocyte function has been known for decades; however, the molecular and structural basis of membrane binding and pore formation has only recently been revealed. Mutations in perforin cause the fatal disease FHL2, which calls for further studies addressing the structure-function relationship. In this study, the functional consequences of perforin mutations, based on structure, and the resulting phenotypic effects in FHL2 patients are investigated. Clinical data for 90 patients were collected from the literature. Patients share clinical and laboratory findings such as fever, splenomegaly, cytopenia, hypertriglyceridemia/hypofibrinogenemia, hemophagocytosis and reduced/absent NK-cell activity, which establish the characteristics and diagnostic criteria for FHL. Classification of patients by mutation type shows that some mutations are more common in certain ethnic groups: W374X in Turkish, L17X in African-American and L364fs in Japanese patients. Furthermore, a detailed analysis of 52 missense mutations on the structure of mouse perforin was performed; 40 are within the MACPF domain and 9 within the C2 domain, the majority of which reside in regions functionally important for perforin cytotoxicity. Potential binding sites in perforin-perforin and perforin-granzyme B complexes are predicted via an “interface prediction algorithm by using structural matching”. A higher percentage of observed mutations (66%) than of randomly introduced mutations (36%) cause total energy changes in the protein structure. The A91V polymorphism, however, stabilized the protein structure by lowering its total energy.
Perforin structure is significantly altered by mutations, leading to defects in the cytotoxic death pathway of lymphocytes and resulting in the severe symptoms observed in FHL2 patients.

Poster A62

Bioinformatics and Biodiversity: A holistic approach to enhancing drug discovery.

Negusse Kitaba Glamorgan University

Short Abstract: This poster is based on Proceedings Submission xxx
For millennia, plants have provided humans with a source of medicine, and they remain the source of much drug discovery today. Identifying a wide variety of active metabolites is of key importance in discovering new substances for the treatment of specific diseases. Progress in the sciences of genomics, proteomics and metabolomics has allowed us to gain a wide understanding of biology at the system level. However, the traditional knowledge of medicinal plants held within indigenous communities is still invaluable in identifying potential candidates for drug design.

Communities live across diverse ecological ranges and have contact with nature on a daily basis. As a result, biological knowledge accumulates over the generations. Collating and preserving such data not only contributes to biodiversity conservation but also identifies species warranting detailed genetic analysis. However, there are many challenges. As a result of high biome diversity, it is not easy for a few research groups to capture all information about a species across all ecological ranges in time and space. Data access and integration are also challenging due to the lack of a standard format, and constant changes in species taxonomy can reduce the reliability of the data.

Biodiversity informatics has the potential to identify, locate and integrate information vital for drug discovery. This poster uses information gathered from indigenous communities to demonstrate how ethnobotanical and bioinformatics tools can be linked to provide data of pharmaceutical importance.

Poster A63

Finding locally disrupted RNA structure from SNPs

Sabarinathan Radhakrishnan Center for non-coding RNA in Technology and Health, IBHV

Jan Gorodkin (Center for non-coding RNA in Technology and Health, IBHV); Stefan E. Seeman (Center for non-coding RNA in Technology and Health, IBHV, University of Copenhagen);

Short Abstract: The distinct function of a non-coding RNA is governed by its structure. Effects of single nucleotide polymorphisms (SNPs) on the structure of non-coding regulatory RNAs have been found to be associated with disease phenotypes (Halvorsen et al., 2010). Human examples include hereditary Hyperferritinemia Cataract Syndrome, retinoblastoma and neuropsychiatric disorder. Existing computational methods, such as RNAmute, RNAmutants, RDMAS and SNPfold, predict these deleterious mutations based on the change induced in the native (secondary) structure. However, these methods carry out their prediction globally: their scoring is based on a comparison of the structure between wild type and mutant over the entire sequence. Clearly, the longer the sequences are, the harder it becomes to detect local conformational changes on a global scale. Yet such local changes might well be responsible for a dysfunction such as inhibition of viral replication in Hepatitis C virus (You et al., 2004). To address this problem, we have developed a computational method that searches for local changes by comparing all substructures between a wild-type and a mutant sequence. An effective pre-processing method is incorporated to save computation time during substructure comparison. We have tested this method on a comprehensive dataset obtained from the literature, which contains 514 known human disease-associated SNPs mapped to the UTRs of mRNAs (Halvorsen et al., 2010). Of these, 30% of the SNPs are predicted to cause large local changes relative to the global measure. For example, in Hyperferritinemia Cataract Syndrome, most of the known SNPs are predicted to cause local disruption in the region of a functionally important site.

Poster A64

Heme Detoxification Protein: A promising antimalarial drug target.

Dedan Githae Technische University, Munich

Andrea Schafferhans (Technische University, Munich, faculty of informatics); Marco Punta (Technische University, Munich, faculty of informatics); Burkhard Rost (Technische University, Munich, faculty of informatics);

Short Abstract: The malaria-causing parasite, Plasmodium falciparum, has always been a threat to mankind. It has developed resistance to several of the antimalarial drugs that have been available, dwindling hopes of ever eradicating this disease. The WHO reports that approximately half of the world’s population is at risk of malarial infection. During the erythrocytic stage, hemoglobin breakdown by the parasite releases the heme prosthetic group, which is toxic to the parasite and can lead to cell lysis and parasite death if it is not converted to hemozoin. Previously efficacious drugs exerted their antimalarial activity by binding heme and thereby inhibiting hemozoin formation. However, the rapid development of drug resistance and of cross-resistance to closely related drug families creates an urgent need to develop novel classes of drugs. Here, we analyze the main protein responsible for hemozoin formation, heme detoxification protein (HDP), as a potential drug target. HDP catalyses the dimerisation of heme to β-hematin. HDP inhibitors would constitute a novel class of drug compounds, an important step towards an efficient treatment for malaria. Our aim is to identify the putative HDP heme-binding site through sequence- and structure-level analyses. First results show conserved histidine residues on the surface of our model structures. We propose these to be essential for heme binding. Though a lot remains unknown about this protein, its indispensability and conservation among all Plasmodium species make it an attractive and promising drug target, with reduced likelihood of resistance development.

Poster A65

Network Analysis and validation of gene expression profiling of salivary gland from human and mice with primary Sjögren’s syndrome

Hui Zhou University of California Los Angeles

Short Abstract: Background: Network analysis of gene expression data from human and mouse salivary glands in primary Sjögren's syndrome (pSS) and healthy controls could help prioritize and discover novel disease-associated signaling pathways as well as identify suitable mouse models for elucidating pathogenesis.
Methods: Two different sets of human samples and one set of mouse samples were obtained and assessed separately by gene expression microarray profiling, followed by weighted gene co-expression network analysis (WGCNA).
Results: Gene co-expression modules related to pSS were significantly enriched for genes known to be involved in the immune/defense response, cell cycle, cell death, cancer and cell signaling. Detailed functional pathway analysis indicated that pSS-associated modules were enriched in natural killer cell mediated cytotoxicity (P=3.5E-6), T cell receptor signaling pathway (P=1.1E-5), B cell receptor signaling pathway (P=2.5E-4) and pathways in cancer (P=5.4E-2). Using module preservation statistics, six pSS-related modules were found to be highly preserved and reproducible in the second independent human data set (p<10E-24). Five of them were moderately preserved in the mouse dataset.
Conclusion: Systems biology analysis of gene expression profiles of salivary gland tissue from humans and mice with pSS revealed key pathways and molecular targets associated with pathogenesis. Known pSS-related pathways (immune-related pathways) were identified, and newly identified pathways (cell death, cell cycle and pathways in cancer) were preserved in both human and mouse data. The identified and computationally validated known and novel pathways represent promising targets for biological validation experiments and may ultimately inform therapeutic intervention studies and elucidate pathogenesis with suitable mouse models.

Poster A66

The ChipYard system for high-throughput microarray data analysis

Grischa Toedt EMBL

Nicolas Delhomme (EMBL, High Throughput Functional Genomics Center); Frederic Blond (DKFZ, Molecular Genetics); Peter Lichter (DKFZ, Molecular Genetics);

Short Abstract: The poor understanding of the molecular mechanisms underlying tumor development and progression, along with the lack of diagnostic and prognostic genetic markers, poses a significant problem for targeted cancer therapy.
To identify pathomechanisms that cause tumor formation and to discover novel molecular markers for early clinical diagnosis and prognosis, we developed a system for the high-throughput analysis of microarray data called ChipYard.
Using up-to-date annotation information and analysis methods, ChipYard was used for the detailed analysis of numerous individual microarray studies at the German Cancer Research Center (DKFZ), as well as for the evaluation of existing data sets from public databases.
We will present the microarray analysis system, our software package for the annotation of microarray experiments, and the option of reanalyzing experiments from the Gene Expression Omnibus (GEO).

Poster A67

Candidate mutations for early-onset lung cancer by family genome sequencing

Evangelos Simeonidis University of Luxembourg

Jared Roach (Institute for Systems Biology); Mary Brunkow (Institute for Systems Biology); Gustavo Glusman (Institute for Systems Biology); Sheila Reynolds (Institute for Systems Biology); Rudi Balling (University of Luxembourg, Luxembourg Centre for Systems Biomedicine); Leroy Hood (Institute for Systems Biology); David Galas (Institute for Systems Biology); H.-Erich Wichmann (Helmholtz Centre Munich); Sara Grimm (National Institute of Environmental Health Sciences); Richard Gelinas (Institute for Systems Biology);

Short Abstract: Early-onset lung cancer has been studied as a rare, but distinct, sub-type of lung cancer. Genome-wide association studies (GWAS) have linked several genes with this form of malignancy. We sequenced the genomes of a family quartet in which one of the offspring was diagnosed with early-onset lung cancer at about 48 years of age. The family has a history of heavy smoking and the father had in the past been diagnosed with head and neck cancer. The DNA source was blood, which leads us to concentrate our analysis on Mendelian inheritance models. To make the inheritance pattern explicit, we establish the parental origin of the offspring’s genomes through phasing of their chromosomes. This helps identify whether mutations in the proband came from the father or the mother. More than 18 million sequence variants were initially identified in the proband through comparison to the hg19 reference genome. We reduce this list to fewer than 200 potentially functional variants (e.g. single nucleotide variations and short indels) present in the genomes of the proband and at least one parent, by applying a series of filters. We refine the list of candidate mutations further by comparison to gene candidates from GWAS studies and genes that are mutated in lung cancer tissue as recorded by The Cancer Genome Atlas. The results of our analysis are discussed and conclusions about possible causative mutations for early-onset lung cancer are drawn.
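
The inheritance filter described in this abstract can be illustrated with a small Python sketch that keeps potentially functional proband variants also seen in at least one parent and records which parent carries them. This is a simplified stand-in under stated assumptions: the variant tuples and the `functional` set are hypothetical toy inputs, and simple presence in a parent is used in place of the chromosome phasing the authors describe.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def parental_origin(proband: Set[Variant], father: Set[Variant],
                    mother: Set[Variant],
                    functional: Set[Variant]) -> Dict[Variant, str]:
    """Map each functional proband variant found in a parent to its carrier(s).
    Variants absent from both parents are dropped."""
    origin: Dict[Variant, str] = {}
    for variant in proband & functional:
        in_father = variant in father
        in_mother = variant in mother
        if in_father and in_mother:
            origin[variant] = "both"
        elif in_father:
            origin[variant] = "father"
        elif in_mother:
            origin[variant] = "mother"
    return origin


# Toy inputs (hypothetical, for illustration only).
v1 = ("chr3", 1200, "G", "A")
v2 = ("chr7", 560, "C", "T")
v3 = ("chr9", 88, "T", "TA")
v4 = ("chr12", 4100, "A", "C")
proband = {v1, v2, v3, v4}
father = {v1, v3}
mother = {v3, v4}
functional = {v1, v2, v3}

origins = parental_origin(proband, father, mother, functional)
print(origins)
```

Here `v2` is dropped because neither parent carries it, and `v4` is dropped because it is not in the functional set.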

Poster A68

Improving the detection of deleterious mutations by integrating the predictions of four well-tested methods

Yana Bromberg The State University of NJ

Emidio Capriotti (Stanford University); Yana Bromberg (Rutgers University, Department of Biochemistry and Microbiology); Russ Altman (Stanford University, Departments of Bioengineering and Genetics);

Short Abstract: In the past few years the number of human genetic variations deposited in web-accessible databases has been increasing exponentially. The latest version of dbSNP (build 132) contains about 20 million validated single nucleotide polymorphisms (SNPs). SNPs make up most of human variation and are often the primary causes of disease. The coding-region non-synonymous SNPs (SAPs) result in amino acid changes and may affect protein function, causing severe genetic diseases. Although several methods for the detection of deleterious SAPs have been implemented, the increasing amount of annotated data now offers the opportunity to develop more accurate algorithms.
Here we present an approach for predicting the effect of SAPs that integrates four methods: PANTHER, PhD-SNP, SIFT and SNAP. We first tested the accuracy of each method using a dataset of 35,986 annotated mutations from 2,269 proteins extracted in October 2009 from the SwissVar database. The four methods reached overall accuracies ranging between 64% and 76% and correlation coefficients from 0.38 to 0.53. We then developed an SVM-based approach that takes as input a ten-element vector derived from the output of the four methods. When tested using a cross-validation procedure, the integrated method reaches 80% overall accuracy and a 0.60 correlation coefficient, that is, 4% higher accuracy and a 0.07 higher correlation coefficient than the best single method.
Availability: http://snps.uib.es/meta-snp
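
As an illustration of the two figures of merit quoted above, the following minimal sketch computes overall accuracy and a correlation coefficient from binary predictions. It assumes that the correlation coefficient is the Matthews correlation coefficient, which the abstract does not state; the function name and the toy labels are hypothetical and are not taken from the SwissVar benchmark.

```python
import math

def accuracy_and_mcc(truth, predicted):
    """Return (overall accuracy, Matthews correlation) for 0/1 labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is set to 0 when a margin is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Toy labels (hypothetical): 1 = disease-related, 0 = neutral.
truth     = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
acc, mcc = accuracy_and_mcc(truth, predicted)
```

Reporting both numbers is useful because accuracy alone can look high on an unbalanced dataset, whereas the correlation coefficient penalizes a predictor that favors the majority class.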

Poster A69

Protein-level effects of mutations in ovarian cancer, acute myeloid leukemia and glioblastoma multiforme

Janita Thusberg Buck Institute for Research on Aging

Charles Vaske (University of California, Santa Cruz, Center for Biomolecular Science and Engineering); Zack Sanborn (University of California, Santa Cruz, Center for Biomolecular Science and Engineering); Joshua Stuart (University of California, Santa Cruz, Center for Biomolecular Science and Engineering); Christopher Benz (Buck Institute for Research on Aging); David Haussler (University of California, Santa Cruz, Center for Biomolecular Science and Engineering); Sean Mooney (Buck Institute for Research on Aging)

Short Abstract: The heterogeneity of mutation profiles in cancer patients calls for detailed annotation of the putative downstream effects of mutations. A subset of the somatic mutations is expected to function as drivers, and identifying the variants responsible for tumor progression among dozens, or even hundreds, within a patient is not a trivial task. Protein-level annotation of mutations may reveal functions beyond the analysis of genes enriched in mutations, providing molecular-level explanations for the roles of missense mutations in cancer progression. By elucidating protein-level effects of missense mutations, common patterns among cancers can be found.
Coding variants identified from exon-capture and whole genome sequencing of tumor samples have been bioinformatically characterized by a suite of applications to identify variants likely to disrupt molecular function or clinical phenotype and hypothesize their molecular effects. Based on a training set of disease causing mutations and neutral polymorphisms, the program MutPred utilizes a Random Forest method to predict functional variants using proteomic, genomic and bioinformatic attributes. Since these attributes are based on known and predicted protein functional sites, the disrupted function can be quantitatively hypothesized. The MutPred scores and functional attributes identify variants likely to drive tumor progression and are also utilized further with other genomic data in pathway analysis. PARADIGM is a probabilistic graphical model that integrates genomic and functional genomic data to infer tumor-specific alterations in gene activity in the context of known gene pathways. The protein-level interpretations of mutations are used to extend the PARADIGM model of gene disruptions beyond deletions and amplifications.

Poster A70

Functional profiling of pharmacogenetic non-synonymous SNPs

Sean Mooney Buck Institute for Research on Aging

Janita Thusberg (Buck Institute for Research on Aging); Emidio Capriotti (Stanford University, Department of Bioengineering); Jim Auer (Buck Institute for Research on Aging)

Short Abstract: Bioinformatic study of the effects of disease-related SNPs is well established, and several tools for annotation and prediction of the effects of these variants have been developed. Due to large-scale genotyping and sequencing efforts, an increasing amount of knowledge about genetic variants associated with diseases, complex phenotypes, and drug response and metabolism is accumulating in the literature and databases. These data will be invaluable for personal genetics applications and for clinical settings.
Little is known about the nature of pharmacogenetic variants as compared to disease-causing mutations and neutral polymorphisms, or about whether the same methods can be used to study and predict both disease-causing variants and pharmacogenetic SNPs. We have annotated the protein-level consequences of pharmacodynamic and pharmacokinetic variants in PharmGKB by bioinformatics methods and elucidated features that differentiate a pharmacogenetic variant from other types of genetic variants. We analyzed a set of 352 SNPs from 222 proteins that have been annotated in the SwissVar database. We found that 92% of them were annotated as neutral polymorphisms and only 74% of them were correctly annotated by mutation prediction algorithms. This suggests that about one in four of these variants could have some functional effect. We aim for the results of this analysis to provide the characteristic features that define a pharmacogenetic variant fingerprint, which can then be used to develop methods for discovering new variants with putative pharmacogenetic effects and to hypothesize new biomarkers for predicting drug response.

Poster A71

Shared susceptibility factors between autoimmune disorders and autism

Jae-Yoon Jung Harvard Medical School

Dennis Wall (Harvard Medical School, Center for Biomedical Informatics);

Short Abstract: Family history studies have suggested that autism spectrum disorder (ASD) and autoimmune diseases (ADs) may share common factors, but only a few shared candidate genes are known.
We obtained ASD genotype data from the Autism Genetic Resource Exchange (AGRE) and six AD data sets from the Wellcome Trust Case Control Consortium (WTCCC). We applied two state-of-the-art cross-disorder analysis approaches to examine whether ASD and AD genotype data have allele-specific associations.
We found that the genetic variation profiles of ASD and ankylosing spondylitis (AS) are positively correlated, while those of multiple sclerosis (MS) and ASD are negatively correlated, when we consider SNPs that are nominally significant (P < 0.05) in both diseases.
The correlation coefficient between the ASD and AS profiles is higher than that of any of the autoimmune disease pairs, supporting the idea that autism may share some genetic factors with autoimmune disorders. The strength of association between the autism and MS profiles is comparable with that among autoimmune disease pairs, and this result is reasonable considering that the AS and MS pair is shown to be negatively correlated. Autoimmune thyroid disease is slightly positively correlated with ASD, and the other diseases (Crohn's disease, rheumatoid arthritis, and type 1 diabetes) showed little association with autism in their profiles.
This work supports the hypothesis of immunological etiology of autism, and suggests that their associations are highly selective and directional.

Poster A72

Discovery of novel cancer pathway-activating genomic aberrations via a mutual information metric

Olga Botvinnik Broad Institute

Pablo Tamayo (Broad Institute, Cancer Program); Jill Mesirov (Broad Institute, Cancer Program);

Short Abstract: Mutations in oncogenes commonly activate downstream signaling pathways. However, oncogenic pathway expression can be high in the absence of an oncogenic mutation. REVEALER can discover novel candidate pathway activators. For example, many lung cancers are driven by an activating mutation in V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS), which induces overexpression of the KRAS pathway; others lack such a genetic lesion yet show similar pathway activation. We hypothesize the existence of alternative genetic lesions that induce KRAS pathway activation in these rogue cases. Using a stepwise search algorithm, we uncover putative pathway-activating genomic aberrations in large collections of cell lines. First, we correlate the mutations of a handful of known KRAS pathway genes with the expression of a number of candidate signatures of KRAS pathway activation. This is accomplished with a sensitive universal metric derived from Mutual Information (MI). Second, we remove samples that already have an activating mutation, focusing our search space on those that have no mutations in KRAS pathway genes. Then, we mine our database to find which mutations and copy number changes fill in the gaps, namely, those cases of high KRAS pathway expression not explained by the known KRAS pathway genes.
REVEALER can be used in a variety of feature selection problems and we plan to release the code as an R package. Once they are validated, these activating genomic features can be combined with clinical data to generate inferential models for classifying cancer subtypes and stratifying patients into different clinical outcome and drug response groups.
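
The first two steps described above can be sketched as follows. This is only an illustration under stated assumptions: it uses plain mutual information between a binary mutation call and a median-binarized pathway score, whereas the abstract says only that the REVEALER metric is "derived from" MI. The function names, sample values and binarization rule are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def binarize_by_median(values):
    """Label a score 1 if it is at or above the (upper) median, else 0."""
    median = sorted(values)[len(values) // 2]
    return [1 if v >= median else 0 for v in values]

# Hypothetical toy data: one entry per cell line.
kras_mutated  = [1, 1, 1, 0, 0, 0, 0, 0]
pathway_score = [0.9, 0.8, 0.85, 0.2, 0.1, 0.3, 0.95, 0.15]

pathway_high = binarize_by_median(pathway_score)
mi_bits = mutual_information(kras_mutated, pathway_high)

# Samples with high pathway expression but no known mutation are the
# "rogue" cases in which alternative activating lesions would be sought.
unexplained = [i for i, (m, h) in enumerate(zip(kras_mutated, pathway_high))
               if h == 1 and m == 0]
```

Mutual information is attractive here because it detects any statistical dependence between a genomic feature and pathway activity, not only a linear one.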

Poster A73

The Cancer Genomic Browser

Kyle Ellrott University of California, Santa Cruz

Kord Kober (University of California, Santa Cruz, CBSE); Brian Craft (University of California, Santa Cruz, CBSE); Jing Zhu (University of California, Santa Cruz, CBSE); David Haussler (University of California, Santa Cruz, CBSE);

Short Abstract: As the rate of cancer genomic information production has increased, so has the task of managing the computational infrastructure required to host, visualize and analyze that data.

The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu) provides a variety of tools to help researchers analyze their data. It includes sortable genomic heatmaps linked to patient clinical attributes that can be explored on demand, statistical tests between user-defined groupings of samples, genomic signature evaluations, and the ability to toggle between chromosomal and custom gene set views. The cancer browser is linked to the UCSC Human Genome Browser, helping to provide genomic context for interpreting cancer genomics data.

One of the fastest growing datasets hosted on the public cancer browser portal comes from The Cancer Genome Atlas (TCGA) project. The Cancer Genomics Browser hosts TCGA data from thousands of samples across more than a dozen cancer types, covering multiple genomic profiles: DNA copy number variation (e.g. SNP6 array, whole genome sequencing), gene expression (microarrays, RNAseq), DNA methylation, miRNA expression, somatic mutation (whole genome, exome sequencing), and the associated clinical information. The UCSC Cancer Genomics Browser is regularly updated to reflect the latest results of the ongoing research at TCGA.

Designed with the privacy issues that come along with patient data in mind, cancer browser accounts provide user-specific sessions, gene sets, genomic profiles, and user-specific data track controls. In addition to the public portal, the Cancer Genomics Browser also supports projects that require secured access control, where both the public portal and private data can be viewed under one browser.

Poster A74

Prediction of miRNA gene targets – a combined computational and experimental approach

Anastasis Oulas Institute of Molecular Biology and Biotechnology

Nestoras Karathanasis (University of Crete, Biology); Ioannis Iliopoulos (University of Crete, Medical Division); Panayiota Poirazi (Institute of Molecular Biology and Biotechnology, Computational Biology lab); Kriton Kalantidis (University of Crete, Biology);

Short Abstract: Genomic studies have shown that various regions of the human genome are often associated with a cancerous phenotype. These regions are often hotspots for genetic perturbations such as deletions. In most of these cases the actual regulators present in these "cancer associated genomic regions (CAGRs)", which govern the molecular mechanisms leading to a tumourigenic phenotype, are scarcely known. Frequently, tumour suppressors or oncogenes reside within these regions. Only recently have miRNAs been shown to represent novel regulatory subunits present within CAGRs. Our research in this area focuses on elucidating the molecular and regulatory mechanisms underlying novel miRNA genes present in CAGRs.
Our laboratory recently published a combined computational as well as experimental approach for predicting and verifying novel miRNA gene candidates within CAGRs. These regions are commonly deleted in various cancer types.
This work is being pursued in order to elucidate the molecular mechanisms that are distorted as a result of the deletion of these miRNAs and furthermore provide a link to tumourigenic processes.
The first research step towards this is to identify the downstream targets of the novel miRNAs. This requires the use of computational tools. We are currently developing new in silico miRNA target prediction tools which, in combination with established prediction platforms, are used to screen for putative gene targets of these miRNAs. We are using luciferase reporter assays to verify direct interactions of miRNAs with predicted targets.

Poster A75

Analysis of structural and functional impacts of Bruton tyrosine kinase mutations

Jouni Väliaho Institute of Biomedical Technology

Mauno Vihinen (Institute of Biomedical Technology, Bioinformatics);

Short Abstract: X-linked agammaglobulinemia (XLA) is caused by mutations in the gene encoding Bruton tyrosine kinase (BTK). XLA patients have a decreased number of mature B cells and a lack of all immunoglobulin isotypes, resulting in susceptibility to severe bacterial infections. XLA-causing mutations are collected in a mutation database (BTKbase), which is available at http://bioinf.uta.fi/BTKbase/. The database contains 1155 cases from all around the world. The Btk protein consists of five distinct structural domains, from the N-terminus: pleckstrin homology (PH), Tec homology (TH), Src homology 3 (SH3), SH2, and the catalytic kinase domain (TK).
We performed a detailed analysis of XLA-causing missense mutations in the Btk tyrosine kinase domain using numerous methods for predicting the effects of amino acid substitutions. We used the available 3D structures of the Btk kinase domain to study the contacts between residues, their implications for the stability of the protein, and the effects of the introduced residues. Investigations of steric and stereochemical consequences of substitutions provide insights into the molecular fit of the introduced residue. Mutations that change the electrostatic surface potential of a protein have wide-ranging effects. Analyses of the effects of mutations on interactions with ligands and partners have been performed to elucidate functional mutations. Detailed analyses of the variations have allowed us to provide tentative explanations for all of the 478 unique variations found in XLA patients.

Poster A76

Aptamer Binding, Simulation & Experiment Unexpected Agreement

Bjoern Hansen University of Hamburg

Cindy Meyer (University of Hamburg, Department of Chemistry); Tijana Zivkovic (University of Hamburg, Department of Chemistry); Ulrich Hahn (University of Hamburg, Department of Chemistry); Andrew Torda (University of Hamburg, Center for Bioinformatics);

Short Abstract: Aptamers are small oligonucleotides which bind specific target molecules. Their small size, lack of immunogenicity and ease of selection (SELEX) and synthesis have made them a promising class of potential pharmaceuticals. Unfortunately, structural studies are still not routine, so they provide a good excuse to apply molecular modelling techniques. We have been in the position to compare experimental measurements with properties from molecular dynamics simulations.

We have modelled a set of aptamers which bind the interleukin-6 receptor (IL-6R). Although one might be tempted to approach this as a molecular docking problem, the first simulations showed that the measured binding affinities may really reflect the quadruplex stability. Stability is notoriously hard to predict, so the correlations with experiment may be a remarkable coincidence, but the trends are rather persuasive. Since experiment seems to be correlated with simulation, we are now experimentally testing the first sequences selected based on the simulation results.

Poster A77

Systematic Analysis of MicroRNAs Targeting the Androgen Receptor in Prostate Cancer Cells.

Pekka Kohonen University of Turku

Päivi Östling (VTT Technical Research Centre of Finland, and Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Medical Biotechnology); Suvi-Katri Leivonen (VTT Technical Research Centre of Finland, and Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Medical Biotechnology); Anna Aakula (VTT Technical Research Centre of Finland, and Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Medical Biotechnology); Rami Makela (VTT Technical Research Centre of Finland, and Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Medical Biotechnology); Zandra Hagman (Lund University, Department of Laboratory Medicine, Division of Clinical Chemistry); Anders Edsjö (University and Regional Laboratories Region Skåne and Lund University, Clinical Pathology, Malmö and Department of Laboratory Medicine, Centre for Molecular Pathology); Henrik Edgren (University of Helsinki, Institute for Molecular Medicine Finland (FIMM)); Daniel Nicorici (University of Helsinki, Institute for Molecular Medicine Finland (FIMM)); Anders Bjartell (Lund University, Department of Clinical Sciences, Division of Urological Cancers); Yvonne Ceder (Lund University, Department of Laboratory Medicine, Division of Clinical Chemistry); Merja Perala (VTT Technical Research Centre of Finland, and Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Medical Biotechnology); Olli Kallioniemi (VTT Technical Research Centre of Finland and University of Helsinki, Institute for Molecular Medicine Finland (FIMM));

Short Abstract: Androgen receptor (AR) is expressed in all stages of prostate cancer progression, including in castration-resistant tumours, but regulation of AR levels remains poorly understood. To systematically characterize AR regulatory mechanisms, we conducted a gain-of-function screen of 1129 miRNA molecules. Changes in the AR protein content were quantified using protein lysate microarrays in five AR-positive prostate cancer cell lines. Rank-based meta-analysis methods were especially good at identifying consistent changes across a diverse panel of cells; we propose that diverse panels of cells should be used in high-throughput screening and the results combined with meta-analysis approaches. We defined 71 miRNAs that influenced the AR protein levels. Analysis of prostate cancer samples confirmed a negative correlation of miR-34a and miR-34c expression with AR levels. Our findings establish that miRNAs interacting with the long 3'UTR of the AR gene are important regulators of AR protein levels.
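
A minimal sketch of how a rank-based meta-analysis can combine screening results across cell lines is given below. The abstract does not name the specific method used, so this example assumes the simplest rank aggregation, the mean rank across cell lines. The miRNA labels, cell line labels and effect values are hypothetical.

```python
def ranks(values):
    """Rank values in ascending order (1 = strongest decrease); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def mean_rank(effects_by_cell_line):
    """Average the per-cell-line rank of each miRNA (same miRNA order in every list)."""
    per_line = [ranks(effects) for effects in effects_by_cell_line.values()]
    return [sum(column) / len(per_line) for column in zip(*per_line)]

# Hypothetical change in AR protein signal after miRNA overexpression.
mirnas = ["miR-a", "miR-b", "miR-c", "miR-d"]
effects = {
    "line1": [-2.0, 0.1, -0.5, 0.3],
    "line2": [-0.8, 0.0, -0.2, 0.4],
    "line3": [-5.0, -0.1, 3.0, 0.2],   # much wider dynamic range than the others
}
scores = mean_rank(effects)
best = mirnas[scores.index(min(scores))]
```

Working on ranks places cell lines with very different dynamic ranges on a common scale, so a single extreme measurement in one cell line cannot dominate the combined result.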

Poster A78

Detecting epistatic interactions in genome-wide association studies based on an extended random forest approach

Makiko Yoshida Hitachi, Ltd., Central Research Laboratory

Asako Koike (Hitachi, Ltd., Central Research Laboratory, Biosystems Research Department);

Short Abstract: Background: Identifying genetic interactive effects, i.e., epistatic interactions, in genome-wide association studies (GWAS) remains one of the significant challenges in understanding complex diseases. Method: We developed a new method for detecting epistatic interactions in GWAS by extending a learning approach called random forest. The random forest technique is a predictive method that produces a series of classification trees using a large set of predictor variables, and it has been proposed for discovering the SNPs that are most predictive of disease status in large-scale association studies. It has, however, some limitations for identifying epistatic interactions. First, it may perform poorly in detecting SNPs with little marginal effect. Furthermore, it does not explicitly provide information on the interaction patterns of susceptibility SNPs. We extended the random forest framework to overcome these limitations by (i) modifying the construction of the random forest, and (ii) implementing a procedure for extracting interaction patterns from the constructed random forest. Results: The performance of the proposed method was evaluated using simulated data under a wide spectrum of disease models. The new method performs very well in identifying pure epistatic interactions with high precision, and is also capable of concurrently identifying multiple interactions in the presence of genetic heterogeneity. We also applied this method to real genome-wide data to demonstrate that it is promising for practical use in GWAS as a way to reveal the epistatic interactions involved in common complex diseases.
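
The difficulty with SNPs that have little marginal effect can be seen in a small worked example. The sketch below is not the authors' extended random forest; it is a hypothetical toy model of pure epistasis in which disease occurs only when exactly one of two binary risk genotypes is present, so each SNP alone carries no signal.

```python
from itertools import product

# Balanced toy cohort: each combination of two binary genotypes appears 25 times.
samples = [(a, b) for a, b in product([0, 1], repeat=2) for _ in range(25)]
status = [a ^ b for a, b in samples]   # diseased only if exactly one risk genotype

def disease_rate(values):
    values = list(values)
    return sum(values) / len(values)

def marginal_effect(snp_index):
    """Difference in disease rate between carriers and non-carriers of one SNP."""
    carriers = [s for g, s in zip(samples, status) if g[snp_index] == 1]
    others = [s for g, s in zip(samples, status) if g[snp_index] == 0]
    return disease_rate(carriers) - disease_rate(others)

def joint_rates():
    """Disease rate for every two-SNP genotype combination."""
    return {combo: disease_rate(s for g, s in zip(samples, status) if g == combo)
            for combo in product([0, 1], repeat=2)}
```

In this model `marginal_effect(0)` and `marginal_effect(1)` are both zero, while `joint_rates()` separates cases from controls perfectly. A tree that chooses its first split by single-SNP association therefore has no reason to select either SNP, which is the kind of limitation the extension described above is meant to address.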

Poster A79

Computational analysis of genome-wide DNA methylation using MeDIP-seq

Lukas Chavez Max-Planck-Institute for Molecular Genetics

Christina Grimm (Max-Planck-Institute for Molecular Genetics, Vertebrate Genomics); Jörn Dietrich (Max-Planck-Institute for Molecular Genetics, Vertebrate Genomics); Bernd Timmermann (Max-Planck-Institute for Molecular Genetics, Next Generation Sequencing Group); Hans Lehrach (Max-Planck-Institute for Molecular Genetics, Vertebrate Genomics); Bernhard Herrmann (Max-Planck-Institute for Molecular Genetics, Developmental Genetics); Michal Schweiger (Max-Planck-Institute for Molecular Genetics, Vertebrate Genomics); Kerstin Neubert (Max-Planck-Institute for Molecular Genetics, Vertebrate Genomics); Jörn Walter (Universität des Saarlandes, Epigenetik); Justyna Jozefczuk (Max-Planck-Institute for Molecular Genetics, Vertebrate Genomics); James Adjaye (Max-Planck-Institute for Molecular Genetics, Vertebrate Genomics); Ralf Herwig (Max-Planck-Institute for Molecular Genetics, Vertebrate Genomics);

Short Abstract: The generation of genome-wide data derived from methylated DNA immunoprecipitation followed by sequencing (MeDIP-seq) has become a major tool for epigenetic studies in health and disease. So far the major bottleneck of MeDIP-seq experiments has been on the computational side. Here, we present our recent software development MEDIPS, the first comprehensive approach for normalization and differential analysis of MeDIP-seq data. MEDIPS is a full pipeline consisting of QC features and methods for data pre-processing and statistical analysis. MEDIPS has been intensively tested on more than 1.5 billion sequence reads to date. Here, we describe important factors, such as local CpG density and copy number variation, that may influence methylation signals and show how MEDIPS compensates for these factors. We highlight the performance of the pre-processing and statistical analysis with public benchmark data as well as project data. In order to demonstrate the computational approach, we have analyzed alterations in DNA methylation during the differentiation of human embryonic stem cells (hESCs) to definitive endoderm. We show improved correlation of normalized MeDIP-seq data with available whole-genome bisulfite sequencing data, and investigate the effect of differential methylation on gene expression. Furthermore, we analyze the interplay between DNA methylation, histone modifications, and transcription factor binding and show that, in contrast to de novo methylation, demethylation is mainly associated with regions of low CpG density.

Poster A80

Robust and accurate data enrichment statistics via distribution function of sum of weights

Aleksandar Stojmirovic National Center for Biotechnology Information

Yi-Kuo Yu (National Center for Biotechnology Information)

Short Abstract: Term enrichment analysis facilitates biological interpretation by assigning to experimentally/computationally obtained data annotation associated with terms from controlled vocabularies. This process usually involves obtaining statistical significance for each vocabulary term and using the most significant terms to describe a given set of biological entities, often associated with weights. Many existing enrichment methods require selections of the most significant entities and/or do not account for weights of entities. Others either mandate extensive simulations to obtain statistics or make unjustifiable assumptions about data distribution. In addition, most methods have difficulty assigning correct statistical significance to terms with few entities.

Implementing the well-known Lugannani-Rice formula, our approach, called SaddleSum, approximates the distribution of the sum of weights asymptotically by the saddlepoint method to arrive at an analytically and computationally tractable form. We evaluated SaddleSum against several existing methods and demonstrated its ability to adapt equally well to distributions with widely different properties. With entity weights properly taken into account, SaddleSum is internally consistent and stable with respect to the choice of the number of most significant entities selected. Making few assumptions about the input data, the proposed method is universal and can thus be applied to areas beyond the analysis of microarrays, such as deep sequencing, quantitative proteomics and in silico network simulations. SaddleSum also provides a term-size dependent score distribution function that gives rise to accurate statistical significance even for terms with few entities. As a consequence, SaddleSum enables researchers to place confidence in its significance assignments to small terms, which are often biologically most specific.
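
To illustrate why a saddlepoint tail approximation can remain accurate for sums of only a few terms, the sketch below applies the Lugannani-Rice formula to the mean of n unit-rate exponential variables, a case chosen only because the exact tail probability is known in closed form. This is an assumption made for illustration: SaddleSum applies the formula to the distribution of entity weights, not to exponentials, and the function names here are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_sf(z):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lugannani_rice_tail(n, x):
    """Approximate P(mean of n Exp(1) variables >= x), for x > 0 and x != 1.

    For Exp(1) the cumulant generating function is K(s) = -log(1 - s),
    so the saddlepoint equation K'(s) = x gives s = 1 - 1/x.
    """
    s = 1.0 - 1.0 / x
    w = math.copysign(math.sqrt(2.0 * n * (x - 1.0 - math.log(x))), s)
    u = s * math.sqrt(n) * x          # s * sqrt(n * K''(s)), with K''(s) = x**2
    return normal_sf(w) + normal_pdf(w) * (1.0 / u - 1.0 / w)

def exact_tail(n, x):
    """Exact P(sum of n Exp(1) variables >= n * x) from the Erlang distribution."""
    t = n * x
    return math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(n))

approx = lugannani_rice_tail(5, 2.0)
exact = exact_tail(5, 2.0)
```

With only five terms in the sum, the approximation agrees with the exact tail to well within one percent, which is consistent with the claim above that reliable significance values can be obtained for terms with few entities.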

Poster A81

Measuring the complexity of solving diseases

Raul Rodriguez-Esteban Boehringer Ingelheim

Murat Cokol (Sabanci University, Biology); Jorg Hakenberg (Arizona State University, Computer Science);

Short Abstract: Quantitatively evaluating the complexity of solving a disease might provide insights into the difficulty associated with finding novel treatments and into the reasons for the stagnation of pharmaceutical research. We have analyzed several properties of the genes that have been studied for each disease (disease gene pools) and found different dynamics in play. Diseases are pockets of knowledge typically focused on a small set of genes, but they differ in the narrowness of their focus, reflecting their cohesion or lack thereof. Diseases that have received less scientific scrutiny have a wider range of complexity and tend to have faster-shifting sets of canonical genes, while diseases with more scrutiny are more stable. The complexity of solving a disease is influenced by the disease boundaries and is reflected in the stability of the research undertaken. Disease gene pools allow a direct comparison of the knowledge accrued for different diseases.

Poster A82

Nonparametric Estimation of an Unknown Probability Distribution Using Maximum Likelihood and Bayesian Approaches

Alona Chubatiuk University of Southern California

Tatiana Tatarinova (University of Glamorgan, Mathematics); Alan Schumitzky (University of Southern California, Mathematics);

Short Abstract: The general problem is the estimation of an unknown distribution F from nonlinear noisy data. One important medical example is in the analysis of clinical trials. A new drug is given to a population of subjects. The drug’s behavior is stochastically determined by an unknown subject-specific vector parameter θ. This pharmacokinetic parameter θ varies significantly (sometimes genetically) between subjects, which accounts for the variability of the drug response in the population. The pharmacokinetic population problem is to determine the distribution F of θ based on the nonlinear noisy clinical data.

Two nonparametric statistical approaches for estimating F are investigated: Maximum Likelihood and Bayesian. For the maximum likelihood approach, convex analysis shows that the maximum likelihood estimator F* of F is a discrete distribution with no more than N support points, where N is the number of subjects in the population. The positions and weights of the support points are unknown. We have devised and programmed a new algorithm based on the Expectation-Maximization method to determine F*. The discrete property of F* is important for optimal drug dosage design, as it reduces integration problems over F* to summation problems. There are two main problems associated with the Bayesian computations: 1) how to determine E[F | Data] (direct integration is out, as F is infinite dimensional); and 2) how to choose G0 (the Bayesian estimate of F is very sensitive to G0). We will present applications of this method to different pharmacokinetics datasets.
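
The role of Expectation-Maximization in estimating a discrete F* can be sketched in a simplified setting. The example below is not the authors' algorithm: it assumes that the support points are fixed on a grid (so only the weights are estimated, whereas the abstract treats positions as unknown too), that each subject contributes one observation, and that the noise is Gaussian with a known standard deviation. All names and data values are hypothetical.

```python
import math

def normal_density(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def em_mixing_weights(data, grid, sd, iterations=200):
    """EM updates for the weights of a discrete distribution on fixed support points.

    Returns the final weights and the log-likelihood recorded before each update.
    """
    k = len(grid)
    weights = [1.0 / k] * k
    # Likelihood of every observation under every candidate support point.
    lik = [[normal_density(y, theta, sd) for theta in grid] for y in data]
    history = []
    for _ in range(iterations):
        # Mixture density of each observation under the current weights.
        mix = [sum(w * l for w, l in zip(weights, row)) for row in lik]
        history.append(sum(math.log(m) for m in mix))
        # New weight of a support point = its average posterior responsibility.
        weights = [
            sum(weights[j] * row[j] / m for row, m in zip(lik, mix)) / len(data)
            for j in range(k)
        ]
    return weights, history

# Hypothetical subject-level measurements from two subpopulations.
data = [0.8, 1.1, 0.9, 1.2, 5.1, 4.9, 5.2]
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
weights, history = em_mixing_weights(data, grid, sd=0.3)
```

In this toy run nearly all of the weight collects on the two grid points nearest the data clusters, which shows the discreteness property mentioned above: expectations under the estimate reduce to short weighted sums over the support points.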

Poster A83

Exploratory analysis of the structure and dynamics of healthy intestinal microbiota

Leo Lahti University of Helsinki

Jarkko Salojärvi (University of Helsinki, Veterinary Bioscience); Anne Salonen (University of Helsinki, Veterinary Bioscience); Willem de Vos (University of Helsinki, Veterinary Bioscience);

Short Abstract: The intestinal microbiota has a profound impact on human health and well-being. The number of microbial organisms in the human gut exceeds the number of human cells 10-fold, and the overall gene content of this metagenome is 100-fold larger than the number of human genes. The microbiota has a central role in key bodily functions such as digestion and the immune system, and the gut microbiota has been suggested to play a role in many common diseases.

However, the associations between the microbiota composition, health, and disease remain largely uncharacterized. The phylogenetic Human Intestinal Tract chip (HITChip) microarray provides a sensitive measurement platform for phylogenetic fingerprinting and relative quantification of the intestinal microbiota across measurement conditions. The HITChip array has been designed to target the V1 and V6 hypervariable 16S rRNA regions of >1000 phylotypes, largely covering the known bacterial diversity of the human gut.

Although a shared set of common core species between individuals can be identified, the microbiota is highly specific to each individual. The considerable biological variation and high dimensionality of the data pose particular challenges for computational analyses. Adaptive machine learning techniques provide efficient tools that are robust to uncertainties and can detect relevant patterns and regularities in the data with minimal human intervention. We apply these tools to carry out a preliminary meta-analysis across a versatile collection of >1000 measurement conditions to characterize associations between the gut microbiota and health status.

Poster A84

The Adult Intestinal Core Microbiota Is Determined by Analysis Depth and Health Status

Jarkko Salojärvi University of Helsinki

Anne Salonen (University of Helsinki, Department of Veterinary Biosciences); Leo Lahti (University of Helsinki, Department of Veterinary Biosciences); Willem de Vos (University of Helsinki, Department of Veterinary Biosciences);

Short Abstract: High-throughput molecular methods are currently exploited to characterize the complex and highly individual intestinal microbiota in health and disease. Definition of the human intestinal core microbiota, i.e., the number and the identity of bacteria that are shared among different individuals, is currently one of the main research questions. Here we apply a high-throughput phylogenetic microarray for a comprehensive and high-resolution microbiota analysis, and a novel computational approach for a quantitative analysis of the core microbiota in over a hundred individuals. In the presented approach we study how the criteria for phylotype abundance or prevalence influence the resulting core, in parallel with biological variables such as the number and health status of the study subjects. We observed that the core size is highly conditional, depending mostly on the depth of the analysis and the required prevalence of the core taxa. Moreover, the core size is also affected by biological variables, of which health status had a larger impact than the number of subjects studied. We also introduce a computational method to estimate the most probable size of the core, given the varying prevalence and abundance criteria. The approach is directly applicable to sequencing data derived from intestinal and other host-associated microbial communities, and can be modified to include further definitions of the core microbiota. Hence, we anticipate that its use will facilitate the conceptual definition of the core microbiota and its consequent characterization, so that future studies yield conclusive views on the intestinal core microbiota, eliminating the current controversy.
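
The dependence of core size on the abundance and prevalence criteria can be illustrated with a minimal sketch. This is not the authors' method for estimating the most probable core size; it only counts taxa that pass both thresholds. The function name `core_size`, the threshold names, and the toy matrix are all hypothetical.

```python
def core_size(abundance, detection, prevalence):
    """Count taxa whose abundance exceeds `detection` in at least a
    `prevalence` fraction of the samples.

    `abundance` is a list of rows, one row per taxon and one column per
    sample (an assumed layout, chosen for illustration only).
    """
    n_samples = len(abundance[0])
    count = 0
    for taxon_row in abundance:
        detected = sum(1 for value in taxon_row if value > detection)
        if detected / n_samples >= prevalence:
            count += 1
    return count


# Hypothetical toy data: 4 phylotypes (rows) measured in 4 individuals (columns).
toy = [
    [5.0, 4.0, 6.0, 5.5],
    [0.5, 3.0, 0.2, 2.5],
    [0.0, 0.1, 0.0, 0.0],
    [2.0, 2.2, 1.9, 0.3],
]

# Core size over a small grid of (detection, prevalence) criteria.
grid = {
    (d, p): core_size(toy, d, p)
    for d in (0.1, 1.0, 3.0)
    for p in (0.5, 0.75, 1.0)
}
```

Even in this toy case the core shrinks as either criterion is tightened, which is the conditional behaviour the abstract describes.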

Poster A85

Identification of secondary targets for clinically used anti-tubercular drugs through binding site similarities

Praveen Anand Indian Institute of Science

Nagasuma Chandra (Indian Institute of Science)

Short Abstract: Tuberculosis continues to be the most devastating among all infectious diseases, causing nearly 9.4 million infections and 2 million deaths annually. The exact mechanism of action, especially in relation to multiple targets, has not been characterized for most drugs. Here we use a proteome-wide binding site comparison to detect additional targets for anti-tubercular drugs. A pocketome of 9029 sites was constructed from previously obtained structural models of about 2800 proteins of Mtb, by obtaining a consensus from three different site prediction methods. Known binding sites of anti-tubercular drugs were compared individually against the constructed pocketome. High-scoring sites were analyzed further through docking simulations to estimate the feasibility of drug binding. For isoniazid, a front-line drug, the binding pocket of dihydrofolate reductase was identified as a possible target; this interaction was a chance discovery reported in the literature, which supports our approach. In addition, a site from RibD, a riboflavin-specific deaminase enzyme, is also identified as a possible target for isoniazid. Similarly, possible additional targets have been identified for other drugs as well. This information will be useful to (a) decipher the mechanism of action of drugs in a systemic sense, (b) obtain design principles for modifying existing drugs and (c) design drugs from a polypharmacology perspective. The approach and the algorithms used are inherently generic and can be easily adapted to other drug discovery studies as well.

Poster A86

Re-visiting classification of HLA class-I molecules

Sumanta Mukherjee Indian Institute of Science

Nagasuma Chandra (Indian Institute of Science)

Short Abstract: In most vertebrates, MHC/HLA molecules play an important role in adaptive immune recognition and autoimmunity. A first step in the HLA functional cycle is the recognition of an antigenic peptide. Hundreds of Class I alleles are known, which together recognize a diverse range of peptides. The alleles differ from each other through sequence and structural variations at the binding sites. Thus, the precise arrangement of residues at the binding sites has an important bearing on the exact sequence of immunological events that occur in an individual. Here we analyze the three-dimensional structures at the binding sites of all known structures of HLA molecules from PDB, in order to rationalize and subsequently predict peptide specificities.

All-versus-all comparison of the binding sites in all the alleles was made using an in-house algorithm, PocketMatch, the scores from which were used to compute a similarity tree. Trees were also computed independently, based on all-versus-all sequence as well as whole structure alignments. Distinct branches are observed, indicating a finer classification of the molecules based on binding site similarities. Though the branching pattern is in general agreement with the serological typing used in conventional classification, several surprises were also observed in terms of diverse allele types possessing similar sites. Similarities in peptide binding preferences across alleles have also been reported in the literature, correlating with this. The site-based trees allow re-visiting the classification of these molecules, besides having implications for vaccine design.

Poster A87

A molecular based subtyping in juvenile idiopathic arthritis

Margherita Squillario University of Genoa

Clara Malattia (IRCCS G. Gaslini, Pediatria II, Reumatologia); Curzio Basso (University of Genoa, DISI - Department of Computer and Information Science); Annalisa Barla (University of Genoa, DISI - Department of Computer and Information Science);

Short Abstract: Juvenile idiopathic arthritis (JIA) is not a single disease, but a term that encompasses all forms of arthritis with onset in patients less than 16 years old, lasting more than 6 weeks and of unknown origin. JIA is considered a complex genetic disease, caused by a combination of multiple genetic and environmental factors. The affected joints develop synovial proliferation and infiltration by inflammatory cells, with subsequent increased secretion of synovial fluid and pannus formation.
Synovial inflammation may lead to articular cartilage and bone damage. Current classification, mainly based on clinical grounds, identifies several different subtypes. While some of them appear to represent rather homogeneous entities, others seem to include heterogeneous conditions.
A subtyping based on molecular features may be useful to stratify JIA into more homogeneous sub-groups. To this aim, we considered 268 publicly available microarrays measuring gene expression in blood samples from controls and cases diagnosed with different JIA subtypes. The first step of our analysis consisted of defining binary classification problems (subtype vs. control) in order to identify one signature for each subtype. For this purpose, we applied a supervised feature selection procedure based on regularization.
The functional enrichment analysis of the signatures highlighted the strong molecular heterogeneity of JIA, suggesting that the current subtyping might not detect molecular differences.
Taking a step toward a molecular-based subtyping, we also attempted an unsupervised biclustering analysis based on matrix factorization methods.
We present the results of the two analyses and comment on the comparison from a biological viewpoint.

Poster A88

Identification of confounding technical bias in microarray data sets and its implications for the utility of clinical microarray cohorts

Marcin Krzystanek Technical University of Denmark

Zoltan Szallasi (Technical University of Denmark, Department of Systems Biology); Aron Eklund (Technical University of Denmark, Department of Systems Biology);

Short Abstract: Microarrays can be used in biomedical research to identify genes that are correlated with a clinical feature of interest, such as patient outcome or response to a particular drug. However, microarrays are susceptible to various types of technical bias that can affect the resulting expression measurements. If such technical bias is confounded with the clinical variable of interest, the likelihood of identifying false positive genes, i.e. genes that are correlated with outcome due to various forms of technical bias, is increased. Therefore, a method to identify data sets with confounding technical factors would lead to more efficient biomarker discovery and thus enable improved patient outcomes. We applied previously described technical "bias metrics" to expression data from four independent studies of lung cancer, and identified one study in which technical bias was predictive of patient outcome. The genes identified in the affected study did not generalize to the other three cohorts, whereas the genes identified in the three unaffected studies were consistent with each other and were predictive in a fifth independent cohort. These results confirm that technical bias is an obstacle to biomarker discovery, and that the correlation between technical bias and clinical factors can be used to identify problematic data sets, which should probably be eliminated from further analysis.

Poster A89

Shaping the healthy and diseased human proteomes by the tRNAs abundance

Shelly Mahlab The Hebrew University of Jerusalem

Tamir Tuller (Weizmann Institute of Science, Mathematics and Computer Science & Molecular Genetics); Michal Linial (The Hebrew University of Jerusalem, Biological Chemistry);

Short Abstract: Trends in transcriptional regulation have been studied in many cells and tissues, while the study of protein translation has lagged behind. Recently, several technologies (e.g., mass spectrometry, ribosome profiling, RNA-Seq) have been developed that allow a fresh view of translational regulation and specifically its elongation step. Translation rate is strongly dependent on the availability of amino-acid-charged tRNA molecules. Each stage in translation elongation is carried out by specific tRNA molecule(s) that recognize the translated codon according to the codon-anticodon complementarity rules; thus, the concentration of the different tRNA molecules in the cell is a strong determinant of translation efficiency. It is technically challenging to measure the cellular abundance of tRNA molecules. Thus, the genomic copy number has been used as a proxy for these measurements. While this approximation has been validated in unicellular organisms, its quality in humans has not been studied.
In this study, we report the first evaluation of the approximation of tRNA abundance by genomic copy number in humans. To this end, we use several experimental resources, including data from short RNA deep sequencing projects and a direct measurement of the levels of cytosolic tRNAs. We show that in humans this approximation is only partially supported. Our research indicates that direct measurements of tRNA levels are crucial for the assessment of translation elongation. We apply our methods to a few cell lines and to cancerous tissues and show the relevance of tRNA abundance in shaping the expressed proteomes in healthy and diseased states.

Poster A90

Improving the assessment of the outcome of non-synonymous SNVs with a Consensus deleteriousness score (Condel)

Abel Gonzalez-Perez Universitat Pompeu Fabra

Nuria Lopez-Bigas (Universitat Pompeu Fabra, Biomedical Genomics Research Group);

Short Abstract: Several large ongoing initiatives that profit from next-generation sequencing technologies have motivated, and will continue to drive in coming years, the emergence of long catalogues of missense single nucleotide variants (SNVs) in the human genome. As a consequence, various methods and their related computational tools have been developed to classify these missense SNVs as likely deleterious or probably neutral polymorphisms. The outputs produced by these computational tools differ in nature and are thus difficult to compare and integrate. This makes it hard to obtain more accurate classifications by taking advantage of the possible complementarity between different tools. Here we propose an effective approach to integrate the output of some of these tools into a unified classification, based on a Weighted Average of the normalized Scores of the individual methods (WAS). (The approach is illustrated in this work with the integration of five tools.) We show that this WAS outperforms each individual method in the task of classifying missense SNVs as deleterious or neutral. Furthermore, we demonstrate that this WAS can be used not only for classification purposes (deleterious vs neutral variants), but also as an indicator of the impact of the mutation on the functionality of the mutant protein. In other words, it may be used as a deleteriousness score of missense SNVs. Therefore, we recommend the use of this WAS as a Consensus deleteriousness score of missense variants (Condel).
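
As a rough illustration of a weighted average of normalized scores, consider the sketch below. It is not the Condel implementation: the abstract does not say how scores are normalized or how weights are chosen, so the min-max scaling, the tool names, and the weights are all assumptions, and scores are assumed to be oriented so that higher means more deleterious.

```python
def normalize(score, low, high):
    """Min-max scale a raw tool score to the range [0, 1], clipping outliers.

    `low` and `high` are the assumed bounds of the tool's score range.
    """
    scaled = (score - low) / (high - low)
    return min(1.0, max(0.0, scaled))


def weighted_average_score(normalized_scores, weights):
    """Combine per-tool normalized scores into one weighted average.

    Both arguments are dicts keyed by tool name; only tools present in
    `normalized_scores` contribute to the weight total.
    """
    total_weight = sum(weights[tool] for tool in normalized_scores)
    weighted_sum = sum(
        normalized_scores[tool] * weights[tool] for tool in normalized_scores
    )
    return weighted_sum / total_weight


# Hypothetical tools and weights, for illustration only.
scores = {"tool_a": 0.75, "tool_b": 0.25}
weights = {"tool_a": 3.0, "tool_b": 1.0}
consensus = weighted_average_score(scores, weights)
```

A tool that is trusted more (larger weight) pulls the consensus toward its own score, which is the intuition behind combining complementary predictors.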

Poster A91

The evolutionary rate of antibacterial drug targets

Arkadiusz Gladki Institute of Biochemistry and Biophysics PAS

Piotr Zielenkiewicz (Institute of Biochemistry and Biophysics PAS, Department of Bioinformatics); Pawel Szczesny (Institute of Biochemistry and Biophysics PAS, Department of Bioinformatics); Szymon Kaczanowski (Institute of Biochemistry and Biophysics PAS, Department of Bioinformatics);

Short Abstract: Motivation: One of the major issues in the infectious disease field is the notable increase in multiple drug resistance in pathogenic species. For that reason, newly acquired high-throughput data on virulent microbial agents attract the attention of many researchers seeking potential new drug targets. Over the years, many approaches have been used to score proteins from infectious pathogens, including, but not limited to, similarity analysis, conservation, reverse docking, statistical 3D structure analysis, machine learning, topological properties of interaction networks, or a combination of the aforementioned methods. From a biological perspective, every essential protein is a potential target. This concept may be exemplified by ribosomal proteins, many of which are well-known drug targets in bacteria. However, in practice, knowledge about essentiality is not sufficient to distinguish between good and bad potential drug targets.
Results: By evaluating the relative rate of evolution of genes of known pathogens, we find that the dN/dS ratio of genes coding for known drug targets is significantly lower than the genome average and also lower than that for essential genes identified by experimental methods. We also show that wide-spectrum drug targets have consistently low average omega values. Additionally, ranking genes by their omega values identifies putative wide-spectrum antibiotic potentiators. We suggest that these findings may become a useful addition to a repertoire of drug target prediction methods.
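
The ranking step can be sketched as follows, taking omega to be the dN/dS ratio named above. The gene names and the dN and dS values are hypothetical, and the sketch does not reproduce the authors' rate estimation or their statistical comparison against the genome average.

```python
def omega(dn, ds):
    """Return the dN/dS ratio for one gene.

    Raises ValueError when dS is not positive, since the ratio is
    undefined in that case.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    return dn / ds


# Hypothetical (dN, dS) estimates per gene, for illustration only.
genes = {
    "gene_c": (0.30, 0.60),
    "gene_a": (0.02, 0.50),
    "gene_b": (0.10, 0.40),
}

# Rank genes from most to least constrained (lowest omega first).
ranked = sorted(genes, key=lambda name: omega(*genes[name]))
```

Genes near the top of such a ranking evolve under the strongest purifying selection, which is the property the abstract associates with known drug targets.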


Attention Poster Authors: Posters should be at most 1.30 m (130 cm) high x 0.90 m (90 cm) wide. Fasteners (Velcro / double-sided tape) will be provided at the site; please DO NOT bring tape, tacks or pins. View a diagram of the poster board here

Posters Display Schedule:

Odd Numbered posters:

Set-up timeframe: Sunday, July 17, 7:30 a.m. - 10:00 a.m.
Author poster presentations: Monday, July 18, 12:40 p.m. - 2:30 p.m.
Removal timeframe: Monday, July 18, 2:30 p.m. - 3:30 p.m.*

Even Numbered posters:

Set-up timeframe: Monday, July 18, 3:30 p.m. - 4:30 p.m.
Author poster presentations: Tuesday, July 19, 12:40 p.m. - 2:30 p.m.
Removal timeframe: Tuesday, July 19, 2:30 p.m. - 4:00 p.m.*

* Posters that are not removed by the designated time may be taken down by the organizers and discarded. Please be sure to remove your poster within the stated timeframe.

Delegate Posters Viewing Schedule

Odd Numbered posters:
On display Sunday, July 17, 10:00 a.m. through Monday, July 18, 2:30 p.m.
Author presentations will take place Monday, July 18: 12:40 p.m.-2:30 p.m.

Even Numbered posters:
On display Monday, July 18, 4:30 p.m. through Tuesday, July 19, 2:30 p.m.
Author presentations will take place Tuesday, July 19: 12:40 p.m.-2:30 p.m.

Want to print a poster in Vienna - try these options:

Repacopy, next to the congress venue [MAP]

Also at Karlsplatz, in the Ring Center, Kärntner Str. 42 [MAP]

If you need your poster on a thicker material, you may also use a plotter service next to Karlsplatz: http://schiessling.at/portfolio/
