ISMB2010 - Home

Accepted Posters

Category 'M' - Machine Learning

Poster M01

Integration of Pathway Knowledge into a Support Vector Framework using Reweighted Recursive Feature Elimination

Marc Johannes- German Cancer Research Center

Jan C. Brase (German Cancer Research Center, Molecular Genome Analysis); Holger Froehlich (Bonn-Aachen International Center for IT, Algorithmic Bioinformatics); Stephan Gade (German Cancer Research Center, Molecular Genome Analysis); Mathias Gehrmann (Siemens Healthcare Diagnostics Products GmbH, Diagnostics Research Germany); Maria Faelth (German Cancer Research Center, Molecular Genome Analysis); Holger Sueltmann (German Cancer Research Center, Molecular Genome Analysis); Tim Beissbarth (University Medicine Gottingen, Medical Statistics);

Short Abstract: Reweighted Recursive Feature Elimination (RRFE) is a classification algorithm that combines pathway knowledge and gene expression data. RRFE gives a gene with low fold-change an increased influence on the classifier if it is connected to differentially expressed genes. Evaluation showed both significantly increased AUC and high reproducibility of selected genes.

Poster M02

Plasma metabolites in the mammalian hibernation cycle

Anis Karimpour-Fard- University of Colorado Denver

Elaine Epperson (University of Colorado School of Medicine, Department of Cell and Developmental Biology); Lawrence Hunter (University of Colorado School of Medicine, Center for Computational Pharmacology); Sandra Martin (University of Colorado School of Medicine, Department of Cell and Developmental Biology);

Short Abstract: Hibernation is a dynamic endogenous circannual rhythm in which metabolism, heart rate and body temperature all decrease drastically through most of the winter season in order to conserve energy. In this study, machine learning tools were used to identify twenty compounds that distinguish plasma among the eight different stages of hibernation.

Poster M03

Discovering knowledge hidden in mutation data using Inductive Logic Programming

TIEN-DAO LUU- IGBMC

NGOC-HOAN NGUYEN (IGBMC, Bioinformatics); Anne Friedrich (IGBMC, Bioinformatics); Jean Muller (IGBMC, Bioinformatics); Luc Moulinier (IGBMC, Bioinformatics); Olivier Poch (IGBMC, Bioinformatics);

Short Abstract: Knowledge discovery in the SM2PH-db database based on Inductive Logic Programming (ILP) allows the automated discrimination of disease-causing mutations from nonpathogenic mutations. The useful rules generated from ILP, which can be interpreted both by automated reasoners and humans, provide a better understanding of the relationships between genotypic and phenotypic features.

Poster M04

CONFIDENT PREDICTABILITY: A STATISTICAL TOOL FOR IDENTIFYING PATIENTS FOR WHICH A MACHINE LEARNING ALGORITHM PROVIDES A RELIABLE DIAGNOSIS

Lee Jones- University of Massachusetts Lowell

Aik Choon Tan (University of Colorado Denver School of Medicine, Medical Oncology/Medicine); Fei Zou (University of Massachusetts Lowell, Department of Mathematical Sciences); Alexander Kheifets (University of Massachusetts Lowell, Department of Mathematical Sciences); Konstantin Rybnikov (University of Massachusetts Lowell, Department of Mathematical Sciences); Damon Berry (University of Massachusetts Lowell, Department of Mathematical Sciences);

Short Abstract: We apply local minimax learning to k-top scoring pairs feature selection and provide individual probability estimation and individual accuracy for each patient in a microarray dataset. We test this method on three cancer datasets, achieve significantly lower error in a large subsample and justify its use for personalized medicine.

Poster M05

ProQM: The first membrane MQAP

Arjun Ray- Stockholm University

Bjorn Wallner (Stockholm University, Department of Biochemistry and Biophysics); Erik Lindahl (Stockholm University, Department of Biochemistry and Biophysics); Arjun Ray (Stockholm University, Department of Biochemistry and Biophysics);

Short Abstract: Protein structure prediction of membrane proteins is still in its infancy compared to globular proteins. Methods using knowledge-based potentials derived from existing water-soluble proteins need to be changed and adapted to account for the specific membrane environment. ProQM is an SVM-trained method designed specifically for membrane model prediction.

Poster M06

Machine Learning Studies on Subcellular Localization of Human Cytosolic Sulfotransferase SULT1C1

George Acquaah-Mensah- Massachusetts College of Pharmacy and Health Sciences

Jonathan Sheng (Massachusetts College of Pharmacy & Health Sciences, Pharmaceutical Sciences);

Short Abstract: Machine learning was used to predict the subcellular localization of wild-type and Green Fluorescent Protein (GFP)-tagged human sulfotransferase SULT1C1. Amino acid physicochemical properties were used. Two learning schemes performed well, with area under ROC curve better than 0.72. Both wild-type and GFP-tagged SULT1C1 proteins are likely cytosolic.

Poster M07

SVM-based prediction of linear B-cell epitopes using Bayes Feature Extraction

Lawrence Wee- Singapore Immunology Network

Yiu-Wing Kam (Singapore Immunology Network, Infectious Diseases); Diane Simarmata (Singapore Immunology Network, Infectious Diseases); Lisa Ng (Singapore Immunology Network, Infectious Diseases); Joo Chuan Tong (Institute for Infocomm Research, Data Mining);

Short Abstract: This work presents a novel implementation of the support vector machines method using Bayes Feature Extraction for predicting linear B-cell epitopes. The prediction method was also applied on the Chikungunya proteome where regions for potential epitope hotspots were highlighted.

Poster M08

VT-shift: a novel density-based clustering algorithm and its applications to multivariate biological datasets

Nikolay Samusik- Max Planck Institute for Cell Biology and Genetics

Yannis Kalaidzidis (Max Planck Institute for Cell Biology and Genetics); Marino Zerial (Max Planck Institute for Cell Biology and Genetics);

Short Abstract: Nonparametric density-based clustering is a valuable data mining tool, requiring minimal assumptions about cluster shape and size. However, the state-of-the-art density-based algorithm, mean-shift, is ill-suited for high-dimensional spaces. Here we propose a novel algorithm, VT-shift, that overcomes this shortcoming, and demonstrate its performance on synthetic and real-world multiparametric cell-based screening datasets.

Poster M09

Evolutionary distances between divergent sequences - A rational kernel approach

Roland Schwarz- Cancer Research UK Cambridge Research Institute and Department of Oncology, University of Cambridge

William Fletcher (University College London, Department of Genetics, Evolution and Environment and Centre for Mathematics and Physics in the Life Sciences and Experimental Biology); Frank Foerster (University of Wuerzburg, Department of Bioinformatics); Benjamin Merget (University of Wuerzburg, Department of Bioinformatics); Joerg Schultz (University of Wuerzburg, Department of Bioinformatics); Florian Markowetz (University of Cambridge, Cancer Research UK Cambridge Research Institute and Department of Oncology);

Short Abstract: We describe a novel method of using finite-state transducers to construct a positive semi-definite rational kernel capable of determining evolutionary distances between sequences. We show this kernel to be superior to classical distance measures in terms of the reconstruction accuracy of phylogenetic trees derived from it.

Poster M10

Local Projection Kernels for Computational Biology

Mehmet Gönen- Boğaziçi University

Ethem Alpaydın (Boğaziçi University, Department of Computer Engineering);

Short Abstract: Dimensionality reduction is commonly used to alleviate the effect of redundant or correlated features and to visualize data using few dimensions. In this poster, we give local projection kernel results on two splice site detection (Acceptors and Donors) and one translation initiation site prediction (Arabidopsis) data sets.

Poster M11

Sparse Dynamic Models for Detecting Responsive Pathways

Prasad Siddavatam- Purdue University

Yao Zhu (Purdue University, Computer Science); Michael Gribskov (Purdue University, Biological Sciences); Yuan Qi (Purdue University, Computer Science);

Short Abstract: We have developed a new computational approach to uncover responsive pathways and model their dynamics from mRNA gene expression. Our method integrates temporal expression with metabolic and regulatory pathways. We use latent Markov random fields to capture temporal interactions between the pathway states. Application to Dengue virus infection identifies 78 pathways responsive to virus infection.

Poster M12

Identification of biological features distinguishing meiotic recombination hot and cold spots in yeast.

Loren Hansen- NIH

David Landsman (NIH, NCBI); Nak-Kyeong Kim (Old Dominion University, Mathematics and Statistics); Leonardo Marino-Ramirez (Corporacion Colombiana de Investigacion Agropecuaria, Plant Molecular Genetics Laboratory);

Short Abstract: In eukaryotes there are regions with high meiotic recombination rates (hot spots) and regions with low recombination rates (cold spots). Here we construct vectors representing biological features for 2207 hot/cold spots. Using a feature selection algorithm we identified a subset of features associated with hot/cold spots in Saccharomyces cerevisiae.

Poster M13

Multitask Learning: Combining Various Genomic Features To Better Explain Phenotypic Variation

Brian Bennett- Duke University

Sayan Mukherjee (Duke Institute for Genome Sciences & Policy, Statistical Science); Phil Febbo (Duke Institute for Genome Sciences & Policy, Medicine, Oncology); Terry Furey (Duke Institute for Genome Sciences & Policy, Biostatistics and Bioinformatics);

Short Abstract: We have combined Gene Set Enrichment Analysis (GSEA) with Multitask Learning to integrate genome-wide expression, copy number, DNA methylation, and genotype data to identify biologically meaningful differences between samples with varying phenotypes. Operating at the gene set level, results from these analyses are robust and easily interpretable.

Poster M14

Nonlinear dimension reduction and clustering by Minimum Curvilinearity unfold neuropathic pain and tissue embryological classes

Carlo Vittorio Cannistraci- King Abdullah University for Science and Technology (KAUST)

Timothy Ravasi (King Abdullah University for Science and Technology (KAUST), Red Sea Integrative Systems Biology Lab, Computational Bioscience Research Center, Division of Chemical & Life Sciences and Engineering); Franco Maria Montevecchi (Politecnico di Torino, Dipartimento di Meccanica); Trey Ideker (University of California San Diego, Departments of Medicine and Bioengineering); Massimo Alessio (San Raffaele Scientific Institute, Proteome Biochemistry Lab);

Short Abstract: Analysis of nonlinear-small-size-datasets, characterized by few samples and a large number of measures, is problematic. 'Minimum Curvilinearity' is a principle that inspires two novel nonlinear machine learning methods, Minimum Curvilinear Embedding for dimensionality reduction and Minimum Curvilinear Affinity Propagation for clustering, for efficient unsupervised visualization and classification of nonlinear-small-size-datasets.

Poster M15

A method for predicting compound-protein interactions using canonical correlational analysis

Takuya Hashimoto- Osaka University

Shigeto Seno (Osaka University, Graduate School of Information Science and Technology); Yoichi Takenaka (Osaka University, Graduate School of Information Science and Technology); Hideo Matsuda (Osaka University, Graduate School of Information Science and Technology);

Short Abstract: We propose a method for predicting compound-protein interactions based on canonical correlational analysis using amino acid sequences and chemical compound structures. The proposed method requires only positive instances for learning. Verification using known compound-protein interactions confirmed its effectiveness.

Poster M16

A non-parametric Bayesian algorithm for predicting gene regulatory networks with a Gaussian process

Takaaki Kikuchi- Waseda University

Yohei Nakada (Aoyama Gakuin University, Department of Industrial and Systems Engineering); Takashi Kaburagi (Gakushuin University, Computer Center); Takashi Matsumoto (Waseda University, Department of Electrical Engineering and Bioscience);

Short Abstract: A Gaussian process-based algorithm is proposed for predicting gene regulatory networks with continuous values. The proposed algorithm attempts to capture nonlinearities associated with gene expression values with Bayesian non-parametrics. A Markov Chain Monte Carlo (MCMC) implementation was adopted. Results of a preliminary experiment appear encouraging.

Poster M17

A Novel Gene Ontology Prediction Algorithm Using Infinite Mixtures of Hidden Markov and Binary Models with a Dirichlet Process Prior

Takashi Kaburagi- Gakushuin University

Natsumi Tagoto (Waseda University, Department of Electrical Engineering and Bioscience); Kousuke Oota (Waseda University, Department of Electrical Engineering and Bioscience); Takaaki Tokuda (Waseda University, Department of Electrical Engineering and Bioscience); Yohei Nakada (Aoyama Gakuin University, Department of Industrial and Systems Engineering); Takashi Matsumoto (Waseda University, Faculty of Science and Engineering);

Short Abstract: We propose an algorithm to predict protein functions from a given amino acid sequence, using the gene ontology terms. The algorithm utilizes a combination of two models: hidden Markov and binary models that were extended to an infinite mixture with a Dirichlet process prior.

Poster M18

A Gene Regulatory Network Prediction Algorithm Using a Gaussian Bayesian Network Model with a Box-Cox Transformation

Haruka Miyachika- Waseda University

Tomomi Kimiwada (National Center of Neurology and Psychiatry, Department of Neurosurgery); Jun Maruyama (Waseda University, Department of Electrical Engineering and Bioscience); Yohei Nakada (Aoyama Gakuin University, Department of Industrial and Systems Engineering); Takashi Kaburagi (Gakushuin University, Computer Center); Takashi Matsumoto (Waseda University, Electrical Engineering and Bioscience); Keiji Wada (National Center of Neurology and Psychiatry, Department of Neurosurgery);

Short Abstract: We present an algorithm for predicting a gene regulatory network structure using a Gaussian Bayesian network model. Since the output of a Gaussian Bayesian network is assumed to be a normal distribution, we adopt a Box-Cox transformation for the obtained data. We applied the algorithm to actual gene expression data.
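
The abstract does not state the exact form of the transformation or how its parameter is chosen. As a rough illustration only, the textbook one-parameter Box-Cox transform can be sketched as follows; the function name `box_cox` and the parameter name `lam` are hypothetical and are not taken from the poster.

```python
import math

def box_cox(values, lam):
    """Textbook one-parameter Box-Cox transform (illustrative sketch).

    y = (x**lam - 1) / lam  for lam != 0
    y = log(x)              for lam == 0
    The transform is only defined for strictly positive inputs.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]
```

In practice the parameter is usually chosen so that the transformed data look as close to normal as possible, for example by maximum likelihood; whether the authors do this is not stated in the abstract.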

Poster M19

ClanTox: A predictor tool for toxin-like proteins reveals 500 such proteins within viral genomes

Michal Linial- The Hebrew University of Jerusalem

Guy Naamati (Research student, Computational Biology); Manor Askenazi (Dana-Farber Cancer Institute, Proteomics);

Short Abstract: Animal toxins are short proteins that are produced in venom glands. We developed a classifier called ClanTox based on machine learning methods for identifying toxin-like proteins. We applied ClanTox to all 26,000 short non-redundant viral proteins and found 510 sequences that resemble conotoxins, growth factor receptors and antibacterial peptides.

Poster M20

A classifier approach to predict protein secretion in Aspergillus niger

Bastiaan van den Berg- Delft University of Technology

Jurgen Nijkamp (Delft University of Technology, The Delft Bioinformatics Lab); Marcel Reinders (Delft University of Technology, The Delft Bioinformatics Lab); Liang Wu (DSM Biotechnology Center); Herman Pel (DSM Biotechnology Center); Johannes Roubos (DSM Biotechnology Center); Dick de Ridder (Delft University of Technology, The Delft Bioinformatics Lab);

Short Abstract: The cell-factory Aspergillus niger is widely used for industrial enzyme production. To select potential proteins for large-scale production, we developed a sequence-based classifier that predicts if an over-expressed homologous protein will successfully be produced and secreted.

Poster M21

The e-LICO project

Simon Jupp- University of Manchester

No additional authors

Short Abstract: The e-LICO project (http://www.e-lico.eu) is building a software platform for biologists to perform data-mining experiments. This platform will be demonstrated in the task of biomarker discovery and pathway modelling in the study of kidney disease; a full range of -omic datasets will be analysed using workflows generated by e-LICO.

Poster M22

Cancer Outcome Prediction by Feature Selection with Top Scoring Pairs

Ping Shi- Harvard Medical School and Harvard Pilgrim Health Care Institute

Mark Kon (Boston University, Department of Mathematics and Statistics); Surajit Ray (Boston University, Department of Mathematics and Statistics);

Short Abstract: We present an approach incorporating the scoring algorithm from top-scoring pairs (TSP) into machine learning methods for feature selection. We show this hybrid scheme enhances the classification performance in cancer prognosis datasets as well as simulated datasets. A simulation study suggests that with certain data structures, this algorithm is advantageous over other feature selection methods.
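
For readers unfamiliar with TSP, the pair score it is built on can be sketched as below. This is the commonly described two-class TSP score, not the authors' hybrid scheme, which the abstract does not detail; the function names and the toy data are made up for illustration.

```python
from itertools import combinations

def tsp_scores(samples, labels):
    """Score each feature pair (i, j) by
    |P(x_i < x_j | first class) - P(x_i < x_j | second class)|.

    samples: list of equal-length numeric rows; labels: one class label per row.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    n_features = len(samples[0])
    scores = {}
    for i, j in combinations(range(n_features), 2):
        freqs = []
        for c in classes:
            rows = [s for s, y in zip(samples, labels) if y == c]
            # Fraction of samples in class c where feature i is below feature j.
            freqs.append(sum(1 for r in rows if r[i] < r[j]) / len(rows))
        scores[(i, j)] = abs(freqs[0] - freqs[1])
    return scores

def top_scoring_pair(samples, labels):
    """Return the feature pair with the highest TSP score."""
    scores = tsp_scores(samples, labels)
    return max(scores, key=scores.get)

# Toy data (hypothetical): 4 samples, 3 features, two classes.
toy_samples = [
    [1.0, 2.0, 5.0],
    [1.5, 3.0, 4.0],
    [4.0, 1.0, 5.5],
    [3.5, 0.5, 4.5],
]
toy_labels = [0, 0, 1, 1]
toy_scores = tsp_scores(toy_samples, toy_labels)
```

In the toy data, features 0 and 1 swap their ordering between the two classes, so that pair receives the highest score; pairs whose ordering is the same in both classes score zero. Because the score depends only on within-sample orderings, it is unaffected by monotone per-sample normalization.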

Poster M23

L1-L2 regularization framework for Alzheimer's molecular characterization

Annalisa Barla- Universita di Genova

Salvatore Masecchia (Universita di Genova, DISI - Department of Informatics and Computer Science); Margherita Squillario (Universita di Genova, DISI - Department of Informatics and Computer Science);

Short Abstract: The aim of our work is to uncover the molecular characteristics of Alzheimer's disease and take a step forward in the understanding of its etiology. We apply l1l2 with double optimization, a regularization technique for feature selection, to a set of gene expression profiles of normal and affected brain tissues.

Poster M24

Scalable graph kernels with approximate matching of subtree patterns

Nino Shervashidze- Max Planck Institute for Biological Cybernetics, Max Planck Institute for Developmental Biology

Alexander J. Smola (Yahoo! Research, Machine Learning); Karsten M. Borgwardt (Max Planck Institute for Biological Cybernetics, Max Planck Institute for Developmental Biology, Machine Learning and Computational Biology);

Short Abstract: We present a scalable graph kernel for approximately matching subtree patterns in two graphs. It takes into account discrete node and edge labels and allows us to compute pairwise similarities of N graphs in a runtime that scales linearly with N and the number of edges in each graph.

Poster M25

A Mixture of Experts model for predicting expression from sequence

Sushmita Roy- Broad Institute, MIT

Pouya Kheradpour (MIT, CSAIL); Aviv Regev (Broad Institute, MIT, HHMI, Biology and Computational Biology); Manolis Kellis (MIT, Broad Institute, CSAIL and Computational Biology); Chris Bristow (Broad Institute, MIT, CSAIL); Jay Konieczka (Broad Institute, Harvard, Molecular & Cellular Biology);

Short Abstract: Predictive models of gene expression describe how cis-regulatory elements translate trans signals into output expression. We present a Mixture of Experts model that predicts expression from cis-regulatory elements, automatically grouping genes into sequence-driven expression clusters. On yeast and fly data we identified gene clusters and relevant cis-regulatory associations per cluster.

Poster M26

A mixture model approach to high-throughput cell cycle analysis in budding yeast

David Warde-Farley- University of Toronto

Yolanda Chong (University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research); Judice Koh (University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research); Jason Moffat (University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research); Quaid Morris (University of Toronto, Terrence Donnelly Centre for Cellular and Biomolecular Research);

Short Abstract: We present a probabilistic framework for the automatic shape-based clustering, sorting and quality control of objects from high-throughput microscope assays of budding yeast. Given a possibly noisy segmentation image, our method identifies mother-daughter cell pairs with high confidence, facilitating quantitative analysis of the cell cycle in a high-throughput setting.

Poster M27

Experimental design for genome-wide association studies

Christoph Lippert- Max Planck Institutes, Tuebingen

Oliver Stegle (Max Planck Institutes, Tuebingen, Machine Learning and Computational Biology Research Group); Karsten Borgwardt (Max Planck Institutes, Tuebingen, Machine Learning and Computational Biology Research Group);

Short Abstract: Genome-wide association studies aim at finding genetic loci that explain phenotypic variation. Due to technological advances, large-scale genotype information is readily available. However, measuring the phenotypes is often expensive and time consuming. We address the problem of selecting the individuals for phenotyping that are most informative for accurate association mapping.

Poster M28

Multivariate multi-way analysis of multi-source data

Ilkka Huopaniemi- Aalto University School of Science and Technology

Tommi Suvitaival (Aalto University School of Science and Technology, Department of Information and Computer Science); Janne Nikkilä (University of Helsinki, Department of Veterinary Biosciences); Matej Oresic (VTT Technical Research Centre of Finland, Quantitative Biology and Bioinformatics); Samuel Kaski (Aalto University School of Science and Technology, Department of Information and Computer Science);

Short Abstract: ANOVA-type methods are the default tool for the analysis of data with multiple covariates. However, existing multi-way analysis methods are not designed for experiments where data is obtained from multiple sources. We extend the applicability of multivariate, multi-way ANOVA-type methods to multi-source cases by introducing a novel Bayesian model.

Poster M29

Optimizing Genome Context Methods and their Combination

Luciana Ferrer- SRI International

Luciana Ferrer (SRI International, Artificial Intelligence Center);

Short Abstract: Genome context methods have been introduced in the last decade as automatic methods to predict functional relatedness between genes.

We present a thorough study of the four main families of genome context methods: phylogenetic profile, gene fusion, gene cluster, and gene neighbor. We find that for most organisms a gene neighbor method outperforms the phylogenetic profile methods by as much as 40% in sensitivity, being competitive with the gene cluster method at low sensitivities. Gene fusion is generally the worst performing of the four methods. A thorough exploration of the parameter space for each method is performed and results across different target organisms are presented.

In addition, we propose the use of normalization procedures for the genome context scores and show that significant gains can be achieved from their implementation. In particular, the sensitivity of the phylogenetic profile method is improved by around 25% after normalization, resulting, to our knowledge, in the best-performing phylogenetic profile system in the literature.

Finally, we show results from combining the various genome context methods into a single score. When using a cross-validation procedure to train the combiners, with both original and normalized scores as input, a decision tree combiner results in gains of around 15% with respect to the best individual score. This represents a gain of around 12% over the state of the art in this area. Unfortunately, we find that combination procedures can lead to highly suboptimal results when phylogenetically distant organisms are used to train the combiner's parameters.

Poster M30

High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions

Phaedra Agius- MSKCC

Phaedra Agius (MSKCC, Computational Biology);

Short Abstract: Accurately modeling DNA sequence preferences of transcription factors (TFs) and using them to predict in vivo TF genomic binding sites is key to deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs. Recently, protein binding microarray (PBM) experiments have emerged as a new source of high-resolution data on in vitro TF binding specificities. PBM data has been analyzed either by estimating PSSMs or via rank statistics on probe intensities, where sequence patterns are assigned enrichment scores (E-scores). This representation is informative but unwieldy because every TF is assigned thousands of scored sequence patterns. We have developed a novel, flexible and discriminative framework for learning TF binding preferences from high-resolution in vitro and in vivo data. Using a novel k-mer based string kernel called the di-mismatch kernel, we trained support vector regression (SVR) models on PBM data to learn the mapping from probe sequences to binding intensities. Our compact and expressive SVR models can scan genomic regions to predict in vivo occupancy. Using data for yeast and mouse TFs, our SVR models better predicted probe intensity than E-scores or PSSMs. Moreover, SVR scores for yeast, mouse, and human genomic regions gave improved predictions for genomic occupancy as measured by ChIP-chip and ChIP-seq experiments. Finally, we trained our model directly on ChIP-seq data and found greatly improved in vivo occupancy predictions, and by comparing a TF's in vitro and in vivo models, we identified cofactors and disambiguated direct and indirect binding.
