Using Partial Least Square Regression to Classify Protein Family with Weak Sequence Similarities
Stephen Opiyo (Department of Agronomy and Horticulture, University of Nebraska, Lincoln); Etsuko Moriyama (School of Biological Sciences and Plant Science Initiative, University of Nebraska, Lincoln)
Abstract: For better protein classification, how many samples should be included in a training dataset? We examined the effect of training dataset size on various protein classifiers. We found that the size of the training dataset affected hidden Markov models and PSI-BLAST but had little effect on partial least squares regression methods.
Identification of GPI modification signal peptides and prediction of their cleavage sites
Yu Zhang (Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark); Thomas Skøt Jensen (Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark); Ulrik de Lichtenberg (Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark); Søren Brunak (Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark)
Abstract: We have developed a new sequence-based tool for predicting glycosylphosphatidylinositol (GPI)-anchored proteins, based on neural networks trained on experimentally verified data extracted from Swiss-Prot. The method predicts GPI-anchored proteins with high accuracy and performs better than all other prediction schemes currently available.
Poster D-4 (Selected for oral presentation.)
Identification of novel human splice variants using "genomic fossils"
Ronen Shemesh (Compugen Ltd); Amit Novik (Compugen Ltd); Sarit Edelheit (Compugen Ltd); Rotem Sorek (Tel Aviv University and Compugen)
Abstract: We present a novel method to discover new splice variants in the human genome using processed pseudogenes. We detected hundreds of previously unidentified transcript variants. Experimental verification of a subset of these variants indicates that most of them are still active transcripts in the human transcriptome.
Wnt Pathway Analysis with Automated Natural Language Processing
Carlos Santos (University of Michigan / Bioinformatics Program); Daniela Eggle (University of Michigan / Bioinformatics Program); David States (University of Michigan / Bioinformatics Program)
Abstract: We present a natural language processing pipeline to extract and analyze biomolecular interaction assertions from biomedical text. Focusing on the Wnt signaling pathway, the pipeline expands the existing canonical reference pathway with interaction assertions relating to that pathway and renders various organism-specific variations on the canonical pathway.
Poster C-28 (Selected for oral presentation.)
Genome-wide transcription regulatory circuits controlling cellular malignant
Yuval Tabach (The Department of Complex System, Weizmann Institute of Science); Michael Milyavsky (The Department of Molecular Cell Biology Weizmann Institute of Science); Or Zuk (The Department of Complex System, Weizmann Institute of Science); Assif Yitzhaki (The Department of Complex System, Weizmann Institute of Science); Paz Polak (The Department of Complex System, Weizmann Institute of Science); Eytan Domany (The Department of Complex System, Weizmann Institute of Science); Varda Rotter (The Department of Molecular Cell Biology Weizmann Institute of Science); Yitzhak Pilpel (The Molecular Genetics Weizmann Institute of Science)
Abstract: Here we analyzed a 600-day in vitro malignant transformation process. Following genome-wide transcription profiling and promoter analysis, we focused on a cell-cycle-related cluster that is prominent in a diversity of cancers. Our work links three levels in a complex regulatory network, namely gene expression, promoter architecture, and activity of tumor suppressors.
Poster D-72 (Selected for oral presentation.)
Trees and forests: A genome-wide reconstruction of orthologous gene groups in fungi
Ilan Wapinski (Harvard University); Nir Friedman (Hebrew University); Aviv Regev (Harvard University); Avi Pfeffer (Harvard University)
Abstract: As more genome sequences become available, resolving orthologous relations between genes is becoming increasingly challenging and important. We present a reliable method for genome-wide reconstruction of ancestral relations between genes across multiple species. Tracing divergence and duplication events allows us to hypothesize about the evolutionary forces governing genetic adaptations.
Poster D-3 (Selected for oral presentation.)
Combining Phylogenetic Data and Network Topology to Identify Regulatory Motifs
Ting Wang (Washington University); Gary Stormo (Washington University)
Abstract: Predicting regulatory motifs from whole genome sequences remains a challenging problem due to the statistical limitations of conventional motif-finding algorithms. We introduce a new algorithm, "PhyloNet", that combines phylogenetic information and network topology to define conserved motifs and co-regulated promoters and to build a regulatory network ab initio.
Using GEMS for Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data
Alexander Statnikov (Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University); Ioannis Tsamardinos (Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University); Constantin Aliferis (Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University)
Abstract: We introduce GEMS, a system for automated development and evaluation of high-quality cancer diagnostic models and for biomarker discovery from microarray gene expression data. The development of GEMS was informed by a rigorous algorithmic evaluation. The system is freely available for non-commercial use from http://www.gems-system.org/.
Unravelling the Architecture of Duplications in Tumor Genomes
Benjamin Raphael (University of California, San Diego); Pavel Pevzner (University of California, San Diego)
Abstract: We describe a computational method for reconstructing the architecture of duplicated regions in tumor genomes. Our method relies on data from End Sequence Profiling experiments and a model of breakage-fusion-bridge (BFB) cycles. We demonstrate our technique on the MCF7 breast tumor cell line.
Biomarker KnowledgeTree - A Flexible And Versatile Visualization Tool For Hierarchical Data
Mary Gaylord (Eli Lilly and Company); John Calley (Eli Lilly and Company); Huahong Qiang (Eli Lilly and Company); Birong Liao (Eli Lilly and Company)
Abstract: We describe a platform where non-hierarchical biological data can be visualized through the application of a customized hierarchy incorporating MeSH classifications. This platform gives users flexibility in updating and ease of manipulation, and can facilitate fresh scientific insight by highlighting biological information through cross-referencing in different hierarchical branches.