Search Instructions
Categories are listed at the top of each abstract. Using the first author's last name, you can search the index below. To search by another name or keyword, please use your browser's 'Find' function. In Internet Explorer or Netscape, go to the Edit menu and select 'Find'. Type in the name or keyword and hit 'Find'.
SA Tuesday, Poster #44
AILEY
Sub Clustering ESTs Using Sequence Conflicts
Bartlett G. Ailey, Byungkook Lee, Ira Pastan, National Cancer Institute, National Institutes of Health
We describe an algorithm that clusters ESTs and thus finds new tissue-specific genes from the EST database. The ESTs from a tissue are aligned to the EST database. The alignments are then used to cluster the ESTs in a predetermined order to optimize the quality of the cluster.
PF Monday, Poster #46
AL-LAZIKANI
Analysing Protein Families: Sequence, Structure, and Specificity of the SH2 Domain
Bissan Al-Lazikani, Felix Sheinerman, Barry Honig, Columbia University
We combined various sequence and structure analysis methods to analyse the diverse SH2 family. As well as gaining substantial insights into the biology of SH2 domains, an effective suite of sequence and structure analysis tools was developed that can be applied to study any diverse protein family.
GA Sunday, Poster #15
ALBA
Cross Genome Analysis of Herpes Viruses
Mar Alba, Rhiju Das, Christine Orengo, Paul Kellam, University College London
Protein families based on sequence homology were derived from complete herpesvirus genomes. The families were used to construct phylogenetic trees, based on all herpesvirus conserved protein sequences or on gene function conservation. Functional groups were correlated with expression clusters derived from array data of human herpesvirus 8 genes. All the data is being stored in VIDA (Virus Database).
PF Monday, Poster #47
ALTUVIA
Sequence Signals for Generation of Antigenic Peptides by the Proteasome
Yael Altuvia and Hanah Margalit, The Hebrew University, Israel
Analysis of the termini and flanking regions of peptides eluted from MHC class I molecules suggests that the C-terminus and its immediate flanking position possess proteasomal cleavage signals. These signals can be used to assess the cleavage potential of peptides and aid in discrimination between immunodominant and cryptic peptides.
ME Sunday, Poster #53
ANASTASSIOU
New Visualization Tools for Biomolecular Sequence Analysis
Dimitris Anastassiou, Columbia University
We introduce new computational and visual tools for frequency-domain biomolecular sequence analysis, improving upon traditional Fourier analysis performance in distinguishing coding from noncoding regions in DNA sequences. Color spectrograms provide visual information about local patterns. Color maps identify not only the existence of protein coding areas, but also the coding direction and the reading frame for each of the exons, from the phase of the Fourier Transform.
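As a rough illustration of the traditional Fourier baseline mentioned in the abstract above (not the authors' improved method or their color spectrograms), the sketch below measures the period-3 spectral component that tends to be stronger in protein-coding DNA. The function names, the window size, and the step are assumptions chosen for this example.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base indicator sequences."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the 0/1 indicator sequence for this base at frequency 1/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n  # normalise by window length


def sliding_period3(seq, window=30, step=3):
    """Return (start, power) pairs for windows along the sequence (assumed parameters)."""
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A perfectly 3-periodic toy sequence such as "ATG" repeated scores high, while a homopolymer scores close to zero. Real coding regions fall in between, so any threshold would have to be calibrated on annotated data.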
SA Tuesday, Poster #45
ARGENTAR
Tuples, Tables, and Trees: A New Approach to the Discovery of Patterns in Biological Sequences
D. R. Argentar, K. M. Bloch, H. A. Holyst, A. R. Moser, W. T. Rogers, D. J. Underwood, A. G. Vaidyanathan, J. VanStekelenborg, DuPont Company
We describe a new pattern discovery algorithm based upon a novel set of data structures ("k-tuples", and associated tuple tables) combined with a new tree-traversal method, designed to efficiently discover all patterns at all levels of support. Results of applying this algorithm to the analysis of GPCRs will be shown.
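The abstract above does not spell out its data structures, so the following is only a simplified sketch of the general idea of a tuple table: every k-tuple is mapped to the set of sequences that contain it, and the size of that set is the tuple's support. The names `tuple_table` and `patterns_with_support` are hypothetical, and the authors' tree-traversal method is not reproduced here.

```python
from collections import defaultdict


def tuple_table(sequences, k):
    """Map each k-tuple (substring of length k) to the indices of sequences containing it."""
    table = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            table[seq[i:i + k]].add(idx)
    return table


def patterns_with_support(sequences, k, min_support):
    """Return {k-tuple: support} for tuples found in at least min_support sequences."""
    table = tuple_table(sequences, k)
    return {
        tup: len(ids)
        for tup, ids in table.items()
        if len(ids) >= min_support
    }
```

Lowering `min_support` one level at a time gives a crude view of patterns "at all levels of support", at the cost of rescanning the table on each call.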
ME Sunday, Poster #54
ASAI
Clustering and Averaging of Images in Single Particle Analysis
Kiyoshi Asai, Yutaka Ueno, Chikara Sato, Electrotechnical Laboratories; Katsutoshi Takahashi, Japan Advanced Institute of Science and Technology Hokuriku
We have been developing a single particle analysis system that estimates 3-D structures from randomly oriented electron-microscopic images. Iterative alignment and bottom-up algorithms backed by large computational power are the keys to robust clustering, which requires no manually designed reference.
SA Tuesday, Poster #46
ASAI
GeneDecoder--A Gene Finding System with Multi-stream HMMs
Kiyoshi Asai, Yutaka Ueno, Katunobu Itou, Electrotechnical Laboratories; Tetsushi Yada, Genomic Science Center, RIKEN
GeneDecoder, a gene-finding system for eukaryotes, uses multi-stream Hidden Markov Models that integrate the sequence with various types of pre-processed information. This drastically reduces the complexity of the models and enables flexible model design through stream weights that depend on each component model.
PT Monday, Poster #80
AYTEKIN-KURBAN
Constructing Sequence Domains for Protein Structural Genomics Target Selection
Gulriz Aytekin-Kurban, Terry Gaasterland, The Rockefeller University, NY
We seek to select structural genomics targets to maximize information gained from each structure and minimize total structures solved. Partitioning and clustering protein subsequences using pairwise local alignments yields sequence domains for proteins from 40 genomes. We compare domain clusters with CATH structure domains and select representative sequences for determination.
SA Tuesday, Poster #47
BAFNA
Implementation of the Conserved Exon Method for Gene Finding
Vineet Bafna, Daniel H. Huson, Celera Genomics Corporation
The "Conserved Exon Method" is a new approach to gene prediction, based on the idea of looking for conserved protein sequences by comparing pairs of DNA sequences. It simultaneously predicts gene structures in both sequences, e.g. in human and mouse genomic sequences. We demonstrate the program "CEMexplorer" that implements this method.
ME Sunday, Poster #55
BAILEY-KELLOGG
The NOESY Jigsaw: Automated Protein Secondary Structure and Main-chain Assignment from Sparse, Unassigned NMR Data
Chris Bailey-Kellogg, Alik Widge, John J. Kelley, III, Marcelo J. Berardi, John H. Bushweller, and Bruce Randall Donald, Dartmouth College
We present a novel algorithm for high-throughput automated assignment of nuclear magnetic resonance spectra. Our approach identifies secondary structure patterns in unassigned data, and then aligns identified elements against the primary sequence. By deferring the traditional assignment bottleneck, our approach achieves fast, reasonably accurate results using only four spectra.
RDNA Tuesday, Poster #35
BALASUBRAMANIAN
Study of the Structural Alignment of DNA Sequences over the Core Nucleosome Particle
Sreekala Balasubramanian, Wilma K. Olson, Rutgers University
We are studying the structural similarities in Protein-DNA complexes. Currently, we are analyzing the structural similarities in nucleosome positioning sequences using a computational technique called threading. We are looking for patterns in the DNA structure that might lead to nucleosome positioning and thereby come up with rules for nucleosome positioning.
RDNA Tuesday, Poster #36
BALDI
Sequence Analysis by Additive Scales: A General Framework and its Application to DNA Structure
Pierre Baldi, Pierre-Francois Baisnée, University of California, Irvine
Motivated by the analysis and prediction of DNA structure, we develop a framework for the computational analysis of sequences by additive scales. The framework is used to determine extremal sequences, inter-scale correlations, and to mine large data sets. It is applied to DNA structural scales and tandem repeats.
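A minimal sketch of what an additive scale means in practice: the score of a sequence is the sum of per-k-mer values over all overlapping positions. The scale used here (number of G or C bases in each dinucleotide) is a toy assumption for illustration and is not one of the DNA structural scales studied by the authors; the brute-force search for an extremal score is likewise only practical for very short sequences.

```python
from itertools import product

# Toy dinucleotide scale (assumption): count of G/C bases in the dinucleotide.
TOY_SCALE = {a + b: (a in "GC") + (b in "GC") for a in "ACGT" for b in "ACGT"}


def additive_score(seq, scale, k=2):
    """Sum the scale values of all overlapping k-mers in seq."""
    return sum(scale[seq[i:i + k]] for i in range(len(seq) - k + 1))


def extremal_score(length, scale, k=2):
    """Highest additive score over all DNA sequences of a given (small) length."""
    return max(
        additive_score("".join(bases), scale, k)
        for bases in product("ACGT", repeat=length)
    )
```

For example, `additive_score("ACGT", TOY_SCALE)` adds the values for AC, CG, and GT.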
DB Monday, Poster #1
BANNAI
Genomic Hypothesis Creator: An Environment that Assists the Design and Implementation of Computational Experiments for Knowledge Discovery from Genomic Databases
Hideo Bannai, University of Tokyo; Yoshinori Tamada, Tokai University; Osamu Maruyama, Kyushu University; Satoru Miyano, University of Tokyo, Japan
We present Genomic Hypothesis Creator: a genome-oriented version of a programming library that enables domain experts to effectively conduct computational knowledge discovery experiments. Several hypothesis generation algorithms implemented with POSIX threads are available. Design goals: support the creation and seamless integration of new attributes and/or existing attributes accumulated in major genomic databases.
SA Tuesday, Poster #48
BARTON
New Developments to SCANPS: High Performance Parallel Iterated Protein Sequence Searching with Full Dynamic Programming and on-the-fly Statistics
Geoffrey J. Barton, Caleb Webber, Stephen M. J. Searle, EMBL-European Bioinformatics Institute (EBI), UK
SCANPS performs sequence database searches using full dynamic programming. Enhancements in three areas are described: 1) On-the-fly statistics based on the score distribution found with each search; 2) Iterative searching using profiles created from the results of each iteration; and 3) Parallel processing implementations using OpenMP and MPI libraries and Intel MMX instructions.
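To make "full dynamic programming" concrete, here is a small sketch of a Smith-Waterman-style local alignment score. It is not SCANPS code: the match, mismatch, and linear gap values are assumptions for illustration, and a real protein search would use a substitution matrix and typically affine gap penalties.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, keeping only two DP rows in memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Because every cell of the matrix is filled, the search is exhaustive rather than heuristic, which is what makes parallel and vectorised implementations attractive for whole-database searches.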
GE Tuesday, Poster #8
BEISSBARTH
Transcription Factor CREM Dependent Expression During Mouse Spermatogenesis
Tim Beissbarth, Igor Borissevitch, Andreas Hoerlein, Annette Klewe-Nebenius, Bernhard Korn, Martin Vingron, Guenther Schuetz, German Cancer Research Center, Heidelberg
Transcription factor CREM seems to be the trigger of expression at late stages of spermatogenesis. Large-scale cloning, sequencing, and expression profiling of messages expressed in CREM dependent manner was performed: 1) Subtractive cloning (SSH) of messages absent in CREM-/- mice; 2) Sequencing the obtained library and clone selection; and 3) Gene expression profiling.
GA Sunday, Poster #16
BEKIRANOV
Integrating Gene Expression and Genome Sequence Analysis
Stefan Bekiranov, Eduardo Fajardo, Charlie Forster, Mathias Katzer, Carsten Meyer, Kass Schmitt, Alexander Sczyrba, Terry Gaasterland, The Rockefeller University, NY
Explanation of gene expression cluster patterns requires integrating genomic features of genes, gene function annotations, and gene expression measurements. It also requires careful characterization of the amount of noise and error on each hybridized microarray through correlation of repeated experiments. The TANGO system has been designed to accomplish these goals.
PS Monday, Poster #63
BENOS
Quantitative Modeling of DNA-protein Interactions
Panayiotis V. Benos, Alan S. Lapedes, Dana S. Fields, Gary D. Stormo, Washington University
We are investigating the rules behind the DNA-protein interactions, using probabilistic algorithms similar to Boltzmann machines. The network is trained on data, collected from the literature, in a multipass method. In the poster we present the general training algorithm as well as some preliminary results/predictions of the method.
RDNA Tuesday, Poster #37
BISHOP
Dynamics of DNA Depends on Conformation: An Implied Communication Network
T. C. Bishop, Tulane University and Xavier University of Louisiana; Y. M. Shi, J. E. Hearst, University of California, Berkeley
A continuum model describing DNA structure and dynamics has been developed. It is based on elastic rod theory and predicts a conformational dependence of the response of DNA to an external impulse. We propose that DNA binding proteins exploit this phenomenon to achieve signaling through the DNA.
DB Monday, Poster #2
BIZZARO
Distributing Bioinformatics Applications with Piper
J. W. Bizzaro, Gary Van Domselaar, Brad Chapman, Jean-Marc Valin, Jarl van Katwijk, Dominic Letourneau, Deanne Taylor, University of Massachusetts Lowell
Piper (http://bioinformatics.org/piper) is an interactive system for creating and managing links between Internet-distributed components such as those used in bioinformatics analyses. Components can reside remotely, on higher performance and capacity computers, while only representations reside locally. Links can depict protocol-independent data flow, procedural steps, and relationships.
SA Tuesday, Poster #49
BLAIR
A New Distributed System for Large-Scale Sequence Analyses
Douglass Blair, Gabriel Robins, University of Virginia
We implemented a new distributed system for performing large-scale sequence library comparisons. Our techniques are ideal for widely distributed computing architectures where many typical computers with modest interconnect bandwidth can be utilized in unison, and our system scales to an arbitrarily large number of processing nodes.
DB Monday, Poster #3
BLANCHARD
PathDB: A Second Generation Metabolic Database
J. L. Blanchard, D. L. Bulmore, A. D. Farmer, M. Gonzales, P. A. Steadman, M. E. Waugh, S. T. Wlodek, P. Mendes, National Center for Genome Resources, NM, USA
PathDB is a relational database that stores detailed metabolic information. The database is coupled to query, visualization and discovery tools that allow for pathway diagrams to be drawn "on the fly" and for new connections to be made between independently discovered facts thereby avoiding the rigid confines of "textbook pathways."
PSP Sunday, Poster #77
BOLTEN
Clustering Protein Sequences - Structure Prediction by Transitive Homology
Eva Bolten, Alexander Schliep, Universität zu Köln, Germany; Sebastian Schneckener, Science Factory; Dietmar Schomburg, Rainer Schrader, Universität zu Köln, Germany
We investigated the limits on transitivity when inferring structural similarity of proteins based upon their sequence similarities. We developed a novel graph-based clustering algorithm capable of handling multi-domain proteins. We will present our algorithmic advances yielding a 24 percent improvement over pair-wise comparisons, statistics of the clusterings and our general methodology.
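The notion of transitive homology can be illustrated with the simplest possible baseline: treat significant pairwise similarities as graph edges and take connected components. This is explicitly not the authors' graph-based algorithm; naive transitivity like this is known to chain unrelated proteins together through shared domains, which is the kind of problem a method that handles multi-domain proteins has to address. The function name and input format are assumptions.

```python
def transitive_clusters(items, similar_pairs):
    """Group items into connected components of the similarity graph (union-find)."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(members) for members in clusters.values())
```

With pairs A-B and B-C, proteins A and C land in the same cluster even though they were never directly similar, which is the transitive inference in its crudest form.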
DB Monday, Poster #4
BOUTON
DRAGON: Database Referencing of Array Genes Online
Christopher M. L. S. Bouton, Johns Hopkins School of Medicine and The Kennedy Krieger Institute, Baltimore, MD; Elizabeth Johnson, Johns Hopkins School of Hygiene and Public Health; Carlo Colantuoni, Johns Hopkins School of Medicine and The Kennedy Krieger Institute; Scott Zeger, Johns Hopkins School of Hygiene and Public Health; Jonathan Pevsner, Johns Hopkins School of Medicine and The Kennedy Krieger Institute
We have developed "Database Referencing of Array Genes ONline" (DRAGON). DRAGON is a web-accessible database that contains information derived from public databases. DRAGON defines the characteristics of genes in microarray data sets. The inclusion of this information during analysis allows for deeper insight into gene expression patterns. Web Site: www.kennedykrieger.org/pevsnerlab/dragon.htm
DB Monday, Poster #5
BOYCE
END: the Enzyme Nomenclature Database
Sínead Boyce, Trinity College, Dublin, Ireland; Andrej Bugrim, Washington University, MO; Andrew McDonald, Trinity College; Francis Fabrizio, Jakub Slomczynski, Washington University; Keith Tipton, Trinity College; Toni Kazic, Washington University
We are developing END, a database of Enzyme Nomenclature, to be used for updating the enzyme data, amending existing entries, and for on-line queries. We have written a suite of parsers and other software tools to convert various data inputs, substitute terms, bring older nomenclature up to date, and check for data consistency and duplications.
SA Tuesday, Poster #50
BRETT
WWW Tools for Detecting SNPs and Alternative Splice Forms in ESTs
David Brett, Gerrit Lehmann, Jens Hanke, Stepfan Gross, Jens Reich, Max-Delbruck Center for Molecular Medicine, Germany; Peer Bork, EMBL, Germany
Two WWW tools allow an end user to search for novel alternative splice forms or SNPs in a query protein or mRNA sequence. Candidate alternative splice forms or SNPs are detected by alignment with ESTs. The tools filter for paralogues, pseudogenes and sequencing errors (http://mahe.bioinf.mdcberlin.de/home.html).
DB Monday, Poster #6
BUGRIM
The Agora - an Environment for Distributed Deposit, Review, and Analysis of Biochemical Information
Andrej Bugrim, Washington University, MO; Sínead Boyce, Trinity College, Dublin, Ireland; Guang Yao, University of Minnesota, Minneapolis, MN; Francis Fabrizio, Washington University; Andrew McDonald, Trinity College; Jakub Slomczynski, Washington University; Jun Ong, University of Minnesota; Brian Feng, William Wise, Washington University; Keith Tipton, Trinity College; Lynda Ellis, University of Minnesota; Toni Kazic, Washington University
We present The Agora, a distributed computational environment for the deposit, review, and analysis of biochemical information. It provides an interface for sharing curatorial functions and queries among the independent, participating databases, while allowing each database and algorithm to preserve its native semantics, data model, and query language and the scientific community to deposit, review, and query biochemical information.
BN Sunday, Poster #1
BUGRIM
A Logic-based Approach for Computational Analysis of Spatially Distributed Biochemical Networks
Andrej Bugrim, Washington University
I present a novel approach for representing data on cell structure, the compartmentalization and localization of molecules in the cell, and an algorithm for using this information in reconstructing signaling pathways and computing their properties. The inference algorithm is based on qualitative ideas of reaction-diffusion systems theory, encoded as production rules for logic-based computations.
ME Sunday, Poster #56
BUHLER
Improved Techniques for Finding Spots on DNA Microarrays
Jeremy Buhler, Trey Ideker, David Haynor, University of Washington
Dapple is a program for finding and quantitating spots on fluorescent cDNA microarrays. Dapple's spot finder exploits consistent spot morphology to improve its robustness to image artifacts and variations in spot size and placement. The finder can be trained on manually classified examples to identify poor-quality and incorrectly found spots.
SA Tuesday, Poster #51
BUNDSCHUH
A New Method in Rapid Significance Assessment of Smith-Waterman Alignments
Ralf Bundschuh, University of California, San Diego
For significance assessment of sequence alignments the score distribution of random alignments has to be known. In gapped alignment, only its shape is known. Its parameters must be determined by time consuming computations for every scoring system. We present an importance sampling technique which estimates these parameters within minutes to within 0.5%.
PF Monday, Poster #48
BURDICK
Multiple Sequence Alignment and Homology Modeling of Sulfotransferase Enzymes
Keith W. Burdick, Irwin D. Kuntz, University of California, San Francisco
Members of the sulfotransferase superfamily of enzymes catalyze the transfer of a sulfuryl group from 3'-phosphoadenosine 5'-phosphosulfate to a wide variety of substrates. The global sequence similarity between the families within sulfotransferases is low (14-20%), making it difficult to align sequences. The crystal structure of estrogen sulfotransferase was used as a template in modeling the nucleotide binding site of four sulfotransferases.
GE Tuesday, Poster #9
BUTTE
Comparing the Similarity of Time-series Gene Expression Using Signal Processing Metrics
Atul J. Butte, Children's Hospital, Boston, MA; Ling Bao, Massachusetts Institute of Technology; Ben Y. Reis, Timothy W. Watkins, Isaac S. Kohane, Children's Hospital, MA
Treating gene expressions as discrete time-invariant signals, we "tuned" the matrix of 2,467 yeast genes at 18 time points revealing gene-gene associations with particular phase shift and gain. We found 18 associations (two known in the literature). All are ranked poorly using conventional clustering. Signal processing can enhance clustering algorithms.
GE Tuesday, Poster #10
CAI
Classification of Cancer Tissue Types by Support Vector Machines Using Microarray Gene Expression Data
Jinsong Cai, Columbia University; Aynur Dayanik, Rutgers University; Hong Yu, Naveed Hasan, Tachio Terauchi, William Noble Grundy, Columbia University
A support vector machine classifier was used to classify cancer and normal tissues based on DNA microarray gene expression patterns with ~99% accuracy. A list of genes (100 of the total 4026 genes) whose expression profiles had the best correlations with tissue types was identified using the Fisher discriminant criterion.
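One common form of the Fisher discriminant criterion for a single gene is the squared difference of class means divided by the sum of class variances. The abstract does not state which variant was used, so the formula, the function names, and the input layout below are assumptions meant only to show how such a ranking could work.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """(mean_a - mean_b)^2 / (var_a + var_b) for one gene's expression values."""
    numerator = (mean(class_a) - mean(class_b)) ** 2
    denominator = pvariance(class_a) + pvariance(class_b)
    if denominator == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def rank_genes(expression, top_n):
    """expression maps gene -> (cancer_values, normal_values); return the top_n genes."""
    ranked = sorted(
        expression,
        key=lambda gene: fisher_score(*expression[gene]),
        reverse=True,
    )
    return ranked[:top_n]
```

Calling `rank_genes(expression, 100)` on a table of 4026 genes would mirror the selection size quoted in the abstract, although the resulting list depends on the variant of the criterion.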
PS Monday, Poster #64
CAMMER
Recognition of Tertiary Packing Motifs in Protein Structures using Delaunay Tessellation
Stephen A. Cammer, Alexander Tropsha, University of North Carolina at Chapel Hill
An approach to recognizing residue packing motifs in proteins has been developed based on Delaunay tessellation of protein structure. The proposed methodology termed Simplicial Neighborhood Analysis of Protein Packing (SNAPP) can be used to locate recurrent tertiary contacts in non-homologous structures as well as functionally relevant patterns in related proteins.
DB Monday, Poster #7
CAMPAGNE
Drivers for Mutant Databases in the RbDe Web Service: Design Principles and Implementation
Fabien Campagne, Harel Weinstein, Mount Sinai School of Medicine, New York, NY
The Residue-based Diagram editor web service (RbDe, http://transport.physbio.mssm.edu/rbde/RbDe.html) allows online creation of Residue-based diagrams of proteins. The presentation will outline the design of interfaces for the query of a mutant database and illustrate their use in the context of RbDe.
DB Monday, Poster #8
CAMPAGNE
Representation of Sequence Data: A Comparison of Prototypes
Fabien Campagne, Mount Sinai School of Medicine, New York, NY
This presentation compares four representations of sequence data (OMG lifesci/99-04-04, BioPerl, BioJava, crover) to find common design patterns and major differences. I attempt to relate the design choices that underlie these representations to the amount of interoperability they achieve.
RDNA Tuesday, Poster #38
CANNON
A Model for the Synchronization of Leading and Lagging Strand DNA Synthesis
William R. Cannon, Pacific-Northwestern National Laboratory, Richland, WA
In DNA replication, the mechanism by which the lagging strand polymerase recycles from the end of an Okazaki fragment to the primosome is unknown. A simple diffusion simulation model is consistent with available data; moreover, the data that was used as evidence for mechanical recycling was misinterpreted.
DB Monday, Poster #9
CARACO
A Database of Recently Diverging Paralogous Genes in C. elegans
M. Daniel Caraco, University of Florida; Sridhar Govindarajan, Stephen G. Chamberlin, EraGen Biosciences Inc.; Steven A. Benner, University of Florida
We categorize all paralogous genes of C. elegans into those that arose recently, and those that were established near or before the creation of the C. elegans developmental biology plan. We then hypothesize sets of genes that are not involved in core developmental biology of C. elegans using a recently developed statistical parametric method to compute NED (Neutral Evolutionary Distance) to clock divergence.
PT Monday, Poster #81
CARDENAS-GARCIA
Genia: A System Facilitating the Search for Therapeutic Targets
Maura Cárdenas-García, Jaime Lagúnez-Otero, Instituto de Quimica UNAM Ciudad Universitaria, circuito Exterior CP Coyoacán, México
We present recent progress with Genia, an expert system that incorporates Ras signal transduction pathways. Ras is a key protein in the amplification of signals induced by growth factors. It is an oncoprotein associated with different types of cancer.
BN Sunday, Poster #2
CARRILLO
Using the Riboweb System to Compare Ribosomal Models to Experimental Data
Michelle W. Carrillo, Russ B. Altman, Stanford University
The Web-based RiboWeb system contains a knowledge base of experimental ribosomal structural data and computational tools to facilitate modeling and model evaluation. One tool compares models to knowledge base data. We applied it to five 30S subunit models and found patterns in the overall satisfaction of data by the models.
PF Monday, Poster #49
CHAN
Identifying the DNA Binding Specificity of Hoxb-3
C. N. L. Chan, L. M. Jakt, M. H. Sham, D. K. Smith, University of Hong Kong
Hox proteins are mammalian transcription factors involved in antero-posterior axial identity in embryonic development. To identify the DNA binding specificity of Hoxb-3, chick and mouse genomic DNA were fragmented and the fragments were bound by murine Hoxb-3. A chi-square test was applied to identify the consensus binding sites, and TCATTAATTGGC is proposed.
ME Sunday, Poster #57
CHANG
Natural Language Processing for Remote Homology Detection
Jeffrey Chang, Russ Altman, Stanford University
Biology is an ideal field for the application of natural language processing. The enormous amount of literature being generated exceeds the capacity of humans to interpret. Thus, we are working on methods to improve automated remote homology detection of protein sequences using unstructured text information available in Medline abstracts.
PSP Sunday, Poster #78
CHEN
CELIAN: A Side-chain Modelling Program Using Structural Environment-Specific Substitution Tables and Energy Strategy
Lan Chen, Kenji Mizuguchi, Tom L. Blundell, University of Cambridge, UK
CELIAN is a program for protein side-chain prediction using structural environment-specific substitution tables and energy strategy. Based on the test-set derived from HOMSTRAD, CELIAN built side-chains on 102 structures with chi1 accuracy of 72% on structurally conserved regions, compared with 66% by SCWRL.
GA Sunday, Poster #17
CHERN
Using Homologous Gene Pairs Between Mouse and Human to Identify Gene Candidates Which Display Sequence Variation
Tzu-Ming Chern, Winston Hide, University of Western Cape, South Africa
We attempt to understand the diversity of expression forms between human and mouse genes. Using a cross-species homologous approach, it is possible to rapidly identify potential splice variants. Candidate splice clusters are visualized using CRAW reports under STACKPACK package. Integration of alternate forms between species is presented.
PS Monday, Poster #65
CHILLEMI
Molecular Dynamics Simulation Study of the Human Topoisomerase I-DNA Covalent Complex
G. Chillemi, T. Castrignano, CASPUR, Supercomputing Center for University and Research Italy; A. Resideri, University of Rome, Italy
Here we present one nanosecond molecular dynamics (MD) simulation of the covalent topoisomerase-DNA complex carried out in periodic boundary condition, using parallel architecture, starting from the crystallographic structure. Structural and dynamical information obtained from MD simulations allows us to identify the DNA-enzyme interactions important for the stability of the complex.
PS Monday, Poster #66
COOPER
Essential and Molecular Dynamics of Post-Translationally Modified Gamma-B Crystallin
L. R. Cooper, D. Corne, J. C. Crabbe, University of Reading, UK
Purpose: To simulate the structural changes that occur in Gamma-B-crystallin following post-translational modification, one of the major causes of cataract. Results: The mechanism of unfolding through simulated heating and glycation is analysed and discussed.
DB Monday, Poster #10
CORRADI
Gene Expression Database (GXD): An Integrated Resource for Mouse Gene Expression Information
John P. Corradi, Dale A. Begley, Geoffrey L. Davis, Janan T. Eppig, David P. Hill, Jim A. Kadin, Ingeborg McCright, Joel E. Richardson, Martin Ringwald, The Jackson Laboratory, Bar Harbor, ME
The major goal of GXD is to provide for the storage, integration and retrieval of primary gene expression data for the developing and adult laboratory mouse. Gene expression information is placed in a larger biological context via careful curation, the use of controlled vocabularies, and integration with the Mouse Genome Database. Future plans for GXD will be addressed.
BN Sunday, Poster #3
CRAIG
Simulating Biochemical Separations Using Protein Data Bank Files
Paul A. Craig, David Mix, Kristin Cotton, Rochester Institute of Technology
We have developed two computer simulations of separations processes, which extract their data from Protein Data Bank files. These modules are intended as instructional tools for biochemistry students: an ion exchange simulation has been designed for the Windows environment and a JAVA applet models electrophoresis on a web site (http://www.rit.edu/~pac8612/electro/E_Sim.html).
PSP Sunday, Poster #79
CUMBAA
Conjoined Hidden Markov Models for Protein Secondary Structure Prediction
Christian A. Cumbaa, University of Waterloo
Patterns in protein sequence influence secondary structure. These influences can be modeled by probability distributions. The joint structural influence of two overlapping patterns is obtained by conjoining their probability distributions. Ab initio
SA Tuesday, Poster #52
DARCY
The Topology of Recombination
Isabel K. Darcy, Stephen D. Levene, Kenneth Huffman, University of Texas at Dallas
Recombination results in the deletion/insertion or inversion of DNA sequences. When acting on circular DNA, many recombinases produce a spectrum of topologically knotted/catenated products. By solving mathematical equations determined by the topology of these products much information about the recombinase mechanism may be gained.
PSP Sunday, Poster #80
DAYANIK
Domain Parsing: Detecting Signals of Continuous Structural Domains from Protein Sequence Data
A. A. Dayanik, H. J. Yun, D. Zhang, G. Armhold, Y. Song, D. Snyder, Nevill-Manning, I. Muchnik, C. A. Kulikowski, G. T. Montelione, Rutgers University
HMMs were constructed from 1471 DDD domains and their NR sequence homologs. Detecting the structural domains from an independent testing subset of 347 protein sequences from corresponding SCOP families was most reliably achieved by combining HMM with BLAST results, yielding 94% correct predicted alignments, and 73.5% fold recognition assignments.
PF Monday, Poster #50
DE BRUIJN
Protein Name Tagging
Berry de Bruijn, Joel Martin, National Research Council, Canada
Automatic tagging of protein names is one component of our system for information extraction from biomedical articles. Our tagger combines three approaches: 1) analysis of string morphology; 2) context analysis; and 3) dictionary lookup. An evaluation compares our tagger with expert performance and a competing system.
BN Sunday, Poster #4
DE JONG
Simulation of Genetic Regulatory Systems: A Qualitative Approach
Hidde de Jong, Michel Page, Institut National de Recherche en Informatique et en Automatique (INRIA), France
A method for modeling and simulating genetic regulatory systems is presented. The method has been designed to deal with qualitative models, as quantitative information on regulatory interactions is usually missing. Experiments with a Java implementation of the method have shown that regulatory systems of currently up to 18 genes involved in complex feedback loops can be simulated.
PSP Sunday, Poster #81
DELORENZI
Prediction of Coiled-Coil Domains
Mauro Delorenzi, Terry Speed, The Walter & Eliza Hall Institute, Parkville Melbourne, Australia
The performance of Coiled-Coil predictors from primary protein sequence was analysed with curves of sensitivity versus specificity on a heterogeneous collection of test sequences. At specificity levels useful for genome-wide screenings, it seems that an HMM-algorithm (Marcoil) can give a higher sensitivity than the programs Paircoil and Coils.
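A sensitivity versus specificity curve of the kind described above can be computed by sweeping a score threshold over labelled test data. The sketch below uses the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)); the abstract does not say which definition of specificity was used, and the function names and input format are assumptions.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when predicting positive for score >= threshold."""
    tp = sum(1 for s, pos in zip(scores, labels) if pos and s >= threshold)
    fn = sum(1 for s, pos in zip(scores, labels) if pos and s < threshold)
    tn = sum(1 for s, pos in zip(scores, labels) if not pos and s < threshold)
    fp = sum(1 for s, pos in zip(scores, labels) if not pos and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


def sens_spec_curve(scores, labels):
    """One (threshold, sensitivity, specificity) point per distinct score, ascending."""
    return [
        (t, *sensitivity_specificity(scores, labels, t))
        for t in sorted(set(scores))
    ]
```

Comparing predictors at a fixed, high specificity then amounts to reading off the sensitivity of each curve at that specificity level.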
SA Tuesday, Poster #53
DIMMIC
New Models for Likelihood Analysis of Protein Sequences
M. W. Dimmic, J. S. Rest, D. P. Mindell, R. A. Goldstein, University of Michigan
We present several new maximum likelihood (ML) models for phylogenetic analysis which differ in the manner in which they accept new mutations. A Bayesian formalism is extended to account for lack of data and/or parameters. These models can potentially yield structural or functional information about the proteins of interest as well as information on the population level.
PS Monday, Poster #67
DING
Multi-class Protein Fold Classification Using Support Vector Machines and Neural Networks
Chris H. Q. Ding, Inna Dubchak, Lawrence Berkeley National Laboratory, CA
Most current discriminative methods for protein fold prediction use the one-against-others approach, which has the well-known "False Positives" problem. We investigated two advanced approaches, the unique one-against-others and the all-against-all approaches; both improve prediction accuracy by 13-20% on a 27-fold problem. Support vector machines and neural networks are used.
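The all-against-all scheme can be sketched as majority voting over one binary classifier per pair of classes; with 27 folds that means 27 x 26 / 2 = 351 pairwise classifiers. The code below is a toy sketch under stated assumptions: the pairwise classifiers are trivial nearest-center rules on a single number rather than SVMs or neural networks, and all names and values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def make_nearest_center_classifier(label_a, center_a, label_b, center_b):
    """Toy binary classifier: pick whichever class center is closer to x."""
    def classify(x):
        return label_a if abs(x - center_a) <= abs(x - center_b) else label_b
    return classify


def all_against_all_predict(x, classes, pairwise):
    """Each pairwise classifier casts one vote; the class with most votes wins."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Toy setup (assumption): three classes described by one-dimensional centers.
CENTERS = {"alpha": 0.0, "beta": 5.0, "gamma": 10.0}
CLASSES = sorted(CENTERS)
PAIRWISE = {
    (a, b): make_nearest_center_classifier(a, CENTERS[a], b, CENTERS[b])
    for a, b in combinations(CLASSES, 2)
}
```

Ties are broken here by the order in which votes were first cast, which is an arbitrary choice; a practical system would need a more principled tie-breaking rule.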
DB Monday, Poster #11
DOWELL
A Distributed Annotation System Client
Robin Dowell, Sean R. Eddy, Washington University, MO; Lincoln D. Stein, Cold Spring Harbor Laboratory, NY
The distributed annotation system (DAS) client software is a Java-based browser-like application which allows a researcher to query one or more disparate annotation servers to retrieve features about a region of interest within a genome. The client displays graphical maps of the data, which is returned in a standard XML format.
GA Sunday, Poster #18
ECKMAN
Extending Traditional Query-Based Integration Approaches for Gene Characterization in Genomic Data
Barbara A. Eckman, Leo A. Laroco, SmithKline Beecham Pharmaceuticals
Gene characterization in genomic sequence requires integration of multiple heterogeneous data sources and analysis techniques. We provide SQL-like query access over relational and flat-file databases, internet web sites, and results of on-the-fly analyses. Special-purpose query conditions like regular expression pattern matching on HSP alignments enable quick identification of potentially interesting results.
BN Sunday, Poster #5
ELLIS
Heuristic-Based Prediction of Specialized Metabolism
L. B. M. Ellis, J. Liu, M. Vigoroux, C. D. Hershberger, L. P. Wackett, Jeffrey D. Varner, University of Minnesota
We will predict a compound's route of biodegradation by mapping the chemical functional groups of that compound against the capabilities of organism(s) to generate enzymes that operate on these functional groups, based on information in the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD, http://www.labmed.umn.edu/umbbd/).
DB Monday, Poster #12
ELLIS
The University of Minnesota Biocatalysis/Biodegradation Database: Predicting Biodegradative Metabolism for a Post Genomic World
Lynda B. M. Ellis, C. Douglas Hershberger, Lawrence P. Wackett, University of Minnesota
One goal of the UM-BBD (http://www.labmed.umn.edu/umbbd/) is to cover the wide range of organic functional groups that can be metabolized by microbes. We discuss the latest developments in the UM-BBD and the methodology through which we may be able to use this knowledge to predict biodegradation pathways of novel compounds.
PT Monday, Poster #82
ELLIS
Target Selection for Staph and Strep Structural Genomics
Lynda Ellis, Paul Tavernier, Edward Bryan, Doug Ohlendorf, University of Minnesota
We describe the target selection protocol for a structural genomics project (SSSG) developed around the soon-to-be completed genomes of S. aureus and S. pyogenes. Project results are available on the SSSG website, URL = http://strep.ahc.umn.edu
BN Sunday, Poster #6
EMANUELSSON
Predicting Subcellular Localization of Proteins Based on Their N-Terminal Amino Acid Sequence
Olof Emanuelsson, Stockholm University; Henrik Nielsen, Søren Brunak, The Technical University of Denmark; Gunnar von Heijne, Stockholm University
TargetP is a neural-network based predictor of protein subcellular location. It discriminates between proteins destined for the mitochondrion, the chloroplast, the secretory pathway, and "other" localizations with a success rate of 85-90%. We estimate that 10% of all Arabidopsis and Homo proteins are mitochondrial, 10% secretory, and, in Arabidopsis, 14% chloroplastic.
RDNA Tuesday, Poster #39
EVERS
Folding Space Algebras for RNA Structure Exploration
Dirk Evers, Bielefeld University, Germany
A new approach to the systematic development of dynamic programming algorithms is presented and applied to RNA folding. We demonstrate the method by deriving the recurrences for the complete suboptimal folding algorithm of Wuchty et al. and providing counting and free energy evaluation algebras, thereby extending the algorithm to correctly deal with dangling end contributions.
GA Sunday, Poster #19
FADIEL
The Influence of Oligonucleotide Frequency on Global Genome Structure and Biological Complexity
Ahmed Fadiel, Dong Qi, A. Jamie Cuticchia, The Hospital for Sick Children, Toronto, ON, CA
Complete genomes of 25 organisms belonging to three cellular life domains (Archaea, Bacteria, and Eukaryota) were explored. Genomic analysis was performed using proprietary software coupled with other programs. The influence of oligonucleotide frequencies on the global genome structure is discussed and the biological significance of some patterns is also explored.
GE Tuesday, Poster #11
FINK
2HAPI, A Comprehensive, Internet-Based Microarray Data Analysis System
J. Lynn Fink, Michael Gribskov, University of California, San Diego
2HAPI is a system for computational microarray data analysis that attempts to create an integrated analytical environment that is highly accessible, fully-featured, and free to academic users. 2HAPI is designed with the notion that the user need not be a computer scientist or statistician.
DB Monday, Poster #13
FISCHER
The Lipase Engineering Database
Markus Fischer, Rolf D. Schmid, Jürgen Pleiss, University of Stuttgart
The Lipase Engineering Database (LED) is a WWW-accessible resource on sequence-structure-function relationships of microbial lipases (www.led.uni-stuttgart.de). A set of data mining and data processing tools has been developed to provide multisequence alignments of lipase families and consistently annotated, superposed X-ray structures. It has been shown to be a powerful tool for protein engineering.
PS Monday, Poster #68
FLIGELMAN
Sequence Independent Comparison of Flexible Molecules by Geometric Hashing
Zipora Y. Fligelman, Tel Aviv University, Israel; Ruth Nussinov, Tel Aviv University, IRSP-SAIC Lab of Experimental and Computational Biology; Haim J. Wolfson, Tel Aviv University
We present an efficient general scheme for structural alignment and docking of flexible molecules. The method is independent of protein sequence order and allows handling of molecules with numerous pre-defined internal degrees of freedom (e.g. hinges). The technique employs the Geometric Hashing Method and graph-theoretic methods.
PS Monday, Poster #69
FORSTER
Application of a Computational Docking Procedure to Predict the Location of Heparin Binding Sites in Growth Factors
Mark J. Forster, Barbara Mulloy, National Institute for Biological Standards and Control, UK
A protocol has been developed for docking model heparin pentasaccharide fragments to known protein structures. Two validation cases, basic fibroblast growth factor and antithrombin, are shown. Predictive docking is reported for hepatocyte growth factor and vascular endothelial growth factor; predicted binding sites are compared with available experimental data.
ME Sunday, Poster #58
FRIDLYAND
Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data
Jane Fridlyand, University of California, Berkeley; Sandrine Dudoit, Mathematical Sciences Research Institute, Berkeley; Terence P. Speed, University of California, Berkeley
We compare the performance of different discrimination methods for the classification of tumors based on gene expression profiles. These methods include traditional approaches as well as machine learning techniques. The methods are applied to three recently published datasets: the leukemia (ALL/AML) dataset of Golub et al. (1999), the lymphoma dataset of Alizadeh et al. (2000), and the 60 cancer cell line (NCI 60) dataset of Ross et al. (2000).
GE Tuesday, Poster #12
FRIEDMAN
Inferring Regulatory Structure from Expression Profiling of Mutants
Nir Friedman, Dana Pe'er, Hebrew University of Jerusalem, Israel
An important experimental design towards understanding the regulatory program involves global gene expression measurements of mutations. We infer causal structure from such measurements in a Bayesian framework. We present methods that derive statistical confidence in features of the regulatory network and apply these methods to data from the Young lab on S. cerevisiae mutations.
ME Sunday, Poster #59
FRIEZE
Optimal Sequencing by Hybridization in Rounds
Alan M. Frieze, Bjarni Halldorsson, Carnegie Mellon University
We present an algorithm where we assume that SBH chips can be constructed interactively and the results of one hybridization experiment can be used to construct another SBH chip. We present experimental results as well as algorithmic analysis showing the algorithm is optimal in an information theoretical sense.
BN Sunday, Poster #7
FUJIBUCHI
A Probabilistic Boolean Approach to Find Regulatory Networks from Microarray Data
Wataru Fujibuchi, David Landsman, National Library of Medicine, Bethesda, MD, USA
We introduce a probabilistic approach for finding realistic biological regulatory circuits that represent stable Boolean networks from gene expression patterns. This method is applied to the yeast glucose-response and cell-cycle systems. Statistical evaluations show that the reproducibility of known regulatory networks is more significant than randomly drawn networks.
ME Sunday, Poster #60
FUKUDA
A Workbench for Refining and Annotating Biological Binary Relation Data
Ken-ichiro Fukuda, Toshihisa Takagi, University of Tokyo
A graphical interface system that represents protein linkage maps as graphs will be described. Since each attribute of proteins can have hierarchically structured values and multiple values, the system holds the information about each protein in DAGs to draw a graph that represents the underlying structure effectively.
GA Sunday, Poster #20
GAASTERLAND
Identifying Small ORFs in Microbial Genomes
Terry Gaasterland, Alexander Sczyrba, The Rockefeller University, NY
Proteins shorter than 100 aa remain difficult to identify computationally in genomic sequences. Statistical analysis of codon usage and position bias in small open reading frames is unreliable. We rank small ORFs by conservation across genomes and nominate them for further experimentation.
SA Tuesday, Poster #54
GALBRAITH
Searching for Coding Regions in Neurospora crassa Using a Simple Codon Bias Algorithm and Consensus Sequences
Judith Galbraith, University of New Mexico, Albuquerque High Performance Computing Center; Don Natvig, Mary Anne Nelson, Laura Salter, University of New Mexico
To locate coding regions in sequences with no similarity to known genes, characteristics distinctive to Neurospora are examined. The exaggerated difference between the counts of cytosine and adenosine residues in the third position of codons is measured using a log ratio. Consensus sequences are used to create a table of potential exons with P-values. The results are available using a web interface.
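The log-ratio measurement described above can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: the single reading frame, the natural logarithm, and the `pseudocount` parameter (added to avoid dividing by zero) are all assumptions, and the function name is hypothetical.

```python
import math


def third_position_log_ratio(seq, pseudocount=1.0):
    """Log ratio of C to A counts at codon third positions, reading frame 0.

    A strongly positive value indicates the C-over-A third-position bias
    that the abstract describes as distinctive of Neurospora coding regions.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3      # drop a trailing partial codon
    thirds = seq[2:usable:3]              # every third base, starting at index 2
    c_count = thirds.count("C")
    a_count = thirds.count("A")
    return math.log((c_count + pseudocount) / (a_count + pseudocount))
```

Scanning all three frames on both strands, and comparing the value against a background distribution, would be the natural next step towards a table of candidate exons.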
SA Tuesday, Poster #55
GAMIELDIEN
Identifying Genes Upon Which Positive Selection may Operate: A Promising Means of Identifying Novel Virulence Genes in Bacterial Pathogens
Junaid Gamieldien, Winston Hide, South African National Bioinformatics Institute
We have performed an intraspecies search for genes on which positive selection may operate between pairs of strains of Helicobacter pylori, Chlamydia trachomatis and Neisseria meningitidis, to ascertain whether new virulence genes may be identified in this way. 34 previously described virulence factors are demonstrated to be under positive selection.
GA Sunday, Poster #21
GARG
Identifying Candidate Coding Region Single Nucleotide Polymorphisms (cSNPs) and Alternative Splice Variants in Human Genome Using Assembled Expressed Sequence Tags (ESTs)
Kavita Garg, Deborah Nickerson and Phil Green, University of Washington, WA
Using assembled ESTs from 50 different cDNA libraries, we have identified contigs that represent the complete coding sequences of 850 known human genes, and have scanned them for high quality substitutions. We report the analysis and characteristics of candidate cSNPs found in coding regions of 165 of these genes.
PSP Sunday, Poster #82
GEORGE
Domain Prediction from Sequence Information Alone
Richard Anthony George, Jaap Heringa, National Institute for Medical Research, London, UK
The dissection of a protein into its structural units is essential in preparation for structure determination by NMR. Using individual domains to search a database for related sequences is often more successful than using the whole protein sequence. The work described here predicts structural domains from sequence by using several methods developed in our lab.
RDNA Tuesday, Poster #40
GERLAND
Dynamics of DNA Slippage: Driven Bulge Loop Diffusion
Ulrich Gerland, Terence Hwa, University of California, San Diego, USA
We investigate the dynamics of bulge loop creation and diffusion in double-stranded DNA theoretically, and propose to probe it experimentally by pulling the two strands apart at opposite ends. This mechanism is thought to be responsible for the occurrence of indels during the replication of repeat regions in the genome.
GA Sunday, Poster #22
GLASSCOCK
Eugene: A Non-redundant View of the Human Transcriptome
Jarret Glasscock, Warren Gish, Washington University, MO
Eugene makes it easier to get transcript data for a given region, provides a non-redundant representation of the transcript data, provides high quality (genomic) sequence, and alleviates problems associated with chimeras and paralogues. Eugene accomplishes this by stringent clustering of the transcript data with genomic data and subsequent extraction of the genomic segment.
EV Tuesday, Poster #1
GOH
Co-evolution of Proteins with their Interaction Partners
Chern-Sing Goh, Andrew A. Bogan, Marcin J. Joachimiak, Dirk Walther, and Fred E. Cohen, University of California, San Francisco
The divergent evolution of proteins requires ligands and their receptors to co-evolve, creating new pathways when a new receptor is activated by a new ligand. Based on this concept, we have developed a method for measuring the co-evolution of interacting proteins and applied this analysis to chemokines and their receptors.
GA Sunday, Poster #23
GOLLERY
Analytical Methods for Genome to Genome Comparisons
Martin Gollery, David Rector, Al Shpuntof, Jim Lindelien, Time Logic Corporation
TeraBLAST is a hardware-accelerated implementation of BLAST that solves large homology search problems in a drastically shorter timeframe, yet provides a higher sensitivity than software BLAST. In this presentation we will present results of comparisons of over two dozen completed genomes. Output files will be available on CD-ROM for interested researchers.
ME Sunday, Poster #61
GONCALVES
Development of a Modular Gene Index for the Rational Design of cDNA Microarray Probes
Jason Goncalves, James R. Woodgett, A. Jamie Cuticchia, University of Toronto, CA
We describe modular clustering, a novel method to organize nucleotide sequence information for splice isoform discovery and the rational design of cDNA microarrays. Based on this work, PCR primers will be generated to amplify microarray probes to unique splice variants of unique genes.
DB Monday, Poster #14
GOPALAKRISHNAN
A Framework for Evaluating Global Strategies for Parallel Experiment Design under Varying Resource Constraints
Vanathi Gopalakrishnan, University of Pittsburgh, PA
In this research, a Parallel Experiment Planning (PEP) framework is developed that: (1) provides a computational representation and set of tools to manage information about parallel experiments (or trials), and (2) can provide intelligent assistance for decision-making by suggesting likely places in search space for new trials and portions of space that are unlikely to yield results so that they could be closed.
RDNA Tuesday, Poster #41
GORODKIN
Automated Structural Alignment of a Hairpin Region in Archaeal SSU rRNA
J. Gorodkin, University of Aarhus, Denmark; S. L. Stricklin, G. D. Stormo, Washington University Medical School, MO
The FOLDALIGN method presented here uses a combination of structure and sequence information to automatically produce alignments with approximately the same score as those published in the SSU rRNA database. We also show that FOLDALIGN complements stochastic context-free grammar (SCFG)-based detection of covariance, given sequences which do not align well globally.
GE Tuesday, Poster #13
GORYACHEV
Unfolding Expression Data from cDNA Microarrays
Andrew Goryachev, Princess Margaret Hospital, Toronto, CA; Pascale Macgregor, Aled Edwards, University of Toronto, CA
We present a mathematical model describing the relationship between measured fluorescent intensities and actual mRNA concentrations. Employing methods of robust statistics, we developed algorithms that estimate the parameters of the model and apply an unfolding transformation to extract actual ratios from raw data. We also discuss methods for measuring ratio noise and the reproducibility of experiments.
SA Tuesday, Poster #56
GRABER
Computational Characterization of mRNA Localization Control Sequences in 3'-untranslated-sequences
Joel H. Graber, Charles R. Cantor, Martin Frith, Jahnavi C. Prasad, James O. Deshler, Boston University
We are searching for control sequences responsible for the subcellular localization of mRNA transcripts, specifically mRNAs such as Vg1 in X. laevis, which localizes to the vegetal half of the developing oocyte through protein interactions with a series of short repeated sequence elements in its 3'UTR.
SA Tuesday, Poster #57
GRABER
Computational Characterization of mRNA 3'-end-processing Control Sequences
Joel H. Graber, Charles R. Cantor, Scott C. Mohr, Temple F. Smith, Boston University
We have computationally investigated 3'-end-processing (cleavage and polyadenylation) control sequences through analysis of EST sequences from several different organisms. The control sequences consist of multiple, short elements, where the individual elements can vary widely from a consensus sequence and yet remain functional as part of the whole.
ME Sunday, Poster #62
GRAS
New Learning Method for Improving Protein Identification from Peptide Mass Fingerprinting
Robin Gras, Elisabeth Gasteiger, Swiss Institute of Bioinformatics, Geneva Switzerland; Bastien Chopard, Department of Computer Science, Geneva, Switzerland; Markus Müller, Ron D. Appel, Swiss Institute of Bioinformatics, Geneva, Switzerland
We developed an algorithm to identify proteins by peptide mass fingerprinting. The masses and environmental data are used to search a sequence database. We use a score that ranks candidates according to the quality of the match of environmental parameters. We compute the weights of the parameters in the score using a genetic algorithm. We improve our method by classifying the learning set during the learning phase in order to find the set of parameter weights that is optimized for each family of proteins.
PSP Sunday, Poster #83
GRIGORIEV
Protein Fold Recognition by Combining Evolutionary, Structural, and Proximity Information
Igor V. Grigoriev, Chao Zhang, Sung-Hou Kim, Lawrence Berkeley National Laboratory, CA
We propose a new method for detecting remote homologues on the basis of sequence-derived properties. Instead of the traditional comparison between single residues, local segments of protein sequence are compared. The method is shown to substantially enhance the sensitivity of conventional sequence alignment methods, and is applied to complete genomes.
DB Monday, Poster #15
GROMIHA
Development of Protein Thermodynamic Database and its Application for Predicting the Stability of Protein Mutants
M. Michael Gromiha, Jianghong An, Motohisa Ootulake, Hiditoshi Kono, Hatsuko Vedairo, Akinore Sarai, RIKEN Tsukuba Institute, Japan; Motohisa Oobatake, Meijo University
We developed a "thermodynamic database for proteins and mutants (ProTherm)" containing important thermodynamic parameters, experimental details, structural, functional and literature information [http://www.rtc.riken.go.jp/jouhou/Protherm/protherm.html]. Hydrophobicity is the major factor for the stability of buried mutants whereas partially buried coil mutations are mainly influenced by entropy.
SA Tuesday, Poster #58
GROTE
A Method for Modeling Promoter Structures in a Non-Heuristic Manner Using a Modified Self-Organizing Map Algorithm
Korbinian Grote, GSF-National Research Center for Environment and Health, Germany; Wilfried Brauer, Technische Universität München, Germany; Thomas Werner, GSF-National Research Center for Environment and Health
We present a new method based on a combination of different self-organizing map algorithms, that is able to derive highly specific formal models of promoter structures consisting of an ordered combination of transcription factor binding sites. Apart from a set of functionally related promoter sequences no additional knowledge is required.
PSP Sunday, Poster #84
GUDA
Multiple Protein Structure Alignment Using Monte Carlo Optimization
Chittibabu Guda, Philip E. Bourne, Ilya N. Shindyalov, University of California, San Diego
We have developed a new algorithm for alignment of multiple protein structures based on a Monte Carlo simulation technique. The scoring function is based on inter-protein distances calculated for aligned and superimposed residues, with penalties for gaps. The algorithm improves alignment for the majority of protein families when starting from pair-wise structural alignments.
SA Tuesday, Poster #59
GUDA
Sequence Data Analysis of Voltage-gated Ion Channel Proteins
Purnima Guda, Boojala V. B. Reddy, Mauricio Montal, Philip E. Bourne, University of California, San Diego
Voltage-gated ion channel (VGC) proteins mediate the selective diffusion of K+, Na+ or Ca2+ across cell membranes. We present an analysis of these multiple sequences to identify conserved and semi-conserved residues in the VGC family of sequences. The poster will detail our efforts to define a sequence based pattern / profile for proteins of the VGC family and its usefulness in modeling these proteins.
SA Tuesday, Poster #60
GUERMEUR
Combining Protein Secondary Structure Prediction Methods with a New Multi-Category SVM
Yann Guermeur, LORIA; Dominique Zelus, LBMC, France
Vapnik's learning theory has given birth to an inference paradigm, implemented in the Support Vector Machines (SVMs). The theory grounding these machines was developed for two-class discriminant analysis. Building upon a new uniform convergence result, we propose a theoretical foundation for multi-category SVMs. From this framework, original models are derived, which are used to combine protein secondary structure prediction methods.
ME Sunday, Poster #63
GUO
Improving Base Calling Accuracy by Peak Space Equalization
Hong Guo, Mark Welsh, Steve Gold, CuraGen Corporation, CT
DOLPHIN is a trace processor that manipulates raw electrophoretic traces for PHRED basecalling. The peak density for a trace follows an exponential decay, yet PHRED requires evenly spaced peaks to perform basecalling. An algorithm has been developed to equalize the peak spacing. It is shown to improve PHRED scores dramatically.
GA Sunday, Poster #24
GUPTA
Proteome-wide Prediction of Glycosylation Sites
Ramneek Gupta, Technical University of Denmark; Eva Jung, Swiss Institute of Bioinformatics, Switzerland; Jan Hansen, Søren Brunak, Technical University of Denmark
Glycosylation is an important post-translational modification which influences protein function. Two major types of protein glycosylation are N-linked (affecting Asn residues) and O-linked (Ser/Thr). No discriminative acceptor consensus sequence exists for either type. We train artificial neural networks to identify glycosylated sites, and observe their spread across different classes of proteins in a cell.
SA Tuesday, Poster #61
HAGHIGHI
A Novel Gene Finding System
Fatemeh Haghighi, Columbia University; Mark Diekhans, David Haussler, University of California, Santa Cruz; William Noble Grundy, Columbia University
Accurate recognition of genes and gene components is central in annotation of data from the genome sequencing projects. We present a new gene-finding system that is designed to be scalable and flexible with respect to the gene features it models, the machine learning algorithms it employs, and the range of experimental data from which it learns.
ME Sunday, Poster #64
HALASKA
Mol4D: A Web-based Application for the Visualization and Animation of Proteins
Jakob Halaska, Stockholm University, Sweden
Mol4D is a www-based system that provides a shortcut to generate molecular visualizations and animations of proteins, ranging from static to simulated structures, and simultaneously connecting many levels of information into a single multidimensional visualization. The poster describes this new graphical approach of spatiovisual mapping - how biological information originating from multiple sources can be successfully visualized, correlated and presented in a 3D or 4D environment (see: http://www.biokemi.su.se/Mol4D).
DB Monday, Poster #16
HANCOCK
maxd - A Data Warehouse, Analysis, and Visualisation Environment for Expression Data
David J Hancock, Norman Morrisson, Magnus Rattray, Andy Brass, Michael J. Cornell, University of Manchester
'maxd' is a warehousing and analysis environment specifically for expression data. The database, based on the EBI's ArrayExpress+ model and described in ANSI SQL92 for portability, can store data from a variety of sources, including cDNA and oligonucleotide-based microarrays. A complementary suite of modular, open-source Java tools for data storage, retrieval, analysis and visualisation is described.
GA Sunday, Poster #25
HANSEN
Interrogating and Visualizing Annotated Whole Genome Databases
David P. Hansen, Guenther Kurapkat, Thure Etzold, LION Bioscience, Heidelberg, Germany
Using LION's bioSCOUT, we have annotated a number of whole microbial genomes. The unique data integration capabilities of SRS are then used to compare the annotations of different genomes. SRS enables very rapid queries to, for example, find all of the members of a specific protein family in H. pylori that have an ortholog in E. coli and Haemophilus and show their respective localisation on the genome.
ME Sunday, Poster #65
HATZIGEORGIOU
An Improved Method for Prediction of Translation Initiation Sites (TIS) in Human cDNA's and EST's
Artemis G. Hatzigeorgiou, Synaptic Ltd., Greece
The presented method is based on statistics and artificial neural networks. It consists of two modules: one sensitive to the conserved motif before the TIS and one sensitive to the coding/non-coding potential around the TIS. These predictions are integrated in an algorithm that simulates, in a simplified form, the ribosome scanning model. It leads to 95% prediction of TIS on cDNA's and more than 50% on EST's.
SA Tuesday, Poster #62
HAYES
AcE: A System for Analyzing the Accuracy of Gene Prediction Programs
William S. Hayes, SmithKline Beecham Pharmaceuticals
AcE, an Accuracy Evaluation tool for eukaryotic gene prediction, is displayed along with results from several test sets versus several eukaryotic gene prediction tools. Ease of use and flexibility were primary design considerations.
GE Tuesday, Poster #14
HERWIG
Comparison of Statistical Tests in the Analysis of Differential Expression in Hybridization-based Experiments
Ralf Herwig, Pia Aanstad, Matthew Clark, Hans Lehrach, Max-Planck Institut für Molekulare Genetik, Berlin, Germany
We compare statistical tests in their performance on detecting differential expression on cDNA arrays. Evaluation of experimental and simulated data indicates that the analysis via repeated hybridization experiments followed by statistical testing is an accurate and sensitive way to identify even small expression changes (1:1.5) on a large scale.
DB Monday, Poster #17
HEYMANN
Functional Gene Networks - a Case Study of Novel Data Management
S. Heymann, Peter Rieger, Kelman Gesellschaft für Geninformation mbH, Germany
Kelman's high-end solution of bioinformatics and functional genome research ensures new levels of data consistency and exploitation. The resulting Gene Network provides an in-depth understanding of gene interplay, involving gene products in all their molecular versions. This is exemplified here by means of a case study for hereditary disease research.
GE Tuesday, Poster #15
HODAR
A Practical Filtering Procedure of Gene Expression Data for Clustering Analysis
Christian Hödar, Verónica Cambiazo, Mauricio González, INTA University of Chile; Chris Vulpe, University of California, Berkeley
We measured and filtered the expression values of each gene in a mouse cDNA microarray to diminish their initial number, and to compare their relative expression levels between two experimental conditions. We found 86 genes that changed in one condition; 80% of them remained associated within five main clusters.
SA Tuesday, Poster #63
HOLMES
Telegraph: A New Dynamic Programming Template Library
Ian Holmes, University of California, Berkeley; Guy St. C. Slater, Ewan Birney, Wellcome Trust Genome Campus, Cambridge, UK; Gerald M. Rubin, University of California, Berkeley
Many algorithms in bioinformatics rely on dynamic programming for the alignment of sequences to finite state machines of various architectures. Telegraph is a new library enabling exploitation of this paradigm using an implementation that is object-oriented, scriptable, modular, portable and probabilistic. This poster illustrates Telegraph with a number of examples.
GA Sunday, Poster #26
HOLMES
A Proteomic System for Tracking and Identifying Yeast Mitochondrial Proteins
Mark Holmes, Melissa Kimball, Raymond Gesteland, Michael Giddings, University of Utah, USA
Two database services for protein tracking and identification are described. Investigators enter data using assisted forms which automatically link to prior work. Subsequent protein identification can be done using common fields matched against public databases. Putative post-translational modifications are also predicted, using a heuristically-bounded, depth-first search; fuzzy logic evaluates for relative accuracy, depth and known frequency of candidate modifications.
DB Monday, Poster #18
HORN
Collecting and Harvesting Biological Data: The NucleaRDB
Florence Horn and Fred E. Cohen, University of California, San Francisco
We have set up a database for nuclear hormone receptors, the NucleaRDB. It already holds sequence information for 500 receptors. Our main aim is to capture and provide heterogeneous experimental data, such as ligand binding constants, mutation and expression data. This data will be automatically extracted from electronically available literature.
GE Tuesday, Poster #16
HOSHIMOTO
General Toolkit of the E-Cell System for Modeling Gene Expression System
Kenta Hoshimoto, Sae Seno, Fumihiko Miyoshi, Masaru Tomita, Keio University, Japan
We present a generic model of gene expression and regulation for the E-CELL system, a general purpose cell simulator. Using this generic model, we have simulated: (1) regulation of the lac operon in E. coli, and (2) the lytic-lysogenic switch network in bacteriophage lambda.
SA Tuesday, Poster #64
HUANG
Assessing Sequence Comparison Methods Using A Pfam Annotated Database
Ming-qian Huang, William R. Pearson, University of Virginia
We have developed a new reference database for evaluating protein and DNA search methods. FASTA is better than BLASTN for DNA sequence comparison and the FASTX/TFASTX programs are better than BLASTX/TBLASTN when frameshifts are present. The FASTA programs also provide reliable statistical estimates for protein and DNA sequence searches.
GA Sunday, Poster #27
HUBER
Data Analysis for Large-scale Genome wide Transcription Profiling
Wolfgang Huber, Anja von Heydebreck, German Cancer Research Center; Judith Boer, Leiden University Medical Center, The Netherlands; Friederike Wilmer, Holger Sültmann, Martin Vingron, Annemarie Poustka, German Cancer Research Center
We investigate 140 complex hybridisations from 38 paired kidney tumor and normal samples on nylon membranes spotted with a non-redundant human library of 32,000 clones. A non-parametric statistic is used to identify differentially expressed genes, giving quantifiable errors of the first and second kind. The correlation structure of a hierarchical model of variation sources is calculated.
SA Tuesday, Poster #65
HUSMEIER
Detection of Recombination in DNA Multiple Alignments with Hidden Markov Models
Dirk Husmeier, Frank Wright, Biomathematics and Statistics Scotland (BioSS) SCRI, UK
A hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global frequency of recombination.
SA Tuesday, Poster #66
IRIE
An Estimate of Statistics of Alternative Human Gene Transcripts by Assembling Partial cDNA Sequences
Ryotaro Irie, Yasuhiko Masuho, Keiichi Nagai, Helix Research Institute, Inc., Japan
We developed a DNA sequence assembly program (MakeAllContigs) that assembles partial cDNA sequences to create all contigs (sets of consistently aligned partial sequences) each of which could correspond to a transcript. MakeAllContigs was applied to UniGene clusters to estimate the statistics on alternative gene transcripts from human chromosome 22.
GA Sunday, Poster #28
ISOKPEHI
Analysis of Helicobacter pylori Genome for Codon Usage and Base Composition in H. pylori-specific Genes of Unknown Function
Raphael Isokpehi, A. B. Sofoluwe, A. O. Coker, University of Lagos, Nigeria
Codon usage and base composition were determined for 206 Helicobacter pylori-specific genes of unknown function. Bias in codon usage and statistically significant deviation in base composition were observed for some of the genes compared to the average H. pylori gene. Understanding the function of these genes may be useful in drug discovery.
PSP Sunday, Poster #85
ISRAELOWITZ
Computational Model of Collagen Type I Protein
Meir Israelowitz, Phil Campbell, Lauren Ernst, Syed W. Hussain, Carnegie Mellon University; Isabella Verdinelli, Carnegie Mellon University, University of Rome; Troy Wymore, Pittsburgh Supercomputer Center; Daniel L. Farkas, Carnegie Mellon University, University of Pittsburgh
Type I collagen is a long-sequence fibrous biopolymer. Our modeling reduced the conformation space by using random packing. We then used Monte Carlo methods to derive a three-strand structure that represents a thermodynamic minimum. To address the limitations arising from the long sequence, we have developed a modeling concept based on braid-derived topology.
PSP Sunday, Poster #86
JACOBONI
Predicting the Structure of Membrane Porins
Irene Jacoboni, P. L. Martelli, P. Fariselli, R. Casadio, University of Bologna, Italy
We describe a neural-network-based predictor that locates putative beta strands adopting the TM beta barrel structure, starting from the protein sequence. The predictor is trained and cross-validated using porins from the PDB database. Network outputs are then filtered with a Hidden Markov Model procedure.
SA Tuesday, Poster #67
JAGANNATHAN
Motif Mining Reveals More Links in Plant Stress Responses
Vidhya Jagannathan, Avestha Gengraine Technologies, India; S. Krishnaswamy, Madurai Kamaraj University, India; Jason Stewart, Villoo Morawala-Patell, Avestha Gengraine Technologies
An analysis was conducted to discover potential regions of homology between twelve abiotic and biotic stress-related genes and resulted in 3 non-overlapping motifs. A Position Specific Scoring Matrix and a Hidden Markov Model were generated and a total of 113 new stress related genes were identified.
RDNA Tuesday, Poster #42
JAUREGUI
DNA Curvature in Whole Genomes
Ruy Jáuregui (Sandoval), Enrique Merino, Instituto de Biotechnologia-UNAM
The advent of new DNA sequencing technologies opens the possibility of broad-spectrum analysis of DNA curvature profiles in complete genomes. Here we study the non-fortuitous nature of DNA curvature and its relationship with codon usage and the amino acid composition of proteomes.
PT Monday, Poster #83
JOHNSON
A Computational Screen for Novel Targets of Sno-Like RNA's in Pyrococcus
Steven Johnson, Sean R. Eddy, Washington University, MO
C/D box small nucleolar RNAs (snoRNAs) contain antisense, or guide, sequences that direct the methylation of rRNA. Several "orphan" snoRNAs have been identified that possess poor complementarity to rRNA. These snoRNAs may be modifying novel targets. We are performing a computational screen for candidate novel snoRNA targets in the Archaea.
GA Sunday, Poster #29
JONES
Sister Chromatid Cohesion: Phylogenies, Motifs, and Interaction Networks
Susan Jones, John Sgouros, Computational Genome Analysis Laboratory, London, England
Cohesin links sister chromatids on the mitotic spindle. In budding yeast cohesin comprises Smc1p, Smc3p, Scc1p and Scc3p. We have identified new homologues of SMC proteins and created a phylogenetic tree. Using proteomic databases we have created a cohesion interaction network and identified sequence motifs common to pairs of proteins within the network.
DB Monday, Poster #19
KADIN
Structured Vocabularies in Mouse Genome Informatics
J. A. Kadin, J. A. Blake, J. E. Richardson, M. Ringwald, C. J. Bult, J. T. Eppig, and the Mouse Genome Informatics Group, The Jackson Laboratory, Bar Harbor, ME
MGI is involved in the development of several large structured vocabularies and is using these to annotate mouse genes and expression results. These vocabularies include the Anatomical Dictionary of Mouse Development, representing anatomical structures, and the Gene Ontology (GO), describing molecular functions, biological processes, and cellular locations of gene products.
GE Tuesday, Poster #17
KADOTA
Efficient Data Processing Method for Large-Scale cDNA Microarray Analysis
Koji Kadota, Yasushi Okazaki, Hidemasa Bono, Rika Miki, Kentaro Shimizu, Yoshihide Hayashizaki, RIKEN Tsukuba Institute, Japan
It is important to obtain a reproducible data set when analyzing expression profiles using cDNA microarrays. We have developed an efficient method for processing duplicated experimental results. We applied this method to tissue expression profiling data and will present its feasibility and importance.
PF Monday, Poster #51
KALLBERG
Classification of the Short-chain Dehydrogenase/reductase Family
Yvonne Kallberg, Johan Nilsson, Bengt Persson, Karolinska Institutet, Sweden
We have classified the short-chain dehydrogenase/reductase (SDR) family of proteins by applying and comparing methods for protein homology search and subfamily classification. The low residue identity between some of the protein sequences (15-30%) and the large number of family members (~1500) provide a true challenge for the techniques investigated.
SA Tuesday, Poster #68
KAN
UTR Reconstruction and Analysis Using Genomically Aligned EST Sequences
Zhengyan Kan, Washington University; Warren Gish, Washington University School of Medicine; Eric Rouchka, Washington University; Jarret Glasscock, Washington University School of Medicine; David States, Washington University
We have developed a computational method to detect poly-A sites in human genomic sequences and to infer UTR sequences using genomically aligned ESTs. The accuracy of the method is evaluated by reconstructing functionally cloned transcript sequences. Using the method to analyze 908 genic regions, we estimate that 40-50% of human genes undergo alternative polyadenylation.
SA Tuesday, Poster #69
KANN
OPTIMA: A New Score Function for Distantly Related Protein Sequence Comparison
Maricel Kann, Bin Qian, Richard Goldstein, University of Michigan
We describe a new method of determining the score function by optimizing the ability to discriminate between homologs and non-homologs. This new score function out-performs currently available score functions at identifying both distant and close homologies.
SA Tuesday, Poster #70
KARPLUS
The SAM-T99 Protein-search Method Works Well as a Multiple Aligner
Kevin Karplus, Birong Hu, University of California, Santa Cruz
We evaluated the SAM-T99 method as a multiple aligner, using the BAliBase multiple-alignment test suite. When the SAM-T99 -tuneup option is used and an HMM is then built to align all sequences, SAM-T99(tuneup) seems comparable to other multiple aligners such as Clustal and PRPP (much better on reference 2, slightly worse on reference 1v1, comparable on the others).
SA Tuesday, Poster #71
KAWAJI
A Method to Detect Conserved Domains in Mouse full-length cDNA Data
Hideya Kawaji, Osaka University, Japan, NTT Software Corporation; Hideo Matsuda, Osaka University; Shinji Kondo, RIKEN Genomic Sciences Center, Japan; Jun Kawai, Yoshihide Hayashizaki, RIKEN Genomic Sciences Center, RIKEN Tsukuba Institute, Japan; Akihiro Hashimoto, Osaka University, Japan
We present a method for detecting conserved domains between cDNA sequences. The method explores a set of fixed-length and ungapped subsequences that exhibit similarity to each other by using the maximum-density subgraph algorithm. The results obtained by applying it to mouse full-length cDNA sequences are also presented.
SA Tuesday, Poster #72
KELSO
Identification of Alternatively Spliced Candidate Genes Present in Cancer-specific cDNA Libraries
Janet Kelso, Winston Hide, University of the Western Cape, South Africa
A novel transcript clustering and viewing system to mine cancer-specific EST libraries for aberrantly expressed genes is presented. Analysis of 13 881 ESTs provides in excess of 20 aberrantly processed candidate genes with between 2 and 7 alternative consensus sequences. Results demonstrate tissue and neoplastic state specificity of aberrant expression forms.
DB Monday, Poster #20
KERSHAW
A Protein Localization Knowledge Base Populated by Text-Extracted Assertions
Kiarri Kershaw, Toby Goldstein, Francisco Pereira, Chris Hauser, Mark Craven, Robert Murphy, Carnegie Mellon University
We have developed a protein localization knowledge base that describes more than 50 subcellular structures and nearly 700 relations among them. Completeness has been confirmed by analyzing the subcellular location field in SWISS-PROT. We are presently populating our knowledge base with instances of protein-location relations extracted automatically from the literature.
SA Tuesday, Poster #73
KIM
A Computational Approach to Sequence Assembly Validation
Sun Kim, Li Liao, Michael P. Perry, Shiping Zhang, Jean-Francois Tomb, DuPont Experimental Station
Correct sequence assembly is critical to the success of large scale sequencing projects. We propose a computational approach to sequence assembly validation. Among the several analysis techniques we developed, the "good-minus-bad clone analysis" approach correctly identified all misassembled regions in the assembly of the Mycoplasma genitalium genome.
PF Monday, Poster #52
KIRYUTIN
Geometric Interpretation of Homologous Protein Families
Boris Kiryutin, Roman L. Tatusov, National Center for Biotechnology Information
Interpreting the similarity scores as scalar products allows the representation of biological sequences as vectors in multidimensional space. The eigenvectors technique produces the coordinates of these vectors in the most characteristic low dimensional subspace. The eigenvector approach was applied to Clusters of Orthologous Groups (COGS) as well as to complete genomes.
GA Sunday, Poster #30
KLEIN
Identification of Putative Non-coding RNAs in Hyperthermophilic Archaea Genomes
Robert J. Klein and Sean R. Eddy, Washington University, MO
We are using a bias in G+C content as the basis for a computational screen for novel structural RNAs in sequenced, AT-rich, hyperthermophile genomes. We have shown that the screen identifies most known and several putative novel noncoding RNA loci. We are experimentally testing whether these loci are expressed.
DB Monday, Poster #21
KLOSKA
AraXDb: The Arabidopsis thaliana Expression Database
Sebastian Kloska, Max-Planck-Institut of Molecular Plant Physiology; André Flöter, University of Potsdam; Bernd Essigmann, Thomas Altmann, Max-Planck-Institut of Molecular Plant Physiology; Torsten Schaub, University of Potsdam
Microarray technologies are promising tools for investigating the molecular physiological status of organisms as a whole. It is clear that in order to manage the massive datasets generated using these approaches, standard data processing systems must give way to specialized programs. The implementation of a work flow system for the storage and analysis of large scale expression profiling data is reported.
ME Sunday, Poster #66
KOCH
Application of Parallel Techniques for Likelihood Estimation in Linkage Analysis
Ina Koch, Klaus Rohde, Jens G. Reich, Max-Delbrueck-Center for Molecular Medicine, Germany
We implemented two parallel versions of the adaptive simulated annealing (ASA) algorithm. For problems with sufficiently complicated optimization functions, the parallelized versions show a significant improvement in runtime.
DB Monday, Poster #22
KOH
SLAD, a Model Data Warehouse in Molecular Biology
Judice L. Y. Koh, Christian Schönbach, Vladimir Brusic, National University of Singapore
SLAD is a small database of swine leukocyte antigen (SLA) genes. The multi-dimensional data model of SLAD allows for a) quick and easy annotation of data, b) combination of qualitative, quantitative and descriptive data types and c) ease of adding new analyses. SLAD has demonstrated that data warehousing can provide the means for efficient analysis and data mining in molecular biology.
DB Monday, Poster #23
KOKINIS
ArrayBank™, a Community Microarray Database and Knowledgebase with Integrated Analysis Tools
John Kokinis, David Jones, Gary Lindstrom, Ron Lundstrom, University of Utah, Salt Lake City
Our project sets out to combine a relational database of mRNA expression data, sample descriptive data (pathology, tissue type, drug, concentration, etc.), and putative functional information to form a knowledgebase of gene expression profiles and an open-ended set of distributed and easy-to-use analysis tools that provide the ability to effectively discover and compare patterns of gene expression between a priori unrelated data.
GE Tuesday, Poster #18
KONU
A Comparative Analysis of Gene Expression Profiles in Yeast
Özlen Konu (Grantham), Ming D. Li, University of Tennessee, Memphis
Comparative studies are essential for determining the degree to which gene activity is compartmentalized in response to a particular treatment. Using publicly available yeast microarray data, we identified different sets of genes that were solely responsive to either heat- or cold-shock. Functional profiles of these heat- and cold-shock specific gene sets also were compared.
PF Monday, Poster #53
KOSLOFF
Mechanism and Benefits of Catalytic Inefficiency in G-proteins
Mickey Kosloff, The Hebrew University, Israel
Members of the G-protein family act as molecular switches in many signalling cascades. They are turned "on" by exchange of bound GDP for GTP and turned "off" by the intrinsic GTPase reaction. We analyze this inefficient catalysis using available crystal structures combined with functional analysis. We propose a model for the rate-limiting step of GTPase and discuss its implications for catalysis and biological function.
PF Monday, Poster #54
KRIEL
Protein Set Analysis with SRS
David P. Kriel, T. M. Etzold, Wellcome Trust Genome Campus, Cambridge UK
We present selected features of a survey for homopolymeric peptides (amino acid 'runs') in SRS-extracted protein sets. For example, a puzzling lack of asparagine runs longer than five residues in vertebrate proteins was discovered. This is in marked contrast to the many long glutamine runs, considering the high similarity of the residues.
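As an illustration of the kind of survey described in this abstract, the following minimal Python sketch reports the longest homopolymeric run of each residue in a single protein sequence. The function name `longest_runs` and the example sequence are hypothetical and are not taken from the poster; the authors' SRS-based workflow is not reproduced here.

```python
import re

def longest_runs(sequence):
    """Return {residue: length of its longest homopolymeric run} for one sequence."""
    longest = {}
    # (.)\1* matches one character followed by any repeats of that same character.
    for match in re.finditer(r"(.)\1*", sequence):
        residue = match.group(1)
        longest[residue] = max(longest.get(residue, 0), len(match.group(0)))
    return longest

# Hypothetical example sequence (not real data).
runs = longest_runs("MQQQQNNAQQ")
print(runs["Q"], runs["N"])
```

Applying such a function to every sequence in a protein set and tabulating the results per residue would give a run-length distribution of the sort the abstract compares for asparagine and glutamine.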
SA Tuesday, Poster #74
KRIVAN
Modeling of Liver-Specific Transcriptional Regulatory Regions
William Krivan, Wyeth Wasserman, Karolinska Institutet, Stockholm, Sweden
We present a model for the identification and analysis of transcriptional regulatory regions in promoters of genes with liver-specific expression. From a collection of experimentally determined regulatory regions, taken from the literature, we generated profiles for the binding specificity of each transcription factor. We use logistic regression to characterize the interaction between transcription factors bound to distinct elements within regulatory regions.
ME Sunday, Poster #67
LANGMEAD
Time-frequency Analysis of Protein NMR Data
Christopher James Langmead, Bruce Randall Donald, Dartmouth College
Time-frequency analysis of NMR data exposes behavior orthogonal to the magnetic coherence transfer pathways, thus affording new avenues of NMR discovery. In particular, we demonstrate the heretofore unknown presence of inter-atomic distance information within ^{15}N-edited heteronuclear single-quantum coherence (^{15}N HSQC) data.
GE Tuesday, Poster #19
LAPP
Expression Profiling on cDNA Arrays: A Robust Method for Resolving Hybridisation Intensities into Background and Positives
Hilmar Lapp, Marion Weissmann, Gudrun Werner, Novartis Research Institute Vienna
For the analysis of cDNA array hybridisations the intensity values are assumed to reflect the abundance level of a transcript in the sample, which is only valid for signals significantly different from noise. We devised a method that robustly resolves the population of observed intensities into two populations, namely background and positives.
PF Monday, Poster #55
LAWRENCE
Maximum Likelihood Methods Reveal Conservation of Function among Closely Related Kinesin Families
Carolyn J. Lawrence, Russell L. Malmberg, University of Georgia; Michael G. Muszynski, Pioneer Hi-Bred International; R. Kelly Dawe, University of Georgia
We used progressive and HMM alignment methods along with parsimony, NJ, and ML treebuilding methods to construct a phylogeny of the kinesin superfamily. Intron positions are compared. Organismal and kinesin phylogenies are reconciled. A method that determines which regions of an alignment contribute to long branch attraction is described.
DB Monday, Poster #24
LEUNG
Computational Linguistics of DNA: A Case in the Knowledge Representation and Pattern Recognition of Escherichia coli Promoters
Siu-Wai Leung, Chris Mellish, Dave Robertson, University of Edinburgh, Scotland, UK
Basic Gene Grammars (BGGs) were developed to represent the knowledge of E. coli promoters, including a domain theory, consensus sequences, weight matrices, the results of symbolic learning and knowledge-based neural networks. DNA-ChartParser provided bidirectional parsing facilities for BGGs. The knowledge of E. coli promoters was assessed by parsing actual DNA sequences.
PF Monday, Poster #56
LI
GTC birdsEye: An Integrated Visualization Tool for Protein-Protein Interaction Analysis
Anzhi Li and Veronique Damagnez, Genome Therapeutics Corporation
An integrated visualization tool, GTC birdsEye, has been developed for Y2H functional genomics to view the alignment patterns and binding domains of prey. GTC birdsEye supports the Y2H system to identify and characterize protein-protein interactions, and further extend the protein-protein interaction network.
GE Tuesday, Poster #20
LI
Probe Design on Genomic Level for High-density DNA Oligo Microarray
Fugen Li, Gary D. Stormo, Washington University School of Medicine, MO
High-density DNA oligo microarrays are widely used in biomedical research. In this paper we describe algorithms to optimize the selection of specific probes for each gene in an entire genome for oligo chips. We test the algorithm with a few model organisms.
GA Sunday, Poster #31
LI
Identification of Regulatory Sites in Bacterial Genome
Hao Li, University of California, San Francisco; Eric D. Siggia, The Rockefeller University
We present a new algorithm that identifies the binding sites of many different regulatory proteins from genome sequences and detects all significant patterns of the form w_1N_xw_2. The patterns are grouped into clusters, and weight matrices are derived by an alignment of the matching sequences. For E. coli, we obtained ~100 matrices that match 1/3 of the factors with high statistical significance.
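To make the pattern form w_1N_xw_2 concrete, here is a minimal Python sketch that enumerates and counts every occurrence of a word w_1, followed by x arbitrary bases, followed by a word w_2. The function name `count_dyads`, the fixed word length, and the gap range are assumptions chosen for illustration; the significance assessment, clustering, and weight-matrix construction described in the abstract are not implemented.

```python
from collections import Counter

def count_dyads(sequence, word_len=3, min_gap=0, max_gap=5):
    """Count patterns (w1, x, w2): word w1, then x arbitrary bases, then word w2."""
    counts = Counter()
    n = len(sequence)
    for i in range(n - word_len + 1):
        w1 = sequence[i:i + word_len]
        for gap in range(min_gap, max_gap + 1):
            j = i + word_len + gap  # start position of the second word
            if j + word_len > n:
                break  # longer gaps would also run past the end of the sequence
            counts[(w1, gap, sequence[j:j + word_len])] += 1
    return counts

# Hypothetical toy sequence (not real data).
dyads = count_dyads("ACGTTACG", word_len=3, min_gap=0, max_gap=2)
print(dyads[("ACG", 2, "ACG")])
```

In a real screen the observed counts would be compared against an expectation under some background model before any pattern is called significant; that step is deliberately left out here.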
PF Monday, Poster #57
LI
Reaction Mechanism Study of Phospholipase D by Combination of Bioinformatics and Biochemistry
Tao Li, University of California, Davis
We attempted to interpret the reaction mechanism of phospholipase D by combining similarity comparison, secondary structure prediction, and substrate specificity.
SA Tuesday, Poster #75
LIAO
Clustering Protein Sequences with a Linkage Graph
Li Liao, Sun Kim, Jean-Francois Tomb, DuPont Central Research and Development
By representing sequence similarity relationships among proteins as linkage graphs, this Candidate-Elimination-like clustering algorithm identifies the maximal quasi-complete subgraphs, a concept recently introduced by Matsuda et al. to represent protein clusters. The role of graph connectivity, as a confidence measure, is studied by analyzing the statistical distributions of similarity scores.
PT Monday, Poster #84
LILIEN
Computational Screening Studies for Core Binding Factor Beta: Use of Multiple Conformations to Model Receptor Flexibility
Ryan Lilien, Mohini Sridharan, Xumei Huang, John Bushweller, Bruce Donald, Anthony Yan, Dartmouth College
We present an approach in computer-aided drug design which attempts to account for protein flexibility. We use the computer program LUDI (Bohm, 1996) in a novel way, by representing a protein with a set of low-energy conformations, and then running several drug design simulations on this set.
SA Tuesday, Poster #76
LISTWAN
Characterisation of the Epidermal Differentiation Complex (EDC) on Mouse Chromosome 3 and Human Chromosome 1q21
Pawel Listwan, Joseph A. Rothnagel, University of Queensland, Australia
Terminal differentiation of the mammalian epidermis involves the expression of structurally and functionally related genes found in the epidermal differentiation complex (EDC) located on human chromosome 1q21 and mouse chromosome 3. The research presented shows the techniques used in mapping studies, sequence analysis and database searches for novel human sequences.
DB Monday, Poster #25
LIU
mtmDB: A Maize Targeted Mutagenesis Database
Hong Liu, Robert Martienssen, Cold Spring Harbor Laboratory, NY; Mike Freeling, University of California, Berkeley; Lincoln Stein, Cold Spring Harbor Laboratory
mtmDB is a maize targeted mutagenesis database currently containing information on 43,776 transposon insertion mutants. The database contains phenotype information (including images), pedigrees, and partial DNA sequencing information, as well as other information. Using an online request form, researchers may request mutant strains that affect genes of interest, and are invited to return phenotypic information for incorporation into the database.
GA Sunday, Poster #32
LIU
Non-Structured Regions in Genomic Proteins: Junk or Functional?
Jinfeng Liu, Burkhard Rost, Columbia University, NY
We found that regions with no regular secondary structure over more than 50 consecutive residues were common in all three kingdoms, in particular in eukaryotes. Although these floppy regions are not structured, we found them to be evolutionarily conserved and - on average - relevant for function (SwissProt annotation, yeast-two-hybrid data).
GE Tuesday, Poster #21
LIU
Comparative Analysis of Yeast Microarray Gene Expression Data: Hierarchical Clustering and Self-Organizing Maps
Jinfeng Liu, Lei Shi, William Grundy, Columbia University, NY
We analyzed a set of yeast gene expression data by hierarchical clustering and self-organizing maps (SOMs). According to the MIPS Yeast Genome Database (MYGD), clusters were obtained by minimizing a defined cost function. Only a few MYGD classes could be clustered by either method. Hierarchical clustering performed slightly better than SOMs when evaluated both externally and internally.
SA Tuesday, Poster #77
LIU
BioProspector: Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-expressed Genes
Xiaole Liu, Jun Liu, Douglas L. Brutlag, Stanford University
BioProspector finds DNA sequence motifs in the upstream regions of co-expressed genes. Using a modified Gibbs sampling algorithm, the program can account for background Markov dependency and find motifs present in only some of the sequences, motifs with two blocks separated by a variable-length gap, or palindromic patterns.
PSP Sunday, Poster #87
LIU
Polypeptide Structure Prediction Via Multiple Copy Simulated Annealing in Torsional Space Based on Amber Energy Functions, Generalized Born Solvation and Solvent Accessible Surface Areas
Yongzing Liu, David L. Beveridge, Wesleyan University, CT
A multiple-copy simulated annealing algorithm in torsional space, using the AMBER 5.0 force field and the Generalized Born model, is devised for molecular conformational optimization. The implementation of this algorithm using MPI (Message Passing Interface) on a BEOWULF PC cluster shows linear scaling with parallelization.
DB Monday, Poster #26
LIU
A Database and Browser for Genome Analysis and cDNA Assembly
Yuan Liu, Yuhong Wang, Guochun Xie, Yu Lin, Richard Blevins, Merck and Co., Inc.
To extract information from genomic sequence data and to facilitate gene discovery, a genomic data mining system has been developed to provide scientists with the most up-to-date information. All data is stored in an Oracle relational database. A set of interactive visualization tools has been developed to access the Oracle database.
SA Tuesday, Poster #78
MA
Ancient Fungi Entrapped in Glacial Ice
Lijun Ma, Scott O. Rogers, State University of New York, Syracuse, USA
Culture and DNA polymerase chain reaction amplification methods were employed to revive and identify ancient fungi and their nucleic acids in ice core sections from both Greenland and Antarctica, up to 400 000 years old. Twenty-seven isolates were obtained. In addition, three fungal sequences were amplified and sequenced directly from the interior of the ice core section.
SA Tuesday, Poster #79
MABEY
Prints-S Illuminates the Midnite Zone
J. E. Mabey, P. Scordis, T. K. Attwood, University of Manchester, UK
PRINTS-S, a recent extension of the PRINTS databank, attempts to provide depth to protein family data by storing its discriminators within a relational database. Using this system, it is possible to describe hierarchical relationships within protein families. Hence, PRINTS-S is able to illuminate relationships that occur within the Midnight Zone.
PSP Sunday, Poster #88
MALLICK
Using Structurally Derived Priors for Fold Recognition
Parag Mallick, Lukasz Salwinski, Gary Klieger, Rob Grothe, David Eisenberg, University of California, Los Angeles
We show that Hidden Markov Models trained using a continuous, Mahalanobis distance metric based, structure-derived prior outperform those generated using discrete Bowie-like prior distributions. We also introduce the Directional Density Parameter and show that it is better conserved in structurally aligned positions than the Bowie parameter.
GA Sunday, Poster #33
MARCOTTE
Using Non-homology Methods for Genome-Wide Prediction of Protein Function
Edward Marcotte, University of California, Los Angeles, Protein Pathways, Inc.; Matteo Pellegrini, Michael Thompson, Todd Yeates, David Eisenberg, Protein Pathways, Inc.
Genome and expression analyses reveal functional links between genes, useful both for predicting protein function and for creating protein networks analogous to those derived experimentally. We apply these methods (phylogenetic profiles, expression clustering, and Rosetta Stone analysis) to find yeast protein networks and predict function for >50% of the uncharacterized yeast proteins.
GE Tuesday, Poster #22
MARTOGLIO
Definition of Hidden cDNA Array Gene Expression Profile Tracks: Application to Molecular Diagnosis of Ovarian Cancer
Ann-Marie Martoglio, James W. Miskin, David J. C. MacKay, Stephen K. Smith, University of Cambridge, UK
We present a novel approach to array-based gene expression data which allows for "blind" separation of samples based on hidden gene expression profile tracks (gTRACKS). The method is demonstrated on data from tailored cDNA array ovarian cancer studies and shows successful application for various classifications of the tissue samples.
GA Sunday, Poster #34
MEHTA
High Message Variety and Low Intrinsic Error Correction Capabilities of Whole Genome Symbol Strings: An Information Theoretic Perspective
Preeti Mehta, Ramneek Gupta, S. Krishnaswamy, Madurai Kamaraj University, India
Information content calculations based on Shannon entropy for 22 prokaryotic genomes and 29 eukaryotic chromosomes are reported. Chromosomes in eukaryotic organisms tend to maintain similar information densities. For all genomes, information density values (Id) are low, indicating a state of homeostasis with respect to maintaining high potential message variety and low intrinsic error combating capabilities.
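For readers unfamiliar with the underlying quantity, the sketch below computes the zeroth-order Shannon entropy, in bits per symbol, of a symbol string. It is a generic textbook calculation, not the authors' procedure: the abstract does not define how the information density (Id) is derived from the entropy, so that step is not attempted, and the function name `shannon_entropy` is hypothetical.

```python
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    """Zeroth-order Shannon entropy, in bits per symbol, of a symbol string."""
    total = len(symbols)
    if total == 0:
        return 0.0
    counts = Counter(symbols)
    # H = sum over distinct symbols of p * log2(1 / p), with p = count / total.
    return sum((c / total) * log2(total / c) for c in counts.values())

# Hypothetical toy strings (not real genome data).
print(shannon_entropy("ACGT"))
print(shannon_entropy("AACC"))
```

For a four-letter DNA alphabet the value is bounded above by 2 bits per symbol, which is reached only when all four bases are equally frequent.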
GE Tuesday, Poster #23
MITRA
Fundamental Patterns Underlying Gene Expression Profiles: Simplicity from Complexity
Smita Mitra, Neal Holter, Amos Maritan, Marek Chiplek, Jayanth Banavar, Nina Fedorff, The University of Pennsylvania
Singular value decomposition analysis of previously published microarray expression data has uncovered underlying patterns in their expression profiles. The essential features of the profiles are captured using just a small number of these patterns, leading to the striking conclusion that the transcriptional response of a genome is orchestrated in a few fundamental patterns of gene expression change.
GA Sunday, Poster #35
MOFFETT
Whole Genome Phylogenies via Singular Value Decomposition (SVD)
Karen Moffett, Mathew Kay, Steve Baker, Gary Stuart, Indiana State University
A novel SVD-based analysis utilizing all possible overlapping tripeptides was used to produce phylogenetic trees from whole genome data. Both gene and species trees resulted from the same analysis. In one application, over 1280 mitochondrial proteins (and the nearly 100 metazoan organisms they represent) were accurately placed within phylogenetic trees.
ME Sunday, Poster #68
MOLLA
Machine Learning Approaches to Basecaller Calibration
Michael Molla, The University of Wisconsin, Madison
"Basecalling" turns a sequencing reaction into "A"s, "G"s, "T"s and "C"s. Sequencing machines already do most of this work for us. However, whenever a new sequencing machine is developed, painstaking expert calibration is required. This is a study of the effectiveness of various machine learning techniques in solving this problem.
GA Sunday, Poster #36
MORENO-HAGELSIEB
Preservation and Prediction of Transcription Units Across Microbial Genomes: Escherichia coli, and Haemophilus influenzae
Gabriel Moreno-Hagelsieb, Temple F. Smith, Boston University; Julia Collado-Vides, Centro de Investigacion sobre Fijacion de Nitrogeno, Mexico
Genes among gram-negatives are found together more frequently when they correspond to Escherichia coli operons, than to transcription unit boundaries. The point of highest prediction accuracy is coincident with the point of highest preservation profile of predicted operons versus transcription unit boundaries. We extend such predictions to Haemophilus influenzae.
SA Tuesday, Poster #80
MOROZ
The Distribution of Log Likelihood Scores in Multiple Alignments
J. David Moroz, Terence Hwa, University of California, San Diego, USA
The multiple alignment of a group of sequences provides a means for recognizing relationships among the sequences. A number of methods based on weight matrices have recently been proposed for finding good multiple alignments by maximizing the log-likelihood score. We present an analytical calculation of the expected distribution of scores which may be used to determine the statistical significance of alignments.
GE Tuesday, Poster #24
MORRISON
Comparison of Normalisation Methods for Microarray Expression Data Over Multiple Experiments
Norman Morrison, Magnus Rattray, University of Manchester, UK; Kenneth Pollock, Ray Jupp, Aventis, UK; Andy Bras, University of Manchester, UK
We have compared a number of different normalisation methods for expression data derived from different technological systems. Raw data was normalised by a combination of approaches using a range of noise cut off points. The approaches were assessed by a metric describing the strength of biological signal post normalisation.
GA Sunday, Poster #37
MUELLER
A Structure Based Genome Annotation System for Comparative Genomics
Arne Müller, Robert M. MacCallum, Lawrence A. Kelly, Michael J. E. Sternberg, Imperial Cancer Research Fund, London, UK
A system for structure-based annotation has been developed for application to all currently sequenced genomes. We aim to use this system for comparative genome analyses, such as protein fold composition. The system is based on a relational database core with an object-oriented software frontend, permitting easy incorporation of new databases and methods.
DB Monday, Poster #27
MULLER
Using a Newly Constructed Virtual Protein Database for Plasmodium in the Search for Virulence Genes on which Positive Selection May Operate
Ralhston Muller, Winston Hide, University of the Western Cape, South Africa
The first aim was to apply the stack_pack clustering algorithm to 15,468 Plasmodium sequences and to predict protein sequences from the DNA consensus sequences using ESTScan. The second aim was to use a simple method for estimating the number of synonymous and non-synonymous substitutions, thereby detecting genes on which positive selection may operate.
GE Tuesday, Poster #25
MUMEY
Studying Network Dynamics in Genetic and Neural Systems
Brendan Mumey, Tomas Gedeon, Julie Taubman, Zuzana Gedeon, Kate Hall, Montana State University
Learning regulatory elements in genetic networks based on gene expression data collected from DNA microarrays is exciting; Bayesian networks are one proposed model. We examine a different network model based on a more restricted network trajectory graph and demonstrate our methods using gene expression data sets and neural activity trace data that we suggest as another testbed for network-learning algorithms.
BN Sunday, Poster #8
NAKAYAMA
Computer Modeling of Human Erythrocyte using E-Cell Simulation System
Yoichi Nakayama, Ayako Kinoshita, Ryo Matsushima, Mitsuhiro Kita, Masaru Tomita, Keio University, Japan
We constructed a computer model of the human erythrocyte using the E-CELL simulation system. The model comprises three major metabolic pathways: glycolysis, the pentose phosphate pathway, and nucleotide metabolism. In this work, we simulated enzyme deficiencies such as that of glucose-6-phosphate dehydrogenase (G6PD).
PS Monday, Poster #70
NAYAL
Consensus Macromolecular Surfaces and the Molecular Recognition Problem
Murad Nayal, Barry Honig, Columbia University
A novel approach is devised to detect similar binding sites in a set of proteins based on surface shape and chemical properties. The technique is applied to a number of protein families and a library of consensus binding sites is constructed for each. These consensus molecular surfaces highlight potentially critical molecular recognition determinants and are shown to be distinctive of each family studied.
GA Sunday, Poster #38
NELSON
Anubis 4 - a Java Genome Map Viewer
J. Paul Nelson, Alan L. Archibald, Andy S. Law, Roslin Institute, Scotland, UK
We present a Java-enabled map viewer, Anubis 4, that graphically displays genome maps from one or more species. Each label on the map provides a clickable link to details of mapped objects. Functionality is placed on the client side: maps can be rotated, flipped, zoomed or moved almost instantaneously.
DB Monday, Poster #28
NELSON
Bioinformatics Resources for Genome Analysis in Farm Animals
J. Paul Nelson, Alan L. Archibald, Andy S. Law, Roslin Institute, Scotland, UK
The Bioinformatics group at the Roslin Institute are developing bioinformatics tools and resources for scientists engaged in genome analysis in farmed and domestic animals. The resources developed encompass both the databases and the associated analytical and display tools required for genetic and physical mapping of farm animals. (http://www.ri.bbsrc.ac.uk/bioinformatics/ paul.nelson@bbsrc.ac.uk)
PSP Sunday, Poster #89
NG
PHAT: A Transmembrane-Specific Substitution Matrix
Pauline C. Ng, Jorja G. Henikoff, Steven Henikoff, University of Washington, USA
Database searching algorithms for proteins use scoring matrices based on globular proteins which may be inappropriate for transmembrane regions. In searches with transmembrane queries, the PHAT matrix significantly outperforms generalized matrices. We conclude that a better matrix can be constructed by using background frequencies characteristic of the twilight zone rather than database frequencies. Available at: http://blocks.fhcrc.org/~pauline
PS Monday, Poster #71
NILSSON
Studies of Amino Acid Frequencies in Integral Membrane Proteins from Different Subcellular Organelles
Johan Nilsson, Bengt Persson, Karolinska Institutet, Stockholm, Sweden
We have analysed transmembrane and loop regions of integral membrane proteins to find sequence features characteristic of different organelles. To eliminate homology bias, sequences were grouped into families to obtain a representative data set. The resulting families and amino acid distributions arrived at for each organelle type will be presented.
GA Sunday, Poster #39
NING
A New Hashing Algorithm for Genomic Sequence Search
Zemin Ning, James C. Mullikin, Wellcome Trust Genome Campus, UK
A new hashing algorithm has been developed for genomic sequence search. Sequences are preprocessed into a hash table, and the search is conducted only on the hash table rather than on the whole database. Two to three orders of magnitude less CPU time is required than with two widely used search tools.
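The general idea of searching a precomputed hash table instead of the raw database can be sketched as follows. This is a minimal illustration only, not the authors' algorithm: the function names, the k-mer length, and the toy sequences are all assumptions made for the example.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every k-mer to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((seq_id, i))
    return index


def find_seed_hits(index, query, k):
    """Count shared k-mers per (sequence id, diagonal).

    Only the hash table is consulted; the database sequences
    themselves are never rescanned.
    """
    hits = defaultdict(int)
    for j in range(len(query) - k + 1):
        for seq_id, i in index.get(query[j:j + k], ()):
            hits[(seq_id, i - j)] += 1  # same diagonal = consistent offset
    return dict(hits)


# Hypothetical toy database and query.
database = {"chr_a": "ACGTACGTTAGC", "chr_b": "TTTTGGGGCCCC"}
index = build_kmer_index(database, k=4)
hits = find_seed_hits(index, "CGTTAG", k=4)
```

Diagonals with many shared k-mers are candidate match regions that a real tool would then extend and score; that step is omitted here.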
PS Monday, Poster #72
NORDLING
Gamma Gamma ADH: A Docking Study
Erik Nordling, Bengt Persson, Karolinska Institutet, Stockholm, Sweden
A flexible side-chain docking technique was used to investigate the role of gamma gamma ADH in bile acid metabolism. We demonstrate that gamma gamma ADH is involved in the conversion of iso-bile acids to bile acids by catalysing the oxidation of the 3 beta hydroxyl group.
BN Sunday, Poster #9
NOVIK
ProLoc: A New Program that Predicts Sub Cellular Locations of Proteins
Amit Novik, Einat Hazkani, Erez Levanon, Compugen Ltd., Tel Aviv, Israel
ProLoc predicts, with high accuracy, the localization of proteins among 22 compartments comprising the cell organelles themselves and their membranes. In addition, it divides the membrane proteins into three groups: Type I, Type II and integral membrane proteins. To achieve high levels of accuracy, several different approaches were applied concomitantly.
PS Monday, Poster #73
O'HEARN
A Method for Comparing Flexible Objects Applied to Protein Molecules
S. D. O'Hearn, National Research Council of Canada; A. J. Kusalik, University of Saskatchewan, Canada
A new method for spatially decomposing protein molecules for comparison, similarity detection, and categorization is described. The method recursively subjects molecules to a spatial decomposition that is guided by octtrees. Octtrees constitute a multiple structure alignment that provides a rapid, efficient comparison space for molecules and a compact 3-D description of their properties. The method is implemented as a computer program, MolCom3D.
SA Tuesday, Poster #81
OHLER
A Hybrid Markov Chain - Neural Network System for the Exact Prediction of Eukaryotic Transcription Start Sites
Uwe Ohler, Georg Stemmer, Heinrich Niemann, Universität Erlangen-Nürnberg, Germany
We have designed stochastic segment models with Markov chains as submodels for vertebrate promoter and non-promoter sequences. Their output, along with additional CpG island features, is fed into a neural network classifier. This new two-step approach leads to an accurate and specific prediction of transcription start sites in genomic DNA.
DB Monday, Poster #29
ONAMI
Bio-calculus: Toward a Generalized Description System for Biology
Shuichi Onami, Kitano Symbiotic Systems Project, Japan; Masao Nagasaki, Satoru Miyano, Kitano Symbiotic Systems Project, University of Tokyo; Hiroaki Kitano, Kitano Symbiotic Systems Project, Sony CSL
Bio-calculus is a knowledge description system that aims to describe any kind of biological knowledge using the same description principle. We present its concept and syntax, together with simulation software for molecular interaction. Currently, we are developing bio-calculus for more complicated biological phenomena, such as sub-cellular localization, cell-cell interaction, and cell division.
DB Monday, Poster #30
OTILLAR
Relational Databases, Statistics, and Improved Homology Assays
Robert P. Otillar, University of California, San Francisco
This work investigates empirical limits of alignment-based homology assays between genes with very low sequence identity. We show useful statistics and interesting case studies from 1.15 million sequence alignments from the Structural Classification of Proteins, emphasizing gene comparisons with low (sub-'twilight') residue position-by-position similarity scores. We discuss fundamental limitations of Smith-Waterman, Fasta, gapped/PSI BLAST, and hidden Markov assays.
DB Monday, Poster #31
PANDEY
A Resource for Information on Plant Chromatin Remodeling Genes
Ritu Pandey, David Selinger, David Mount, Vicki Chandler, Richard Jorgensen, University of Arizona
Genes identified in the genomes of several model organisms that regulate chromatin structure have been used as probes to find similar genes in Arabidopsis genomic sequence and maize EST sequences. Information on these genes stored in ChromDB will be accessible through the existing web site at http://ag.arizona.edu/chromatin/
DB Monday, Poster #32
PATON
GIMS - Genome Information Management System
Norman W. Paton, Shakel A. Khan, Andy Hayes, Fouzia Moussouni, Michael J. Cornell, Karen Eilbeck, Andy Brass, Carole A. Goble, University of Manchester; Simon Hubbard, University of Manchester Institute of Science and Technology; Stephen H. Oliver, University of Manchester
Complex data sets provided by genome sequencing projects present challenges in the storage, analysis and presentation of information. Using UML, we have produced models that describe genome sequences, protein interactions and transcription data. These have been applied to data from Saccharomyces cerevisiae, a lead organism in functional genomics.
GE Tuesday, Poster #26
PAVLIDIS
Combining Microarray Expression Data and Phylogenetic Profiles to Learn Gene Functional Categories Using Support Vector Machines
Paul Pavlidis, William Noble Grundy, Columbia University
We demonstrate how to apply the support vector machine learning algorithm to a heterogeneous data set. Our results suggest that combining data types should only be attempted if there is evidence that the functional classification of interest is clearly reflected in both data sets.
EV Tuesday, Poster #2
PE'ER
Incomplete Directed Perfect Phylogeny
Itsik Pe'er, Ron Shamir, Roded Sharan, Tel Aviv University, Israel
We investigate the following variant of the "Perfect Phylogeny" problem: Input: An n*m species-characters matrix. Characters are binary, and can only be gained, never lost. Some matrix entries are unknown. Goal: Complete the missing entries in a way admitting a perfect phylogeny. We solve this problem in O(m*n*polylog) time.
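For orientation, the simpler complete-data case can be checked directly. A binary matrix with no unknown entries admits a directed perfect phylogeny exactly when the sets of species carrying each character are pairwise nested or disjoint. The sketch below tests only that condition; it is not the authors' algorithm for the incomplete problem, and the function name and matrices are illustrative assumptions.

```python
from itertools import combinations


def admits_directed_perfect_phylogeny(matrix):
    """Check a complete binary species-by-characters matrix.

    matrix[s][c] is 1 if species s has character c, else 0.
    Characters can be gained once and never lost, so the carrier
    sets of any two characters must be nested or disjoint.
    """
    n_chars = len(matrix[0]) if matrix else 0
    carriers = [
        frozenset(s for s, row in enumerate(matrix) if row[c])
        for c in range(n_chars)
    ]
    for a, b in combinations(carriers, 2):
        if a & b and not (a <= b or b <= a):
            return False  # overlapping but not nested: conflict
    return True


# Hypothetical examples: the first is compatible, the second is not.
compatible = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
conflicting = [[1, 1], [1, 0], [0, 1]]
```

This pairwise check is quadratic in the number of characters and is meant only to make the problem statement concrete; handling unknown entries efficiently is the contribution described in the abstract.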
GA Sunday, Poster #40
PEDERSEN
A DNA Structural Atlas for Escherichia coli
Anders Gorm Pedersen, Lars Juhl Jensen, Søren Brunak, Hans-Henrik Staerfeldt, David W. Ussery, The Technical University of Denmark
We have performed a computational analysis of DNA structural features (curvature, flexibility, and stability) in several prokaryotic genomes. For visualizing results we developed the "Genome Atlas" where structural measures in entire chromosomes are plotted in the form of color-coded wheels (http://www.cbs.dtu.dk/services/GenomeAtlas/). Possible biological interpretations of our results are discussed.
PF Monday, Poster #58
PERSSON
Studies on Short-chain and Medium-chain Dehydrogenases/Reductases Using Hitherto Completed Genomes
Bengt Persson, Jan-Olov Höög, Hans Jörnvall, Karolinska Institutet, Stockholm, Sweden
The protein families SDR and MDR constitute large enzyme families. Studies on these in the completed genomes reveal different evolutionary patterns for different sub-families, reflecting their different roles in the metabolism. A sub-class of SDR with dehydratase, epimerase and isomerase activities seems to be of critical importance.
PSP Sunday, Poster #90
PETERSEN
Prediction of Protein Secondary Structure at 80% Accuracy Using a Combination of Many Neural Networks
Thomas Nordahl Petersen, Claus Lundegaard, Morten Nielsen, Henrik Bohr, Jakob Bohr, Søren Brunak, Garry P. Gippert, Ole Lund, Structural Bioinformatics Advanced Technologies A/S, Denmark
A secondary structure prediction protocol involving up to 800 neural network predictions has been developed, by use of novel methods such as output expansion and a unique balloting procedure. An overall performance of 80.2% (80.6% mean per-chain) for three-state (helix, strand, coil) prediction was obtained.
GA Sunday, Poster #41
PILPEL
A Whole-genome Analysis of Combinatorial Regulation of Gene Networks
Yitzhak Pilpel, Priya Sudarsanam, George Church, Harvard Medical School, USA
We have launched a whole-genome survey to systematically identify combinatorial sets of transcription factors that regulate the expression of multi-gene networks. A probability model was utilized to select statistically significant combinations of regulatory motifs. The biological significance of such motif combinations was assessed by their impact on mRNA expression.
DB Monday, Poster #33
PLEISSNER
Data Processing of Barley ESTs
K.-P. Pleissner, W. Michalek, U. Willschere, A. Graner, Institute of Plant Genetics and Crop Plant Research, Germany
The data processing of barley ESTs comprises the assignment of putative functions to ESTs and the identification of a unigene set for barley. For the functional assignment, the ESTs were blasted against SWISSPIRPLUS. A database containing the ESTs and their BLASTX2 results was built using the mySQL DBMS. BLAST searches allowed the identification of more than 3000 barley genes.
ME Sunday, Poster #69
PLETINCKX
APES--Automated Processing and Extraction of Sequences
Jurgen Pletinckx, Algonomics N.V., Belgium; Antoine Janssen, Jan van Oeveren, Keygene N.V., The Netherlands; Philippe Stas, Ignace Lasters, Algonomics N.V.; René van Schaik, Keygene N.V.
APES extracts sound nucleotide sequences from trace files. Using modified alignment algorithms, flanking nucleotide sequences are localised. Fragments are trimmed and assembled. A web interface graphically displays sequences, their quality, the location of flanking sequences, and downloadable sequence files (FASTA). APES output is prepared for use by automated annotation tools.
SA Tuesday, Poster #82
POCOCK
Unsupervised Hidden Markov Models Trained on P. falciparum Chromosome 3 Detect Biologically Interesting Structure
Matthew R. Pocock, Thomas A. Down, Tim J. P. Hubbard, Wellcome Trust Genome Campus, UK
We have developed hidden Markov models for whole chromosomes, containing pairs of states that emit long regions of complementary sequence. When trained on chromosome 3 of the malaria parasite P. falciparum and tested on chromosome 2, they reproducibly learned states associated with telomeres, genes, sub-telomeric repeats and structures associated with the var antigen genes.
SA Tuesday, Poster #83
POISSON
A Comparative Sequence Analysis System for Detecting RNA Patterns in mRNAs: Application to Prion Protein mRNAs
Guylaine Poisson, Isabelle Barrette, Patrick Gendron, Francois Major, Université de Montréal, Québec, CA
We developed a Structural Pattern Finder computer program to detect structured RNA in human prion mRNA. The program measures folding free energies (FFE), which in this mRNA indicated the presence of several RNA patterns. One of particular interest is a pseudoknot that could affect prion protein translation.
PSP Sunday, Poster #91
POLLASTRI
SSpro: SS prediction using BRNN
Gianluca Pollastri, Pierre Baldi, University of California, Irvine
We present SSpro, a server for protein secondary structure prediction. This server is based on a set of Bidirectional Recurrent Neural Networks (BRNN) and currently achieves 76.7% correct classification on a set of 126 sequences with low similarity to the training set.
GA Sunday, Poster #42
PTITSYN
New Algorithms for Large-Scale EST Clustering
Andrey Ptitsyn, Winston Hide, South African National Bioinformatics Institute
Based on the STACK experience, we have developed a new EST clustering algorithm. It is based on a new, fast, linear statistical measure of sequence similarity that ignores low-complexity regions. The algorithm is implemented in two variants: loose clustering with assembly by a third-party system, and stringent clustering with simultaneous consensus generation and alternative variant apprehension. Tests show a significant improvement over D2_cluster in computation time and cluster sizes.
GA Sunday, Poster #43
QI
Novel Pattern Prediction in Complete Genomes
Dong Qi, Ahmed Fadiel, A. Jamie Cuticchia, The Hospital for Sick Children, Toronto, ON, CA
Higher-order oligonucleotides were used to address whether Markov chain models can provide a unified framework in which to describe whole genomes in terms of their constituent oligonucleotides. Algorithms capable of oligonucleotide predictions were developed. Our results showed that the (n-2) order Markov chain is consistently the better predictor. Intra-genomic variability is also discussed.
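One standard way to predict the count of an n-mer from an (n-2) order Markov chain is to combine the counts of its two overlapping (n-1)-mers and their shared (n-2)-mer core. The sketch below shows that textbook estimator under the assumption that it resembles what is meant here; the abstract does not specify the authors' algorithms, and the function names and sequence are illustrative.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(seq, word):
    """Expected count of word (length n >= 3) under an (n-2) order Markov model.

    E[w1..wn] = N(w1..w(n-1)) * N(w2..wn) / N(w2..w(n-1))
    """
    n = len(word)
    shorter = count_kmers(seq, n - 1)
    core = count_kmers(seq, n - 2)[word[1:-1]]
    if core == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / core


# Hypothetical toy sequence: a perfect ACG repeat.
genome = "ACGACGACG"
predicted = expected_count(genome, "ACG")
observed = count_kmers(genome, 3)["ACG"]
```

Comparing `predicted` with `observed` across all words of a given length gives a simple measure of how well the model describes a genome.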
SA Tuesday, Poster #84
QIAN
Modeling the Gap Length Distribution
Bin Qian, Richard Goldstein, University of Michigan, USA
In order to improve the efficiency of sequence alignment methods, we want to find a more accurate model of insertions and deletions in structurally related and homologous proteins. We extracted the gap length distribution from the FSSP protein structure alignments. The results suggest new approaches to modeling insertions and deletions in sequence alignments.
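Tabulating gap lengths from a set of aligned sequences can be sketched in a few lines. The example assumes gaps are written as runs of "-" in already-aligned strings; it does not read the FSSP format, and the function name and input strings are hypothetical.

```python
import re
from collections import Counter


def gap_length_counts(aligned_sequences):
    """Count how often each gap length occurs across aligned sequences.

    A gap is a maximal run of '-' characters. Terminal gaps are
    counted like internal ones in this sketch.
    """
    counts = Counter()
    for seq in aligned_sequences:
        counts.update(len(run) for run in re.findall(r"-+", seq))
    return counts


# Hypothetical pair of aligned sequences.
alignment = ["AC--GT-A", "A---CGTA"]
distribution = gap_length_counts(alignment)
```

Normalising these counts gives an empirical gap length distribution against which candidate gap models can be compared.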
SA Tuesday, Poster #85
QIAN
Promoter Finding and Linguistic
Minping Qian, Peking University, China
A method for mining the biologically significant words in core promoters is given. Our statistical results suggest that these words are transcription factor binding sites and that they come in well-defined pairs and triples. A program that finds promoters based on these words and vocabulary is provided.
DB Monday, Poster #34
QUINN
XNotify: An Automatic Sequence Database Search and Results Management System
G. B. Quinn, The Burnham Institute, La Jolla, CA; Philip E. Bourne, University of California, San Diego, USA
Based on a Linux OS platform, an automatic sequence database search system has been written in the C programming language that performs daily sequence database searches of NCBI sequence datasets, notifying the user by email only when new or previously unseen matches have been found to a query sequence. Additionally, search data is stored online, allowing the researcher to access and manipulate search results at will via an intuitive web interface.
ME Sunday, Poster #70
RAJ
CORBA for Integration of Bioinformatics Applications
Chelliah Pethuru Raj, Naohiro Ishii, Nagoya Institute of Technology, Japan
The use of heterogeneous, distributed computing environments makes maintaining a software system for molecular biology data laborious and time-consuming. We propose integrating BLAST-related software tools in an n-tier setup using CORBA enabled for enterprise application integration (EAI), in order to improve access to and use of the biological knowledge available in public databases.
SA Tuesday, Poster #86
REESE
Empirical Determination of Effective Gap Penalties for Protein and DNA Sequence Alignment
Justin T. Reese, William R. Pearson, University of Virginia
We have empirically examined effective gap penalties for the alignment of protein and DNA sequences. Sequences were artificially mutated from PAM 20 to PAM 200 and embedded in a database of unrelated sequences to determine the gap penalties that yielded the greatest statistical significance for the most distant "homologs".
BN Sunday, Poster #10
REGEV
Computer Simulation of Biomolecular Processes Using Stochastic Process Algebra
Aviv Regev, Tel Aviv University; William Silverman, Naama Barkai, Ehud Shapiro, Weizmann Institute of Science, Israel
We present a novel approach for quantitative modeling of biochemical networks, at different levels of abstraction. We have developed a new simulation system, Psi, based on the stochastic pi-calculus process algebra. The ability of this well-defined formalism to capture modular structures is illustrated by studying the hysteresis-based model for circadian clocks.
PS Monday, Poster #74
REINHARDT
Improved Spatial Contact Prediction from Correlated Mutation Analysis
Astrid Reinhardt, Olivier Lichtarge, Baylor College of Medicine, TX
A new method was developed to identify spatial proximity between amino acid residue pairs within a protein based on correlations in their mutational behaviour. Phylogenetic tree information is used to classify mutational correlation in light of its evolutionary impact. This approach seems to improve the signal-to-noise ratio compared to existing methods.
SA Tuesday, Poster #87
REISDORF
Choosing Models for Similarity-based Gene Prediction: Profiles versus Single Sequences
William Reisdorf, Pankaj Agarwal, SmithKline Beecham Pharmaceuticals
Homology-based gene prediction should improve as the number of database entries increases. However, it is not always clear which database sequence will provide the best model for a new gene. We present an evaluation of using profile-HMMs, generated by HMMER, to guide homology-based gene predictions from GeneWise.
SA Tuesday, Poster #88
RHEE
Predicting Spliced Junction Site in the cDNA Sequence Data Using Bayesian Networks
Hwanseok Rhee, Yonsei University Medical School, Seoul, Korea; Bermseok Oh, Jinsung Lee, Korea Institute of Health, Seoul, Korea
Since splice sites can be successfully identified from genomic sequences, we trained a Bayesian belief network classifier on the reverse problem of predicting splice junction sites from cDNA sequences. The overall accuracy of the trained network was about 90%, which may point to a new property of human genes.
SA Tuesday, Poster #89
RIVAS
A Screen for Noncoding RNA Genes using Bayesian Classification of Interspecies Sequence Alignments
Elena Rivas, Sean R. Eddy, Washington University, MO
We propose a probabilistic model to computationally classify conserved regions from related genomes using an approach that combines structural with comparative information. It uses three distinct probabilistic models to calculate a Bayesian posterior probability that a sequence alignment should be tentatively classified as RNA, coding, or "other class." It is used to screen large databases of BLASTN alignments for novel noncoding RNA genes.
GA Sunday, Poster #44
ROUCHKA
Large-scale Genomic Sequence Composition Analysis
Eric C. Rouchka, David J. States, Washington University, MO
We attempt to collate a definitive set of nonredundant extended human genomic sequences by extending individual human entries in GenBank for the purpose of analyzing chromosomal genome organization at the sequence level. We present our results concerning isochore organization as well as studies into the mechanisms which promote isochore maintenance.
PS Monday, Poster #75
RUX
Structural Studies Probe the Evolution of Spherical Viruses
John J. Rux, Stacy D. Benson, Carmen San Martin, Roger M. Burnett, The Wistar Institute, PA, USA
Similarities in the molecular structures and the packing arrangement of the major coat proteins in human adenovirus and bacteriophage PRD1 suggest an evolutionary link between the viruses. Structural alignments and optimized multiple-sequence alignments of the adenovirus hexons have revealed conserved elements, which are being used to search for remote homologs.
SA Tuesday, Poster #90
SAGARA
A Method for Sequence Analysis using Multivariate Analysis
Jun-Ichi Sagara, Kentaro Shimizu, The University of Tokyo, Japan
We developed a computational sequence analysis method based on multivariate analysis. The method uses principal component analysis and multidimensional scaling to classify sequences into multiple groups of similar sequences, and also to extract characteristic bases that are conserved within a group but differ from other groups.
BN Sunday, Poster #11
SALVADOR
Condensed Representations of Synergistic Behavior in Biological Regulatory Networks
Armindo Salvador and Michael A. Savageau, The University of Michigan
A recently published formalism performs synergism analysis by quantifying deviations from linear/additive (synergistic) or from power-law/multiplicative (log-synergistic) behavior through second-order differential coefficients. We apply this formalism to a model of a metabolic network and show that it provides for an effective visualization of synergistic and log-synergistic interactions at the systemic level.
PT Monday, Poster #85
SARAI
Structure-specificity Relationship and Target Prediction of Transcription Factors
A. Sarai, S. Selvaraj, P. Prabakaran, RIKEN Life Science Center, Japan; H. Kono, University of Pennsylvania
We show that specificity in protein-DNA recognition can be quantified by statistical analysis of structural data on protein-DNA complexes. We derived empirical potential functions for the specific base-amino acid interactions, and examined the relationship between structure and specificity. We also discuss target prediction of transcription factors at the genome level.
GE Tuesday, Poster #27
SASIK
Clustering Analysis of Gene Expression Data Using Percolation
R. Sasik, T. Hwa, University of California, San Diego
We present a novel method for clustering gene expression data based on the percolation paradigm. In this method the result is cast in terms of the probability that a gene belongs to a certain cluster, allowing for the possibility that it participates in several clusters or in none at all.
DB Monday, Poster #35
SCHACHERER
The TRANSPATH Signal Transduction Database: a Knowledge Base on Signal Transduction Networks
F. Schacherer, GBF German Research Centre for Biotechnology, Germany; E. Wingender, GBF, Biobase Biological Databases GmbH, Germany
TRANSPATH (http://transpath.gbf.de) provides access to the growing amount of signal transduction data, mainly to pathways involved in mammalian transcription regulation. Entries are validated with references to original publications and linked to other databases. The knowledge base goes beyond the approach of traditional protein databases by storing the network of interactions.
ME Sunday, Poster #71
SCHIEKEL
Automatic Spot Detection on Biochip Arrays
C. Schiekel, T. Kaempke, Forschungsinstitut für anwendungsorientierte Wissensverarbeitung (FAW), Germany
A two-step automatic spot detection algorithm for biochip arrays is presented. First, spot locations are detected with a nonlinear filter followed by a matched filter. Second, a grid is estimated by a mean least square method. The only input variables to the method are physical sizes and proportions from the layouts of the biochips.
GA Sunday, Poster #45
SCHLEIERMACHER
Computation and Visualization of Degenerate Repeats in Complete Genomes
Chris Schleiermacher, University of Bielefeld, Germany
A systematic study of repetitive DNA on a genomic scale requires extensive algorithmic support. We have developed the Reputer family of programs that serve as a fundamental tool in such studies. Efficient and complete detection of various types of repeats is provided together with an evaluation of significance, interactive visualization, and simple interfacing.
SA Tuesday, Poster #91
SENGER
OpenBSA: Unbearable Lightness of Sequence Analysis
Martin Senger, Juha Muilu, Philip Lijnzaad, Alphonse Thanarag, Alan Robinson, EMBL-EBI, Hinxton, Cambridge, UK
In this poster we present freely available software components based on the Biomolecular Sequence Analysis (BSA) specification. BSA is, together with Genomic Maps, the first technology standardised and adopted by the Life Science Research Domain Task Force within the Object Management Group. BSA comprises modules of biological objects and analysis mechanisms.
SA Tuesday, Poster #92
SEVANT
Automatic Identification of Patterns for ProDom Families using Pratt
Florence Sevant, Daniel Kahn, Jérôme Gouzy, INRA/CNRS LBMRPM, France; Florence Corpet, INRA LGC, France; Stig Dommarsnes, Inge Jonassen, University of Bergen, Norway
ProDom is a database of automatically derived protein families and is integrated into the InterPro database. We describe an approach which allows fully automatic discovery of patterns to be used for characterising protein families and demonstrate its use on ProDom families. We describe the developed system and summarise the results.
GA Sunday, Poster #46
SEVERSON
In silico Cloning of Chromosome 7q35-q36 for an Accurate Physical Map, Refinement of the Region, and Identification of Candidate Genes for Mutation Analysis
T. M. Severson, K. Rust, H. Xu, L. Huynh, J. Lie, C. Bodell, T. S. Acott, M. K. Wirtz, Oregon Health Sciences University, USA
Primary open-angle glaucoma (POAG) in a large designated glaucoma family mapped to chromosome 7q35-q36. The region defined as GLC1F was refined and a physical map was constructed using computational tools, unfinished (phase 0-2) search databases and finished (phase 3) search databases at the National Center for Biotechnology Information (NCBI).
PSP Sunday, Poster #92
SHEPHERD
Automated Protein Structure Prediction Using Templates from the CATH Protein Family Database
Adrian Shepherd, Christine Orengo, University College London; Nigel Martin, Roger Johnson, Birkbeck College, London
Important recent developments in CATH include the addition of nearly 200,000 sequence relatives from GenBank, the generation of multiple structural alignments for around 400 CATH homologous families, and the creation of a Dictionary of Homologous Superfamilies containing structure/function information. To manage this new data, a database is currently being developed using Oracle8i.
PSP Sunday, Poster #93
SHI
FUGUE: A Fold Recognition Method Using Structural Environment-Specific Substitution Tables and Structure-Dependent Gap Penalties
Jiye Shi, Tom L. Blundell, Kenji Mizuguchi, University of Cambridge, UK
FUGUE is a program for recognizing distant homologues by sequence-structure comparison using structural environment-specific amino acid substitution tables and structure-dependent gap penalties. By combining structural environment information with amino acid sequence, FUGUE achieved better performance in both fold recognition and alignment accuracy, compared with some widely used fold recognition algorithms.
PF Monday, Poster #59
SHIEH
Classifying Protein 3-D Structures by Point Set Pattern Matching Algorithms
Grace S. Shieh, Ming-Jing Hwang, Academia Sinica, Taipei, Taiwan
A statistical measure is proposed to classify the 3-D structures of proteins. The measure captures the "similarity" between two protein sequences via an algorithm adapted from "Point set pattern matching (PSPM) in 3-dimension" (de Rezende and Lee, 1995) and "Faster PSPM in 3D" (Boxer, 1998).
DB Monday, Poster #36
SIEPEL
ISYS: A Software Platform to Enable the Integration of Heterogeneous Bioinformatics Resources
Adam Siepel, Andrew Tolopko, Andrew Farmer, Peter Steadman, Dawn Perry, Faye Schilkey, William Beavis, National Center for Genome Resources, Santa Fe, NM
Heterogeneity of databases and software resources continues to hamper the integration of biological information. We present a highly-flexible, bottom-up approach to this problem that uses a generic integration platform to enable the interoperation of diverse software components. Our solution is designed to make maximal use of existing resources.
PS Monday, Poster #76
SOBOLEV
Consensus Binding Site Structures at the Atomic Level: The Adenine Ring Moiety as a Test Example
Vladimir Sobolev, Yosef Kuttner, Alexander Raskind, Marvin Edelman, Weizmann Institute of Science, Israel
We have derived an algorithm to search for similar spatial arrangements of atoms around a given chemical moiety, using the adenine ring as a test example. Such a consensus structure might serve as a sufficient identifying motif to search for unknown, potential binding sites.
PSP Sunday, Poster #94
SONG
How Does It Fold? Searching for Folding Pathways Using A Motion Planning Approach
Guan Song, Nancy M. Amato, Texas A&M University
We present a framework for studying folding problems from a motion planning perspective. Our preliminary experimental results with traditional paper crafts and some small proteins are very encouraging. For the protein folding problems, we try to validate our folding pathways by comparing the order in which the secondary structures form on our pathways to known results from pulse labeling experiments.
ME Sunday, Poster #72
SONG
Contextual Biomedical Image Recognition: Application to White Blood Cell Differentiation
Xubo (Beth) Song, Oregon Graduate Institute of Science and Technology
We present a novel system for automatic recognition of biomedical images, including white blood cell images and microscopic urinalysis images. The system consists of three major steps: image processing and feature extraction, pre-classification based on artificial neural networks, and refined classification by incorporating contextual information.
PSP Sunday, Poster #95
ST-ARNAUD
Learning Sequence-structure Affinity Using Neural Networks and a Probabilistic Representation of Protein Folding Motifs
Daniel St-Arnaud, Francois Major, Université de Montréal, Québec, Canada
A novel measure of protein sequence-structure affinity is introduced. Based on Bayesian network theory, this measure can be learned from data using artificial neural networks and should prove useful for protein fold recognition by optimal sequence-structure alignment (threading).
EV Tuesday, Poster #3
STEFFAN
Tree Alignment for Zinc Finger Domains
Dagmar Steffen, Dusan Ihracky, Lothar Gierl, Universität Rostock, Germany
Based on the comparison of various domains, we developed a new similarity matrix. First, we collected all proteins containing zinc finger domains from databases. We then extracted the zinc finger domains of three classes from the available protein data. Using masks, the domains were divided into subgroups. The domains of each subgroup were then sorted according to the similarity requirement and arranged in a similarity tree.
ME Sunday, Poster #73
STEINFATH
Automated Image Analysis for Hybridisation Experiments
Matthias Steinfath, Wasco Wruck, Max-Planck-Institute for Molecular Genetics; Henrik Seidel, Schering AG; Hans Lehrach, Uwe Radelof, Max-Planck-Institute for Molecular Genetics; John O'Brien, Royal College of Surgeons in Ireland
An image analysis procedure for hybridisation experiments is presented. The procedure is fully automated, in contrast to semiautomated programs that require user interaction. Two different kinds of experimental data have been analysed: oligonucleotide fingerprinting data and gene expression data. We successfully applied our image analysis software to several hundred images.
GA Sunday, Poster #47
STORM
A Novel Method for Estimating Orthology Confidence
Christian Storm, Erik L. L. Sonnhammer, Karolinska Institutet, Sweden
A novel method is presented that analyzes bootstrap trees for orthologs. Instead of searching the optimal tree for branches that support orthology with high bootstrap support, we analyze each bootstrap tree and look for orthology assignments that occur frequently. The frequency in turn provides a confidence estimate.
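The frequency-based confidence estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `orthology_confidence`, the input layout (one list of ortholog pairs per bootstrap tree), and the toy gene names are hypothetical, and the step that infers ortholog pairs from each tree is outside the sketch.

```python
from collections import Counter


def orthology_confidence(bootstrap_assignments):
    """Fraction of bootstrap trees in which each gene pair is called orthologous.

    bootstrap_assignments: one entry per bootstrap tree; each entry is an
    iterable of (gene_a, gene_b) pairs inferred as orthologs in that tree.
    """
    if not bootstrap_assignments:
        return {}
    counts = Counter()
    for pairs in bootstrap_assignments:
        # Normalise pair order and count each pair at most once per tree.
        counts.update({tuple(sorted(pair)) for pair in pairs})
    n_trees = len(bootstrap_assignments)
    return {pair: count / n_trees for pair, count in counts.items()}


# Hypothetical toy input: ortholog calls from four bootstrap trees.
trees = [
    [("humA", "musA"), ("humB", "musB")],
    [("humA", "musA")],
    [("musA", "humA"), ("humB", "musB")],
    [("humA", "musB")],
]
confidence = orthology_confidence(trees)
```

A pair called in three of the four trees receives a confidence of 0.75, regardless of whether it appears in any single optimal tree.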
GE Tuesday, Poster #28
SU
A Statistical Model for Locating Regulatory Regions Which Confer Erythroid-specific Gene Expression
Xiaoping Su, Sylvan Wallenstein, David Bishop, Mount Sinai School of Medicine, NY
We propose an approach that, on the basis of statistically significant clustering of putative transcription factor binding sites, predicts erythroid-specific regulatory regions within a genomic sequence. We utilize putative binding sites for transcription factors already shown to be involved in erythroid-specific control and employ a robust non-overlapping r-scan statistic as an inferential means to distinguish any significant clusters from those occurring by chance.
GE Tuesday, Poster #29
SUTPHIN
Supervised Learning of Microarray Expression Profiles: Analysis of an Acute Leukemia Data Set as an Example
Patrick D. Sutphin, Soumya Raychaudhuri, Russ B. Altman, Stanford University, CA
We used a publicly available supervised learning tool developed in our lab to analyze leukemia data. Clustering methodologies failed to distinguish between the subtypes of leukemia; acute myeloid and acute lymphoid leukemia do not cluster separately. Our supervised classification strategy was effective in differentiating between the two subtypes.
PSP Sunday, Poster #96
SWAIN
Constraint Logic Programming and the Protein Side-chain Placement Problem
Martin T. Swain, Graham J. L. Kemp, University of Aberdeen, Kings College, Scotland
Residue positions may be formulated in finite domain CLP as variables, with side-chain rotamers as sets of possible values. Constraints are used to eliminate steric overlaps greater than a chosen threshold. Preliminary results indicate the accuracy of CLP is comparable to other side-chain placement methods.
GA Sunday, Poster #48
TAKAHASHI
Extraction of Microscopic and Macroscopic Structural Features from the DNA Sequence of Human Chromosome 22
Hironobu Takahashi, Yasuhide Mori, Oka Ryuichi, Real World Computing Partnership, Japan
The proposed method extends the so-called Galaxy Clustering Method, which was developed for natural language processing, and is applied to sequence categorization and multiple alignment problems without any segmentation process. Experimental results on the whole sequence of Human Chromosome 22 are shown.
GA Sunday, Poster #49
TATUSOVA
Analysis of Protein Coding Genes in Complete Microbial Genomes
Tatiana Tatusova, Sergei Resenchuk, Ilene Karsch-Mizrachi, James Ostell, National Institutes of Health, MD, USA
A new approach combines protein similarity search with phylogenetic classification. The analysis at the single gene level includes the comparison of amino-acid sequences of proteins encoded by complete genomes to proteins in current databases. Neighbor relationships to proteins with known 3-dimensional structures are detected and linked to the Cn3D viewer.
PS Monday, Poster #77
TAVERNA
Optimized and Evolved Protein Residue Interactions
D. M. Taverna, R. A. Goldstein, University of Michigan
We compare these optimized interactions of lattice model proteins to average contact energies derived from a population of evolving proteins. We find that highly designable structures maintain a higher average stability. This indicates highly designable structures will be very flexible towards adopting novel contact interactions for a variety of functions.
SA Tuesday, Poster #93
THOMPSON
DbClustal: Rapid and Reliable Global Multiple Alignments of Protein Sequences Detected by Database Searches
J. D. Thompson, F. Plewniak, J-C. Thierry, O. Poch, Institut de Génétique et de Biologie Moléculaire et Cellulaire, France
DbClustal addresses the important problem of the automatic multiple alignment of the top-scoring full-length sequences detected by a database homology search. By combining the advantages of both local and global alignment algorithms into a single system, DbClustal is able to provide accurate global alignments of highly divergent, complex sequence sets. http://igbmc.u-strasbg.fr:8080/ballast.html
GA Sunday, Poster #50
THOMPSON
Functionating the Proteome of Aquifex aeolicus
M. J. Thompson, M. Pellegrini, Protein Pathways, Inc.; E. M. Marcotte, T. O. Yeates, D. Eisenberg, Protein Pathways, Inc., University of California Los Angeles, USA
Functionating the proteomes of sequenced organisms is a great challenge in current computational biology. In contrast to homology-based methods, recent techniques infer cellular functions for proteins based on contextual and evolutionary information from whole proteomes. Here, we examine and illustrate the performance of these methods in functionating the proteome of Aquifex aeolicus.
SA Tuesday, Poster #94
TOMPA
A Statistical Method for Finding Transcription Factor Binding Sites
Martin Tompa, Saurabh Sinha, University of Washington, Seattle, WA
We present an enumerative statistical method for identifying good candidates for transcription factor binding sites. Unlike local search techniques that may not produce a global optimum, our method is guaranteed to produce the motifs with greatest z-scores. We also present the results of experiments in which our algorithm was used to locate candidate binding sites.
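The idea of enumerating every candidate word and ranking by z-score can be illustrated with a deliberately simplified Python sketch. The abstract does not state the background model or the variance formula, so the choices here are assumptions: bases are treated as independent with frequencies estimated from the input, and a binomial variance is used that ignores overlapping occurrences. The names `kmer_zscores` and `top_motifs` and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_zscores(seqs, k):
    """Z-score of the observed count of every DNA k-mer in seqs.

    Assumed background (not from the abstract): independent bases with
    frequencies estimated from the input; binomial variance, no overlap
    correction. Sequences are expected to contain only A, C, G, T.
    """
    text = "".join(seqs)
    freq = {base: text.count(base) / len(text) for base in "ACGT"}
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    scores = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        prob = 1.0
        for base in word:
            prob *= freq[base]
        # Count occurrences (overlapping ones included) in all sequences.
        observed = sum(
            1 for s in seqs for i in range(len(s) - k + 1) if s[i:i + k] == word
        )
        expected = positions * prob
        variance = positions * prob * (1.0 - prob)
        scores[word] = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
    return scores


def top_motifs(seqs, k, n=5):
    """The n k-mers with the greatest z-scores, found by full enumeration."""
    scores = kmer_zscores(seqs, k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical toy sequences sharing the planted word TTGACA.
upstream = ["ACGTTGACAGCA", "TTGACATCGGAC", "GGATTGACACTA"]
best_word, best_score = top_motifs(upstream, 6, n=1)[0]
```

Because every k-mer is scored, the top-ranked word is the global maximum under the chosen statistic, which is the property that distinguishes enumeration from local search. The cost is 4^k words, so this form is practical only for short motifs.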
EV Tuesday, Poster #4
TORRENS
Assessing Molecular Diversity from Information Theory
Francisco Torrens, Universitat de València, Spain
Lin presented a novel approach for assessing molecular diversity based on Shannon information theory. This method is characterized by a strong tendency to oversample remote areas of the feature space and produce unbalanced designs. This poster demonstrates the limitation with some simple examples.
SA Tuesday, Poster #95
TROUKHAN
Multifaceted Approach to Gene Prediction
M. Troukhan, S. Brover, N. Alexandrov, Ceres, Inc. CA
A gene recognition system is described that combines results from several gene prediction programs, several splice site predictors, and BLAST similarity searches into one HMM schema.
ME Sunday, Poster #74
TROYANSKAYA
Missing Value Estimation Methods for DNA Microarrays
Olga Troyanskaya, Michael Cantor, Orly Alter, Gavin Sherlock, Pat Brown, David Botstein, Robert Tibshirani, Trevor Hastie, Russ Altman, Stanford University
We present a comparative study of several missing value estimation methods in gene microarray data. A Singular Value Decomposition based method, weighted K-nearest neighbors, and row averaging were evaluated. We report results of the comparative experiments, providing recommendations for accurate estimation of missing microarray data under multiple sets of conditions.
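Of the methods compared, weighted K-nearest neighbors is the easiest to illustrate. The following pure-Python sketch assumes details the abstract does not give: missing values are encoded as `None`, distance is root-mean-square difference over the columns two genes share, and neighbours are weighted by inverse distance. The function name `knn_impute` and the parameter `k` are hypothetical.

```python
from math import sqrt


def knn_impute(matrix, k=2):
    """Return a copy of matrix (rows = genes) with None entries estimated.

    Each missing value is a weighted average of the same column in the k
    most similar rows that have that column observed.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # no usable neighbour: leave the gap as None
            # Inverse-distance weights; the small constant avoids division by zero.
            weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
            total = sum(w * v for w, (_, v) in zip(weights, nearest))
            filled[i][j] = total / sum(weights)
    return filled


# Hypothetical expression matrix with one missing measurement.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [5.0, 6.0, 7.0],
]
completed = knn_impute(expression, k=2)
```

Row averaging, by contrast, would fill the gap with the mean of the gene's own observed values and ignore similar genes entirely.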
SA Tuesday, Poster #96
ULLAH
REPRO: Finding Repeating Units in Protein Sequences
Jakir H. Ullah, R. A. George, Jaap Heringa, National Institute for Medical Research, London, UK
REPRO is an algorithm for recognition of distant repeats within protein sequences. We describe a new WWW server for the REPRO method with novel features to help interpret the results. New strategies have also been implemented in the basic REPRO code to gain a 25-fold speed increase. We also describe the use of REPRO in genome analysis and the development of a protein repeats database.
DB Monday, Poster #37
VAN BAKEL
The Development of GIDS: A Relational Database for High-Throughput Genotyping
H. H. M. J. van Bakel, P. L. Pearson, C. Wijmenga, University Medical Center Utrecht
The Genome Information Database System (GIDS) is being developed to store the large amount of genotypical and phenotypical data needed for the study of complex genetic diseases. Besides serving as a central data storage facility, GIDS will also play an active role in regulating the data flow in the laboratory.
GE Tuesday, Poster #30
VILO
Expression Profiler: An Integrated Tool for Gene Expression and Sequence Analysis
Jaak Vilo, Alvis Brazma, Alan Robinson, Wellcome Trust Genome Campus, UK
Expression Profiler (http://ep.ebi.ac.uk/) is a collection of web-based tools for the analysis of gene expression data from DNA microarray projects. Its four components perform clustering and visualization of expression data (EPCLUST), extraction of genome-specific information and sequences (GENOMES), submission of clustering results to other tools (URLMAP), and sequence pattern discovery (SPEXS).
BN Sunday, Poster #12
VOIT
Estimation of Metabolic Parameters from Genomic Data
Eberhard O. Voit, Medical University of South Carolina, USA
We have recently demonstrated that microarray data, quantifying gene expression profiles after a stimulus, can be analyzed and explained with metabolic pathway models designed within Biochemical Systems Theory (BST). The poster addresses the opposite direction: values of kinetic parameters can be deduced from, or at least constrained by, genomic information.
PSP Sunday, Poster #97
WALKER
Using ASTRAL for Protein Structure and Sequence Analysis
Nigel Walker, University of California, Berkeley; Patrice Koehl, Michael Levitt, Stanford University; Steven E. Brenner, UC Berkeley, USA
ASTRAL is a compendium of tools and databases for the study of protein structure through sequence analysis. One component of ASTRAL is a sequence database of the structural domains defined by SCOP. By combining sequence information and crystallographic data, we provide low redundancy sequence subsets, useful for homology based structure prediction.
PF Monday, Poster #60
WANG
Computational Study on the Basis of Drug Resistance of the HIV-1 Protease
Wei Wang, Peter A. Kollman, University of California, San Francisco, USA
We have studied several drug-resistant mutants of the HIV-1 protease using molecular dynamics and a continuum solvent model. Several crucial residues whose mutation causes resistance have been identified. The computational study has suggested several strategies for designing new inhibitors to combat drug resistance.
PF Monday, Poster #61
WAUGH
Large-scale Annotation of Functional and Structural Protein Sites
Allison Waugh, Glenn Williams, Russ B. Altman, Stanford University
The FEATURE system automates annotation of a query three-dimensional protein structure, predicting locations of functional and structural sites of interest. We are employing Legion, the University of Virginia's 'Worldwide Virtual Computer', to scan the Protein Data Bank for sites in FEATURE's site library, such as calcium binding, chloride binding, and redoxin active sites.
ME Sunday, Poster #75
WIERLING
Simulation and Characterization of Hybridization Signals in Oligonucleotide Fingerprinting Using Linear Models
Christoph K. Wierling, Michael Janitz, Ralf Herwig, Max-Planck Institut für Molekulare Genetik, Germany; Stefan Haas, Theoretische Bioinformatik, Heidelberg; Hans Lehrach, Uwe Radelof, Max-Planck Institut für Molekulare Genetik
We present a simulation tool for oligonucleotide fingerprinting, written in the object-oriented programming language Python. Computation of hybridization signals is based on a linear model with consideration of single mismatches. Hybridization characteristics of oligonucleotides used in our experiments with human I.M.A.G.E. clones were derived. Reliability was tested by corresponding simulations.
PSP Sunday, Poster #98
WILD
A Bayesian Network Model for Protein Fold and Remote Homologue Recognition
D. L. Wild, A. Raval, Keck Graduate Institute of Applied Life Sciences, CA; Z. Ghahramani, University College London, UK
We describe a Bayesian network which learns primary, secondary structure and residue accessibility for proteins of known three-dimensional structure. In cross validation tests using the SCOP database the Bayesian network performs better in classifying proteins of known structural superfamily than a hidden Markov model trained on amino acid sequences.
GE Tuesday, Poster #31
WOOLF
A Fuzzy Logic Approach to Analyzing Gene Expression Data
Peter J. Woolf, Parke-Davis Pharmaceutical Research, Michigan; Yixin Wang, University of Michigan, Ann Arbor
We developed a fuzzy logic algorithm that transforms gene expression values into qualitative descriptors, which are then evaluated using heuristic rules. A model to find triplets of activators, repressors, and targets in a yeast data set was tested. This extends techniques such as clustering in that it generates connected networks of genes.
DB Monday, Poster #38
WU
PIR-Class: An Object-relational Protein Classification Database for Sequence Annotation and Genome Research
Cathy H. Wu, Chunlin Xiao, Zhenglin Hou, Winona C. Barker, Georgetown University Medical Center, Washington, DC
PIR-Class database is designed to provide an integrated platform for describing comprehensive family relationships and structural and functional features of proteins, with summary information of superfamilies/families, domains, and motifs, and rich links to various databases. The database is implemented in Oracle, searchable from http://pir.georgetown.edu/pirclass.html, and can support genomic research.
DB Monday, Poster #39
XENARIOS
Database of Interacting Proteins: A Benchmarking Tool for Protein-Protein Interactions Prediction
Ioannis Xenarios, Edward M. Marcotte, Michael Thompson, Xiaquon Joyce Duan, Lukasz Salwinsky, David Eisenberg, University of California, Los Angeles
The Database of Interacting Proteins (DIP) is a database that contains experimentally determined protein-protein interactions. This database has two main goals: (i) to provide an integrative database for browsing and efficiently extracting information about proteins of interest, and (ii) to serve as a useful tool for benchmarking protein-protein interaction prediction methods.
SA Tuesday, Poster #97
XU
DomainParser: A New Program for Protein Domain
Ying Xu, Dong Xu, Oak Ridge National Laboratory, TN; Harold N. Gabow, University of Colorado, Boulder, CO
We present a new algorithm for decomposing proteins into structural domains, based on a network flow approach that uses the Ford-Fulkerson algorithm. The algorithm has been implemented as a computer program and a Web server, called DomainParser. Its performance compares favorably to existing programs. See http://compbio.ornl.gov/structure/domainparser/ for details.
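The network flow idea can be illustrated with a minimum cut on a small residue contact graph. The sketch below uses the breadth-first (Edmonds-Karp) variant of Ford-Fulkerson. How DomainParser actually builds its graph, assigns capacities, and chooses source and sink is not stated in the abstract, so the graph, the residue names, and the functions `min_cut` and `undirected` are hypothetical.

```python
from collections import deque


def min_cut(capacity, source, sink):
    """Return (max_flow, source_side) for a capacity dict {u: {v: cap}}."""
    # Residual graph: forward capacities plus zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, neighbours in capacity.items():
        for v, cap in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    def reachable():
        """Breadth-first search over edges with spare capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            # Nodes still reachable from the source form one side of a minimum cut.
            return flow, set(parent)
        # Find the bottleneck along the augmenting path, then push flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def undirected(edges):
    """Build a symmetric capacity dict from (u, v, weight) contact edges."""
    capacity = {}
    for u, v, weight in edges:
        capacity.setdefault(u, {})[v] = weight
        capacity.setdefault(v, {})[u] = weight
    return capacity


# Hypothetical contact graph: two tightly packed groups joined by one weak contact.
contacts = undirected([
    ("r1", "r2", 5), ("r2", "r3", 5), ("r1", "r3", 5),
    ("r4", "r5", 5), ("r5", "r6", 5), ("r4", "r6", 5),
    ("r3", "r4", 1),
])
cut_value, domain_a = min_cut(contacts, "r1", "r6")
```

In this toy graph the cheapest cut severs only the weak r3-r4 contact, so the two packed groups fall on opposite sides, which is the intuition behind treating domain boundaries as low-capacity cuts.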
SA Tuesday, Poster #98
YADA
A Hidden Markov Model Integrating Gene Finding Programs Discovers Gene Structure with High Accuracy
Tetsushi Yada, Yasushi Totoki, RIKEN Genomic Sciences Center; Yoshio Takaeda, Mitsubishi Research Institute, Inc.; Yoshiyuki Sakaki, RIKEN Genomic Sciences Center, The University of Tokyo; Toshihisa Takagi, The University of Tokyo
We developed an HMM which discovers genes by integrating several existing gene-finding programs. Though each program predicts exons in a self-serving manner in terms of reading frame and scoring method, our HMM predicts frame consistent genes by considering all exon scores. Preliminary experiments revealed that the HMM significantly improved the prediction accuracies at gene/exon levels.
PSP Sunday, Poster #99
YAN
GeneAtlas™ - An Automatic High-Throughput Pipeline for Structure Prediction and Function Assignment for Genomic Sequences
Lisa Yan, David Kitson, Zhan-Yang Zhu, Azat Badredinov, Krzysztof Olszewski, David Edward, Molecular Simulations, Inc., CA
GeneAtlas is an automated, high-throughput pipeline for the prediction of protein structure and function using sequence similarity detection, homology modeling, and fold recognition methods. Using a subset of structures from the SCOP database, we demonstrate that GeneAtlas detects additional functional relationships in comparison with the widely used sequence searching method, PSI-BLAST.
DB Monday, Poster #40
YAO
Using a Formal Language to Define Biological Semantics: a Case Study
Guang Yao, University of Minnesota, Minneapolis, MN; Lynda B. M. Ellis, Toni Kazic, Washington University, MO
Glossa is a language which defines the semantics of biological ideas as executable code. We have used the University of Minnesota Biocatalysis/Biodegradation Database as a testbed for developing both Glossa and the underlying machinery for distributed computations (The Agora) among independent databases.
DB Monday, Poster #41
YARFITZ
Bioinformatics Needs Analysis by Data Mining
Stuart Yarfitz, Eugene Wan, Joanne West, University of Washington, WA
We have been studying researchers' information-seeking behavior and needs by analyzing bioinformatics services program records. Relational databases are used to organize and query data from Web logs, consultants' email folders, and software registration and log files. Facet analysis of consultation encounters is being used for ontology and thesaurus development.
PS Monday, Poster #78
YE
Defining Functional Residues in Furin: Structure-based Evolutionary Analysis
Yezhen Ye, Dafu Ding, Chinese Academy of Sciences (Academia Sinica), Shanghai, China
To explain the inhibition of furin by eglin-C mutants rationally and to guide further inhibitor design, we studied the key residues in furin by sequence-structure analysis. "Weighted evolutionary trace" and "correlated mutation analysis" methods were applied to detect specific residues. Those residues were then mapped onto the furin 3D structure constructed with the in-house program "Pmodeling" for further exploration.
GA Sunday, Poster #51
YEH
High-throughput SNP Mining of the Human Transcriptome
Raymond T. Yeh, Gabor T. Marth, Ian Korf, Warren R. Gish, Washington University School of Medicine, MO
We have developed a pipeline which clusters ESTs from dbEST to finished and draft quality human genomic sequence. We then mine these clusters for single-nucleotide polymorphisms using the POLYBAYES SNP discovery tool. Accurate clustering of ESTs allows reliable SNP detection free of false positives due to the existence of paralogs.
GE Tuesday, Poster #32
YEUNG
Validating Clustering for Gene Expression Data
Ka Yee Yeung, David R. Haynor, and Walter L. Ruzzo, University of Washington
Many clustering algorithms have been proposed to analyze gene expression data. We provide a systematic and quantitative framework to assess clustering results. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition, and to use the remaining condition to assess clustering results.
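The leave-one-condition-out procedure can be sketched as follows. The abstract does not say how the held-out condition is scored, so the score used here (root-mean-square deviation of each gene from its cluster mean in the held-out column, lower being better) is an assumption, as are the names `left_out_scatter`, `aggregate_scatter`, and the placeholder `sign_clusterer`. Any clustering function that maps a list of rows to a list of labels can be plugged in.

```python
from math import sqrt


def left_out_scatter(data, cluster_fn, left_out):
    """Cluster genes without column left_out, then score that column.

    data: list of rows (genes), each a list of expression values.
    cluster_fn: callable taking rows and returning one label per row.
    """
    reduced = [row[:left_out] + row[left_out + 1:] for row in data]
    labels = cluster_fn(reduced)
    groups = {}
    for label, row in zip(labels, data):
        groups.setdefault(label, []).append(row[left_out])
    squared = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        squared += sum((v - mean) ** 2 for v in values)
    return sqrt(squared / len(data))


def aggregate_scatter(data, cluster_fn):
    """Sum the held-out score over every condition left out in turn."""
    n_conditions = len(data[0])
    return sum(left_out_scatter(data, cluster_fn, e) for e in range(n_conditions))


def sign_clusterer(rows):
    """Placeholder clusterer: split genes by the sign of their mean value."""
    return [0 if sum(r) / len(r) >= 0 else 1 for r in rows]


# Hypothetical data: two genes induced and two repressed in all conditions.
profiles = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [-1.0, -1.0, -1.0],
    [-1.0, -1.0, -1.0],
]
good = aggregate_scatter(profiles, sign_clusterer)
poor = aggregate_scatter(profiles, lambda rows: [0] * len(rows))
```

A clustering that captures real structure predicts the held-out condition well and scores low, while lumping all genes into one cluster scores higher on the same data.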
EV Tuesday, Poster #5
YU
Statistical Significance of Probabilistic Hybrid Alignment
Yi-Kuo Yu, Florida Atlantic University; Ralf Bundschu, Terence Hwa, University of California, San Diego, USA
We propose a sequence alignment algorithm which is a hybrid between the Smith-Waterman and HMM-based methods. In contrast to all existing algorithms, the statistics of the hybrid alignment can be accurately characterized without doing massive simulation, even for position-dependent substitution and gap costs, without losing sensitivity or specificity.
BN Sunday, Poster #13
YUGI
Integrative Modeling of Mitochondrial Metabolism Using E-Cell System
Katsuyuki Yugi, Masaru Tomita, Keio University, Japan
We are in the process of constructing a kinetic model of mitochondrial metabolism using E-CELL System, a generic simulation environment for cellular simulation based on multiple ordinary differential equations. Our eventual goal is to apply the model to pathological analyses of mitochondrial diseases.
GE Tuesday, Poster #33
ZHANG
Discovery: A Tool for Class Prediction Using Gene Expression Data
Louxin Zhang, Zhang Zhuo, Song Zhu, Kent Ridge Digital Labs and Bioinformatics Centre, Singapore
The paper introduces a web tool for the analysis of gene expression data and for class prediction, which is roughly like diagnosis: given a set of known classes, determine the correct class for a new sample. The method has great potential for improving cancer classification and diagnosis using gene expression data.
PT Monday, Poster #86
ZHAO
A Novel System for the Identification of Potential Target Epitopes and Molecular Mimics for T Cell Receptor
Yingdong Zhao, National Institutes of Health, Bethesda, MD; Bruno Gran, National Institute of Neurological Disorder, MD; Clemencia Pinilla, Torrey Pines Institute for Molecular Studies, CA; Bernhard Hemmer, Silva Markovic-Plese, Roland Martin, National Institute of Neurological Disorder, MD; Richard Simon, National Institutes of Health
Combining mathematical models with combinatorial peptide library data, we developed a novel approach for identifying T cell epitopes and their molecular mimics through extensive database searches and neural networks. The approach has proved efficient and accurate when tested on several known/unknown CD4+ and CD8+ systems.
DB Monday, Poster #42
ZHOU
GeneX: A Generic Relational Schema for the Storage and Exchange of Gene Expression Data Using Relational Database and XML
Jiaye Zhou, Guanghong Chen, Greg Colello, Andrew Farmer, Harry Mangalam, National Center for Genome Resources, NM; Jason Stewart, Avestha Engraine Technologies, India; Mark Waugh, Jennifer Weller, National Center for Genome Resources
We present the GeneX data storage schema, the XML data exchange format as well as the GeneX software tools implemented at the National Center for Genome Resources. Design and implementation details of the storage and analytical systems and the source code of the software tools can be found at www.ncgr.org/research/genex.
DB Monday, Poster #43
ZIMMER
Tools for Analysis of Groups of E. coli Genes
D. P. Zimmer, S. Kustu, University of California, Berkeley, CA.
We have developed a database and tools for analysis of groups of E. coli genes (http://coli.berkeley.edu/ecoli). The tools are designed to facilitate analysis of global expression experiments. The database is implemented in MySQL and programs are written in Perl on Linux/Intel.
EV Tuesday, Poster #6
ZMASEK
Automated Sequence Function Prediction Based on Phylogenetic Inference
Christian M. Zmasek and Sean R. Eddy, Washington University School of Medicine, MO
A method using phylogenetic inference for automated and robust sequence function prediction is described. At the core stands the automated comparison of gene trees with species trees, allowing detection of paralogs. A time efficient algorithm to accomplish this is discussed. In addition, a Java framework for working with phylogenetic trees is shown.
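Comparing a gene tree with a species tree to flag duplications can be sketched with the standard least-common-ancestor mapping: each gene tree node is mapped to the lowest species tree node that covers all species below it, and a node is treated as a duplication when it maps to the same species node as one of its children. This is a textbook formulation and may differ from the time-efficient algorithm the poster discusses. Trees are written as nested tuples, gene trees are assumed binary, and all function and gene names are hypothetical.

```python
def species_paths(tree, path=(), out=None):
    """Map each species leaf to its root-to-leaf path of child indices."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = path
    else:
        for index, child in enumerate(tree):
            species_paths(child, path + (index,), out)
    return out


def common_prefix(a, b):
    """Path of the least common ancestor of two species-tree nodes."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def reconcile(gene_tree, gene_to_species, paths, paralogs):
    """Return (mapped species node, gene leaves); record paralog pairs."""
    if isinstance(gene_tree, str):
        return paths[gene_to_species[gene_tree]], [gene_tree]
    left, right = gene_tree
    map_l, leaves_l = reconcile(left, gene_to_species, paths, paralogs)
    map_r, leaves_r = reconcile(right, gene_to_species, paths, paralogs)
    mapped = common_prefix(map_l, map_r)
    if mapped == map_l or mapped == map_r:
        # Duplication node: genes on opposite sides are paralogs.
        for a in leaves_l:
            for b in leaves_r:
                paralogs.add(frozenset((a, b)))
    return mapped, leaves_l + leaves_r


# Hypothetical example: a duplication before the human/mouse split.
species_tree = (("human", "mouse"), "fly")
gene_tree = ((("hA", "mA"), ("hB", "mB")), "fA")
gene_to_species = {"hA": "human", "hB": "human",
                   "mA": "mouse", "mB": "mouse", "fA": "fly"}
paralog_pairs = set()
reconcile(gene_tree, gene_to_species, species_paths(species_tree), paralog_pairs)
```

In this example hA and mA are separated by a speciation node and are not reported, while hA and hB, as well as hA and mB, are separated by the duplication and are reported as paralogs. Function transfer would therefore be considered safer between hA and mA than between hA and mB.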
DB Monday, Poster #44
ZUO
An Intelligent Database System for Analysis of Signal Transduction Pathways
Zhuang Zuo, Gary Pestano, Ramprasad Ramakrishna, Kam-Chuen Jim, Physiome Sciences, Inc., NJ
A fully interactive, Web-based intelligent database system, part of the In Silico Cell™, was developed for modeling signal transduction pathways. A T cell model built within this system successfully mimicked in vivo changes of cytokine secretion when stimulated. This system offers enhanced functions including virtual knockouts and over-expression analyses.