Monday Poster Presentations: Sequence Analysis – Gene Finding, Alignment Techniques

43 - Investigations of Modularity in A-kinase Anchoring Proteins
Sanna Herrgard, Per Jambeck, Susan Taylor and Shankar Subramaniam, University of California, San Diego
We have analyzed the domain architecture of several proteins belonging to the functionally defined family of A-kinase anchoring proteins (AKAPs). Our domain analysis reveals that AKAPs typically consist of multiple domains, many of which have not been identified earlier. On the basis of one heretofore-unidentified domain, we predict that the AKAPs are implicated in a novel form of cell signaling crosstalk.

44 - Sub-clustering ESTs Using Sequence Conflicts
Bartlett G. Ailey, Byungkook Lee, Ira Pastan, National Cancer Institute, National Institutes of Health
We present an algorithm that clusters ESTs in order to find new tissue-specific genes in the EST database. The ESTs from a tissue are aligned to the EST database. The alignments are then used to cluster the ESTs in a predetermined order to optimize the quality of the cluster.

45 - Tuples, Tables, and Trees: A New Approach to the Discovery of Patterns in Biological Sequences
D. R. Argentar, K. M. Bloch, H. A. Holyst, A. R. Moser, W. T. Rogers, D. J. Underwood, A. G. Vaidyanathan, J. VanStekelenborg, DuPont Company
We describe a new pattern discovery algorithm based upon a novel set of data structures ("k-tuples," and associated tuple tables) combined with a new tree-traversal method, designed to efficiently discover all patterns at all levels of support. Results of applying this algorithm to the analysis of GPCRs will be shown.

46 - GeneDecoder--A Gene Finding System with Multi-stream HMMs
Kiyoshi Asai, Yutaka Ueno, Katunobu Itou, Electrotechnical Laboratories; Tetsushi Yada, Genomic Science Center, RIKEN
GeneDecoder, a gene-finding system for eukaryotes, uses multi-stream hidden Markov models that integrate the sequence with various types of pre-processed information. This drastically reduces the complexity of the models and enables flexible model design through stream weights that depend on each component model.

47 - Implementation of the Conserved Exon Method for Gene Finding
Vineet Bafna, Daniel H. Huson, Celera Genomics Corporation
The "Conserved Exon Method" is a new approach to gene prediction, based on the idea of looking for conserved protein sequences by comparing pairs of DNA sequences. It simultaneously predicts gene structures in both sequences of a pair, e.g., in human and mouse genomic sequences. We demonstrate the program "CEMexplorer" that implements this method.

48 - New Developments to SCANPS: High Performance Parallel Iterated Protein Sequence Searching with Full Dynamic Programming and on-the-fly Statistics
Geoffrey J. Barton, Caleb Webber, Stephen M. J. Searle, EMBL-European Bioinformatics Institute (EBI)
SCANPS performs sequence database searches using full dynamic programming. Enhancements in three areas are described: 1) On-the-fly statistics based on the score distribution found with each search; 2) Iterative searching using profiles created from the results of each iteration; and 3) Parallel processing implementations using OpenMP and MPI libraries and Intel MMX instructions.

49 - New Distributed System for Large-scale Sequence Analyses
Douglass Blair, Gabriel Robins, University of Virginia
We implemented a new distributed system for performing large-scale sequence library comparisons. Our techniques are ideal for widely distributed computing architectures where many typical computers with modest interconnect bandwidth can be utilized in unison, and our system scales to an arbitrarily large number of processing nodes.

50 - WWW Tools for Detecting SNPs and Alternative Splice Forms in ESTs
David Brett, Gerrit Lehmann, Jens Hanke, Stepfan Gross, Jens Reich, Max-Delbruck Center for Molecular Medicine, Germany; Peer Bork, EMBL
We present two WWW tools that allow an end user to search for novel alternative splice forms or SNPs in a query protein or mRNA sequence. Candidate alternative splice forms or SNPs are detected by alignment with ESTs. The tools filter for paralogues, pseudogenes and sequencing errors. http://mahe.bioinf.mdcberlin.de/home.html

51 - A New Method in Rapid Significance Assessment of Smith-Waterman Alignments
Ralf Bundschuh, University of California, San Diego
To assess the significance of sequence alignments, the score distribution of random alignments has to be known. For gapped alignment, only the shape of this distribution is known; its parameters must be determined by time-consuming computations for every scoring system. We present an importance sampling technique that estimates these parameters to within 0.5% in a matter of minutes.

52 - The Topology of Recombination
Isabel K. Darcy, Stephen D. Levene, Kenneth Huffman, University of Texas at Dallas
Recombination results in the deletion/insertion or inversion of DNA sequences. When acting on circular DNA, many recombinases produce a spectrum of topologically knotted/catenated products. By solving mathematical equations determined by the topology of these products, much information about the recombinase mechanism may be gained.

53 - New Models for Likelihood Analysis of Protein Sequences
M. W. Dimmic, J. S. Rest, D. P. Mindell, R. A. Goldstein, University of Michigan
We present several new maximum likelihood (ML) models for phylogenetic analysis that differ in the manner in which they accept new mutations. A Bayesian formalism is extended to account for lack of data and/or parameters. These models can potentially yield structural or functional information about the proteins of interest as well as information on the population level.

54 - Searching for Coding Regions in Neurospora crassa Using a Simple Codon Bias Algorithm and Consensus Sequences
Judith Galbraith, University of New Mexico, Albuquerque High Performance Computing Center; Don Natvig, Mary Anne Nelson, Laura Salter, University of New Mexico
To locate coding regions in sequences with no similarity to known genes, characteristics distinctive to Neurospora are examined. The exaggerated difference between the counts of cytosine and adenosine residues in the third position of codons is measured using a log ratio. Consensus sequences are used to create a table of potential exons with P-values. The results are available using a Web interface.
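
The third-position log ratio mentioned in this abstract can be illustrated with a minimal Python sketch. The function name, the pseudocount, and the use of the natural logarithm are assumptions made for illustration; the abstract does not give the exact formula used on the poster.

```python
import math

def third_position_log_ratio(seq, frame=0, pseudocount=1.0):
    """Log ratio of C to A counts at codon third positions.

    Hypothetical helper: the pseudocount and the natural log are
    illustrative choices, not taken from the poster.
    """
    seq = seq.upper()
    # Third codon positions in the given frame: frame+2, frame+5, ...
    thirds = seq[frame + 2::3]
    c_count = thirds.count("C")
    a_count = thirds.count("A")
    # The pseudocount avoids division by zero and log(0) on short windows.
    return math.log((c_count + pseudocount) / (a_count + pseudocount))
```

A positive value indicates more C than A in the third position of the chosen frame; comparing the three frames of a window is one plausible way to use such a score.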

55 - Identifying Genes upon Which Positive Selection May Operate: A Promising Means of Identifying Novel Virulence Genes in Bacterial Pathogens
Junaid Gamieldien, Winston Hide, South African National Bioinformatics Institute
We have performed an intraspecies search for genes on which positive selection may operate between pairs of strains of Helicobacter pylori, Chlamydia trachomatis, and Neisseria meningitidis, to ascertain whether new virulence genes may be identified in this way. Thirty-four previously described virulence factors are demonstrated to be under positive selection.

56 - Computational Characterization of mRNA Localization Control Sequences in 3'-untranslated Sequences
Joel H. Graber, Charles R. Cantor, Martin Frith, Jahnavi C. Prasad, James O. Deshler, Boston University
We are searching for control sequences responsible for the subcellular localization of mRNA transcripts, specifically mRNAs such as Vg1 in X. laevis, which localizes to the vegetal half of the developing oocyte through protein interactions with a series of short repeated sequence elements in its 3'UTR.

57 - Computational Characterization of mRNA 3'-end-processing Control Sequences
Joel H. Graber, Charles R. Cantor, Scott C. Mohr, Temple F. Smith, Boston University
We have computationally investigated 3'-end-processing (cleavage and polyadenylation) control sequences through analysis of EST sequences from several different organisms. The control sequences consist of multiple, short elements, where the individual elements can vary widely from a consensus sequence and yet remain functional as part of the whole.

58 - A Method for Modeling Promoter Structures in a Non-heuristic Manner Using a Modified Self-Organizing Map Algorithm
Korbinian Grote, GSF-National Research Center for Environment and Health, Germany; Wilfried Brauer, Technische Universität München; Thomas Werner, GSF-National Research Center for Environment and Health
We present a new method, based on a combination of different self-organizing map algorithms, that is able to derive highly specific formal models of promoter structures consisting of an ordered combination of transcription factor binding sites. Apart from a set of functionally related promoter sequences, no additional knowledge is required.

59 - Sequence Data Analysis of Voltage-gated Ion Channel Proteins
Purnima Guda, Boojala V. B. Reddy, Mauricio Montal, Philip E. Bourne, University of California, San Diego
Voltage-gated ion channel (VGC) proteins mediate the selective diffusion of K+, Na+ or Ca2+ across cell membranes. We present an analysis of multiple VGC sequences to identify conserved and semi-conserved residues in the family. The poster will detail our efforts to define a sequence-based pattern/profile for proteins of the VGC family and its usefulness in modeling these proteins.

60 - Combining Protein Secondary Structure Prediction Methods with a New Multi-category SVM
Yann Guermeur, LORIA; Dominique Zelus, LBMC, France
Vapnik's learning theory has given birth to an inference paradigm, implemented in the Support Vector Machines (SVMs). The theory grounding these machines was developed for two-class discriminant analysis. Building upon a new uniform convergence result, we propose a theoretical foundation for multi-category SVMs. From this framework, original models are derived, which are used to combine protein secondary structure prediction methods.

61 - A Novel Gene Finding System
Fatemeh Haghighi, Columbia University; Mark Diekhans, David Haussler, University of California, Santa Cruz; William Noble Grundy, Columbia University
Accurate recognition of genes and gene components is central to the annotation of data from the genome sequencing projects. We present a new gene-finding system that is designed to be scalable and flexible with respect to the gene features it models, the machine learning algorithms it employs, and the range of experimental data from which it learns.

62 - AcE: A System for Analyzing the Accuracy of Gene Prediction Programs
William S. Hayes, SmithKline Beecham Pharmaceuticals
AcE, an Accuracy Evaluation tool for eukaryotic gene prediction, is displayed along with results from several test sets versus several eukaryotic gene prediction tools. Ease of use and flexibility were primary design considerations.

63 - Telegraph: A New Dynamic Programming Template Library
Ian Holmes, University of California, Berkeley; Guy St. C. Slater, Ewan Birney, Wellcome Trust Genome Campus, UK; Gerald M. Rubin, University of California, Berkeley
Many algorithms in bioinformatics rely on dynamic programming for the alignment of sequences to finite state machines of various architectures. Telegraph is a new library enabling exploitation of this paradigm using an implementation that is object-oriented, scriptable, modular, portable and probabilistic. This poster illustrates Telegraph with a number of examples.

64 - Assessing Sequence Comparison Methods Using a Pfam Annotated Database
Mingqian Huang, William R. Pearson, University of Virginia
We have developed a new reference database for evaluating protein and DNA search methods. FASTA is better than BLASTN for DNA sequence comparison and the FASTX/TFASTX programs are better than BLASTX/TBLASTN when frameshifts are present. The FASTA programs also provide reliable statistical estimates for protein and DNA sequence searches.

65 - Detection of Recombination in DNA Multiple Alignments with Hidden Markov Models
Dirk Husmeier, Frank Wright, Biomathematics and Statistics Scotland (BioSS) SCRI, UK
A hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global frequency of recombination.

66 - An Estimate of Statistics of Alternative Human Gene Transcripts by Assembling Partial cDNA Sequences
Ryotaro Irie, Yasuhiko Masuho, Keiichi Nagai, Helix Research Institute, Inc., Japan
We developed a DNA sequence assembly program (MakeAllContigs) that assembles partial cDNA sequences to create all contigs (sets of consistently aligned partial sequences) each of which could correspond to a transcript. MakeAllContigs was applied to UniGene clusters to estimate the statistics on alternative gene transcripts from human chromosome 22.

67 - Motif Mining Reveals More Links in Plant Stress Responses
Vidhya Jagannathan, Avestha Gengraine Technologies; S. Krishnaswamy, Madurai Kamaraj University, India; Jason Stewart, Villoo Morawala-Patell, Avestha Gengraine Technologies
An analysis was conducted to discover potential regions of homology between twelve abiotic and biotic stress-related genes and resulted in 3 non-overlapping motifs. A Position Specific Scoring Matrix and a Hidden Markov Model were generated and a total of 113 new stress related genes were identified.

68 - UTR Reconstruction and Analysis Using Genomically Aligned EST Sequences
Zhengyan Kan, Washington University; Warren Gish, Washington University School of Medicine; Eric Rouchka, Washington University; Jarret Glasscock, Washington University School of Medicine; David States, Washington University
We have developed a computational method to detect poly-A sites in human genomic sequences and to infer UTR sequences using genomically aligned ESTs. The accuracy of the method is evaluated by reconstructing functionally cloned transcript sequences. Using the method to analyze 908 genic regions, we estimate that 40-50% of human genes undergo alternative polyadenylation.

69 - OPTIMA: A New Score Function for Distantly Related Protein Sequence Comparison
Maricel Kann, Bin Qian, Richard Goldstein, University of Michigan
We describe a new method of determining the score function by optimizing the ability to discriminate between homologs and non-homologs. This new score function out-performs currently available score functions at identifying both distant and close homologies.

70 - The SAM-T99 Protein-search Method Works Well as a Multiple Aligner
Kevin Karplus, Birong Hu, University of California, Santa Cruz
We evaluated the SAM-T99 method as a multiple aligner, using the BAliBase multiple-alignment test suite. When run with the -tuneup option, followed by building an HMM to align all sequences, SAM-T99(tuneup) seems comparable to other multiple aligners such as Clustal and PRPP (much better on reference 2, slightly worse on reference 1v1, comparable on the others).

71 - A Method to Detect Conserved Domains in Mouse Full-length cDNA Data
Hideya Kawaji, Osaka University, NTT Software Corporation; Hideo Matsuda, Osaka University; Shinji Kondo, RIKEN Genomic Sciences Center, Japan; Jun Kawai, Yoshihide Hayashizaki, RIKEN Genomic Sciences Center, RIKEN Tsukuba Institute, Japan; Akihiro Hashimoto, Osaka University
We present a method for detecting conserved domains between cDNA sequences. The method explores a set of fixed-length and ungapped subsequences that exhibit similarity to each other by using the maximum-density subgraph algorithm. The results obtained by applying it to mouse full-length cDNA sequences are also presented.

72 - Identification of Alternatively Spliced Candidate Genes Present in Cancer-specific cDNA Libraries
Janet Kelso, Winston Hide, University of the Western Cape, South Africa
A novel transcript clustering and viewing system to mine cancer-specific EST libraries for aberrantly expressed genes is presented. Analysis of 13 881 ESTs provides in excess of 20 aberrantly processed candidate genes with between 2 and 7 alternative consensus sequences. Results demonstrate tissue and neoplastic state specificity of aberrant expression forms.

73 - A Computational Approach to Sequence Assembly Validation
Sun Kim, Li Liao, Michael P. Perry, Shiping Zhang, Jean-Francois Tomb, DuPont Experimental Station
Correct sequence assembly is critical to the success of large-scale sequencing projects. We propose a computational approach to sequence assembly validation. Among the several analysis techniques we developed, the "good-minus-bad clone analysis" approach correctly identified all misassembled regions in the assembly of the Mycoplasma genitalium genome.

74 - Modeling of Liver-specific Transcriptional Regulatory Regions
William Krivan, Wyeth Wasserman, Karolinska Institutet, Sweden
We present a model for the identification and analysis of transcriptional regulatory regions in promoters of genes with liver-specific expression. From a collection of experimentally determined regulatory regions, taken from the literature, we generated profiles for the binding specificity of each transcription factor. We use logistic regression to characterize the interaction between transcription factors bound to distinct elements within regulatory regions.

75 - Clustering Protein Sequences with a Linkage Graph
Li Liao, Sun Kim, Jean-Francois Tomb, DuPont Central Research and Development
By representing sequence similarity relationships among proteins as linkage graphs, this Candidate-Elimination-like clustering algorithm identifies the maximal quasi-complete subgraphs, a concept recently introduced by Matsuda et al. to represent protein clusters. The role of graph connectivity as a confidence measure is studied by analyzing the statistical distributions of similarity scores.

76 - Characterisation of the Epidermal Differentiation Complex (EDC) on Mouse Chromosome 3 and Human Chromosome 1q21
Pawel Listwan, Joseph A. Rothnagel, University of Queensland, Australia
Terminal differentiation of the mammalian epidermis involves the expression of structurally and functionally related genes found in the epidermal differentiation complex (EDC) located on human chromosome 1q21 and mouse chromosome 3. The research presented shows the techniques used in mapping studies, sequence analysis and database searches for novel human sequences.

77 - BioProspector: Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-expressed Genes
Xiaole Liu, Jun Liu, Douglas L. Brutlag, Stanford University
BioProspector finds DNA sequence motifs in the upstream regions of co-expressed genes. Using a modified Gibbs sampling algorithm, the program can account for background Markov dependency and can find motifs present in only part of the sequences, motifs with two blocks separated by a variable-length gap, and palindromic patterns.

78 - Ancient Fungi Entrapped in Glacial Ice
Li-jun Ma, Scott O. Rogers, State University of New York, Syracuse
Culture and DNA polymerase chain reaction amplification methods were employed to revive and identify ancient fungi and their nucleic acids in ice core sections, up to 400 000 years old, from both Greenland and Antarctica. Twenty-seven isolates were obtained. In addition, three fungal sequences were amplified and sequenced directly from the interior of an ice core section.

79 - PRINTS-S Illuminates the Midnight Zone
J. E. Mabey, P. Scordis, T. K. Attwood, University of Manchester
PRINTS-S, a recent extension of the PRINTS databank, attempts to provide depth to protein family data by storing its discriminators within a relational database. Using this system, it is possible to describe hierarchical relationships within protein families. Hence, PRINTS-S is able to illuminate relationships that occur within the Midnight Zone.

80 - The Distribution of Log Likelihood Scores in Multiple Alignments
J. David Moroz, Terence Hwa, University of California, San Diego
The multiple alignment of a group of sequences provides a means for recognizing relationships among the sequences. A number of methods based on weight matrices have recently been proposed for finding good multiple alignments by maximizing the log-likelihood score. We present an analytical calculation of the expected distribution of scores that may be used to determine the statistical significance of alignments.

81 - A Hybrid Markov Chain--Neural Network System for the Exact Prediction of Eukaryotic Transcription Start Sites
Uwe Ohler, Georg Stemmer, Heinrich Niemann, Universität Erlangen-Nürnberg
We have designed stochastic segment models with Markov chains as submodels for vertebrate promoter and non-promoter sequences. Their output, along with additional CpG island features, is fed into a neural network classifier. This new two-step approach leads to an accurate and specific prediction of transcription start sites in genomic DNA.

82 - Unsupervised Hidden Markov Models Trained on P. falciparum Chromosome 3 Detect Biologically Interesting Structure
Matthew R. Pocock, Thomas A. Down, Tim J. P. Hubbard, Wellcome Trust Genome Campus, UK
We have developed hidden Markov models for whole chromosomes, containing pairs of states that emit long regions of complementary sequence. When trained on chromosome 3 of the malaria parasite P. falciparum and tested on chromosome 2, the models reproducibly learned states associated with telomeres, genes, sub-telomeric repeats, and structures associated with the var antigen genes.

83 - A Comparative Sequence Analysis System for Detecting RNA Patterns in mRNAs: Application to Prion Protein mRNAs
Guylaine Poisson, Isabelle Barrette, Patrick Gendron, Francois Major, Université de Montréal
We developed a Structural Pattern Finder computer program to detect structured RNA in human prion mRNA. The program measures folding free energies (FFE), which in this mRNA indicated the presence of several RNA patterns. One of particular interest is a pseudoknot that could affect prion protein translation.

84 - Modeling the Gap Length Distribution
Bin Qian, Richard Goldstein, University of Michigan
In order to improve the efficiency of sequence alignment methods, we want to find a more accurate model of insertions and deletions in structurally related and homologous proteins. We extracted the gap length distribution from the FSSP protein structure alignments. The results suggest new approaches to modeling insertions and deletions in sequence alignments.

85 - Promoter Finding and Linguistics
Minping Qian, Peking University
We present a method for mining biologically significant words in core promoters. Our statistical results suggest that these words are transcription factor binding sites and that they come in well-defined pairs and triples. A program that finds promoters based on these words and this vocabulary is provided.

86 - Empirical Determination of Effective Gap Penalties for Protein and DNA Sequence Alignment
Justin T. Reese, William R. Pearson, University of Virginia
We have empirically determined effective gap penalties for the alignment of protein and DNA sequences. Sequences were artificially mutated to distances from PAM 20 to PAM 200 and embedded in a database of unrelated sequences to determine the gap penalties that yielded the greatest statistical significance for the most distant "homologs."

87 - Choosing Models for Similarity-based Gene Prediction: Profiles versus Single Sequences
William Reisdorf, Pankaj Agarwal, SmithKline Beecham Pharmaceuticals
Homology-based gene prediction should improve as the number of database entries increases. However, it is not always clear which database sequence will provide the best model for a new gene. We present an evaluation of using profile-HMMs, generated by HMMER, to guide homology-based gene predictions from GeneWise.

88 - Computational Identification and Classification of Repetitive Elements
Zhirong Bao, Sean Eddy, Washington University School of Medicine
Repetitive elements comprise a major part of eukaryotic genomes. We have developed and implemented a de novo approach for the identification and classification of these elements from genomic sequences, based on algorithmic extensions to the usual approach of single linkage clustering of BLAST hits.
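
For orientation, the baseline that this abstract extends, single linkage clustering of pairwise hits, can be sketched in Python with a union-find structure. This shows only the usual baseline, not the authors' algorithmic extensions; the function name and the input format (pairs of sequence IDs that already passed some assumed score cutoff) are hypothetical.

```python
def single_linkage_clusters(hits):
    """Group sequence IDs connected by any chain of pairwise hits.

    `hits` is an iterable of (query_id, subject_id) pairs, e.g. BLAST
    hits that already passed a score cutoff (the cutoff is assumed).
    """
    parent = {}

    def find(x):
        # Return the cluster representative of x, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the clusters of the two sequences in every hit.
    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their representative.
    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(members) for members in clusters.values())
```

A single spurious hit is enough to merge two otherwise unrelated clusters under this scheme, which is one reason extensions to it are of interest.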

89 - A Screen for Noncoding RNA Genes using Bayesian Classification of Interspecies Sequence Alignments
Elena Rivas, Sean R. Eddy, Washington University
We propose a probabilistic model to classify conserved regions from related genomes using an approach that combines structural with comparative information. It uses three distinct models to calculate a Bayesian posterior probability that a sequence alignment should be tentatively classified as RNA, coding, or "other class." It is used to screen large databases of BLASTN alignments for novel noncoding RNA genes.

90 - A Method for Sequence Analysis Using Multivariate Analysis
Jun-Ichi Sagara, Kentaro Shimizu, The University of Tokyo
We developed a computational sequence analysis method based on multivariate analysis. The method uses principal component analysis and multidimensional scaling to classify the sequences into multiple groups of similar sequences, and also to extract characteristic bases that are conserved within a group but differ from other groups.

91 - OpenBSA--Unbearable Lightness of Sequence Analysis
Martin Senger, Juha Muilu, Philip Lijnzaad, Alphonse Thanaraj, Alan Robinson, EMBL-EBI
In this poster we present freely available software components based on the Biomolecular Sequence Analysis (BSA) specification. BSA is, together with Genomic Maps, the first technology standardised and adopted by the Life Science Research Domain Task Force within the Object Management Group. BSA comprises modules of biological objects and analysis mechanisms.

92 - Automatic Identification of Patterns for ProDom Families using Pratt
Florence Servant, Daniel Kahn, Jérôme Gouzy, INRA/CNRS LBMRPM; Florence Corpet, INRA LGC, France; Stig Dommarsnes, Inge Jonassen, University of Bergen
ProDom is a database of automatically derived protein families and is integrated into the InterPro database. We describe an approach that allows fully automatic discovery of patterns to be used for characterising protein families and demonstrate its use on ProDom families. We describe the developed system and summarise the results.

93 - DbClustal: Rapid and Reliable Global Multiple Alignments of Protein Sequences Detected by Database Searches
J. D. Thompson, F. Plewniak, J-C. Thierry, O. Poch, Institut de Genetique et de Biologie Moleculaire et Cellulaire
DbClustal addresses the important problem of the automatic multiple alignment of the top-scoring full-length sequences detected by a database homology search. By combining the advantages of both local and global alignment algorithms into a single system, DbClustal is able to provide accurate global alignments of highly divergent, complex sequence sets. http://igbmc.u-strasbg.fr:8080/ballast.html

94 - A Statistical Method for Finding Transcription Factor Binding Sites
Martin Tompa, Saurabh Sinha, University of Washington
We present an enumerative statistical method for identifying good candidates for transcription factor binding sites. Unlike local search techniques that may not produce a global optimum, our method is guaranteed to produce the motifs with greatest z-scores. We also present the results of experiments in which our algorithm was used to locate candidate binding sites.

95 - Multifaceted Approach to Gene Prediction
M. Troukhan, S. Brover, N. Alexandrov, Ceres, Inc.
A gene recognition system is described that combines results from several gene prediction programs, several predictions of splicing sites, and the results of BLAST similarity searches into one HMM schema.

96 - REPRO: Finding Repeating Units in Protein Sequences
Jakir H. Ullah, R. A. George, Jaap Heringa, National Institute for Medical Research, UK
REPRO is an algorithm for recognition of distant repeats within protein sequences. We describe a new WWW server for the REPRO method to help interpret the results. New strategies have been implemented in the basic REPRO code to gain a 25-fold speed increase. We describe the use of REPRO in genome analysis and the development of a protein repeats database.

97 - DomainParser: A New Program for Protein Domain Decomposition Using a Graph-Theoretic Approach
Ying Xu, Dong Xu, Oak Ridge National Laboratory; Harold N. Gabow, University of Colorado, Boulder
We present a new algorithm for decomposing structure domains in proteins using the Ford-Fulkerson algorithm based on a network flow approach. The algorithm has been implemented as a computer program and a Web server, called DomainParser. Its performance compares favorably to existing programs. http://compbio.ornl.gov/structure/domainparser/
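
The network flow idea named in this abstract can be illustrated with a generic minimum-cut sketch in Python. It uses breadth-first augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson). How DomainParser builds its graph from a protein structure and assigns capacities is not described in the abstract, so the graph format and function name here are assumptions.

```python
from collections import deque

def min_cut(capacity, source, sink):
    """Return (max_flow, source_side_nodes) for a capacity graph.

    `capacity` maps node -> {neighbor: capacity}. Generic illustration
    only; not DomainParser's actual graph construction.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    # Build residual capacities, adding zero-capacity reverse edges.
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    def reachable():
        # BFS over edges with remaining capacity; returns a parent map.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            # No augmenting path left: the reachable nodes form the
            # source side of a minimum cut.
            return flow, set(parent)
        # Find the bottleneck capacity along the path to the sink.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push that much flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

In a domain-decomposition setting, nodes might stand for residues and capacities for contact strength, so that a minimum cut separates two parts of the structure joined by few contacts; that mapping is an assumption here.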

98 - A Hidden Markov Model Integrating Gene Finding Programs Discovers Gene Structure with High Accuracy
Tetsushi Yada, Yasushi Totoki, RIKEN Genomic Sciences Center; Yoshio Takaeda, Mitsubishi Research Institute, Inc.; Yoshiyuki Sakaki, RIKEN Genomic Sciences Center, The University of Tokyo; Toshihisa Takagi, The University of Tokyo
We developed an HMM that discovers genes by integrating several existing gene-finding programs. Although each program predicts exons in its own way with respect to reading frame and scoring method, our HMM predicts frame-consistent genes by considering all exon scores. Preliminary experiments revealed that the HMM significantly improved prediction accuracy at the gene and exon levels.

99 - Discovery and Characterization of 70,000 SNPs in EST sequences
Dirk Walther, Dinh Diep, Preeti Lal, Harold Hibbert, Inge Loudon, Karen Thomas, Alan Schafr, Richard Goold, Jeff Seilhamer, Macdonald Morris, Incyte Genomics, Inc.
A fully automated algorithm, SnooPer, was developed to identify polymorphisms in sequence databases. Mining of Incyte's EST database has resulted in the identification of over 70,000 candidate SNPs in genes at an experimentally determined validation rate of 80%. The poster will give an overview of their statistical properties.

POSTER PRESENTATIONS Tuesday, August 22 Sequence Analysis – Gene Finding, Alignment Techniques
