Posters.

 

Microarrays. A. B.

Systems Biology. A. B.

Functional Genomics. A. B.

Structural Biology. A. B.

Data Visualization. A. B.

Phylogeny and Evolution. A. B.

Data Mining. A. B.

Genome Annotation. A. B.

Sequence Comparison. A. B.

Predictive Methods. A. B.

New Frontiers. A. B.

 

 

 

Session A.

 

 

Microarrays.

 

1A. Normalization Methods for cDNA Microarrays.

2A. Comparative Data Mining for Microarrays: A Case Study Based on Multiple Myeloma.

3A. A Model of Congruence in E. coli, from Microarray Data to Literature Knowledge.

4A. Decision Tree Learning-Based Characterization of the Global Effects of Cocaine Abuse on Gene Expression in the Rat Brain.

5A. Performance Analysis of an Optimal Estimator of Gene Expression Ratio.

6A. Density Estimator Self-Organizing Map for Gene Expression Analysis.

7A. Analyzing Single-slide Microarray Gene Expression Data via a Bayesian Approach.

8A. FOREL Clustering Algorithms for Functional Genomics.

9A. Nonlinear Correlation for the Analysis of Gene Expression Data.

10A. An Empirical Bias Model for Normalization of Microarray Data.

11A. EMMA - ESTs Meet Microarrays.

12A. A Gene Expression Database for Immune Cells Transcriptomes.

13A. General Optimisation Approach for Normalising cDNA Microarray Data with Replicates.

14A. A High Throughput Pipeline for Validating Novel Splice Variants Discovered Using Whole-Genome Junction Arrays.

15A. LIMS for DNA Sequence and Microarray Analysis Based on AceDB.

16A. Application of Resampling-Based Multiple Testing in the Analysis of Gene Expression in Human Peripheral Nerve Injury.

17A. Variance Stabilization, Normalization, and Power Calculations of Affymetrix Microarray Data with Application to Autism.

18A. GenMAPP: A Tool for Viewing and Analyzing Microarray Data on Biological Pathways.

19A. Use of a Native XML-Based Database and Emerging Public Standards (MAGE-ML, MIAME) in Gene Expression Array Analysis.

20A. Quantitative Treatment of cDNA Microarray Data.

21A. ROSO : A Software to Search Optimized Oligonucleotide Probes for Microarrays.

22A. Expressionist Refiner - a Software Solution for Assessment of Quality and Correction of Gene Expression Data.

23A. Exploration of the Expression and Functional Annotation of Genes Identified by Representational Difference Analysis and Global Microarrays.

24A. Optimal Design of Oligos for Micro Array Gene Expression Profiling.

25A. On the Integration of Normalization Steps and SAM in cDNA Microarray Data Analysis.

26A. Partially Supervised Clustering: A Useful Tool for Investigating Coexpression of Gene Microarray Data.

27A. Osprey: An Application for a Wide Range of Oligonucleotide Design Tasks.

28A. An Image-Based Visualization of Microarray Features and Classification Results.

29A. Preprocessing Microarray Data to Improve Power of Multiple Testing.

30A. Classification of Spot Profile in Microarray Image Data by using Statistical Characteristics.

31A. R-MDAT: Development of GUI-based Microarray Data Analysis Tool Using R-Language.

32A. Client-Server Solution for Large Scale Gene Expression Data Mining.

33A. A Comparison of Clustering Techniques for Gene Expression Data.

34A. Can Molecular Mechanisms of Biological Processes Be Extracted from Expression Profiles? Case Study: Endothelial Contribution to Tumor-Induced Angiogenesis.

35A. MAC: A Dynamic Program that Tracks Samples between Microplates and Microarrays.

36A. Immunotranscriptomics: Analysis of Gene Expression in the Immune System.

37A. Filtering and Normalization Strategies for Microarray Generated Gene Expression Profiles.

38A(i). Blind Gene Classification : An ICA-based Gene Classification/Clustering Method.

38A(ii) Recovering Reproducible and Biologically Valuable Clusters from Noisy Array Data.

38A(iii) Evaluation of Data Reduction and Testing Methods for Oligonucleotide Arrays.

 

 

1A. Normalization Methods for cDNA Microarrays.

James J. Chen, Yi-Ju Chen and Chen-an Tsai. National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas, 72079, USA.

jchen@nctr.fda.gov

 

We present normalization methods for different cDNA microarray experiments. We consider both two-color (Cy3 and Cy5 fluorescence) and single color (or radioisotope) array images. A data set from a dye-swap experiment and another data set with several treatments and replicates are presented to illustrate the methods.


2A. Comparative Data Mining for Microarrays: A Case Study Based on Multiple Myeloma.

David Page1,2, Fenghuang Zhan3, James Cussens4, Michael Waddell2, Johanna Hardin5, Bart Barlogie6 and John Shaughnessy, Jr.3. 1Dept. of Biostatistics and Medical Informatics, 2 Dept. of Computer Sciences University of Wisconsin Madison, WI 53706, 3Lambert Laboratory of Myeloma Genetics University of Arkansas for Medical Sciences Little Rock, AR 72205, 4Computer Science Dept. University of York Heslington, York, YO10 5DD, United Kingdom, 5Southwest Oncology Group Fred Hutchinson Cancer Research Center, Seattle, WA 98109 and 6Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR 72205.

mwaddell@biostat.wisc.edu

 

These studies compare SVMs, Bayesian networks, decision trees, boosted decision trees and voting (ensembles of decision stumps) on a new microarray data set for cancer (multiple myeloma) with over 100 samples. They provide evidence for several important lessons about how these techniques should be used for mining microarray data.


3A. A Model of Congruence in E. coli, from Microarray Data to Literature Knowledge.

Rosa Maria Gutiérrez-Ríos1, David Rosenblueth2, Araceli Huerta-Moreno1 and Julio Collado-Vides1. 1CIFN, UNAM, A.P. 565-A, Cuernavaca, Morelos 62100, Mexico and 2IIMAS, UNAM, Ciudad Universitaria C.P. 04510, Mexico.

rmaria@cifn.unam.mx

 

Using the knowledge available in RegulonDB on the regulation of transcription in E. coli, we evaluated the congruence of transcriptome experiments under different conditions. This qualitative comparison is based on a discrete model of transcriptional regulation involving direct and indirect effects following the regulatory network of interactions.


4A. Decision Tree Learning-Based Characterization of the Global Effects of Cocaine Abuse on Gene Expression in the Rat Brain.

Changqing Ma1, Vanathi Gopalakrishnan2, David G. Peters3 and Robert E. Ferrell3. 1Department of Pathology, University of Pittsburgh School of Medicine, 2Department of Medicine, University of Pittsburgh School of Medicine and 3Department of Human Genetics, University of Pittsburgh School of Public Health.

chmst40@pitt.edu

 

A decision tree learning method was applied successfully to the microarray gene expression data obtained from cocaine-treated and normal tissue samples of the rat brain. The learned, highly accurate, and human-understandable model depicted a global change in gene expression among three brain regions in response to an acute dose of cocaine.


5A. Performance Analysis of an Optimal Estimator of Gene Expression Ratio.

David Seale and Stephen W. Davies. Institute of Biomaterials and Biomedical Engineering and the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto.

seale@ecf.utoronto.ca

 

An optimal estimator of gene expression has been developed using a stochastic signal model for DNA microarray images. Through simulation, the performance of the estimator is analyzed and compared with traditional methods of estimating gene expression ratios. We also explore the robustness of the estimator to model inaccuracies.


6A. Density Estimator Self-Organizing Map for Gene Expression Analysis.

A.D. Pascual-Montano1, A. Sesto2, A.J. Rodríguez-Sánchez1, M. Navarro2, J.L. Jorcano2 and J.M. Carazo1. 1Biocomputing Unit. Centro Nacional de Biotecnología. Campus Universidad Autónoma de Madrid. Cantoblanco 28049. Madrid, Spain and 2Dept. Molecular and Cell Biology and Gene Therapy. CIEMAT. Av. Complutense 22, 28040 Madrid, Spain.

pascual@cnb.uam.es

 

We describe the application of a new variant of a Self-Organizing Map (KerDenSOM) in the context of microarray data analysis. KerDenSOM is specially designed to find a set of representative code vectors with a probability density as similar as possible to that of the input data. KerDenSOM is available at http://www.engene.cnb.uam.es.


7A. Analyzing Single-slide Microarray Gene Expression Data via a Bayesian Approach.

Grace Shieh, T. H. Fan and B.C. Chung. Inst. of Statistical Sci., Academia Sinica, Dept. of Statistics, National Central Univ. and Inst. of Molecular Biology, Academia Sinica.

gshieh@stat.sinica.edu.tw

 

Log-scaled red channel intensities from a single-slide array were fitted to their corresponding green channel intensities via a regression model. The residuals, which capture information about differentially expressed genes, were modeled by a mixture distribution, and the posterior distribution was obtained to identify differential expression.
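The regression step described above can be sketched as follows. This is a minimal illustration only: the base-2 logarithm, the ordinary least-squares fit and the sample intensities are assumptions, and the mixture model and posterior calculation from the abstract are not shown.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def log_residuals(red, green):
    """Residuals of log2(red) regressed on log2(green), one per spot."""
    log_r = [math.log2(v) for v in red]
    log_g = [math.log2(v) for v in green]
    a, b = fit_line(log_g, log_r)
    return [r - (a + b * g) for r, g in zip(log_r, log_g)]

# Hypothetical intensities: the fourth spot is much brighter in red.
green = [100.0, 200.0, 400.0, 800.0, 1600.0]
red = [110.0, 190.0, 420.0, 6400.0, 1500.0]
residuals = log_residuals(red, green)
```

In this toy data set the spot with the largest positive residual is the one whose red intensity departs most from the fitted trend; a spot like this is the kind of candidate that a downstream mixture model would then evaluate.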


8A. FOREL Clustering Algorithms for Functional Genomics.

Andrey Ptitsyn. Pennington Biomedical Research Center.

ptitsyaa@pbrc.edu

 

A new algorithm for clustering expression profiles has been developed, particularly for analysis of irregularly shaped patterns in multidimensional space. The algorithm combines computational effectiveness with versatility, and accepts a wide variety of distance and cluster quality metrics.


9A. Nonlinear Correlation for the Analysis of Gene Expression Data.

Karen M. Bloch1 and Gonzalo R. Arce2. 1DuPont Company, Wilmington, Delaware, USA and 2University of Delaware, Newark, Delaware, USA.

Karen.M.Bloch@usa.dupont.com

 

Gene expression analysis techniques that rely on linear correlation metrics only detect relationships with a large linear component. This poster illustrates the ability to determine the relationships between gene patterns based on nonlinear correlation measurements. Our results indicate that improved clustering of genes can be achieved via the proposed method.


10A. An Empirical Bias Model for Normalization of Microarray Data.

Qingwei Zhang1, Nobukazu Ono2, Yoshiyuki Takahara2 and Hiroshi Tanaka1. 1Bioinformatics Dept., Medical Research Institute of Tokyo Medical and Dental University and 2Pharmaceutical Research Laboratories, Ajinomoto Co., Inc.

zhang.com@mri.tmd.ac.jp

 

We propose an empirical bias model for normalizing microarray data, in which the original data are normalized only by subtraction of background and a constant bias, followed by principal component analysis and a consequent rotation of the coordinate system. The data show that the method is simple and practical.


11A. EMMA - ESTs Meet Microarrays.

Michael Dondrup and Alexander Goesmann. Center for Genome Research - Bielefeld University.

michael.dondrup@Genetik.Uni-Bielefeld.de

 

We have developed an open source system for efficient storage and analysis of large scale microarray data. The EMMA system is based on an object oriented API-layer that encapsulates a relational (SQL) database compliant with the MIAME standard. Besides integrating comprehensive normalization and data analysis methods, EMMA offers an interface to the GenDB annotation system.


12A. A Gene Expression Database for Immune Cells Transcriptomes.

A. Splendiani, C. Vizzardelli, N. Pavelka, M. Pelizzola, M. Capozzoli, F. Granucci and P. Ricciardi-Castagnoli. Univ. Milano Bicocca.

andrea.splendiani@unimib.it

 

We are generating a gene expression database for immune cells transcriptomes. This database is complemented by a collaborative environment and upload facility.


13A. General Optimisation Approach for Normalising cDNA Microarray Data with Replicates.

Ilana Saarikko1, Timo Viljanen2, Riitta Lahesmaa3, Tapio Salakoski2 and Esa Uusipaikka4. 1 Turku Centre for Biotechnology, University of Turku, Finland, 2 Department of Information Technology and Turku Centre for Computer Science, University of Turku, Finland, 3Turku Centre for Biotechnology, University of Turku, Åbo Akademi University, Finland and 4Department of Statistics, University of Turku, Finland.

ilana.saarikko@btk.utu.fi

 

We study the normalisation of cDNA microarrays based on replicated experiments. We introduce the normalisation as an optimisation problem. The general target function complies with the most commonly used normalisation methods but also allows more complicated approaches. We evaluate the normalisation by applying the analysis of variance (ANOVA) to the data and calculating the standard deviation of replicated genes.


14A. A High Throughput Pipeline for Validating Novel Splice Variants Discovered Using Whole-Genome Junction Arrays.

Patrick Loerch, Chris Armour, Phil Garrett-Engele, Ralph Santos, Zhengyan Kan, Jason Johnson and Daniel Shoemaker. Rosetta Inpharmatics.

patrick_loerch@merck.com

 

Recent studies indicate that over half of all human genes undergo alternative splicing. A high-throughput microarray-based pipeline was developed to monitor and validate alternative splicing on a genome-wide scale. Here we present in detail the validation strategy to confirm array-based splicing predictions and optimize analysis algorithms.


15A. LIMS for DNA Sequence and Microarray Analysis Based on AceDB.

1Ikjin Kim, 2Yoongang Hur, 3Hyunseung Lee, 3Youngsoo Park, 3Changpyo Hong, 3Yong-pyo Rim, and 1Jueson Maeng. 1Department of Life Science, Sogang University, Seoul 121-742, Korea, Department of 2Biology and 3Horticulture, Chungnam National University, Daejon 305-764, Korea.

griffin@sogang.ac.kr

 

A LIMS for DNA sequence and microarray analysis based on AceDB is proposed and has been used to process 2,688 Brassica rapa ESTs and 10 DNA microarray experiments. All the analysis tools and information have been integrated into a Brassica rapa database built upon the Arabidopsis genome database, and can be accessed through a GUI and the internet.


16A. Application of Resampling-Based Multiple Testing in the Analysis of Gene Expression in Human Peripheral Nerve Injury.

Yuanyuan Xiao1, Donglei Hu1, C. Anthony Hunt1, Mark R. Segal2, Andrew H. Ahn3, Douglas Rabert4, Lakshmi Sangameswaran4 and Praveen Anand5. 1Dept. of Biopharmaceutical Sciences, University of California, San Francisco, CA, USA, 2Dept. of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA, 3Dept. of Anatomy and Neurology, University of California, San Francisco, CA, USA, 4Roche Bioscience, Palo Alto, CA, USA and 5Dept. of Neurology, Imperial College School of Medicine, Hammersmith Hospital, UK.

yxiao@itsa.ucsf.edu

 

In this first microarray study of human brachial plexus injury, we applied and compared the performance of two recently proposed algorithms for tackling the multiple testing problem in microarrays. We illustrated the use of appropriate multiple testing methods for microarrays in monitoring differential gene expression between different biological states.


17A. Variance Stabilization, Normalization, and Power Calculations of Affymetrix Microarray Data with Application to Autism.

Sue Geller, Jeff Gregg, Paul Hagerman and David M. Rocke. University of California, Davis.

scgeller@ucdavis.edu

 

We present a method of transforming (see Durbin, Hardin, Hawkins, and Rocke) and normalizing microarray data to roughly constant variance and normal errors. This is useful since many statistical methods are based on these assumptions, as are standard power calculations. The method requires only a few chips/slides of biological replicates and can be thought of as machine calibration.


18A. GenMAPP: A Tool for Viewing and Analyzing Microarray Data on Biological Pathways.

Kam D. Dahlquist, Nathan Salomonis, Karen Vranizan, Scott W. Doniger, Steven C. Lawlor, and Bruce R. Conklin. Gladstone Institute of Cardiovascular Disease, University of California, San Francisco.

kdahlquist@gladstone.ucsf.edu

 

GenMAPP (Gene MicroArray Pathway Profiler) is a free, stand-alone computer program for viewing and analyzing gene expression data on MAPPs representing biological pathways or other functional grouping of genes. GenMAPP automatically color-codes the genes on the MAPP according to criteria supplied by the user. GenMAPP is available from http://www.GenMAPP.org.


19A. Use of a Native XML-Based Database and Emerging Public Standards (MAGE-ML, MIAME) in Gene Expression Array Analysis.

Ronald Taylor. Center for Computational Pharmacology, School of Medicine, University of Colorado.

ronald.taylor@uchsc.edu

 

The Center for Computational Pharmacology at the University of Colorado has created a database and web site for support of neuroscience research; in particular, to handle microarray data. A novel native XML database approach is used, in combination with the emerging public MIAME and MAGE-ML standards for gene expression data.


20A. Quantitative Treatment of cDNA Microarray Data.

Tomokazu Konishi. Biotechnology Institute, Akita Prefectural University.

konishi@agri.akita-pu.ac.jp

 

A common data distribution is confirmed in many types of DNA chips used in microarray experiments. Based on the distribution, data can be processed in a quantitative manner. The processing method also identifies each experiment’s noise level, which determines the limit of signal detection, and provides information on data fidelity.


21A. ROSO : A Software to Search Optimized Oligonucleotide Probes for Microarrays.

Nancie Reymond, Hubert Charles and Jean-Michel Fayard. Laboratory of Functional Biology, Insects and Interactions (BF2I), UMR INRA / INSA of Lyon, Villeurbanne, France.

nancie.reymond@jouy.inra.fr

 

The ROSO software helps researchers design optimal oligonucleotide probes according to several criteria. The software calculates oligonucleotide specificity and Tm, and checks for the absence of secondary structures. The best probes are then selected according to localization and stability criteria.


22A. Expressionist Refiner - a Software Solution for Assessment of Quality and Correction of Gene Expression Data.

A. Goryachev, H. Rehrauer, M. Wendt, D. Bittner, J. Nickolenko, S. Bellamy, D. Ferguson, H. Vogel, T. Wormus and P. Norman. GeneData AG, Maulbeerstrasse 46, Basel, BS, CH-4016, Switzerland.

andrew.goryachev@genedata.com

 

Array-based methods are the core of high-throughput gene expression profiling in academic research and industry alike. Expressionist Refiner is specifically designed to bridge the gap between raw technology-dependent data and high-level data mining analysis. We present a systematic statistical approach for extracting true expression values from the raw microarray data.


23A. Exploration of the Expression and Functional Annotation of Genes Identified by Representational Difference Analysis and Global Microarrays.

Tove Andersson1, Per Unneberg1, Peter Nilsson1, Jacob Odeberg1, John Quackenbush2 and Joakim Lundeberg3. 1Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden, 2The Institute for Genomic Research, Rockville, Maryland, USA and 3Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden.

tove@biotech.kth.se

 

We demonstrate how the mapping of elements in global microarrays to UniGene database entries, together with the visual projection of genes onto different classification systems, can be used to explore the differential expression and functional annotation of genes identified by representational difference analysis in a macrophage/foam cell model for atherosclerosis.


24A. Optimal Design of Oligos for Micro Array Gene Expression Profiling.

Niels Tolstrup, Annett M. Frankel, Eivind Tøstesen, Jens G. Kolberg, Søren M. Echwald, Peter Stein Nielsen, Sakari Kauppinen and Henrik Vissing. Biomolecular Informatics and Expression Microarrays, Exiqon A/S.

tolstrup@exiqon.com

 

We present a system for the design of LNA modified oligos for micro arrays. It features LNA modified oligonucleotide secondary structure prediction, LNA spiked oligo melting temperature prediction, genome wide cross hybridization prediction and secondary structure prediction of the target. The system is available at http://lnatools.com/.


25A. On the Integration of Normalization Steps and SAM in cDNA Microarray Data Analysis.

Daewoo Choi1, Hyo Sung Kim1 and Yong Sung Lee2. 1Department of Statistics, Hankuk Univ. of FS, Yongin, Korea and 2Department of Biochemistry, Hanyang Univ., Seoul, Korea.

3banjang@dreamwiz.com

 

In this study, we use simulated data to examine the relationship between normalization steps and SAM. We also propose an improved version of scaled normalization. In addition, we discuss an algorithm for determining asymmetric cut-points as a way to increase power.


26A. Partially Supervised Clustering: A Useful Tool for Investigating Coexpression of Gene Microarray Data.

R. Baumgartner1, R. Somorjai1, C. Bowman1, R. Summers1 and S. Booth2. 1Institute for Biodiagnostics, National Research Council Canada and 2National Microbiology Laboratory, Health Canada, Winnipeg, Manitoba, Canada.

christopher.bowman@nrc.ca

 

We present a hybrid gene expression study from an experiment on genes differentially expressed in response to a neurotoxic peptide. We show that efficient incorporation of a small amount of prior knowledge about gene labels significantly improves clustering results at very low contrast-to-noise ratios, which is especially beneficial for gene microarrays.


27A. Osprey: An Application for a Wide Range of Oligonucleotide Design Tasks.

Paul Gordon and Christoph Sensen. University of Calgary, Sun Center of Excellence for Visual Genomics, Faculty of Medicine, Department of Biochemistry and Molecular Biology, 3330 Hospital Drive NW, Calgary, Alberta, Canada, T2N 4N1.

gordonp@ucalgary.ca

 

Osprey is an application that efficiently automates the design of oligonucleotides for amplification, genome sequencing (walking and polishing), differential expression display and microarrays. Through a single program, users may design for any of these experiments, in the context of single clone to genome-scaled data. Constraints are based on thermodynamic models of annealing, rather than potentially inaccurate rules-of-thumb.


28A. An Image-Based Visualization of Microarray Features and Classification Results.

Peter Bajcsy, Ph.D.1 and Lei Liu, Ph.D2. 1National Center for Supercomputing Applications, 605 East Springfield Avenue, Champaign, IL 61820 and 2The W. M. Keck Center for Comparative and Functional Genomics, University of Illinois at Urbana-Champaign, 330 ERML, 1201 W. Gregory Dr., Urbana, IL 6180.

pbajcsy@ncsa.uiuc.edu and leiliu@uiuc.edu

 

We present a novel image-based visualization approach for fast screening and inspection of DNA microarray data. The DNA microarray data, including laser scanned imagery, extracted features and labeled classification results, are displayed as high-dimensional images. Four types of data visualization are demonstrated: visualization with spatial information pattern, multi-feature visualization, visualization of labeled classification results and multi-grid visualization.


29A. Preprocessing Microarray Data to Improve Power of Multiple Testing.

Tzulip Phang and Larry Hunter. Center for Computational Pharmacology, University of Colorado Health Sciences Center.

tzu.phang@uchsc.edu

 

We have developed preprocessing steps to eliminate meaningless genes in microarray data analysis. The gene reduction process is important to enhance the power of multiple comparison correction procedures. These preprocessing methods will screen out low variance, low mRNA level, and inconsistent genes prior to standard statistical analysis.
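A gene-reduction step of this kind might look like the sketch below. The thresholds, gene names and expression values are hypothetical, and only the low-level and low-variance screens are shown; the abstract does not say how inconsistent genes are identified, so that screen is omitted.

```python
from statistics import mean, pvariance

def filter_genes(expression, min_mean, min_variance):
    """Keep genes whose mean level and variance across samples both reach the thresholds."""
    kept = {}
    for gene, values in expression.items():
        if mean(values) >= min_mean and pvariance(values) >= min_variance:
            kept[gene] = values
    return kept

# Hypothetical expression values across four samples.
expression = {
    "geneA": [8.1, 9.5, 7.2, 10.3],  # expressed and variable
    "geneB": [8.0, 8.0, 8.1, 8.0],   # expressed but nearly constant
    "geneC": [0.2, 1.9, 0.1, 2.5],   # variable but at a low level
}
kept = filter_genes(expression, min_mean=5.0, min_variance=0.5)
```

Fewer genes entering the statistical tests means fewer hypotheses to correct for, which is why such a screen can raise the power of a multiple comparison procedure.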


30A. Classification of Spot Profile in Microarray Image Data by using Statistical Characteristics.

Masaru Takeya1, Masao Iwamoto1, Takehiro Matsuda2, Norimichi Tsumura2 and Yoichi Miyake2. 1National Institute of Agrobiological Sciences and 2Chiba University.

katu@affrc.go.jp

 

Scratched spots and additive-noise spots in a microarray image can be detected automatically. The mean, variance, skewness, and kurtosis of the intensity values in each spot are used as features for classification. The method is applied to classify the spots of a rice DNA microarray into four groups based on shape.
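The four per-spot statistics can be computed as in the sketch below. It assumes population (biased) moments and non-excess kurtosis, which the abstract does not specify, and the pixel values are made up; the classification step itself is not shown.

```python
def spot_moments(pixels):
    """Return (mean, variance, skewness, kurtosis) of a spot's pixel intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    if m2 == 0:
        return mean, 0.0, 0.0, 0.0  # perfectly flat spot: shape statistics undefined
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return mean, m2, skewness, kurtosis

# Hypothetical spots: the second contains one very bright outlier pixel.
uniform_spot = [10, 12, 11, 9, 10, 11, 9, 12]
scratched_spot = [10, 12, 11, 9, 10, 11, 9, 90]
uniform_features = spot_moments(uniform_spot)
scratched_features = spot_moments(scratched_spot)
```

A single outlier pixel pushes the skewness and kurtosis of the second spot well above those of the first, which illustrates why these statistics can serve as features for separating defective spots from clean ones.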


31A. R-MDAT: Development of a GUI-based Microarray Data Analysis Tool Using R-Language.

Sang-Cheol Kim, Jee-Hyub Kim, Charny Park and Cheol-Goo Hur. National Center for Genome Information, KRIBB.

hurlee@mail.kribb.re.kr

 

R-MDAT is a GUI-based tool for DNA microarray analysis developed in the R language. It integrates projection and clustering algorithms provided by the R project and is designed to accommodate new algorithms. It currently supports PCA, SVD, hierarchical clustering, k-means, SOM, and plotting functions for quality control, and will support classification analysis in the future.


32A. Client-Server Solution for Large Scale Gene Expression Data Mining.

Alexander Sturn, Bernhard Mlecnik, Roland Pieler, Johannes Rainer, Thomas Truskaller and Zlatko Trajanoski. Institute of Biomedical Engineering, Graz University of Technology, Krenngasse 37, 8010 Graz, Austria.

alexander.sturn@tugraz.at

 

We have developed a platform-independent, flexible and scalable Java suite for large scale gene expression data mining, which integrates various computationally intensive hierarchical and non-hierarchical clustering algorithms. The suite includes a powerful client for data preparation and results visualization, an application server for computation and additional administration tools.


33A. A Comparison of Clustering Techniques for Gene Expression Data.

Michiel de Hoon, Seiya Imoto and Satoru Miyano. University of Tokyo.

mdehoon@ims.u-tokyo.ac.jp

 

We have implemented several clustering algorithms (hierarchical, k-means, and Self-Organizing Maps) in a C subroutine library, available at http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/software.html. We assessed the suitability of these methods by applying them to expression data, using several distance measures, and comparing the clustering solutions to each other and to existing biological knowledge.


34A. Can Molecular Mechanisms of Biological Processes Be Extracted from Expression Profiles? Case Study: Endothelial Contribution to Tumor-Induced Angiogenesis.

Maria Novatchkova, Alexander Schleiffer and Frank Eisenhaber. Research Institute of Molecular Pathology (IMP).

novatchkova@imp.univie.ac.at

 

The applicability of gene expression data in obtaining mechanistic information rather than diagnostic profiles was studied using expression analysis of tumor-induced angiogenesis. The interpretation of a gene expression set using advanced sequence analysis methods allowed the description of molecular processes implementing angiogenesis but did not reveal key regulatory molecules.

http://mendel.imp.univie.ac.at/SEQUENCES/TEMS


35A. MAC: A Dynamic Program that Tracks Samples between Microplates and Microarrays.

Kei-Hoi Cheung1, Janet Hager2,3, Kevin White4, Kenneth Williams2,3, Kenneth Nelson5, Michael Snyder3,5, Yu Li1 and Perry Miller1,5. 1Center for Medical Informatics, 2Keck Biotechnology Resource Laboratory at Yale, 3Department of Molecular Biophysics, 4Department of Genetics and 5Department of Molecular, Cellular and Developmental Biology Yale University, New Haven, CT 06520, USA.

kei.cheung@yale.edu

 

MAC is a Web-based program that dynamically maps coordinates between microplates and spotted microarrays. Not only is the program platform-independent, but it also allows users to enter a set of parameters that cover a wide range of array configurations. MAC is available at http://yam.med.yale.edu/cgi-bin/cgiwrap/kei/kc_mac_dev8.pl.


36A. Immunotranscriptomics: Analysis of Gene Expression in the Immune System.

Helen J. Kirkbride, Josef A. Walker and Darren R. Flower. The Edward Jenner Institute for Vaccine Research.

helen.kirkbride@jenner.ac.uk

 

Microarrays are being used to determine the genes involved in regulation of the human immune system. Software has been developed to cluster genes using various criteria within their expression profiles, and display biological information concerning their putative function from external sources. Regulatory interactions between genes are investigated using Bayesian statistics.


37A. Filtering and Normalization Strategies for Microarray Generated Gene Expression Profiles.

Jennifer Listgarten, Kathryn Graham, Sambasivarao Damaraju, John Mackey, Carol Cass and Brent Zanke. Cross Cancer Institute, Alberta Cancer Board, Edmonton.

jennilis@cancerboard.ab.ca

 

Little consensus exists in the microarray community on methods of normalization and filtering of data. An attempt at systematic comparison of various filtering and normalization schemes was made on a set of 30 tumor samples collected through the PolyomX project. Additionally, the interaction between normalization and filtering schemes was examined.


38A(i). Blind Gene Classification : An ICA-based Gene Classification/Clustering Method.

Gen Hori, Masato Inoue, Shin-ichi Nishimura and Hiroyuki Nakahara. Brain Science Institute, RIKEN, Saitama, Japan.

hori@bsp.brain.riken.go.jp

 

Blind gene classification is a method of gene classification/clustering based on the independent component analysis (ICA) of gene expression data. It finds typical expression patterns from gene expression data, exploiting higher order statistical structure of the data, and classifies genes according to those typical expression patterns.


38A(ii) Recovering Reproducible and Biologically Valuable Clusters from Noisy Array Data.

Donna Slonim1, Andrew Hill1, Ryan Baugh2 and Craig Hunter2. 1Department of Genomics, Wyeth Research, 35 CambridgePark Drive, Cambridge, MA 02140, USA and 2Dept. of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02139.

dslonim@wyeth.com

 

Clustering microarray data to determine global expression patterns in a biological system can be done by a variety of methods. Analyzing data from a C. elegans embryonic development series, we use promoter characterization to evaluate the practical impact of guaranteeing cluster quality through robustness estimation.


38A(iii) Evaluation of Data Reduction and Testing Methods for Oligonucleotide Arrays.

Andrew Hill, Donna Slonim and Yizheng Li. Wyeth Research, 35 Cambridge Park Drive, Cambridge, MA 02140, USA.

ahill@genetics.com

 

Oligonucleotide arrays simultaneously monitor the expression levels of ~1e4 genes. Using replicate data and datasets from spiking experiments carried out by Affymetrix, we assess the statistical properties of array readouts. Comparison of normalization and probe summary schemes illustrates that recently described data reduction methodologies can outperform current methods.


Systems Biology.

 

39A. Computational Model of the Mammalian Cell Cycle using Hybrid Petri Net.

40A. Modeling Mutations, Abnormal Processes, and Disease Phenotypes, using a Workflow/Petri Net Model.

41A. FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps.

42A. CELLML: a Language for the Definition and Exchange of Cellular Models.

43A. BIOCAD for Constructing Gene Regulatory Networks.

44A. Probabilistic Boolean Networks as Models of Gene Regulatory Networks: From Inference to Intervention.

45A. In silico Studies of Integrated Gene Expression, Protein, and Metabolite Profiles.

46A. Modeling and Visualization of the Pattern Formation in Drosophila melanogaster by Genomic Object Net.

47A. Generating Petri Nets for Metabolic Network Modelling.

48A. Design and Implementation of a Knowledge-Base for Pharmacology.

49A. Identifying Functionally Important Protein Regions via Protein Family Correlation Analysis and Atomistic Energy Calculations.

50A. Regulatory Network of Transcriptional Regulation in E. coli.

51A. An In Silico Experimental Device for Drug Transport Research.

52A. A Domain-Specific Ontology and Knowledge Base for Signal Transduction.

53A. Pathway Reconstruction and Semantic Data Integration.

54A. Genomic Object Net (Ver.1.0): A Platform for Biopathway Modeling and Simulation.

55A. Modeling the Blood-Brain Barrier and Transporter Expression.

56A. Distributed Agent-Based Software Architectures for Bio-Pathway Simulation.

57A(i). Robustness in Models of the MAPK-cascade.

57A(ii). The BioCyc Collection of Pathway/Genome Databases.

 

39A. Computational Model of the Mammalian Cell Cycle using Hybrid Petri Net.

Takashi Yoshioka1, Shuji Kotani2 and Akihiko Konagaya2,3. 1NTT Data Co. Ltd., Kayabacho Tower, Shinkawa 1-21-2, Chuo-ku, Tokyo 104-0033, Japan, 2RIKEN Genomic Sciences Center (GSC), Suehiro-cho 1-7-22-W519, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan and 3Graduate School of Knowledge Science, Japan Advanced Institute of Science and Technology, Hokuriku (JAIST), Asahidai 1-1, Tatsunokuchi, Ishikawa 923-1292, Japan.

yossie@rd.nttdata.co.jp

 

To describe the molecular mechanism of the cell cycle, we developed a new computational model that makes use of hybrid-Petri nets. This model can easily reproduce “knockout” or overexpression of specific genes, so this model is a potentially useful means of exploring relations between gene functions and diseases.
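
As a rough illustration of how a Petri net can represent a gene knockout, the following minimal sketch uses a plain discrete Petri net rather than the hybrid formalism of the abstract. The place and transition names, the token counts, and the "fire the first enabled transition" rule are all assumptions made for the example, not details of the authors' model.

```python
# Minimal discrete Petri net: places hold integer token counts and a
# transition can fire when every input place holds enough tokens.
# All place and transition names below are hypothetical.

def enabled(marking, transition):
    """True if every input place has at least the required tokens."""
    return all(marking.get(p, 0) >= n for p, n in transition["in"].items())

def fire(marking, transition):
    """Return the new marking after firing one enabled transition."""
    new = dict(marking)
    for p, n in transition["in"].items():
        new[p] -= n
    for p, n in transition["out"].items():
        new[p] = new.get(p, 0) + n
    return new

def run(marking, transitions, knocked_out=(), max_steps=100):
    """Repeatedly fire the first enabled transition, skipping knocked-out ones."""
    for _ in range(max_steps):
        active = [t for name, t in transitions.items()
                  if name not in knocked_out and enabled(marking, t)]
        if not active:
            break
        marking = fire(marking, active[0])
    return marking

transitions = {
    "synthesize_cyclin": {"in": {"cyclin_mrna": 1}, "out": {"cyclin": 1}},
    "bind_cdk": {"in": {"cyclin": 1, "cdk": 1}, "out": {"complex": 1}},
}
start = {"cyclin_mrna": 2, "cdk": 2, "cyclin": 0, "complex": 0}

wild_type = run(start, transitions)
knockout = run(start, transitions, knocked_out={"synthesize_cyclin"})
```

In this toy net the wild-type run ends with two complexes, while disabling the synthesis transition leaves the marking unchanged, which is the kind of comparison a knockout experiment on such a model would make.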

Long Abstract

 

 

40A. Modeling Mutations, Abnormal Processes, and Disease Phenotypes, using a Workflow/Petri Net Model.

Mor Peleg, Irene S. Gabashvili and Russ B. Altman. Stanford Medical Informatics, Stanford University, Stanford, CA, USA.

peleg@smi.stanford.edu

 

We developed a qualitative model of molecular function that links genetic polymorphisms of tRNA to affected cellular processes and disease phenotypes. Our model is based on a Workflow model that is mapped to Petri Nets, and incorporates controlled biomedical vocabularies. It enables querying and simulation. The model is available at http://www.smi.stanford.edu/people/peleg/Process_Model.htm.

Long Abstract

 

 

41A. FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps.

Julie Dickerson1, Zach Cox1 and Andy Fulmer2. 1Iowa State University and 2Procter & Gamble.

julied@iastate.edu

 

The FCModeler tool uses fuzzy methods for modeling signal transduction, gene regulatory, and metabolic networks and interprets the results using fuzzy cognitive maps. The front end of the system is a dynamic graph visualization system that uses graph theoretic models to analyze the structure of metabolic maps and search for pathway interactions.

Long Abstract

 

 

42A. CELLML: a Language for the Definition and Exchange of Cellular Models.

Bullivant, DP, Cuellar, AA, Hedley, WJ, Hunter, PJ, Nelson, MR and Nielsen, PF. The Bioengineering Institute, The University of Auckland, New Zealand and Physiome Sciences, Inc, Princeton, New Jersey.

a.cuellar@auckland.ac.nz

 

CellML is an XML-based exchange format for describing the structure and underlying mathematics of cellular level models. The language specification has been developed by the Bioengineering Institute at the University of Auckland, in collaboration with Physiome Sciences, Inc. The language specification, tools, and examples are available on http://www.cellml.org.

Long Abstract

 

 

43A. BIOCAD for Constructing Gene Regulatory Networks.

Hiroyuki Kurata. Kyushu Institute of Technology.

kurata@bse.kyutech.ac.jp

 

BIOCAD is a powerful software suite with a GUI for constructing large-scale maps of complicated biochemical reaction networks, in which chemical reaction equations with detailed attribute tags are expressed in an XML-based common representation. A novel notation for designing the map is developed by improving on Kohn's method.

Long Abstract

 

 

44A. Probabilistic Boolean Networks as Models of Gene Regulatory Networks: From Inference to Intervention.

Ilya Shmulevich1, Edward Dougherty2, Seungchan Kim3 and Wei Zhang1. 1University of Texas MD Anderson Cancer Center, 2Texas A&M University and 3NHGRI/NIH.

is@ieee.org

 

We introduce Probabilistic Boolean Networks (PBN) as models of gene regulatory networks. PBNs incorporate rule-based dependencies between genes, allow the systematic study of global network dynamics, are able to cope with uncertainty, and permit the quantification of the relative influence of genes on other genes.
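
The following is a minimal sketch of the general idea of a probabilistic Boolean network, assuming each gene has several candidate Boolean predictors with selection probabilities and that all genes update synchronously. The three-gene network, its rules, and the influence calculation (which assumes all states are equally likely) are hypothetical illustrations, not the authors' implementation.

```python
import random
from itertools import product

# Each gene maps to a list of (probability, predictor) pairs; a predictor
# takes the current state (dict gene -> 0/1) and returns the next value.
# Gene names and rules are hypothetical.
network = {
    "A": [(1.0, lambda s: s["A"])],
    "B": [(0.7, lambda s: s["A"] and s["C"]),
          (0.3, lambda s: s["A"] or s["C"])],
    "C": [(1.0, lambda s: 1 - s["B"])],
}

def step(state, network, rng):
    """Synchronous update: pick one predictor per gene at random and apply it."""
    nxt = {}
    for gene, predictors in network.items():
        weights = [p for p, _ in predictors]
        funcs = [f for _, f in predictors]
        chosen = rng.choices(funcs, weights=weights, k=1)[0]
        nxt[gene] = int(bool(chosen(state)))
    return nxt

def influence(network, source, target):
    """Probability that flipping `source` changes the next value of `target`,
    averaged over all states (assumed equally likely) and over predictors."""
    genes = sorted(network)
    states = [dict(zip(genes, bits))
              for bits in product((0, 1), repeat=len(genes))]
    total = 0.0
    for prob, f in network[target]:
        changed = 0
        for s in states:
            flipped = dict(s)
            flipped[source] = 1 - s[source]
            if bool(f(s)) != bool(f(flipped)):
                changed += 1
        total += prob * changed / len(states)
    return total

rng = random.Random(0)
next_state = step({"A": 1, "B": 0, "C": 1}, network, rng)
a_on_b = influence(network, "A", "B")
c_on_a = influence(network, "C", "A")
```

Here `influence` gives one simple way to compare how strongly different genes affect a target; a real analysis would use the state distribution of the inferred network rather than a uniform one.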

Long Abstract

 

 

45A. In silico Studies of Integrated Gene Expression, Protein, and Metabolite Profiles.

Matej Oresic and Tom Plasterer. Beyond Genomics, Inc.

moresic@beyondgenomics.com

 

We present a systems biology platform for analysis of metabolite, protein, and gene expression data. Our approach includes unique normalization and integration of data, as well as data- and knowledge-driven pathway analysis, which in turn, leads to new hypotheses and further experiments.

Long Abstract

 

 

46A. Modeling and Visualization of the Pattern Formation in Drosophila melanogaster by Genomic Object Net.

Hiroshi Matsuno1, Rie Yamane1, Sachie Fujita1, Naoyuki Yamasaki1, Ryutaro Murakami1 and Satoru Miyano2. 1Faculty of Science, Yamaguchi University and 2Human Genome Center, Institute of Medical Science, University of Tokyo.

matsuno@sci.yamaguchi-u.ac.jp

 

Genomic Object Net is a biosimulation system based on the hybrid functional Petri net architecture and XML technology. With this system, we model two pattern formations by Notch signaling in Drosophila melanogaster and give an intuitive visualization of the simulation results. Genomic Object Net can be accessed through http://www.GenomicObject.Net.

Long Abstract

 

 

47A. Generating Petri Nets for Metabolic Network Modelling.

Rainer König, Marco Weismüller and Roland Eils. Intelligent Bioinformatics Systems, German Cancer Research Center, 69120 Heidelberg, Germany.

r.koenig@dkfz.de

 

To define organism-specific, detailed metabolic Petri nets, we compile a metabolic net by combining the KEGG metabolic database, which supplies enzymatic reactions and classifications of metabolic subnets, with the sequence-based databases Swissprot and Embl/Genbank, which supply enzyme locations and organism-specific abundances. Furthermore, we compare the net's connectivity with that of scale-free networks.

Long Abstract

 

 

48A. Design and Implementation of a Knowledge-Base for Pharmacology.

George Acquaah-Mensah and Larry Hunter. Center for Computational Pharmacology, University of Colorado School of Medicine, Denver, Colorado, USA.

George.Acquaah-Mensah@uchsc.edu

 

We present an object-oriented knowledge representation that captures key pharmacological concepts: ligands, proteins, processes and anatomy. Ligands interact in myriad ways with biomolecules (such as proteins), initiating, altering or even terminating diverse physiological and behavioral processes. These interactions lie at the heart of events of significance to pharmacology.

Long Abstract

 

 

49A. Identifying Functionally Important Protein Regions via Protein Family Correlation Analysis and Atomistic Energy Calculations.

Manish C. Saraf, Gregory L. Moore and Costas D. Maranas. Dept. of Chemical Engineering, Pennsylvania State University.

costas@psu.edu

 

A key challenge in using combinatorial libraries for protein engineering is that most of the library members are non-functional and often do not even fold correctly. We elucidate favorable/unfavorable residue combinations by (i) analyzing protein families for correlation and (ii) constructing sequence ensembles based on internal energy calculations with CHARMM.

Long Abstract

 

 

50A. Regulatory Network of Transcriptional Regulation in E. coli.

Gama-Castro S., A. Martínez-Antonio, R. Gutiérrez-Ríos, H.P. Salgado, M. Spínola, A. Santos-Zavaleta and J. Collado-Vides. Program of Computational Genomics, CIFN, UNAM, A.P. 565-A, Cuernavaca, Morelos 62100, México.

sgama@cifn.unam.mx

 

Using the information contained in the RegulonDB database, we obtained the regulatory network of transcriptional regulators in E. coli. The interactions found in the network were analyzed in terms of known operons and promoters, and in terms of the conditions of expression and repression of the regulated genes.

Long Abstract

 

 

51A. An in silico Experimental Device for Drug Transport Research.

Yu Liu, Carolyn Cummins and C. Anthony Hunt. UCSF/UCB Joint Graduate Group in Bioengineering, University of California, Berkeley, CA 94720, USA and Department of Biopharmaceutical Sciences, University of California, San Francisco, CA 94143, USA.

yuliu@socrates.berkeley.edu

 

We use an agent-based model as an in silico experimental device that represents a mammalian intestinal epithelium. It mimics key features of the in vitro Caco-2 epithelial cell model. This innovative device has been designed to yield drug transport data that matches experimental data when given the drug’s physicochemical properties.

Long Abstract

 

 

52A. A Domain-Specific Ontology and Knowledge Base for Signal Transduction.

Jens Eberlein and Lawrence Hunter. Center for Computational Pharmacology, University of Colorado School of Medicine.

jens.eberlein@uchsc.edu

 

We have developed a rich knowledge base of known signal transduction pathways in order to (1) facilitate the interpretation of gene expression data, (2) to provide prior probabilities over the structure and content of probabilistic network models, and (3) to provide knowledge useful in natural language understanding.

Long Abstract

 

 

53A. Pathway Reconstruction and Semantic Data Integration.

Roland Carel, Ph.D., Krzysztof Jezak, Russ Green and Jack Pollard, Ph.D. 3rd Millennium.

jpollard@3rdmill.com

 

We will describe a semantic integration technology driven by ontologies for researchers analyzing biological pathways. This technology is capable of extracting and integrating data from a variety of genomic, interaction, and pathway sources and allows users to define their own integration processes.

Long Abstract

 

 

54A. Genomic Object Net (Ver.1.0): A Platform for Biopathway Modeling and Simulation.

Nagasaki, M., A. Doi, M. Sasaki, C.J. Savoie, H. Matsuno and S. Miyano. University of Tokyo.

miyano@ims.u-tokyo.ac.jp

 

Genomic Object Net has been revised and reimplemented from scratch in Java so that it works as a general platform for biopathway modeling and simulation. The new version employs an extension of the hybrid functional Petri net architecture, XML pathway/data documentation, and a GUI that allows more biologically intuitive usage.

Long Abstract

 

 

55A. Modeling the Blood-Brain Barrier and Transporter Expression.

Amina Qutub1, Tomoki Hashimoto2 and C. Anthony Hunt1,3. 1UCSF/UCB Joint Graduate Group in Bioengineering, Berkeley and San Francisco, 2UCSF Center for Cerebrovascular Research and 3Department of Biopharmaceuticals, UCSF San Francisco, CA, USA.

aminaq@socrates.berkeley.edu

 

This research presents a new computer-based experimental model that simulates transporter expression and function at the blood-brain barrier membrane. Specifically, this project focuses on modeling the essential parameters in the transport of glucose to the brain through the GLUT1 membrane transporter.

Long Abstract

 

 

56A. Distributed Agent-Based Software Architectures for Bio-Pathway Simulation.

Doheon Lee1, Kwang-Hyung Lee1 and Yonggwan Won2. 1Department of BioSystems, KAIST, Daejeon, Korea and 2Department of Computer Engineering, Chonnam Nat’l Univ. Gwangju, Korea.

dhlee@mail.kaist.ac.kr

 

This paper proposes software architectures for bio-pathway simulation based on distributed agent technology. The fundamental advantages of distributed agents are their effectiveness in handling heterogeneous pathway information and efficiency in scalability. They also utilize XML to represent semi-structured data such as quantitative reaction rules, and adopt Petri nets to model concurrent pathway executions.

Long Abstract

 

 

57A(i). Robustness in Models of the MAPK-cascade.

Nils Bluethgen and Hanspeter Herzel. Theoretical Biology, Humboldt University Berlin.

nils.bluethgen@itb.biologie.hu-berlin.de

 

Using dynamical models we analyse the robustness of features of signaling cascades. This allows us to predict in silico the function of a cascade. We show that the well conserved MAPK-cascade shows highly robust ultrasensitivity suggesting that the cascade works as a switch in the intracellular signaling network.
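
Ultrasensitivity is often summarized with a Hill-type response curve. The sketch below uses that common phenomenological description, which is an assumption here and not necessarily the dynamical model used by the authors. It shows why a steep response behaves like a switch: the input fold-change needed to move from 10% to 90% of the maximal output shrinks from 81 for a hyperbolic response (n = 1) to 3 for n = 4.

```python
def hill(x, k=1.0, n=1.0):
    """Hill response: fraction of maximal output at input level x.

    k is the half-maximal input and n the Hill coefficient (steepness).
    """
    return x**n / (k**n + x**n)

def ec_ratio(n):
    """Input fold-change needed to go from 10% to 90% of maximal response.

    Solving hill(x) = f gives x = k * (f / (1 - f)) ** (1 / n),
    so EC90 / EC10 = 81 ** (1 / n), independent of k.
    """
    return 81.0 ** (1.0 / n)

hyperbolic = ec_ratio(1)      # graded response
switch_like = ec_ratio(4)     # steep, switch-like response
half_max = hill(1.0, k=1.0, n=5)
```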

Long Abstract

 

 

57A(ii). The BioCyc Collection of Pathway/Genome Databases.

Peter D. Karp, Cindy Krieger, Suzanne Paley, John Pick and Pedro Romero. SRI International.

pkarp@ai.sri.com

 

BioCyc is a collection of Pathway/Genome Databases (PGDBs) that are available at the SRI Web site (http://BioCyc.org), and for local installation. The BioCyc collection includes the EcoCyc E. coli database, the MetaCyc database containing 450 metabolic pathways from 150 different organisms, and PGDBs from 12 additional microbes.

Long Abstract

 

 

Functional Genomics.

 

58A. No poster.

59A. The European Comparative Genetic Resource (EuroCOMP).

60A. Haplotype Variation in Human G Protein-Coupled Receptor (GPCR) Genes.

61A. Automated Analysis of MALDI-TOF Mass Spectrometry SNP Genotyping Data.

62A. Prediction of Protein Function and Interaction from Complete Genomes.

63A. In-silico Genomics: A Bioinformatic Analysis of Retinoblastoma-specific RB1 Mutational Spectra.

64A. Detection of Regulatory Circuits by Integration of Protein-Protein and Protein-DNA Interaction Data.

65A. GoPArc.

66A. ProDB: Integrating Proteome and Genome Data.

67A. GTKrio: an Open Source Environment for Functional Genomics.

68A. Interaction Generality, a Measurement to Assess Reliability of Protein-Protein Interaction.

69A. GelScape: An Interactive Web-based Gel Viewing and Annotation System.

70A. ESDB: A Web-Based Application for Analyzing and Managing DNA Sequences Disrupted By Tagged Sequence Mutagenesis From Mouse Embryonic Stem Cells.

71A. caCORE: A Package of Object Models, Databases, Controlled Vocabularies, and APIs for Genomic and Clinical Application Development.

72A. IPPRED: Server for Protein Interactions Inference.

73A. "Gene Discovery": Search for Regularities in Gene Promoters.

74A. The International Rice Information System: A Platform for Meta-Analysis of Rice Data.

75A. How many SNPs Do We Need for Whole-Genome Linkage Disequilibrium Mapping?

76A. Bioinformatics in a Fully Automated Cellular Perturbation Environment for the Identification of Medically Relevant Genes.

77A. The Rice Growth Monitoring System for Phenotypic Functional Analysis.

78A. Mapping DNA Regulatory Sequences to a Metabolic Network.

79A. ADDA: A Novel Method for Partitioning Protein Sequences into Domains.

80A. Protein-Protein Interaction Analysis of Transcription Factors and Its use for the Identification of Cooperatively Acting Transcription Factors.

81A. The Eukaryotic Core Proteome.

82A. Alternative Splice Variants as Natural Competitive Inhibitors of Known Proteins.

 

58A. No poster.

 

59A. The European Comparative Genetic Resource (EuroCOMP).

Jitka Sengerova1, Hrabe de Angelis M.2, Beckers J.2, Blanquet V.2, Fuchs H.2, Hahn A.2, Schaeble K.2, Schneider R.2, Soewarto D.2, Tiedemann H.2, Werner T.2, Peters J.3, Greenfield A.3, Nolan P.3, Herault Y.4 and Scherf M.5. 1EMBL-EBI, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, 2GSF Munich, 3MRC Harwell, 4CDTA-CNRS Orleans and 5Genomatix Munich.

jitka@ebi.ac.uk

 

The mouse has become an important model organism for understanding the function of novel genes and is an invaluable source for the investigation of human disease processes. EuroCOMP aims to provide the scientific community with new mutations created in a large-scale phenotype-driven mutagenesis program. Relevant information will be accessible through the WWW.

Long Abstract

 

 

60A. Haplotype Variation in Human G Protein-Coupled Receptor (GPCR) Genes.

Ruhong Jiang, Debra Tanguay, Zongliang Mu, Ping Zhan, Manish Pungliya, Julie Schneider, Min Wei, Carole Harris-Kerr, Jicheng Duan, Krishnan Nandabalan, J. Claiborne Stephens, and Chuanbo Xu. Genaissance Pharmaceuticals, Inc. New Haven, CT 06511, USA.

r.jiang@genaissance.com

 

We have investigated haplotype variation in human GPCR genes by using a bioinformatics approach. Our findings showed that GPCR genes have a substantial amount of genetic variability in both coding and noncoding regions. An understanding of the nature of this genetic variability has important implications for drug development and optimization.

Long Abstract

 

 

61A. Automated Analysis of MALDI-TOF Mass Spectrometry SNP Genotyping Data.

Jan-Henner Wurmbach1, Jens Decker2, Thomas Schott1, Markus Kostrzewa2, Herbert Thiele1 and Wolfgang Pusch1. 1Bruker Daltonik Fahrenheitstrasse 4 28359 Bremen Germany and 2Bruker Saxonia Analytik Permoser Strasse 15 04318 Leipzig Germany.

WPU@bdal.de

 

Bioinformatic tools can follow various approaches for automated SNP genotyping from MALDI-TOF mass spectrometry raw data. Typically, the mass spectra displaying allele-specific detector molecules are analyzed by classical peak-picking based algorithms. However, for complex multiplex spectra we also pursue fuzzy-logic and fast Fourier transformation/correlation approaches.

Long Abstract

 

 

62A. Prediction of Protein Function and Interaction from Complete Genomes.

Bingding Huang and Yixue Li. Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences.

markero@263.net

 

The fully sequenced genomes of over 50 organisms have led to rapid growth of sequence and related databases, leaving a vast number of genes unannotated. For genome projects to be successful, there must be fast and reliable ways to identify the functions of unknown proteins. Various new computational methods have been proposed to predict protein function and protein-protein interaction from genome sequences. Here, we review several approaches for the functional assignment of uncharacterized proteins and then present a novel and effective method to detect protein-protein interactions from completely sequenced genomes based on gene fusion events.
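
The general gene fusion idea can be sketched as follows: if two separate proteins in one genome both match different parts of a single fused protein in another genome, they are predicted to interact. This is a minimal illustration under the assumption that proteins have already been reduced to sets of domain or family identifiers. The function name, the data layout, and the example proteins are hypothetical, and the authors' own method may differ.

```python
from itertools import combinations

def fusion_links(query_proteins, reference_proteins):
    """Predict interacting pairs in the query genome from fused reference proteins.

    Both arguments map a protein name to a set of domain/family identifiers.
    Returns a set of (protein_a, protein_b, fused_protein) triples.
    """
    links = set()
    for fused, fused_domains in reference_proteins.items():
        # Query proteins whose domains all occur inside the fused protein.
        hits = [name for name, doms in query_proteins.items()
                if doms and doms <= fused_domains]
        for a, b in combinations(sorted(hits), 2):
            # Require the two query proteins to match different parts.
            if not (query_proteins[a] & query_proteins[b]):
                links.add((a, b, fused))
    return links

# Hypothetical example data.
query = {"protA": {"D1"}, "protB": {"D2"}, "protC": {"D3"}}
reference = {"fusedAB": {"D1", "D2"}, "single": {"D3"}}

predicted = fusion_links(query, reference)
```

In practice the domain sets would come from sequence similarity searches, and additional filtering is usually needed to remove promiscuous domains that occur in many unrelated proteins.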

Long Abstract

 

 

63A. In-silico Genomics: A Bioinformatic Analysis of Retinoblastoma-specific RB1 Mutational Spectra.

S. Lithwick1, A. Fadiel2, A. J. Cuticchia1 and B. Gallie3. 1University of Toronto, Toronto, Canada, 2Centre for Computational Biology, Hospital for Sick Children, Toronto, Canada and 3Ontario Cancer Institute, Princess Margaret Hospital, Toronto, Canada.

stuart.lithwick@utoronto.ca

 

Non-coding regions of the RB1 gene have been examined using bioinformatics techniques to identify regulatory sequences that might be subject to mutation in certain retinoblastoma tumors, which lack coding sequence changes. Also, germline and somatic RB1 mutational datasets have been compared and contrasted to identify tissue-specific patterns.

Long Abstract

 

 

64A. Detection of Regulatory Circuits by Integration of Protein-Protein and Protein-DNA Interaction Data.

Esti Yeger-Lotem1,2 and Hanah Margalit2. 1Department of Computer Science, Technion, Haifa 32000, Israel and 2Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, POB 12272, Jerusalem 91120.

estiy@cs.technion.ac.il

 

A major post-genomic challenge is to reveal the interplay between genes and proteins within a living cell. Using a novel application of classical graph algorithms we integrate data of yeast protein-protein and protein-DNA interactions, and exploit it for the discovery of simple and complex multi-level regulatory circuits.
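
One simple pattern that combines the two data types is a pair of physically interacting proteins that both regulate the same gene. The sketch below searches for that single pattern only. It is an assumed example of integrating an undirected protein-protein graph with a directed protein-DNA graph, with hypothetical names and data, and it does not reproduce the authors' algorithms or their full set of circuits.

```python
from itertools import combinations

def coregulating_partners(ppi_edges, regulation_edges):
    """Find (tf1, tf2, gene) where tf1 and tf2 interact and both regulate gene.

    ppi_edges: iterable of (protein, protein) pairs (undirected).
    regulation_edges: iterable of (regulator, target_gene) pairs (directed).
    """
    ppi = {frozenset(edge) for edge in ppi_edges}
    regulators = {}
    for tf, gene in regulation_edges:
        regulators.setdefault(gene, set()).add(tf)
    circuits = []
    for gene, tfs in sorted(regulators.items()):
        for a, b in combinations(sorted(tfs), 2):
            if frozenset((a, b)) in ppi:
                circuits.append((a, b, gene))
    return circuits

# Hypothetical example data.
ppi_edges = [("TF1", "TF2"), ("TF2", "P3")]
regulation_edges = [("TF1", "g1"), ("TF2", "g1"), ("TF2", "g2"), ("P3", "g3")]

circuits = coregulating_partners(ppi_edges, regulation_edges)
```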

Long Abstract

 

 

65A. GoPArc.

Daniela Bartels, Alexander Goesman and Folker Meyer. Bielefeld University, 33594 Bielefeld, NRW, Germany.

daniela.bartels@genetik.uni-bielefeld.de

 

We present a comprehensive framework for the integration of gene ontologies and metabolic pathways. GoPArc provides an object oriented API to view genome, transcriptome and proteome data from the perspective of GO categories, TIGR roles, Monica Riley categories and KEGG pathways. The system is based on a relational database and offers an extensible interface for GenDB, EMMA and other systems.

Long Abstract

 

 

66A. ProDB: Integrating Proteome and Genome Data.

Andreas Wilke, Christian Rückert and Folker Meyer. Center for Genome Research, Bielefeld University, Germany.

Andreas.Wilke@genetik.uni-bielefeld.de

 

We have developed an open source system that acts as a connection layer between mass spectrometry data and the GenDB (http://gendb.genetik.uni-bielefeld.de) annotation system. The system allows analysis of the data with Mascot, and results are automatically presented to the user. The system is based on a relational database backend for storing mass spectra together with experimental data.

Long Abstract

 

 

67A. GTKrio: an Open Source Environment for Functional Genomics.

A. Splendiani, C. Vizzardelli, N. Pavelka, M. Pelizzola, M. Capozzoli, F. Granucci, P. Ricciardi-Castagnoli, E. Virzi and P. Fantucci. Univ. Milano Bicocca.

andrea.splendiani@unimib.it

 

We propose an open source project for gene expression data analysis and functional genomics. It will be built around a set of technologies that will allow cross-platform capabilities without sacrificing performance, and will be open and allow the integration of software written in many languages.

Long Abstract

 

 

68A. Interaction Generality, a Measurement to Assess Reliability of Protein-Protein Interaction.

Harukazu Suzuki, Rintaro Saito and Yoshihide Hayashizaki. Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), Yokohama, Japan.

harukazu@gsc.riken.go.jp

We introduce the “interaction generality” measure, which can be used to computationally assess the reliability of the protein-protein interaction data by using only a list of interactions. We also report the results of networks of interaction data that we made more reliable by applying this method.

Long Abstract

 

 

69A. GelScape: An Interactive Web-based Gel Viewing and Annotation System.

Nelson Young, Zhan Chang and David S. Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta.

ny1@ualberta.ca

 

GelScape is an interactive Java-based application that offers a wide variety of tools to annotate, view, and archive 1D or 2D gels. In addition, GelScape has comprehensive image manipulation capabilities that permit spot quantification as well as warping and matching of gels.

Long Abstract

 

 

70A. ESDB: A Web-Based Application for Analyzing and Managing DNA Sequences Disrupted By Tagged Sequence Mutagenesis From Mouse Embryonic Stem Cells.

Liu S-Y, Mou Y, Delange L, Tsuyuki D, Arapovic D and Hicks GG. Manitoba Institute of Cell Biology, University of Manitoba, Winnipeg, Manitoba.

soliu@cc.umanitoba.ca

 

ESDB was developed to organize information on thousands of new ES cell clones and improve the sequence analysis strategy to identify target genes from short sequence tags. The application will automate data entry, data validation, Blast results, keywords, and hyperlinks to gene-specific identifiers in GenBank, LinkOut, Pubmed, and OMIM.

Long Abstract

 

 

71A. caCORE: A Package of Object Models, Databases, Controlled Vocabularies, and APIs for Genomic and Clinical Application Development.

Peter A. Covitz1, Himanso Sahni2, Scott Gustafson2, Frank Hartel1, Sherri De Coronado1, Gilberto Fragoso1, Jean-Jaques Maurer3, Lisa Chatterjee3, Carl Schaefer1 and Kenneth Buetow1. 1National Cancer Institute Center for Bioinformatics, Rockville, MD, USA; 2Science Applications International Corporation, Annapolis, MD, USA; 3Oracle Corporation, Reston, VA, USA.

covitzp@mail.nih.gov

 

caCORE is a combination of largely open source technologies that support data management, access and vocabulary control for genomic, biological pathway, and clinical trials research. caCORE is intended to support biomedical applications that bridge genomics to clinical research. caCORE and an example of such a bridging application will be presented.

Long Abstract

 

 

72A. IPPRED: Server for Protein Interactions Inference.

Nicolas Goffard, Virginie Garcia, Alexis Groppi and Antoine de Daruvar. Centre de Bioinformatique Bordeaux, Université V. Segalen, Bordeaux 2, Bordeaux, France.

nicolas.goffard@pmtg.u-bordeaux2.fr

 

IPPRED is a Web-based server for inferring protein interactions. This simple inference by homology makes it possible to propose or validate potential interactions. In some cases, the inference also gives indications concerning the domains involved in the interaction. IPPRED is available at http://cbi.labri.fr/ippred.

Long Abstract

 

 

73A. "Gene Discovery": Search for Regularities in Gene Promoters.

Yury L. Orlov, Mikhail A. Pozdniakov, Nikolay A. Kolchanov and Eugenii E. Vityaev. Institute of Cytology and Genetics, Novosibirsk, Russia.

orlov@bionet.nsc.ru

 

The PC software system "Gene Discovery" discovers regularities connecting nucleotide sequences of promoter regions with the functional class of corresponding genes. The system constructs specific oligonucleotide patterns as first-order logic expressions. These patterns selected for co-regulated genes annotated in the TRRD database (http://www.mgs.bionet.nsc.ru/mgs/gnw/trrd/) predict promoters with high specificity.

Long Abstract

 

 

74A. The International Rice Information System: A Platform for Meta-Analysis of Rice Data.

R. Bruskiewich, A. Cosico, W. Eusebio, A. Portugal, L. Ramos, T. Reyes, V. Ulat and C. G. McLaren. International Rice Research Institute (IRRI), DAPO 7777, Metro Manila, Philippines. http://www.irri.org

r.bruskiewich@cgiar.org

 

The International Rice Information System (IRIS, http://www.iris.irri.org) is the rice implementation of the International Crop Information System (ICIS, http://www.cgiar.org/icis), a database system for the management and integration of global information on breeding pedigrees and field characterization for any crop. IRIS is now being extended to rice functional genomics.

Long Abstract

 

 

75A. How many SNPs Do We Need for Whole-Genome Linkage Disequilibrium Mapping?

Maido Remm and Andres Metspalu. Estonian Biocentre and University of Tartu, ESTONIA.

mremm@ebc.ee

 

Using Chr21 data from Patil et al. Science 294:1719, we calculated haplotype block length distributions. Using these distributions, we simulated random haplotype blocks and estimated how many haplotype blocks and how many SNPs would be required to cover all exons in the whole human genome.

Long Abstract

 

 

76A. Bioinformatics in a Fully Automated Cellular Perturbation Environment for the Identification of Medically Relevant Genes.

S. Röhrig, A. Spychaj, R. Korn, A. Felber, R. Köckerbauer, B. Kesper and C. Hergersberg. Xantos Biomedicine AG, Fraunhoferstr. 22, D-82152, Martinsried, Germany.

s.roehrig@xantos.de

 

Discovery of medically relevant genes is our foremost interest. To this end we combine the screening of human cDNA libraries using our cell based and fully robotics assisted XantoScreen™ technology with expression profiling techniques and the development of an innovative bioinformatics platform.

Long Abstract

 

 

77A. The Rice Growth Monitoring System for Phenotypic Functional Analysis.

Takanari Tanabata1, Toru Ishizuka2, Makoto Takano3 and Tomoko Shinomura2. 1Hitachi Research Laboratory, 2Hitachi Central Research Laboratory and 3National Institute of Agrobiological Sciences.

ttanaba@hrl.hitachi.co.jp

 

We are developing an automatic digital imaging system for acquiring the plant growth measurements necessary for detailed physiological/phenotypic analysis of germinating rice seedlings. By comparing WT and a phyA mutant, we were able to calculate differential growth rates of the coleoptile and the 1st and 2nd leaves.

Long Abstract

 

 

78A. Mapping DNA Regulatory Sequences to a Metabolic Network.

Laurence Ettwiller, Johan Rung and Ewan Birney. EBI, Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

ettwille@ebi.ac.uk

 

We present a method that uncovers mappings from yeast regulatory sequences to protein function. About a hundred patterns show significant overlap between the network of genes linked by a pattern and the metabolic network, suggesting that proteins acting on the same compounds have significantly higher chances of being regulated in synergy.

Long Abstract

 

 

79A. ADDA: A Novel Method for Partitioning Protein Sequences into Domains.

Andreas Heger and Liisa Holm. EMBL-EBI.

heger@ebi.ac.uk

 

ADDA is an algorithm for delineating domain boundaries in protein sequences. Domain boundaries in a query sequence are optimised in context with its BLAST neighbours. This avoids oversplitting caused by truncated local alignments. http://www.ebi.ac.uk/sgg.

Long Abstract

 

 

80A. Protein-Protein Interaction Analysis of Transcription Factors and Its use for the Identification of Cooperatively Acting Transcription Factors.

Ricardo Bringas. Centro de Ingenieria Genetica y Biotecnologia, Ave 31 e/ 158 y 190, Cubanacan, Playa, Havana, Havana, 10600, Cuba.

bringas@cigb.edu.cu

 

Publicly available information on protein-protein interactions in the yeast Saccharomyces cerevisiae is analyzed, focusing on transcription factors. Networks of transcription factor interactions are identified, as well as potential pairs of transcription factors that co-regulate gene expression. Additionally, we have clustered genes according to their interaction patterns and identified transcription regulation complexes.

Long Abstract

 

 

81A. The Eukaryotic Core Proteome.

Roland Krause, Karin Schleinkofer, Anne-Claude Gavin and Georg Casari. Cellzome AG, Meyerhofstr. 1, Heidelberg, 69117, Germany.

roland.krause@cellzome.com

 

Recent large scale studies of protein-protein interactions in Saccharomyces cerevisiae have expanded our knowledge to a comprehensive map of protein cooperation. Using the genome sequence of many eukaryotes we can extrapolate the shared eukaryotic core proteome to other organisms. In particular we studied human disorders for potential points of intervention.

Long Abstract

 

 

82A. Alternative Splice Variants as Natural Competitive Inhibitors of Known Proteins.

Erez Levanon, Dvir Dahary and Zurit Levine. Compugen LTD, Tel-Aviv, Israel.

erez@compugen.co.il

 

Alternative splicing variants that lack functional domains that exist in other variants of the same gene may act as natural competitive inhibitors. We use Compugen's LEADS platform with the human genome and the EST database to find such alternative splicing variants.

Long Abstract

 

 

 

Structural Biology.

 

83A. The SRS 3D Module: a New View of Structures, Integrating Sequences and Annotations.

84A. Consistency Matrices: Quantified Structure Alignments for Sets of Related Proteins.

85A. Exploration of Functional Sites in Complex RNA Folds and Macromolecular Assemblages.

86A. Structural Modeling for the Exploration of the Evolution of the Basic Helix-Loop-Helix Proteins.

87A. A Pattern-based Approach to Protein Feature Space: Use in Discrimination of Protein Fold.

88A. Side-Chain Freedom Analysis of Protein-Protein Interactions.

89A. ICBS: A Database of Protein-protein Interactions Mediated by Interchain Beta-sheet Formation.

90A. An Investigation of Domain: Domain Interactions Using The Pfam Database.

91A. TOPS: The database of the Topology of Protein Structures.

92A. Side chain flexibility for 1:n protein-protein docking.

93A. Protein Substructure Comparison: An Efficient Combinatorial Approach.

94A. Identification and Automating Calculation of Homologous Core Structures.

95A. Kernel Model Derived from Simplicial Contact Edges for Protein Folding.

96A. MALECON: Multiple Protein Structural Alignment by a Step-Wise Multi-Solution Approach that Maximizes the Number of Spatially Equivalent Residues.

97A. Homology Modelling of the AdoMetDC Domain of the Bi-functional Ornithine-Decarboxylase / S-adenosylmethionine Decarboxylase Enzyme from Plasmodium falciparum.

98A. An Analysis of Protein Domain Linkers: their Classification and Role in Protein Folding.

99A. Chemical Shift Threading – A Direct Approach to Determining Protein Structure from Chemical Shift Data.

100A. Hidden Markov Models for Protein Recurrent Core Packing Arrangements.

101A. ConSurf: A Server for the Identification of Functional Regions in Proteins by Surface-Mapping of Phylogenetic Information.

102A. Incorporating Sequence and Biochemical Information in Topological Models of Protein Structure Towards The Structural and Functional Genomics.

103A. Incorporation of Biochemical Knowledge in Geometric Hashing: a Statistical Assessment.

 

83A. The SRS 3D Module: a New View of Structures, Integrating Sequences and Annotations.

Sean O’Donoghue, Joachim E.W. Meyer, Andrea Schafferhans and Karsten Fries. LION Bioscience AG, Waldhoferstr. 98, Heidelberg, 69117, Germany.

sean.odonoghue@lionbioscience.com

 

SRS 3D allows users to easily find all structural data for a target sequence, select appropriate structures, and visualize structures with annotations from other databases. Currently, SRS 3D provides structural information for 120,000 sequences, 330,000 SwissProt sequence features, and 1.5 million InterPro domain annotations. Other annotation databases can be integrated.

Long Abstract

 

 

84A. Consistency Matrices: Quantified Structure Alignments for Sets of Related Proteins.

Ivo Van Walle1, I. Lasters2 and L. Wyns1. 1Dept. of Ultrastructure, Vrije Universiteit Brussel, Paardenstr. 65, Sint-Genesius-Rode, 1640, Belgium, 2Algonomics NV, www.algonomics.com.

ivwalle@vub.ac.be

 

Consistency matrices describe the comparability of two proteins in a more informative way than an alignment of their residues. They are derived from a pseudo multiple structure alignment and can quantify the spatial conservation of residue positions. Among other things, they can be used for threading and protein structure classification.

Long Abstract

 

 

85A. Exploration of Functional Sites in Complex RNA Folds and Macromolecular Assemblages.

D. Rey Banatao. UCSF/Stanford University, 19 Belleau Ave., Atherton, CA, 94027, USA.

banatao@smi.stanford.edu

 

We describe a novel approach for characterization of functional sites, particularly metal binding sites in complex RNA structures and assemblies. Identifying metal binding sites in RNA is crucial to understanding its structure and function. This method could potentially be applied to characterization of other RNA sites such as RNA-protein and RNA-small molecule interactions.

Long Abstract

 

 

86A. Structural Modeling for the Exploration of the Evolution of the Basic Helix-Loop-Helix Proteins.

Michael J. Buck and William R. Atchley. Department of Genetics, North Carolina State University, Raleigh, NC 27695-7614.

mjbuck@unity.ncsu.edu

 

We address whether 3D models can be used to determine a detailed function for uncharacterized members of the bHLH family, beyond what can be learned from sequence alone. We have developed two structural comparison techniques for comparing models, which allow us to detect structural and functional relationships that remain hidden when only sequence is used.

Long Abstract

 

 

87A. A Pattern-based Approach to Protein Feature Space: Use in Discrimination of Protein Fold.

Josef Panek. Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia.

j.panek@imb.uq.edu.au

 

A feature-space approach for the automated investigation of the properties of protein groups was developed. The approach constructs the feature space from patterns of feature attributes of protein sequences, based on the physical, chemical and structural properties of amino acids. It allows fold recognition and produces folding rules.

Long Abstract

 

 

88A. Side-Chain Freedom Analysis of Protein-Protein Interactions.

Christian Cole and Jim Warwicker. Department of Biomolecular Sciences, UMIST, Manchester, United Kingdom.

c.cole@umist.ac.uk

 

Protein-protein interactions are crucial to many biological processes. However, our understanding of specific contacts is limited, and such contacts are difficult to predict. Side-chain conformational entropy is determined and employed to extend current shape complementarity methods. This yields further information and potential predictive power regarding the driving forces and specificity of these interactions.

Long Abstract

 

 

89A. ICBS: A Database of Protein-protein Interactions Mediated by Interchain Beta-sheet Formation.

Pierre-Francois Baisnée1, Gianluca Pollastri1, Yann Pécout2, James S. Nowick3 and Pierre Baldi1. 1Department of Information and Computer Science, University of California, Irvine, CA 92697-3430, 2IUP Génie Physiologique et Informatique, University of Poitiers, 86000 Poitiers, France and 3Department of Chemistry, University of California, Irvine, CA 92697-2025.

1pbaisnee@uci.edu, 2gpollast@uci.edu, 3upecout@ics.uci.edu, 4jsnowick@uci.edu, 5pfbaldi@uci.edu.

 

Contacts between the edges of protein beta-sheets play a role in protein-protein interactions that are central to healthy biological function and diseases ranging from AIDS to Huntington's disease. The ICBS database identifies, characterizes and ranks interchain beta-sheet interactions within entries in the Protein Data Bank. The database is available at: http://www.igb.uci.edu/servers/icbs/.

Long Abstract

 

 

90A. An Investigation of Domain: Domain Interactions Using The Pfam Database.

Robert D Finn, Mhairi Marshall and Alex Bateman. The Wellcome Sanger Institute, The Wellcome Genome Campus, Hinxton, Cambs, England CB10 1SA.

rdf@sanger.ac.uk

We have used domains defined in Pfam, a protein domain family database, to investigate structurally interacting domains. The interaction data has been incorporated into Pfam to allow investigation of potentially interacting sequences and the visualisation of interacting residues within a multiple sequence alignment.

Long Abstract

 

 

91A. TOPS: The database of the Topology of Protein Structures.

Ioannis Michalopoulos1, David R. Gilbert2, Gilleain M. Torrance2 and David R. Westhead1. 1School of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK, and 2Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK.

ioannis@bioinformatics.leeds.ac.uk

 

TOPS is a database of protein structure topology. Our relational database holds extended topology data for all solved protein structures, automatically generates TOPS cartoons (topological abstractions of protein structures), and enables a machine-learning technique to define target topological patterns and match domains. TOPS is available at http://www.tops.leeds.ac.uk/.

Long Abstract

 

 

92A. Side chain flexibility for 1:n protein-protein docking.

Kerstin Koch, Steffen Neumann, Frank Zoellner and Gerhard Sagerer. Technical Faculty, Applied Computer Science Department, Bielefeld University.

kerstin@techfak.uni-bielefeld.de

 

During docking, proteins undergo conformational changes. We are investigating bound and unbound structures of proteins to introduce a new measurement for flexibility. New rotamer libraries for complexed and unbound structures are compiled. The probabilities for rotamer changes are investigated according to the likelihood of the unbound rotamer, secondary structure and rotamericity.

Long Abstract

 

 

93A. Protein Substructure Comparison: An Efficient Combinatorial Approach.

Andrew Binkowski, Bhaskar DasGupta and Jie Liang. University of Illinois at Chicago.

dasgupta@cs.uic.edu

 

An alternative approach to heuristic methods such as Monte Carlo search is developed for detecting substructure similarity of proteins. Based on the two-phase algorithm, this combinatorial method has a theoretical performance guarantee and runs quickly. Examples from the PDB will be shown illustrating the effectiveness of this algorithm.

Long Abstract

 

 

94A. Identification and Automating Calculation of Homologous Core Structures.

Jie Chen, Yanli Wang, Aron Marchler-Bauer and Steve H. Bryant. NCBI/NLM/NIH.

chenj@ncbi.nlm.nih.gov

 

A Homologous Core Structure (HCS) is defined, using a comparative method, as an indicator of evolutionary distance. The goal of calculating an HCS automatically is to allow fully automatic distinction between homologs and analogs.

Long Abstract

 

 

95A. Kernel Model Derived from Simplicial Contact Edges for Protein Folding.

Changyu Hu, Xiang Li and Jie Liang. Dept of Bioengineering, University of Illinois at Chicago.

jliang@uic.edu

 

Pairwise contact potentials cannot stabilize native proteins against decoys. Using edge simplices from alpha shape, we have developed a kernel model by SVM training. Our method succeeds in stabilizing a set of 456 proteins against 15 million decoys. It also has good performance on a test set of 204 proteins.

Long Abstract

 

 

96A. MALECON: Multiple Protein Structural Alignment by a Step-Wise Multi-Solution Approach that Maximizes the Number of Spatially Equivalent Residues.

María Elena Ochagavía1,2 and Shoshana Wodak2. 1Center for Genetic Engineering and Biotechnology, Apartado Postal 6162. Ave. 31 e/ 158 y 190, Cubanacán, La Habana 10600, Cuba and 2Service de Conformation de Macromolecules Biologiques et Bioinformatique, Av. F.D. Roosevelt 50, P2- CP 160/16, B-1050 Brussels, Belgium.

ocha@cigb.edu.cu

 

MALECON is a new combinatorial procedure for multiple structural alignments, yielding several alternative solutions. In comparison to other methods, it produces improved definitions of the common structural core in structurally diverse proteins, and if the proteins are too diverse, distinct cores are automatically derived for different protein subsets.

Long Abstract

 

 

97A. Homology Modelling of the AdoMetDC Domain of the Bi-functional Ornithine-Decarboxylase / S-adenosylmethionine Decarboxylase Enzyme from Plasmodium falciparum.

G. Wells, F. Joubert, LM. Birkholtz and A.I. Louw. Department of Biochemistry, University of Pretoria.

gordon@tuks.co.za

 

The two regulatory activities of polyamine biosynthesis (ornithine decarboxylase / S-adenosylmethionine decarboxylase) are usually present in separate proteins. However, in Plasmodium falciparum both activities occur as part of a bi-functional enzyme. The AdoMetDC domain has been modelled using MODELLER 6v1, based on the human crystal structure.

Long Abstract

 

 

98A. An Analysis of Protein Domain Linkers: their Classification and Role in Protein Folding.

Richard A. George and Jaap Heringa. National Institute for Medical Research, The Ridgeway, London, NW7 3RY, UK.

rgeorge@nimr.mrc.ac.uk

 

Recent advances in protein engineering have come from creating multi-functional chimeric proteins containing modules from various proteins. These modules are typically joined via an oligopeptide linker. Here we analyse the properties of naturally occurring inter-domain linkers with the aim of designing linkers for domain fusion. A database of linkers is available via the Internet at http://mathbio.nimr.mrc.ac.uk.

Long Abstract

 

 

99A. Chemical Shift Threading – A Direct Approach to Determining Protein Structure from Chemical Shift Data.

Haiyan Zhang, Albert Leung and David Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Canada.

hzhang@redpoll.pharmacy.ualberta.ca

 

A method for the rapid determination of protein structures that uses only chemical shift data is described. This approach extends the concept of sequence threading and comparative model building to the realm of NMR spectroscopy. The program (known as THRIFTY) is available as a web-based server at http://redpoll.pharmacy.ualberta.ca.

Long Abstract

 

 

100A. Hidden Markov Models for Protein Recurrent Core Packing Arrangements.

Xin Yuan and Christopher Bystroff. Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA.

yuanx2@rpi.edu

 

Recurrent non-local packing arrangements in proteins (hydrophobic cores) can be modeled using modified Hidden Markov Models (HMM) with self-avoiding state pathways, state pair emissions and multiple-emission states. Using a simulated annealing approach, state-state connectivities are defined so that the states have three-dimensional meaning.

Long Abstract

 

 

101A. ConSurf: A Server for the Identification of Functional Regions in Proteins by Surface-Mapping of Phylogenetic Information.

Fabian Glaser1, Tal Pupko2, Inbal Paz1, Eric Martz3 and Nir Ben-Tal1. 1Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel, 2The Institute of Statistical Mathematics, Minami-Azabu, Minato-ku, Tokyo, Japan and 3Department of Microbiology, University of Massachusetts, Amherst MA, USA. http://consurf.tau.ac.il.

fabian@ashtoret.tau.ac.il

 

ConSurf is a web server for the identification of functional regions in proteins of known 3D structure. It uses advanced phylogenetic algorithms to estimate the evolutionary rate of each amino acid site; functional regions are usually composed of slowly evolving residues. ConSurf is available at http://consurf.tau.ac.il.

Long Abstract

 

 

102A. Incorporating Sequence and Biochemical Information in Topological Models of Protein Structure Towards The Structural and Functional Genomics.

Mallika Veeramalai and David Gilbert. Bioinformatics Research Centre, Department of Computing Science, Glasgow University.

mallika@brc.dcs.gla.ac.uk

 

We describe significant algorithm development for the TOPS database that enhances topological protein models with sequence information (structure-based annotated sequences) and important biochemical features such as ligand binding sites and active sites, with the aim of establishing structure-sequence-function relationships. The results should provide valuable information for predicting protein structure and function from sequence, as these problems remain key challenges of direct relevance to projects in structural and functional genomics.

Long Abstract

 

 

103A. Incorporation of Biochemical Knowledge in Geometric Hashing: a Statistical Assessment.

Jiménez-Lozano N1, Rodríguez A2, Chagoyen M1, Pascual-Montano A1, Carazo JM1 and Trelles O2. 1Unidad de Biocomputación, Centro Nacional de Biotecnología-CSIC and 2Departamento de Arquitectura de Computadores, Universidad de Málaga.

natalia@cnb.uam.es

 

Geometric Hashing is a structural comparison algorithm based only on geometric criteria. Our work improves this method by introducing three environmental parameters: buried area, polar fraction and local secondary structure. Our objective is to reduce the number of similarities lacking reliable structural meaning.

Long Abstract

 

 

 

Data Visualization.

 

104A. Homograph: A Genome-Wide Protein Homology Visualizer.

105A. Visualization Techniques for Genomic Data.

106A. A Fast Algorithm for Visualizing and Analyzing Protein-Protein Interactions.

107A. A Partitioned Approach to Protein Interaction Mapping.

108A. XdomView: A Graphical Tool for Protein Domain and Exon Position Visualization.

109A. GoSurfer: A visualized Tool to Utilize Gene Ontology in Comparative Gene Analysis.

110A. Pattern Matching NMR Metabolic Profiling Data.

111A. WebGen-Net: A System for Support of Genetic Network Construction.

112A. A Scoring Algorithm for Ontology Information Extraction.

113A. No poster.

114A. Web-Based Biological Discovery using an Integrated Database.

 

104A. Homograph: A Genome-Wide Protein Homology Visualizer.

Cei Abreu-Goodger and Enrique Merino. Instituto de Biotecnologia, Universidad Nacional Autonoma de Mexico, Av. Universidad 2001, Cuernavaca, Morelos, 62210, Mexico.

cei@ibt.unam.mx

 

Homograph is an X-windows graphic interface for visualizing genome-wide protein homology. A dot-plot is used to represent every pair of proteins that passes a given similarity threshold. The dots can be selected and colored by user-determined categories, by searching the gene descriptions, or by a similarity score. Homograph is available at http://www.ibt.unam.mx/paginas/cei/homograph.html.

Long Abstract
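
The dot-plot idea described above can be illustrated with a short sketch that turns pairwise similarity scores into plot coordinates. The function name, the input layout (a dictionary of scores keyed by protein pairs) and the use of list position as the axis coordinate are assumptions made for this example; they do not describe Homograph's actual implementation.

```python
def dotplot_points(order_x, order_y, scores, threshold):
    """Return sorted (x, y, score) dots for protein pairs at or above threshold.

    order_x, order_y: protein names in axis order (for example, genome order).
    scores: dict mapping (protein_on_x, protein_on_y) to a similarity score.
    """
    pos_x = {name: i for i, name in enumerate(order_x)}
    pos_y = {name: i for i, name in enumerate(order_y)}
    points = []
    for (a, b), score in scores.items():
        # Keep only pairs that pass the threshold and appear on both axes.
        if score >= threshold and a in pos_x and b in pos_y:
            points.append((pos_x[a], pos_y[b], score))
    return sorted(points)
```

The resulting triples could be handed to any plotting library, with the score or a user-defined category mapped to dot color.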

 

 

105A. Visualization Techniques for Genomic Data.

Ann E. Loraine and Gregg A. Helt. Affymetrix, Inc.

ann_loraine@affymetrix.com

 

The high frequency of alternative splicing in human genes requires specialized visualization tools that reveal how variations in transcript structure affect the encoded proteins. Techniques for visualizing alternative splicing are presented, including semantic zooming, visual encoding of translation frame, and display of protein domains in the context of genomic sequence.

Long Abstract

 

 

106A. A Fast Algorithm for Visualizing and Analyzing Protein-Protein Interactions.

Byong-Hyon Ju, Byungku Park, Kyungsook Han and Jong H. Park. Department of Computer Science and Engineering, Inha University, Inchon 402-751, South Korea.

khan@inha.ac.kr

 

We have developed a new algorithm for visualizing large-scale protein-protein interactions, and implemented it in a program called InterViewer. InterViewer provides an integrated framework for querying databases and directly visualizes the query results. InterViewer is an order of magnitude faster than other force-directed programs, yet generates aesthetically pleasing drawings.

Long Abstract

 

 

107A. A Partitioned Approach to Protein Interaction Mapping.

Yanga Byun, Euna Jeong and Kyungsook Han. Department of Computer Science and Engineering, Inha University.

khan@inha.ac.kr

 

A common problem with many graph-drawing programs is that they become very slow when dealing with large-scale graphs such as protein interaction networks. We propose a new algorithm for efficiently visualizing large-scale protein interaction networks. It partitions nodes into three groups based on their interaction characteristics. An implementation of the algorithm is available at http://wilab.inha.ac.kr/protein.

Long Abstract

 

 

108A. XdomView: A Graphical Tool for Protein Domain and Exon Position Visualization.

Gopalan Vivek1, Tin Wee Tan1 and Shoba Ranganathan1,2. 1Department of Biochemistry and 2Department of Biological Sciences, National University of Singapore, Singapore 119260.

vivek@bic.nus.edu.sg

 

XdomView is a web-based graphical tool that maps protein structural domains and intron positions in eukaryotic homologues to the tertiary structure of a given protein. Because it visualizes the association of sequence signals with 3D structure, XdomView provides a valuable visualization environment for scientists working on eukaryotic gene organization, gene evolution, protein folding and protein structure classification. XdomView is available at http://surya.bic.nus.edu.sg/xdom.

Long Abstract

 

 

109A. GoSurfer: A visualized Tool to Utilize Gene Ontology in Comparative Gene Analysis.

Sheng Zhong1, Ovidiu Lipan1, Kai-Florian Storch3, Charles J. Weitz3 and Wing H. Wong1,2. 1Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA, 2Department of Statistics, Harvard University and 3Department of Neurobiology, Harvard Medical School.

szhong@hsph.harvard.edu

 

GoSurfer software uses Gene Ontology (GO) structured vocabulary to perform comparative gene analysis. GoSurfer visualizes gene ontology information as a tree, with nodes and branches representing GO terms and paths. Different sets of genes can be mapped onto the tree with different colors. GoSurfer is available at http://biosun1.harvard.edu/~szhong/GoSurfer.htm.

Long Abstract

 

 

110A. Pattern Matching NMR Metabolic Profiling Data.

Robert Stones, Adrian Charlton, Paul Brereton and Sarah Oehlschlager. Central Science Laboratory, Sand Hutton YO41 1LZ UK.

r.stones@csl.gov.uk

 

Metabolomics provides a powerful new tool for gaining insight into functional biology. Snapshots of the levels of abundant small molecules within a cell, and of how those levels change under different conditions, complement gene expression and proteomic studies. We are currently developing computer tools for the acquisition of NMR metabolic profiling data, and are using computational approaches to analyse this type of data.

Long Abstract

 

 

111A. WebGen-Net: A System for Support of Genetic Network Construction.

Mikio Yoshida1, Yukari Shibagaki1, Hideaki Shimano1, Mariko Shima1, Tatsuo Kitahashi1, Yasutaro Fujita2 and Takashi Ito3. 1INTEC Web and Genome Informatics Corporation, Tokyo, Japan, 2Faculty of Engineering, Fukuyama University, Hiroshima Japan and 3Cancer Research Institute, Kanazawa University, Ishikawa, Japan.

yoshida@gic.intec.co.jp

 

WebGen-Net is a system for supporting the construction of genetic networks. It provides a graphical user interface that allows users to interactively reconstruct genetic networks by referring to biological relations collected from public databases and experimental results. A prototype of WebGen-Net is freely available from http://genome.c.kanazawa-u.ac.jp/webgen.

Long Abstract

 

 

112A. A Scoring Algorithm for Ontology Information Extraction.

David Outteridge. Department of Pharmacology University of Colorado Health Sciences Center.

david.outteridge@uchsc.edu

 

Associating genes with ontology entries enables a reversed association from entries to genes. Extracting subsets of interesting entries, each describing many genes, is achieved by scoring. These scores are mapped to visual effects (coloured graphs) for clear identification of interesting entries.

Long Abstract
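
The reversed association from ontology entries to genes, followed by a scoring step, can be sketched as follows. The abstract does not state the scoring formula, so the score used here (the fraction of an entry's genes that belong to a set of interest) is a stand-in chosen purely for illustration, and all function names are hypothetical.

```python
from collections import defaultdict


def invert_annotations(gene_to_terms):
    """Turn a gene -> ontology entries mapping into entry -> genes."""
    term_to_genes = defaultdict(set)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].add(gene)
    return dict(term_to_genes)


def score_terms(term_to_genes, interesting):
    """Score each entry by the fraction of its genes that are of interest.

    This scoring rule is an assumption, not the poster's algorithm.
    """
    return {
        term: len(genes & interesting) / len(genes)
        for term, genes in term_to_genes.items()
    }
```

Scores in the range 0 to 1 map naturally onto a color scale, which is one simple way to highlight entries in a graph view.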

 

 

113A. No poster.

 

114A. Web-Based Biological Discovery using an Integrated Database.

D.F. Pinney, the Allgenes.org Development Group, the EPConDB Development Group, the Plasmodium Genome Database Collaborative and C.J. Stoeckert. Computational Biology and Informatics Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania.

pinney@pcbi.upenn.edu

 

Allgenes.org, PlasmoDB, and EPConDB are web-based discovery tools relying on a single platform, GUS, which warehouses and integrates biological data from heterogeneous sources. Allgenes.org provides access to data for the human and mouse genomes, and PlasmoDB to data for the Plasmodium falciparum genome. EPConDB provides access to data for genes expressed in the endocrine pancreas.

Long Abstract

 

 

 

Phylogeny and Evolution.

 

115A. The Relative Importance of Segmental and Tandem Duplications in Gene Family Evolution in Arabidopsis thaliana.

116A. Prediction of DNA-protein Interaction Domains for Transcription Factors using an Evolutionary Filtering Technique.

117A. Intra-genomic Comparison of Plant Genomes.

118A. Nucleotide Bias Affects Amino Acid Composition in Angiosperms.

119A. Aspartyl Proteases in Human and Model Organisms.

120A. RTKdb: Database of Receptor Tyrosine Kinase.

121A. Identification of Human-Mouse Orthologs at Evolutionary Conserved Locations from Pairwise Genome Comparison.

122A. Determining Factors for the Distribution of Simple (AC)n Microsatellites in the Rat Genome.

123A. Use LumberJack to Create and Compare a Forest of Phylogenetic Trees.

 

115A. The Relative Importance of Segmental and Tandem Duplications in Gene Family Evolution in Arabidopsis thaliana.

Steven B. Cannon, Andrew Baumgarten, Georgiana May and Nevin D. Young. University of Minnesota, USA.

cann0010@tc.umn.edu

 

We describe software to determine which genes in Arabidopsis thaliana have arisen through large segmental or local tandem duplications. We find that contributions made by these two processes differ greatly among gene families. We discuss the possible biological significance of these differences in gene family evolution. http://www.tc.umn.edu/~cann0010.

Long Abstract

 

 

116A. Prediction of DNA-protein Interaction Domains for Transcription Factors using an Evolutionary Filtering Technique.

Li Jia, Michael Clegg and Tao Jiang. Department of Computer Science, Department of Botany and Plant Science, University of California, Riverside, CA 92521.

lijia@cs.ucr.edu

 

R2R3-AtMYB is one of the largest transcription factor gene families in Arabidopsis. Using inferred ancestral sequences, we have found that several lineages in the R2R3-AtMYB phylogeny were subjected to an excess of nonsynonymous substitutions, which is evidence of episodes of positive selection.

Long Abstract

 

 

117A. Intra-genomic Comparison of Plant Genomes.

Aoife McLysaght, Steve Hampson, Brandon Gaut and Pierre Baldi. Department of Ecology and Evolutionary Biology, Department of Information and Computer Science, Institute for Genomics and Bioinformatics, University of California, Irvine.

amclysag@uci.edu

 

LineUp is a heuristic algorithm designed to tackle the computationally intensive problem of identifying collinear regions within or between complex genomes. The method makes allowances for map error in the genome, and for the existence of multiple paralogues. LineUp was applied to the maize genome and results are shown.

Long Abstract

 

 

118A. Nucleotide Bias Affects Amino Acid Composition in Angiosperms.

Huai-chun Wang, Greg Singer and Donal Hickey. Department of Biology, University of Ottawa.

dhickey@uottawa.ca

 

We compared the amino acid composition of homologous protein sequences between rice and Arabidopsis and found that the amino acid substitution pattern is predictable from the overall difference in G+C content between the two genomes. We also found corresponding, predictable differences in synonymous codon usage between the two genomes. The results demonstrate that changes in nucleotide composition have significant effects on the pattern of protein evolution.

Long Abstract

 

 

119A. Aspartyl Proteases in Human and Model Organisms.

Alla M. Karnovsky and Cara L. Ruble. Pharmacia Corporation.

alla.karnovsky@pharmacia.com

 

Aspartyl proteases are a widely distributed and diverse protein family involved in a variety of cellular and biochemical processes ranging from digestion to cleavage of amyloid precursor protein. We used BLAST and profile HMMs to identify aspartyl proteases in human, worm, fly, and other model organisms, and inferred the intron-exon structure of the aspartyl protease genes in C. elegans, D. melanogaster and H. sapiens. We use protein homology and splicing patterns to investigate the evolution of aspartyl proteases.

Long Abstract

 

 

120A. RTKdb: Database of Receptor Tyrosine Kinase.

Julien Grassot1, Guy Perrière2 and Guy Mouchiroud1. 1Centre de Génétique Moléculaire et Cellulaire, UMR CNRS 5534, Université Claude Bernard – Lyon 1, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France and 2Laboratoire de Biométrie et Biologie Évolutive UMR CNRS 5558, Université Claude Bernard – Lyon 1, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.

grassot@biomserv.univ-lyon1.fr

 

A collection of RTK sequences would provide a good starting point and a new model for comparative and evolutionary studies of multigene families. In this context, we are developing the Receptor Tyrosine Kinase database (RTKdb), which is the only database on these proteins currently available, and can be accessed at http://pbil.univ-lyon1.fr.

Long Abstract

 

 

121A. Identification of Human-Mouse Orthologs at Evolutionary Conserved Locations from Pairwise Genome Comparison.

Fu Lu, Xiangqun Holly Zheng, Zhenyuan Wang, Zhong Fei, Jian Wang, Eric Zheng, Aaron Halpern, Vivien Bonazzi and Richard Mural. Celera Genomics, 45 W. Gude Dr. Rockville, MD 20850 USA.

fu.lu@celera.com

 

Comparative gene mapping provides an invaluable way to understand human biology and diseases. Here we describe an algorithm that identifies mouse-human orthologs at evolutionarily conserved locations by pairwise genome comparison. About 23,000 pairs of orthologous genes were identified by this novel approach.

Long Abstract

 

 

122A. Determining Factors for the Distribution of Simple (AC)n Microsatellites in the Rat Genome.

Chin-Fu Chen1, Michael I. Jensen-Seaman2, Michael A. Thomas1, Jian Lu1, Simon N. Twigger1 and Peter J. Tonellato1. 1Bioinformatics Research Center and 2Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA.

cfchen@mcw.edu

 

The distribution of rat (AC)n microsatellite DNA is bell-shaped. The heterozygosity of (AC)n repeats is positively correlated with the number of repeats, and a higher (AC)n repeat number is associated with a higher GC content in the surrounding sequences. There are significant differences in the amount of heterozygosity among chromosomes.

Long Abstract
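
As an illustration of the kind of measurement behind these observations, the sketch below locates (AC)n runs in a sequence and reports the repeat number together with the GC fraction of the flanking sequence. The minimum repeat count, the flank length and the function name are assumptions chosen for this example; the abstract does not specify how the authors defined either quantity.

```python
import re


def find_ac_repeats(seq, min_repeats=5, flank=10):
    """Return (start, repeat_count, flank_gc) for each (AC)n run in seq.

    min_repeats and flank are illustrative defaults, not values from the study.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?:AC){%d,}" % min_repeats)
    hits = []
    for match in pattern.finditer(seq):
        repeat_count = (match.end() - match.start()) // 2
        # Collect up to `flank` bases on each side of the repeat.
        left = seq[max(0, match.start() - flank):match.start()]
        right = seq[match.end():match.end() + flank]
        flanking = left + right
        if flanking:
            gc = sum(base in "GC" for base in flanking) / len(flanking)
        else:
            gc = 0.0
        hits.append((match.start(), repeat_count, gc))
    return hits
```

Collecting these triples across a genome would give the raw data for a repeat-number distribution and for a correlation between repeat number and surrounding GC content.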

 

 

123A. Use LumberJack to Create and Compare a Forest of Phylogenetic Trees.

Carolyn J. Lawrence1, R. Kelly Dawe1,2, and Russell L. Malmberg1. Departments of 1Plant Biology and 2Genetics, University of Georgia, Athens, GA USA.

carolyn@dogwood.botany.uga.edu

 

The ML heuristic search algorithms currently available are computationally impractical for large datasets (especially those consisting of protein sequences). We are developing an ML heuristic search tool called LumberJack that progressively jackknifes an alignment to generate multiple NJ trees, and then compares them based upon likelihood scores.

Long Abstract
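
The jackknifing step can be pictured with a small sketch that produces a series of reduced alignments, each with one block of columns removed. How LumberJack actually chooses the columns to remove is not stated in the abstract, so the fixed, non-overlapping window used here is an assumption, as are the function name and the dictionary-based alignment format. Tree building and likelihood scoring are left out.

```python
def jackknife_alignments(alignment, window):
    """Yield (start, reduced_alignment) with columns [start, start + window) removed.

    alignment: dict mapping sequence name to an aligned sequence string.
    window: number of consecutive columns removed per replicate (assumed scheme).
    """
    if not alignment:
        return
    if window < 1:
        raise ValueError("window must be at least 1")
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    for start in range(0, length, window):
        reduced = {
            name: seq[:start] + seq[start + window:]
            for name, seq in alignment.items()
        }
        yield start, reduced
```

Each reduced alignment would then be passed to a neighbor-joining implementation, and the resulting trees ranked by their likelihood on the full alignment.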

 

 

 

Data Mining.

 

124A. Genomewide Analysis of Bkm Sequences (GATA repeats): Predominant Association with Sex Chromosomes and Potential Role in Higher Order Chromatin Organization and Function. 43

125A. Hierarchical Machine Learning for Characterising Protein Families. 43

126A. In Silico Comparison of the Transcriptome Derived from Purified Normal Breast Cells and Breast Tumor Cell Lines Reveals Candidate Upregulated Genes in Breast Tumor Cells. 43

127A. Extraction and Dynamic View of Biomolecular Interactions in Large Biomedical Text Database. 43

128A. Mining the literature for enzyme-disease associations. 44

129A. Search for Gene Regulatory cis-Elements in Arabidopsis thaliana. 44

130A. Semantic Similarity Measures Across the Gene Ontology: Relating Sequence to Annotation. 44

131A. Patterns, Pairings and Predictions of Catalytic DNA. 44

132A. GIMS a Data Warehouse for Management and Analysis of Complex Biological Data. 44

133A. A Simple Statistical Test for Evaluating Differences between Database Retrieval Methods. 45

134A. Proteome Databases: An Information Source for Bacterial Immunology. 45

135A. RED: a web-based system for the analysis, management, and dissemination of expressed sequence tags. 45

136A. Searching Microarray Time Series Data for Yeast Cell-Cycle Regulatory Genes. 45

137A. Application of Relational Database Tools for the Analysis of Large Proteomic Data Sets from Tandem Mass Spectrometry. 45

138A. Hierarchical Cluster Analysis and Classification of SAGE data. 46

139A. GEA: a Toolkit for Gene Expression Analysis. 46

140A. A Method for Detecting Protein-Protein Interaction Rules. 46

141A. G-language Genome Analysis Environment. 46

142A. Data Handling for Detailed Phenotypic Characterization of Novel Mouse Phenotypes. 46

143A. Willo and Wisp: Data Management Systems for Mouse Genome Mapping and Sequencing. 47

144A. Novel Opportunities and Challenges in the Human Proteome: A Bioinformatics Strategy to Identify Splice Variants of Druggable Gene Targets. 47

145A. DrugBank: An Integrated Database for Drug Discovery and Pharmacogenomics. 47

146A. The Genomics Unified Schema (GUS). 47

147A. Compensation for Nucleotide Bias in a Genome by Representation as a Discrete Channel with Noise. 48

148A. Integrating Eukaryotic Genomes by Orthologous Groups: What is Unique about Apicomplexan Parasites? 48

149A. The CyberCell Database (CCDB). 48

150A. Functional Database System of Olfactory Receptors. 48

151A. A Standard Corpus for Evaluating Extraction of Molecular Interaction Pathway Information from Scientific Abstracts. 48

152A. A First Study of the Central Role of the Analyst in the Knowledge Discovery Process in Biology. 49

153A. Assessing the Compactness and Isolation of Individual Clusters Observed in Microarray Data. 49

154A. An Amino Acid Centered Database to Facilitate Protein Crystallisation. 49

155A. In silico reconstruction of metabolic network from unannotated raw genome sequences. 49

156A. AFLP® Nucleotide Sequence Quality Assessment and Improvement Tool. 49

157A(i). Schema Mapping and Data Integration with Clio. 50

157A(ii). The GENIA Corpus: an Annotated Corpus in Molecular Biology Domain. 50

 

124A. Genomewide Analysis of Bkm Sequences (GATA repeats): Predominant Association with Sex Chromosomes and Potential Role in Higher Order Chromatin Organization and Function.

Subbaya Subramanian, R.K. Mishra and L. Singh. Centre for Cellular and Molecular Biology, W413 CCMB, Uppal Road, Hyderabad, Andhra Pradesh, 500007, India.

subree@gene.ccmbindia.org

 

Genomewide analysis revealed that GATA repeats are absent in prokaryotes and have gradually accumulated in higher organisms during the course of evolution. In humans, the Y chromosome has the highest GATA repeat density, which is concentrated in the Yq pericentric region. GATA repeats along the Y chromosome and their close proximity to Matrix Associated Regions (GATA-MAR) may be demarcating chromatin domains.

Long Abstract

 

 

125A. Hierarchical Machine Learning for Characterising Protein Families.

Aik Choon Tan and David Gilbert. Bioinformatics Research Centre, Department of Computer Science, University of Glasgow, Glasgow, U.K.

actan@brc.dcs.gla.ac.uk

 

The aim of this research is to construct a novel approach for inducing comprehensive patterns from various data sources using a knowledge discovery and hierarchical machine learning approach. We have applied this technique to characterise several protein families, and our classifiers show higher accuracy and are more informative than conventional methods.

Long Abstract

 

 

126A. In silico Comparison of the Transcriptome Derived from Purified Normal Breast Cells and Breast Tumor Cell Lines Reveals Candidate Upregulated Genes in Breast Tumor Cells.

Leerkes MR, Caballero OL, Mackay A, Torloni H, O'Hare MJ, Simpson AJ, and de Souza SJ. Ludwig Institute for Cancer Research, Rua Prof. Antonio Prudente, 109, 4 andar, Sao Paulo, SP, CEP 01509-010, Brazil.

leerkes@compbio.ludwig.org.br

 

We report here the combined use of ORESTES sequences generated in the FAPESP/LICR Human Cancer Genome Project and information available in the UniGene and SAGE databases to characterize the transcriptome of normal and breast tumor cells. We have identified 154 genes as candidates for overexpression in breast tumor cells.

Long Abstract

 

 

127A. Extraction and Dynamic View of Biomolecular Interactions in a Large Biomedical Text Database.

Yoshihiro Ohta1 and Shigeo Ihara2. 1Hitachi Central Research Laboratory and 2Research Center for Advanced Science and Technology, University of Tokyo.

yoh@crl.hitachi.co.jp

 

We constructed a biomolecular interaction detection system that can practically handle the recent massive increase in molecular biology literature. We comprehensively addressed every necessary element: large-scale dictionary construction, biomolecular name detection, interaction detection and an effective network-viewer user interface. With these elements, our system can extract over 550,000 interactions.

Long Abstract

 

 

128A. Mining the literature for enzyme-disease associations.

Hofmann O. and Schomburg D. Department of Biochemistry, University of Cologne, Germany.

o.hofmann@smail.uni-koeln.de

 

A network of enzyme and disease correlations was built by automatically extracting relevant information from the abstracts of biomedical literature. The concept-based data and implemented visualization techniques allow easy navigation by researchers to explore knowledge available in literature databases and develop new theories.

Long Abstract

 

 

129A. Search for Gene Regulatory cis-Elements in Arabidopsis thaliana.

Judith Lucia Gomez, Ingo Dreyer and Bernd Mueller-Roeber. University of Potsdam, Institute for Biochemistry and Biology, Dept. Molecular Biology, Karl-Liebknechtstrasse 24/25, Haus 20, 14476 Golm, Germany.

jgomez@rz.uni-potsdam.de

 

The regulation of gene expression in plants is thought to result from the binding of different sets of transcription factors to promoter cis-elements. We tested HMM based methods to search for target genes in the model plant Arabidopsis thaliana, harbouring putative binding sites for transcription factors in their promoter regions.

Long Abstract

 

 

130A. Semantic Similarity Measures Across the Gene Ontology: Relating Sequence to Annotation.

P.W. Lord, R.D. Stevens, A. Brass and C.A. Goble. Dept. of Computer Science, Manchester University.

p.lord@russet.org.uk

 

The Gene Ontology (GO) represents knowledge of a gene product's function, process and location in a computationally amenable form. We present metrics for measuring the similarity between GO terms, and therefore semantic similarity of gene products annotated with them. We validate these metrics by comparing them with measures of sequence similarity, and show several uses for the measure.
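The abstract does not say which metrics are used, so the following is only a minimal sketch of one widely used family of ontology similarity measures: the information content of the most informative common ancestor (Resnik-style). The toy ontology, the term names and the annotation counts are all made up for illustration and are not GO data.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (a DAG, as in GO).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
    "kinase": ["catalysis"],
}

# Made-up number of gene products annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0, "binding": 2, "catalysis": 3,
    "dna_binding": 5, "rna_binding": 4, "kinase": 6,
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probability(term):
    """Fraction of all annotations that fall on the term or its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return covered / total


def resnik_similarity(a, b):
    """Information content (in nats) of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(-math.log(term_probability(t)) for t in common)


# Two terms under "binding" share more information than terms that only meet at the root.
print(resnik_similarity("dna_binding", "rna_binding"))
print(resnik_similarity("dna_binding", "kinase"))
```

Gene products annotated with several terms could then be compared by, for example, averaging or taking the maximum over all term pairs; that choice is also an assumption here, not something the abstract specifies.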

Long Abstract

 

 

131A. Patterns, Pairings and Predictions of Catalytic DNA.

Gopinath Ganji1,2, Yingfu Li3, T. Chiang2 and A. Jamie Cuticchia1. 1Department of Medical Biophysics, University of Toronto, 610 University Avenue, Toronto, Ontario, CANADA M5G 2M9, 2Center for Computational Biology, Hospital for Sick Children, 555 University Avenue, Toronto, Ontario, CANADA M5G 1X8, 3Department of Biochemistry and Department of Chemistry, McMaster University, 1200 Main St. W., Hamilton, Ontario, CANADA L8N 3Z5.

gopi.ganji@utoronto.ca

 

We hypothesize that catalytic nucleic acids containing characteristic structural/functional sequence features can be probabilistically modeled and experimentally verified. By employing pattern discovery algorithms, structure prediction tools and machine learning methods, we have attempted to characterize various classes of SELEX-generated 'DNA kinases' (self-phosphorylating DNA) that recruit specific divalent metal cations and NTPs.

Long Abstract

 

 

132A. GIMS: a Data Warehouse for Management and Analysis of Complex Biological Data.

Michael Cornell, Paul Kirby, Cornelia Hedeler and Norman W Paton. Dept of Computer Science, Kilburn Building, University Of Manchester, M13 9PL.

mcornell@cs.man.ac.uk

 

GIMS is an object database that integrates genome sequence data with functional data (transcriptome, metabolome, metabolic pathway, proteome and protein-protein interactions) in a single data warehouse. GIMS can be browsed or analysed using canned queries. GIMS can be queried remotely using a Java application that can be downloaded from www.cs.man.ac.uk/~norm/gims.

Long Abstract

 

 

133A. A Simple Statistical Test for Evaluating Differences between Database Retrieval Methods.

John L Spouge and Eva Czabarka. National Center for Biotechnology Information, National Institutes of Health, Bethesda MD USA.

spouge@ncbi.nlm.nih.gov

 

One key problem in designing intelligent systems for molecular biology is to determine which of two database retrieval methods is better. We give a simple statistical test based on z-scores to calculate the significance of differences in ROC[n] scores and apply the method to assess putative improvements to PSI-BLAST.
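The abstract gives no formulas, so the sketch below rests on two assumptions: the commonly cited definition of ROC[n] (the area under the ROC curve up to the first n false positives, normalized to lie between 0 and 1), and a generic paired z statistic computed over per-query score differences. The authors' actual test may estimate the variance differently; the function names are illustrative only.

```python
import math


def roc_n(ranked_labels, n):
    """ROC[n] for a ranked hit list (True = true positive, False = false positive).

    Sums, over the first n false positives, the number of true positives
    ranked above each one, then divides by n * (total true positives).
    Assumes the list contains at least one true positive.
    """
    total_true = sum(ranked_labels)
    true_seen = 0
    false_seen = 0
    area = 0
    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break
    # If fewer than n false positives were retrieved, treat the missing ones
    # as ranked below every true positive.
    area += (n - false_seen) * total_true
    return area / (n * total_true)


def paired_z(scores_a, scores_b):
    """Generic paired z statistic for per-query score differences (a minus b)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    m = len(diffs)
    mean = sum(diffs) / m
    variance = sum((d - mean) ** 2 for d in diffs) / (m - 1)
    return mean / math.sqrt(variance / m)


# Made-up example: one ranked list, then ROC[n] scores of two methods on four queries.
print(roc_n([True, True, False, True, False], n=2))
print(paired_z([0.9, 0.8, 0.7, 0.95], [0.8, 0.75, 0.6, 0.9]))
```

A large positive z suggests that method A scores consistently higher than method B across queries; how many queries are needed for the normal approximation to hold is left open here.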

Long Abstract

 

 

134A. Proteome Databases: An Information Source for Bacterial Immunology.

Klaus-Peter Pleissner1, Till Eifert2, Frank Schmidt1, Stefan H.E. Kaufmann1 and Peter R. Jungblut1. 1Max Planck Institute for Infection Biology, 2Algorithmus GmbH.

pleissner@mpiib-berlin.mpg.de

 

A collection of proteome databases which comprises 2-D gel proteins, Isotope Coded Affinity Tag (ICAT) and functional classification databases for Mycobacterium tuberculosis and Helicobacter pylori is presented. Information about genes, proteins and metabolic pathways serves as an information source for bacterial immunology. http://www.mpiib-berlin.mpg.de/2D-PAGE.

Long Abstract

 

 

135A. RED: a Web-based System for the Analysis, Management, and Dissemination of Expressed Sequence Tags.

Everitt R.#, Minnema S.E.#, Koster C.S., Olson R.A., Wride M.A. and Rancourt D.E. Department of Biochemistry and Molecular Biology, University of Calgary, Alberta, Canada.

#These authors contributed equally to this work.

seminnem@ucalgary.ca, reveritt@ucalgary.ca

 

The Rancourt EST Database (RED) is a web-based system for the analysis, management, and dissemination of expressed sequence tags (ESTs). RED represents a flexible template DNA sequence database that can be easily manipulated to suit the needs of other labs undertaking mid-size sequencing projects. Source code for RED and the associated tools is available from reveritt@ucalgary.ca. RED is publicly accessible via www.ucalgary.ca/~rancourt.

Long Abstract

 

 

136A. Searching Microarray Time Series Data for Yeast Cell-Cycle Regulatory Genes.

Holger Hoos, Andrew Kwon and Raymond Ng. Department of Computer Science, University of British Columbia.

tjkwon@cs.ubc.ca

 

We propose a new method for analyzing microarray time series data. We apply the method to yeast cell-cycle time series data to find potential regulatory pairs. The results indicate that our algorithm finds true positive pairs different from those found by the correlation and edge detection methods of Filkov et al.

Long Abstract

 

 

137A. Application of Relational Database Tools for the Analysis of Large Proteomic Data Sets from Tandem Mass Spectrometry.

Ioannis K. Moutsatsos, Yongchang Qiu, Rod Hewick, Joseph Wooters, Steve Howes, Gary Van Domselaar and Patrick Cody. Wyeth Research Inc., 35 Cambridgepark Drive, Cambridge, MA 02140, USA.

gvandomselaar@Wyeth.com

 

TurboSEQUEST is a search engine used for protein prediction from MS/MS spectra of protein digests. We have developed a custom application, SequestOnOracle, that extends TurboSEQUEST with the data management and analysis tools of a relational database. SequestOnOracle’s unique capabilities derive from its ability to summarize and compare the protein and peptide content from multiple TurboSEQUEST searches.

Long Abstract

 

 

138A. Hierarchical Cluster Analysis and Classification of SAGE data.

Raymond T. Ng, Jorg Sander, Monica C. Sleumer and Man Saint Yuen. University of British Columbia.

myuen@cs.ubc.ca

 

Under the assumption that cells that look morphologically similar may behave very differently at the molecular level, we present a method for clustering and classifying SAGE libraries to detect the similarities and differences between various tissue types and neoplastic states.

Long Abstract

 

 

139A. GEA: a Toolkit for Gene Expression Analysis.

Jessica M. Phan, Raymond Ng and Steve Jones. University of British Columbia.

myuen@cs.ubc.ca

 

We demonstrate the Gene Expression Analyzer (GEA) toolkit, designed particularly for high-dimensional data such as SAGE. GEA provides a graphical interface with operations for clustering, comparing and contrasting gene expression in different SAGE clusters. GEA will eventually be linked to various bioinformatics databases for integrated genomic analysis.

Long Abstract

 

 

140A. A Method for Detecting Protein-Protein Interaction Rules.

Takuya Oyama1,4, Kagehiko Kitano1,4, Kenji Satou2,4 and Takashi Ito3,4. 1INTEC Web and Genome Informatics Corporation, 2School of Knowledge Science, Japan Advanced Institute of Science and Technology, 3Cancer Research Institute, Kanazawa University and 4Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Corporation (JST).

oyama@isl.intec.co.jp

 

We studied a data mining method that discovers rules about protein-protein interactions from accumulated protein-protein interaction data. The method reveals relations between the features of mutually interacting proteins, for example that a protein having feature F1 interacts with a protein having feature F2.
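Rules of the form "F1 interacts with F2" can be scored with the support and confidence measures used in association rule mining. The sketch below is one possible reading of the idea, not the authors' algorithm; the protein names, feature labels and thresholds are hypothetical.

```python
from collections import Counter


def interaction_rules(interactions, features, min_support=2, min_confidence=0.5):
    """Score rules 'a protein with F1 interacts with a protein with F2'.

    interactions: list of (protein_a, protein_b) pairs, treated as unordered
    features:     dict mapping each protein to a set of feature labels
    Returns (f1, f2, support, confidence) tuples, best confidence first.
    Self-interactions are not handled specially in this sketch.
    """
    pair_counts = Counter()  # oriented interactions with F1 on one side and F2 on the other
    left_counts = Counter()  # oriented interactions with F1 on the first side
    for a, b in interactions:
        # Count both orientations so the input order of a pair does not matter.
        for x, y in ((a, b), (b, a)):
            for f1 in features[x]:
                left_counts[f1] += 1
                for f2 in features[y]:
                    pair_counts[(f1, f2)] += 1

    rules = []
    for (f1, f2), support in pair_counts.items():
        confidence = support / left_counts[f1]
        if support >= min_support and confidence >= min_confidence:
            rules.append((f1, f2, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Made-up interaction data and feature annotations.
INTERACTIONS = [("P1", "P2"), ("P3", "P4"), ("P5", "P2"), ("P1", "P6")]
FEATURES = {
    "P1": {"SH3"}, "P2": {"proline_rich"}, "P3": {"SH3"},
    "P4": {"proline_rich"}, "P5": {"kinase"}, "P6": {"kinase"},
}

for rule in interaction_rules(INTERACTIONS, FEATURES):
    print(rule)
```

Here confidence is the fraction of interactions involving an F1 protein in which the partner carries F2, so a rule and its reverse can have different confidence values.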

Long Abstract

 

 

141A. G-language Genome Analysis Environment.

Kazuharu Arakawa1,2, Koya Mori1,3 and Masaru Tomita1,2. 1Institute for Advanced Biosciences, Keio University, 2Department of Environmental Information and 3Graduate School of Media and Governance.

gaou@g-language.org

 

G-language Genome Analysis Environment (G-language GAE) is a generic software package aimed at higher efficiency in bioinformatics analysis. G-language GAE provides a set of Perl libraries for software development and a graphical user interface for easy manipulation. It is distributed freely under the GPL at http://www.g-language.org/.

Long Abstract

 

 

142A. Data Handling for Detailed Phenotypic Characterization of Novel Mouse Phenotypes.

E. C. J. Green1, J. Airey1, R. Cox1, Y. Hashim1, T. Hough1, Z. Lalanne1, K. E. Logan1, P. Nolan1, L. Visor1, A-M. Mallon1, P. Jones1, R. Selley1, A. Blake1, S. Greenaway1, H. J. Kirkbride1, J. Hunter2 and S. D. M. Brown1. 1Mouse Genome Center and Mammalian Genetics Unit, MRC, Harwell, Oxfordshire, OX11 0RD, UK and 2GlaxoSmithKline, New Frontiers Science Park, Harlow, CM19 5AW, UK.

e.green@har.mrc.ac.uk

 

A system is described for the management of data produced from the characterization of novel phenotypes, observed from a large scale ENU mutagenesis programme. A diversity of data is being produced from sources such as microarray technology, in situ hybridization studies, animal husbandry, candidate gene identification, DHPLC and sequencing.

Long Abstract

 

 

143A. Willo and Wisp: Data Management Systems for Mouse Genome Mapping and Sequencing.

M. Simon, S. Greenaway, A-M. Mallon, R. Selley, P. Jones, Z. Tymowska-Lalanne, S. Breeds, S. Smythe, H. Kirkbride, S. Webb, A. Blake, J. Weekes, E. Green, E. Mollison, P. Denny, P. Nolan, M. Goldsworthy, M. Strivens and S.D.M. Brown. Medical Research Council, Harwell, Oxon, OX11 0RD, England.

m.simon@har.mrc.ac.uk

 

A vital element of high-throughput genetics is to capture the data generated from experimental procedures and to integrate and disseminate these results. Two data management systems have been developed to capture this data at the point of generation - Wisp and Willo. These capture data specifically generated from sequencing and genotyping.

Long Abstract

 

 

144A. Novel Opportunities and Challenges in the Human Proteome: A Bioinformatics Strategy to Identify Splice Variants of Druggable Gene Targets.

Chandra Ramanathan1, Shuba Gopal2, Bob Bruccoleri1, John Feder1, Gabe Mintier1 and Terry Gaasterland2. 1Bristol-Myers Squibb and 2The Rockefeller University.

Chandra.Ramanathan@bms.com

 

Identification, verification and biological characterization of splice variants are challenging tasks but essential for understanding the observed biological complexity in humans. A systematic bioinformatics method is being developed to mine human genomic and EST data to identify splice variant forms of druggable gene targets and to correlate these variants with disease/tissue expression information available in various proprietary databases.

Long Abstract

 

 

145A. DrugBank: An Integrated Database for Drug Discovery and Pharmacogenomics.

Kavoos Basmenji, Zhan Chang, Bahram Habibi-Nazhad and David Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB, T6G 2N8.

zchang@ualberta.ca

 

DrugBank is a web-enabled database developed to facilitate drug discovery and drug analysis. It combines drug information with drug target information to allow users the possibility of linking small molecule data with protein sequence/structure data. DrugBank can be accessed freely at http://redpoll.pharmacy.ualberta.ca/~zchang/cgi-bin/welcome.cgi.

 

Long Abstract

 

 

146A. The Genomics Unified Schema (GUS).

V. Babenko, B. Brunk, J. Crabtree, S. Diskin, S. Fischer, G. Grant, Y. Kondrahkin, L. Li, J. Liu, J. Mazzarelli, D. Pinney, A. Pizarro, E. Manduchi, S. McWeeney, J. Schug and C. Stoeckert. Center for Bioinformatics, University of Pennsylvania.

stevef@pcbi.upenn.edu

 

GUS is a comprehensive strongly typed relational schema and object-based software platform for integration, analysis, curation and presentation of sequence based genomics information. It has been used to model and/or mine human, mouse, plasmodium and the pancreas, and is suitable for model organisms in general. It is freely available.

Long Abstract

 

 

147A. Compensation for Nucleotide Bias in a Genome by Representation as a Discrete Channel with Noise.

Mark Schreiber1,2 and Chris Brown1. 1AgResearch NZ, PO Box 50034, Dunedin, New Zealand and 2Dept of Biochemistry, University of Otago, PO Box 56 Dunedin, New Zealand.

mark.schreiber@agresearch.co.nz

 

Calculation of the information content of motifs in genomes highly biased in nucleotide composition leads to overestimates of the amount of useful information in the motif. By treating a biased genome as a discrete channel with noise, in accordance with Shannon Information Theory, we were able to remove both 'Distortion' and 'Noise' from the motif and recover a more instructive biological 'signal'.
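The overestimate described above can be illustrated with a small calculation. The sketch below does not implement the authors' channel model; it only contrasts the information content of one motif column computed against a uniform background with the same column scored as relative entropy against a biased background. The base frequencies and the column counts are made up.

```python
import math


def column_information(counts, background):
    """Relative entropy (in bits) of one motif column against a background.

    counts:     dict base -> number of motif instances with that base
    background: dict base -> background frequency of that base
    With a uniform background this equals 2 - H(column).
    """
    total = sum(counts.values())
    info = 0.0
    for base, n in counts.items():
        if n:  # zero counts contribute nothing to the sum
            p = n / total
            info += p * math.log2(p / background[base])
    return info


UNIFORM = {base: 0.25 for base in "ACGT"}
AT_RICH = {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}  # hypothetical biased genome
COLUMN = {"A": 8, "T": 2, "C": 0, "G": 0}           # made-up motif column

print(column_information(COLUMN, UNIFORM))  # uniform background
print(column_information(COLUMN, AT_RICH))  # biased background
```

Against the uniform background the A-rich column appears highly informative, but much of that apparent information simply reflects the AT-rich composition assumed for the genome, so the background-aware score is lower.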

Long Abstract

 

 

148A. Integrating Eukaryotic Genomes by Orthologous Groups: What is Unique about Apicomplexan Parasites?

Li Li, Brian Brunk, Christian J. Stoeckert Jr and David S. Roos. Department of Biology, University of Pennsylvania, Philadelphia, USA and Center for Bioinformatics, University of Pennsylvania, Philadelphia, USA.

lili4@sas.upenn.edu

 

To integrate eukaryotic sequence data with information on biological processes, we sought to identify orthologous groups by combining sequence similarity comparisons with graph clustering algorithms. Queries based on user-defined species distribution provide a snapshot of shared/diversified processes, facilitating (for example) the identification of targets for broad-spectrum antibiotics targeting apicomplexan parasites.

Long Abstract

 

 

149A. The CyberCell Database (CCDB).

Bahram Habibi-Nazhad, Melania Ruaini, Kavoos Basmenji and David S. Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton AB T6G 2N8, Canada.

bahram@redpoll.pharmacy.ualberta.ca

 

The CyberCell Database (CCDB) is a web-enabled, user-friendly database containing previously published and electronically archived information on nearly every aspect of E. coli molecular biology and enzymology. We have also constructed CC3D which contains E. coli structural proteomic data and CCMD which contains the chemical database of metabolites and other small molecules used to support metabolic analysis.

Long Abstract

 

 

150A. Functional Database System of Olfactory Receptors.

Kazunori Miyazaki and Satoshi Itoh. Advanced Materials and Devices Laboratory, Corporate Research and Development Center, TOSHIBA CORPORATION.

kazun.miyazaki@toshiba.co.jp

 

We have developed a Java/XML-based functional database system of olfactory receptors (OR) built from databases that can be accessed via the Internet. The distinguishing feature of our system is that it analyzes the XML data for ORs using predictive tools on the Web and then semi-automatically accumulates the resulting annotations alongside the analyzed data.

Long Abstract

 

 

151A. A Standard Corpus for Evaluating Extraction of Molecular Interaction Pathway Information from Scientific Abstracts.

Soon Heng Tan and See-Kiong Ng. Laboratories for Information Technology, Singapore.

soonheng@lit.org.sg

 

Vast amounts of molecular interaction pathway information can be extracted automatically from MEDLINE's abstracts using natural language processing, but progress has been hindered by a lack of a standard corpus for evaluation. We describe a test corpus we have created from our Pathweaver project that is suitable for such evaluation.

Long Abstract

 

 

152A. A First Study of the Central Role of the Analyst in the Knowledge Discovery Process in Biology.

Sandy Maumus1,2, Amedeo Napoli2, Rafik Taouil2 and Sophie Visvikis1. 1INSERM U525, Université Henri Poincaré (Nancy 1) – Faculté de Pharmacie, 30 rue Lionnois, 54000 Nancy, France and 2LORIA – UMR 7503, B.P. 239, 54506 Vandoeuvre-Lès-Nancy, France.

sandy.maumus@nancy.inserm.fr

 

Based on an application of symbolic data mining methods to a test database, we underline the role played by the analyst in the knowledge discovery process. Encouraged by positive results, we plan to apply these methods to a large database to investigate the relationships between gene polymorphisms and intermediate phenotypes of cardiovascular diseases.

Long Abstract

 

 

153A. Assessing the Compactness and Isolation of Individual Clusters Observed in Microarray Data.

Per-Olof Fjallstrom. Affibody.

perfj@affibody.com

 

The "clusters" returned by standard clustering methods applied to microarray data are not necessarily biologically relevant. We present a method for assessing whether such clusters are unusually compact and isolated. The method has been successfully applied to several microarray data sets. It does not require estimates of the variance of experimental error.

Long Abstract

 

 

154A. An Amino Acid Centered Database to Facilitate Protein Crystallisation.

K. MacLeod and E. Westwick. Astex Technology Ltd.

e.westwick@astex-technology.com

 

An amino acid centered relational database has been designed to store sequences of P450 proteins that have been engineered in order to optimise crystallisation behavior. Amino acids are stored as individual entities, allowing the physical and chemical properties of the residues to be correlated with experimental outcome, using SQL queries.
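Storing each residue as its own row is what makes such correlations a plain SQL join. The following is a minimal sketch using SQLite from Python; the table names, column names and example constructs are hypothetical and are not the schema described in the abstract. The hydropathy values are the standard Kyte-Doolittle indices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE construct (
    construct_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    crystallised INTEGER NOT NULL          -- 1 if crystals were obtained, else 0
);
CREATE TABLE residue (
    construct_id INTEGER NOT NULL REFERENCES construct(construct_id),
    position     INTEGER NOT NULL,
    amino_acid   TEXT NOT NULL,            -- one-letter code
    PRIMARY KEY (construct_id, position)
);
CREATE TABLE amino_acid_property (
    amino_acid TEXT PRIMARY KEY,
    hydropathy REAL NOT NULL               -- Kyte-Doolittle hydropathy index
);
""")

# Made-up example data: two short engineered variants and four residue types.
conn.executemany("INSERT INTO construct VALUES (?, ?, ?)",
                 [(1, "variant_A", 1), (2, "variant_B", 0)])
conn.executemany("INSERT INTO residue VALUES (?, ?, ?)",
                 [(1, 1, "K"), (1, 2, "E"), (1, 3, "A"),
                  (2, 1, "L"), (2, 2, "L"), (2, 3, "A")])
conn.executemany("INSERT INTO amino_acid_property VALUES (?, ?)",
                 [("K", -3.9), ("E", -3.5), ("A", 1.8), ("L", 3.8)])

# Correlate a residue-level property with the experimental outcome per construct.
rows = conn.execute("""
    SELECT c.name, c.crystallised, AVG(p.hydropathy) AS mean_hydropathy
    FROM construct AS c
    JOIN residue AS r ON r.construct_id = c.construct_id
    JOIN amino_acid_property AS p ON p.amino_acid = r.amino_acid
    GROUP BY c.construct_id, c.name, c.crystallised
    ORDER BY c.construct_id
""").fetchall()

for name, crystallised, mean_hydropathy in rows:
    print(name, crystallised, round(mean_hydropathy, 3))
```

Because the properties live in their own table, adding another residue property (charge, volume and so on) needs only a new column there, without touching the stored sequences.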

Long Abstract

 

 

155A. In silico reconstruction of metabolic network from unannotated raw genome sequences.

Jibin Sun and An-Ping Zeng. Microbial Systems and Genome Analysis, GBF.

AZE@GBF.de

 

A method is proposed for in silico reconstruction of a metabolic network directly from unannotated genome sequences. A comparison of data from different sequencing stages (3.9-fold vs. 7.9-fold coverage) for one organism revealed that 3.9-fold coverage of the genome is sufficient (with 99.3% identity) for reconstructing the metabolic network.

Long Abstract

 

 

156A. AFLP® Nucleotide Sequence Quality Assessment and Improvement Tool.

Antoine Janssen1, Jan van Oeveren1, Pieter Vos1, Gert Vriend2, Roland Siezen2, Rene van Schaik3 and Jack Leunissen2. 1Keygene N.V., Wageningen, The Netherlands and 2Center for Molecular and Biomolecular Informatics, University of Nijmegen, Nijmegen, The Netherlands and 3Organon, Oss, The Netherlands.

antoine.janssen@keygene.com

 

The Keygene/CMBI AFLP® quality assessment and improvement tool is a web based application that automates quality assessment and visualization of (cDNA-)AFLP® data. It improves proprietary data by use of public data. The analysis includes coverage / redundancy calculation, internal contig building, full length discovery and potential SNP discovery. http://www.cmbi.nl/kg_bin/dataset_annotator.pl.

Long Abstract

 

 

157A(i). Schema Mapping and Data Integration with Clio.

Barbara Eckman, Mauricio Hernández, Howard Ho, Felix Naumann and Lucian Popa. IBM.

felix@us.ibm.com

 

Bioinformatics data sources typically have large, complex structures, reflecting the richness of the scientific concepts they model. Clio is an information integration tool that helps users define mappings between disparate schemas, thus providing an integrated view of all related data sources and enabling data transformations between the sources.

Long Abstract

 

 

157A(ii). The GENIA Corpus: an Annotated Corpus in Molecular Biology Domain.

Tomoko Ohta1, Yuka Tateisi2, Jin-Dong Kim2 and Jun-ichi Tsujii1,2. 1Univ. of Tokyo and 2CREST, JST.

okap@is.s.u-tokyo.ac.jp

 

We are developing the necessary resources, including a domain ontology and an annotated corpus built from MEDLINE abstracts. We have already annotated 2,500 abstracts with 31 different semantic classes. Part-of-speech annotation of the same set of abstracts annotated for named entities is under way using the Penn Treebank tag set. In this poster, we report on the current status of our corpus.

Long Abstract

 

 

Genome Annotation.

 

158A. An Integrated Approach to High-Throughput Genetic Sequence Analysis. 50

159A. Annotation of Potential Transcription Factor Binding Sites in Orthologous, Paralogous and Pseudogenes by Statistical Analysis Comparing Them with the Sites with Known SNP's-Disease Associations. 50

160A. The Cellular Immune System as a Gene Prediction Resource. 51

161A. A High Throughput System for Mining EST and cDNA Databases. 51

162A. GenDB is an Open Source Framework for Genome Annotation. 51

163A. Biomax PEDANT™ Human Genome Database - Automatic and Manual Functional Annotation of the Human Genome. 51

164A. Expert-system based annotation strategies using GenDB, an open source genome annotation system. 51

165A. Analysis of the Replication Origin of the Middle-Sized Linear Plasmid pSCL2 of Streptomyces clavuligerus. 52

166A. Enhanced Functional Annotation by the EBI Sequence Database Group. 52

167A. An Automated Prediction System for Gene Functions Combined with RiceGAAS (Rice Genome Automated Annotation System). 52

168A. Global Open Biology Ontologies. 52

169A. Exon Structure Analysis, Ortholog Identification, and SNP Candidate Screening by Mapping RIKEN Mouse cDNA Clones to Multiple Genome Assemblies. 52

170A. TESS-II: Describing and Finding Gene Regulatory Sequences with Grammars. 53

171A. An Extensible Gene-centric Architecture for Querying across Multiple Databases Using J2EE. 53

172A. Top-Down EST Clustering Using the Draft Human Genome Map. 53

173A. Analysis of the del(13)SVEA36H Region on Mouse Chromosome13. 53

174A. Construction of a Genome-Wide, Fine-Grained Human-Mouse Synteny Map by Identifying Conserved Stretches in the Translations of Both Genomes. 54

175A. No poster. 54

176A. Sequence and Structural Integration within the InterPro, Proteome Analysis and SWISS-PROT Databases. 54

177A. eProteome: a Resource for Proteome Annotation. 54

178A. PyFACT: A Tool for Function Assignment and Classification to a Sequence using Dictionary-Based Approach. 54

179A. The EnsEMBL Annotation Process. 54

180A. CGP: A Tool for the Selection of Candidate Disease Genes. 55

181A. Literature and its Referents: Analyzing PubMed Citations Across PFAM. 55

182A. No poster. 55

183A. Identification of Potentially Functional Reverse Transcriptase Sequences in the Human Genome. 55

184A. Poxvirus Orthologous Clusters (POCs) Software Package. 55

185A. Selection of SNPs for a Genome-Wide Linkage Disequilibrium Mapping Set. 56

186A. Semantic Integration in the Mouse Genome Informatics System. 56

 

158A. An Integrated Approach to High-Throughput Genetic Sequence Analysis.

Gong-Xin Yu, E. Marland, A. Rodriguez and N. Maltsev. Argonne National Lab., 9700 S. Cass Ave., Argonne, IL, 60439, USA.

gxyu@mcs.anl.gov

 

We report a system that consists of a scalable pipeline for parallel sequence analysis; a rule-based knowledge base; a voting algorithm for functional assignments and a web browser. The knowledge base is the core of the system, which enables users to resolve conflicts among different computational tools, enhance confidence, and avoid over-interpretation.
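The abstract does not describe how the voting algorithm works, so the sketch below shows only one simple possibility: a weighted majority vote that abstains when no label wins a large enough share, which is one way to avoid over-interpretation. The tool names, weights and threshold are hypothetical.

```python
from collections import defaultdict


def vote(predictions, weights, min_share=0.6):
    """Combine per-tool functional assignments by weighted voting.

    predictions: dict tool -> predicted function label (None = no call)
    weights:     dict tool -> trust placed in that tool (default 1.0)
    Returns (label, share); label is None when the tools conflict too
    strongly for any label to reach min_share of the total weight.
    """
    tally = defaultdict(float)
    for tool, label in predictions.items():
        if label is not None:
            tally[label] += weights.get(tool, 1.0)
    if not tally:
        return None, 0.0
    total = sum(tally.values())
    label, score = max(tally.items(), key=lambda item: item[1])
    share = score / total
    return (label if share >= min_share else None), share


# Hypothetical tools and weights; in a rule-based system these could come from the knowledge base.
WEIGHTS = {"similarity_search": 1.0, "domain_search": 2.0, "motif_search": 1.0}

print(vote({"similarity_search": "kinase", "domain_search": "kinase",
            "motif_search": "phosphatase"}, WEIGHTS))
print(vote({"similarity_search": "kinase", "motif_search": "phosphatase"}, WEIGHTS))
```

The returned share doubles as a rough confidence value: agreement among heavily weighted tools raises it, while an even split yields an abstention rather than a forced assignment.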

Long Abstract

 

 

159A. Annotation of Potential Transcription Factor Binding Sites in Orthologous, Paralogous and Pseudogenes by Statistical Analysis Comparing Them with the Sites with Known SNP's-Disease Associations.

Julia V. Ponomarenko, Galina V. Orlova, Tatyana I. Merkulova, Elena V. Gorshkova, Oleg N. Fokin, Gennady V. Vasiliev, Anatoly S. Frolov and Mikhail P. Ponomarenko. Institute of Cytology and Genetics, 10 Lavrentyev Ave., Novosibirsk, 630090, Russia.

jpon@bionet.nsc.ru

 

A database-tools system, rSNP_Guide, http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/, analyzes SNPs in regulatory gene regions. Based on seventeen transcription factor (TF) binding sites with disease-associated mutations, we have localized 148 potential TF sites in orthologous genes, paralogous genes and pseudogenes. Statistical significance of the strength of each potential site was estimated as "presence", "absence", or "weakness".

Long Abstract

 

 

160A. The Cellular Immune System as a Gene Prediction Resource.

Gila Lithwick, Yael Altuvia and Hanah Margalit. The Hebrew University, Jerusalem.

gilal@md.huji.ac.il

 

We carried out a comprehensive search comparing peptides eluted from major histocompatibility complex (MHC) molecules to human sequence data. Our findings illustrate how these peptides are informative for the identification of new genes, for hypothetical gene verification, for verifying gene expression at the protein level and for supporting splice junctions.

Long Abstract

 

 

161A. A High Throughput System for Mining EST and cDNA Databases.

Jonathan Segal and Hui Huang. Genome Therapeutics Corporation.

jsegal@genomecorp.com

 

We present a high-throughput system for mining EST and cDNA databases (dbEST, gb_new_est, etc.) to find possible extensions for known cDNAs and to find previously unmapped genes over a region of interest. Efficient use of our computational cluster lets us search for extensions for thousands of genes in several hours.

Long Abstract

 

 

162A. GenDB is an Open Source Framework for Genome Annotation.

Folker Meyer, Daniela Bartels, Thomas Bekel, Alexander Goesmann, Burkhard Linke, Alice McHardy and Oliver Rupp. Center for Genome Research, Bielefeld University.

fm@Genetik.Uni-Bielefeld.DE

 

Centered around a relational database management system, the GenDB system provides the software infrastructure for genome annotation projects. URL: http://GenDB.Genetik.Uni-Bielefeld.DE.

Long Abstract

 

 

163A. Biomax PEDANT™ Human Genome Database - Automatic and Manual Functional Annotation of the Human Genome.

Matthias Fellenberg1, Erik Kimpel1, Andreas Fritz1, Oliver Heinrich1, Victor Solovyev2, Michael Firsov3, and Christine M.E. Schüller1. 1Biomax Informatics AG, Lochhamer Straße 11, 82152 Martinsried, Germany, 2Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY 10549, USA and 3Petrogen Ltd., Engels Pr., 128/A, #4-H, 194356 St. Petersburg, Russia.

matthias.fellenberg@biomax.de

 

The Pedant-Pro™ Sequence Analysis Suite from Biomax was used to perform systematic analysis for in-depth functional and structural characterization of the predicted human proteome resulting in the PEDANT™ Human Genome Database. An expert annotation team continues to refine the analysis data by verifying automatically predicted features, including literature references.

Long Abstract

 

 

164A. Expert-system based annotation strategies using GenDB, an open source genome annotation system.

Alice McHardy, Jan Kleinluetzum and Folker Meyer. Center for Genome Research, Bielefeld University

alice@genetik.uni-bielefeld.de

 

Simulating the decision process of a human expert, rule-based annotation strategies for interpretation of functional evidence can be formulated to automate the annotation process. Using GenDB (http://gendb.genetik.uni-bielefeld.de), an open source genome annotation system relying on a relational database backend, different rule-based strategies allowing (partial) automation of the annotation process are implemented.

Long Abstract

 

 

165A. Analysis of the Replication Origin of the Middle-Sized Linear Plasmid pSCL2 of Streptomyces clavuligerus.

Wei Wu and Kenneth L. Roy. Dept. of Biol. Sci., University of Alberta, Edmonton, Canada.

wwu@ualberta.ca

 

The putative replication origin region of the linear plasmid pSCL2 of Streptomyces clavuligerus has been sequenced and analyzed. Two ORFs, encoding RepC1 and RepC2, downstream of the origin of replication were highly homologous to RepL1 and RepL2 of pSLA2-L of Streptomyces rochei. IS116 was found further downstream of the repC genes.

Long Abstract

 

 

166A. Enhanced Functional Annotation by the EBI Sequence Database Group.

Manuela Pruess, Rolf Apweiler, Ernst Kretschmann and Michele Magrane. European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

mpr@ebi.ac.uk

 

At the EBI, reliable gene and protein databases, indispensable tools for computational analysis and data mining, are developed and maintained. High quality data annotation comprises both manual and automated methods, the latter not being a substitute for manual annotation, but providing strong support. All databases are publicly available at http://www.ebi.ac.uk/.

Long Abstract

 

 

167A. An Automated Prediction System for Gene Functions Combined with RiceGAAS (Rice Genome Automated Annotation System).

Hiroyuki Watanabe1, Yuji Shimizu1, Katsumi Sakata1, Hiroshi Ikawa1, Yoshiaki Nagamura2, Takashi Matsumoto2 and Kenichi Higo2. 1Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Japan and 2National Institute of Agrobiological Sciences, Tsukuba, Japan.

hiroyuki@tkb.mss.co.jp

 

A new annotation program was developed to predict rice gene functions automatically from homology searches, for integration with RiceGAAS (Rice Genome Automated Annotation System, http://RiceGaas.dna.affrc.go.jp/). The predictions were compared with the manual annotations described in GenBank flat files, and the results are shown at http://alnilam.mi.mss.co.jp/rgadb/.

Long Abstract

 

 

168A. Global Open Biology Ontologies.

Midori Harris. EMBL-EBI.

midori@ebi.ac.uk

 

The Gene Ontology (GO) Consortium supports the development of bio-ontologies describing domains not covered by the three GO vocabularies. Criteria for inclusion and links to existing and planned ontologies are available on the Global Open Biology Ontologies (GOBO) web site: http://www.geneontology.org/doc/gobo.html.

Long Abstract

 

 

169A. Exon Structure Analysis, Ortholog Identification, and SNP Candidate Screening by Mapping RIKEN Mouse cDNA Clones to Multiple Genome Assemblies.

Serge Batalov 1 and Colin F. Fletcher 2. 1 Computational Biology Department and 2 Mouse Genetics Program, Genomics Institute of the Novartis Research Foundation (GNF), San Diego, CA 92121, USA.

batalov@gnf.org

 

Mapping to Celera and public genome assemblies (sequenced from different strains) allows one to determine exon structure, identify alternatively spliced forms and localize the correct human ortholog for functional annotation. Intronless genes can be flagged for determination of retransposition events, pseudogenes, or genomic contamination. High quality sequence discrepancies can lead to SNP identification.

Long Abstract

 

 

170A. TESS-II: Describing and Finding Gene Regulatory Sequences with Grammars.

Jonathan Schug and Christian J. Stoeckert, Jr. Center for Bioinformatics at the University of Pennsylvania.

jschug@pcbi.upenn.edu

 

We present a grammar formalism and a parser to find gene regulatory sites in genomic sequence and annotation from DAS-compliant genome resources. The grammar can match against both sequence and existing annotation and makes it easy to express complex and flexible relationships between binding sites in a very concise form.

Long Abstract

 

 

171A. An Extensible Gene-centric Architecture for Querying across Multiple Databases Using J2EE.

David Block, Serge Batalov and Hilmar Lapp. The Genomics Institute of the Novartis Research Foundation (GNF), San Diego, California.

dblock@gnf.org, batalov@gnf.org and lapp@gnf.org

 

Current genomic databases are sequence-centric, while much of present and future biology is concerned with genes: their functions, roles, modifications, and expression. A gene-centric database architecture is described, along with an implementation using J2EE. A virtual “Platonic” set of genes is created using synonymous, homologous and syntenic relationships.

Long Abstract

 

 

172A. Top-Down EST Clustering Using the Draft Human Genome Map.

Namshin Kim1, Seokmin Shin1 and Sanghyuk Lee2. 1School of Chemistry, Seoul National University and 2Division of Molecular Life Science, Ewha Womans University.

deepreds@hanmail.net

 

We have developed a new EST clustering algorithm that uses the draft genome map. Human ESTs are mapped onto the UCSC assembly of the human genome (the so-called golden path) using the BLAT program, and their alignments are clustered in a top-down fashion. The resulting clusters are compared with the UniGene and TIGR Gene Indices.
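The abstract does not spell out the clustering rule, so the following is only a minimal sketch of one plausible genome-based grouping step: alignments that overlap on the same chromosome and strand fall into the same cluster. The tuple layout, the function name `cluster_alignments` and the example coordinates are assumptions made for illustration, not the authors' method or BLAT's output format.

```python
from collections import defaultdict

def cluster_alignments(alignments):
    """Group EST alignments whose genomic spans overlap.

    alignments: iterable of (est_id, chrom, strand, start, end) with
    half-open coordinates (hypothetical layout, not BLAT's PSL format).
    Returns a list of clusters, each a sorted list of EST ids.
    """
    by_locus = defaultdict(list)
    for est_id, chrom, strand, start, end in alignments:
        by_locus[(chrom, strand)].append((start, end, est_id))

    clusters = []
    for key in sorted(by_locus):
        current, current_end = [], None
        # Sweep left to right; start a new cluster at every gap.
        for start, end, est_id in sorted(by_locus[key]):
            if current and start >= current_end:
                clusters.append(sorted(current))
                current, current_end = [], None
            current.append(est_id)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            clusters.append(sorted(current))
    return clusters

# Illustrative (made-up) alignments.
example = [
    ("est1", "chr1", "+", 100, 400),
    ("est2", "chr1", "+", 350, 700),
    ("est3", "chr1", "+", 900, 1200),
    ("est4", "chr1", "-", 120, 380),
]
clusters = cluster_alignments(example)
print(clusters)
```

In this toy input, est1 and est2 overlap and share a cluster, while est3 (a separate locus) and est4 (opposite strand) each form their own.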

Long Abstract

 

 

173A. Analysis of the del(13)SVEA36H Region on Mouse Chromosome 13.

AM Mallon1, J Weekes1, P Denny1, MRM Botcherby2, P Gautier4, H Hummerich5, S Cross4, V van Heyningen4, R Edgar4, N Leaves2, J Greystrong2, L Greenham2, S Jones2, K Maggott2, S Manjunath2, E Russell2, G Strachan2, M Strivens1, P North2, E Boal2, V Cobley2, G Hunter2, G Kimberley2, L Cave-Berry2, L Mathews3, S Simms3, S Gregory3, R Evans3, T Hubbard3, R Durbin3, M Cadman1, R Mc Keone1, A Southwell1, C Sellick1, M Iravani5, S White4, P Little6, I Jackson4, J Rogers3, RD Campbell2 and SDM Brown1. 1MRC UK Mouse Genome Centre and Mammalian Genetics Unit, Harwell, Oxfordshire, OX11 0RD, UK, 2MRC UK-HGMP Resource Centre, Hinxton Genome Campus, Cambridge CB10 1SB, UK, 3Sanger Centre, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK, 4MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK, 5Imperial College, Exhibition Road, South Kensington, London SW7 2AZ, UK and 6School of Biochemistry and Molecular Genetics, University of New South Wales, Sydney 2052, Australia.

a.mallon@har.mrc.ac.uk

 

A regional and functional approach has been adopted by the MRC UK mouse-sequencing programme, which will improve the efficiency of mutation scanning and the identification of genes underlying mutations of interest. Detailed annotation of 14Mb of mouse finished sequence from the Del36H region on mouse chromosome 13 will be described.

Long Abstract

 

 

174A. Construction of a Genome-Wide, Fine-Grained Human-Mouse Synteny Map by Identifying Conserved Stretches in the Translations of Both Genomes.

Adrian Bruengger and John Markus. Novartis Pharma AG., WKL-125.13.58, Basel, BS4053, Switzerland.

adrian.bruengger@pharma.novartis.com

 

Using distributed computing, we have identified all conserved segments in the translations of both genomes whose ungapped alignments contain a short identical stretch and score above a certain threshold. The resulting dataset allows the rapid identification of orthologs and homologs and can be visualized as a genome-wide, fine-grained synteny map.

Long Abstract

 

 

175A. No poster.

 

176A. Sequence and Structural Integration within the InterPro, Proteome Analysis and SWISS-PROT Databases.

Virginie Mittard, Rolf Apweiler, Daniel Barrell, Ujjwal Das, Wolfgang Fleischmann, Alexander Kanapin, Paul Kersey, Evgenia Kriventseva, Phil McNeil, Nicola Mulder and Florence Servant. EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

virginie@ebi.ac.uk

 

The challenge for the next decade is to integrate the sequence and structural data and to provide structural and functional annotations for both protein families and entire genome sequences. This project will be applied within the InterPro, Proteome analysis and SWISS-PROT databases.

Long Abstract

 

 

177A. eProteome: a Resource for Proteome Annotation.

Serge Saxonov, Peter Tan and Douglas L. Brutlag. Stanford Biochemistry, 251 Campus Drive X 215, Stanford, CA 94305, USA.

saxonov@smi.stanford.edu

 

eProteomes is a database of motif hits in proteins from the 72 publicly sequenced genomes as well as other sequence collections. The motifs were generated from the Blocks+ database and were represented as both regular expressions and PSSMs. eProteomes can be accessed through a powerful web-based interface at http://fold.stanford.edu/proteome.
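To illustrate the two motif representations mentioned above, here is a minimal sketch that scans a protein with a regular expression and with a position-specific scoring matrix (PSSM). The motif, the toy PSSM, the default score for unlisted residues and the function names are all invented for illustration; none of them come from Blocks+ or eProteomes.

```python
import re

def regex_hits(pattern, protein):
    """Return (start, matched_text) for each non-overlapping regex match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, protein)]

def pssm_best_hit(pssm, protein, default=-1.0):
    """Slide the PSSM along the protein and return (start, score) of the
    best-scoring window. pssm is a list of {residue: score} columns;
    residues missing from a column get the (assumed) default score."""
    width = len(pssm)
    best = None
    for i in range(len(protein) - width + 1):
        window = protein[i:i + width]
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best is None or score > best[1]:
            best = (i, score)
    return best

protein = "MKCAACGHCPLC"          # made-up sequence
motif = r"C..C"                   # made-up motif: two cysteines, two apart
toy_pssm = [                      # made-up three-column PSSM
    {"C": 2.0},
    {"A": 1.0, "P": 1.0},
    {"A": 1.0, "L": 1.0},
]

hits = regex_hits(motif, protein)
best = pssm_best_hit(toy_pssm, protein)
print(hits, best)
```

The regular expression gives a yes/no match at each position, whereas the PSSM ranks every window by score, which is why a resource might reasonably store both.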

Long Abstract

 

 

178A. PyFACT: A Tool for Function Assignment and Classification to a Sequence using Dictionary-Based Approach.

Jee-Hyub Kim1, Sung-Ho Goh1, Cheol-Goo Hur1 and Doil Choi2. 1National Center for Genome Information, KRIBB and 2Plant Diversity Research Center, KRIBB.

hurlee@mail.kribb.re.kr

 

PyFACT was developed to meet the need for function assignment and classification of ESTs. It takes a dictionary-based approach and is written in Python. PyFACT provides useful graphics and statistics as well as function assignments using MIPS and GO function codes. Additional dictionaries will make it more valuable.

Long Abstract

 

 

179A. The EnsEMBL Annotation Process.

M. Clamp1, D. Barker1, E. Birney2, G. Cameron2, Y. Chen2, L. Clarke1, G. Coates1, T. Cox1, J. Cuff1, V. Curwen1, T. Cutts1, T. Down1, R. Durbin1, E. Eyras1, J. Gilbert1, M. Hammond2, A. Kasprzyk2, D. Keefe2, S. Keenan1, H. Lehväslaiho2, C. Melsopp2, E. Mongin2, R. Pettett1, S. Potter1, A. Rust2, E. Schmidt2, S. Searle1, G. Slater2, J. Smith1, W. Spooner1, A. Stabenau2, J. Stalker1, A. Ureta-Vidal2, I. Vastrik2, T. Hubbard1. 1Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambs, CB10 1SA, UK and 2European Bioinformatics Institute (EMBL-EBI), Genome Campus, Hinxton, Cambs, CB10 1SA, UK.

lec@sanger.ac.uk

 

The EnsEMBL Genome annotation process consists of four main stages: 'raw compute', gene build, protein analysis and comparative analysis. These processes are run across a variety of genomes from Human and Mouse to Worm and Mosquito. The data produced is then displayed on the web at www.ensembl.org.

Long Abstract

 

 

180A. CGP: A Tool for the Selection of Candidate Disease Genes.

Damian Smedley, Janet Kelso, Soraya Bardien-Kruger, Johann Visagie, Winston Hide and Mark McCarthy. 1Imperial College School of Medicine, Du Cane Road, London, UK and 2South African National Bioinformatics Institute, Cape Town, South Africa.

d.smedley@ic.ac.uk

 

The Candidate Gene Profiler (CGP) allows researchers to select candidate human disease genes based on their genomic location, expression pattern and protein function. The tool utilizes a novel expression ontology covering all the tissues represented in dbEST (VocabProbe).

Long Abstract

 

 

181A. Literature and its Referents: Analyzing PubMed Citations Across PFAM.

Richard K. Belew1, Robert Finn2 and Alex Bateman2. 1Computer Science & Engr. Dept., Univ. California - San Diego and 2Wellcome Trust Sanger Institute.

rik@cs.ucsd.edu

 

In this poster we consider a coarser analysis of reference patterns, rather than the text of individual articles. This analysis is motivated by the early bibliometric analyses of citation patterns across the scientific literature, and by more recent linkage analyses of WWW pages. Considering a corpus of approximately 600,000 TREMBL/SwissPROT protein entries, the number of references made to particular articles follows the ubiquitous Zipfian distribution. We also performed a correlational analysis of the frequency-rank of a particular document's references, as a function of how many different PFAM families contain a protein mentioning this reference. As expected, the approximately 80 most frequent, "generic" publications are indeed scattered across the most PFAM families.
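A Zipfian distribution means that citation frequency falls off roughly as a power of rank, so a log-log plot of frequency against rank is close to a straight line. The sketch below, which uses synthetic citation counts rather than any TREMBL/SwissPROT data, builds a rank-frequency table and estimates that slope by least squares; the function names are hypothetical.

```python
import math
from collections import Counter

def rank_frequency(references):
    """Return [(rank, count), ...] with rank 1 for the most-cited reference."""
    counts = Counter(references)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

def loglog_slope(rank_freq):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(r) for r, _ in rank_freq]
    ys = [math.log(f) for _, f in rank_freq]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic data: reference k is cited 120 // k times (120, 60, 40, 30, 24, 20).
citations = [f"ref{k}" for k in range(1, 7) for _ in range(120 // k)]
table = rank_frequency(citations)
slope = loglog_slope(table)
print(table, slope)
```

The synthetic counts are exactly proportional to 1/rank, so the fitted slope comes out at about -1; real citation data would only approximate a straight line.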

Long Abstract

 

 

182A. No poster.

 

183A. Identification of Potentially Functional Reverse Transcriptase Sequences in the Human Genome.

E.F. Donaldson, D.W. Lee, A.R. Juntunen and M.A. McClure. Montana State University, Department of Microbiology and Center for Computational Biology.

donaldso@parvati.msu.montana.edu

 

The Genome Parsing Suite provides prototype software to identify Reverse Transcriptase sequences in a genome, filtering hits to retain probable homologues, and scoring these sequences by evaluating the ordered-series-of-motifs indicative of Reverse Transcriptase. Flanking regions for each homologue are then analyzed to determine the genomic content of the Retroid Agent.

Long Abstract

 

 

184A. Poxvirus Orthologous Clusters (POCs) Software Package.

Angelika Ehlers, Stephanie Slack, Rachel L. Roper and Chris Upton. Department of Biochemistry and Microbiology, University of Victoria.

cupton@uvic.ca

 

POCs is a JAVA client-server application that accesses a database containing all poxvirus genomes; it automatically groups genes into families. POCs has a user-friendly interface permitting complex SQL queries to retrieve groups of DNA/protein sequences and gene families for use with a variety of integrated tools. Access at http://www.poxvirus.org.

Long Abstract

 

 

185A. Selection of SNPs for a Genome-Wide Linkage Disequilibrium Mapping Set.

Charles R. Scafe, Hadar Avi-Itzhak, Ryan T. Koehler, Marion Laig-Webster, Yu Wang, Eugene G. Spier, and Francisco M. De La Vega. Applied Biosystems, Foster City, CA, USA.

scafecr@appliedbiosystems.com

 

We describe a triage strategy for selecting SNPs for a genome-wide Linkage Disequilibrium mapping assay set. SNPs with multiple independent observations of the minor allele are put through assay design QC and positional selection steps. This procedure allows efficient construction of >150,000 validated assays for LD mapping.

Long Abstract

 

 

186A. Semantic Integration in the Mouse Genome Informatics System.

JA Blake, M Ringwald, J.E. Richardson, D.P. Hill, J.A. Kadin, H.J. Drabkin, J. Beal, C. Smith, C.M. Lutz, C. Bult, L.E. Corbani, A. Planchart, T.F. Hayamizu and J.T. Eppig. Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME USA.

jblake@informatics.jax.org

 

The Mouse Genome Informatics system (www.informatics.jax.org) depends upon strict attention to object identity and semantic integrity to support an extensive and fully integrated genetics and genomics resource. Incorporation of the Gene Ontologies, the Mouse Anatomical Dictionary and the development of the MGI Phenotype classifications empower data exploration and system interoperability.

Long Abstract

 

 

 

Sequence Comparison.

 

187A. PatternHunter: Faster and More Sensitive Homology Search.

188A. Discovery of Biological Sequence Motifs using a Stochastic Dictionary Model.

189A. Comparative Sequence Analysis of Non-Coding DNA in Orthologous Gene Loci.

190A. A Bioinformatic Pipeline for In-Silico High-Throughput Discovery of Single Nucleotide Polymorphisms.

191A. Maximum Score with Group Selection Method for BLAST post-processing.

192A. combAlign: A Protein Sequence Alignment Algorithm Considering Recombinations.

193A. Classifying Alignment Significance with Support Vector Machines.

194A. A Clustering Algorithm for Testing Interval Graphs on Noisy Data.

195A. Peptide Sequencing using Natural Abundance Isotope Information and de novo Spectral Analysis.

196A. Sequence Variation in the C-terminal Merozoite Surface Protein-1 Gene of Plasmodium falciparum and Epitope-Specific Human Antibody Response.

197A. Clustal-G: ClustalW Analysis Using Grid and Parallel Computing.

198A. The Distance Function for Computing the Continuous Distance of Biopolymer Sequences.

199A. BLAST Merge/Split Modules for BLAST Accelerator.

200A. Biological Significance of Jumping Alignments.

201A. Augmenting Physical Maps with Sequence.

202A. Towards a Vaccine for Scabies.

203A. Distributed BLAST System Based on Web: GOST BLAST.

204A. A Database on Alternative Splice Forms.

205A. The Enhanced Suffix Array and Its Applications to Genome Analysis.

206A. Automated Generation of Heuristics for Biological Sequence Comparison.

 

187A. PatternHunter: Faster and More Sensitive Homology Search.

Bin Ma, John Tromp and Ming Li. Bioinformatics Solutions Inc., 145 Columbia Street West, Waterloo, Ontario, Canada, N2L 3L2.

mli@bioinformaticssolutions.com

 

Genomics and proteomics studies routinely depend on homology searches based on the strategy of finding short seed matches which are then extended. The exploding genomic data growth presents a dilemma for DNA homology search techniques: increasing seed size decreases sensitivity whereas decreasing seed size slows down computation.
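The seed-size trade-off can be seen on a toy example. The sketch below implements only the generic seed-and-extend idea with contiguous exact k-mer seeds and a match-only extension; it is not PatternHunter's own seeding scheme, and the sequences and function names are invented for illustration.

```python
from collections import defaultdict

def seed_hits(query, target, k):
    """Return (query_pos, target_pos) pairs where a k-mer matches exactly."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            hits.append((i, j))
    return hits

def extend_ungapped(query, target, i, j, k):
    """Extend a seed left and right while characters are identical.
    Real tools score mismatches and stop on a score drop-off; stopping at
    the first mismatch is a simplification for this sketch."""
    start_q, start_t, length = i, j, k
    while start_q > 0 and start_t > 0 and query[start_q - 1] == target[start_t - 1]:
        start_q -= 1
        start_t -= 1
        length += 1
    while (start_q + length < len(query) and start_t + length < len(target)
           and query[start_q + length] == target[start_t + length]):
        length += 1
    return start_q, start_t, length

query = "ACGTTGCA"    # made-up sequences differing at one position
target = "ACGATGCA"

short_seeds = seed_hits(query, target, 3)
long_seeds = seed_hits(query, target, 5)
extended = extend_ungapped(query, target, 4, 4, 3)
print(short_seeds, long_seeds, extended)
```

With k = 3 the similarity is found on both sides of the mismatch, whereas with k = 5 no seed survives and the homology is missed. On genome-scale data, the shorter seed would in turn produce many more chance hits to extend, which is the cost side of the dilemma.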

Long Abstract

 

 

188A. Discovery of Biological Sequence Motifs using a Stochastic Dictionary Model.

Mayetri Gupta and Jun S. Liu. Dept. of Statistics, 1 Oxford St., Harvard University, Cambridge, MA 02138, U.S.A.

gupta@stat.harvard.edu

 

We present a novel method for detecting conserved sequence motifs using a stochastic dictionary model. An MCMC strategy is devised with recursive techniques for increased efficiency. Our approach can find multiple motifs of unknown widths, and with insertions and deletions. Polynucleotide repeat traps are tackled without the necessity of masking.

Long Abstract

 

 

189A. Comparative Sequence Analysis of Non-Coding DNA in Orthologous Gene Loci.

Christoph Dieterich, Brian Cusack, Haiyan Wang, Katja Rateitschak, Antje Krause and Martin Vingron. Max-Planck-Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany.

christoph.dieterich@molgen.mpg.de

 

Non-coding DNA segments that are conserved between human and mouse genomic sequences are good indicators of regulatory sequences. We use a systematic approach to detect conserved elements in non-coding regions of orthologous gene pairs. Our results will be made available via the Distributed Annotation System of the ENSEMBL consortium.

Long Abstract

 

 

190A. A Bioinformatic Pipeline for In-Silico High-Throughput Discovery of Single Nucleotide Polymorphisms.

Dipinder S. Keer Mentored by: Dr. Marie-Michèle Cordonnier-Pratt1, Dr. Mark Huber2, Mr. Manish Shah1, Dr. Chun Liang1, Mr. Robert Sullivan1, Mrs. Aynsley Eastman1 and Dr. Lee Pratt1. 1Department of Botany and 2Department of Management Information Systems, University of Georgia, Athens, GA 30602, USA.

dskeer@uga.edu

 

We present a bioinformatic pipeline for in-silico high-throughput discovery of Single Nucleotide Polymorphisms in Expressed Sequence Tags. SNP detection was carried out using PolyPhred. The SNP discovery process attempts to eliminate deficiencies inherent to PolyPhred while integrating SNP discovery using PolyPhred with the EST generation pipeline already in place.

Long Abstract

 

 

191A. Maximum Score with Group Selection Method for BLAST post-processing.

Yuri Kapustin1, Vyacheslav Chetvernin2 and Tatiana Tatusova2. 1Informax Inc. and 2NCBI.

kapustin@ncbi.nlm.nih.gov

 

Maximum Score with Group Selection is a method for BLAST post-processing that filters out noise and ambiguities, producing the best reconcilable alignment combinations. The method is based on a greedy approach and a specialized group selection technique.

Long Abstract

 

 

192A. combAlign: A Protein Sequence Alignment Algorithm Considering Recombinations.

Katja Wegner, Stephan Jansen, Stefan Wuchty and Ursula Kummer. European Media Laboratory GmbH, Schloss-Wolfsbrunnenweg 33, D-69118 Heidelberg, Germany.

wegner@eml.villa-bosch.de

 

The combAlign algorithm aligns pairs of protein sequences, taking both point mutations and recombinations into account. combAlign generates lists of local alignments, which are subsequently mapped to a graph. The path with the maximal score denotes the best attainable combAlignment. Compared to existing algorithms, sequences arranged in a recombinative manner are aligned significantly better.

Long Abstract

 

 

193A. Classifying Alignment Significance with Support Vector Machines.

Lars Arvestad1, Alexander Schliep2 and Olaf Wendisch3.

1Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden, 2Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany and 3ZAIK, University of Cologne, Germany.

schliep@molgen.mpg.de

 

A simple method for recognizing sequence homology using Support Vector Machines has been investigated. By utilizing features such as sequence composition in addition to alignment score, pairwise sequence comparisons compare favorably with methods that use information from multiple intermediate sequences.

Long Abstract

 

 

194A. A Clustering Algorithm for Testing Interval Graphs on Noisy Data.

Wen-Lian Hsu1, Kuen-Pin Wu1 and Wei-Fu Lu2. 1Institute of Information Science, Academia Sinica, Taipei, Taiwan, ROC and 2Institute of Computer and Information Science, National Chiao Tung University, Hsin-chu, Taiwan, ROC.

hsu@iis.sinica.edu.tw

 

An important problem in DNA sequence analysis is to reassemble the clone fragments to determine the structure of the entire molecule. An error-free version of this problem can be modeled as an interval graph recognition problem. However, lab data is almost never flawless. We present a clustering algorithm to treat data containing errors, which can accommodate some probabilistic assumptions about the overlapping relationships.

Long Abstract

 

 

195A. Peptide Sequencing using Natural Abundance Isotope Information and de novo Spectral Analysis.

William R. Cannon1 and Kenneth D. Jarman2. 1Computational Biology, Biochemistry and Biophysics and 2Applied Mathematics.

william.cannon@pnl.gov

 

We demonstrate the use of natural abundance isotopic "labels" to aid in the identification of peptides with a novel de novo algorithm. The data are from ion trap MS/MS analysis of tryptic peptides. Isotopic resolution of ion series leads to an increased confidence in the identification of the precursor peptide.

Long Abstract

 

 

196A. Sequence Variation in the C-terminal Merozoite Surface Protein-1 Gene of Plasmodium falciparum and Epitope-Specific Human Antibody Response.

Stanley Adoro, Roseangela Nwuba, Chiaka Anumudu, Yusuf Omosun and Mark Nwagwu. Cellular Parasitology Programme, Department of Zoology, University of Ibadan, Ibadan, Nigeria.

bodijahouse@skannet.com

 

Computational and immunological methods were used to analyze the influence of variations in the C-terminal merozoite surface protein (MSP)-1 gene encoding MSP1(19kDa) of isolates of Plasmodium falciparum. While most amino acid variations were located in the loop regions, the human antibody response is to more conserved beta-sheet regions of MSP1(19kDa).

Long Abstract

 

 

197A. Clustal-G: ClustalW Analysis Using Grid and Parallel Computing.

Kuo-Bin Li. Bioinformatics Institute, Singapore 117609, Republic of Singapore.

kuobin@bii-sg.org

Clustal-G is an MPI and GRID-aware implementation of ClustalW. Based on ClustalW version 1.82, Clustal-G parallelizes both the pairwise and the progressive alignment code of the original program. Any PC/workstation cluster with MPI installed should be able to run Clustal-G. In addition, the computation can be performed in a computational GRID environment using GLOBUS or MPICH-G2. The software is available at http://www.bii.a-star.edu.sg/~kuobin/clustalg/.

Long Abstract

 

 

198A. The Distance Function for Computing the Continuous Distance of Biopolymer Sequences.

G.O. Hakobyan and T.V. Margaryan. Chair of Higher Mathematics, Dept. of Physics, Yerevan State University, Armenia.

gaghakob@ysu.am

In some applications of sequence comparison theory, the items to be compared are not successions of discrete elements but "continuous" functions. The distance function of two independent variables plays the central role here. This paper aims to construct such a distance function from a given "distance" matrix D. The constructed function is itself continuous and retains all the properties of the given distance matrix.

Long Abstract

 

 

199A. BLAST Merge/Split Modules for BLAST Accelerator.

Kiejung Park, Daesang Lee and Ki-Bong Kim. Information and Technology Institute, SmallSoft Co., Ltd., Daejeon, 305-811, South Korea.

kjpark@smallsoft.co.kr

 

To overcome the calculation time problem of BLAST in genome projects, we developed the sequence merger and BLAST output splitter modules which merge a set of BLAST query sequences into a smaller number of query sequences and split BLAST results to generate the results for the initial set of query sequences, respectively.
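The merge and split steps can be sketched as follows. This assumes that queries are concatenated with a spacer of N characters and that hits are reported as half-open coordinates on the merged query. The abstract specifies neither detail, so the spacer, the hit tuple layout and the function names are illustrative assumptions rather than the authors' modules.

```python
def merge_queries(queries, spacer="N" * 20):
    """Concatenate (name, sequence) pairs into one query.
    Returns the merged sequence and a list of (name, start, end) offsets."""
    parts, offsets, pos = [], [], 0
    for name, seq in queries:
        if parts:
            parts.append(spacer)
            pos += len(spacer)
        offsets.append((name, pos, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets

def split_hits(hits, offsets):
    """Map hits on the merged query back to the original queries.
    hits: (merged_start, merged_end, subject_id), half-open coordinates.
    Hits that are not contained in a single original query are dropped."""
    result = {name: [] for name, _, _ in offsets}
    for start, end, subject in hits:
        for name, q_start, q_end in offsets:
            if q_start <= start and end <= q_end:
                result[name].append((start - q_start, end - q_start, subject))
                break
    return result

queries = [("q1", "ACGTACGT"), ("q2", "TTGGCCAA")]
merged, offsets = merge_queries(queries, spacer="NNNN")

# Made-up hits on the merged query; the third one straddles the spacer.
hits = [(2, 6, "s1"), (13, 19, "s2"), (6, 14, "s3")]
per_query = split_hits(hits, offsets)
print(merged, offsets, per_query)
```

Dropping hits that straddle a spacer is one simple design choice. A production splitter might instead trim such hits to the query boundaries and rescore them.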

Long Abstract

 

 

200A. Biological Significance of Jumping Alignments.

Constantin Bannert and Jens Stoye. University of Bielefeld, Faculty of Technology, Genome Informatics.

bannert@TechFak.Uni-Bielefeld.DE

 

The "jumping alignments" algorithm matches a query sequence to a protein family represented by a multiple alignment. A reference sequence may change (jump) between sites of the alignment. We present results on the structural and functional significance of these 'jumps' and ways to improve the alignment process with this information.

Long Abstract

 

 

201A. Augmenting Physical Maps with Sequence.

F. Engler, S. Blundy, S. Pearson and C. Soderlund. Arizona Genomics Computational Laboratory.

efriedr@genome.clemson.edu

 

The mapping of Tentative Consensus sequences and Gene Ontologies to FPC contigs with BSS (BLAST Some Sequence) identifies regions of function along the contigs. Furthermore, using STC libraries and low-coverage whole genome shotgun sequence allows for automatic selection of a Minimal Tiling Path with markedly less redundancy than existing methods.

Long Abstract

 

 

202A. Towards a Vaccine for Scabies.

Pearly Harumal1, Deborah Holt3, Katja Fischer2, Shelley Walton1, Bart Currie1, David Kemp2, Matt Johnson3, Peter Wilson3, Vicky Hewitt3, John Davis3, Annette McGrath3 and Elizabeth Kuczek3. 1Menzies School of Health Research 2Queensland Institute of Medical Research and 3Australian Genome Research Facility, Level 5 Gehrmann Laboratories, University of Queensland, St Lucia, QLD 4072, Australia.

annette@agrf.org.au

 

A set of 12288 sequences from normalised and unnormalised cDNA libraries made from scabies mite were subjected to clustering using PHRAP and BLASTed to public domain databases (Swissprot and GenBank). We focus on gene discovery and protein family analysis of homologues of these sequences to house dust mite antigens.

Long Abstract

 

 

203A. Distributed BLAST System Based on Web: GOST BLAST.

Wan-Seon Lee1, Pan-Gyu Kim3, Mi-Ae Yoo1,2 and Hwan-Gue Cho1,3. 1Bioinformatics and Biocomplexity Research Center, 2Molecular Biology, Pusan National University and 3School of Computer Sci. and Eng., Pusan National University, Pusan 609-735, Korea.

bioinfos@korea.com

 

GOST BLAST is a distributed BLAST system for a local area network environment. GOST BLAST consists of two parts: a master server and several client servers. The system supports multiple BLAST queries (sequences and databases) and can reduce the turn-around time of BLAST results linearly.

Long Abstract

 

 

204A. A Database on Alternative Splice Forms.

Heike Pospisil, Alexander Herrmann, Harald Pankow and Jens Reich. Max-Delbrueck-Center for Molecular Medicine, Robert-Roessle-Str.10, 13125 Berlin, Germany.

alexander.herrmann@mdc-berlin.de

 

The Alternative Splice Database represents splice forms of 7 different organisms from ESTs and mRNA GenBank sequence records. The algorithm defines a possible alternative splice form by comparing high-scoring ESTs to mRNA sequences using BLAST. It is available at http://www.bioinf.mdcberlin.de/splice/db/.

Long Abstract

 

 

205A. The Enhanced Suffix Array and Its Applications to Genome Analysis.

Mohamed Ibrahim Abouelhoda, Stefan Kurtz, Enno Ohlebusch and Michael Hoehl. Faculty of Technology, University of Bielefeld, PO Box 10 01 31, 33501 Bielefeld, Germany.

mibrahim@TechFak.Uni-Bielefeld.DE, kurtz@TechFak.Uni-Bielefeld.DE, enno@TechFak.Uni-Bielefeld.DE

 

We enhance suffix arrays with additional tables, so that algorithms using suffix trees can be replaced with corresponding ones over suffix arrays with the same time complexity. Our algorithms are much faster and more space-efficient. Detecting repeats requires 5n bytes (5 bytes per nucleotide), and exact pattern matching requires 6n bytes.
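As background, here is a minimal sketch of a plain suffix array with a longest-common-prefix (LCP) table, used for exact pattern matching and for spotting the longest repeat. The LCP table is a common companion to suffix arrays and is assumed here for illustration; the abstract does not list the authors' additional tables, and the naive construction and binary search below do not reproduce their time or space bounds.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by suffix text."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def build_lcp(text, sa):
    """lcp[i] = length of the longest common prefix of the suffixes at
    sa[i-1] and sa[i]; lcp[0] is 0 by convention."""
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b, n = sa[i - 1], sa[i], 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[i] = n
    return lcp

def find_occurrences(text, sa, pattern):
    """Binary-search the block of suffixes that start with the pattern and
    return their start positions in increasing order."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                      # first suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:                      # first suffix with prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])

text = "ACGTACGA"                       # made-up sequence
sa = build_suffix_array(text)
lcp = build_lcp(text, sa)
acg_positions = find_occurrences(text, sa, "ACG")
longest_repeat_length = max(lcp)
print(sa, lcp, acg_positions, longest_repeat_length)
```

The maximum LCP value gives the length of the longest repeated substring (here ACG, at positions 0 and 4), which hints at how repeat detection can be driven from tables alone without building a suffix tree.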

Long Abstract

 

 

206A. Automated Generation of Heuristics for Biological Sequence Comparison.

Guy Slater and Ewan Birney. EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.

guy@ebi.ac.uk

 

We describe a system for automated generation of sequence comparison heuristics which operate by manipulation of the underlying alignment model. This allows rapid implementation of alignment algorithms which exhibit both favourable speed and accuracy. Examples of their use are given.

Long Abstract

 

 

 

Predictive Methods.

 

207A. Specificity and Predictability of DNA Binding from Minimal Structure Information.

208A. Identification of –1 Programmed Ribosomal Frameshift Signals in Saccharomyces cerevisiae.

209A. A New Method for Identifying Large Number of Contaminated ESTs.

210A. Predicting Functions of Novel Protein Motifs by Mining the Knowledge-Base of Gene Ontology.

211A. An Efficient Method for Predicting the Membrane Spanning β-strands.

212A. Improving the Secondary Structure Prediction of the N-termini of α-helices Using Empirical and Evolutionary Data.

213A. A New Method for Iterative Multiple Sequence Alignment using Secondary Structure Prediction.

214A. Using Structure and Sequence Information for Predicting Transcription Factor Binding Sites.

215A. Computational Identification between N-Terminal Transmembrane Domains and Signal Peptides.

216A. SAM_T02 Protein Structure Prediction Webserver.

217A. Hybrid HMM and Naive Bayes Models.

218A. N-Myristoylation in Plants: a Computational Prediction of N-Myristoylated Protein Kinases in Arabidopsis.

219A. Predicting Microbial Metabolism: A Functional Group Approach.

220A. Species-Specific Protein Sequence and Fold Optimizations.

221A. Integrated primer design strategy for PCR amplification of bisulphite treated DNA.

222A. Target Explorer: An Automated Tool for Identification of New Target Genes for Specified Set of Transcription Factors.

223A. Prediction of Protein Subcellular Localization in Gram-negative Bacteria: An Updated Version of PSORT.

224A. ScanPromW: A Windows Program Searching for Promoter Patterns against a Genome Sequence.

225A. Target Prediction of Transcription Factors: Application of Structure-Based Method to Yeast Genome.

226A. Learning Better Motif Discrimination using Generative Models.

227A. How Far Can We Trust Membrane Protein Topology Predictions?

228A. Detection and Classification of Sequence and Structural Patterns in DNA Using Genetic Algorithm Neural Networks.

229A. An Algorithm for Late-Onset Disease Gene Mapping using Partially Diagnosed Pedigrees.

230A. A Neural Network Approach for Studying the Relationship between Protein Sequences and Protein-Protein Interactions.

231A. No poster.

232A. PSAML: A Representation of Protein Data for Structure Comparison.

233A. Using Small-World Topology to Refine Networks Derived via High-Throughput Methods.

234A. Prediction of snoRNAs in the Human Genome.

235A. Computational Localization of Clusters of Transcription Factor Binding Sites in Promoter Sequences.

 

207A. Specificity and Predictability of DNA Binding from Minimal Structure Information.

Shandar Ahmad, M. Michael Gromiha and Akinori Sarai. RIKEN Tsukuba Institute, Tsukuba 305-0074, Japan.

shandar@rtc.riken.go.jp

 

Several one-dimensional properties of proteins have been investigated to determine their specificity towards DNA binding. Solvent accessibility of residues was found to be the most significant factor after sequence neighbour information. Overall, no significant preference for any secondary structure type was found. A neural network has been designed to implement a prediction method based on these findings.

Long Abstract

 

 

208A. Identification of –1 Programmed Ribosomal Frameshift Signals in Saccharomyces cerevisiae.

Jonathan L. Jacobs and Jonathan D. Dinman, Ph.D. Department of Cell Biology and Molecular Genetics, University of Maryland, 2135 Microbiology Bldg., College Park, MD 20742.

jacobsjo@wam.umd.edu

 

Programmed ribosomal frameshifting (PRF) is a phenomenon usually associated with the viral biogenesis of alternatively coded proteins. Recently it has been shown that these signals are functionally present in eukaryotic genomes. In this work, we present a bioinformatics approach for identifying putative –1 PRF sites in the genome of Saccharomyces cerevisiae.

Long Abstract

 

 

209A. A New Method for Identifying Large Number of Contaminated ESTs.

Rotem Sorek. Compugen Ltd.

rotem@compugen.co.il

 

We present a new method for identifying highly contaminated EST libraries. Using this method, we were able to identify EST libraries enriched with genomic contamination, partially spliced sequences and other types of contamination. This allowed us to discard about 25,000 ESTs that would otherwise have been inferred to be new splice variants.

Long Abstract

 

 

210A. Predicting Functions of Novel Protein Motifs by Mining the Knowledge-Base of Gene Ontology.

Xinghua Lu, Chengxiang Zhai, Vanathi Gopalakrishnan and Bruce G. Buchanan. Center for Biomedical Informatics, University of Pittsburgh and Language Technologies Institute, Carnegie Mellon University.

xil3@pitt.edu

 

To predict the function of novel sequence patterns, a system was developed to mine the Gene Ontology knowledge-base and use association of GO terms with motifs to predict the function of patterns. The system was tested using patterns from PROSITE, and a statistical framework was developed to predict the confidence of function prediction.

Long Abstract

 

 

211A. An Efficient Method for Predicting the Membrane Spanning β-strands.

M. Michael Gromiha1, Shandar Ahmad2 and Makiko Suwa1. 1Computational Biology Research Center (CBRC), AIST, 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, Japan and 2RIKEN Tsukuba Institute, 3-1-1 Koyadai, Tsukuba 305-0074, Japan.

michael-gromiha@aist.go.jp

 

A new method is proposed to predict the membrane spanning β-strands in outer membrane proteins by combining a “rule-based approach” with neural networks. We observed a reasonable improvement in the accuracy of prediction.

Long Abstract

 

 

212A. Improving the Secondary Structure Prediction of the N-termini of α-helices Using Empirical and Evolutionary Data.

Claire L. Wilson, Simon J. Hubbard and Andrew J. Doig. Department of Biomolecular Sciences, U.M.I.S.T.

clw@bms.umist.ac.uk

 

Current secondary structure prediction methods perform well at identifying helical locations, but often fail to correctly identify N-terminal positions. Empirically-derived free energies are used to represent residue preferences at the N-terminal positions. Analysis of neighbouring N-terminal positions reveals the true sequence is close by and often energetically more favourable.

Long Abstract

 

 

213A. A New Method for Iterative Multiple Sequence Alignment using Secondary Structure Prediction.

Simossis V.A. and Heringa J. Division of Mathematical Biology, The National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA UK.

vsimoss@nimr.mrc.ac.uk

 

We present an iterative method that integrates secondary structure prediction and multiple sequence alignment. This iterative scheme includes SymSSP, a routine that optimises predicted secondary structure information, and a new optimal segmentation algorithm to yield a consensus prediction. The complete process iteratively optimises multiple alignment quality as well as secondary structure prediction and is implemented in the PRALINE method (Heringa J., 1998, 2000). All methods are available at http://mathbio.nimr.mrc.ac.uk.

Long Abstract

 

 

214A. Using Structure and Sequence Information for Predicting Transcription Factor Binding Sites.

Tommy Kaplan, Nir Friedman and Hanah Margalit. The Hebrew University, Jerusalem, Israel.

tommy@cs.huji.ac.il

 

We describe an EM-based approach that uses solved protein-DNA complexes and genomic sequences to predict the binding specificity of transcription factors, based on their DNA-binding domain. We demonstrate the potential of our method by its application to the Cys2His2 zinc-finger DNA-binding family.

Long Abstract

 

 

215A. Computational Identification between N-Terminal Transmembrane Domains and Signal Peptides.

Zheng Yuan1 and Rohan D Teasdale2. Institute for Molecular Bioscience and Special Research Centre for Function and Applied Genomics, University of Queensland, QLD 4072, Australia.

1z.yuan@imb.uq.edu.au , 2r.teasdale@imb.uq.edu.au

 

A new method has been developed based on the sequence features (amino acid composition, hydrophobicity and position) of the hydrophobic regions of N-terminal transmembrane domains and signal peptides. Using Fisher's linear discriminant functions, we can discriminate well between the two types of peptides. This method can complement current transmembrane protein prediction and signal peptide prediction methods to obtain more accurate predictions.
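As an illustration of the discriminant step only, the following is a minimal sketch of a two-class Fisher linear discriminant on two features. The feature choice (mean hydrophobicity and length of the hydrophobic region), the toy values and all function names are assumptions made for this example; they are not taken from the poster.

```python
# Minimal two-feature Fisher linear discriminant (pure Python).
# Feature values below are invented toy data, not measurements.

def mean_vec(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]

def scatter_2d(rows, mu):
    """2x2 within-class scatter matrix around the class mean mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_lda_2d(class_a, class_b):
    """Return (w, threshold) with w = Sw^-1 (mu_a - mu_b)."""
    mu_a, mu_b = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter_2d(class_a, mu_a), scatter_2d(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between the projected class means.
    proj_a = w[0] * mu_a[0] + w[1] * mu_a[1]
    proj_b = w[0] * mu_b[0] + w[1] * mu_b[1]
    return w, (proj_a + proj_b) / 2

def classify(x, w, threshold):
    """Label a point 'TM' if it projects above the threshold, else 'SP'."""
    return "TM" if w[0] * x[0] + w[1] * x[1] > threshold else "SP"

# Hypothetical (mean hydrophobicity, hydrophobic-region length) pairs.
tm_domains = [(2.4, 21), (2.1, 19), (2.6, 23), (2.2, 20)]
signal_peptides = [(1.8, 11), (1.5, 9), (2.0, 12), (1.6, 10)]

w, threshold = fisher_lda_2d(tm_domains, signal_peptides)
print([classify(x, w, threshold) for x in tm_domains + signal_peptides])
```

A real classifier would use many more features and would be evaluated on held-out sequences rather than on its own training data.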

Long Abstract

 

 

216A. SAM_T02 Protein Structure Prediction Webserver.

Rachel Karchin, Mark Diekhans, Jonathan Casper, Spencer Tu, Richard Hughey and Kevin Karplus. University of California, Santa Cruz Computer Science and Computer Engineering Departments.

rachelk@soe.ucsc.edu

 

SAM_T02 predicts the fold and secondary structure of a target protein sequence, using multi-track hidden Markov models and neural nets trained on SAM-T2K multiple alignments. SAM_T02 is available at http://www.soe.ucsc.edu/research/compbio/HMM-apps/T02-query.html.

Long Abstract

 

 

217A. Hybrid HMM and Naive Bayes Models.

Beverly Seavey1, David Page2 and Brian Kay3. 1University of Wisconsin Dept. of Computer Sciences, 2University of Wisconsin Dept. of Computer Sciences and Dept of Biostatistics and Medical Informatics, 3Argonne National Laboratory.

seavey@cs.wisc.edu

 

Analysis of sequence data and analysis of feature data are currently disparate fields within bioinformatics, with little work in combining them. This is unfortunate because data sets with mixtures of sequence and feature data are likely to become the norm in the near future of bioinformatics. The primary purpose of the work described here is to introduce and evaluate an algorithm that is a hybrid of HMMs and the simple feature based approach of naive Bayes. We apply it to the task of predicting peptide ligand binding specificity of various Src-homology 3 (SH3) domains.

Long Abstract

 

 

218A. N-Myristoylation in Plants: a Computational Prediction of N-Myristoylated Protein Kinases in Arabidopsis.

Tobey M. Tam and Michael Gribskov. San Diego Supercomputer Center, University of California at San Diego.

ttam@sdsc.edu

 

The N-terminal N-myristoylated consensus sequence has been previously determined for animals and yeast; however, little is known about it in plants. We have developed a prediction program based on pairwise alignment that identifies an N-myristoylation consensus motif for plants. The program is based on a log-odds matrix scoring system.
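To make the idea of log-odds scoring concrete, here is a small sketch that builds a position-specific log-odds matrix from aligned peptides and scores a candidate. It assumes a uniform background and add-one pseudocounts, and the peptides are invented; the poster's actual matrix and alignment procedure are not described in this abstract and may differ.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_log_odds(aligned, pseudocount=1.0):
    """Per-position log2(observed frequency / uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(aligned[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        matrix.append({aa: math.log2((counts[aa] / total) / background)
                       for aa in AMINO_ACIDS})
    return matrix

def score(matrix, peptide):
    """Sum of per-position log-odds values for a peptide."""
    return sum(col[aa] for col, aa in zip(matrix, peptide))

# Hypothetical aligned N-terminal peptides (toy data only).
training = ["GCSVSK", "GNCASS", "GSKLSK", "GCFHSK"]
matrix = build_log_odds(training)

print(round(score(matrix, "GCSVSK"), 2), round(score(matrix, "AAAAAA"), 2))
```

A positive total score means the peptide resembles the training set more than the background does; a cut-off would have to be calibrated on known positives and negatives.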

Long Abstract

 

 

219A. Predicting Microbial Metabolism: A Functional Group Approach.

Bo Kyeng Hou, Wenjun Kang, Larry P. Wackett and Lynda B.M. Ellis. Center for Environmental Molecular Science, University of Minnesota.

bkher71@dreamwiz.com

 

A pathway prediction system has been developed to predict microbial catabolism, based on the UM-BBD, which contains a broad variety of reactions of organic functional groups. The biodegradation of a query compound whose metabolism is not yet known can be predicted based on the functional groups it contains.

Long Abstract

 

 

220A. Species-Specific Protein Sequence and Fold Optimizations.

Michel Dumontier and Christopher W.V. Hogue. Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto, ON M5G 1X5.

micheld@mshri.on.ca

 

An organism’s ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. We have identified species-specific protein sequence and structural domain optimizations, which we exploited to generate predictive scoring functions. These scoring functions performed well in their species-specific protein identification ability and may be used in future protein-engineering experiments.

Long Abstract

 

 

221A. Integrated primer design strategy for PCR amplification of bisulphite treated DNA.

Tamas Rujan, Reinhold Wasserkort and Armin O. Schmitt. Epigenomics AG, Berlin, Germany. www.epigenomics.com

rujan@epigenomics.com

 

We developed an integrated strategy for primer design for single and multiplex PCR amplification of bisulphite treated DNA. In addition to the criteria usually used for primer design on genomic DNA, we perform further tests, for example to avoid unwanted priming. Experiments suggest a high success rate for sPCR and mPCR.

Long Abstract

 

 

222A. Target Explorer: An Automated Tool for Identification of New Target Genes for Specified Set of Transcription Factors.

Sosinsky A., Wildonger J., Bonin K., Mann R. and Honig B. Howard Hughes Medical Institute and Department of Biochemistry and Molecular Biophysics, Columbia University, New York, USA.

as1689@columbia.edu

 

Target Explorer creates a customized library of binding site matrices, searches for clusters of transcription factor binding sites and retrieves annotation for potential target genes that surround the identified clusters. It was successfully applied to the identification of new target genes for the cooperative factors Lozenge and Pointed. Target Explorer is available at http://trantor.bioc.columbia.edu/search_for_BS.

Long Abstract

 

 

223A. Prediction of Protein Subcellular Localization in Gram-negative Bacteria: An Updated Version of PSORT.

Jennifer L. Gardy, Cory A. Spencer, Shannan J. Ho Sui and Fiona S.L. Brinkman. Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, B.C., Canada.

jlgardy@sfu.ca

 

In consultation with Kenta Nakai, the developer of PSORT, we are improving the algorithm for prediction of subcellular localization of proteins in Gram-negative bacteria. This algorithm, which currently represents the only comprehensive predictive tool for prokaryotic subcellular localization prediction, has not been updated since 1991. For further information see http://www.pathogenomics.bc.ca/brinkman/research.html.

Long Abstract

 

 

224A. ScanPromW: A Windows Program Searching for Promoter Patterns against a Genome Sequence.

Daesang Lee, Chankyu Park and Kiejung Park. Information and Technology Institute, SmallSoft Co., Ltd. and Department of Biological Science, KAIST, Daejeon, 305-701, South Korea.

dslee@bioneer.kaist.ac.kr

 

The ScanPromW program, which runs in a Windows environment, allows users to search a microbial genome database for promoter consensus elements. It uses an optimized consensus sequence profile with position-specific similarity assessment, and the genome scanning results are listed according to their score against the consensus profile.

Long Abstract

 

 

225A. Target Prediction of Transcription Factors: Application of Structure-Based Method to Yeast Genome.

Akinori Sarai1, Samuel Selvaraj1, Michael M. Gromiha1, Joerg-Gerald Siebers1, Ponraj Prabakaran1 and Hidetoshi Kono2. 1RIKEN Tsukuba Institute 3-1-1 Koyadai, Tsukuba 305-0074 Japan and 2University of Pennsylvania 231 South 34 St. Philadelphia PA 19104 USA.

sarai@rtc.riken.go.jp

We have developed a structure-based method for the target prediction of transcription factors. We have applied the method to the analysis of the yeast genome, predicting the targets of particular transcription factors. The results suggest that the method is capable of predicting experimentally known target genes and binding sites correctly.

Long Abstract

 

 

226A. Learning Better Motif Discrimination using Generative Models.

Gal Elidan, Yoseph Barash, Tommy Kaplan and Nir Friedman. Hebrew University, Ross Bldg., Givat Ram, Jerusalem, 91904, Israel.

galel@cs.huji.ac.il

A common model for representing transcription factor binding sites is the position-specific score matrix, which assumes independence between positions. We explore several extensions that relax this assumption. These include factorized mixture models, context-specific mixture models, and Bayesian networks. We evaluate these variants on synthetic and real-life datasets.
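The independence assumption, and how a mixture relaxes it, can be seen in a toy two-position example. The sketch below compares a single product-of-columns model with a two-component mixture of such models; the probabilities are invented, and the mixture shown is a generic one rather than the specific model families evaluated in the poster.

```python
# Toy comparison: single position-independent model vs. a mixture of two.
# All probabilities are invented for illustration.

def pssm_prob(columns, site):
    """Probability of a site when positions are treated as independent."""
    p = 1.0
    for col, base in zip(columns, site):
        p *= col[base]
    return p

def mixture_prob(components, site):
    """Weighted sum of independent-position models."""
    return sum(weight * pssm_prob(cols, site) for weight, cols in components)

# Suppose the true sites are "AA" and "TT" in equal proportion, so the two
# positions are perfectly correlated.
single = [
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},
]
mixture = [
    (0.5, [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
           {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}]),
    (0.5, [{"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0},
           {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0}]),
]

for site in ("AA", "AT"):
    print(site, pssm_prob(single, site), mixture_prob(mixture, site))
```

The single model cannot tell the observed site "AA" from the never-observed "AT", whereas the mixture assigns all of its probability to the correlated sites.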

Long Abstract

 

 

227A. How Far Can We Trust Membrane Protein Topology Predictions?

Karin Melén1, Gunnar von Heijne1 and Anders Krogh2. 1Stockholm Bioinformatics Center, Stockholm University, Sweden and 2Bioinformatics Centre, University of Copenhagen, Denmark.

karin@sbc.su.se

 

Methods for predicting the topology of membrane proteins usually reach an accuracy of between 60 and 75%. However, it is in general not clear how reliable a specific prediction is. We have analyzed different ways of assessing the reliability and we have derived reliability scores for five common methods.

Long Abstract

 

 

228A. Detection and Classification of Sequence and Structural Patterns in DNA Using Genetic Algorithm Neural Networks.

Robert G. Beiko and Robert L. Charlebois. Department of Biology, University of Ottawa.

rbeiko@science.uottawa.ca

 

We present a method to detect and classify patterns in DNA, by subdividing a sequence of DNA into smaller windows, then converting these windows into different measures of structure and sequence composition. By selecting different groups of these variables, and using them to train neural networks, we were able to identify conserved patterns in upstream regions of Escherichia coli and Sulfolobus solfataricus.

Long Abstract

 

 

229A. An Algorithm for Late-Onset Disease Gene Mapping using Partially Diagnosed Pedigrees.

Guo-Yun Yu and Christopher M. Gomez. Department of Neurology, University of Minnesota, Minneapolis, Minnesota, USA.

gyy@tc.umn.edu

 

Late-onset disease gene mapping is often a challenge because of diagnosis uncertainty. Technical advances make it possible to take a fundamentally different approach to discover such disease genes. We will present an algorithm to use partially diagnosed pedigrees to map disease genes.

Long Abstract

 

 

230A. A Neural Network Approach for Studying the Relationship between Protein Sequences and Protein-Protein Interactions.

Richard Chang and David Page. Department of Computer Sciences, University of Wisconsin-Madison.

richard.chang@abbott.com

 

The protein interactions of the SH3 domain are translated to an artificial intelligence multiple instance problem. A solution using neural networks as part of an Expectation Maximization algorithm is described. This leads to improved accuracy over a simple neural network.

Long Abstract

 

 

231A. No poster.

 

232A. PSAML: A Representation of Protein Data for Structure Comparison.

Su-Hyun Lee1, Jin-Hong Kim2, Geon-Tae Ahn2 and Myung-Joon Lee2. 1Changwon National University and 2University of Ulsan.

suhyun@sarim.changwon.ac.kr

 

We present an XML representation of protein data named PSAML, which can be used for comparing protein structures and detecting their similarities. The PSAML language is designed on the protein data model named PSA, which describes a protein structure as the secondary structures of the protein and their relationships.

Long Abstract

 

 

233A. Using Small-World Topology to Refine Networks Derived via High-Throughput Methods.

Debra S. Goldberg and Frederick P. Roth. Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School.

debg@hms.harvard.edu

 

Many biological networks are small-world networks. To exploit the small-world properties of the S. cerevisiae protein interaction network, we developed measures of local neighborhood cohesiveness around potentially interacting protein pairs. Using these measures, we are able to accurately assess the reliability of protein-protein interactions observed in error-prone yeast two-hybrid studies.
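One simple way to quantify local neighborhood cohesiveness around a candidate interaction is the overlap between the two proteins' neighbor sets. The sketch below uses the Jaccard index as a stand-in; the abstract does not specify the authors' measures, so both the choice of index and the toy network are assumptions.

```python
# Neighborhood overlap (Jaccard index) around a candidate protein pair.
# The measure and the toy network are illustrative assumptions.

def neighbors(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def cohesiveness(adj, u, v):
    """Shared neighbors of u and v divided by all their neighbors."""
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

toy_edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("A", "D"), ("B", "D"), ("D", "E")]
adj = neighbors(toy_edges)

print(cohesiveness(adj, "A", "B"))  # pair embedded in a dense neighborhood
print(cohesiveness(adj, "D", "E"))  # pair with no shared neighbors
```

Under the small-world assumption, an observed interaction whose endpoints share many neighbors is more plausible than one whose endpoints share none.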

Long Abstract

 

 

234A. Prediction of snoRNAs in the Human Genome.

SAGARA Jun-Ichi1, NAKAMURA Shugo2, KENMOCHI Naoya3, SATO Tomoyuki1,4, OKOUCHI Ikuo1,4, SUWA Makiko1 and ASAI Kiyoshi1. 1Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2Department of Biotechnology, The University of Tokyo, 3Central Research Laboratories, Miyazaki Medical College and 4Fuji Research Institute Corporation.

jun@ni.aist.go.jp

 

We predict snoRNAs in the human genome using a statistical method which investigates sequential motifs. We have also developed a human intron database produced from exons predicted by Gene Decoder*2 which is a gene finding technology based on HMMs. We show the prediction data of snoRNAs and the human intron database.

Long Abstract

 

 

235A. Computational Localization of Clusters of Transcription Factor Binding Sites in Promoter Sequences.

Szymon M. Kielbasa, Nils Bluethgen and Hanspeter Herzel. Theoretical Biology, Humboldt University Berlin.

s.kielbasa@itb.biologie.hu-berlin.de

 

We present an iterative algorithm to detect over-represented pairs of motifs in promoter sequences. Results of applying the method to promoters of coregulated genes of yeast and human promoters regulated by the Ras pathway are shown. A comparison to the composite elements available in the Compel database is given.

Long Abstract

 

 

 

New Frontiers.

 

236A. Trends in Molecular Bioinformatics.

237A. Discovery Genome Functions with Modeling Five-Elements Computing.

238A. Discovery Genome Mechanisms for Cancer Therapy with Agents.

239A. BioBhasha - A Programming Language for Biologists.

240A. A Rule-Based Approach for Automatically Identifying Gene and Protein Names in MEDLINE Abstracts.

241A. Java Library of Generic Internet Robot Algorithms.

242A. Novel Approach in Computational Analysis of Biological Complexity.

243A. AbML: an XML Schema Description for Antibody Information.

244A. Bijective Mapping of Discrete Biological Sequences.

245A. A High-speed Similar Protein Retrieval Method using the Distance between Molecular Surface Data.

246A. Eukaryotic Linear Motif Resource for Functional Sites in Proteins.

247A. A Database for the Management of Gene Expression Data in situ.

248A. A Comprehensive Portal to Bioinformatics Training.

249A. Semi-Automated Calling System for SNPs (Single Nucleotide Polymorphisms).

250A. Analyzing Protein Sequence Tags (PSTs) of Complex Protein Mixtures: A Fast Deisotoping and Deconvolution Algorithm for ESI-MS spectra.

 

236A. Trends in Molecular Bioinformatics.

Ashwin Sivakumar1, R. Balaji2, Vidhya Gomathi Krishnan1 and John Howard Parish1. 1School of Biochemistry and Molecular Biology, The University of Leeds, Leeds LS2 9JT, UK and 2Indian Institute of Sciences, Bangalore, India.

bmbasi@bmb.leeds.ac.uk

 

"Post Genome Informatics" takes bioinformatics beyond its original boundaries. With the concurrent advances in computer technology, biological data are amenable to analysis and pattern recognition. This paper is a collection of logistic and statistical analyses of trends in the area of molecular bioinformatics.

Long Abstract

 

 

237A. Discovery Genome Functions with Modeling Five-Elements Computing.

Jianwei Sun. Independent Researcher and Developer, Room 1608, No.5, Lane 500, MaoTai Road, Shanghai, 200336, China.

jianwei_sun@hotmail.com

It is desired to discover the genome functions resulting in a physiological effect. In this investigation, the mathematical model for the Five-Elements system is investigated by building an evolutionary computing platform for investigating signal transduction produced response of functional-operatings in the Five-Elements system under the paradigm of discovery genome functions with therapy systems.

Long Abstract

 

 

238A. Discovery Genome Mechanisms for Cancer Therapy with Agents.

Jianwei Sun. Independent Researcher and Developer, Room 1608, No.5, Lane 500, MaoTai Road, Shanghai, 200336, China.

jianwei_sun@hotmail.com

It is desirable to build informatics models for discovering genome mechanisms for cancer therapy by analysing the data generated by microarrays. In this investigation, the modeling directly uses computing power and the software agent paradigm, with tangible cases of breast cancer therapy, for the discovery task. Meanwhile, an evolutionary computing platform is being built for the investigation.

Long Abstract

 

 

239A. BioBhasha - A Programming Language for Biologists.

BVLS Prasad. Indian Institute of Science, Molecular BioPhysics Unit, Bangalore, Karnataka, 560012, India.

shiva@mbu.iisc.ernet.in

 

'BioBhasha - A Programming Language for Biologists' is developed using the C++ programming language and the object-oriented paradigm (OOP). BioBhasha provides a set of Biological Abstract Data Types (BioADTs), which a programmer can use to write code in biological terminology. This design makes BioBhasha extensible, maintainable, reusable and biologist-friendly. It is designed to provide a bio-programming environment that encourages creativity in exploratory research and flexibility in developing novel bio-computational applications.

Long Abstract

 

 

240A. A Rule-Based Approach for Automatically Identifying Gene and Protein Names in MEDLINE Abstracts.

Hong Yu, M.S., M.Phil.1, Vasileios Hatzivassiloglou, PhD2, Carol Friedman, PhD1,4, Ivan H. Iossifov3, and Andrey Rzhetsky, PhD1,3. 1Dept. Medical Informatics, 2Dept. Computer Science, 3Columbia Genome Center, Columbia University and 4Department of Computer Science, Queens College, City University of New York, New York 10032, USA.

hy52@columbia.edu

 

Identifying gene and protein terms is important for obtaining biological knowledge from literature. We have developed GPmarkup (for gene and protein-name mark up), a system that automatically identifies gene and protein terms and maps gene and protein symbols (e.g., DR3) to names (e.g., Death Receptor 3) in MEDLINE abstracts.

Long Abstract

 

 

241A. Java Library of Generic Internet Robot Algorithms.

Audrius Meskauskas, Frank Lehmann-Horn and Karin Jurkat-Rott. Department of Applied Physiology, University of Ulm, Einsteinallee 11, D-89069 Ulm.

Audrius.Meskauskas@medizin.uni-ulm.de

 

We propose a package (a library and Java code generators) for creating bioinformatics internet robots. The library provides a cache, a connection strategy, a security system against improper use and an organizing system that provides a view of the running program. The system can integrate new analysis tools as soon as they appear on internet pages.

Long Abstract

 

 

242A. Novel Approach in Computational Analysis of Biological Complexity.

Ahmed Fadiel1 and Stuart Lithwick2. 1The Centre for Applied Genomics and 2The Bioinformatics Supercomputing Centre. The Hospital for Sick Children, Toronto, Ontario, Canada.

afadiel@bioinfo.sickkids.on.ca

 

We hypothesized that gene order/location is genome-specific and is correlated with genome evolution and its complexity. We tested this hypothesis using non-conventional computational approaches based on complexity analysis. Our results indicated that gene/ORF distribution patterns are genome-specific and are largely conserved within chromosomes of each species. In addition, we found that genome complexity is correlated with the evolutionary distance between species.

Long Abstract

 

 

243A. AbML: an XML Schema Description for Antibody Information.

Uwe Plikat, Adrian Bruengger and Christoph Wanke. Novartis Pharma, Basel, Switzerland.

uwe.plikat@pharma.novartis.com

 

We propose an XML schema, AbML, for the standardized description and exchange of antibody information. We make use of existing standards for the description of certain data objects as well as provide controlled vocabulary wherever feasible.

Long Abstract

 

 

244A. Bijective Mapping of Discrete Biological Sequences.

Jonas S Almeida1,2 and Susana Vinga2. 1Dept. of Biometry and Epidemiology, Medical Univ. South Carolina, 135 Cannon Street, Suite 303, P.O. Box 250835, Charleston, SC 29425, USA and 2Inst. Tecnologia Química e Biológica, Univ. Nova Lisboa, Av. da República (EAN), P.O. Box 127, 2781-901 Oeiras, Portugal.

almeidaj@musc.edu

 

Universal Sequence Maps (USM, http://bioinformatics.musc.edu/~jonas/usm/) are novel iterative mapping functions, derived from Chaos Game Representation (CGR), that enable bijective mapping of any discrete sequence into continuous unitary hypercubes. USM enables scale independent representation of transition matrices and, as such, offers an advantageous platform for discriminant analysis of biological sequences.
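The bijective property is easiest to see in the two-dimensional Chaos Game Representation from which USM is derived. The sketch below covers only a forward CGR iteration for DNA and its inversion; the corner assignment is a common convention chosen here as an assumption, exact fractions are used to avoid floating-point loss, and the full USM construction (arbitrary alphabets, additional coordinates) is not reproduced.

```python
from fractions import Fraction

# Unit-square corners for the four nucleotides (one common convention,
# assumed here for illustration).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
SYMBOLS = {corner: base for base, corner in CORNERS.items()}
HALF = Fraction(1, 2)

def cgr_forward(seq, start=(HALF, HALF)):
    """Move halfway towards the corner of each successive symbol."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        x = (x + cx) / 2
        y = (y + cy) / 2
    return x, y

def cgr_decode(point, length):
    """Recover the sequence by repeatedly undoing the halving step."""
    x, y = point
    out = []
    for _ in range(length):
        cx = 1 if x > HALF else 0
        cy = 1 if y > HALF else 0
        out.append(SYMBOLS[(cx, cy)])
        x = 2 * x - cx
        y = 2 * y - cy
    return "".join(reversed(out))

seq = "GATTACA"
point = cgr_forward(seq)
print(point, cgr_decode(point, len(seq)))
```

Each quadrant of the square corresponds to the most recent symbol, each sub-quadrant to the symbol before it, and so on, which is why the final coordinate encodes the whole sequence.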

Long Abstract

 

 

245A. A High-speed Similar Protein Retrieval Method using the Distance between Molecular Surface Data.

Yoshikazu Kaneta. Graduate School of Information Science and Technology Engineering, Osaka University, Japan.

kaneta@ise.eng.osaka-u.ac.jp

 

An efficient similar protein retrieval method based on the distance space constructed by calculating the dissimilarity between any two protein molecular surfaces is proposed. The application to the enzyme protein database shows that the retrieval time is reduced to 9.5% of the sequential retrieval.

Long Abstract

 

 

246A. Eukaryotic Linear Motif Resource for Functional Sites in Proteins.

Rune Linding1, Pal Puntervoll2, Christine Gemund1, Scott Cameron3 and Toby Gibson1. 1EMBL, Germany, 2University of Bergen, Norway and 3University of Dundee, UK.

linding@EMBL-Heidelberg.DE

 

The ELM (Eukaryotic Linear Motif) resource is a set of tools to detect functional sites within protein sequences. Context-based discriminatory rules will be applied to filter out false hits, giving end users a small number of plausible functions. ELM information is available at http://elm.eu.org.

Long Abstract

 

 

247A. A Database for the Management of Gene Expression Data in situ.

M. Samsonova1, A. Pisarev1, E. Pustel'nikova1 and P. Baumann2. 1Bioinformation Systems Lab, St. Petersburg State Technical University, St. Petersburg, Russia and 2Active Knowledge GmbH, Kirchenstr. 88, D-81675 Munich, Germany.

samson@fn.csa.ru

 

We propose a novel strategy for managing information on gene expression in situ. It consists of applying the array DBMS RasDaMan to database design. We have developed a database named Mooshka (http://urchin.spbcas.ru/Mooshka), which stores in situ data on the expression of segmentation genes in the Drosophila blastoderm, as well as numerical results from analysis of the structure and behavior of the segmentation gene network. Mooshka makes it possible to search and analyze information within an image and allows a wide range of data processing operations to be implemented as internal database queries.

Long Abstract

 

 

248A. A Comprehensive Portal to Bioinformatics Training.

Bernhard Haubold, Manon von Bülow, Jayshree Mistry, Karin Maslen and Monika Haas. LION bioscience AG, Heidelberg, Germany.

monika.haas@lionbioscience.com

 

Web-Based Training Bioinformatics is a comprehensive course in the application of information technology to biomedical research. It currently covers six themes, ranging from databases to comparative genomics. Each theme has a modular structure with each module consisting of a theoretical introduction, a tutorial on representative tools, and worked exercises.

Long Abstract

 

 

249A. Semi-Automated Calling System for SNPs (Single Nucleotide Polymorphisms).

Yoko Higashi1, Arata Sato1, Hirotaka Higuchi1, Hitoshi Sakano1, Toshihiko Morimoto1, Tsutomu Matsunaga1, Keisuke Ishii2 and Masaaki Muramatsu2. 1NTT Data Co. Ltd. and 2Hubit Genomix, Inc.

higashiy@rd.nttdata.co.jp

 

In SNP calling with ordinary software, laboratory staff need to look at the plots and manually review each genotype, which is a major bottleneck for genotype analysis. We developed software that makes it possible to call genotypes semi-automatically. It attained 80% accuracy and reduced working hours by 80%.

Long Abstract

 

 

250A. Analyzing Protein Sequence Tags (PSTs) of Complex Protein Mixtures: A Fast Deisotoping and Deconvolution Algorithm for ESI-MS spectra.

U. Bauer, R. Moraga, C. Baumann and J. Schwarz. Xzillion GmbH & CoKG, Bioinformatics/Mass Spectrometry.

Ute.Bauer@xzillion.com

 

Recently, proteomics technologies for the analysis of protein expression based on ESI LC-MS/MS have emerged. The PST approach reduces the complexity of the protein mixture digest by isolating C-/N-terminal peptides. Hence, interpretation of LC-MS data becomes possible by combining a fast and sensitive deconvolution algorithm with a special PST peptide database search.
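The arithmetic behind deisotoping and charge deconvolution can be shown in a few lines. The sketch below infers the charge state from the spacing of isotope peaks and converts an m/z value to a neutral mass; it assumes positive-ion mode with protonated species and already-picked peaks, and it is not the poster's algorithm, which is not described in this abstract.

```python
# Charge-state inference from isotope spacing (positive-ion ESI assumed).
# Peak values are invented; function names are hypothetical.

PROTON_MASS = 1.007276      # Da
ISOTOPE_SPACING = 1.003355  # Da, mass difference between 13C and 12C

def charge_from_spacing(mz_peaks):
    """Estimate charge z from the mean gap between adjacent isotope peaks."""
    gaps = [b - a for a, b in zip(mz_peaks, mz_peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return round(ISOTOPE_SPACING / mean_gap)

def neutral_mass(mz, charge):
    """Neutral monoisotopic mass of an [M + zH]z+ ion observed at m/z."""
    return charge * (mz - PROTON_MASS)

isotope_cluster = [500.0, 500.5017, 501.0034]  # toy doubly charged cluster
z = charge_from_spacing(isotope_cluster)
print(z, round(neutral_mass(isotope_cluster[0], z), 4))
```

Isotope peaks of an ion with charge z are spaced roughly 1.003/z apart on the m/z axis, so a gap near 0.5 indicates a doubly charged peptide.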

Long Abstract

 

Session B.

 

 

Microarrays.

 

1B. Quality Assurance Methods for Processing Microarray Imagery.

2B. Data Management and Analysis for Gene Expression Array.

3B. An Empirical Comparison Of Methods For Detecting Differentially Expressed Genes In Cancer Datasets.

4B. No Poster.

5B. ArrayExpress, a Public Repository for Microarray Gene Expression Data.

6B. No Poster.

7B. cDNA Microarray Images synthesized from the Real Spot Edge Templates.

8B. SOURCE: The Stanford Online Universal Resource for Clones and ESTs.

9B. Global Gene Expression Profiling of E. coli with Interruption of Acetate Production.

10B. Gene Expression Profiling Analysis Augmented by Mathematically Transformed Gene Ontology.

11B. Higher-Order Gene Interaction Revealed by Log-Linear Analysis of DNA Microarrays.

12B. A New Normalization Method for cDNA Microarray Data using Clustering Background.

13B. Genetic Network Analysis from Microarray Gene Expression Data via Bayesian Network and Nonparametric Regression.

14B. Correlation of Hypoxia-Regulated Genes with Hypoxia-Enhanced Metastatic Ability by Gene Expression Profiling.

15B. ISAcycle: Independent Subspace Analysis of Gene Expression Data.

16B. Computational Tools, Floral Primordial Development pathways and Its regulation using DNA chip Technology in Arabidopsis.

17B. No poster.

18B. Linearization of DNA Macroarray data.

19B. Statistical Analysis of Multi-Center Microarray Data.

20B. Bayesian Networks and Perturbation Experiments.

21B. Quality Measures in cDNA Microarray Experiments.

22B. Large-scale Analysis of the Human and Mouse Transcriptomes.

23B. Experimental Design and Analysis of Microarray Data.

24B. Robustness of Ensemble Learning for Independent Component Analysis of Micro-Array Channel-Ratio Data.

25B. Comparative Analysis of Algorithms for Signal Quantitation from Oligonucleotide Microarrays.

26B. An Experimentally Optimized Algorithm for High Throughput, Parallel Determination of Gene Structure using Microarrays.

27B. Comparison of Transcript Abundance Models in Gene Expression Microarrays.

28B. Generation of a Human Fovea Gene Expression Database.

29B. Testing Non-Linearity and Adjusting Lowess Transformation in cDNA Microarray Data Sets.

30B. Molecular Class Discovery in Cancer and Clinical Outcome Prediction using Genome-Wide Gene Expression Profiling: A Case Study on Ovarian Carcinomas.

31B. "Spotting Error" in Microarray Data.

32B. An Algorithm for Identifying Regulatory Relationships in Single-mutant Gene Expression Data.

33B. Microarray Analysis of the Developing Mouse Cerebellum.

34B. Elucidation of Genes Involved in HTLV-I-induced Transformation using the K-harmonic Means Algorithm to Cluster Microarray Data.

35B. Between Group Eigen Analysis: A Simple and Flexible Class Prediction Method for Gene Expression Data.

36B. MAP: Microarray Annotation Program.

37B. Development of a Relational Toxicogenomics Database for the Prediction of Chemical Toxicity.

38B. No poster.

 

1B. Quality Assurance Methods for Processing Microarray Imagery.

Peter Bajcsy, Ph.D.1, Zonglin L. Liu, Ph.D.2, and Lei Liu, Ph.D.3. 1National Center for Supercomputing Applications, 605 East Springfield Avenue, Champaign, IL 61820, 2Department of Animal Sciences, University of Illinois, Urbana, IL 61801 and 3The W. M. Keck Center for Comparative and Functional Genomics, University of Illinois at Urbana-Champaign, 330 ERML, 1201 W. Gregory Dr., Urbana, IL 61801.

pbajcsy@ncsa.uiuc.edu, z-liu@staff.uiuc.edu and leiliu@uiuc.edu

 

We present three quality assurance (QA) methods for processing DNA microarray images. The three methods are based on signal-to-noise ratio, topology of a microarray dot and statistical distributions of background and foreground (signal) pixel intensities inside of a microarray grid cell. The methods are applied to DNA microarray images to detect systematic errors and remove any unreliable information from further analysis.

Long Abstract
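The first of the three QA criteria in abstract 1B, signal-to-noise ratio, can be illustrated with a minimal sketch. The SNR definition used here (foreground mean minus background mean, divided by the background standard deviation), the function names and the threshold of 3.0 are assumptions chosen for illustration; the abstract does not specify them.

```python
from statistics import mean, stdev


def spot_snr(foreground, background):
    """Assumed SNR: (mean foreground - mean background) / std of background."""
    noise = stdev(background)
    if noise == 0:
        return float("inf")
    return (mean(foreground) - mean(background)) / noise


def is_reliable(foreground, background, threshold=3.0):
    """Flag a grid cell as usable when its SNR reaches a hypothetical threshold."""
    return spot_snr(foreground, background) >= threshold


# Pixel intensities for one grid cell (made-up numbers).
bright_spot = [900, 950, 1010, 980]
faint_spot = [105, 100, 98, 102]
background = [100, 110, 90, 105, 95]

print(is_reliable(bright_spot, background))
print(is_reliable(faint_spot, background))
```

Spots that fail such a check would be excluded from further analysis, which is the role the abstract assigns to the QA step.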

 

 

2B. Data Management and Analysis for Gene Expression Array.

Olga Krebs, Rolf Kabbe, Karlheinz Gross and Roland Eils. German Cancer Research Center, Heidelberg, Germany.

o.krebs@dkfz.de

 

We have designed and implemented an array informatics system that combines data management and analysis and is intended to integrate RNA expression data with other kinds of functional genomics data. Its functionality ranges from storage of the data in relational database management systems (currently Oracle RDBMS running on a Unix system) and a data warehouse to front-end tools for the presentation and maintenance of the data.

Long Abstract

 

 

3B. An Empirical Comparison Of Methods For Detecting Differentially Expressed Genes In Cancer Datasets.

Soumyaroop Bhattacharya, Tue Tri Nguyen, Satish Patel, Jia Ke and James Lyons-Weiler. Center for Bioinformatics and Computational Biology, University of Massachusetts Lowell, United States of America.

Soumyaroop_Bhattacharya@student.uml.edu

 

We compared the performance of a number of simple methods for the analysis of microarray data, using empirical and simulated datasets, based on their Predictive Utility (the proportion of datasets for which the method returns a classification matching the pre-defined bipartition, i.e., all of Group A separated from all of Group B).

Long Abstract
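The Predictive Utility measure described in abstract 3B can be sketched as a simple proportion. The data layout (one pair of predicted and true label lists per dataset) and the function names are assumptions made for this example.

```python
def separates_groups(predicted, truth):
    """True if the predicted two-way split reproduces the known bipartition.

    Label names may differ (e.g. cluster 0/1 versus group A/B); only the
    partition itself has to match.
    """
    pairs = set(zip(predicted, truth))
    return len(set(predicted)) == 2 and len(set(truth)) == 2 and len(pairs) == 2


def predictive_utility(results):
    """Proportion of datasets on which the method recovers the bipartition."""
    hits = sum(separates_groups(pred, truth) for pred, truth in results)
    return hits / len(results)


# Three hypothetical datasets: (predicted labels, true groups).
results = [
    ([0, 0, 1, 1], ["A", "A", "B", "B"]),  # perfect separation
    ([1, 1, 0, 0], ["A", "A", "B", "B"]),  # perfect, labels swapped
    ([0, 1, 1, 1], ["A", "A", "B", "B"]),  # one sample misplaced
]
print(predictive_utility(results))
```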

 

 

4B. No Poster.

 

 

5B. ArrayExpress, a Public Repository for Microarray Gene Expression Data.

Helen Parkinson, Niran Abeygunawardena, Ele Holloway, Misha Kapushesky, Gaurab Mukherjee, Philippe Rocca-Serra, Susanna Sansone, Ugis Sarkans, Mohammad Shojatalab, Jaak Vilo and Alvis Brazma. European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. parkinson@ebi.ac.uk

 

ArrayExpress is a public repository for gene expression data based at the EBI. ArrayExpress uses the MAGE-OM model (an OMG standard) and accepts submissions in MAGE-ML format and via the MIAME compliant submission and annotation tool MIAMExpress. MIAMExpress incorporates terms from the ontology developed by the MGED ontology working group.

Long Abstract

 

 

6B. No Poster.

 

 

7B. cDNA Microarray Images synthesized from the Real Spot Edge Templates.

Hye Young Kim1, Yong Sung Lee2, Young Seek Lee3 and Jin Hyuk Kim1. 1Dept. of Physiology and 2Dept. of Biochemistry, College of Medicine, Hanyang University, Seoul, 133-791, Korea and 3Dept. of Biochemistry and Molecular Biology, College of Science and Technology, Hanyang University, Ansan, 425-791, Korea.

hykim121@hanyang.ac.kr

 

Synthetic cDNA microarray images can be generated with a mathematical model of cDNA distribution in the spots. Because these images are based on random ratios of specific RNA in the control and experimental groups, they can be used to evaluate the accuracy of microarray experiments.

Long Abstract

 

 

8B. SOURCE: The Stanford Online Universal Resource for Clones and ESTs.

Hernandez-Boussard T, M. Diehn, A. Alizadeh, J.C. Matese, G. Binkley, H. Jin, J. Gollub, J. Demeter, J. Hebert, C.A. Ball, P.O. Brown, D. Botstein and G. Sherlock. Stanford University, Stanford CA USA.

boussard@genome.stanford.edu

 

The Stanford Online Universal Resource for Clones and ESTs (SOURCE - http://genome-www.stanford.edu/source) compiles information from biological public databases that can be used to annotate human, mouse or rat clones. SOURCE produces 'gene reports' to facilitate the analysis of large datasets, including gene expression patterns and physical mapping for a given gene.

Long Abstract

 

 

9B. Global Gene Expression Profiling of E. coli with Interruption of Acetate Production.

Won Jae Heo, Sung Ho Yoon and Sang Yup Lee. Korea Advanced Institute of Science and Technology, Daejeon, 305-701, Korea.

wanja@mail.kaist.ac.kr

 

The physiological functions of the acetate synthetic pathway in Escherichia coli have been studied for many years. Transcriptome analysis using DNA microarrays has now been performed to gain a global understanding of the role of the acetate-producing pathway.

Long Abstract

 

 

10B. Gene Expression Profiling Analysis Augmented by Mathematically Transformed Gene Ontology.

Jill Cheng, John Martin, Melissa Cline, Tarif Awad and Michael A. Siani-Rose. Affymetrix, Inc., Santa Clara, California, USA.

jill_cheng@affymetrix.com

To improve current methodology for expression analysis, novel applications were developed based on the graph structure of the Gene Ontology. These include automatic biological interpretation of microarray results and a knowledge-guided clustering algorithm in which expression profiles and functional annotations are combined. Analysis of results from a mouse hematopoietic time-series microarray experiment will be presented.

Long Abstract

 

 

11B. Higher-Order Gene Interaction Revealed by Log-Linear Analysis of DNA Microarrays.

Hiroyuki Nakahara1, Shin-ichi Nishimura1, Masato Inoue1, Gen Hori2 and Shun-ichi Amari1. 1Lab for Mathematical Neuroscience and 2Lab for Advanced Brain Signal Processing, RIKEN Brain Science Institute.

nishi@mns.brain.riken.go.jp

 

We introduce log-linear decomposition of higher-order interactions, based on the weak definition of conditional independence. Our measure, based on information geometry, can estimate fine structure of higher-order interaction. Using real datasets, we show that our method can reveal genetic 'switches' that modulate cellular functions.

Long Abstract
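As a rough illustration of what a higher-order term in a log-linear model measures, the sketch below computes the standard third-order coefficient for three binary variables. Treating each gene as binary (for example, on/off after thresholding) is an assumption for this example, and the code does not reproduce the weak conditional independence definition or the estimation procedure of abstract 11B.

```python
import math
from itertools import product


def third_order_interaction(p):
    """Third-order log-linear coefficient for a positive distribution p
    over (x1, x2, x3) in {0, 1}^3; it is zero when no triple-wise
    interaction is present."""
    num = p[1, 1, 1] * p[1, 0, 0] * p[0, 1, 0] * p[0, 0, 1]
    den = p[1, 1, 0] * p[1, 0, 1] * p[0, 1, 1] * p[0, 0, 0]
    return math.log(num / den)


# Independent genes: the coefficient vanishes (up to rounding).
marginals = [0.3, 0.6, 0.2]
independent = {
    x: math.prod(m if xi else 1 - m for xi, m in zip(x, marginals))
    for x in product((0, 1), repeat=3)
}

# A parity-like "switch": states with an odd number of active genes dominate.
switch = {x: (0.2 if sum(x) % 2 else 0.05) for x in product((0, 1), repeat=3)}

print(third_order_interaction(independent))
print(third_order_interaction(switch))
```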

 

 

12B. A New Normalization Method for cDNA Microarray Data using Clustering Background.

Dong Mi Shin1, Hye Young Kim2, Myung Guen Chung3, Jin Hyuk Kim2, Young Seek Lee3 and Yong Sung Lee1. 1Department of Biochemistry, 2Department of Physiology, College of Medicine, Hanyang University, Seoul, Korea and 3Department of Biochemistry and Molecular Biology, College of Science and Technology, Hanyang University, Ansan, Korea.

yongsung@hanyang.ac.kr

 

We studied a new normalization method that is carried out after clustering the segments by the ratio of spot intensity to background intensity. This background-dependent normalization decreased the number of genes whose expression levels changed significantly and made their distribution more consistent across the whole range of signal intensities.

Long Abstract

 

 

13B. Genetic Network Analysis from Microarray Gene Expression Data via Bayesian Network and Nonparametric Regression.

Seiya Imoto1, Kim Sunyong1, Takao Goto1, Sachiyo Aburatani2, Kousuke Tashiro2, Satoru Kuhara2 and Satoru Miyano1. 1Human Genome Center, Institute of Medical Science, University of Tokyo and 2Graduate School of Genetic Resources Technology, Kyushu University.

imoto@ims.u-tokyo.ac.jp

 

We show a method for inferring genetic networks from cDNA microarray data using a Bayesian network model, which can capture even nonlinear structures between genes. Nonparametric regression models with B-splines and a criterion, BNRC, for evaluating the network are newly defined. We demonstrate its high performance with computational experiments.

Long Abstract

 

 

14B. Correlation of Hypoxia-Regulated Genes with Hypoxia-Enhanced Metastatic Ability by Gene Expression Profiling.

Patrick Subarsky and Richard P. Hill. Medical Biophysics, University of Toronto and Ontario Cancer Institute, Princess Margaret Hospital.

p.subarsky@utoronto.ca

 

Experimental metastatic potential of some tumor cell lines is transiently increased following hypoxic exposure. Hierarchical clustering of cDNA microarray time series data representing hypoxic exposure followed by oxic recovery demonstrated a globally repressed sub-set of genes and further discrete gene sub-sets with unique patterns of induction.

Long Abstract

 

 

15B. ISAcycle: Independent Subspace Analysis of Gene Expression Data.

Hyejin Kim, Seungjin Choi and Sung-Yang Bang. Dept. of Computer Science and Engineering, POSTECH.

marisan@postech.ac.kr

 

ISAcycle is an unsupervised learning method for gene expression data analysis. It is based on independent subspace analysis (ISA), which finds independent feature subspaces of multivariate data by maximizing the independence between the norms of projections onto linear subspaces. We apply ISA to cell cycle-related gene expression data and show its usefulness: (1) the ability to assign genes to multiple coexpression pattern groups; and (2) the ability to cluster key genes that determine each critical point of the cell cycle.

Long Abstract

 

 

16B. Computational Tools, Floral Primordial Development Pathways and Its Regulation using DNA Chip Technology in Arabidopsis.

Varsha Raja. Xintra, Bioinformatics, Toronto, Ontario, Canada.

 

The primary goal of sequencing the entire Arabidopsis genome is to use this information to understand overall cellular, molecular and developmental processes. It can be further explored to understand the flowering mechanisms and their regulation at the molecular level. To accomplish this goal, new experimental and computational tools will be needed. With the advent of structural and functional computational genomics, plant genomics has been transformed, and promising toys such as high-throughput screening, imaging systems, microarray and chip technology will become powerful tools for understanding biological processes such as flower development and for analyzing gene expression patterns.

 

 

17B. No poster.

 

 

18B. Linearization of DNA Macroarray data.

Yi Xie, Adele Cutler and Bart Weimer. Utah State University, Logan, Utah 84322.

yixie@cc.usu.edu

 

The problem of data nonlinearity limits the usefulness of DNA expression arrays in functional genomic research. In this report, we demonstrate that we were able to linearize the raw data on a membrane-based DNA macroarray. The accuracy of these linear transformations was validated by a serial dilution experiment.

Long Abstract

 

 

19B. Statistical Analysis of Multi-Center Microarray Data.

Taesung Park1, Sung-Gon Yi, Hosik Choi1, Seung-Yeoun Lee2, Kee-Ho Lee3, Jung Kyoon Choi4, Sangsoo Kim4, Yeom Young Il4, Choi Jong Young5 and Daeghon Kim6. 1Department of Statistics, Seoul National University, Seoul, Korea, 2Department of Applied Mathematics, Sejong University, Seoul, Korea, 3Laboratory of Molecular Oncology, Korea Cancer Center Hospital, 4Korea Research Institute of Bioscience and Biotechnology, Taejon, Korea, 5The Catholic University of Korea, Seoul, Korea and 6Chonbuk National University, Jeonju, Chonbuk, Korea.

skon@kr.FreeBSD.org

 

For the case in which the same type of microarrays comes from different labs or clinical centers, we propose a statistical model that accounts for the additional variability caused by the different clinical centers. The proposed model is based on the ANOVA model.

Long Abstract

 

 

20B. Bayesian Networks and Perturbation Experiments.

Iosifina Pournara and Lorenz Wernisch. Dept of Crystallography, Birkbeck College, University of London.

i.pournara@cryst.bbk.ac.uk

 

Bayesian Learning is used to construct genetic networks that describe how the expression level of each gene depends on the external stimuli and on the expression levels of other genes. I am investigating the robustness of Bayesian Learning and the significance of perturbation experiments for constructing genetic networks.

Long Abstract

 

 

21B. Quality Measures in cDNA Microarray Experiments.

Taesung Park1, Ki-Woong Kim1, Sunggon Yi1, Seung-Yeoun Lee2, Jin-Hyuk Kim3, Hea Young Kim3 and Yong Sung Lee3. 1Department of Statistics, Seoul National University, Seoul, Korea, 2Department of Applied Mathematics, Sejong University, Seoul, Korea and 3Hanyang University College of Medicine, Seoul, Korea.

tspark@stats.snu.ac.kr

 

Although several spot quality measures have been proposed, it has not yet been investigated which of them are most sensitive in detecting poor-quality spots. We perform a systematic comparison of the sensitivity of these quality measures.

Long Abstract

 

 

22B. Large-scale Analysis of the Human and Mouse Transcriptomes.

Andrew I. Su1, Michael P. Cooke1, Keith A. Ching1, Yaron Hakak1, John R. Walker1, Tim Wiltshire1, Anthony P. Orth1, Raquel G. Vega1, Lisa M. Sapinoso1, Aziz Moqrich2, Ardem Patapoutian1,2, Garret M. Hampton1, Peter G. Schultz1,2 and John B. Hogenesch1. 1Genomics Institute of the Novartis Research Foundation (GNF); San Diego, CA and 2The Scripps Research Institute; La Jolla, CA.

asu@gnf.org

 

We present a preliminary description of the normal mammalian transcriptome comprised of gene expression measurements from 91 human and mouse samples. We have mined this dataset for insights into molecular and physiological gene function, mechanisms of transcriptional regulation, disease etiology, and comparative genomics. These data are accessible at http://expression.gnf.org.

Long Abstract

 

 

23B. Experimental Design and Analysis of Microarray Data.

Justin C. Fay and Michael B. Eisen. Department of Genome Sciences, Lawrence Berkeley National Laboratory.

jcfay@lbl.gov

 

Replicated DNA microarray experiments were used to identify the amount of error produced during the labeling, hybridization and scanning steps in microarray experiments. Analysis of this error indicated certain probes are more variable among replicates than others. Two statistical methods were used to account for this variance in error and the results were compared.

Long Abstract

 

 

24B. Robustness of Ensemble Learning for Independent Component Analysis of Micro-Array Channel-Ratio Data.

David P Kreil1, David J C MacKay2 and Gos Micklem1. 1Department of Genetics, University of Cambridge, Downing Site, Cambridge CB2 3EH, UK. 2Inference Group, Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, UK.

d.kreil@gen.cam.ac.uk

 

Whether signatures of factor loadings (of gene expression ratios on the uncovered latent variables) were retained in multiple analyses and after exclusion of different data subsets was indicated well by their respective relative signature data powers. The linear model with a Gaussian error term, and mixtures of Gaussians for the components, reconstructs log-ratios better than untransformed data.

 

Long Abstract

 

 

25B. Comparative Analysis of Algorithms for Signal Quantitation from Oligonucleotide Microarrays.

Yoseph Barash1, Elinor Dehan2, Nir Friedman1 and Naftali Kaminski1. 1School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel and 2Functional Genomics, Sheba Medical Center, Tel Hashomer, Israel.

hoan@cs.huji.ac.il

 

A crucial issue in using microarray data is the ability to quantitate the mRNA expression signal. We present methodologies for comparing different algorithms for signal quantitation on oligonucleotide arrays. We apply these to publicly available datasets as well as to a dataset that contains several pairs of repeated hybridizations.

Long Abstract

 

 

26B. An Experimentally Optimized Algorithm for High Throughput, Parallel Determination of Gene Structure using Microarrays.

John Castle, Phil Garrett-Engele, Zhengyan Kan, Ralph Santos, Patrick Loerch, Chris Armour, Eric Schadt, Dan Shoemaker and Jason M. Johnson. Rosetta Inpharmatics, 12040 115th Ave NE, Kirkland, WA 98034.

john_castle@merck.com

 

We developed an algorithm to analyze high throughput, parallel microarray data to determine gene structure. The algorithm models microarray intensity data, examines residuals to predict tissue-specific alternate splicing events, and was optimized using the results of RT-PCR experiments. We present results for known and novel gene forms.

Long Abstract

 

 

27B. Comparison of Transcript Abundance Models in Gene Expression Microarrays.

J Lande, V Gimino, MI Hertz and RA King. Molecular, Cellular Developmental Biology and Genetics, and Department of Medicine Pulmonary Medicine Division, University of Minnesota.

land0038@umn.edu

 

Microarray analysis of bronchoalveolar lavage cells from lung transplant recipients was done to determine if there are reproducible genetic profiles associated with rejection events. Data analysis of oligonucleotide microarrays performed with two different measures of transcript abundance resulted in different sets of genes when the same criteria were applied.

Long Abstract

 

 

28B. Generation of a Human Fovea Gene Expression Database.

A.C. Ziesel1, B. Li1, S. Bernstein2 and P.W. Wong1. 1Dept of Biological Sciences, Univ. of Alberta and 2Dept of Ophthalmology, Univ. of Maryland.

aziesel@ualberta.ca

 

Using the results of a macroarray assay, combined with Perl scripts for data collection and a MySQL database structure, we have undertaken to develop a fovea gene expression database. We believe that this data structure will be of value and will fill a need in the interested research community.

Long Abstract

 

 

29B. Testing Non-Linearity and Adjusting Lowess Transformation in cDNA Microarray Data Sets.

Alex Loguinov1, Rus Yukhananov2, Saira Mian3 and Chris Vulpe1. 1Department of Nutritional Sciences and Toxicology, University of California, Berkeley CA; 2Department of Anesthesiology, Brigham and Women’s Hospital, Boston MA and 3Radiation Biology and Environmental Toxicology, Lawrence Berkeley National Laboratory, Berkeley CA.

Avl53@aol.com

 

A strong linear relationship between log-transformed Cy3 and Cy5 intensities on a slide is an important requirement for differential gene expression analysis. The well-known lack-of-fit test for checking the straight-line approximation requires replication and also assumes homoscedasticity, conditions that rarely hold. Instead, we used a simple graphical alternative based on a robust locally weighted regression smoother (lowess) and a simultaneous prediction confidence band for linear regression to confirm the linear fit, and an MM-estimator to adjust the lowess transformation.

Long Abstract
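For readers unfamiliar with lowess, the following is a minimal, non-robust sketch of locally weighted linear regression applied to log-ratios (M) against average log-intensities (A). It omits the robustness iterations, the confidence band and the MM-estimator adjustment mentioned in abstract 29B; the M/A convention, the function names and the `frac` default are assumptions for illustration.

```python
import math


def lowess(x, y, frac=0.5):
    """Fit a local straight line around each x[i] using tricube weights
    over the nearest frac * n points (no robustness iterations)."""
    n = len(x)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def normalize_ma(red, green, frac=0.5):
    """Subtract the lowess trend of M = log2(R/G) versus A = log2(R*G)/2."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return [mi - fi for mi, fi in zip(m, lowess(a, m, frac))]


green = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
red = [2 * g for g in green]  # constant two-fold dye bias
print(normalize_ma(red, green))
```

With a constant dye bias the fitted trend is flat, so the normalized log-ratios come out at approximately zero.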

 

 

30B. Molecular Class Discovery in Cancer and Clinical Outcome Prediction using Genome-Wide Gene Expression Profiling: A Case Study on Ovarian Carcinomas.

Petre Dimitrov1, Ivo Meinhold-Heerlin2, Hilmar Lapp1 and Garret M. Hampton1. 1Genomics Institute of the Novartis Research Foundation, San Diego, CA 92121 and 2Department of Obstetrics and Gynecology, University of Bonn Medical School, Bonn, D-53105, Germany.

dimitrov@gnf.org

 

We present a variety of statistical and machine-learning methods employed to discover molecular classes and to predict clinical outcome, using genome-wide gene expression profiling data on ovarian carcinomas (12600 genes on oligonucleotide arrays, 60 samples from 57 patients). We will report the results we obtained and demonstrate the success of the various approaches.

Long Abstract

 

 

31B. "Spotting Error" in Microarray Data.

Marlena Maziarz1 and Rafal Kustra2. Departments of 1Computer Science and 2Public Health Sciences, University of Toronto, Toronto, Canada.

marlena@cs.utoronto.ca

 

Artifacts present in microarray data are often unnoticed and persist through the pre-processing, normalization and analysis stages, affecting the analysis results. Based on the data generated by several image-processing tools we present ways of identifying such artifacts, discuss their effect on analysis results, and suggest ways of minimizing their impact.

Long Abstract

 

 

32B. An Algorithm for Identifying Regulatory Relationships in Single-mutant Gene Expression Data.

Susannah L. Green1, Stephanie W. Ruby1 and Andreas Wagner2. 1University of New Mexico Health Sciences Center, Dept. of Molecular Genetics and Microbiology and 2University of New Mexico, Dept. of Biology.

slgreen@unm.edu

 

We have developed an algorithm that uses information from the complete set of single-mutant profiles of an organism to distinguish direct genetic regulatory interactions from indirect ones. It is based on a graph-theoretical framework and works by reconstructing the simplest graph consistent with the gene expression data.

Long Abstract
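One way to read "the simplest graph consistent with the data" is as a transitive reduction: an effect of gene u on gene v is treated as indirect whenever it can be explained through an intermediate gene. The sketch below makes that reading concrete under two loud assumptions that the abstract does not state: the network is acyclic, and each mutant profile lists every gene affected directly or indirectly. The function name and data layout are hypothetical.

```python
def simplest_graph(affected):
    """affected[u] = set of genes whose expression changes when u is mutated.

    Keeps the edge u -> v only if no other gene w affected by u also
    affects v (assumes an acyclic, transitively closed input).
    """
    direct = {}
    for u, targets in affected.items():
        direct[u] = {
            v
            for v in targets
            if not any(v in affected.get(w, set()) for w in targets if w != v)
        }
    return direct


# Hypothetical profiles: A regulates B, B regulates C, so A affects C indirectly.
profiles = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
print(simplest_graph(profiles))
```

Networks with feedback loops would need extra handling (for example, collapsing each cycle into a single node first), which this sketch does not attempt.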

 

 

33B. Microarray Analysis of the Developing Mouse Cerebellum.

Jenny Gu, David Gold, Bruce Hamilton and Michael Gribskov. University of California, San Diego.

jgu@sdsc.edu

 

The disrupted developmental cascade of the cerebellum in mice containing the staggerer mutation was investigated using microarray analysis. Instead of standard analytical methods that focus on individual expression intensity differences, methods integrating global properties of gene expression across a time course were used to identify genes affected by the mutation.

Long Abstract

 

 

34B. Elucidation of Genes Involved in HTLV-I-induced Transformation using the K-harmonic Means Algorithm to Cluster Microarray Data.

J. Lynn Fink1,2, Michelle Fung3, Kathleen L. McGuire3 and Michael Gribskov2. 1University of California, San Diego, Department of Biomedical Sciences, 2San Diego Supercomputer Center and 3San Diego State University, Department of Biology.

jlfink@sdsc.edu

 

The K-harmonic means clustering algorithm was implemented and applied to microarray data gathered from HTLV-I-transformed human T-cells. The clustered genes were then categorized according to biological function in order to determine which cellular processes might be important in the transformation process.

Long Abstract
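A minimal sketch of the K-harmonic means center update may help readers who know only k-means. Each point contributes to every center, with a weight that depends on its distances to all centers. The exponent `p`, the iteration count and the toy data are assumptions for illustration and are not taken from abstract 34B.

```python
import math


def k_harmonic_means(points, centers, p=3.0, iters=50, eps=1e-9):
    """Iterate the K-harmonic means update for centers c_j:
    c_j = sum_i w_ij x_i / sum_i w_ij, with
    w_ij = 1 / (d_ij^(p+2) * (sum_l d_il^(-p))^2)."""
    centers = [list(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in centers]
        den = [0.0] * len(centers)
        for x in points:
            # Clamp distances to avoid division by zero at a center.
            d = [max(math.dist(x, c), eps) for c in centers]
            inv_sum = sum(dj ** -p for dj in d)
            for j, dj in enumerate(d):
                w = 1.0 / (dj ** (p + 2) * inv_sum ** 2)
                den[j] += w
                for t in range(dim):
                    num[j][t] += w * x[t]
        centers = [[num[j][t] / den[j] for t in range(dim)]
                   for j in range(len(centers))]
    return centers


# Two well-separated toy clusters of "expression profiles" (made-up data).
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]
print(k_harmonic_means(data, centers=[(1, 1), (9, 9)]))
```

Because every point influences every center, the result tends to depend less on the initial centers than plain k-means does, which is the usual motivation given for the method.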

 

 

35B. Between Group Eigen Analysis: A Simple and Flexible Class Prediction Method for Gene Expression Data.

Aedín C. Culhane1, Guy Perriére2 and Desmond G. Higgins1. 1Department of Biochemistry, University College, Cork, Ireland and 2Laboratoire de Biometrie et Biologie Evolutive, Universite Claude Bernard, Lyon 1, 43 boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.

A.Culhane@ucc.ie

 

We describe the application of Between Group Analysis (BGA) to the analysis of microarray data. BGA is a supervised method that utilises known sub groupings in the data, and can be used when the number of variables (genes) exceeds the number of cases (arrays). Further details are available at http://miah.ucc.ie/BGA/.

Long Abstract

 

 

36B. MAP: Microarray Annotation Program.

Shih-Te Yang. Institute of Biochemistry, National Yang-Ming University, Taipei, Taiwan, 11221, ROC.

g38903040@ym.edu.tw

 

Microarray Annotation Program (MAP, http://ymbc.ym.edu.tw/map/) is a user-centric bioinformatics environment for microarray data analysis. MAP integrates gene, alternative splicing, pathways, cellular roles, disease, literature, protein-protein interaction, protein domains, and cross hybridization information together. If a significant gene list is provided, a pathway and gene ontology profile will be generated.

Long Abstract

 

 

37B. Development of a Relational Toxicogenomics Database for the Prediction of Chemical Toxicity.

L D Burgoon2,3,4, P C Boutros1,2,3, E Dere1,2,3 and T R Zacharewski1,2,3. 1Dept. of Biochemistry and Molecular Biology, 2Institute for Environmental Toxicology, 3National Food Safety and Toxicology Center and 4Dept. of Pharmacology and Toxicology, Michigan State University, East Lansing, MI.

burgoonl@msu.edu

 

The development of a laboratory information management and genomic database for the comprehensive analysis of chemical toxicity is discussed. Limited queries of dbZach can be performed via our website: http://dbzach.fst.msu.edu.

Long Abstract

 

 

38B. No poster.

 

 

Systems Biology.

 

39B. Modeling and Simulation of Mitochondrial Energy Metabolism Using E-CELL System.

40B. SequenceX: A Workbench for Analysis and Visualization of Whole Genome Sequences.

41B. Estimation of the Circadian Rhythm Pathway Using E-CELL Dynamic Models.

42B. A Generic Interface for Integrating Spatial Information in E-CELL Simulation Environment Version 3.

43B. E-CELL 3: A Multi-Algorithm Cell Simulator.

44B. "E-Neuron project: A Kinetic Simulation of Cerebellar LTD Using the E-CELL system".

45B. Construction of Diabetes Model for Pathophysiological Analysis Using E-CELL Simulation Environment.

46B. A System-Level Approach to Reconstruction of Human Metabolism.

47B. Hybrid Algorithm for Large-Scale Modeling of the Cell and its Implementation in the E-CELL system.

48B. Modeling Genetic Regulatory Networks Using Dynamic Bayesian Networks.

49B. Pathwome, Pathnome and Simulation Unit.

50B. Heterocyst Formation in Anabaena sp. Strain PCC 7120: An Experimental System for Modeling Development.

51B. EML: A Cell Model Description Language for the E-CELL System.

52B. The Systems Biology Workbench v1.0: Framework and Modules.

53B. E-Neuron Project: a Simulation and Control Analysis of Hippocampal LTP/LTD using E-CELL System.

54B. On Dynamics Estimation from Given Time-Course Data Using GA and S-system: PEACE1.

55B. Long Range Correlations in Human DNA - the Role of Repeats.

56B. Computer Dynamic Modeling of Gene Networks and the Analysis of the Action of Mutations.

57B. No Poster.

 

39B. Modeling and Simulation of Mitochondrial Energy Metabolism Using E-CELL System.

Katsuyuki Yugi and Masaru Tomita. Institute for Advanced Biosciences, Keio University, JAPAN.

chaos@sfc.keio.ac.jp

 

We constructed a dynamic kinetic model of mitochondrial energy metabolism using the E-CELL Simulation Environment. The model reached a steady state about 93000 seconds after the initial state. Further analyses of this model using metabolic control analysis will be reported.

Long Abstract

 

 

40B. SequenceX: A Workbench for Analysis and Visualization of Whole Genome Sequences.

Jeong-Hyeon Choi and Hwan-Gue Cho. Advanced Bioinformatics Lab., Dept. of Computer Science, Pusan National University.

jhchoi@pearl.cs.pusan.ac.kr

 

SequenceX is a workbench that can analyze occurrence patterns and k-mers using a string B-tree. The system provides various types of basic queries and visualizes whole genome sequences by several methods at genome scale. It also allows advanced queries built by concatenating a few basic queries, and it supports a programming interface based on Java script.

Long Abstract
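The two basic query types named in abstract 40B, k-mer statistics and pattern occurrences, can be illustrated with a small in-memory sketch. It uses a plain dictionary and a linear scan in place of the string B-tree that SequenceX relies on, so it shows only what the queries return, not how the system answers them at genome scale; all names here are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def occurrences(sequence, pattern):
    """Return the start positions of all (possibly overlapping) matches."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        start = sequence.find(pattern, start + 1)
    return positions


genome = "ACGTACGT"
print(kmer_counts(genome, 4).most_common(1))
print(occurrences(genome, "ACG"))
```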

 

 

41B. Estimation of the Circadian Rhythm Pathway Using E-CELL Dynamic Models.

Fumihiko Miyoshi, Yoichi Nakayama, Tomoya Kitayama and Masaru Tomita. Institute for Advanced Biosciences, Keio University.

fumi@sfc.keio.ac.jp

 

We have estimated unknown gene regulation networks and protein interactions of the circadian rhythm in Synechococcus sp. PCC7942 using the E-CELL System with a GA. So far, two self-oscillating circadian rhythm models have been estimated. The results suggest that Kai protein complexes play an important role in the circadian rhythm.

Long Abstract

 

 

42B. A Generic Interface for Integrating Spatial Information in E-CELL Simulation Environment Version 3.

Yohei Yamada1,2, Kouichi Takahashi1,2 and Masaru Tomita1. 1Institute for Advanced Biosciences, Keio University and 2Graduate School of Media and Governance, Keio University.

yoyo@sfc.keio.ac.jp

 

Simulation engines require native support for factors such as the structure and location of compartments, spatial concentration gradients for all species, and interactions between moving molecules. This presentation shows how a generic interface for integrating spatial information, shared by multiple algorithms, will be designed and implemented in E-CELL 3.

Long Abstract

 

 

43B. E-CELL 3: A Multi-Algorithm Cell Simulator.

Kouichi Takahashi, Tomoya Kitayama and Masaru Tomita. Institute for Advanced Biosciences, Keio Univ.

shafi@e-cell.org

 

We present computational procedures and system architecture of E-CELL 3 simulation software, which supports mixed-mode simulations with plug-in algorithm modules. Required features for simulators of biological cells, in which many components with different properties are interacting in diverse manners, are identified and discussed.

Long Abstract

 

 

44B. "E-Neuron project: A Kinetic Simulation of Cerebellar LTD Using the E-CELL system".

Yuka Okawara1, Shinichi Kikuchi1, Kenji Fujimoto1, Mai Abe1, Kohtaro Takei1,2 and Masaru Tomita1. 1Institute for Advanced Biosciences, Keio University and 2Department of Physiology, Toho University School of Medicine.

t01153yo@sfc.keio.ac.jp

 

Long-term depression (LTD) is a cellular basis of synaptic plasticity in cerebellar Purkinje cells, but its overall molecular mechanisms are not well understood. We developed an LTD model using our computational environment, the E-CELL System, and simulated cerebellar LTD. This model is useful for elucidating unidentified signal transduction pathways.

Long Abstract

 

 

45B. Construction of Diabetes Model for Pathophysiological Analysis Using E-CELL Simulation Environment.

Yasuhiro Naito1, Hiroshi Ohno1, Masaru Tomita1 and Hiromu Nakajima2. 1Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan, and 2Osaka Medical Center for Cancer and Cardiovascular Diseases, Osaka, Japan.

ynaito@sfc.keio.ac.jp

 

We are constructing a simulation model for pathophysiological analysis of diabetes using the E-CELL simulation environment. So far, a prototype model including the pancreatic beta cell and the liver cell has been constructed. The simulation results and the performance of the model will be discussed.

Long Abstract

 

 

46B. A System-Level Approach to Reconstruction of Human Metabolism.

A.V. Markov, A.E. Bugrim, N. Gornostaev, Y. Lyupina, E. Kirillov, S. Ageev and T.A. Nikolskaya. GeneGo, Inc, 101 W. Madison Ave., New Buffalo, MI 49117, USA.

andrej@genego.com

 

We have developed an approach that allows integration of clinical information with high-throughput molecular data within the context of functional biological networks. At the core of this approach is a collection of human tissue-specific and condition-specific biochemical processes (pathways) that are linked by common inputs and outputs into maps (models).

Long Abstract

 

 

47B. Hybrid Algorithm for Large-Scale Modeling of the Cell and its Implementation in the E-CELL system.

Yoichi Nakayama, Katsuyuki Yugi, Ayako Kinoshita and Masaru Tomita. Institute for Advanced Biosciences, Keio University, Tsuruoka, 997-0035, Japan.

ynakayam@sfc.keio.ac.jp

 

We propose a hybrid method that combines flux-based static modeling with dynamic modeling based on kinetic equations. We use flux distribution analysis to calculate each flux in the stoichiometric models, and we have implemented a set of solvers for flux distribution analysis in the E-CELL system.

Long Abstract

 

 

48B. Modeling Genetic Regulatory Networks Using Dynamic Bayesian Networks.

Y. Zeng, R. Khan, J. Garcia-Frias and G. Gao. Department of Electrical and Computer Engineering, University of Delaware.

zeng@eecis.udel.edu

 

A major challenge in computational biology is to uncover gene/protein interactions. This poster proposes a simplified dynamic Bayesian network (DBN) model to learn the regulatory scheme from the expression data. The framework utilizes a Correct Answer Known Evaluator (CAKE) to evaluate the behavior of the proposed DBN model.

Long Abstract

 

 

49B. Pathwome, Pathnome and Simulation Unit.

Yunchen Gong and Xin Zhao. Department of Animal Science, McGill University, 21111 Lakeshore Rd., Ste-Anne-de-Bellevue, H9X3V9, Quebec, Canada.

zhao@macdonald.mcgill.ca, ygong@po-box.mcgill.ca

 

Pathways exist in interaction databases. The collection of all possible pathways is defined as the pathwome. The subnet between two molecules is defined as a pathnet, and all the pathnets together compose the pathnome. In a simulation unit, molecules can affect each other. Using DIP, a database of the human pathwome, pathnome and simulation units has been constructed.

Long Abstract

 

 

50B. Heterocyst Formation in Anabaena sp. Strain PCC 7120: An Experimental System for Modeling Development.

Carla Davidson1, Przemyslaw Prusinkiewicz2 and Mike Surette1. 1Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Calgary, Calgary, Canada and 2Department of Computer Science, Faculty of Science, University of Calgary, Calgary, Canada.

Carla@cpsc.ucalgary.ca

 

The genetic network responsible for heterocyst formation in Anabaena sp. Strain PCC 7120 is ideal for modeling. The spatial and temporal regulation of five genes responsible for heterocyst formation was observed using the reporter genes GFP and luxCDABE to facilitate the development of a mathematical model of the network.

Long Abstract

 

 

51B. EML: A Cell Model Description Language for the E-CELL System.

Takeshi Sakurada1,2, Ryosuke Suzuki1,2, Kouichi Takahashi1,2 and Masaru Tomita1,3. 1Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan 2Graduate School of Media and Governance, Keio University, Fujisawa, Japan and 3Department of Environmental Information, Keio University, Fujisawa, Japan.

sakurada@sfc.keio.ac.jp

 

We are developing EML (E-CELL Model description Language) as a language for describing cell models on the E-CELL3 system. EML is an XML-based language suitable for large-scale, multi-formalism, and database-driven cell modeling and simulation. A Python library for EML processing is also under development.

Long Abstract

 

 

52B. The Systems Biology Workbench v1.0: Framework and Modules.

Michael Hucka1,2, Andrew Finney1,2, Herbert M. Sauro1,2, Hamid Bolouri1,2,3, John C. Doyle1,2 and Hiroaki Kitano1,2,4. 1California Institute of Technology, CA, USA, 2JST/ERATO Kitano Symbiotic Systems Project, Tokyo, Japan, 3University of Hertfordshire, UK and 4The Systems Biology Institute, Tokyo, Japan.

mhucka@caltech.edu

 

The Systems Biology Workbench (SBW) is an Open-Source Software Integration environment that enables C, C++, Delphi, Java, Perl or Python applications to discover each other and communicate using a lightweight network protocol. Modules available for SBW include tools for simulation, bifurcation analysis, and optimization. V1.0 is now available from http://www.sbw-sbml.org/.

Long Abstract

 

 

53B. E-Neuron Project: a Simulation and Control Analysis of Hippocampal LTP/LTD using E-CELL System.

Shinichi Kikuchi1, Taro Fuchikawa1, Kenji Fujimoto1, Yuka Okawara1, Kohtaro Takei1,2 and Masaru Tomita1. 1Institute for Advanced Biosciences, Keio University and 2Toho University, School of Medicine.

kikuchi@sfc.keio.ac.jp

 

We are developing an integrated model of hippocampal LTP and LTD, focusing on Ca2+, CaM, CaN, CaMKII, NO and CREB. We are analyzing its mechanism using Metabolic Control Analysis (MCA). As a first result, we found that CaM plays an important role in LTP and LTD.

Long Abstract

 

 

54B. On Dynamics Estimation from Given Time-Course Data Using GA and S-system: PEACE1.

Shinichi Kikuchi1,2, Daisuke Tominaga2, Masanori Arita1,2, Katsutoshi Takahashi2 and Masaru Tomita1. 1Institute for Advanced Biosciences, Keio University and 2Computational Biology Research Center, AIST.

kikuchi@sfc.keio.ac.jp

 

We have proposed a method to predict system dynamics using a genetic algorithm (GA) and the S-system formalism. Here, we improved the evaluation function to eliminate futile parameters, and employed a novel crossover operator, SPX, together with a gradual optimization strategy to raise the optimization ability. We succeeded in inferring the dynamics of a small genetic network.
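For readers unfamiliar with the formalism: an S-system writes the rate of each variable as the difference of two power-law terms, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The following is a minimal sketch of that standard form only. The function names, the forward-Euler integrator and the parameter values are illustrative assumptions and are not taken from PEACE1.

```python
from math import prod

def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of an S-system: production minus degradation power laws."""
    rates = []
    for i in range(len(x)):
        production = alpha[i] * prod(xj ** g[i][j] for j, xj in enumerate(x))
        degradation = beta[i] * prod(xj ** h[i][j] for j, xj in enumerate(x))
        rates.append(production - degradation)
    return rates

def simulate(x0, alpha, beta, g, h, dt=0.01, steps=2000):
    """Forward-Euler integration (chosen for brevity, not for accuracy)."""
    x = list(x0)
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, beta, g, h)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x

# Hypothetical one-variable system dX/dt = 2 - X, whose steady state is X = 2.
final_state = simulate([1.0], alpha=[2.0], beta=[1.0], g=[[0.0]], h=[[1.0]])
```

In a GA-based setting, candidate values of alpha, beta, g and h would presumably be scored by how closely such a simulated trajectory matches the given time-course data.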

Long Abstract

 

 

55B. Long Range Correlations in Human DNA - the Role of Repeats.

Stephan Beirer, Dirk Holste and Hanspeter Herzel. Institute for Theoretical Biology, Humboldt University Berlin.

s.beirer@biologie.hu-berlin.de

 

We study the role of Alu repeats with respect to long-range correlations (LRC) in chromosome 22. We find that Alu repeats are responsible for correlations below 300 bp and that the density of Alu repeats itself exhibits LRCs. However, LRCs can be found in Alu-free sequences as well.

Long Abstract

 

 

56B. Computer Dynamic Modeling of Gene Networks and the Analysis of the Action of Mutations.

Alexander Ratushny, V.A. Likhoshvai, E.V. Ignatieva, E.A. Ananko, O.A. Podkolodnaya, Y.G. Matushkin, and N.A. Kolchanov. Institute of Cytology and Genetics, Siberian Branch of the Russian A, 10 Lavrentiev Ave., Novosibirsk, 630090, Russia.

ratushny@bionet.nsc.ru

 

Computer dynamic models of the function of two gene networks, (1) the regulation of cholesterol synthesis in the cell and (2) the regulation of erythroid cell differentiation, are constructed. The graphic interface of the gene networks and their computer dynamic models are accessible on a web page: http://wwwmgs.bionet.nsc.ru/mgs/gnw/gn_model/.

Long Abstract

 

 

57B. No Poster.

 

 

Functional Genomics.

 

58B. Proteomics of Synechocystis sp. strain PCC 6803: Identification of plasma membrane proteins.

59B. Predicting Chromatin Packing Based on Gene Expression Data.

60B. Predicting the Functional Effects of Mutations in G-Protein Coupled Receptors.

61B. Removal of Background Signal from in situ Data on the Expression of Segmentation Genes in Drosophila.

62B. A Novel Approach Combining Gene Variation and Gene Expression Data.

63B. Identification and Application of Gene Signatures for the Molecular Dissection of Disease.

64B. From RegulonDB to a Multigenomic Microbial Database of Operon Organization and Gene Regulation.

65B. Construction of Multiple Sequence Alignment (MSA) by Iteration of Conservation Controlled Hidden Markov Model (HMM) Application for Enzyme Family Classification and Functional Residues Identification.

66B. Modulewriter, a Tool that Provides a Perl Interface to SQL Databases.

67B. estBASE: Web-based Automatic EST Assembly and Annotation System for the Analysis of ESTs of the Young Antlers of the Deer.

68B. Functional Clustering of a Proteome using Relevance Network in Human Hepatocellular Carcinoma.

69B. Integrated Classification and Systemic Analysis of Protein Structure by ProDA.

70B. An Exhaustive Study of the Notion of Functional Link Based on Microarray Data, Protein-Protein Interactions and Pathways Information.

71B. GenomeMasker and GenomeTester: Tools for Design of High Quality Genomic PCR Primers.

72B. Statistical Processing of SELEX Results.

73B. Genome-Wide Studies of Spatial Clustering of Expressed Genes with a High-Resolution Human Transcriptome Map.

74B. Seeking the Vertebrate Secretome.

75B. Chronobiological Analysis for the Reliable Identification of Cell-Cycle Regulated Genes from Gene Expression Profiles.

76B. PlasmoCyc: A Pathway/Genome Database for Malaria.

77B. Correlation between Sequence Similarity and Minimal Network Distance in the Metabolic Network of E. coli.

78B. Reconstructing Alternative Splice Variants from EST Clusters.

79B. PseudoCyc, a Pathway-Genome Database for Pseudomonas aeruginosa.

80B. Predicting Protein Domains using Discriminative Margin Classifiers.

81B. ProDB-Tools: A System for Automated MS Data Analysis using Mascot.

82B. Analysis of Gene Expression Data with Fuzzy Mutual Information.

 

58B. Proteomics of Synechocystis sp. strain PCC 6803: Identification of plasma membrane proteins.

Fang Huang1, Ingela Parmaryd1, Fredrik Nilsson2, Annika L. Persson3, Himadri B. Pakrasi4, Bertil Andersson1,5 and Birgitta Norling1#. 1Department of Biochemistry and Biophysics, Arrhenius Laboratories for Natural Sciences, Stockholm University, Stockholm, SE-10691, Sweden, 2Astra Hässle, Department of Bioanalytical Chemistry, AstraZeneca R&D Mölndal, Mölndal, SE-43183 Sweden, 3Department of Zoological Cell Biology, The Wenner-Gren Institute, Stockholm University, Stockholm, SE-10691, Sweden, 4Department of Biology, Washington University, St Louis, MO 63130, USA and 5Division of Cell Biology, Linköping University, Linköping, SE-58185, Sweden. #For correspondence.

fang@dbb.su.se

 

The cyanobacterium Synechocystis 6803 is a model organism for studies on oxygenic photosynthesis. The plasma membrane proteins of Synechocystis were resolved by 2D-gel electrophoresis and 66 proteins were identified using MALDI-TOF mass spectrometry. We verify the presence of some core subunits of PSI and PSII in the membrane and find proteins involved in pili formation and the Tol secretion system.

Long Abstract

 

 

59B. Predicting Chromatin Packing Based on Gene Expression Data.

Gabriel Kreiman. Computation and Neural Systems, Caltech GNF.

gabriel@klab.caltech.edu

 

An important mechanism of transcriptional regulation involves chromatin packing. Here we search for silent sections of the chromosome by comparing the median expression level within each segment to the overall distribution of expression values using a bootstrap procedure. Silent regions may indicate special chromatin packing or other common regulatory mechanisms.
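A bootstrap comparison of this kind could be sketched as follows. This is only an illustration under stated assumptions: the function name, the number of resamples, the fixed random seed and the toy expression values are all hypothetical, and the abstract does not specify how segments are defined or which significance threshold is applied.

```python
import random
from statistics import median

def silent_segment_pvalue(all_values, segment_values, n_resamples=1000, seed=0):
    """Fraction of random gene sets whose median is at or below the segment's median."""
    rng = random.Random(seed)
    observed = median(segment_values)
    size = len(segment_values)
    hits = 0
    for _ in range(n_resamples):
        resample = rng.choices(all_values, k=size)  # draw with replacement
        if median(resample) <= observed:
            hits += 1
    return hits / n_resamples

# Toy data: expression values 1..100, one low segment and one typical segment.
expression = list(range(1, 101))
p_low = silent_segment_pvalue(expression, [1, 2, 3, 4, 5])
p_typical = silent_segment_pvalue(expression, [48, 49, 50, 51, 52])
```

A small value for a segment suggests that its median expression is lower than expected for a random set of genes of the same size.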

Long Abstract

 

 

60B. Predicting the Functional Effects of Mutations in G-Protein Coupled Receptors.

S. Roy Kimura and Daniel Chasman. Variagenics, Inc., 60 Hampshire St., Cambridge, MA 02139.

rkimura@variagenics.com

 

We are interested in predicting the functional consequences of natural polymorphisms in human populations on G-protein coupled receptors (GPCRs), the targets of about 50% of today's drugs. Our approach involves devising sequence- and structure-based features reflecting the unique bases of GPCR function and applying them in statistically justified predictive models.

Long Abstract

 

 

61B. Removal of Background Signal from in situ Data on the Expression of Segmentation Genes in Drosophila.

Ekaterina Myasnikova and Maria Samsonova. St.Petersburg State Technical University, St.Petersburg, Russia.

myasnikova@fn.csa.ru

 

A method is developed for removal of non-specific background signal from in situ data on expression of segmentation genes in Drosophila. The expression patterns are rescaled numerically so that the data obtained under different experimental conditions are brought to a unified standard form with a zero background.

Long Abstract

 

 

62B. A Novel Approach Combining Gene Variation and Gene Expression Data.

Andreas Windemuth, Krishnan Nandabalan, Madan Kumar, Beena Koshy and Richard S. Judson. Genaissance Pharmaceuticals, Inc.

a.windemuth@genaissance.com

 

Most target discovery efforts have used variation measurements (SNPs and haplotypes) or gene expression levels. An approach to combine these data to yield novel insights into gene action and molecular pathways will be described. We discuss the utility of observed correlations between genetic, gene expression and phenotypic variation in the elucidation of biological mechanisms.

Long Abstract

 

 

63B. Identification and Application of Gene Signatures for the Molecular Dissection of Disease.

Romain Banchereau and Beth Basham. DNAX Research, Inc.

beth.basham@dnax.org

 

Clustering techniques were used to determine gene signatures for hematopoietic cell types from expression data. These cell-specific gene signatures may be used to understand tissue pathology. Here we show the determination of gene signatures for cells and a molecular interpretation of disease states in tissues obtained from mouse inflammation models.

Long Abstract

 

 

64B. From RegulonDB to a Multigenomic Microbial Database of Operon Organization and Gene Regulation.

Salgado H, Sanchez-Solano F, Diaz-Peredo E, Gama-Castro S, Garcia-Alonso D, Perez-Rueda E, Jimenez-Jacinto V, Medrano-Soto A, Moreno-Hagelsieb G and Collado-Vides J. Nitrogen Fixation Research Center, Av. Universidad s/N, Cuernavaca, Morelos, 62210, Mexico.

heladia@cifn.unam.mx

 

RegulonDB is a database on transcriptional regulation and operon organization in Escherichia coli K12. We have extended it to generate a new multigenomic database containing the annotations in GenBank for 70 genomes, together with predictions of operons and regulatory proteins. A new interface and associated tools have been developed. http://www.cifn.unam.mx/Computational_Genomics/regulondb/.

Long Abstract

 

 

65B. Construction of Multiple Sequence Alignment (MSA) by Iteration of Conservation Controlled Hidden Markov Model (HMM) Application for Enzyme Family Classification and Functional Residues Identification.

Weidong Tian1,2 and Jeffrey Skolnick2. 1Department of Biology, Washington University, St. Louis, One Brookings Drive, St. Louis, MO 63139 and 2Laboratory of Computational Genomics, Donald Danforth Plant Science Center, 975 N. Warson Rd., St. Louis, MO 63132.

wtian@artsci.wustl.edu

 

We iteratively applied a conservation-controlled Hidden Markov Model (HMM) to construct multiple sequence alignments (MSAs) for enzyme family classification and identification of functional residues. Both the alignment quality and the ability to detect functionally important residues are significantly improved in the resulting MSA compared with the MSA constructed by ClustalW.

Long Abstract

 

 

66B. Modulewriter, a Tool that Provides a Perl Interface to SQL Databases.

Christina Zheng, Poornaprajna Udupi and Michael Gribskov. San Diego Supercomputer Center, University of California San Diego, La Jolla, California.

czheng@sdsc.edu

 

Modulewriter is a tool that provides a customized API to SQL databases. Modulewriter provides access methods to retrieve and set values for columns in each table; add, delete and update rows in each table; and a method to retrieve the last ID in a table.
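To make the kind of interface described above concrete, here is a minimal sketch written in Python with sqlite3 rather than Perl. The class name, method names and the example table are hypothetical and do not reproduce Modulewriter's generated API; they only mirror the listed operations (get and set column values, add, delete and update rows, retrieve the last ID).

```python
import sqlite3

class TableAccessor:
    """Hypothetical per-table wrapper; identifiers are trusted, values are bound."""

    def __init__(self, conn, table, key="id"):
        self.conn, self.table, self.key = conn, table, key

    def add_row(self, **values):
        cols = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        cur = self.conn.execute(
            f"INSERT INTO {self.table} ({cols}) VALUES ({marks})",
            tuple(values.values()))
        return cur.lastrowid

    def get(self, row_id, column):
        row = self.conn.execute(
            f"SELECT {column} FROM {self.table} WHERE {self.key} = ?",
            (row_id,)).fetchone()
        return None if row is None else row[0]

    def set(self, row_id, column, value):
        self.conn.execute(
            f"UPDATE {self.table} SET {column} = ? WHERE {self.key} = ?",
            (value, row_id))

    def delete_row(self, row_id):
        self.conn.execute(
            f"DELETE FROM {self.table} WHERE {self.key} = ?", (row_id,))

    def last_id(self):
        return self.conn.execute(
            f"SELECT MAX({self.key}) FROM {self.table}").fetchone()[0]

# Example with an in-memory database and a made-up "gene" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT)")
genes = TableAccessor(conn, "gene")
first = genes.add_row(name="lacZ")
second = genes.add_row(name="recA")
genes.set(first, "name", "lacY")
```

Table and column names are interpolated into the SQL text, so this sketch is only safe when they come from the schema and not from user input.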

Long Abstract

 

 

67B. estBASE: Web-based Automatic EST Assembly and Annotation System for the Analysis of ESTs of the Young Antlers of the Deer.

Sujin Chae1, Hyung-Yong Kim1, Mi-Ra Roh1, Sun-Shim Choi1, Samwoong Rho2, Hyunsu Bae2,3, Youngho Moon4 and Yong-ho In1. 1Bioinfomatix Inc, Seoul, Korea, 2College of Oriental Medicine, Kyunghee Univ. Seoul, Korea, 3Purimed Co. Seoul, Korea and 4Genotech Co. Daejon, Korea.

sujin@bioinfomatix.com

 

To analyze effectively the vast amounts of expressed sequence tag (EST) data generated by the deer antler EST project, we developed the estBASE system, an EST assembly and annotation system with web-based graphical interfaces. It provides EST assembly, BLAST, PROSITE, PRINTS and Profile search facilities, executed automatically in real time.

Long Abstract

 

 

68B. Functional Clustering of a Proteome using Relevance Network in Human Hepatocellular Carcinoma.

Yujin Hoshida1, Takayuki Kawakami1, Fumihiko Kanai1, Naoya Kato1, Masao Omata1, Naoya Hatano2, Hisaaki Taniguchi2 and Shin-ichiro Nishimura3. 1Dept. of Gastroenterology, Graduate School of Medicine, Univ. of Tokyo, Japan, 2Membrane Dynamics Project, RIKEN Harima Institute at SPring-8, Japan and 3Computational Biology Research Center, Japan.

hoshiday-int@h.u-tokyo.ac.jp

 

The relevance network is a useful functional clustering method for RNA expression data, and it is easy to apply to clinical samples. We extended this method to proteome data to perform comprehensive functional proteomic analysis. This allows us to survey therapeutic target proteins efficiently.
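One common way to build a relevance network is to link features whose profiles are strongly associated across samples. The sketch below assumes Pearson correlation and a cut-off of 0.9; neither the similarity measure nor the threshold is stated in the abstract, and the protein names and abundance values are made up.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (non-constant inputs)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def relevance_edges(profiles, threshold=0.9):
    """Return pairs whose absolute correlation reaches the threshold."""
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if abs(r) >= threshold:
                edges.append((a, b, round(r, 3)))
    return edges

# Hypothetical abundance profiles of three proteins across four samples.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.1, 6.0, 8.2],
    "P3": [4.0, 1.0, 3.5, 1.2],
}
edges = relevance_edges(profiles)
```

Connected groups of edges can then be read as candidate functional clusters.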

Long Abstract

 

 

69B. Integrated Classification and Systemic Analysis of Protein Structure by ProDA.

Jun-Hyung Park1,2, Heung-Soo Park2, Byeung-Chul Kang1,2, Hee-Keyung Park2 and Cheol-Min Kim1. 1Busan Genome Center, College of Medicine, Pusan National University, Busan, Korea and 2Institute for Biomedical Research, SJ HIGHTECH Co., LTD., Busan, Korea.

kimcm@pusan.ac.kr

 

ProDA (Protein Domain Analysis System) integrates the information on protein structure in three major classification databases: SCOP, CATH and FSSP. ProDA provides systemic information on structural classification, hierarchical classification, the relation of similar proteins, and basic information about protein structure at one site. ProDA is available at http://www.proda.or.kr.

Long Abstract

 

 

70B. An Exhaustive Study of the Notion of Functional Link Based on Microarray Data, Protein-Protein Interactions and Pathways Information.

William Dirks and Golan Yona. Center of Applied Mathematics and Department of Computer Science, Cornell University.

bdirks@cam.cornell.edu

 

To understand functional relationships between genes based on microarray expression data, we combine similarity in expression profiles with sequence similarity and information on interacting proteins and cellular pathways. We study the combination of these measures to better ascertain the type of functional relationships between genes. We evaluate our method on the publicly available Saccharomyces cerevisiae data.

Long Abstract

 

 

71B. GenomeMasker and GenomeTester: Tools for Design of High Quality Genomic PCR Primers.

Eric Reppo, Reidar Andreson and Maido Remm. Estonian Biocentre, University of Tartu and BioData Ltd.

mremm@ebc.ee

 

We have written two programs to achieve fully automatic PCR primer design for genomic applications. GenomeMasker masks all over-represented words in the genome. GenomeTester counts all occurrences of PCR primers in the genome and tests whether additional false PCR products could be produced with a given set of PCR primers.
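The two steps can be illustrated with a toy sketch. It is not the GenomeMasker or GenomeTester implementation: the word length, the over-representation cut-off, lower-casing as the masking convention and exact matching on both strands are all assumptions made for the example.

```python
from collections import Counter

def count_words(genome, k):
    """Count every overlapping k-letter word in the sequence."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def mask_overrepresented(genome, k, max_count):
    """Lower-case every position covered by a word seen more than max_count times."""
    counts = count_words(genome, k)
    masked = list(genome)
    for i in range(len(genome) - k + 1):
        if counts[genome[i:i + k]] > max_count:
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_binding_sites(genome, primer):
    """Count exact primer matches on both strands (overlapping matches included)."""
    targets = {primer, reverse_complement(primer)}  # a set avoids double-counting palindromes
    return sum(
        genome.startswith(t, i) for t in targets for i in range(len(genome)))

# Toy genome in which the word ACGT occurs three times.
genome = "ACGTACGTACGTTTGCA"
masked = mask_overrepresented(genome, k=4, max_count=2)
sites = count_binding_sites(genome, "TTGC")
```

A primer design step would then avoid the lower-cased regions and reject primers with more than one binding site.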

Long Abstract

 

 

72B. Statistical Processing of SELEX Results.

Damien Eveillard and Yann Guermeur. LORIA - UHP.

Damien.Eveillard@loria.fr

 

The SELEX method is widely used to discover high affinity ligands to a large variety of different target molecules. However, the standard method implemented to process the resulting pool of ligands exhibits significant drawbacks. To overcome these shortcomings, we propose a statistical pre-processing of the output based on clustering techniques.

Long Abstract

 

 

73B. Genome-Wide Studies of Spatial Clustering of Expressed Genes with a High-Resolution Human Transcriptome Map.

Marcel F. van Batenburg1, H. J. Bussemaker1, B. van Schaik2, M. Roos2, R. Monjami2, A. H. van Kampen2, R. Versteeg2 and H. Caron2. 1Dept of Biological Sciences, Columbia University, US. and 2Academic Medical Centre, University of Amsterdam, The Netherlands.

marcelvb@science.uva.nl

 

We present a new, fully sequence-based version of the Human Transcriptome Map that integrates the UCSC "Golden Path" assembly and Unigene with SAGE data. We explore to what extent regions of increased gene expression (RIDGEs) correlate with high gene density and discuss the role of genome duplications. http://bioinfo.amc.uva.nl/HTM.

Long Abstract

 

 

74B. Seeking the Vertebrate Secretome.

Eric W. Klee, Stephen C. Ekker, Scott C. Fahrenkrug and Lynda B. M. Ellis. University of Minnesota, USA.

klee0025@tc.umn.edu

 

Secreted proteins mediate cell-to-cell communication during vertebrate development and growth. The “secretome”, or collection of secreted proteins, is a high-priority target for functional-annotation and gene-expression analysis. Secreted proteins contain specific “signatures” in their amino termini. We describe a method for identifying the secretome from incomplete EST and genomic sequence data.

Long Abstract

 

 

75B. Chronobiological Analysis for the Reliable Identification of Cell-Cycle Regulated Genes from Gene Expression Profiles.

Jihun Kim, Ji Hoon Kim, Kack Kyun Kim and Ju Han Kim. SNU Biomedical Informatics, Seoul National University School of Medicine, Seoul 110-799, Korea.

juhan@snu.ac.kr

 

Chronobiological analysis can reliably identify periodic patterns in large-scale gene expression profiles. Objective determination of the acrophase demonstrated high correlations between motif-usage patterns and cell-cycle periodicity. Agonistic and antagonistic pairs of motifs are suggested and shown to be useful for reconstructing genetic regulatory networks.

Long Abstract

 

 

76B. PlasmoCyc: A Pathway/Genome Database for Malaria.

Iwei Yeh1, Theo Hanekamp2, Hagai Ginsburg3, Peter Karp4 and Russ Altman1. 1Stanford University, 2University of Wyoming, 3Hebrew University and 4SRI International.

yeh@smi.stanford.edu

 

PlasmoCyc (http://plasmocyc.stanford.edu) is a pathway/genome database for Plasmodium falciparum, a causative agent of malaria. PlasmoCyc was constructed automatically using the EcoCyc framework. To the automatically generated database, we added malaria-specific pathways from Malaria Parasite Metabolic Pathways (http://sites.huji.ac.il/malaria/). PlasmoCyc provides an interactive, queryable pathway/genome database for P. falciparum.

Long Abstract

 

 

77B. Correlation between Sequence Similarity and Minimal Network Distance in the Metabolic Network of E. coli.

Sara Light and Per Kraulis. Stockholm Bioinformatics Center, Stockholm University.

sara@sbc.su.se

 

We investigated the correlation between network distance and sequence similarity in the whole metabolic network of E. coli. The network representation was built using KEGG. We found a significant over-representation of homologous genes at network distance 1, which persisted in networks from which the most ubiquitous compounds had been excluded.
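The notion of minimal network distance, and the effect of excluding ubiquitous compounds, can be illustrated with a small sketch. The graph construction (two enzymes are linked when they share a compound), the enzyme labels and the three toy reactions are assumptions for the example and are not the authors' KEGG-derived representation.

```python
from collections import deque

def build_enzyme_graph(reactions, excluded=()):
    """Link two enzymes when they share a compound that is not excluded."""
    graph = {enzyme: set() for enzyme in reactions}
    for a, compounds_a in reactions.items():
        for b, compounds_b in reactions.items():
            if a != b and (compounds_a & compounds_b) - set(excluded):
                graph[a].add(b)
    return graph

def network_distance(graph, start, goal):
    """Minimal number of edges between two nodes (breadth-first search)."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected

# Toy network: compounds touched by three hypothetical enzymes E1..E3.
reactions = {
    "E1": {"glucose", "G6P", "ATP"},
    "E2": {"G6P", "F6P"},
    "E3": {"F6P", "FBP", "ATP"},
}
with_atp = network_distance(build_enzyme_graph(reactions), "E1", "E3")
without_atp = network_distance(
    build_enzyme_graph(reactions, excluded={"ATP"}), "E1", "E3")
```

In the toy network, E1 and E3 are direct neighbours only through the shared cofactor, which is why excluding ubiquitous compounds changes the distances.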

Long Abstract

 

 

78B. Reconstructing Alternative Splice Variants from EST Clusters.

Alexander Sczyrba1, Stefan Kurtz1, Curtis R. Altmann2 and Robert Giegerich1. 1Faculty of Technology, Bielefeld University, Germany and 2Laboratory of Molecular Vertebrate Embryology, The Rockefeller University, New York, USA.

asczyrba@techfak.uni-bielefeld.de

 

Recent studies suggest that 40-60% of human genes show alternative splicing. We describe an approach to identify such splice forms based on sequence clusters. Pairwise matches within one cluster are assembled into a graph, resulting in a compact representation of different splice forms. The algorithm is independent of the availability of the genomic sequence.

Long Abstract

 

 

79B. PseudoCyc, a Pathway-Genome Database for Pseudomonas aeruginosa.

Pedro Romero and Peter Karp. Bioinformatics Research Group, SRI International.

promero@ai.sri.com

Our PathoLogic software was used to generate PseudoCyc, a pathway-genome database (PGDB), from the annotated genome of Pseudomonas aeruginosa strain PAO1. PseudoCyc includes the entire PAO1 genome, as well as its predicted metabolic network. Literature validation of some PathoLogic predictions was also performed. http://biocyc.org.

Long Abstract

 

 

80B. Predicting Protein Domains using Discriminative Margin Classifiers.

Adiel Cohen, Eleazar Eskin, Christina Leslie and William Stafford Noble. Department of Computer Science, Columbia University.

eeskin@cs.columbia.edu

 

We present a new method for predicting domains in protein sequences using discriminative margin classifiers. The methodology can efficiently determine which of thousands of possible domains are present in a protein sequence and locate them. The method is tested on the SCOP database.

Long Abstract

 

 

81B. ProDB-Tools: A System for Automated MS Data Analysis using Mascot.

Christian Rueckert and Andreas Wilke. Graduate School in Bioinformatics and Genome Research, Universitaetsstr. 25, Bielefeld, 33615, Germany.

Christian.Rueckert@Genetik.Uni-Bielefeld.DE

 

ProDB-Tools is a system for high-throughput mass spectrometry data storage and analysis. The proteins corresponding to the mass spectra are identified using the Mascot software. Data interpretation is aided by comparing multiple Mascot queries and displaying information from GenDB, our genome annotation system.

Long Abstract

 

 

82B. Analysis of Gene Expression Data with Fuzzy Mutual Information.

Carsten O. Daub, Janko Weise, Ralf Steuer, Joachim Selbig and Sebastian Kloska. Max-Planck-Institute of Molecular Plant Physiology.

daub@mpimp-golm.mpg.de

 

We introduce a new distance measure for gene expression data based on mutual information. To model measurement noise, we introduce fuzziness into the data discretisation step. We determine the joint probability function for pairs of genes by modeling expression values as sums of distributions calculated from B-spline weight functions.
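The idea of fuzzy discretisation can be sketched with the simplest weight function, a triangular (linear B-spline) membership that spreads each value over its two neighbouring bins. The function names, the number of bins and the choice of linear splines are assumptions made for illustration; the abstract does not state the spline order or how mutual information is converted into a distance.

```python
from math import log2

def fuzzy_memberships(values, n_bins):
    """Spread each value over neighbouring bins with triangular weights.

    Assumes the values are not all identical (otherwise the range is zero).
    """
    lo, hi = min(values), max(values)
    weights = []
    for v in values:
        z = (v - lo) / (hi - lo) * (n_bins - 1)  # position on the bin axis
        weights.append([max(0.0, 1.0 - abs(z - k)) for k in range(n_bins)])
    return weights

def fuzzy_mutual_information(x, y, n_bins=4):
    """Mutual information (in bits) from fuzzy joint bin probabilities."""
    wx, wy = fuzzy_memberships(x, n_bins), fuzzy_memberships(y, n_bins)
    n = len(x)
    joint = [[sum(wx[s][k] * wy[s][l] for s in range(n)) / n
              for l in range(n_bins)] for k in range(n_bins)]
    px = [sum(row) for row in joint]
    py = [sum(joint[k][l] for k in range(n_bins)) for l in range(n_bins)]
    mi = 0.0
    for k in range(n_bins):
        for l in range(n_bins):
            if joint[k][l] > 0:
                mi += joint[k][l] * log2(joint[k][l] / (px[k] * py[l]))
    return mi

# Toy profiles: x and y cover all 16 combinations of four levels.
x = [i // 4 for i in range(16)]
y = [i % 4 for i in range(16)]
mi_independent = fuzzy_mutual_information(x, y)
mi_identical = fuzzy_mutual_information(x, x)
```

Because each value contributes fractionally to two bins, small measurement noise near a bin boundary changes the estimated probabilities only slightly instead of moving a whole count from one bin to the next.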

Long Abstract

 

 

 

Structural Biology.

 

83B. Structure Determination of Archaebacterial Proteins via NMR.

84B. Prediction of Protein Structure using a Laptop.

85B. The DSSPcont Database: Continuum Secondary Structure.

86B. The Protein Mutation Resource: A Tool for Protein Engineering.

87B. Automatic Classification of Protein Structures Using Gauss Integrals.

88B. Multiresolution Dynamic Programming for Alignment to Model GPCR Remote Homologs.

89B. Molecular Mechanism for Drug Resistance: a Case Study of the HIV-1 Protease.

90B. Target Space for Structural Genomics Revisited.

91B. A New Structural Motif in the Large Ribosomal Subunit is revealed by Graph Theory.

92B. OmniMerge: An Algorithm for Systematic Conformational Search with Constraint Satisfaction.

93B. An XML Meta Parser: Application to RNAML, a Standard Syntax for Exchanging RNA Structure Information.

94B. Fast Core Structure Threading.

95B. Common Protein Surfaces and their Uses.

96B. Studying the Sequence-Structure Relationship across the Protein Kinase Family through Comparative Structural Analysis.

97B. 3D-Hit, Fast Structural Comparison of Proteins.

98B. No poster.

99B. Design and Simulation of a Genetic Circuit Amplifier.

100B. Essential Dynamics data for a deeper insight in motif analysis: the PAS protein case.

101B. Modelling of Information Flow in Cells.

102B. Deskside Hundred Teraflop Computing for Multiple in silico Applications.

103B. Hydration Free Energies and Water-Cyclohexane Partition Coefficients of Amino Acid Side Chain Analogues: A Comparison of the OPLS/AA and Gromos 96 Force Fields.

 

83B. Structure Determination of Archaebacterial Proteins via NMR.

Godwin Amegbey, Hassan Monzavi, Bahram Habibi-Nazhad and David Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton AB. T6G 2N8.

dsw@redpoll.pharmacy.ualberta.ca

 

The putative protein products of two Methanobacterium thermoautotrophicum genes, MT0807 and MT0776, were investigated. MT0807 is an 85 amino acid residue protein thought to function as a thioredoxin. The preliminary NMR structure and enzymatic assays indicate that this protein has thioredoxin-like activities. The backbone chemical shift assignment of MT0776, a protein of unknown structure and function, is also reported.

Long Abstract

 

 

84B. Prediction of Protein Structure using a Laptop.

Rajarshi Maiti, Haiyan Zhang and David Wishart. University of Alberta, 3118 Dentistry and Pharmacy Building, Edmonton, Alberta, T6G-2N8.

rmaiti@redpoll.pharmacy.ualberta.ca

 

A method for the determination of small protein structures using conformational search and distance geometry calculations is described. The algorithm is fast, allows up to 1 billion protein conformations to be randomly sampled on a standard desktop or laptop and can predict the protein structure with good accuracy in less than 15 minutes.

Long Abstract

 

 

85B. The DSSPcont Database: Continuum Secondary Structure.

Phil Carter, Claus AF Andersen and Burkhard Rost. Columbia University Bioinformatics Center, Columbia University.

carter@cubic.bioc.columbia.edu

 

DSSPcont uses a continuum secondary structure to represent protein flexibility. Using different hydrogen bond thresholds across ten discrete DSSP assignments, a continuum is calculated using weighted averages. The DSSPcont assignment represents protein motion consistent with NMR models. The DSSPcont database covers all proteins in the PDB. DSSPcont is available from: http://cubic.bioc.columbia.edu/services/DSSPcont/
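The weighted averaging can be pictured with a short sketch. It assumes each discrete assignment is a string of per-residue state letters and that every assignment carries one weight; the function name, the example strings and the weights are hypothetical and are not the values used by DSSPcont.

```python
from collections import defaultdict

def continuum_assignment(assignments, weights):
    """Per-residue weighted fraction of each secondary structure state."""
    total = sum(weights)
    per_residue = []
    for column in zip(*assignments):  # one column per residue
        fractions = defaultdict(float)
        for state, weight in zip(column, weights):
            fractions[state] += weight / total
        per_residue.append(dict(fractions))
    return per_residue

# Two made-up assignments of a three-residue fragment (H = helix, E = strand).
result = continuum_assignment(["HHE", "HEE"], weights=[3, 1])
```

A residue that changes state between thresholds receives fractional memberships, which is how a continuum assignment can reflect flexibility.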

Long Abstract

 

 

86B. The Protein Mutation Resource: A Tool for Protein Engineering.

Werner G. Krebs, Murlidharan Nair and Phil Bourne. Integrative Biosciences, The San Diego Supercomputer Center MC 0527, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0527, USA.

wkrebs@sdsc.edu and wernerkrebs@yahoo.com

 

The Protein Mutant Resource is a database of families of PDB structures related by mutations. Structural studies of the effects of non-silent mutations on protein conformational change are an important key in deciphering the language that relates protein amino acid primary structure to tertiary structure. PMR is available at http://pmr.sdsc.edu

Long Abstract

 

 

87B. Automatic Classification of Protein Structures Using Gauss Integrals.

Peter Røgen1 and Boris Fain2. 1Department of Mathematics, Technical University of Denmark, Denmark and 2Department of Structural Biology, Stanford University, USA.

Peter.Roegen@mat.dtu.dk

 

The geometry of a protein backbone is characterized by 30 numbers (Gauss integrals), and the usual Euclidean metric compares protein shapes without any use of alignment. Computationally, the method is very fast for all-against-all comparison, and it gives a highly reliable automatic classification procedure for the CATH hierarchy.
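Because every structure is reduced to a fixed-length vector, comparison needs no alignment step. The sketch below shows this with a nearest-neighbour rule; the rule itself, the three-number descriptors (standing in for the 30 Gauss integrals) and the labels are assumptions for illustration, and the calculation of the integrals is not shown.

```python
from math import dist

def nearest_class(query, labelled):
    """Return the label of the descriptor vector closest to the query."""
    return min(labelled, key=lambda item: dist(query, item[0]))[1]

# Hypothetical 3-number descriptors stand in for the 30 Gauss integrals.
labelled = [
    ((0.10, 0.80, 0.30), "mainly alpha"),
    ((0.90, 0.20, 0.70), "mainly beta"),
]
predicted = nearest_class((0.15, 0.75, 0.35), labelled)
```

Each comparison costs a fixed number of arithmetic operations regardless of chain length, which is consistent with the speed claimed for all-against-all comparison.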

Long Abstract

 

 

88B. Multiresolution Dynamic Programming for Alignment to Model GPCR Remote Homologs.

Lorraine Marsh. Long Island University, Brooklyn, NY 11201.

lmarsh@liu.edu

 

A dynamic programming method (Galign) for alignment of GPCRs with a rhodopsin template was developed. Alignment is scored at several levels including residue conservation, transmembrane domain identity and sequence similarity. Comparative modeling of a yeast GPCR remote homolog produces a model with explanatory power.

Long Abstract

 

 

89B. Molecular Mechanism for Drug Resistance: a Case Study of the HIV-1 Protease.

Wei Wang1 and Peter A. Kollman2. 1Dept. of Genetics, Stanford University, Stanford, CA 94305 and 2Dept. of Pharmaceutical Chemistry, UCSF, Box 0446, San Francisco, CA 94143.

wwang@genome.stanford.edu

 

A molecular mechanism for drug resistance of any drug target is proposed and a case study of the HIV-1 protease is discussed in detail. We use a computational algorithm to identify important mutations that cause drug resistance and suggest how to design resistance-evading drugs.

Long Abstract

 

 

90B. Target Space for Structural Genomics Revisited.

Jinfeng Liu and Burkhard Rost. CUBIC, Dept. of Biochemistry and Molecular Biophysics, Columbia University.

liu@cubic.bioc.columbia.edu

 

In the context of target selection for the North-East Structural Genomics Consortium (NESG), we dissected proteins in five eukaryotic proteomes into structural domains and domain-like fragments, explored a number of different strategies to cluster protein space, and found about 30,000 fragment clusters that may be suitable targets for structural genomics.

Long Abstract

 

 

91B. A New Structural Motif in the Large Ribosomal Subunit is revealed by Graph Theory.

Sébastien Lemieux and François Major. Université de Montréal.

lemieuxs@iro.umontreal.ca

 

Fundamental RNA structure building blocks were extracted from a minimal cycle basis of the large ribosomal subunit structural graph. A novel RNA motif was identified, similar to the GNRA tetraloop, but formed by nucleotides from two independent strands.

Long Abstract

 

 

92B. OmniMerge: An Algorithm for Systematic Conformational Search with Constraint Satisfaction.

Lisa Tucker-Kellogg and Tomas Lozano-Perez. Artificial Intelligence Laboratory, M.I.T.

ltk@ai.mit.edu

 

OmniMerge performs a systematic search to enumerate all conformations that satisfy geometric constraints, such as interatomic distance constraints from NMR spectroscopy. It defines a subproblem for every subchain of the molecule and enforces consistency between the solution sets of overlapping subchains to reduce the number of solutions and improve efficiency.

Long Abstract

 

 

93B. An XML Meta Parser: Application to RNAML, a Standard Syntax for Exchanging RNA Structure Information.

Martin Larose, Mélissa Jourdain and François Major. Université de Montréal.

jourdaim@iro.umontreal.ca

 

We present an XML meta-parser, XMLCPG, to construct portable C libraries, eliminating the need for a specific grammar, such as the DTD. XMLCPG was developed for managing several versions of RNAML, a standard syntax for RNA structure information.

Long Abstract

 

 

94B. Fast Core Structure Threading.

Natasha L. Sefcovic1,2, Aron Marchler-Bauer1, Anna R. Panchenko1 and Stephan H. Bryant1. 1Computational Biology Branch, NCBI, NLM, NIH, 2Biology Department, Johns Hopkins University.

sefcovic@ncbi.nlm.nih.gov

 

In this project, we developed a fast threading program that uses a block alignment model with a dynamic programming alignment algorithm. Its performance is compared to another threading program that uses the same alignment model, but which has a Monte Carlo alignment algorithm instead.

Long Abstract

 

 

95B. Common Protein Surfaces and their Uses.

Stephen Long1,2, Peter Adams1, Darryn Bryant1, Mark Smythe2 and Tran Trung Tran2. 1Department of Mathematics and 2Institute for Molecular Bioscience, University of Queensland.

sml@maths.uq.edu.au

 

This work develops virtual screens to focus libraries of molecules based on common surface shapes of a database of protein-protein interactions. These surface shapes are defined by the Cα-Cβ vectors of residues participating in this interaction. Elements of parallel computing and algorithm optimisation help extract these most frequent shapes.

Long Abstract

 

 

96B. Studying the Sequence-Structure Relationship across the Protein Kinase Family through Comparative Structural Analysis.

Jo-Lan Chung, Eric D. Scheeff, Ilya N. Shindyalov and Philip E. Bourne. San Diego Supercomputer Center, University of California at San Diego.

jlchung@sdsc.edu

 

Many protein kinases have very low sequence similarity yet share a catalytic fold. To analyze this situation, for this family and other protein families, we have developed a new graphical tool to study the relationship between sequence substitutions and the conserved components of a structural framework across structurally aligned proteins.

Long Abstract

 

 

97B. 3D-Hit, Fast Structural Comparison of Proteins.

Dariusz Plewczynski1,2, Jakub Pas3, Marcin von Grotthuss3 and Leszek Rychlewski3. 1Interdisciplinary Center for Mathematical and Computational Modeling, University of Warsaw, Pawinskiego Street 5a, 02-106 Warsaw, Poland, 2The Burnham Institute, 10901 North Torrey Pines Road, La Jolla CA 92037, USA, and 3Bioinformatics Laboratory, BioInfoBank Institute, ul. Limanowskiego 24A, 60-744, Poznan, Poland.

dplewczynski@burnham.org

 

3D-Hit is a fast scanning method for detecting structural similarities between proteins. The algorithm is based on a hashing function, which decomposes proteins into segments of 13 residues. The scanning procedure starts by assigning a set of similar segments from the database to each segment in the query protein. These initial hits are expanded by two iterations of structural superposition of larger segments of 99 and 299 residues. The method generates an alignment for the query protein by concatenating partial structural alignments.
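The first step, decomposing a protein into 13-residue segments, might look like the sketch below. It is only an illustration: the abstract does not say whether segments overlap, so a sliding window with step 1 is assumed here, a residue string stands in for the 3D coordinates that the real hashing function would use, and the function name and query sequence are hypothetical.

```python
def split_into_segments(residues, length=13):
    """Return (start_index, segment) pairs for every overlapping window.

    Chains shorter than `length` yield an empty list.
    """
    return [
        (start, residues[start:start + length])
        for start in range(len(residues) - length + 1)
    ]

# Hypothetical query chain of 22 residues.
query = "MKTAYIAKQRQISFVKSHFSRQ"
segments = split_into_segments(query)
```

Each segment would then be looked up against the database to collect candidate hits before the larger superposition steps.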

Long Abstract

 

 

98B. No poster.

 

99B. Design and Simulation of a Genetic Circuit Amplifier.

Gianna De Rubertis1,2 and Stephen Davies1,3. 1Institute of Biomaterials and Biomedical Engineering, 2Department of Chemical Engineering and Applied Chemistry and 3the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto.

gianna.derubertis@utoronto.ca

 

A genetic circuit amplifier is designed from elements of Lambda phage. Stochastic simulations are performed to analyze circuit performance. The dynamic response of the system to differing rise and decay times is traced back to the underlying reaction mechanisms. Protein degradation rate was observed to be critical to amplifier performance.
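To give a flavour of what a stochastic simulation of this kind involves, here is a minimal Gillespie-style sketch for a single protein that is produced at a constant rate and degraded in proportion to its copy number. It is not the authors' circuit model: the two-reaction system, the function name, and the rate values are all assumptions chosen for illustration.

```python
import random

def simulate_protein(k_make, k_decay, t_end, seed=0):
    """Gillespie simulation of production (rate k_make) and
    first-order degradation (rate k_decay * copy number)."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    trajectory = [(t, n)]
    while True:
        rate_make = k_make
        rate_decay = k_decay * n
        total = rate_make + rate_decay
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which reaction fires, weighted by its rate.
        if rng.random() * total < rate_make:
            n += 1
        else:
            n -= 1
        trajectory.append((t, n))
    return trajectory

# Hypothetical rates; a higher k_decay lowers the steady-state copy number.
trajectory = simulate_protein(k_make=10.0, k_decay=0.1, t_end=50.0)
```

Rerunning with different `k_decay` values is one simple way to see how strongly the degradation rate shapes the response.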

Long Abstract

 

 

100B. Essential Dynamics data for a deeper insight in motif analysis: the PAS protein case.

Pandini A., Meazzi V., Pavesi G., Mauri G. and Bonati L. Università degli Studi di Milano-Bicocca

alessandro.pandini@unimib.it

 

The PAS superfamily was chosen as a case study for analysing evolutionary relationships in structure and function. We investigated the PAS signaling module with MD/ED simulations and bioinformatics techniques and proposed a methodology to relate chemico-physical and evolutionary information. A Perl-based tool developed for this purpose is presented.

Long Abstract

 

 

101B. Modelling of Information Flow in Cells.

Marco Weismueller, Rainer Koenig, Gaelle Dubois and Roland Eils. Div. "Intelligent Bioinformatics Systems", German Cancer Research Center, 69120 Heidelberg, Germany.

r.eils@dkfz-heidelberg.de, m.weismueller@dkfz.de

 

Cells use signal transduction to process information. We model signalling behaviour in an object-oriented way in the Swarm simulation system. We use the signal interaction data from TRANSPATH® Professional to feed the Swarm system with local interaction data. Swarm then dynamically processes signal flow in the signal network and thereby derives emergent properties of the network.

Long Abstract

 

 

102B. Deskside Hundred Teraflop Computing for Multiple in silico Applications.

Ken Cameron, Ray McConnell and Brid O'Conaill. Clearspeed Technologies Ltd, 3110 Great Western Court, Huntsground Road, Bristol, BS34 8HP, UK.

ray@clearspeed.com

 

Current computing architectures are not providing the performance needed for in silico applications. We propose a computer architecture that enables hundred Teraflop computing at the desk-side. The architecture addresses both the massive compute requirements and the need for low power, so that systems can be made practical for widespread deployment.

Long Abstract

 

 

103B. Hydration Free Energies and Water-Cyclohexane Partition Coefficients of Amino Acid Side Chain Analogues: A Comparison of the OPLS/AA and Gromos 96 Force Fields.

Justin L. MacCallum and D. Peter Tieleman. University of Calgary.

jlmaccal@ucalgary.ca

 

The hydration free energies and water/cyclohexane partition coefficients of amino acid side chain analogues have been calculated with the OPLS/AA force field and compared to results with the Gromos 96 force field and experimental results.

Long Abstract

 

 

 

Data Visualization.

 

104B. Introduction to the EBI XML Standards.

105B. MGOS Database.

106B. SNPDB--An Integrated SNP Assay Design and Validation Database System.

107B. Analysis, Visualization, and Management of Mouse ENU Mutagenesis Phenotyping Data.

108B. GenTerpret: The Display, Interpretation and Annotation of DNA Sequences Integrating 3rd Party Algorithms.

109B. Variations of Human Genomic Retroelement Distributions Associated with Age and Proximity to Genes.

110B. Display of Near Optimal Sequence Alignments.

111B. Lisp-PVM: Parallel Virtual Lisp Machines for Bioinformatics.

112B. Gene Expression Emulation using a Bayesian Gene Regulation Network Model and Tools.

113B. Integrated Gene-Chip Image Analysis.

114B. No poster.

 

104B. Introduction to the EBI XML Standards.

Runte K., Lopez R., Lombard V., Stoehr P. and Apweiler R. European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

krunte@ebi.ac.uk

 

The European Bioinformatics Institute (EBI) presents a standard for development and deployment of XML file formats. An internal working group, comprised of members from most major database providers at the EBI, has created not only definitions of commonly used data types, but also a set of coding and naming standards.

Long Abstract

 

 

105B. MGOS Database.

Thomas Tidwell, James Hatfield, Vishal Pampanwar and Kiran Rao. AGCoL.

thomas@genome.clemson.edu

 

MGOS is a web-accessible database that correlates the genomic functionality of the host (Oryza sativa) and the pathogen (Magnaporthe grisea). It will provide a queryable FPC-based map display of markers, clones, tiling path, EST matches, transcripts, and proteins. MGOS will support the exchange of annotation information via DAS.

Long Abstract

 

 

106B. SNPDB--An Integrated SNP Assay Design and Validation Database System.

Xiaoqing You, Heinz Hemken, Annie Titus, Lily Xu, Joanna Curlee, Francisco De La Vega and Gene Spier. Applied Biosystems, 850 Lincoln Centre Drive, Foster City, CA 94404, USA.

youxn@appliedbiosystems.com

 

At Applied Biosystems, we developed the SNPDB database system to facilitate and support the design of over 150,000 ready-to-use probe-based 5’ nuclease SNP assays and the development of a linkage disequilibrium map. The system includes a centralized database, data acquisition tools and applications for data retrieval and analysis.

Long Abstract

 

 

107B. Analysis, Visualization, and Management of Mouse ENU Mutagenesis Phenotyping Data.

Ching KA, H. Lapp, C. Fletcher and MP Cooke. Genomics Institute of the Novartis Research Foundation.

ching@gnf.org

 

Presenting high-throughput screening data for human review requires visualization that highlights patterns and trends. MouseTRACS is a series of programs that provide visual browsing and data graphing to identify families of mutagenized mice harboring significant phenotypes. Mice are automatically flagged, scheduled for retesting, and confirmed by the analysis programs.

Long Abstract

 

 

108B. GenTerpret: The Display, Interpretation and Annotation of DNA Sequences Integrating 3rd Party Algorithms.

Gordon B. Hutchinson. RabbitHutch Biotechnology Corporation.

hutch@rabbithutch.com

 

GenTerpret is a multi-platform, Java-based gene annotation tool that integrates the output of web and command line programs into a graphical user interface and provides a means to rapidly annotate DNA sequence. Bioinformatics developers can create their own interface using an open-architecture parse file format. GenTerpret is available at http://www.rabbithutch.com.

Long Abstract

 

 

109B. Variations of Human Genomic Retroelement Distributions Associated with Age and Proximity to Genes.

Louie N. van de Lagemaat1, Patrik Medstrand2 and Dixie L. Mager1. 1Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, B. C. and 2Dept. of Cell and Molecular Biology, Section for Developmental Biology, Lund University, Lund, Sweden.

lvandela@terryfox.ubc.ca

 

Transposable element distributions vary by the age of the element and position in the genome. By partitioning the human genome in various ways, we can study the interactions between the parasitic and host genomes. We find potential evidence of selection for and against different retroelements near genes.

Long Abstract

 

 

110B. Display of Near Optimal Sequence Alignments.

M.E. Smoot1, W.R. Pearson2 and S.A. Guerlain1. University of Virginia, 1Department of Systems and Information Engineering and 2Department of Biochemistry.

mes5k@virginia.edu

 

We have developed a web-based Java/Perl/C++ software system to create and display near optimal alignments of protein or DNA sequences. The tool displays alignments sequentially and is designed to help investigators identify those parts of the alignments that are relatively invariant among a set of solutions.

Long Abstract

 

 

111B. Lisp-PVM: Parallel Virtual Lisp Machines for Bioinformatics.

Daniel McShan and Imran Shah. University of Colorado Health Science Center, 4200 E 9th Ave, C-245, Denver, CO 80120, USA.

Daniel.McShan@uchsc.edu

 

The speed, power, and flexibility of Common Lisp are combined with the Parallel Virtual Machine (PVM) library to create Lisp-PVM -- a robust distributed environment for solving computationally intensive bioinformatics problems. We used Lisp-PVM for large-scale genomic, proteomic and metabolic computations with nearly linear increase in performance.

Long Abstract

 

 

112B. Gene Expression Emulation using a Bayesian Gene Regulation Network Model and Tools.

Murali Rangan and Arie Avnur. Gene Networks Inc., 560 S. Winchester Blvd., San Jose, California, 95128, USA.

Arie@gene-networks.com

 

A new gene regulation model enables the simulation of gene expression treatment. Affected genes are annotated to specify their expression level modification. The simulation software tool can read this formal annotation and the model’s gene regulation representation to calculate new gene expression levels, simulating the treatment.

Long Abstract

 

 

113B. Integrated Gene-Chip Image Analysis.

David Kil, Murali Rangan and Arie Avnur. Gene Networks, Inc., 560 S. Winchester Blvd., San Jose, California, 95128, USA.

res0n98g@verizon.net

 

Gene-chip image analysis is generally regarded as a laborious, tedious, yet extremely important first step in understanding the intricate roles of genes. In this poster, we explore how the salient concepts in multiple disciplines can be integrated to create an efficient, yet powerful tool in gene-chip image analysis. The advantages of our approach are high-throughput processing and high-quality interpretation of gene-chip data.

Long Abstract

 

 

114B. No poster.

 

 

Phylogeny and Evolution.

 

115B. Taxonomy Workbench.

116B. Phylogenetic analysis of the mosaic Sinorhizobium meliloti pSymB replicon.

117B. Analysis of the Global Structure and Evolution of Metabolic Networks Reconstructed from Genomic Information for Various Organisms.

118B. New Likelihood Models for Phylogenetic Analysis of Protein Sequences.

119B. Large Scale Phylogenetic Analysis of Arabidopsis Duplications.

120B. Structural EM for Phylogeny When the Rate Varies Among Sites.

121B. Analysis of Grass Gene Families.

122B. PHYLLAB: MatLab Toolbox for Sequence Manipulation and Phylogenetic Analysis.

123B. Going, Going….Gone – Intron Streamlining in Complex Eukaryotic Genomes.

 

115B. Taxonomy Workbench.

Michael Wildpaner, Georg Schneider, Alexander Schleiffer and Frank Eisenhaber. Research Institute of Molecular Pathology (IMP).

mike@imp.univie.ac.at

 

The Taxonomy Workbench is a visual editor for sequence sets based on taxonomic classification. It can be used to navigate the organism space or selections defined by custom sequence sets. Sets can be modified with standard set operations, printed as taxonomic maps with sequence assignment statistics and downloaded for further processing. A public version of the Taxonomy Workbench can be accessed at http://mendel.imp.univie.ac.at/taxonomy/.

Long Abstract

 

 

116B. Phylogenetic analysis of the mosaic Sinorhizobium meliloti pSymB replicon.

Kim Wong and Brian Golding. McMaster University, Biology Department, 1280 Main St. West, Life Sciences 522, Hamilton, Ontario L8S 4K1, Canada.

kim@life.biology.mcmaster.ca

 

The mosaic nature of microbial genomes makes it difficult to decipher the evolutionary histories of these genomes. Nearest neighbor analysis, using traditional and predictive methods, reveals regions of gene order conservation and rearrangement, and genes likely involved in horizontal transfer. These methods were applied to the Sinorhizobium meliloti pSymB replicon and it was estimated that 13% of the genes were involved in horizontal transfer.

Long Abstract

 

 

117B. Analysis of the Global Structure and Evolution of Metabolic Networks Reconstructed from Genomic Information for Various Organisms.

Hongwu Ma and An-Ping Zeng. Microbial Systems and Genome Analysis, GBF - German Research Center for Biotechnology.

AZE@GBF.de

 

The metabolic networks of 81 fully sequenced organisms were reconstructed in silico and represented as directed graphs for structural and evolutionary analysis. By considering physiologically meaningful pathway connections, clear differences in the network structure of the three domains of organisms can be inferred that are consistent with evolutionary relationships.

Long Abstract

 

 

118B. New Likelihood Models for Phylogenetic Analysis of Protein Sequences.

M. W. Dimmic1, J. S. Rest2, O. Soyer4, S. E. Ingalls4, D. P. Mindell2,3 and R. A. Goldstein1,4. 1Biophysics Research Division, 2Ecology and Evolutionary Biology, 3Museum of Zoology and 4Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055, USA.

mdimmic@umich.edu

 

We present novel phylogenetic models which are mechanistic and which provide for heterogeneous selective pressures on protein evolution. When applied to datasets of G-protein coupled receptors (GPCRs), the optimized models are able to highlight general structural features of the protein as well as possible adaptive changes in protein evolution.

Long Abstract

 

 

119B. Large Scale Phylogenetic Analysis of Arabidopsis Duplications.

Brad Chapman, John Bowers and Andrew H. Paterson. Plant Genome Mapping Laboratory, University of Georgia.

chapmanb@arches.uga.edu

 

Whole genome duplications appear to have played an important role in the evolution of Arabidopsis. We have analyzed two apparently separate sets of duplications and dated them within the evolution of dicot plants, using a suite of phylogeny-based scripts written in the Python programming language.

Long Abstract

 

 

120B. Structural EM for Phylogeny When the Rate Varies Among Sites.

Tal Pupko1, Matan Ninio2, Itsik Pe'er3 and Nir Friedman2. 1The Institute of Statistical Mathematics, Tokyo, Japan, 2Hebrew University, Jerusalem, Israel and 3Weizmann Institute of Science, Rehovot, Israel.

matan@ismb2002.conf.ninio.org

 

The "among-site rate variation" model of evolution is known to be superior to standard ML models. Nevertheless, no effective algorithm exists for reconstructing phylogenies from protein sequences with this model. We introduce such a procedure and investigate the effect of the more refined model on several examples.

Long Abstract

 

 

121B. Analysis of Grass Gene Families.

Christine G. Elsik and William R. Pearson. University of Virginia.

cge4n@virginia.edu

 

The assembly of gene families from divergent species (>50 Mya) using ESTs is difficult, because of low sequence similarity at the nucleotide level. We assembled gene families from ESTs of grass species (barley, maize, rice, sorghum, wheat), using the Arabidopsis proteome as a framework, in preparation for phylogenetic analysis.

Long Abstract

 

 

122B. PHYLLAB: MatLab Toolbox for Sequence Manipulation and Phylogenetic Analysis.

Pavel Morozov1 and Andrey Rzhetsky1,2. 1Columbia Genome Center, Columbia University, New York, USA and 2Department of Medical Informatics, Columbia University, New York, USA.

pm259@columbia.edu

 

PHYLLAB provides tools for the MatLab (MathWorks, Inc.) environment to perform a variety of operations on sequence data, reconstruct phylogeny, estimate evolutionary model parameters, and manipulate phylogenetic trees.

Long Abstract

 

 

123B. Going, Going….Gone – Intron Streamlining in Complex Eukaryotic Genomes.

Richard McCaman1, Tae-Wan Ryu1, Wan-Ying Chang1, Douglas Eernisse1, Bradley Akitake2, Anna Bruett2, Ganesh Devendra2, Ambereen Kurwa2, Ana Lizama-Price2, Grace Tancaktiong2 and Nicholas Schisler2. 1Biology Department, California State University at Fullerton, Fullerton, CA, U.S.A. and 2Biology Department, Pomona College, CA, U.S.A.

NJS04747@pomona.edu

 

Intron “streamlining,” defined as a reduction in intron number or length to achieve a minimal intron sequence burden, was analyzed using comparative genomics in eukaryotes. There appears to be a phylogenetically correlated reduction in intron number or length among fungi and a reduction in both intron parameters among insects.

Long Abstract

 

 

 

Data Mining.

 

124B. Analyzing Brain and Breast Cancer SAGE Libraries.

125B. Homophila: A Database of Human Disease Gene Cognates in Drosophila.

126B. BioMiner: An Integrated Framework for Data Mining in Functional Genomics.

127B. Motif Informatics: Integration of Sequence Information into Gene Expression Data Mining.

128B. Integration of Exon Predictions Using Multilayer Perceptron and Mixture of Experts Neural Networks.

129B. An Evaluation of Dimensionality Reduction Methods for Bio-Medical Spectra.

130B. Building a Database of Medium Resolution Electron Density Properties of Chemical Functions Applicable to all Biopharmacological Domains.

131B. Identification of ORFs from Organelle Genomes: A Data Mining Approach.

132B. Automatically Extracting Keyphrases for Clusters of Genes.

133B. A Novel Bayesian Clustering Approach for Predicted Regulatory Binding Sites.

134B. Software Development for High-Throughput DNA Sequencing.

135B. Ubiquitin and Ubiquitin-Like Pathway Proteins.

136B. Galaxy: a System for Flexible Data Tracking and High Throughput Analysis Pipeline.

137B. Search for Structure in Long Introns.

138B. Correlation between Intron Length and Its Base Composition.

139B. Method for the Best Model Selection of Paired Motifs in Promoter Regions of Genes.

140B. Intragenomic Reiterations Detection Using Hidden Markov Models.

141B. SVM-Decide: Towards a Decision Support System for Molecular Genetic Data.

142B. Easy Click: An SQL-Generating Application to Assist Researchers.

143B. An Adaptive Meta-Clustering Approach for Bioinformatics Applications.

144B. Discriminant Analysis of Multi-center Microarray Data.

145B. GeneBeans: a Bioinformatics Workflow and Data Management System.

146B. BAG: A Graph Theoretic Sequence Clustering Algorithm.

147B. Metabolic Cartography.

148B. Mining Three-Dimensional Chemical Structure Data.

149B. FAST: Functional Annotation of Sequence through Text.

150B. Assessing the Reliability of Self Organizing Maps.

151B. Finding Biological Themes in Microarray-derived Gene Lists with EASE: the Expression Analysis Systematic Explorer.

152B. Improving Literature-Based Discovery Support by Background Knowledge Inclusion for better Disease Candidate Gene Identification.

153B. Detecting Salient Changes in Gene Profiles.

154B. DiscoveryLink: IBM’s Data Integration Solution for the Life Sciences.

155B. Relationships between Alternative Splicing and Protein Structure.

156B. Understanding Scrambled Genes in Ciliates - Reverse Engineering a Biological Computer.

157B(i). BNS: A DNS-Inspired Biomolecule Naming Service.

157B(ii). OLAPOP-Online Analytical Processing of Proteins.

 

124B. Analyzing Brain and Breast Cancer SAGE Libraries.

Byron Kuo, Timothy Chan and Raymond Ng. Department of Computer Science, University of British Columbia.

bkuo@cs.ubc.ca

 

This study attempts to characterize similarities between seemingly different cancers (breast and brain) at the sub-cellular level. Based on publicly available SAGE libraries of cancerous and normal breast and brain tissues, we obtained a list of candidate cancer-related genes by applying the two-sample t-test and then analyzed their similarities at the gene expression level.
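A two-sample t statistic for one gene could be computed along the lines of the sketch below. The abstract does not state which variant of the test was used, so the unequal-variance (Welch) form is assumed here; the function name and the tag counts are hypothetical, and no p-value or multiple-testing correction is shown.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical normalized tag counts for one gene across four libraries each.
tumor = [12.0, 15.0, 14.0, 16.0]
normal = [5.0, 7.0, 6.0, 6.0]
t_statistic = welch_t(tumor, normal)
```

Genes with a large absolute statistic in both tissue comparisons would be natural candidates for the cross-cancer similarity analysis.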

Long Abstract

 

 

125B. Homophila: A Database of Human Disease Gene Cognates in Drosophila.

Samson Chien, Lawrence T. Reiter, Ethan Bier and Michael Gribskov. University of California, San Diego / San Diego Supercomputer Center.

schien@sdsc.edu

 

Homophila is a database of human disease genes associated with their counterparts in Drosophila. Homophila provides a comprehensive linkage between OMIM and FlyBase in order to stimulate functional genomic studies in Drosophila that address questions concerning human genetic diseases. Homophila is available at http://homophila.sdsc.edu.

Long Abstract

 

 

126B. BioMiner: An Integrated Framework for Data Mining in Functional Genomics.

Fazel Famili1, Roy Walker2, Alan Barton1, Qing-Yan Liu2, Ziying Liu1, Julio Valdes1, Youlian Pan1, Brandon Smith2, Junjun Ouyang1, Melanie Lehman2, Lynn Wei1 and Weiling Xu. 1Institute for Information Technology, 2Institute for Biological Sciences, National Research Council of Canada, Montreal Road, Ottawa, Ontario, K1A 0R6, Canada.

fazel.famili@nrc.ca

 

This poster explains the role of integrated data mining systems in functional genomics. We will describe all stages of data preprocessing, the type of data and some of the useful knowledge that may be discovered from functional genomics data. The BioMiner architecture and its main functionalities along with some advantages of integrated architectures are explained.

Long Abstract

 

 

127B. Motif Informatics: Integration of Sequence Information into Gene Expression Data Mining.

Youlian Pan1, Roy Walker2, A. (Fazel) Famili1 and Qing-Yan Liu2. 1Institute for Information Technology and 2Institute for Biological Sciences, National Research Council Canada, 1200 Montreal Road, Ottawa, Ontario, K1A 0R6, Canada.

youlian.pan@nrc.ca

 

This paper rationalises the necessity of integrating information, such as patterns of transcription factor binding sites and transcription factors themselves, into the gene expression data mining processes. The paper also demonstrates the advantage of incorporating symbolic sequence data with numerical gene expression data analysis using an application in the BioMine project.

Long Abstract

 

 

128B. Integration of Exon Predictions Using Multilayer Perceptron and Mixture of Experts Neural Networks.

Youlian Pan1,2,3, Christoph W. Sensen4, Malcolm Heywood2 and Michael A. Shepherd2. 1Faculty of Computer Science, Dalhousie University, 6050 University Avenue Halifax, NS, Canada B3H 1W5; 2Canadian Bioinformatics Resources, NRC, 1411 Oxford St. Halifax, NS, Canada B3H 3Z1; 3Institute for Information Technology, NRC, 1200 Montreal Rd, Bldg. M-50, Ottawa, Ont. Canada K1A 0R6 and 4Department of Biochemistry and Molecular Biology, University of Calgary, 3330 Hospital Drive N.W., HSC 1150, Calgary, Alberta, Canada, T2N 4N1.

youlian.pan@nrc.ca

 

This paper investigates the potential of improving exon predictions by integrating GrailExp, GenScan, and MZEF using Multilayer Perceptron and Mixture of Experts neural networks. For human exon prediction, this integration system has significantly better recovery, by 25%, than any individual prediction engine alone. This system is available at http://www.cbr.nrc.ca/pany/integ.html.

Long Abstract

 

 

129B. An Evaluation of Dimensionality Reduction Methods for Bio-Medical Spectra.

Christopher Bowman and Richard Baumgartner. Institute for Biodiagnostics, National Research Council Canada, Winnipeg, Manitoba, Canada.

Christopher.Bowman@nrc.ca

 

We compare linear and nonlinear techniques for identifying the intrinsic dimensionality of a data set, including local and global principal component analysis and a novel implementation of the Whitney reduction network. The performance of these techniques is evaluated using independent training and validation sets drawn from magnetic resonance and mass spectroscopy.

Long Abstract

 

 

130B. Building a Database of Medium Resolution Electron Density Properties of Chemical Functions Applicable to all Biopharmacological Domains.

John Binamé1, Laurence Leherte1, Janice I. Glasgow2, Suzanne Fortier3 and Daniel P. Vercauteren2. 1Laboratoire de Physico-Chimie Informatique, Facultés Universitaires Notre-Dame de la Paix, Namur, Belgium, 2School of Computing, Queen's University, Kingston, ON, Canada, 3Department of Chemistry, Queen's University, Kingston, ON, Canada.

daniel.vercauteren@fundp.ac.be

 

This work concerns the building of a database of topological properties of electron density functions of organic molecules at medium resolution, in order to develop an automated way to reduce molecules to a few relevant points. These points are further used in similarity search and pharmacophore proposition procedures applicable to all pharmacological domains.

Long Abstract

 

 

131B. Identification of ORFs from Organelle Genomes: A Data Mining Approach.

Sivakumar Kannan, Genevieve Boucher and Gertraud Burger. Canadian Institute for Advanced Research, Program in Evolutionary Biology, Département de Biochimie, Université de Montréal, Montréal, Québec H3C 3J7, Canada.

siva@bch.umontreal.ca

 

Genomes of mitochondria and chloroplasts from diverse organisms carry on average 5 to 20 ORFs without assigned functions. In order to understand the biological role of these ORFs, we have developed a comprehensive analysis procedure using data mining methods. The approach and the predicted data will be presented.

Long Abstract

 

 

132B. Automatically Extracting Keyphrases for Clusters of Genes.

Rich Maclin1 and Mark Craven2. 1Computer Science Department, University of Minnesota, Duluth and 2Biostatistics and Medical Informatics Department, University of Wisconsin, Madison.

rmaclin@d.umn.edu

 

We present a tool for annotating high-throughput experiments by automatically extracting keyphrases to characterize clusters of genes. Our method autonomously associates genes with PubMed abstracts, extracts keyphrases that are statistically associated with gene clusters, and attempts to organize both genes and keyphrases into informative subclusters.

Long Abstract

 

 

133B. A Novel Bayesian Clustering Approach for Predicted Regulatory Binding Sites.

Zhaohui S. Qin1, Lee Ann McCue2, William Thompson2, Linda Mayerhofer2, Charles E. Lawrence2,3, and Jun S. Liu1. 1Department of Statistics, Harvard University, Cambridge, MA 02138, 2The Wadsworth Center, New York State Department of Health, Albany, NY 12201, 3Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180.

qin@stat.harvard.edu

The availability of complete genome sequences has made possible the computational identification of hundreds of binding sites via cross-species comparisons. We describe a novel Bayesian motif clustering algorithm that predicts the number of clusters among these sites, and identifies the sites belonging to each cluster.

Long Abstract

 

 

134B. Software Development for High-Throughput DNA Sequencing.

Yaron Butterfield, Ran Guin, Ursula Skalska, Duane Smailus, Angelique Schnerch, Kevin Teague, Jacquie Schein, Marco Marra, Steven Jones and the Genome Sciences Centre (http://www.bcgsc.bc.ca), British Columbia Cancer Research Centre, Vancouver, BC, Canada, V5Z 4E6.

ybutterf@bcgsc.bc.ca

 

We have established a bioinformatics pipeline to handle the large amount of DNA sequence data generated in our laboratory. We have created a laboratory information system in which data is stored in a central relational database; in conjunction with Perl software, it allows for efficient, high-throughput sequencing and processing.

Long Abstract

 

 

135B. Ubiquitin and Ubiquitin-Like Pathway Proteins.

Yanmei Lu, Nan Lin, Betty Huang, Donald G. Payan and Kunbin Qu. Rigel Pharmaceuticals, Inc., 240 E. Grand Ave, South San Francisco, CA 94080.

ylu@rigel.com

 

The ubiquitin pathway is involved in many important cellular processes. The NR database was mined for ubiquitin pathway related proteins using Gibbs Sampling and HMM. We identified around 900 proteins in the ubiquitin reaction cascade. Our results demonstrate diverse protein domain structure compositions and functions in ubiquitin-domain containing proteins and ubiquitin ligases.

Long Abstract

 

 

136B. Galaxy: a System for Flexible Data Tracking and High Throughput Analysis Pipeline.

Nan Lin, Davidson Wan, Jiao He, Yanmei Lu, Ying Huang, Donald G. Payan and Kunbin Qu. Rigel, Inc.

nlin@rigel.com

 

Galaxy is an enterprise system built at Rigel to construct and organize flexible “high throughput informatics pipelines” for tracking, analyzing and integrating data from diverse public sources, overlaid with internal experimental data. It integrates functional platforms used by biologists with programming applications that are continuously updated.

Long Abstract

 

 

137B. Search for Structure in Long Introns.

Hideo Bannai1, Satoru Miyano1, Kenta Nakai1, Sascha Ott1 and Yoshinori Tamada2. 1University of Tokyo and 2Tokai University.

tamada@ims.u-tokyo.ac.jp

 

We use a program predicting short introns to analyse the processing of long introns (some thousand bases or longer). The focus is whether long introns contain a structure of short introns, such that the long introns can be processed by a series of splicing reactions rather than by one single reaction.

Long Abstract

 

 

138B. Correlation between Intron Length and Its Base Composition.

Hideo Bannai, Yoshinori Tamada, Sascha Ott, Kim Sunyong, Kenta Nakai and Satoru Miyano. Human Genome Center, Institute of Medical Science, University of Tokyo, 4-8-1 Minato-ku, Tokyo, 108-8639 Japan.

bannai@ims.u-tokyo.ac.jp

 

We analyzed the base compositions of introns available from Ensembl for H. sapiens, M. musculus, D. melanogaster and D. rerio, and have discovered a notable correlation between intron length and its base composition for H. sapiens and M. musculus. The tendency was not observed in D. melanogaster and D. rerio.

Long Abstract

 

 

139B. Method for the Best Model Selection of Paired Motifs in Promoter Regions of Genes.

Daisuke Shinozaki and Osamu Maruyama. Faculty of Mathematics, Kyushu University, Japan.

om@math.kyushu-u.ac.jp

 

We propose a method for the best model selection of paired motifs in promoter regions of a given set of genes. We apply our method to yeast data, such as sets of co-regulated genes, and report the experimental results.

Long Abstract

 

 

140B. Intragenomic Reiterations Detection Using Hidden Markov Models.

Sébastien Hergalant, Bertrand Aigle, Bernard Decaris, Pierre Leblond and Jean-François Mari. LORIA (équipe Orpailleur, BP 239, 54506 Vandoeuvre-lès-Nancy, France) and Laboratoire de Génétique et Microbiologie (UMR UHP-INRA 1128, IFR 110, 54506 Vandoeuvre-lès-Nancy, France).

hergalan@loria.fr

 

We present a genomic data mining method in which the user describes a signal computed by a second-order HMM. This signal, which represents the probability of classifying a nucleotide residue or a group of residues in a particular state, allows the localization of repetitions in a complete bacterial genomic sequence.

Long Abstract

 

 

141B. SVM-Decide: Towards a Decision Support System for Molecular Genetic Data.

Jasmin Müller, Falk Schubert and Roland Eils. Intelligent Bioinformatics Systems, German Cancer Research Center.

j.mueller@dkfz-heidelberg.de

 

SVM-Decide is an approach to adapt knowledge from gene expression and genomic profiles for clinical decision support systems. To this end, we combine a support vector machine classifier with an explanation component for the physician. Furthermore, an explicit competence model enables our system to classify only cases within its competence area.

Long Abstract

 

 

142B. Easy Click: An SQL-Generating Application to Assist Researchers.

Eric J. Grant and Dale L. Preston. Radiation Effects Research Foundation.

egrant@rerf.or.jp

 

Retrieving data from a relational database can be complex. ‘Easy Click’, a desktop PC application, dynamically generates SQL statements via a point-and-click interface shielding researchers from writing SQL, guaranteeing easy and consistent access to research data. An initialization file supplies Easy Click with variable definitions giving a completely customizable application.
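
The idea of generating SQL from point-and-click selections can be illustrated with a short sketch. It is not Easy Click's implementation: the function name `build_query`, the `definitions` dictionary (standing in for the initialization file) and the `?` placeholder style are all assumptions made for this example.

```python
def build_query(table, columns, filters, definitions):
    """Build a parameterized SELECT from user selections.

    `definitions` maps each table to the set of variable names a user may
    pick; it is a hypothetical stand-in for an initialization file.
    """
    allowed = definitions[table]
    for name in list(columns) + [f[0] for f in filters]:
        if name not in allowed:
            raise ValueError(f"unknown variable: {name}")

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for name, op, value in filters:
            if op not in {"=", "<", ">", "<=", ">=", "<>"}:
                raise ValueError(f"unsupported operator: {op}")
            clauses.append(f"{name} {op} ?")  # value is bound, not inlined
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Hypothetical variable definitions and one "clicked" selection.
definitions = {"subjects": {"id", "dose", "age"}}
sql, params = build_query("subjects", ["id", "dose"], [("age", ">=", 40)], definitions)
```

Restricting names to those listed in the definitions keeps every generated statement consistent with the configured variables, and binding values as parameters avoids quoting problems.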

Long Abstract

 

 

143B. An Adaptive Meta-Clustering Approach for Bioinformatics Applications.

Y. Zeng, J. Garcia-Frias, J. Tang and G. Gao. Department of Electrical and Computer Engineering, University of Delaware.

zeng@eecis.udel.edu

 

Because of the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. This poster proposes a meta-clustering approach, which can extract the information from results of different clustering techniques adaptively and provides a better interpretation of the data patterns.

Long Abstract

 

 

144B. Discriminant Analysis of Multi-center Microarray Data.

Taesung Park1, Sung-Gon Yi1, Hosik Choi1, Seung-Yeoun Lee2, Kee-Ho Lee3, Jung Kyoon Choi4, Sangsoo Kim4, Yeom Young Il4, Choi Jong Young5 and Daeghon Kim6. 1Department of Statistics, Seoul National University, Seoul, Korea, 2Department of Applied Mathematics, Sejong University, Seoul, Korea, 3Laboratory of Molecular Oncology, Korea Cancer Center Hospital, 4Korea Research Institute of Bioscience and Biotechnology, Taejon, Korea, 5The Catholic University of Korea, Seoul, Korea and 6Chonbuk National University, Jeonju, Chonbuk, Korea.

pinebud2@snu.ac.kr

 

For the case in which the same type of microarrays is collected from different clinical centers, we propose new discrimination methods that account for the variability caused by the different centers. The proposed methods are illustrated using microarray data for liver cancer patients from three different clinical centers in Korea.

Long Abstract

 

 

145B. GeneBeans: a Bioinformatics Workflow and Data Management System.

Jeffrey L. Brown, Thomas C. Hudson and Kenisha V. Johnson. University of North Carolina at Wilmington.

hudsont@uncwil.edu

 

GeneBeans uses Enterprise Java Beans to provide biologists a graphical dataflow interface for constructing queries and analyses of gene index databases without the use of a textual query language. The tool is intended to make bioinformatics more generally practicable.

Long Abstract

 

 

146B. BAG: A Graph Theoretic Sequence Clustering Algorithm.

Sun Kim. School of Informatics Center for Genomics and Bioinformatics Indiana University, Bloomington.

sunkim@bio.informatics.indiana.edu

 

As more sequences become available at an exponential rate, sequence analysis on a large number of sequences becomes increasingly important. Sequence clustering algorithms are computational tools for that purpose. In this paper, we present our clustering algorithm BAG, which uses two graph properties: biconnected components and articulation points.
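
As an illustration of one of the two graph properties named above, the following sketch finds articulation points in an undirected similarity graph with a depth-first search. It is not the BAG algorithm itself; the edge list and sequence labels are hypothetical, and how BAG builds its graph or uses the result is not stated in the abstract.

```python
from collections import defaultdict


def articulation_points(edges):
    """Return the set of articulation points of an undirected graph."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    disc, low = {}, {}   # discovery time and low-link value per node
    points = set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        disc[node] = low[node] = counter
        counter += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: node can reach an earlier-discovered vertex.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # A non-root node cuts the graph if a child subtree
                # cannot reach above it.
                if parent is not None and low[nxt] >= disc[node]:
                    points.add(node)
        # The DFS root is a cut vertex only with two or more DFS children.
        if parent is None and children > 1:
            points.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return points


# Two hypothetical sequence families that share one multi-domain sequence.
edges = [("seqA", "seqB"), ("seqB", "seqC"), ("seqA", "seqC"),
         ("seqC", "seqD"), ("seqD", "seqE"), ("seqC", "seqE")]
cut_vertices = articulation_points(edges)
```

In this toy graph the shared vertex `seqC` is the only articulation point; removing it separates the two triangles, each of which is a biconnected component.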

Long Abstract

 

 

147B. Metabolic Cartography.

Daniel McShan, Shilpa Rao and Imran Shah. University of Colorado, Health Science Center, 4200 E 9th Ave, C-245, Denver, CO 80120, USA.

 

Daniel.McShan@uchsc.edu

 

We present a novel metabolic cartography approach for representing the metabolic space based on the biochemical properties of molecules. Summaries and visualizations of this space are presented, offering a quantitative and qualitative overview of the metabolome. We are using metabolic cartography in our research on pathway inference methods.

Long Abstract

 

 

148B. Mining Three-Dimensional Chemical Structure Data.

Sean McIlwain1, Arno F. Spatola2, David Vogel2, Sylvie Blondelle3 and David Page4.

1Department of Computer Sciences and Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI 53706, U.S.A., 2Institute for Molecular Diversity and Drug Design, Department of Chemistry, University of Louisville, Louisville, KY 40292, U.S.A., 3Torrey Pines Research Institute for Molecular Studies, La Jolla, CA 92037, U.S.A. and 4Department of Biostatistics and Medical Informatics and Department of Computer Sciences, University of Wisconsin, WI 53706, U.S.A.

mcilwain@hotmail.com, spatola@louisville.edu, spatola@louisville.edu, sblondelle@tpims.org and page@biostat.wisc.edu

 

We apply inductive logic programming to the task of predicting anti-microbial activity, specifically, the ability of certain molecules to inhibit growth of Pseudomonas aeruginosa. This is done by taking into account the three-dimensional structure and biological activities from a database of tested molecules.

Long Abstract

 

 

149B. FAST: Functional Annotation of Sequence through Text.

Michael Elkaim and Chris Ponting. MRC Functional Genetics Unit, University of Oxford, Department of Human Anatomy and Genetics, South Parks Road, Oxford, OX1 3QX, United Kingdom.

michael.elkaim@anat.ox.ac.uk

 

The manual annotation of protein domains is an arduous task. We present FAST, a program that automatically annotates protein domains by extracting functional information, including key word stems and key quotes, from literature that is relevant to protein sequences containing these domains.

Long Abstract

 

 

150B. Assessing the Reliability of Self Organizing Maps.

Fajar Restuhadi1, Andrew Hayes2, Simon J. Hubbard1 and Stephen G. Oliver2. 1Dept. Biomolecular Sciences, UMIST, PO BOX 88, Manchester M60 1QD and 2School of Biological Sciences, Univ. of Manchester, Manchester M13 9PT.

adi@bms.umist.ac.uk

 

Self Organizing Map (SOM) approaches were used to analyse our unique transcriptome data from high-throughput northern hybridisations. For a constant neighbourhood size and a finite data set, the objective function associated with the SOM algorithm is the intra-class sum of squares (SSIntra) extended to neighbouring classes. At the end of its convergence, the SOM algorithm thus exactly minimizes the SSIntra function. We applied the bootstrap method to estimate the variability of SSIntra: if the SOM is computed several times according to the bootstrap principle, we can calculate the mean and standard deviation of SSIntra. This variability can then be used to assess the stability of the quantization error in the SOM.
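
The bootstrap procedure described above can be sketched on toy data. This is a minimal illustration, not the authors' code: it assumes a tiny one-dimensional SOM on scalar values, a constant neighbourhood radius of 1, and hypothetical function names and parameters throughout.

```python
import random
import statistics


def train_som(data, n_units=3, epochs=20, rate=0.1, radius=1):
    """Train a tiny 1-D SOM on scalar data with a constant neighbourhood."""
    lo, hi = min(data), max(data)
    units = [lo + (hi - lo) * (k + 0.5) / n_units for k in range(n_units)]
    for _ in range(epochs):
        for x in data:
            winner = min(range(n_units), key=lambda k: (x - units[k]) ** 2)
            # Move the winner and its neighbours towards the observation.
            for k in range(max(0, winner - radius), min(n_units, winner + radius + 1)):
                units[k] += rate * (x - units[k])
    return units


def ss_intra(data, units, radius=1):
    """Sum of squared distances to the winning unit and its neighbours."""
    total = 0.0
    for x in data:
        winner = min(range(len(units)), key=lambda k: (x - units[k]) ** 2)
        for k in range(max(0, winner - radius), min(len(units), winner + radius + 1)):
            total += (x - units[k]) ** 2
    return total


def bootstrap_ss_intra(data, n_boot=50, seed=0):
    """Mean and standard deviation of SSIntra over bootstrap resamples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        scores.append(ss_intra(sample, train_som(sample)))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical expression values forming three loose groups.
data = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9, 10.0, 10.2, 9.8]
mean_ss, sd_ss = bootstrap_ss_intra(data, n_boot=20, seed=1)
```

A small standard deviation relative to the mean would suggest that the map's distortion is stable under resampling; real expression profiles are vectors rather than scalars, so the distance computation would need to be generalised.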

Long Abstract

 

 

151B. Finding Biological Themes in Microarray-derived Gene Lists with EASE: the Expression Analysis Systematic Explorer.

Douglas A. Hosack and Richard A. Lempicki. Laboratory of Immunopathogenesis and Bioinformatics, SAIC Frederick.

Doug!@nih.gov

 

EASE is a software package that finds biological themes over-represented in any list of genes derived from microarray experiments or other high-throughput screening methods. It enables researchers utilizing these technologies to quickly find the interesting biological stories in their results.
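
The abstract does not say which statistic EASE uses to decide that a theme is over-represented. One common choice, assumed here purely for illustration, is the upper tail of the hypergeometric distribution; the function name and the counts below are hypothetical.

```python
from math import comb


def overrepresentation_p(list_hits, list_size, pop_hits, pop_size):
    """P(X >= list_hits) when list_size genes are drawn without replacement
    from pop_size genes, of which pop_hits carry the annotation."""
    upper = min(list_size, pop_hits)
    total = comb(pop_size, list_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 10 genes on the array carry a category,
# and all 3 genes in the list of interest belong to it.
p_value = overrepresentation_p(list_hits=3, list_size=3, pop_hits=3, pop_size=10)
```

When many categories are tested against the same gene list, the resulting p-values would normally need a multiple-testing correction.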

Long Abstract

 

 

152B. Improving Literature-Based Discovery Support by Background Knowledge Inclusion for Better Disease Candidate Gene Identification.

Dimitar Hristovski1 and Borut Peterlin2. 1National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894 USA and 2Department of Human Genetics, Clinical Center Ljubljana Zaloska, 1000 Ljubljana, Slovenia.

dimitar.hristovski@mf.uni-lj.si and borut.peterlin@guest.arnes.si

 

We describe an interactive literature based discovery support system extended with background knowledge about disease/gene chromosomal or expression location. The goal of the system is to discover new, potentially meaningful relations between biomedical concepts (e.g. a gene candidate for a disease). The system is available at http://www.mf.uni-lj.si/bitola/.

Long Abstract

 

 

153B. Detecting Salient Changes in Gene Profiles.

Tanveer Syeda-Mahmood. IBM Almaden Research Center, 650 Harry Road, San Jose CA 95120.

stf@almaden.ibm.com

 

The functional state of an organism is determined largely by the pattern of expression of its genes. Salient changes in variation in expression of genes can give clues about important events, such as the onset of a disease. In this paper, we address, for the first time, the problem of salient change detection in gene profiles. In particular, we look for salient inflection points in the time-series of the gene profiles. Such inflection points are places where there is a significant change in the curvature of the gene profile, as revealed by a zero-crossing of the second derivative that is preserved over multiple scales of smoothing. By automatically selecting an optimal scale of description, we derive salient change points in genomic signals. The utility of salient change detection is demonstrated in the automatic identification of regulatory phase for genes active in the mitotic cell cycle of budding yeast.

Long Abstract

 

 

154B. DiscoveryLink: IBM’s Data Integration Solution for the Life Sciences.

Barbara Eckman, Laura Haas, Prasad Kodali, Eileen Lin, Julia Rice and Peter Schwarz. IBM Life Sciences.

baeckman@us.ibm.com

 

Integration of a widely diverse set of databases and applications is needed to carry out post-genomic bioinformatics research. We describe DiscoveryLink, IBM’s federated database offering, and illustrate how it is being used to provide integrated access to life sciences data, irrespective of where it is stored and its format.

Long Abstract

 

 

155B. Relationships between Alternative Splicing and Protein Structure.

Richard E. Green and Steven E. Brenner. Univ. California, Berkeley.

ed@compbio.berkeley.edu

 

Alternative splicing may have an enormous impact on the protein coding diversity encoded by the human genome. We set out to investigate one aspect of this impact, the effect of alternative splicing on the domain organization of affected proteins. Surprisingly, there seems to be little correlation between domain organization and alternative splicing.

Long Abstract

 

 

156B. Understanding Scrambled Genes in Ciliates - Reverse Engineering a Biological Computer.

Andre Cavalcanti and Laura F. Landweber. Princeton University.

 

Scrambled genes are surprisingly common in spirotrichous ciliates. During cell development, these microorganisms must reorder the permuted pieces of such genes, tackling an intrinsically computational problem. We are developing tools for the identification and analysis of scrambled genes with the final goal of understanding the rules driving this biological process.

Long Abstract

 

 

157B(i). BNS: A DNS-Inspired Biomolecule Naming Service.

Robert Kincaid. Life Science Technologies Laboratory, Agilent Technologies.

robert_kincaid@agilent.com

 

BNS is a prototype biomolecule naming service that uses the Lightweight Directory Access Protocol (LDAP) to provide high-performance access to data derived from LocusLink. Gene and protein names and accessions can be resolved into various equivalents with very low latency. This enables a number of novel processes involving rapid conversions between accession schemes.

Long Abstract

 

 

157B(ii). OLAPOP-Online Analytical Processing of Proteins.

Deendayal Dinakarpandian and Vijay Kumar. School of Interdisciplinary Computing and Engineering, University of Missouri-Kansas City, Kansas City, MO 64110, USA.

dinakard@umkc.edu

 

OLAPOP refers to the online analytical processing of proteins, beyond flat-file retrieval and sequence analysis. This is based on a data warehouse approach to proteins with an emphasis on the provision of analytic facilities that allow for the study of protein properties in multiple dimensions.

Long Abstract

 

 

 

Genome Annotation.

 

158B. The Protein Information Resource for Functional Genomics and Proteomics.

159B. A Community-Aided Approach to Continually Updating Genome Annotations.

160B. Temporal Gene Expression Profiling by EST Annotation: a Resource for cDNA Microarray.

161B. The PEP Database: Proteins of Entire Proteomes.

162B. Comparative Analysis as an Important Factor in ORF Finding.

163B. Statistics of Arabidopsis thaliana cDNAs, promoters and UTRs.

164B. Toward the Ubiquitous Access to the Biological Resources.

165B. Incidence and Expression of Alternatively Spliced Cancer Genes.

166B. Genome-Wide Search for Composite Clusters of Transcription Factor Binding Sites.

167B. Integrating Sequence Similarity and Literature.

168B. No poster.

169B. The Genostar Platform: a Software Environment for Genomic Annotation and Exploration.

170B. The GenoAnnot Module: an Innovative Software for Genome Annotation.

171B. The GenoLink module: an Exploratory Software for Functional Annotation of Genomes.

172B. The GenoBool Module: a Statistical Exploratory Software for Genomic Data Mining.

173B. CluSTr - the Database of Clusters of SWISS-PROT+TrEMBL Proteins.

174B. Integrating Plant Genome Data for Knowledge Transfer and Discovery: What a Closer Look can Reveal from Comparing Complete and Genomeless plant Genomes.

175B. Analysis of Synteny in Mycobacterium tuberculosis.

176B. ViralGeneDB: a Database of Reannotated Viral Genomes.

177B. Oryza sativa Full-Length cDNA Genome Mapping: Searching for Alternative Splicing Sites.

178B. Detecting the Domain Structure of Proteins from Sequence Information.

179B. MGI Gets Loaded: Storage and Integration of Sequences as Database Objects in Mouse Genome Informatics.

180B. Protein-Based Exploration of Alternative Splicing in the Human Genome.

181B. Human-Mouse Genomic Analysis by Integration of Comparative Evidence.

182B. Annotating the Human Genome with Word Frequencies.

183B. In Silico Cloning and Computational Analysis Lead to Discovery and Characterization of Lgi Gene Family.

184B. Exon Prediction by Comparison.

185B. DIGITized Genes: Yet Another Archive of Putative Human Gene Sequences.

186B. No Poster.

 

158B. The Protein Information Resource for Functional Genomics and Proteomics.

Huang, H., W.C. Barker, Y. Chen, Z. Hu, P. Kourtesis, K.C. Lewis, B.C. Orcutt, B. Suzek, C.R. Vinayaka, L.L. Yeh, J. Zhang and C.H. Wu. Protein Information Resource, National Biomedical Research Foundation, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007, USA.

pirmail@nbrf.georgetown.edu

 

The PIR (http://pir.georgetown.edu) is a public protein informatics resource that produces the Protein Sequence Database of functionally annotated proteins. To assist protein exploration, the iProClass knowledgebase provides summary descriptions of protein family, function and structure, and links to 45 biological databases. PIR-NREF contains 930,000 non-redundant sequences for comprehensive sequence searching.

Long Abstract

 

 

159B. A Community-Aided Approach to Continually Updating Genome Annotations.

Shannan J. Ho Sui1, Robert E.W. Hancock2 and Fiona S.L. Brinkman1. 1 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, and 2 Department of Microbiology and Immunology, University of British Columbia, Vancouver, B.C., Canada.

brinkman@sfu.ca

 

Using an Internet-based, community-aided genome annotation approach, the Pseudomonas aeruginosa Genome Project published the complete P. aeruginosa sequence in the year 2000. We now report the methodology, tools developed, and effectiveness of using such an approach to continually update the genome annotation after the initial genome publication (see also http://www.pseudomonas.com).

Long Abstract

 

 

160B. Temporal Gene Expression Profiling by EST Annotation: a Resource for cDNA Microarray.

J. Yu1, R. Farjo1, S. P. MacNee1, W. Baehr3, D. E. Stambolian4 and A. Swaroop1,2. 1Ophthalmology and 2Human Genetics, Univ Michigan, Ann Arbor, MI 48105. 3Moran Eye Center, Univ Utah Health Science Center, Salt Lake City, UT 84132. 4Ophthalmology, Univ Pennsylvania, Phila, PA 19014.

jindany@umich.edu

 

PUGA, a computer program written in Perl, was used to retrieve biological information from an online database for 7000 clones from 3 retinal cDNA libraries at different developmental stages. These fully annotated clones will serve as a resource for microarray printing and will be used to establish temporal expression profiles in silico.

Long Abstract

 

 

161B. The PEP Database: Proteins of Entire Proteomes.

Phil Carter, Jinfeng Liu and Burkhard Rost. Columbia University Bioinformatics Center, Columbia University.

carter@cubic.bioc.columbia.edu

 

PEP is a database of Proteins of Entire Proteomes. The database contains protein sequences, their predicted homology to other known sequences, and predicted secondary structure features. The PEP database is available as a set of flat files, with files for integration of PEP into SRS. PEP can be accessed from: http://cubic.bioc.columbia.edu/pep/.

Long Abstract

 

 

162B. Comparative Analysis as an Important Factor in ORF Finding.

M. Troukhan, V. Brover and N. Alexandrov. Ceres, Inc., 3007 Malibu Cyn. Rd., Malibu, CA, 90265, USA.

mtroukhan@ceres-inc.com

 

A software system ORFER is presented that performs an open reading frame search in a given cDNA sequence with potential frame shifts. Amino acid similarity between several cDNAs from different organisms is one of the major factors in ORFER predictions.

Long Abstract

 

 

163B. Statistics of Arabidopsis thaliana cDNAs, promoters and UTRs.

T. Tatarinova1, V. Brover2 and N. Alexandrov2. 1University of Southern California, Los Angeles, CA, 90089 and 2Ceres, Inc., 3007 Malibu Canyon Road, Malibu, CA, 90265.

ttatarinova@ceres-inc.com

 

We observed a novel statistical feature of Arabidopsis thaliana promoters that correlates remarkably well with the positions of transcription start sites, namely a CG skew peak. We present an explanation of this phenomenon and a method for predicting transcription start sites based on the statistical properties of promoter and untranslated regions.
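
A skew profile of the kind mentioned above can be computed with a sliding window. The sketch below assumes the usual definition, (C - G) / (C + G) per window; the abstract does not give the exact formula, window size or peak-calling rule, so the function names and parameters here are hypothetical.

```python
def cg_skew(sequence, window=50, step=10):
    """Return (window_start, skew) pairs, with skew = (C - G) / (C + G)."""
    seq = sequence.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        c, g = chunk.count("C"), chunk.count("G")
        # Windows with no C or G get a neutral skew of 0.
        skew = (c - g) / (c + g) if c + g else 0.0
        profile.append((start, skew))
    return profile


def skew_peak(profile):
    """Return the (window_start, skew) pair with the highest skew."""
    return max(profile, key=lambda item: item[1])


# Synthetic sequence with a C-rich block in the middle.
sequence = "G" * 20 + "C" * 20 + "G" * 20
peak = skew_peak(cg_skew(sequence, window=20, step=10))
```

On real promoter sequences the window start of the peak would be compared with annotated transcription start sites; the synthetic sequence here only demonstrates the calculation.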

Long Abstract

 

 

164B. Toward the Ubiquitous Access to the Biological Resources.

Satoru Miyazaki1, Yasumasa Shigemoto2, Masahito Yamaguchi2 and Hideaki Sugawara1. 1 Center for Information Biology and DDBJ, National Institute of Genetics, 2 Life Science and Material Division, FUJITSU LIMITED.

smiyazak@genes.nig.ac.jp

 

We developed Web services that allow users to retrieve a subset of the International Sequence Databases on the fly for analysis. The service is composed of several SOAP servers and a series of Java and Perl classes for developing client applications. The Web service is available at http://xml.nig.ac.jp/.

Long Abstract

 

 

165B. Incidence and Expression of Alternatively Spliced Cancer Genes.

Janet Kelso1, Alan Christoffels1, Soraya Bardien1, Johann Visagie2, Tania Hide2, Andrew Simpson3, Jamborestes Consortium3 and Winston Hide1. 1SANBI. University of the Western Cape, South Africa. 2Electric Genetics Pty Ltd, Cape Town, South Africa and 3Ludwig Cancer Institute & collaborating universities, Sao Paulo, Brazil.

janet@sanbi.ac.za

 

Successful mining of expression information can be facilitated by the use of standardised nomenclature. This nomenclature needs to capture and present available data in an appropriate manner in order to allow for the extraction of expression profiles relevant to disease phenotypes. We have developed a system which integrates transcript information and genomic sequence for the identification of alternatively spliced cancer genes, and incorporates a controlled vocabulary for the description of the expression state of alternatively spliced candidates.

Long Abstract

 

 

166B. Genome-Wide Search for Composite Clusters of Transcription Factor Binding Sites.

Deineko I. and Kel A. Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia, BIOBASE GmbH, Halchtersche Strasse 33, 38304 Wolfenbuettel, Germany.

ake@biobase.de

 

Our study focuses on a genome-wide search for clusters of transcription factor binding sites using a statistical approach. The main goal is to find clusters that contain different types of binding sites. Application to the human genome yields a number of highly significant clusters, some of them in proximity to known genes.

Long Abstract

 

 

167B. Integrating Sequence Similarity and Literature.

Tor-Kristian Jenssen, Torbjørn Rognes and Eivind Hovig. PubGene AS, Forskningsveien 2A, Box 180 Vinderen, Oslo, NA, NO-0319, Norway.

tkj@pubgene.com

 

The PubGene analysis system links genes based on their co-citation in the scientific literature and also extracts gene relationships based on gene expression patterns from microarray experiments, facilitating discovery of biological links between expression patterns and the biomedical literature. We are now expanding the concept, by integrating gene sequence similarity associations into the PubGene analysis system.

Long Abstract

 

 

168B. No poster.

 

169B. The Genostar Platform: a Software Environment for Genomic Annotation and Exploration.

Christophe Bruley1, Veronique Dupierris1, Gilles Faucherand2, Alain Viari1 and François Rechenmann1. 1INRIA Rhône-Alpes, 655 avenue de l'Europe, Montbonnot, 38334 Saint Ismier Cedex, France, 2 GENOME express, 11 chemin des prés, 38944 Meylan, France.

Francois.Rechenmann@inrialpes.fr

 

Genostar is a modular software platform dedicated to genome annotation and exploration. It currently includes three main modules: GenoAnnot, GenoLink and GenoBool, running on top of an object-oriented kernel that is responsible for data representation and for methodological integration. Genostar is written in Java™ and runs on most operating systems.

Long Abstract

 

 

170B. The GenoAnnot Module: an Innovative Software for Genome Annotation.

Hélène Rivière-Rolland1, Stéphane Declère1, Gilles Faucherand, Pierre-Emmanuel Ciron2, Christophe Bruley2, Anne Morgat2, Francois Rechenmann, Alain Viari2 and Yves Vandenbrouck1. 1GENOME Express, 11 chemin des prés, 38944 Meylan – France and 2INRIA Rhône-Alpes, 655 avenue de l’Europe, 38334 Saint Ismier Cedex – France.

y.vandenbrouck@genomex.com

 

GenoAnnot is one of the three modules embedded in the Genostar platform (www.genostar.org). It focuses on genome annotation or re-annotation and aims at the identification of features on genomic sequences. It provides a comprehensive set of sequence analysis methods, embedded into customizable strategies, and a powerful cartographic interface.

Long Abstract

 

 

171B. The GenoLink module: an Exploratory Software for Functional Annotation of Genomes.

Patrick Durand1, Laurent Labarre1,2, Alain Meil1, Jean-Louis Divol1, Vincent Schächter1, Claudine Médigue2, Christophe Bruley4, Yves Vandenbrouck3, Alain Viari4, François Rechenmann4 and Jérôme Wojcik1. 1 Hybrigenics SA, 3-5 impasse Reille, 75014 Paris, France, 2 CNRS UMR8030/Génoscope, 2 rue Gaston Crémieux, BP 5706, 91057 Evry Cedex, France, 3 GENOME Express, 11 chemin des prés, 38944 Meylan, France and 4 INRIA Rhône-Alpes, 655 avenue de l'Europe, Montbonnot, 38334 Saint Ismier Cedex, France.

pdurand@hybrigenics.fr

 

GenoLink is one of the three modules embedded in the Genostar platform (www.genostar.org). It has been designed to combine, explore and visualize heterogeneous sources of information viewed as a large network of objects and associations. It extends the annotation process towards the characterization of the functions of genes products.

Long Abstract

 

 

172B. The GenoBool Module: a Statistical Exploratory Software for Genomic Data Mining.

Agnès Iltis1, Christophe Bruley1, François Rechenmann1 and Alain Viari1. 1 INRIA Rhône-Alpes, 655 avenue de l’Europe, 38334 Saint Ismier Cedex – France.

Alain.Viari@inrialpes.fr

 

GenoBool is one of the three modules embedded in the Genostar platform (www.genostar.org). It is devoted to the statistical analysis of genomic data. For this purpose, GenoBool provides several multifactorial analysis and clustering tools together with biological coders (e.g. codon usage) and plotting facilities.

Long Abstract

 

 

173B. CluSTr - the Database of Clusters of SWISS-PROT+TrEMBL Proteins.

E.V. Kriventseva1, F. Servant1, T. Bruls2 and R. Apweiler1. 1EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK and 2Genomining, Montrouge, France.

zhenya@ebi.ac.uk

 

CluSTr (http://www.ebi.ac.uk/clustr) offers automatic classification of SWISS-PROT+TrEMBL proteins, based on all pair-wise comparisons using the Smith-Waterman algorithm. The provided interface allows interactive analysis of protein clusters. CluSTr, covering more than 500,000 proteins, is a useful resource for protein space analysis and has already been used for the analysis of complete proteomes (http://www.ebi.ac.uk/proteome).
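
For readers unfamiliar with the pair-wise comparison step, the Smith-Waterman local alignment score can be sketched in a few lines. This is a simplified illustration, not CluSTr's pipeline: it assumes a flat match/mismatch score and a linear gap penalty, whereas protein comparisons normally use a substitution matrix and affine gaps.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            current[j] = max(0, diag, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# Toy sequences sharing the local segment "ACG".
score = smith_waterman_score("TTTACGAAA", "CCCACGCCC")
```

Only two rows of the dynamic programming matrix are kept, which is enough when just the score is needed; recovering the alignment itself would require the full matrix or traceback pointers.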

Long Abstract

 

 

174B. Integrating Plant Genome Data for Knowledge Transfer and Discovery: What a Closer Look can Reveal from Comparing Complete and Genomeless plant Genomes.

H. Schoof, S. Rudd, H. Gundlach, G. Haberer, W. Karlowski, V. Nazarov, P. Kosarev, W. Mewes and K.F.X. Mayer. GSF Research Center for Environment and Health, Institute for Bioinformatics (MIPS), Munich, Germany. http://mips.gsf.de.

h.schoof@gsf.de

 

The Arabidopsis genome has been the model dataset and backbone for integration of data from various plant species, e.g. the draft rice genomes or the EST sequences available from a plethora of species as "genomeless genomes". Automated procedures for data integration and analysis were utilized to fuel knowledge discovery.

Long Abstract

 

                                                                                                                                       

175B. Analysis of Synteny in Mycobacterium tuberculosis.

Martin Edwards. School of Crystallography, Birkbeck College, University of London.

m.edwards@mail.cryst.bbk.ac.uk

 

Synteny has been shown to be a useful tool in operon prediction and genome annotation. Previous analyses have concentrated on the identification of conserved gene pairs between genomes; higher-order connectivities have not been examined. This work uses dynamic programming to find clusters of homologous genes between organisms.

Long Abstract

 

 

176B. ViralGeneDB: a Database of Reannotated Viral Genomes.

Ryan Mills1, Michael Rozanov2, Tatiana Tatusov2 and Mark Borodovsky1,3. 1School of Biology and 3School of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA 2National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

millsr@amber.gatech.edu

 

We have reannotated more than 1600 virus and phage genomes available in GenBank and compiled a new database: ViralGeneDB. The interactive database interface allows the user to verify supporting evidence for each predicted/annotated protein using BLASTP, BLink, or SMART. The ViralGeneDB database is available at http://opal.biology.gatech.edu/GeneMark/ .

Long Abstract

 

 

177B. Oryza sativa Full-Length cDNA Genome Mapping: Searching for Alternative Splicing Sites.

Nobuyuki Kawagashira1, Yasuhiro Ohtomo2, Kazuo Murakami2, Kenichi Matsubara2, Jun Kawai3, Piero Carninci3, Yoshihide Hayashizaki3 and Shoshi Kikuchi1. 1National Institute of Agrobiological Sciences, Japan , 2Foundation for Advancement of International Science and 3RIKEN (The Institute of Physical and Chemical Research).

kawagash@affrc.go.jp

 

In the Rice Full-Length cDNA Sequencing Project, we sequenced 27,055 cDNA clones of Oryza sativa ssp. japonica (Nipponbare). We identified locus positions for 9,793 / 27,055 (35.9%) of the cDNA clones by genome mapping using 1,519 BAC/PAC genomic sequences from IRGSP (Phase 2). We found 45 alternative splicing sites in this research.

Long Abstract

 

 

178B. Detecting the Domain Structure of Proteins from Sequence Information.

Niranjan Nagarajan and Golan Yona. Cornell University.

niranjan@cs.cornell.edu

 

One of the first steps in analysing proteins is to detect the constituent domains, or the domain structure of the protein. We developed a technique for detecting the domain structure of a protein from sequence information alone. The method is based on information theory principles and combines a diversity of domain-information-content measures with a neural network to predict the most likely positions of domain boundaries.

Long Abstract

 

 

179B. MGI Gets Loaded: Storage and Integration of Sequences as Database Objects in Mouse Genome Informatics.

Richard M. Baldarelli, Joel E. Richardson, Jim A. Kadin, Benjamin L. King, Lori E. Corbani, Sharon Cousins, Jon S. Beal, Jill Lewis, David B. Miers, Carol J. Bult, Judith A. Blake, Martin Ringwald, Janan T. Eppig and the Mouse Genome Informatics Group. The Jackson Laboratory, Bar Harbor, ME, USA 04609.

rmb@informatics.jax.org

 

Adapting to the needs of the genomics communities, MGI will soon represent all sequence data as distinct database objects and store actual sequence information for all mouse, human and rat sequence data. The powerful relational integration that bridges phenotype, expression and gene information in MGI will now extend to sequences.

Long Abstract

 

 

180B. Protein-Based Exploration of Alternative Splicing in the Human Genome.

Ann E. Loraine, Gregg A. Helt, Melissa Cline and Michael A. Siani-Rose. Affymetrix, Inc.

ann_loraine@affymetrix.com

 

Understanding how alternative splicing affects gene function is an important challenge facing modern-day molecular biology. To investigate this, a data-mining technique was used to identify genes in which alternative transcript structure affects conserved regions in the encoded proteins. Results and limitations of this approach will be presented.

Long Abstract

 

 

181B. Human-Mouse Genomic Analysis by Integration of Comparative Evidence.

Lingang Zhang, Vladimir Pavlovic, Charles R. Cantor and Simon Kasif. Dept. of Biomedical Engineering and Bioinformatics Program, Boston University, MA, 02215

vladimir@bu.edu

 

We analyzed a number of comparative features of human-mouse orthologs and demonstrated a modular gene prediction framework for integrating comparative evidence with predictions from GENSCAN. Results show that, if selected from orthologs at an appropriate evolutionary distance, the comparative evidence can positively complement an ab initio prediction system.

Long Abstract

 

 

182B. Annotating the Human Genome with Word Frequencies.

Elizabeth Thomas, John Healy and Mike Wigler. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

thomase@cshl.org

 

We have developed an algorithm which generates the frequencies for all words of any length N for a given region of the genome. We are studying the patterns in these frequencies to gain new insights into discrete features of the human genome and the evolutionary processes that created the genome.
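
The counting step can be illustrated with a minimal sliding-window sketch. This is not the authors' algorithm, whose data structures are not described in the abstract; the function name `word_frequencies` and the toy sequence are assumptions made purely for illustration.

```python
from collections import Counter


def word_frequencies(sequence, n):
    """Count every overlapping word of length n in a sequence.

    Hypothetical helper for illustration only; a genome-scale tool would
    need a far more memory-efficient index than an in-memory Counter.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    seq = sequence.upper()
    # Slide a window of width n across the sequence, one base at a time.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


# Toy example: all words of length 3 in a short sequence.
counts = word_frequencies("ACGTACGTAC", 3)
```

Here `counts["ACG"]` is 2, because the word occurs at positions 0 and 4 of the toy sequence.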

Long Abstract

 

 

183B. In Silico Cloning and Computational Analysis Lead to Discovery and Characterization of Lgi Gene Family.

Pavel Morozov, Sergey Kalachikov and Conrad Gilliam. Columbia Genome Center, Columbia University, New York, USA.

pm259@columbia.edu

 

This work is a successful application of data mining that led to the discovery of previously uncharacterized genes, not directly sequenced, belonging to a new Lgi gene family. Comprehensive computational analysis provided guidance for further experimental study of the gene family.

Long Abstract

 

 

184B. Exon Prediction by Comparison.

Anton Nekrutenko, Wen-Yu Chung and Wen-Hsiung Li. University of Chicago, 1101 East 57th Street, Chicago, IL 60637, USA.

anton@uchicago.edu

 

We developed a simple method for prediction of protein-coding regions from conserved genomic blocks. This is a new strategy for genome annotation, which can be used for prediction of new genes and for biological validation of computationally predicted exons (http://nekrut.uchicago.edu/eev/). Results of the genome wide application of our technique add a new twist to the gene number controversy.

Long Abstract

 

 

185B. DIGITized Genes: Yet Another Archive of Putative Human Gene Sequences.

Tetsushi Yada1, Yasushi Totoki2, Yoshio Takaeda3 and Toshihisa Takagi1. 1Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan, 2Genomic Sciences Center, RIKEN, Yokohama, Japan and 3Mitsubishi Research Institute, Inc., Tokyo, Japan.

yada@ims.u-tokyo.ac.jp

 

DIGITized Genes is an archive of putative human gene sequences that is expected to contain many true genes that are still unknown. DIGITized Genes are identified with our novel ab initio gene-finder, DIGIT. DIGIT is known to improve exon-level specificity significantly in comparison with existing ab initio gene-finders.

Long Abstract

 

 

186B. No Poster.

 

Sequence Comparison.

 

187B. Multiple G-proteins Coupling to a Single G-protein-coupled Receptor in the Same Cell-Mechanistic Implications. 107

188B. Mauve: Multiple Genome Alignments. 107

189B. Analysis of Serine Proteases by Pattern Discovery Using the Tupleware Algorithm. 108

190B. Phylogenetic Tree-Based Hidden Markov Model. 108

191B. Parameter Landscape Analysis for Improving the Performance of Common Motif Detection Algorithms. 108

192B. Systematic Genomic Comparison of Three Brucella Spp. and a Data Model for Feature-Based Multiple Genome Analysis. 108

193B. BLink - Integrated Approach to BLAST Alignments Graphical Display. 108

194B. Identification of Functional Single Nucleotide Polymorphisms in Regulatory Elements. 109

195B. Clustering Sequences in a Metric Space - the MoBIoS project. 109

196B. Fast Identification of Tight Clusters of Proteins. 109

197B. Variance of Neural Network Algorithms to Gene Expression Classification. 109

198B. Intron Sequence Conservation in Complex Eukaryotic Genomes. 109

199B. Blasting the Exons Out of Introns. 110

200B. MGAlign, a Tool for Aligning mRNA Sequences to Genomic Sequences. 110

201B. A Tool for the Investigation of Conserved Noncoding Sequences. 110

202B. Smith-Waterman vs. BLAST: a Comparison using the WP76 C. elegans Protein Sequence Database. 110

203B. An EM Approach to Protein Multiple Sequence Alignment by the Partial Order Alignment (POA) Algorithm and Information Theory Based Scoring Scheme. 110

204B. Applying Protein Multiple Structural Alignments to Sequence Search and Fold Recognition. 111

205B. A Comparative Genomics via Wavelet Analysis for Closely Related Bacteria. 111

206B. Computational Identification of a Spo0A-phosphate Regulon that is Essential for Cellular Differentiation and Development in Gram-Positive bacteria. 111

 

187B. Multiple G-proteins Coupling to a Single G-protein-coupled Receptor in the Same Cell-Mechanistic Implications.

Irit Iitzhaki Van-Ham, Maya Tayer, Sagit Peleg, Nathan Dascal, Hagit Shapira and Yoram Oron. Department of Physiology and Pharmacology, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv 699978.

Irity@post.tau.ac.il

 

In certain model systems, different G-protein-coupled receptors (GPCRs) exhibit apparently indiscriminate coupling to G proteins of many diverse families. We have depleted endogenous Ga proteins in Xenopus oocytes to learn more about their physiological role. We have studied the involvement of GPs in rapid homologous desensitization in Xenopus oocytes.

Long Abstract

 

 

188B. Mauve: Multiple Genome Alignments.

Aaron Darling1,2, Bob Mau2,3, Frederick R. Blattner4 and Nicole T. Perna2. Departments of 1Computer Science, 2Animal Health and Biomedical Sciences, 3Oncology and 4Genetics, University of Wisconsin-Madison.

#The first two authors contributed equally to this work.

darling@cs.wisc.edu

 

Mauve is a new system for multiple whole-genome alignment. It employs algorithmic techniques whose complexity scales gracefully with the amount of sequence being aligned and which have highly sequential memory access patterns. We align 7 enterobacterial genomes, identifying chromosomal rearrangements and sequences unique to subsets of the organisms.

Long Abstract

 

 

189B. Analysis of Serine Proteases by Pattern Discovery Using the Tupleware Algorithm.

W.T. Rogers1, D.J. Underwood1, A.R. Moser1, D.R. Argentar2, K.M. Bloch2 and A.G. Vaidyanathan2. 1Bristol-Myers Squibb Co. and 2DuPont Company, Wilmington, DE, USA.

wade.rogers@bms.com

 

The Tupleware pattern discovery algorithm is used to analyze serine protease sequences whose intra-family identities range from 20.4% to 43.6% and inter-family identities range from 4.3% to 7.5%. Results are shown that illustrate functional and structural motifs inherent in widely distributed, sparse patterns which Tupleware is uniquely suited to discover.

Long Abstract

 

 

190B. Phylogenetic Tree-Based Hidden Markov Model.

Bin Qian and Richard A. Goldstein. Biophysics Research Division, University of Michigan, Ann Arbor, Michigan, U.S.A.

bqian@umich.edu

 

Using a tree-HMM scheme, we are able to explicitly include the phylogenetic information (both the phylogenetic tree topology and branch length) in the hidden Markov model. We test this idea on GPCR classification and are able to classify the GPCRs into functional groups with high accuracy.

Long Abstract

 

 

191B. Parameter Landscape Analysis for Improving the Performance of Common Motif Detection Algorithms.

Natalia Polulyakh, Michiko Konno, Toshihisa Takagi and Kenta Nakai. HGC, Tokyo University/Ochanomizu University.

nata@ims.u-tokyo.ac.jp

 

Detection of regulatory elements in the upstream sequences of potentially co-regulated genes has become a practically important problem. We systematically explored the optimal parameter settings of several motif detection programs in various practical situations, with the aim of establishing a practical guide for significantly improving their performance over the default settings.

Long Abstract

 

 

192B. Systematic Genomic Comparison of Three Brucella Spp. and a Data Model for Feature-Based Multiple Genome Analysis.

David Sturgill and Cynthia Gibas. Department of Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA.

dsturgil@vt.edu

 

We have compared the genomes of three recently sequenced species of Brucella and identified a superset of known and hypothetical coding sequences. We present the details of our comparative analysis process and a data model for a database-backed multiple genome comparison pipeline.

Long Abstract

 

 

193B. BLink - Integrated Approach to BLAST Alignments Graphical Display.

Tatiana Tatusova, Eugene Yaschenko, Roman Tatusov and David Lipman. National Center for Biotechnology Information.

tatiana@ncbi.nlm.nih.gov

 

BLink - BLAST Links Integrated Kit is a web application that offers a new multi-dimensional view of NCBI's pre-computed protein neighbors in the context of color-coded BLAST alignments, editable taxonomic trees, conserved protein classes, protein domains, and 3D structures. A BLink report is available for any of the proteins in Entrez.

Long Abstract

 

 

194B. Identification of Functional Single Nucleotide Polymorphisms in Regulatory Elements.

Boris Lenhard, Salim Mottagui-Tabar, Claes Wahlestedt and Wyeth W. Wasserman. Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden.

Boris.Lenhard@cgb.ki.se

 

Subtle variation within the human genome accounts for significant phenotypic effects. We describe a new approach to pinpoint single nucleotide polymorphisms (SNPs) with potential effect on gene regulation. Potential transcription factor binding sites are detected by cross-species comparison and screened for known SNPs predicted to alter function.

Long Abstract

 

 

195B. Clustering Sequences in a Metric Space - the MoBIoS project.

Rui Mao, Daniel P. Miranker, Jacob N. Sarvela and Weijia Xu. Department of Computer Sciences, University of Texas Austin, TX 78712.

rmao@cs.utexas.edu

 

Local sequence alignment using standard similarity matrices fails to form a metric distance function. We present the development of a metric PAM matrix. Using metric PAM, we induce a hierarchical clustering of biological sequences, suggesting that tree-based indexing methods, typical of commercial databases, may be extended to biological sequences.
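
To make the metric requirement concrete, the following minimal sketch checks whether a table of pairwise distances satisfies the metric axioms (zero self-distance, positivity, symmetry, triangle inequality). The function name `is_metric` and the toy matrices are assumptions for illustration; they are not derived from any PAM matrix or from the MoBIoS code.

```python
from itertools import product


def is_metric(dist, tol=1e-9):
    """Return True if the square matrix `dist` satisfies the metric axioms.

    Hypothetical helper for illustration; `dist[i][j]` is the distance
    between items i and j.
    """
    n = len(dist)
    # Every item must be at distance zero from itself.
    if any(abs(dist[i][i]) > tol for i in range(n)):
        return False
    for i, j in product(range(n), repeat=2):
        # Distinct items must be a positive distance apart.
        if i != j and dist[i][j] <= tol:
            return False
        # Distances must be symmetric.
        if abs(dist[i][j] - dist[j][i]) > tol:
            return False
    # Triangle inequality: a detour through j can never be shorter.
    for i, j, k in product(range(n), repeat=3):
        if dist[i][k] > dist[i][j] + dist[j][k] + tol:
            return False
    return True


good = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
bad = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # 5 > 1 + 1 breaks the triangle inequality
```

Tree-based indexes rely on the triangle inequality to prune branches during a search, which is why a scoring scheme that violates it, as `bad` does, cannot be indexed this way.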

Long Abstract

 

 

196B. Fast Identification of Tight Clusters of Proteins.

Boris Kiryutin and Roman Tatusov. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

tatusov@ncbi.nlm.nih.gov

 

We describe a fast, easy-to-implement algorithm for finding tight protein clusters. The procedure uses a generalization of the concept of the reciprocal organism-specific best hit as its clustering criterion. Each protein accumulates the worst rank in the BLAST hit list. Using a dynamic programming technique leads to a significant performance improvement.

Long Abstract

 

 

197B. Variance of Neural Network Algorithms to Gene Expression Classification.

Min Su, Mitra Basu and A. Toure. Dept. of Electrical Eng, The City College and The Graduate Center of CUNY.

basu@ccny.cuny.edu

 

We explore the ability of neural network algorithms to classify gene expression data (GED). We used three types of network: backpropagation, radial basis, and projection pursuit. An average accuracy of 70% was observed on test data, while 100% accuracy was always obtained on training data. To improve performance, we combined all network outputs into a committee of networks. Information extraction was thus amplified, and an improved accuracy of 85% was obtained.
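
The abstract does not say how the network outputs were combined. One common choice, shown here purely as an assumption, is a majority vote over the labels predicted by each network. The function name `committee_vote` and the toy labels are hypothetical.

```python
from collections import Counter


def committee_vote(predictions):
    """Combine per-model label lists by majority vote.

    `predictions` holds one list of labels per model, all of the same
    length (one label per sample). Majority voting is an assumed
    combination rule, not necessarily the one used in the poster.
    """
    combined = []
    # zip(*predictions) regroups the labels sample by sample.
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Three hypothetical models classifying three samples.
votes = committee_vote([
    ["a", "b", "a"],
    ["a", "a", "b"],
    ["b", "b", "b"],
])
```

With an odd number of models and two classes no ties can occur; other settings would need an explicit tie-breaking rule.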

Long Abstract

 

 

198B. Intron Sequence Conservation in Complex Eukaryotic Genomes.

Richard McCaman1, Tae-Wan Ryu1, Wan-Ying Chang1, Douglas Eernisse1, Bradley Akitake2, Anna Bruett2, Ganesh Devendra2, Ambereen Kurwa2, Ana Lizama-Price2, Grace Tancaktiong2 and Nicholas Schisler2. 1Biology Department, California State University at Fullerton, Fullerton, CA, U.S.A. and 2Biology Department, Pomona College, CA, U.S.A.

NJS04747@pomona.edu

 

All-against-all BLAST comparisons (e = 10^-10) of spliceosomal intron sequences reveal an unexpectedly high level of sequence conservation within and among species that appears to be associated with introns at the 5’ ends of genes. Such conserved sequences apparently include transcription factors, repeated sequences, and other sequences of unknown function.

Long Abstract

 

 

199B. Blasting the Exons Out of Introns.

Richard McCaman1, Aya Kataoka1, Roger Francis1, Anh-Huy Le1, Paulette Acheson1, David Chan1, Kyoung-Seop Shin1, Tae Ryu1, Wan-Ying Chang1, Douglas Eernisse1 and Nicholas Schisler2. 1California State University at Fullerton and 2Pomona College.

rmccaman@fullerton.edu

 

Discovery of a gene embedded in another gene’s intron raises important and fundamental questions regarding transcriptional regulation and control of expression. Analysis of the BLAST output from a comparison of exon sequences with intron sequences from the same species represents the first systematic study of “genes within genes”.

Long Abstract

 

 

200B. MGAlign, a Tool for Aligning mRNA Sequences to Genomic Sequences.

T. K. B. Lee1, T. W. Tan1, H. P. Too1 and S. Ranganathan1,2. 1Department of Biochemistry and 2Department of Biological Sciences, National University of Singapore.

bernett@bic.nus.edu.sg

 

MGAlign is a program for aligning mRNA sequences to genomic sequences using a novel algorithm. MGAlign is available at http://origin.bic.nus.edu.sg/mgalign/index.html.

Long Abstract

 

 

201B. A Tool for the Investigation of Conserved Noncoding Sequences.

Choudhuri, J., Schmitt-John, T. and Giegerich, R. Bielefeld University, Faculty of Technology.

jomuna@techfak.uni-bielefeld.de

 

Comparative genomics is an effective way to determine conserved noncoding sequences. Our system identifies potential conserved noncoding sequences in two input sequences, such as bacterial genomes or eukaryotic chromosomes. It automates several computational steps and stores the data in a database.

Long Abstract

 

 

202B. Smith-Waterman vs. BLAST: a Comparison using the WP76 C. elegans Protein Sequence Database.

Steffen Durinck1, Luc Ducazu2 and Yves Engelborghs1. 1K.University Leuven and 2DevGen, Zwijnaarde.

stefdurinck@hotmail.com

 

The Smith-Waterman and BLAST algorithms were compared with each other. Each of the 20,399 sequences of the C. elegans WP76 database was used as a query sequence to find similar sequences in the database. The output from both algorithms was analyzed.
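
For readers unfamiliar with the first of the two algorithms, the following is a minimal sketch of the Smith-Waterman local alignment score. The scoring values (match +2, mismatch -1, linear gap -2) and the function name `smith_waterman_score` are assumptions for illustration; protein searches such as the one described here would normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Illustrative parameters only: identity scoring and a linear gap
    penalty. Only two rows of the dynamic programming table are kept.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero, so an
            # alignment can start afresh anywhere in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


score = smith_waterman_score("XXACGTXX", "ACGT")
```

Here `score` is 8: the four matching letters are found regardless of the flanking characters. Filling the full table guarantees the optimal local score at a cost quadratic in the sequence lengths, whereas BLAST trades that guarantee for speed through heuristic seeding.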

Long Abstract

 

 

203B. An EM Approach to Protein Multiple Sequence Alignment by the Partial Order Alignment (POA) Algorithm and Information Theory Based Scoring Scheme.

Catherine Grasso, Christopher Lee and Golan Yona. Department of Computer Science, Cornell University.

cgrasso@cam.cornell.edu

 

We present an EM-based version of the POA algorithm that progressively builds up a multiple sequence alignment (MSA) by recursively aligning graph representations of MSAs using pairwise dynamic programming. The program uses an information theory based scoring scheme, and iteratively optimizes the guide tree and the resulting MSA.

Long Abstract

 

 

204B. Applying Protein Multiple Structural Alignments to Sequence Search and Fold Recognition.

Eric D. Scheeff, Ilya N. Shindyalov and Philip E. Bourne. San Diego Supercomputer Center, University of California at San Diego, La Jolla, California, USA.

escheeff@sdsc.edu

 

We are combining protein multiple structural alignments generated with the Combinatorial Extension method with sequence search using Hidden Markov Models (HMMs). Superfamily-level multiple alignments are used to combine sequence alignments generated using HMMs, and a global HMM for the superfamily is constructed. Preliminary benchmarking suggests the method shows promise.

Long Abstract

 

 

205B. A Comparative Genomics via Wavelet Analysis for Closely Related Bacteria.

Jiuzhou Song1, Tony Ware2, and Shu-Lin Liu1. Departments of 1Microbiology and Infectious Diseases and 2Mathematics and Statistics, University of Calgary, 3330 Hospital Dr. NW Calgary, AB, T2N 4N, Canada.

songj@ucalgary.ca

 

Comparative genomics has been a valuable method for extracting and extrapolating genome information. We propose using wavelet analysis for comparative genomics; the global comparison quantifies the differences between genomes. The strategy is described in detail through comparisons of two closely related strains.

Long Abstract

 

 

206B. Computational Identification of a Spo0A-phosphate Regulon that is Essential for Cellular Differentiation and Development in Gram-Positive bacteria.

Jiajian Liu and Gary D. Stormo. Department of Genetics, Washington University Medical School, St. Louis, Missouri 63110, USA.

jjliu@ural.wustl.edu

 

A comparative sequence analysis combined with statistical tests of microarray expression profiles was applied to identify the Spo0A-phosphate regulon that is essential for cellular differentiation and development in Gram-positive, endospore-forming bacteria.

Long Abstract

 

 

 

Predictive Methods.

 

207B. Computational Screening for Peroxisomal Proteins Using New Methods, Old Methods, and Human Expertise. 111

208B. Predictive Modeling of Hepatotoxicants Using Microarrays and a Linear Discriminant Modeling Approach. 112

209B. Genome-Wide Searching for Pseudouridylation guide snoRNAs. 112

210B. Promoter Classification Based on Contextual Word Similarity Clustering Followed by Positional Motif Filtering. 112

211B. In silico Identification of Presumptive Regulatory Sites in Unaligned Promoter Sequences by a Combined Enumerative-Alignment Method. 112

212B. Computational Analysis of Voltage-gated Potassium Channels. 112

213B. Learning from Probabilistic Segmentation: A dictionary-based approach to promoter prediction in the human genome. 113

214B. Cancer, SNPs and Machine Learning. 113

215B. Learning to Predict Protein Function from Sequence. 113

216B. Improving Regulatory Element Prediction with Weakly Labeled Training Data. 113

217B. Proteome Analyst - High Throughput Protein Function Prediction. 113

218B. Analysis of Amino Acid Type Statistics of Gap Regions: Towards Improvement of Multiple Sequence Alignment. 114

219B. IslandPath: Integrated Analysis and Display of Features of Genomic Islands in Prokaryotes. 114

220B. Automated Diagnosis of IEMs using High-Throughput Quantitative NMR Spectroscopy. 114

221B. Sequence Prediction for Novel Proteins by Applying Multiple Proteases. 114

222B. A Key Residue Approach for the Definition of Ligand Binding Interfaces. 114

223B. A New Approach to Identify Conserved RNA Secondary Structural Motifs in Homologous Sequences. 115

224B. Prediction of Co-regulated Bacterial Genes by Phylogenetic Footprinting. 115

225B. Side-Chain Conformation Prediction with Support Vector Machines. 115

226B. Transmembrane Beta-barrel Statistics and Prediction Approach. 115

227B. Prediction of Transcription Regulatory Sites with Dependency-Reflecting Decomposition Model. 115

228B. Multipoint Disequilibrium Mapping with DMLE+. 116

229B. Signal Integration in Transcription: Use of Synthetic promoters for independent measures of RNAP-sigma factor activity. 116

230B. The use of Cytosolic and Cell Surface Functional Domains to Correct Sidedness Errors in TMHMM. 116

231B. Effective Discrimination of Native Protein Structures using an Atom-Atom Contact Potential. 116

232B. No poster. 116

233B. A Tree Kernel to Analyze Phylogenetic Profiles. 116

234B. Single Nucleotide Polymorphisms: Novel Methods to predict their Functional Effects using Decision Trees and Support Vector Machines. 117

235B. No poster. 117

 

207B. Computational Screening for Peroxisomal Proteins Using New Methods, Old Methods, and Human Expertise.

Olof Emanuelsson1, Arne Elofsson1, Gunnar von Heijne1 and Susana Cristobal2. 1Stockholm Bioinformatics Center, Stockholm University and 2Cell and Molecular Biology, Uppsala University.

olof@sbc.su.se

 

We have developed a method to predict peroxisomal location based on sequence. We use our method to screen seven eukaryotic genomes for peroxisomal candidates, and use Pfam-based phylogenetic profiling to reinforce predictions.

Long Abstract

 

 

208B. Predictive Modeling of Hepatotoxicants Using Microarrays and a Linear Discriminant Modeling Approach.

K R Johnson, A L Castle, M W Porter, M Elashoff and D Mendrick. Gene Logic, Inc., Gaithersburg, MD, USA.

kjohnson@genelogic.com

 

The Affymetrix GeneChip RGU_34 series microarray was used to collect expression information from over 1,000 rat liver samples treated with known hepatotoxicants and vehicle controls. Linear Discriminant Analysis (LDA) was used to identify over 500 predictive general hepatotoxicity markers and more than 1,000 compound/class-specific markers. Results show successful prediction of hepatotoxicity when using the markers identified.

Long Abstract

 

 

209B. Genome-Wide Searching for Pseudouridylation guide snoRNAs.

Peter Schattner and Todd M. Lowe. Center for Biomolecular Science and Engineering, University of California, Santa Cruz.

schattner@cse.ucsc.edu

 

We have developed a program for genome-wide screening for pseudouridylation guide snoRNAs. The program implements a probabilistic model of H/ACA snoRNAs including primary sequence motifs, target-RNA complementary sequences, stem-loop structure data and interval length distributions. We present initial results of applying this program in the search for pseudouridylation guide snoRNAs in the Saccharomyces cerevisiae genome.

Long Abstract

 

 

210B. Promoter Classification Based on Contextual Word Similarity Clustering Followed by Positional Motif Filtering.

Pieter J. De Bleser and Frans Van Roy. Bioinformatics Core, Department for Molecular Biomedical Research, Ghent University and Flemish Institute for Biotechnology (VIB), Ghent, Belgium.

pieterdb@dmb.rug.ac.be

 

We address the question of whether, given a set of genes, we can predict which genes have the potential to be co-expressed based on the information contained in their upstream sequences. First, all possible clusters are constructed based on contextual word similarities. Next, clusters that do not exhibit clear co-localized motifs are eliminated. Good agreement between predicted and real cluster data is observed.

Long Abstract

 

 

211B. In silico Identification of Presumptive Regulatory Sites in Unaligned Promoter Sequences by a Combined Enumerative-Alignment Method.

Pieter J. De Bleser and Frans Van Roy. Bioinformatics Core, Department for Molecular Biomedical Research, VIB-Ghent University, Ghent, Belgium.

pieterdb@dmb.rug.ac.be

 

Methods for detecting consensus patterns common to the promoter sequences of co-expressed genes can be divided into enumerative (word counting) and alignment (e.g. MEME, Gibbs sampling) methods, each with its own limitations. We present a new, combined approach that eliminates most of the limitations of the individual methods.

Long Abstract

 

 

212B. Computational Analysis of Voltage-gated Potassium Channels.

Bin Li1, Steven D. Buckingham1, Jonathan Schaeffer1, Andrew N. Spencer2 and Warren J. Gallin1. 1Department of Biological Sciences and 2Department of Computing Sciences, University of Alberta, Edmonton, Canada.

bli@ualberta.ca

 

Voltage-gated potassium channels (VKC) are structurally and functionally diverse in all animals. We are developing computational approaches to extract structure-function information for this protein family. Our database, VICDB, contains 319 VKC entries and hyperlinks to over 20 biological databases. Several computational tools are being developed to work with VICDB data.

Long Abstract

 

 

213B. Learning from Probabilistic Segmentation: A dictionary-based approach to promoter prediction in the human genome.

Tom Hadfield and Hao Li. UCSF.

tomh@itsa.ucsf.edu

 

We present a novel approach to unsupervised learning and pattern recognition problems, based on a free-energy, expectation-maximization approach that builds extensively on the Mobydick algorithm of Li, Bussemaker and Siggia. Of particular interest is our method's application to the problem of promoter prediction in the human genome.

Long Abstract

 

 

214B. Cancer, SNPs and Machine Learning.

Brett Poulin1, Jennifer Listgarten2, Russell Greiner1, Sambasivarao Damaraju2, Thomas Kolacz1, Xiang Wan1, David Wishart3 and Brent Zanke2. 1Department of Computing Science, University of Alberta, Edmonton, 2Cross Cancer Institute, Alberta Cancer Board (www.polyomx.org), Edmonton and 3Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton.

poulin@cs.ualberta.ca

 

Single nucleotide polymorphisms (SNPs) are the most common genetic variations in humans. An individual's susceptibility to disease or response to a drug may be explained by analyzing populations for large numbers of SNPs. We discuss the applicability, accuracy and efficiency of various machine-learning techniques to analyze complex biological data. http://www.polyomx.org.

Long Abstract

 

 

215B. Learning to Predict Protein Function from Sequence.

Zhiyong Lu, Xiaomeng Wu and Russ Greiner. Department of Computing Science, University of Alberta, Edmonton, T6G 2E8, Canada.

zhiyong@cs.ualberta.ca

 

Proteome Analyst (PA) is a web-based tool designed to autonomously infer the functional characteristics of each protein in a proteome. We investigate various machine learning algorithms to obtain the optimal proteome function prediction.

Long Abstract

 

 

216B. Improving Regulatory Element Prediction with Weakly Labeled Training Data.

Joseph Bockhorst1 and Mark Craven2. Departments of 1Computer Sciences and 2Biostatistics and Medical Informatics, University of Wisconsin.

joebock@cs.wisc.edu

 

We consider learning predictive models for promoters, terminators and transcription units from training examples. Well-defined relationships among certain instances in each class enable us to acquire "weakly labeled" promoter and terminator instances from known transcription units. Our experimental results indicate that these examples can lead to significant improvements in accuracy.

Long Abstract

 

 

217B. Proteome Analyst - High Throughput Protein Function Prediction.

Roman Eisner, Brett Poulin, Duane Szafron, Paul Lu, Russell Greiner, Bahram Habibi-Nazhad and David Wishart. Department of Computing Science and Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta.

eisner@cs.ualberta.ca

 

Proteome Analyst (PA) is a web-based tool for predicting the functions of each sequence in a proteome. For example, one or more classification-based function predictors can be applied to any sequence. More importantly, PA users can easily train their own custom classification-based predictors and apply them to their sequences - www.cs.ualberta.ca/~bioinfo/PA.

Long Abstract

 

 

218B. Analysis of Amino Acid Type Statistics of Gap Regions: Towards Improvement of Multiple Sequence Alignment.

James O. Wrabl and Nick V. Grishin. HHMI, University of Texas Southwestern Medical Center, Dallas TX 75390-9038.

jowrabl@chop.swmed.edu

 

The results of a large-scale analysis of the gap regions in structurally similar pairs of proteins are reported. Strategies are discussed to use this information for improvement of gap penalties for pairwise (or multiple) sequence alignment and to predict regions of single sequences likely (or unlikely) to contain insertions or deletions.

Long Abstract

 

 

219B. IslandPath: Integrated Analysis and Display of Features of Genomic Islands in Prokaryotes.

William Hsiao and Fiona S. L. Brinkman. Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada.

wwhsiao@sfu.ca

 

Horizontal transfer of clusters of genes, or genomic islands, can be a major driving force in microbial evolution and pathogenesis. We have analyzed currently known genomic islands and have developed a web-based application that facilitates the identification of such islands in bacterial and archaeal genomes.

Long Abstract

 

 

220B. Automated Diagnosis of IEMs using High-Throughput Quantitative NMR Spectroscopy.

Ajit Singh1, Russ Greiner1, Victor Dorian2 and Brent Lefebvre2. 1Dept. of Computing Science, University of Alberta and 2Chenomx Inc.

ajit@cs.ualberta.ca

 

We examine the utility of high throughput Nuclear Magnetic Resonance (NMR) data for the diagnosis of inborn errors of metabolism. We propose a two-layer Bayesian network with the novel addition of Bayesian error bars, which provide confidence intervals for our diagnoses.

Long Abstract

 

 

221B. Sequence Prediction for Novel Proteins by Applying Multiple Proteases.

Seok-Hyun Moon1, Doheon Lee1, Kwang-Hyung Lee1, Kwang-Hwi Cho2, Soon-I Cho3 and Yonggwan Won3. 1Department of Biosystems, KAIST, 2National Cancer Institute, National Institutes of Health, Bethesda, Maryland, U.S.A. and 3Department of Computer Engineering, Chonnam Nat’l Univ., Gwangju, Korea.

shmoon@if.kaist.ac.kr

 

We present an alternative sequencing algorithm for predicting the sequences of novel proteins from mass spectrometry data obtained with multiple proteases. In our experiments, the algorithm will be applied to protein entries in the SWISS-PROT database.

Long Abstract

 

 

222B. A Key Residue Approach for the Definition of Ligand Binding Interfaces.

Waqas Ahmed Awan. UK MRC HGMP Resource Center, Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1RQ, UK.

wawan@hgmp.mrc.ac.uk

Funky is a program that can identify key ligand binding sites based on interatomic distances calculated from PDB structures. We have used this program to create a database of key ligand binding residues mapped to SCOP domains within the PDB. We are using this data in an attempt to characterise ligand binding surfaces and develop a scheme to predict such interactions in novel sequences.

Long Abstract

 

 

223B. A New Approach to Identify Conserved RNA Secondary Structural Motifs in Homologous Sequences.

Yongmei Ji and Gary D. Stormo. Department of Genetics, Washington University Medical School, St. Louis, MO 63110, USA.

yji@ural.wustl.edu and stormo@ural.wustl.edu

We present a new approach to predict conserved RNA secondary structural motifs in regulatory regions of homologous genes. It is based on stem comparison and applies a graph theoretical approach. It allows detection of pseudoknot structures. Our results have shown that this method is promising.

Long Abstract

 

 

224B. Prediction of Co-regulated Bacterial Genes by Phylogenetic Footprinting.

Yuko Makita1,3, Goro Terai2,3 and Kenta Nakai3. 1Tokyo University of Agriculture and Technology, Tokyo, Japan, 2INTEC Web and Genome Informatics Corp., Tokyo, Japan and 3Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan.

makita@ims.u-tokyo.ac.jp, terai@ims.u-tokyo.ac.jp and knakai@ims.u-tokyo.ac.jp

 

Using phylogenetic footprinting analysis of closely related bacterial genomes, we extracted conserved upstream elements of Mycoplasma genitalium, Chlamydia trachomatis, and Mycobacterium tuberculosis. Based on their sequence similarity, we could also predict regulons of these bacteria. Our data will be useful for analyzing the evolutionary relationships between transcriptional networks.

Long Abstract

 

 

225B. Side-Chain Conformation Prediction with Support Vector Machines.

Chi-Hung Tsai, Huai-Kuang Tsai and Cheng-Yan Kao. Department of CSIE, National Taiwan University.

d7526010@csie.ntu.edu.tw

 

We use support vector machines (SVMs) to predict side-chain conformation directly from protein sequences. The predicted side-chain conformations can provide useful side-chain information during ab initio prediction of protein structure. Experimental results indicate that our approach performs very robustly and is very competitive with the naïve method.

Long Abstract

 

 

226B. Transmembrane Beta-barrel Statistics and Prediction Approach.

Henry Bigelow and Burkhard Rost. Biochemistry and Molecular Biophysics Department, Columbia University, New York, New York, USA.

bigelow@maple.bioc.columbia.edu

 

A specialized Bayesian Network architecture is proposed for the prediction of transmembrane beta-barrel proteins from sequence. The modular structure of the model combines predictions for individual structural elements. Sequence statistics of these elements are presented for the barrel proteins and, for comparison, similar elements in the PDB.

Long Abstract

 

 

227B. Prediction of Transcription Regulatory Sites with Dependency-Reflecting Decomposition Model.

Ki-Bong Kim and Kiejung Park. SmallSoft Co., Ltd., Daeduk BioCommunity, Jonmin-Dong 461-6, Yusung-Gu, Daejeon 305-811, Korea (ROK).

 

kbkim@bioinfo.smallsoft.co.kr

 

We propose a new model of core transcription regulatory sites, including the –10 region and the transcription initiation site: the Dependency-Reflecting Decomposition Model (DRDM), which captures the most significant dependencies between positions (non-adjacent as well as adjacent). The method can be used effectively to predict transcription regulatory sites in long genomic contigs.

Long Abstract

 

 

228B. Multipoint Disequilibrium Mapping with DMLE+.

Jeff Reeve and Bruce Rannala. University of Alberta, 8-08 Medical Sciences Building, Edmonton, Alberta, T6G 2H7, Canada.

jreeve@ualberta.ca

 

The program DMLE+ uses multipoint disequilibrium mapping to estimate mutation location and age, using Markov chain Monte Carlo methods. Bayesian inference allows the incorporation of prior information from an annotated DNA sequence. Other features, such as haplotype inference from genotype data, are also available. It can be downloaded from http://dmle.org.

Long Abstract

 

 

229B. Signal Integration in Transcription: use of Synthetic Promoters for Independent Measures of RNAP-sigma Factor Activity.

Kanti Pabbaraju and Michael G. Surette. Microbiology and Infectious Diseases, University of Calgary.

pabbaraj@ucalgary.ca

 

The role of sigma factors in transcription regulation, independent of other activators and repressors, was studied using synthetically designed promoters cloned into a lux expression vector. This system enables us to monitor continuous gene expression profiles from the seven E. coli sigma factors under a variety of environmental conditions.

Long Abstract

 

 

230B. The use of Cytosolic and Cell Surface Functional Domains to Correct Sidedness Errors in TMHMM.

Emily Wei Xu, Dan Brown and Paul Kearney. School of Computer Science, University of Waterloo, Canada.

ewxu@uwaterloo.ca

 

We study sequence patterns of transmembrane proteins that are useful as predictors of whether a sequence of amino acids is on the cell surface or on the cytoplasmic side, and we discuss how to incorporate this information into an HMM based on the TMHMM predictor of Krogh et al.

Long Abstract

 

 

231B. Effective Discrimination of Native Protein Structures using an Atom-Atom Contact Potential.

Brendan McConkey. Department of Plant Sciences, Weizmann Institute of Science, Rehovot 76100, Israel.

brendan.mcconkey@weizmann.ac.il

 

A method for quantifying inter-atomic contacts within proteins has been developed based on a Voronoi tessellation procedure. A scoring function was generated from a statistical assessment of contacts within known protein structures, and was effective at distinguishing native proteins from established decoy sets in 97% of the cases tested.

Long Abstract

 

 

232B. No poster.

Long Abstract

 

 

233B. A Tree Kernel to Analyze Phylogenetic Profiles.

Jean-Philippe Vert. Bioinformatics Center, Institute for Chemical Research, Kyoto University.

Jean-Philippe.Vert@mines.org

 

We present an approach to measure the similarity between phylogenetic profiles, which consists of mapping the profiles to a high-dimensional vector space whose dimensions correspond to biologically relevant features, and working implicitly in that space. An SVM trained to predict gene functions benefits from this representation.

Long Abstract

 

 

234B. Single Nucleotide Polymorphisms: Novel Methods to predict their Functional Effects using Decision Trees and Support Vector Machines.

Vidhya Gomathi Krishnan and David Robert Westhead. School of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK.

bmbkv@bmb.leeds.ac.uk

 

A comparative study of two machine learning tools is used to create a novel method to predict the functional effects of non-synonymous substitutional single nucleotide polymorphisms in the Caenorhabditis elegans genome. The prediction results from decision trees (C4.5) and support vector machines (SVM) are presented here. Some of the intricacies involved in these two tools are discussed.

Long Abstract

 

 

235B. No poster.

 

 

New Frontiers.

 

236B. Identifying Protein Sequence Tags (PSTs): Reducing Complexity of Peptide Mixtures for MS Analysis.

237B. Atomic Reconstruction of Metabolism.

238B. Developing a Lead Discovery Informatics System.

239B. Computational Studies on Amino Acid Transport Pathway in Yeast.

240B. Metabolomics of E. coli Using High-throughput Quantitative 1H-NMR Spectroscopy.

241B. DORA: A Database for Comprehensive Tumor Profiling.

242B. A Technology for Integration of Databases with Common Subject Domain.

243B. E-Neuron Project: A Kinetic Simulation of Neurite Outgrowth Using the E-CELL system.

244B. From GARD to Grid - Simulation of a Protocell.

245B. Creating an Online Dictionary of Abbreviations from MEDLINE.

246B. GC Content is correlated with Protein Nitrogen Content.

247B. The Dynamics of the Interactions Between Solid Tumors and Lymphocytes.

248B. A Numerical Study of a Porous Media Model for Water Transport and Drug Diffusion through the Stratum Corneum.

 

236B. Identifying Protein Sequence Tags (PSTs): Reducing Complexity of Peptide Mixtures for MS Analysis.

R. Moraga, U. Bauer, C. Hamon and K. Kuhn. Xzillion GmbH & CoKG, Bioinformatics/PST Technology.

Roger.Moraga@xzillion.com

 

Protein Sequence Tags (PST) is a novel approach to the analysis of complex protein mixtures, in which the peptides obtained from enzymatic digestion are reduced to a single representative fragment for each protein. These fragments are then identified by comparing simple MS data to a pre-processed database.

Long Abstract

 

 

237B. Atomic Reconstruction of Metabolism.

Masanori Arita. Computational Biology Research Center, Koto-ku Aomi 2-41-6, Tokyo, 112-0002, Japan.

m-arita@aist.go.jp

 

ARM is a project to provide a database of bacterial metabolism at atomic scale, together with its management tools. Three Java programs are provided to analyze our curated metabolic data for E. coli and B. subtilis. They can be accessed freely at http://www.metabolome.jp.

Long Abstract

 

 

238B. Developing a Lead Discovery Informatics System.

Kaisheng Chen, Jeff Janes, Steve Wilkens, Dimitri Petrov, Andrey Santrosyan, Shumei Jiang, Max Chang, Kathy Zhou, Robert Downs and Yingyao Zhou. Genomics Institute of the Novartis Research Foundation.

zhou@gnf.org

 

The Genomics Institute of the Novartis Research Foundation (GNF) is developing a web-based lead discovery informatics system. The system hosts one million druggable compounds and over 16 million QC-ed screening data points. This integrated system provides data visualization and analysis tools to facilitate the GNF lead discovery process.

Long Abstract

 

 

239B. Computational Studies on Amino Acid Transport Pathway in Yeast.

Yu Chen1, Yutao Liu1, Keith M. Goldstein2, Jeffrey M. Becker1,2, Ying Xu1,3 and Dong Xu1,3#. 1 UT-ORNL Graduate School of Genome Science and Technology, Knoxville, TN. 2 Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN. 3 Protein Informatics Group, Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN. #Corresponding author

yum@ornl.gov

 

In this poster, we present a consensus approach (with gene expression/regulatory region analysis, protein-protein interaction analysis and protein sequence analysis and predictions) to model the amino acid/peptide signal transduction pathway in yeast. Based on this approach, we constructed the first global model for the pathway.

Long Abstract

 

 

240B. Metabolomics of E. coli using High-throughput Quantitative 1H-NMR Spectroscopy.

Brent Lefebvre1, Ashenafi Abera2, Noah Epstein1, Richard Rothery3, Dr. Joel Weiner3, Dr. David Wishart1,2 and Dr. Russell Greiner4. 1Chenomx Inc., Edmonton AB; Departments of 2Pharmacy and Pharmaceutical Sciences, 3Biochemistry and 4Computing Science, University of Alberta.

blefebvre@chenomx.com

 

Metabolomics or metabolic profiling involves the analysis of metabolites of a cell or organism such as E. coli. NMR spectroscopy is well suited for this task, with the help of advanced software developed by Chenomx Inc. Automated, quantitative and high-throughput chemical analysis of complex biofluid mixtures is now possible.

Long Abstract

 

 

241B. DORA: A Database for Comprehensive Tumor Profiling.

Adrian Driga1, David Wishart2 and Brent Zanke1. 1Cross Cancer Institute, Alberta Cancer Board, Edmonton, Canada and 2Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, Canada.

adriandr@cancerboard.ab.ca

 

DORA (Database for Online Retrieval and Analysis) is the database in which clinical, microarray, SNP, and metabonomic information is integrated for every cancer patient enrolled into the PolyomX program (www.PolyomX.com). DORA is a flexible platform that our software uses to correlate patient molecular data with clinical outcomes of cancer treatments.

Long Abstract

 

 

242B. A Technology for Integration of Databases with Common Subject Domain.

A. Pisarev, M. Blagov and M. Samsonova. Bioinformation Systems Lab, St.Petersburg State Technical University, St.Petersburg, Russia.

pisarev@fn.csa.ru

 

We present a novel approach to the integrated retrieval of molecular biology data, which consists of an adaptive natural language interface combined with multiagent technology. Our approach permits the integration of any information resources that share a common data domain. The implemented prototype is available at http://urchin.spbcas.ru/NLP/NLP.htm.

Long Abstract

 

 

243B. E-Neuron Project: A Kinetic Simulation of Neurite Outgrowth Using the E-CELL system.

Mari Takada1, Shinichi Kikuchi1, Michiko Abe1, Noriyuki Kitagawa1, Kazuhide Sekiyama1, Kohtaro Takei1,2 and Masaru Tomita1. 1Institute for Advanced Biosciences, Keio University and 2Department of Physiology, Toho University School of Medicine.

s01522mt@sfc.keio.ac.jp

 

Formation of neurocircuitry depends on the control of neurite outgrowth. To systematically understand its molecular mechanisms, we developed and simulated a computer model of signal transduction using the E-CELL System, our cell simulator. We found that intracellular calcium within the growth cone acts as a key regulator of the balance between complicated pathways and of neurite outgrowth.

Long Abstract

 

 

244B. From GARD to Grid - Simulation of a Protocell.

Barak Shenhav and Doron Lancet. Department of Molecular Genetics, Weizmann Institute of Science, Israel.

barak.shenhav@weizmann.ac.il

 

A grid-based simulation of prebiotic chemistry, derived from the Graded Autocatalytic Replication Domain (GARD) model for early evolution, is presented. A novel stochastic chemistry algorithm is described. The simulation is a step towards the in silico emergence of a protocell. With minor changes, a whole-cell simulation is in sight.

Long Abstract

 

 

245B. Creating an Online Dictionary of Abbreviations from MEDLINE.

Jeffrey T Chang1, Hinrich Schutze, Ph.D.2 and Russ B Altman, M.D., Ph.D.1. 1Stanford Biomedical Informatics and 2Novation Biosciences.

jchang@smi.stanford.edu

 

To cope with the rapid introduction of new abbreviations in biomedical text, we have developed a statistical learning algorithm to identify them automatically, achieving 84% recall at 81% precision. We scanned all of MEDLINE and found 781,632 high-scoring abbreviation definitions. We are making these publicly available at http://abbreviation.stanford.edu/.

Long Abstract

 

 

246B. GC Content is correlated with Protein Nitrogen Content.

Bryan A. Keith and Jeremy Gibson-Brown. Washington University in St. Louis, Department of Biology.

bakeith@artsci.wustl.edu

 

Noting that a GC base pair contains more nitrogen than an AT base pair, we surveyed bacterial sequences and found that DNA GC content is correlated with protein nitrogen content (r^2 = 0.68). Interestingly, as protein nitrogen content increases, %Arg increases (r^2 = 0.72), and %Lys and %Asn decrease (r^2 = 0.76 and 0.62), suggesting that Lys and Asn substitute for Arg in low-nitrogen conditions.
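
The nitrogen counts behind this observation follow from the standard molecular formulas: adenine and guanine carry 5 nitrogen atoms each, cytosine 3 and thymine 2, so a GC pair has 8 and an AT pair 7. The short Python sketch below computes the two quantities being compared for a single sequence. It is an illustration rather than the authors' survey code; the function names and the toy inputs are assumptions.

```python
# Nitrogen atoms per base, from the standard molecular formulas.
NITROGEN_PER_BASE = {"A": 5, "T": 2, "G": 5, "C": 3}

# Nitrogen atoms per amino acid residue: one backbone nitrogen each,
# plus side-chain nitrogens for Arg (3), His (2), Lys, Asn, Gln and Trp (1 each).
NITROGEN_PER_RESIDUE = {aa: 1 for aa in "ACDEFGILMPSTVY"}
NITROGEN_PER_RESIDUE.update({"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2})


def gc_content(dna):
    """Fraction of G and C bases in a DNA string."""
    dna = dna.upper()
    return sum(base in "GC" for base in dna) / len(dna)


def mean_nitrogen_per_residue(protein):
    """Average number of nitrogen atoms per residue in a protein string."""
    protein = protein.upper()
    return sum(NITROGEN_PER_RESIDUE[aa] for aa in protein) / len(protein)


# A GC pair carries one more nitrogen atom than an AT pair.
gc_pair = NITROGEN_PER_BASE["G"] + NITROGEN_PER_BASE["C"]
at_pair = NITROGEN_PER_BASE["A"] + NITROGEN_PER_BASE["T"]
print(gc_pair, at_pair)  # 8 7

# Toy inputs (hypothetical): Arg-rich peptides score higher than Lys-rich ones.
print(mean_nitrogen_per_residue("RRA"))  # 3.0
print(mean_nitrogen_per_residue("KKA"))
```

Replacing Arg (4 nitrogen atoms) with Lys or Asn (2 each) lowers the per-residue nitrogen count, which is consistent with the substitution pattern suggested above.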

Long Abstract

 

 

247B. The Dynamics of the Interactions Between Solid Tumors and Lymphocytes.

Amy H. Lin. UCSF.

amyhlin@math.vanderbilt.edu

 

A predator-prey model of solid tumor cells and lymphocytes offers a qualitative description of their behavior and allows for an interpretation of immunotherapy.
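
The abstract does not state the model equations. As a rough illustration of what a predator-prey formulation looks like, the sketch below integrates the classic Lotka-Volterra system with tumor cells as prey and lymphocytes as predators, using simple Euler steps. Everything here is an assumption made for illustration: the equations, the parameter names a, b, c and d, the constant `therapy` influx of lymphocytes standing in for immunotherapy, and the numerical values.

```python
def simulate(tumor, lymph, a, b, c, d, therapy=0.0, dt=0.001, steps=1000):
    """Euler integration of an assumed Lotka-Volterra system:

        d(tumor)/dt = a * tumor - b * tumor * lymph
        d(lymph)/dt = c * tumor * lymph - d * lymph + therapy

    `therapy` is a hypothetical constant influx of lymphocytes.
    """
    for _ in range(steps):
        d_tumor = a * tumor - b * tumor * lymph
        d_lymph = c * tumor * lymph - d * lymph + therapy
        tumor += dt * d_tumor
        lymph += dt * d_lymph
    return tumor, lymph


# Without therapy, the coexistence point (tumor = d/c, lymph = a/b) is stationary.
print(simulate(2.0, 2.0, a=1.0, b=0.5, c=0.25, d=0.5))  # (2.0, 2.0)

# Starting from the same point, an added lymphocyte influx pushes the tumor down.
tumor, lymph = simulate(2.0, 2.0, a=1.0, b=0.5, c=0.25, d=0.5, therapy=0.5)
print(tumor < 2.0, lymph > 2.0)  # True True
```

Euler stepping is used only to keep the example short; a study of long-term dynamics would call for a higher-order integrator.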

Long Abstract

 

 

248B. A Numerical Study of a Porous Media Model for Water Transport and Drug Diffusion through the Stratum Corneum.

Tatiana Marquez Lago, Bob Russell, David Muraki and Diana Allen. Dept. of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6.

ttm@math.sfu.ca

 

The stratum corneum, the uppermost set of layers in skin, is mainly responsible for skin's impermeability. In this work, an analytical approach and a numerical porous media model for water transport and drug diffusion are presented. As such an approach has never been used before, a discussion of its capabilities and extensions is included.

Long Abstract