Posters.

 

Microarrays. A. B.

Systems Biology. A. B.

Functional Genomics. A. B.

Structural Biology. A. B.

Data Visualization. A. B.

Phylogeny and Evolution. A. B.

Data Mining. A. B.

Genome Annotation. A. B.

Sequence Comparison. A. B.

Predictive Methods. A. B.

New Frontiers. A. B.

 

 

 

Session A.

 

 

Microarrays.

 

1A. Normalization Methods for cDNA Microarrays.

2A. Comparative Data Mining for Microarrays: A Case Study Based on Multiple Myeloma.

3A. A Model of Congruence in E. coli, from Microarray Data to Literature Knowledge.

4A. Decision Tree Learning-Based Characterization of the Global Effects of Cocaine Abuse on Gene Expression in the Rat Brain.

5A. Performance Analysis of an Optimal Estimator of Gene Expression Ratio.

6A. Density Estimator Self-Organizing Map for Gene Expression Analysis.

7A. Analyzing Single-slide Microarray Gene Expression Data via a Bayesian Approach.

8A. FOREL Clustering Algorithms for Functional Genomics.

9A. Nonlinear Correlation for the Analysis of Gene Expression Data.

10A. An Empirical Bias Model for Normalization of Microarray Data.

11A. EMMA - ESTs Meet Microarrays.

12A. A Gene Expression Database for Immune Cells Transcriptomes.

13A. General Optimisation Approach for Normalising cDNA Microarray Data with Replicates.

14A. A High Throughput Pipeline for Validating Novel Splice Variants Discovered Using Whole-Genome Junction Arrays.

15A. LIMS for DNA Sequence and Microarray Analysis Based on AceDB.

16A. Application of Resampling-Based Multiple Testing in the Analysis of Gene Expression in Human Peripheral Nerve Injury.

17A. Variance Stabilization, Normalization, and Power Calculations of Affymetrix Microarray Data with Application to Autism.

18A. GenMAPP: A Tool for Viewing and Analyzing Microarray Data on Biological Pathways.

19A. Use of a Native XML-Based Database and Emerging Public Standards (MAGE-ML, MIAME) in Gene Expression Array Analysis.

20A. Quantitative Treatment of cDNA Microarray Data.

21A. ROSO : A Software to Search Optimized Oligonucleotide Probes for Microarrays.

22A. Expressionist Refiner - a Software Solution for Assessment of Quality and Correction of Gene Expression Data.

23A. Exploration of the Expression and Functional Annotation of Genes Identified by Representational Difference Analysis and Global Microarrays.

24A. Optimal Design of Oligos for Micro Array Gene Expression Profiling.

25A. On the Integration of Normalization Steps and SAM in cDNA Microarray Data Analysis.

26A. Partially Supervised Clustering: A Useful Tool for Investigating Coexpression of Gene Microarray Data.

27A. Osprey: An Application for a Wide Range of Oligonucleotide Design Tasks.

28A. An Image-Based Visualization of Microarray Features and Classification Results.

29A. Preprocessing Microarray Data to Improve Power of Multiple Testing.

30A. Classification of Spot Profile in Microarray Image Data by using Statistical Characteristics.

31A. R-MDAT: Development of a GUI-based Microarray Data Analysis Tool Using R-Language.

32A. Client-Server Solution for Large Scale Gene Expression Data Mining.

33A. A Comparison of Clustering Techniques for Gene Expression Data.

34A. Can Molecular Mechanisms of Biological Processes Be Extracted from Expression Profiles? Case Study: Endothelial Contribution to Tumor-Induced Angiogenesis.

35A. MAC: A Dynamic Program that Tracks Samples between Microplates and Microarrays.

36A. Immunotranscriptomics: Analysis of Gene Expression in the Immune System.

37A. Filtering and Normalization Strategies for Microarray Generated Gene Expression Profiles.

38A(i). Blind Gene Classification : An ICA-based Gene Classification/Clustering Method.

38A(ii). Recovering Reproducible and Biologically Valuable Clusters from Noisy Array Data.

38A(iii). Evaluation of Data Reduction and Testing Methods for Oligonucleotide Arrays.

 

 

1A. Normalization Methods for cDNA Microarrays.

James J. Chen, Yi-Ju Chen and Chen-an Tsai. National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas, 72079, USA.

jchen@nctr.fda.gov

 

We present normalization methods for different cDNA microarray experiments. We consider both two-color (Cy3 and Cy5 fluorescence) and single color (or radioisotope) array images. A data set from a dye-swap experiment and another data set with several treatments and replicates are presented to illustrate the methods.

Long Abstract

 

 

2A. Comparative Data Mining for Microarrays: A Case Study Based on Multiple Myeloma.

David Page1,2, Fenghuang Zhan3, James Cussens4, Michael Waddell2, Johanna Hardin5, Bart Barlogie6 and John Shaughnessy, Jr.3. 1Dept. of Biostatistics and Medical Informatics, 2 Dept. of Computer Sciences University of Wisconsin Madison, WI 53706, 3Lambert Laboratory of Myeloma Genetics University of Arkansas for Medical Sciences Little Rock, AR 72205, 4Computer Science Dept. University of York Heslington, York, YO10 5DD, United Kingdom, 5Southwest Oncology Group Fred Hutchinson Cancer Research Center, Seattle, WA 98109 and 6Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR 72205.

mwaddell@biostat.wisc.edu

 

These studies compare SVMs, Bayesian networks, decision trees, boosted decision trees and voting (ensembles of decision stumps) on a new microarray data set for cancer (multiple myeloma) with over 100 samples. They provide evidence for several important lessons about how these techniques should be used for mining microarray data.

Long Abstract

 

 

3A. A Model of Congruence in E. coli, from Microarray Data to Literature Knowledge.

Rosa Maria Gutiérrez-Ríos1, David Rosenblueth2, Araceli Huerta-Moreno1 and Julio Collado-Vides1. 1CIFN, UNAM, A.P. 565-A, Cuernavaca, Morelos 62100, Mexico and 2IIMAS, UNAM, Ciudad Universitaria C.P. 04510, Mexico.

rmaria@cifn.unam.mx

 

Using the knowledge available in RegulonDB on regulation of transcription in E. coli, we evaluated the congruence of transcriptome experiments under different conditions. This qualitative comparison is based on a discrete model of transcriptional regulation involving direct and indirect effects following the regulatory network of interactions.

Long Abstract

 

 

4A. Decision Tree Learning-Based Characterization of the Global Effects of Cocaine Abuse on Gene Expression in the Rat Brain.

Changqing Ma1, Vanathi Gopalakrishnan2, David G. Peters3 and Robert E. Ferrell3. 1Department of Pathology, University of Pittsburgh School of Medicine, 2Department of Medicine, University of Pittsburgh School of Medicine and 3Department of Human Genetics, University of Pittsburgh School of Public Health.

chmst40@pitt.edu

 

A decision tree learning method was applied successfully to the microarray gene expression data obtained from cocaine-treated and normal tissue samples of the rat brain. The learned, highly accurate, and human-understandable model depicted a global change in gene expression among three brain regions in response to an acute dose of cocaine.

Long Abstract

 

 

5A. Performance Analysis of an Optimal Estimator of Gene Expression Ratio.

David Seale and Stephen W. Davies. Institute of Biomaterials and Biomedical Engineering and the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto.

seale@ecf.utoronto.ca

 

An optimal estimator of gene expression has been developed using a stochastic signal model for DNA microarray images. Through simulation, the performance of the estimator is analyzed and compared with traditional methods of estimating gene expression ratios. We also explore the robustness of the estimator to model inaccuracies.

Long Abstract

 

 

6A. Density Estimator Self-Organizing Map for Gene Expression Analysis.

A.D. Pascual-Montano1, A. Sesto2, A.J. Rodríguez-Sánchez1, M. Navarro2, J.L. Jorcano2 and J.M. Carazo1. 1Biocomputing Unit. Centro Nacional de Biotecnología. Campus Universidad Autónoma de Madrid. Cantoblanco 28049. Madrid, Spain and 2Dept. Molecular and Cell Biology and Gene Therapy. CIEMAT. Av. Complutense 22, 28040 Madrid, Spain.

pascual@cnb.uam.es

 

We describe the application of a new variant of a Self-Organizing Map (KerDenSOM) in the context of microarray data analysis. KerDenSOM is specially designed to find a set of representative code vectors with a probability density as similar as possible to that of the input data. KerDenSOM is available at http://www.engene.cnb.uam.es.

Long Abstract

 

 

7A. Analyzing Single-slide Microarray Gene Expression Data via a Bayesian Approach.

Grace Shieh, T. H. Fan and B.C. Chung. Inst. of Statistical Sci., Academia Sinica, Dept. of Statistics, National Central Univ. and Inst. of Molecular Biology, Academia Sinica.

gshieh@stat.sinica.edu.tw

 

Log-scaled red channel intensities from a single-slide array were fitted to their corresponding green channel intensities via a regression model. The residuals, which capture information about differentially expressed genes, were modeled by a mixture distribution, and its posterior distribution was obtained to identify differential expression.

Long Abstract

 

 

8A. FOREL Clustering Algorithms for Functional Genomics.

Andrey Ptitsyn. Pennington Biomedical Research Center.

ptitsyaa@pbrc.edu

 

A new algorithm for clustering expression profiles has been developed, particularly for analysis of irregularly shaped patterns in multidimensional space. The algorithm combines computational effectiveness with versatility, and accepts a wide variety of distance and cluster quality metrics.

Long Abstract

 

 

9A. Nonlinear Correlation for the Analysis of Gene Expression Data.

Karen M. Bloch1 and Gonzalo R. Arce2. 1DuPont Company, Wilmington, Delaware, USA and 2University of Delaware, Newark, Delaware, USA.

Karen.M.Bloch@usa.dupont.com

 

Gene expression analysis techniques that rely on linear correlation metrics only detect relationships with a large linear component. This poster illustrates the ability to determine the relationships between gene patterns based on nonlinear correlation measurements. Our results indicate that improved clustering of genes can be achieved via the proposed method.

Long Abstract

 

 

10A. An Empirical Bias Model for Normalization of Microarray Data.

Qingwei Zhang1, Nobukazu Ono2, Yoshiyuki Takahara2 and Hiroshi Tanaka1. 1Bioinformatics Dept., Medical Research Institute of Tokyo Medical and Dental University and 2Pharmaceutical Research Laboratories, Ajinomoto Co., Inc.

zhang.com@mri.tmd.ac.jp

 

We propose an empirical bias model for the normalization of microarray data, in which we normalize the original data only by subtracting the background and a constant bias, followed by principal component analysis and a consequent rotation of the coordinate system. The data showed that the method is simple and practical.

Long Abstract

 

 

11A. EMMA - ESTs Meet Microarrays.

Michael Dondrup and Alexander Goesmann. Center for Genome Research - Bielefeld University.

michael.dondrup@Genetik.Uni-Bielefeld.de

 

We have developed an open source system for efficient storage and analysis of large scale microarray data. The EMMA system is based on an object-oriented API layer that encapsulates a relational (SQL) database compliant with the MIAME standard. Besides integrating comprehensive normalization and data analysis methods, EMMA offers an interface to the GenDB annotation system.

Long Abstract

 

 

12A. A Gene Expression Database for Immune Cells Transcriptomes.

A. Splendiani, C. Vizzardelli, N. Pavelka, M. Pelizzola, M. Capozzoli, F. Granucci and P. Ricciardi-Castagnoli. Univ. Milano Bicocca.

andrea.splendiani@unimib.it

 

We are generating a gene expression database for immune cells transcriptomes. This database is complemented by a collaborative environment and upload facility.

Long Abstract

 

 

13A. General Optimisation Approach for Normalising cDNA Microarray Data with Replicates.

Ilana Saarikko1, Timo Viljanen2, Riitta Lahesmaa3, Tapio Salakoski2 and Esa Uusipaikka4. 1 Turku Centre for Biotechnology, University of Turku, Finland, 2 Department of Information Technology and Turku Centre for Computer Science, University of Turku, Finland, 3Turku Centre for Biotechnology, University of Turku, Åbo Akademi University, Finland and 4Department of Statistics, University of Turku, Finland.

ilana.saarikko@btk.utu.fi

 

We study the normalisation of cDNA microarrays based on replicated experiments. We introduce the normalisation as an optimisation problem. The general target function complies with the most commonly used normalisation methods but also allows more complicated approaches. We evaluate the normalisation by applying the analysis of variance (ANOVA) to the data and calculating the standard deviation of replicated genes.

Long Abstract

 

 

14A. A High Throughput Pipeline for Validating Novel Splice Variants Discovered Using Whole-Genome Junction Arrays.

Patrick Loerch, Chris Armour, Phil Garrett-Engele, Ralph Santos, Zhengyan Kan, Jason Johnson and Daniel Shoemaker. Rosetta Inpharmatics.

patrick_loerch@merck.com

 

Recent studies indicate that over half of all human genes undergo alternative splicing. A high-throughput microarray-based pipeline was developed to monitor and validate alternative splicing on a genome-wide scale. Here we present in detail the validation strategy to confirm array-based splicing predictions and optimize analysis algorithms.

Long Abstract

 

 

15A. LIMS for DNA Sequence and Microarray Analysis Based on AceDB.

1Ikjin Kim, 2Yoongang Hur, 3Hyunseung Lee, 3Youngsoo Park, 3Changpyo Hong, 3Yong-pyo Rim, and 1Jueson Maeng. 1Department of Life Science, Sogang University, Seoul 121-742, Korea, Department of 2Biology and 3Horticulture, Chungnam National University, Daejon 305-764, Korea.

griffin@sogang.ac.kr

 

A LIMS for DNA sequence and microarray analysis based on AceDB is proposed and has been applied to 2,688 Brassica rapa ESTs and 10 DNA microarray experiments. All the analysis tools and information have been integrated into a Brassica rapa database built upon the Arabidopsis genome database, and can be used through a GUI and over the internet.

Long Abstract

 

 

16A. Application of Resampling-Based Multiple Testing in the Analysis of Gene Expression in Human Peripheral Nerve Injury.

Yuanyuan Xiao1, Donglei Hu1, C. Anthony Hunt1, Mark R. Segal2, Andrew H. Ahn3, Douglas Rabert4, Lakshmi Sangameswaran4 and Praveen Anand5. 1Dept. of Biopharmaceutical Sciences, University of California, San Francisco, CA, USA, 2Dept. of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA, 3Dept. of Anatomy and Neurology, University of California, San Francisco, CA, USA, 4Roche Bioscience, Palo Alto, CA, USA and 5Dept. of Neurology, Imperial College School of Medicine, Hammersmith Hospital, UK.

yxiao@itsa.ucsf.edu

 

In this first microarray study of human brachial plexus injury, we applied two recently proposed algorithms for tackling the multiple testing problem in microarrays and compared their performance. We illustrate the use of appropriate multiple testing methods for microarrays in monitoring differential gene expression between different biological states.

Long Abstract

 

 

17A. Variance Stabilization, Normalization, and Power Calculations of Affymetrix Microarray Data with Application to Autism.

Sue Geller, Jeff Gregg, Paul Hagerman and David M. Rocke. University of California, Davis.

scgeller@ucdavis.edu

 

We present a method of transforming (see Durbin, Hardin, Hawkins, and Rocke) and normalizing microarray data to roughly constant variance and normal errors. This is useful since many statistical methods are based on these assumptions, as are standard power calculations. The method requires only a few chips/slides of biological replicates and can be thought of as machine calibration.

Long Abstract

 

 

18A. GenMAPP: A Tool for Viewing and Analyzing Microarray Data on Biological Pathways.

Kam D. Dahlquist, Nathan Salomonis, Karen Vranizan, Scott W. Doniger, Steven C. Lawlor, and Bruce R. Conklin. Gladstone Institute of Cardiovascular Disease, University of California, San Francisco.

kdahlquist@gladstone.ucsf.edu

 

GenMAPP (Gene MicroArray Pathway Profiler) is a free, stand-alone computer program for viewing and analyzing gene expression data on MAPPs representing biological pathways or other functional groupings of genes. GenMAPP automatically color-codes the genes on the MAPP according to criteria supplied by the user. GenMAPP is available from http://www.GenMAPP.org.

Long Abstract

 

 

19A. Use of a Native XML-Based Database and Emerging Public Standards (MAGE-ML, MIAME) in Gene Expression Array Analysis.

Ronald Taylor. Center for Computational Pharmacology, School of Medicine, University of Colorado.

ronald.taylor@uchsc.edu

 

The Center for Computational Pharmacology at the University of Colorado has created a database and web site for support of neuroscience research; in particular, to handle microarray data. A novel native XML database approach is used, in combination with the emerging public MIAME and MAGE-ML standards for gene expression data.

Long Abstract

 

 

20A. Quantitative Treatment of cDNA Microarray Data.

Tomokazu Konishi. Biotechnology Institute, Akita Prefectural University.

konishi@agri.akita-pu.ac.jp

 

A common data distribution is confirmed in many types of DNA chips used in microarray experiments. Based on the distribution, data can be processed in a quantitative manner. The processing method also identifies each experiment’s noise level, which determines the limit of signal detection, and provides information about data fidelity.

Long Abstract

 

 

21A. ROSO : A Software to Search Optimized Oligonucleotide Probes for Microarrays.

Nancie Reymond, Hubert Charles and Jean-Michel Fayard. Laboratory of Functional Biology, Insects and Interactions (BF2I), UMR INRA / INSA of Lyon, Villeurbanne, France.

nancie.reymond@jouy.inra.fr

 

The ROSO software helps researchers design optimal oligonucleotide probes according to several criteria. The software calculates the oligonucleotide specificity and the Tm value, and it checks for the absence of secondary structures. The best probes are finally selected with regard to the localization and stability criteria.

Long Abstract

 

 

22A. Expressionist Refiner - a Software Solution for Assessment of Quality and Correction of Gene Expression Data.

A. Goryachev, H. Rehrauer, M. Wendt, D. Bittner, J. Nickolenko, S. Bellamy, D. Ferguson, H. Vogel, T. Wormus and P. Norman. GeneData AG, Maulbeerstrasse 46, Basel, BS, CH-4016, Switzerland.

andrew.goryachev@genedata.com

 

Array-based methods are the core of high-throughput gene expression profiling in academic research and industry alike. Expressionist Refiner is specifically designed to bridge the gap between raw technology-dependent data and high-level data mining analysis. We present a systematic statistical approach for extracting true expression values from the raw microarray data.

Long Abstract

 

 

23A. Exploration of the Expression and Functional Annotation of Genes Identified by Representational Difference Analysis and Global Microarrays.

Tove Andersson1, Per Unneberg1, Peter Nilsson1, Jacob Odeberg1, John Quackenbush2 and Joakim Lundeberg3. 1Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden, 2The Institute for Genomic Research, Rockville, Maryland, USA and 3Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden.

tove@biotech.kth.se

 

We demonstrate how the mapping of elements in global microarrays to UniGene database entries, together with the visual projection of genes onto different classification systems, can be used to explore the differential expression and functional annotation of genes identified by representational difference analysis in a macrophage/foam cell model for atherosclerosis.

Long Abstract

 

 

24A. Optimal Design of Oligos for Micro Array Gene Expression Profiling.

Niels Tolstrup, Annett M. Frankel, Eivind Tøstesen, Jens G. Kolberg, Søren M. Echwald, Peter Stein Nielsen, Sakari Kauppinen and Henrik Vissing. Biomolecular Informatics and Expression Microarrays, Exiqon A/S.

tolstrup@exiqon.com

 

We present a system for the design of LNA modified oligos for micro arrays. It features LNA modified oligonucleotide secondary structure prediction, LNA spiked oligo melting temperature prediction, genome wide cross hybridization prediction and secondary structure prediction of the target. The system is available at http://lnatools.com/.

Long Abstract

 

 

25A. On the Integration of Normalization Steps and SAM in cDNA Microarray Data Analysis.

Daewoo Choi1, Hyo Sung Kim1 and Yong Sung Lee2. 1Department of Statistics, Hankuk Univ. of FS, Yongin, Korea and 2Department of Biochemistry, Hanyang Univ., Seoul, Korea.

3banjang@dreamwiz.com

 

In this study, we use simulated data to examine the relationship between normalization steps and SAM. We also propose an improved version of scaled normalization. As a further result of our research, we discuss an algorithm for determining asymmetric cut-points in order to increase power.

Long Abstract

 

 

26A. Partially Supervised Clustering: A Useful Tool for Investigating Coexpression of Gene Microarray Data.

R. Baumgartner1, R. Somorjai1, C. Bowman1, R. Summers1 and S. Booth2. 1Institute for Biodiagnostics, National Research Council Canada and 2National Microbiology Laboratory, Health Canada, Winnipeg, Manitoba, Canada.

christopher.bowman@nrc.ca

 

We present a hybrid gene expression study based on an experiment on genes differentially expressed in response to a neurotoxic peptide. We show that efficient incorporation of a small amount of prior knowledge about gene labels significantly improves clustering results for very low contrast-to-noise ratios, which is especially beneficial for gene microarrays.

Long Abstract

 

 

27A. Osprey: An Application for a Wide Range of Oligonucleotide Design Tasks.

Paul Gordon and Christoph Sensen. University of Calgary, Sun Center of Excellence for Visual Genomics, Faculty of Medicine, Department of Biochemistry and Molecular Biology, 3330 Hospital Drive NW, Calgary, Alberta, Canada, T2N 4N1.

gordonp@ucalgary.ca

 

Osprey is an application that efficiently automates the design of oligonucleotides for amplification, genome sequencing (walking and polishing), differential expression display and microarrays. Through a single program, users may design for any of these experiments, in the context of single clone to genome-scaled data. Constraints are based on thermodynamic models of annealing, rather than potentially inaccurate rules-of-thumb.

Long Abstract

 

 

28A. An Image-Based Visualization of Microarray Features and Classification Results.

Peter Bajcsy, Ph.D.1 and Lei Liu, Ph.D2. 1National Center for Supercomputing Applications, 605 East Springfield Avenue, Champaign, IL 61820 and 2The W. M. Keck Center for Comparative and Functional Genomics, University of Illinois at Urbana-Champaign, 330 ERML, 1201 W. Gregory Dr., Urbana, IL 6180.

pbajcsy@ncsa.uiuc.edu and leiliu@uiuc.edu

 

We present a novel image-based visualization approach for fast screening and inspection of DNA microarray data. The DNA microarray data, including laser scanned imagery, extracted features and labeled classification results, are displayed as high-dimensional images. Four types of data visualization are demonstrated: visualization with spatial information patterns, multi-feature visualization, visualization of labeled classification results and multi-grid visualization.

Long Abstract

 

 

29A. Preprocessing Microarray Data to Improve Power of Multiple Testing.

Tzulip Phang and Larry Hunter. Center for Computational Pharmacology, University of Colorado Health Sciences Center.

tzu.phang@uchsc.edu

 

We have developed preprocessing steps to eliminate meaningless genes in microarray data analysis. The gene reduction process is important to enhance the power of multiple comparison correction procedures. These preprocessing methods will screen out low variance, low mRNA level, and inconsistent genes prior to standard statistical analysis.

Long Abstract

 

 

30A. Classification of Spot Profile in Microarray Image Data by using Statistical Characteristics.

Masaru Takeya1, Masao Iwamoto1, Takehiro Matsuda2, Norimichi Tsumura2 and Yoichi Miyake2. 1National Institute of Agrobiological Sciences and 2Chiba University.

katu@affrc.go.jp

 

Scratched spots and additive-noise spots in a microarray image can be detected automatically. The mean, variance, skewness, and kurtosis of the intensity values on each spot are used as features for classification. This method is applied to classify the spots of a rice DNA microarray into four groups based on shape.

Long Abstract

 

 

31A. R-MDAT: Development of a GUI-based Microarray Data Analysis Tool Using R-Language.

Sang-Cheol Kim, Jee-Hyub Kim, Charny Park and Cheol-Goo Hur. National Center for Genome Information, KRIBB.

hurlee@mail.kribb.re.kr

 

R-MDAT is a GUI-based tool for DNA microarray analysis developed in the R language. R-MDAT integrates projection and clustering algorithms provided by the R project and is designed to accommodate new algorithms. It currently supports PCA, SVD, hierarchical clustering, k-means, SOM, and plotting functions for quality control, and will support classification analysis in the future.

Long Abstract

 

 

32A. Client-Server Solution for Large Scale Gene Expression Data Mining.

Alexander Sturn, Bernhard Mlecnik, Roland Pieler, Johannes Rainer, Thomas Truskaller and Zlatko Trajanoski. Institute of Biomedical Engineering, Graz University of Technology, Krenngasse 37, 8010 Graz, Austria.

alexander.sturn@tugraz.at

 

We have developed a platform-independent, flexible and scalable Java suite for large scale gene expression data mining, which integrates various computationally intensive hierarchical and non-hierarchical clustering algorithms. The suite includes a powerful client for data preparation and results visualization, an application server for computation and additional administration tools.

Long Abstract

 

 

33A. A Comparison of Clustering Techniques for Gene Expression Data.

Michiel de Hoon, Seiya Imoto and Satoru Miyano. University of Tokyo.

mdehoon@ims.u-tokyo.ac.jp

 

We have implemented several clustering algorithms (hierarchical, k-means, and Self-Organizing Maps) in a C subroutine library, available at http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/software.html. We assessed the suitability of these methods by applying them to expression data, using several distance measures, and comparing the clustering solutions to each other and to existing biological knowledge.

Long Abstract

 

 

34A. Can Molecular Mechanisms of Biological Processes Be Extracted from Expression Profiles? Case Study: Endothelial Contribution to Tumor-Induced Angiogenesis.

Maria Novatchkova, Alexander Schleiffer and Frank Eisenhaber. Research Institute of Molecular Pathology (IMP).

novatchkova@imp.univie.ac.at

 

The applicability of gene expression data in obtaining mechanistic information rather than diagnostic profiles was studied using expression analysis of tumor-induced angiogenesis. The interpretation of a gene expression set using advanced sequence analysis methods allowed the description of molecular processes implementing angiogenesis but did not reveal key regulatory molecules.

http://mendel.imp.univie.ac.at/SEQUENCES/TEMS

Long Abstract

 

 

35A. MAC: A Dynamic Program that Tracks Samples between Microplates and Microarrays.

Kei-Hoi Cheung1, Janet Hager2,3, Kevin White4, Kenneth Williams2,3, Kenneth Nelson5, Michael Snyder3,5, Yu Li1 and Perry Miller1,5. 1Center for Medical Informatics, 2Keck Biotechnology Resource Laboratory at Yale, 3Department of Molecular Biophysics, 4Department of Genetics and 5Department of Molecular, Cellular and Developmental Biology Yale University, New Haven, CT 06520, USA.

kei.cheung@yale.edu

 

MAC is a Web-based program that dynamically maps coordinates between microplates and spotted microarrays. Not only is the program platform-independent, but it also allows users to enter a set of parameters that cover a wide range of array configurations. MAC is available at http://yam.med.yale.edu/cgi-bin/cgiwrap/kei/kc_mac_dev8.pl.

Long Abstract

 

 

36A. Immunotranscriptomics: Analysis of Gene Expression in the Immune System.

Helen J. Kirkbride, Josef A. Walker and Darren R. Flower. The Edward Jenner Institute for Vaccine Research.

helen.kirkbride@jenner.ac.uk

 

Microarrays are being used to determine the genes involved in regulation of the human immune system. Software has been developed to cluster genes using various criteria within their expression profiles, and display biological information concerning their putative function from external sources. Regulatory interactions between genes are investigated using Bayesian statistics.

Long Abstract

 

 

37A. Filtering and Normalization Strategies for Microarray Generated Gene Expression Profiles.

Jennifer Listgarten, Kathryn Graham, Sambasivarao Damaraju, John Mackey, Carol Cass and Brent Zanke. Cross Cancer Institute, Alberta Cancer Board, Edmonton.

jennilis@cancerboard.ab.ca

 

Little consensus exists in the microarray community on methods of normalization and filtering of data. An attempt at systematic comparison of various filtering and normalization schemes was made on a set of 30 tumor samples collected through the PolyomX project. Additionally, the interaction between normalization and filtering schemes was examined.

Long Abstract

 

 

38A(i). Blind Gene Classification : An ICA-based Gene Classification/Clustering Method.

Gen Hori, Masato Inoue, Shin-ichi Nishimura and Hiroyuki Nakahara. Brain Science Institute, RIKEN, Saitama, Japan.

hori@bsp.brain.riken.go.jp

 

Blind gene classification is a method of gene classification/clustering based on the independent component analysis (ICA) of gene expression data. It finds typical expression patterns from gene expression data, exploiting higher order statistical structure of the data, and classifies genes according to those typical expression patterns.

Long Abstract

 

 

38A(ii). Recovering Reproducible and Biologically Valuable Clusters from Noisy Array Data.

Donna Slonim1, Andrew Hill1, Ryan Baugh2 and Craig Hunter2. 1Department of Genomics, Wyeth Research, 35 CambridgePark Drive, Cambridge, MA 02140, USA and 2Dept. of Molecular and Cellular Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02139.

dslonim@wyeth.com

 

Clustering microarray data to determine global expression patterns in a biological system can be done by a variety of methods. Analyzing data from a C. elegans embryonic development series, we use promoter characterization to evaluate the practical impact of guaranteeing cluster quality through robustness estimation.

Long Abstract

 

 

38A(iii). Evaluation of Data Reduction and Testing Methods for Oligonucleotide Arrays.

Andrew Hill, Donna Slonim and Yizheng Li. Wyeth Research, 35 Cambridge Park Drive, Cambridge, MA 02140, USA.

ahill@genetics.com

 

Oligonucleotide arrays simultaneously monitor the expression levels of ~1e4 genes. Using replicate data and datasets from spiking experiments carried out by Affymetrix, we assess the statistical properties of array readouts. Comparison of normalization and probe summary schemes illustrates that recently described data reduction methodologies can outperform current methods.

Long Abstract

 

 

Systems Biology.

 

39A. Computational Model of the Mammalian Cell Cycle using Hybrid Petri Net.

40A. Modeling Mutations, Abnormal Processes, and Disease Phenotypes, using a Workflow/Petri Net Model.

41A. FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps.

42A. CELLML: a Language for the Definition and Exchange of Cellular Models.

43A. BIOCAD for Constructing Gene Regulatory Networks.

44A. Probabilistic Boolean Networks as Models of Gene Regulatory Networks: From Inference to Intervention.

45A. In silico Studies of Integrated Gene Expression, Protein, and Metabolite Profiles.

46A. Modeling and Visualization of the Pattern Formation in Drosophila melanogaster by Genomic Object Net.

47A. Generating Petri Nets for Metabolic Network Modelling.

48A. Design and Implementation of a Knowledge-Base for Pharmacology.

49A. Identifying Functionally Important Protein Regions via Protein Family Correlation Analysis and Atomistic Energy Calculations.

50A. Regulatory Network of Transcriptional Regulation in E. coli.

51A. An In Silico Experimental Device for Drug Transport Research.

52A. A Domain-Specific Ontology and Knowledge Base for Signal Transduction.

53A. Pathway Reconstruction and Semantic Data Integration.

54A. Genomic Object Net (Ver.1.0): A Platform for Biopathway Modeling and Simulation.

55A. Modeling the Blood-Brain Barrier and Transporter Expression.

56A. Distributed Agent-Based Software Architectures for Bio-Pathway Simulation.

57A(i). Robustness in Models of the MAPK-cascade.

57A(ii). The BioCyc Collection of Pathway/Genome Databases.

 

39A. Computational Model of the Mammalian Cell Cycle using Hybrid Petri Net.

Takashi Yoshioka1, Shuji Kotani2 and Akihiko Konagaya2,3. 1NTT Data Co. Ltd., Kayabacho Tower, Shinkawa 1-21-2, Chuo-ku, Tokyo 104-0033, Japan, 2RIKEN Genomic Sciences Center (GSC), Suehiro-cho 1-7-22-W519, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan and 3Graduate School of Knowledge Science, Japan Advanced Institute of Science and Technology, Hokuriku (JAIST), Asahidai 1-1, Tatsunokuchi, Ishikawa 923-1292, Japan.

yossie@rd.nttdata.co.jp

 

To describe the molecular mechanism of the cell cycle, we developed a new computational model that makes use of hybrid-Petri nets. This model can easily reproduce “knockout” or overexpression of specific genes, so this model is a potentially useful means of exploring relations between gene functions and diseases.
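As a rough illustration of how a Petri net can mimic a gene "knockout", the sketch below uses a purely discrete net (a hybrid Petri net would additionally carry continuous places and rates, which are omitted here). The place and transition names (`precursor`, `cyclin`, `cdk`, `complex`, `a_express`, `b_bind`) are hypothetical and are not taken from the authors' model.

```python
from typing import Dict, FrozenSet, Tuple

Marking = Dict[str, int]
# A transition is (tokens consumed per input place, tokens produced per output place).
Transition = Tuple[Dict[str, int], Dict[str, int]]


def enabled(marking: Marking, transition: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking: Marking, transition: Transition) -> Marking:
    """Return the marking obtained by firing an enabled transition."""
    inputs, outputs = transition
    new_marking = dict(marking)
    for place, n in inputs.items():
        new_marking[place] -= n
    for place, n in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


def run(marking: Marking, transitions: Dict[str, Transition], steps: int,
        knocked_out: FrozenSet[str] = frozenset()) -> Marking:
    """Fire the first enabled transition (in name order) at each step.

    Transitions listed in `knocked_out` are never fired, which is one
    simple way to imitate a gene knockout.
    """
    for _ in range(steps):
        for name in sorted(transitions):
            if name not in knocked_out and enabled(marking, transitions[name]):
                marking = fire(marking, transitions[name])
                break
        else:
            break  # nothing is enabled: the net is dead
    return marking


# Hypothetical toy net: expression produces cyclin, which binds a kinase.
toy_net: Dict[str, Transition] = {
    "a_express": ({"precursor": 1}, {"cyclin": 1}),
    "b_bind": ({"cyclin": 1, "cdk": 1}, {"complex": 1}),
}
start: Marking = {"precursor": 3, "cyclin": 0, "cdk": 2, "complex": 0}

wild_type = run(start, toy_net, steps=10)
knockout = run(start, toy_net, steps=10, knocked_out=frozenset({"a_express"}))
```

In this toy net the wild-type run accumulates `complex` tokens, while the run with `a_express` disabled never forms any, because `b_bind` is never enabled.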

Long Abstract

 

 

40A. Modeling Mutations, Abnormal Processes, and Disease Phenotypes, using a Workflow/Petri Net Model.

Mor Peleg, Irene S. Gabashvili and Russ B. Altman. Stanford Medical Informatics, Stanford University, Stanford, CA, USA.

peleg@smi.stanford.edu

 

We developed a qualitative model of molecular function that links genetic polymorphisms of tRNA to affected cellular processes and disease phenotypes. Our model is based on a Workflow model that is mapped to Petri Nets, and incorporates controlled biomedical vocabularies. It enables querying and simulation. The model is available at http://www.smi.stanford.edu/people/peleg/Process_Model.htm.

Long Abstract

 

 

41A. FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps.

Julie Dickerson1, Zach Cox1 and Andy Fulmer2. 1Iowa State University and 2Procter & Gamble.

julied@iastate.edu

 

The FCModeler tool uses fuzzy methods for modeling signal transduction, gene regulatory, and metabolic networks and interprets the results using fuzzy cognitive maps. The front end of the system is a dynamic graph visualization system that uses graph theoretic models to analyze the structure of metabolic maps and search for pathway interactions.

Long Abstract

 

 

42A. CELLML: a Language for the Definition and Exchange of Cellular Models.

Bullivant, DP, Cuellar, AA, Hedley, WJ, Hunter, PJ, Nelson, MR and Nielsen, PF. The Bioengineering Institute, The University of Auckland, New Zealand and Physiome Sciences, Inc, Princeton, New Jersey.

a.cuellar@auckland.ac.nz

 

CellML is an XML-based exchange format for describing the structure and underlying mathematics of cellular level models. The language specification has been developed by the Bioengineering Institute at the University of Auckland, in collaboration with Physiome Sciences, Inc. The language specification, tools, and examples are available on http://www.cellml.org.

Long Abstract

 

 

43A. BIOCAD for Constructing Gene Regulatory Networks.

Hiroyuki Kurata. Kyushu Institute of Technology.

kurata@bse.kyutech.ac.jp

 

BIOCAD is a powerful software suite with a GUI for constructing large-scale maps of complicated biochemical reaction networks, in which chemical reaction equations with detailed attribute tags are employed in an XML-based common representation. A novel notation for designing the map is developed by improving Kohn's method.

Long Abstract

 

 

44A. Probabilistic Boolean Networks as Models of Gene Regulatory Networks: From Inference to Intervention.

Ilya Shmulevich1, Edward Dougherty2, Seungchan Kim3 and Wei Zhang1. 1University of Texas MD Anderson Cancer Center, 2Texas A&M University and 3NHGRI/NIH.

is@ieee.org

 

We introduce Probabilistic Boolean Networks (PBN) as models of gene regulatory networks. PBNs incorporate rule-based dependencies between genes, allow the systematic study of global network dynamics, are able to cope with uncertainty, and permit the quantification of the relative influence of genes on other genes.
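The idea of a PBN can be sketched in a few lines, assuming the usual formulation in which each gene has several candidate Boolean predictor functions, one of which is selected at random (with a fixed probability) at every synchronous update. The two-gene network, its functions, and the probabilities below are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Predictor = Callable[[State], int]
# For each gene: a list of (selection probability, Boolean predictor function).
Candidates = Sequence[Tuple[float, Predictor]]


def pbn_step(state: State, predictors: Sequence[Candidates],
             rng: random.Random) -> State:
    """One synchronous update: pick one predictor per gene, then apply it."""
    next_state = []
    for candidates in predictors:
        weights = [prob for prob, _ in candidates]
        functions = [func for _, func in candidates]
        chosen = rng.choices(functions, weights=weights, k=1)[0]
        next_state.append(chosen(state))
    return tuple(next_state)


def simulate(state: State, predictors: Sequence[Candidates],
             steps: int, seed: int = 0) -> List[State]:
    """Return the trajectory of states visited, including the initial state."""
    rng = random.Random(seed)
    trajectory = [state]
    for _ in range(steps):
        state = pbn_step(state, predictors, rng)
        trajectory.append(state)
    return trajectory


# Hypothetical two-gene network.
toy_predictors: List[Candidates] = [
    # Gene 0 always copies gene 1.
    [(1.0, lambda s: s[1])],
    # Gene 1 is usually NOT gene 0, occasionally (gene 0 AND gene 1).
    [(0.7, lambda s: 1 - s[0]), (0.3, lambda s: s[0] & s[1])],
]

trajectory = simulate((1, 0), toy_predictors, steps=20)
```

Setting every selection probability to 1.0 recovers an ordinary deterministic Boolean network, which is a convenient sanity check for the update rule.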

Long Abstract

 

 

45A. In silico Studies of Integrated Gene Expression, Protein, and Metabolite Profiles.

Matej Oresic and Tom Plasterer. Beyond Genomics, Inc.

moresic@beyondgenomics.com

 

We present a systems biology platform for analysis of metabolite, protein, and gene expression data. Our approach includes unique normalization and integration of data, as well as data- and knowledge-driven pathway analysis, which in turn, leads to new hypotheses and further experiments.

Long Abstract

 

 

46A. Modeling and Visualization of the Pattern Formation in Drosophila melanogaster by Genomic Object Net.

Hiroshi Matsuno1, Rie Yamane1, Sachie Fujita1, Naoyuki Yamasaki1, Ryutaro Murakami1 and Satoru Miyano2. 1Faculty of Science, Yamaguchi University and 2Human Genome Center, Institute of Medical Science, University of Tokyo.

matsuno@sci.yamaguchi-u.ac.jp

 

Genomic Object Net is a biosimulation system based on the hybrid functional Petri net architecture and XML technology. With this system, we model two pattern formations driven by Notch signaling in Drosophila melanogaster and provide intuitive visualizations of the simulation results. Genomic Object Net can be accessed through http://www.GenomicObject.Net.

Long Abstract

 

 

47A. Generating Petri Nets for Metabolic Network Modelling.

Rainer König, Marco Weismüller and Roland Eils. Intelligent Bioinformatics Systems, German Cancer Research Center, 69120 Heidelberg, Germany.

r.koenig@dkfz.de

 

To define detailed, organism-specific metabolic Petri nets, we compile a metabolic net by combining the KEGG metabolic database, from which we take enzymatic reactions and classifications of metabolic subnets, with the sequence-based databases Swissprot and Embl/Genbank, from which we obtain enzyme locations and organism-specific abundances. Furthermore, we compare the net's connectivity with that of scale-free networks.

Long Abstract

 

 

48A. Design and Implementation of a Knowledge-Base for Pharmacology.

George Acquaah-Mensah and Larry Hunter. Center for Computational Pharmacology, University of Colorado School of Medicine, Denver, Colorado, USA.

George.Acquaah-Mensah@uchsc.edu

 

We present an object-oriented knowledge representation that captures key pharmacological concepts: ligands, proteins, processes and anatomy. Ligands interact in myriad ways with biomolecules (such as proteins), initiating, altering or even terminating diverse physiological and behavioral processes. These interactions lie at the heart of events of significance to pharmacology.

Long Abstract

 

 

49A. Identifying Functionally Important Protein Regions via Protein Family Correlation Analysis and Atomistic Energy Calculations.

Manish C. Saraf, Gregory L. Moore and Costas D. Maranas. Dept. of Chemical Engineering, Pennsylvania State University.

costas@psu.edu

 

A key challenge in using combinatorial libraries for protein engineering is that most of the library members are non-functional and often do not even fold correctly. We elucidate favorable/unfavorable residue combinations by (i) analyzing protein families for correlation and (ii) constructing sequence ensembles based on internal energy calculations with CHARMM.

Long Abstract

 

 

50A. Regulatory Network of Transcriptional Regulation in E. coli.

Gama-Castro S., A. Martínez-Antonio, R. Gutiérrez-Ríos, H.P. Salgado, M. Spínola, A. Santos-Zavaleta and J. Collado-Vides. Program of Computational Genomics, CIFN, UNAM, A.P. 565-A, Cuernavaca, Morelos 62100, México.

sgama@cifn.unam.mx

 

Using the information contained in the RegulonDB database, we obtained the regulatory network of transcriptional regulators in E. coli. The interactions found in the network were analyzed in terms of known operons and promoters, and in terms of the conditions of expression and repression of the regulated genes.

Long Abstract

 

 

51A. An in silico Experimental Device for Drug Transport Research.

Yu Liu, Carolyn Cummins and C. Anthony Hunt. UCSF/UCB Joint Graduate Group in Bioengineering, University of California, Berkeley, CA 94720, USA and Department of Biopharmaceutical Sciences, University of California, San Francisco, CA 94143, USA.

yuliu@socrates.berkeley.edu

 

We use an agent-based model as an in silico experimental device that represents a mammalian intestinal epithelium. It mimics key features of the in vitro Caco-2 epithelial cell model. This innovative device has been designed to yield drug transport data that matches experimental data when given the drug’s physicochemical properties.

Long Abstract

 

 

52A. A Domain-Specific Ontology and Knowledge Base for Signal Transduction.

Jens Eberlein and Lawrence Hunter. Center for Computational Pharmacology, University of Colorado School of Medicine.

jens.eberlein@uchsc.edu

 

We have developed a rich knowledge base of known signal transduction pathways in order to (1) facilitate the interpretation of gene expression data, (2) provide prior probabilities over the structure and content of probabilistic network models, and (3) provide knowledge useful in natural language understanding.

Long Abstract

 

 

53A. Pathway Reconstruction and Semantic Data Integration.

Roland Carel, Ph.D., Krzysztof Jezak, Russ Green and Jack Pollard, Ph.D. 3rd Millennium.

jpollard@3rdmill.com

 

We will describe a semantic integration technology driven by ontologies for researchers analyzing biological pathways. This technology is capable of extracting and integrating data from a variety of genomic, interaction, and pathway sources and allows users to define their own integration processes.

Long Abstract

 

 

54A. Genomic Object Net (Ver.1.0): A Platform for Biopathway Modeling and Simulation.

Nagasaki, M., A. Doi, M. Sasaki, C.J. Savoie, H. Matsuno and S. Miyano. University of Tokyo.

miyano@ims.u-tokyo.ac.jp

 

Genomic Object Net has been revised and reimplemented from scratch in Java so that it works as a general platform for biopathway modeling and simulation. The new version employs an extension of the hybrid functional Petri net architecture, XML pathway/data documentation, and a GUI that allows more biologically intuitive usage.

Long Abstract

 

 

55A. Modeling the Blood-Brain Barrier and Transporter Expression.

Amina Qutub1, Tomoki Hashimoto2 and C. Anthony Hunt1,3. 1UCSF/UCB Joint Graduate Group in Bioengineering, Berkeley and San Francisco, 2UCSF Center for Cerebrovascular Research and 3Department of Biopharmaceuticals, UCSF San Francisco, CA, USA.

aminaq@socrates.berkeley.edu

 

This research presents a new computer-based experimental model that simulates transporter expression and function at the blood-brain barrier membrane. Specifically, this project focuses on modeling the essential parameters in the transport of glucose to the brain through the GLUT1 membrane transporter.

Long Abstract

 

 

56A. Distributed Agent-Based Software Architectures for Bio-Pathway Simulation.

Doheon Lee1, Kwang-Hyung Lee1 and Yonggwan Won2. 1Department of BioSystems, KAIST, Daejeon, Korea and 2Department of Computer Engineering, Chonnam Nat’l Univ. Gwangju, Korea.

dhlee@mail.kaist.ac.kr

 

This paper proposes software architectures for bio-pathway simulation based on distributed agent technology. The fundamental advantages of distributed agents are their effectiveness in handling heterogeneous pathway information and efficiency in scalability. They also utilize XML to represent semi-structured data such as quantitative reaction rules, and adopt Petri nets to model concurrent pathway executions.

Long Abstract

 

 

57A(i). Robustness in Models of the MAPK-cascade.

Nils Bluethgen and Hanspeter Herzel. Theoretical Biology, Humboldt University Berlin.

nils.bluethgen@itb.biologie.hu-berlin.de

 

Using dynamical models we analyse the robustness of features of signaling cascades. This allows us to predict in silico the function of a cascade. We show that the well conserved MAPK-cascade shows highly robust ultrasensitivity suggesting that the cascade works as a switch in the intracellular signaling network.
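One common way to quantify ultrasensitivity is the effective Hill coefficient, n_eff = ln(81) / ln(EC90 / EC10), where EC10 and EC90 are the stimulus levels giving 10% and 90% of the maximal response. The sketch below computes it numerically for any monotonically increasing dose-response function. The two-stage cascade of Hill-type steps and its parameters are hypothetical stand-ins, not the authors' MAPK model.

```python
import math
from typing import Callable

Response = Callable[[float], float]


def hill(x: float, k: float = 1.0, n: float = 1.0) -> float:
    """Hill-type response with half-saturation constant k and exponent n."""
    return x ** n / (k ** n + x ** n)


def dose_for_fraction(response: Response, fraction: float,
                      lo: float = 1e-9, hi: float = 1e9,
                      iterations: int = 200) -> float:
    """Stimulus giving `fraction` of the maximal response.

    Assumes `response` is increasing on [lo, hi] and that response(hi)
    approximates the maximum. Bisection is done on a logarithmic scale.
    """
    target = fraction * response(hi)
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if response(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def effective_hill(response: Response) -> float:
    """Effective Hill coefficient ln(81) / ln(EC90 / EC10)."""
    ec10 = dose_for_fraction(response, 0.1)
    ec90 = dose_for_fraction(response, 0.9)
    return math.log(81.0) / math.log(ec90 / ec10)


def cascade(x: float) -> float:
    """Hypothetical two-stage cascade: the output of stage 1 drives stage 2."""
    return hill(hill(x, k=1.0, n=2.0), k=0.5, n=2.0)


single_stage = effective_hill(lambda x: hill(x, k=1.0, n=2.0))
two_stage = effective_hill(cascade)
```

For a single Hill function the effective coefficient equals its exponent, so `single_stage` serves as a check on the numerics; a robustness analysis in this spirit would repeat the calculation while varying the kinetic parameters.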

Long Abstract

 

 

57A(ii). The BioCyc Collection of Pathway/Genome Databases.

Peter D. Karp, Cindy Krieger, Suzanne Paley, John Pick and Pedro Romero. SRI International.

pkarp@ai.sri.com

 

BioCyc is a collection of Pathway/Genome Databases (PGDBs) that are available at the SRI Web site (http://BioCyc.org), and for local installation. The BioCyc collection includes the EcoCyc E. coli database, the MetaCyc database containing 450 metabolic pathways from 150 different organisms, and PGDBs from 12 additional microbes.

Long Abstract

 

 

Functional Genomics.

 

58A. No poster.

59A. The European Comparative Genetic Resource (EuroCOMP).

60A. Haplotype Variation in Human G Protein-Coupled Receptor (GPCR) Genes.

61A. Automated Analysis of MALDI-TOF Mass Spectrometry SNP Genotyping Data.

62A. Prediction of Protein Function and Interaction from Complete Genomes.

63A. In-silico Genomics : A Bioinformatic Analysis of Retinoblastoma-specific RB1 Mutational Spectra.

64A. Detection of Regulatory Circuits by Integration of Protein-Protein and Protein-DNA Interaction Data.

65A. GoPArc.

66A. ProDB: Integrating Proteome and Genome Data.

67A. GTKrio: an Open Source Environment for Functional Genomics.

68A. Interaction Generality, a Measurement to Assess Reliability of Protein-Protein Interaction.

69A. GelScape: An Interactive Web-based Gel Viewing and Annotation System.

70A. ESDB: A Web-Based Application for Analyzing and Managing DNA Sequences Disrupted By Tagged Sequence Mutagenesis From Mouse Embryonic Stem Cells.

71A. caCORE: A Package of Object Models, Databases, Controlled Vocabularies, and APIs for Genomic and Clinical Application Development.

72A. IPPRED: Server for Protein Interactions Inference.

73A. "Gene Discovery": Search for Regularities in Gene Promoters.

74A. The International Rice Information System: A Platform for Meta-Analysis of Rice Data.

75A. How many SNPs Do We Need for Whole-Genome Linkage Disequilibrium Mapping?

76A. Bioinformatics in a Fully Automated Cellular Perturbation Environment for the Identification of Medically Relevant Genes.

77A. The Rice Growth Monitoring System for Phenotypic Functional Analysis.

78A. Mapping DNA Regulatory Sequences to a Metabolic Network.

79A. ADDA: A Novel Method for Partitioning Protein Sequences into Domains.

80A. Protein-Protein Interaction Analysis of Transcription Factors and Its use for the Identification of Cooperatively Acting Transcription Factors.

81A. The Eukaryotic Core Proteome.

82A. Alternative Splice Variants as Natural Competitive Inhibitors of Known Proteins.

 

58A. No poster.

 

59A. The European Comparative Genetic Resource (EuroCOMP).

Jitka Sengerova1, Hrabe de Angelis M.2, Beckers J.2, Blanquet V.2, Fuchs H.2, Hahn A.2, Schaeble K.2, Schneider R.2, Soewarto D.2, Tiedemann H.2, Werner T.2, Peters J.3, Greenfield A.3, Nolan P.3, Herault Y.4 and Scherf M.5. 1EMBL-EBI, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, 2GSF Munich, 3MRC Harwell, 4CDTA-CNRS Orleans and 5Genomatix Munich.

jitka@ebi.ac.uk

 

The mouse has become an important model organism for understanding the function of novel genes and is an invaluable resource for the investigation of human disease processes. EuroCOMP aims to provide the scientific community with new mutations created in a large-scale phenotype-driven mutagenesis program. Relevant information will be accessible through the WWW.

Long Abstract

 

 

60A. Haplotype Variation in Human G Protein-Coupled Receptor (GPCR) Genes.

Ruhong Jiang, Debra Tanguay, Zongliang Mu, Ping Zhan, Manish Pungliya, Julie Schneider, Min Wei, Carole Harris-Kerr, Jicheng Duan, Krishnan Nandabalan, J. Claiborne Stephens, and Chuanbo Xu. Genaissance Pharmaceuticals, Inc. New Haven, CT 06511, USA.

r.jiang@genaissance.com

 

We have investigated haplotype variation in human GPCR genes by using a bioinformatics approach. Our findings showed that GPCR genes have a substantial amount of genetic variability in both coding and noncoding regions. An understanding of the nature of this genetic variability has important implications for drug development and optimization.

Long Abstract

 

 

61A. Automated Analysis of MALDI-TOF Mass Spectrometry SNP Genotyping Data.

Jan-Henner Wurmbach1, Jens Decker2, Thomas Schott1, Markus Kostrzewa2, Herbert Thiele1 and Wolfgang Pusch1. 1Bruker Daltonik Fahrenheitstrasse 4 28359 Bremen Germany and 2Bruker Saxonia Analytik Permoser Strasse 15 04318 Leipzig Germany.

WPU@bdal.de

 

Bioinformatic tools can follow various approaches for automated SNP genotyping from MALDI-TOF mass spectrometry raw data. Typically, the mass spectra displaying allele-specific detector molecules are analyzed by classical peak-picking algorithms. However, for complex multiplex spectra we also pursue fuzzy-logic and fast Fourier transformation/correlation approaches.

Long Abstract

 

 

62A. Prediction of Protein Function and Interaction from Complete Genomes.

Bingding Huang and Yixue Li. Bioinformatics Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences.

markero@263.net

 

The fully sequenced genomes of over 50 organisms have led to rapid growth of sequence and related databases, leaving a vast number of genes unannotated. For genome projects to be successful, there must be fast and reliable ways to identify the functions of unknown proteins. Various new computational methods have recently been proposed to predict protein function and protein-protein interaction from genome sequences. Here, we review several approaches for the functional assignment of uncharacterized proteins and then present a novel and effective method to detect protein-protein interactions from completely sequenced genomes based on gene fusion events.
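The general gene-fusion idea can be sketched as follows: if two separate proteins in one genome each match a different, non-overlapping region of a single fused ("composite") protein in another genome, the pair is flagged as a candidate interaction. This is a generic illustration under that assumption and not the specific method of the poster. The input format and the function name `predict_by_fusion` are hypothetical, and the homology hits are assumed to have been computed beforehand.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, Set, Tuple

# (query protein, composite protein, alignment start, alignment end),
# with coordinates given on the composite protein (1-based, inclusive).
Hit = Tuple[str, str, int, int]


def predict_by_fusion(hits: Iterable[Hit],
                      max_overlap: int = 0) -> Set[Tuple[str, str]]:
    """Return query pairs that align to distinct regions of one composite."""
    by_composite = defaultdict(list)
    for query, composite, start, end in hits:
        by_composite[composite].append((query, start, end))

    pairs: Set[Tuple[str, str]] = set()
    for regions in by_composite.values():
        for (q1, s1, e1), (q2, s2, e2) in combinations(regions, 2):
            if q1 == q2:
                continue
            # Number of shared positions; zero or negative means disjoint.
            overlap = min(e1, e2) - max(s1, s2) + 1
            if overlap <= max_overlap:
                pairs.add(tuple(sorted((q1, q2))))
    return pairs


# Hypothetical hits: A and B cover different halves of composite F1,
# while C overlaps both and D is the only hit on F2.
example_hits = [
    ("A", "F1", 1, 200),
    ("B", "F1", 210, 450),
    ("C", "F1", 150, 300),
    ("D", "F2", 1, 100),
]
candidates = predict_by_fusion(example_hits)
```

In practice such candidates would still need filtering, for example to exclude promiscuous domains that occur in many unrelated composite proteins.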

Long Abstract

 

 

63A. In-silico Genomics : A Bioinformatic Analysis of Retinoblastoma-specific RB1 Mutational Spectra.

S. Lithwick1, A. Fadiel2, A. J. Cuticchia1 and B. Gallie3. 1University of Toronto, Toronto, Canada, 2Centre for Computational Biology, Hospital for Sick Children, Toronto, Canada and 3Ontario Cancer Institute, Princess Margaret Hospital, Toronto, Canada.

stuart.lithwick@utoronto.ca

 

Non-coding regions of the RB1 gene have been examined using bioinformatics techniques to identify regulatory sequences that might be subject to mutation in certain retinoblastoma tumors, which lack coding sequence changes. Also, germline and somatic RB1 mutational datasets have been compared and contrasted to identify tissue-specific patterns.

Long Abstract

 

 

64A. Detection of Regulatory Circuits by Integration of Protein-Protein and Protein-DNA Interaction Data.

Esti Yeger-Lotem 1,2 and Hanah Margalit2. 1Department of Computer Science, Technion, Haifa 32000, Israel and 2Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, POB 12272, Jerusalem 91120.

estiy@cs.technion.ac.il

 

A major post-genomic challenge is to reveal the interplay between genes and proteins within a living cell. Using a novel application of classical graph algorithms we integrate data of yeast protein-protein and protein-DNA interactions, and exploit it for the discovery of simple and complex multi-level regulatory circuits.

Long Abstract

 

 

65A. GoPArc.

Daniela Bartels, Alexander Goesman and Folker Meyer. Bielefeld University, 33594 Bielefeld, Bielefeld, NRW, 33594, Germany.

daniela.bartels@genetik.uni-bielefeld.de

 

We present a comprehensive framework for the integration of gene ontologies and metabolic pathways. GoPArc provides an object oriented API to view genome, transcriptome and proteome data from the perspective of GO categories, TIGR roles, Monica Riley categories and KEGG pathways. The system is based on a relational database and offers an extensible interface for GenDB, EMMA and other systems.

Long Abstract

 

 

66A. ProDB: Integrating Proteome and Genome Data.

Andreas Wilke, Christian Rückert and Folker Meyer. Center for Genome Research, Bielefeld University, Germany.

Andreas.Wilke@genetik.uni-bielefeld.de

 

We have developed an open source system that acts as a connection layer between mass spectrometry data and the GenDB (http://gendb.genetik.uni-bielefeld.de) annotation system. The system allows analysis of the data with Mascot, and results are automatically presented to the user. The system is based on a relational database backend for storing mass spectra together with experimental data.

Long Abstract

 

 

67A. GTKrio: an Open Source Environment for Functional Genomics.

A. Splendiani, C. Vizzardelli, N. Pavelka, M. Pelizzola, M. Capozzoli, F. Granucci, P. Ricciardi-Castagnoli, E. Virzi and P. Fantucci. Univ. Milano Bicocca.

andrea.splendiani@unimib.it

 

We propose an open source project for gene expression data analysis and functional genomics. It will be built around a set of technologies that will allow cross-platform capabilities without sacrificing performance, and will be open and allow the integration of software written in many languages.

Long Abstract

 

 

68A. Interaction Generality, a Measurement to Assess Reliability of Protein-Protein Interaction.

Harukazu Suzuki, Rintaro Saito and Yoshihide Hayashizaki. Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), Yokohama, Japan.

harukazu@gsc.riken.go.jp

We introduce the “interaction generality” measure, which can be used to computationally assess the reliability of the protein-protein interaction data by using only a list of interactions. We also report the results of networks of interaction data that we made more reliable by applying this method.
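To show how a reliability score can be derived from nothing but a list of interactions, the sketch below assumes one simple reading of such a measure: for an interaction A-B, count the proteins that bind A or B and have no other partner, then add one. The abstract does not give the definition, so this scoring rule and the function name `interaction_generality` should be treated as assumptions. Under this reading, high scores point to "sticky" proteins whose interactions are more likely to be artifacts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def interaction_generality(interactions: Iterable[Pair]) -> Dict[Pair, int]:
    """Score each interaction using only the interaction list itself."""
    interactions = list(interactions)

    # Build the neighbour sets, ignoring self-interactions.
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    scores: Dict[Pair, int] = {}
    for a, b in interactions:
        if a == b:
            continue
        around = (neighbours[a] | neighbours[b]) - {a, b}
        # Partners of A or B that interact with nothing else.
        dead_ends = {p for p in around if len(neighbours[p]) == 1}
        scores[tuple(sorted((a, b)))] = 1 + len(dead_ends)
    return scores


# Hypothetical data: "hub" binds three otherwise unconnected proteins,
# whereas p, q and r form a mutually supporting triangle.
example = [("hub", "x1"), ("hub", "x2"), ("hub", "x3"),
           ("p", "q"), ("q", "r"), ("r", "p")]
scores = interaction_generality(example)
```

Here the hub interactions receive a higher score than the triangle interactions, so a threshold on the score could be used to filter the list.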

Long Abstract

 

 

69A. GelScape: An Interactive Web-based Gel Viewing and Annotation System.

Nelson Young, Zhan Chang and David S. Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta.

ny1@ualberta.ca

 

GelScape is an interactive Java-based application that offers a wide variety of tools to annotate, view, and archive 1D or 2D gels. In addition, GelScape has comprehensive image manipulation capabilities that permit spot quantification as well as warping and matching of gels.

Long Abstract

 

 

70A. ESDB: A Web-Based Application for Analyzing and Managing DNA Sequences Disrupted By Tagged Sequence Mutagenesis From Mouse Embryonic Stem Cells.

Liu S-Y, Mou Y, Delange L, Tsuyuki D, Arapovic D and Hicks GG. Manitoba Institute of Cell Biology, University of Manitoba, Winnipeg, Manitoba.

soliu@cc.umanitoba.ca

 

ESDB was developed to organize information on thousands of new ES cell clones and improve the sequence analysis strategy to identify target genes from short sequence tags. The application will automate data entry, data validation, Blast results, keywords, and hyperlinks to gene-specific identifiers in GenBank, LinkOut, Pubmed, and OMIM.

Long Abstract

 

 

71A. caCORE: A Package of Object Models, Databases, Controlled Vocabularies, and APIs for Genomic and Clinical Application Development.

Peter A. Covitz1, Himanso Sahni2, Scott Gustafson2, Frank Hartel1, Sherri De Coronado1, Gilberto Fragoso1, Jean-Jaques Maurer3, Lisa Chatterjee3, Carl Schaefer1 and Kenneth Buetow1. 1National Cancer Institute Center for Bioinformatics, Rockville, MD, USA; 2Science Applications International Corporation, Annapolis, MD, USA; 3Oracle Corporation, Reston, VA, USA.

covitzp@mail.nih.gov

 

caCORE is a combination of largely open source technologies that support data management, access and vocabulary control for genomic, biological pathway, and clinical trials research. caCORE is intended to support biomedical applications that bridge genomics to clinical research. caCORE and an example of such a bridging application will be presented.

Long Abstract

 

 

72A. IPPRED: Server for Protein Interactions Inference.

Nicolas Goffard, Virginie Garcia, Alexis Groppi and Antoine de Daruvar. Centre de Bioinformatique Bordeaux, Université V. Segalen, Bordeaux 2, Bordeaux, France.

nicolas.goffard@pmtg.u-bordeaux2.fr

 

IPPRED is a web-based server for inferring protein interactions. This simple inference by homology makes it possible to propose or validate potential interactions. In some cases, the inference also gives indications concerning the domains involved in the interaction. IPPRED is available at http://cbi.labri.fr/ippred.

Long Abstract

 

 

73A. "Gene Discovery": Search for Regularities in Gene Promoters.

Yury L. Orlov, Mikhail A. Pozdniakov, Nikolay A. Kolchanov and Eugenii E. Vityaev. Institute of Cytology and Genetics, Novosibirsk, Russia.

orlov@bionet.nsc.ru

 

The PC software system "Gene Discovery" discovers regularities connecting nucleotide sequences of promoter regions with the functional class of corresponding genes. The system constructs specific oligonucleotide patterns as first-order logic expressions. These patterns selected for co-regulated genes annotated in the TRRD database (http://www.mgs.bionet.nsc.ru/mgs/gnw/trrd/) predict promoters with high specificity.

Long Abstract

 

 

74A. The International Rice Information System: A Platform for Meta-Analysis of Rice Data.

R. Bruskiewich, A. Cosico, W. Eusebio, A. Portugal, L. Ramos, T. Reyes, V. Ulat and C. G. McLaren. International Rice Research Institute (IRRI), DAPO 7777, Metro Manila, Philippines. http://www.irri.org

r.bruskiewich@cgiar.org

 

The International Rice Information System (IRIS, http://www.iris.irri.org) is the rice implementation of the International Crop Information System (ICIS, http://www.cgiar.org/icis), a database system for the management and integration of global information on breeding pedigrees and field characterization for any crop. IRIS is now being extended to rice functional genomics.

Long Abstract

 

 

75A. How many SNPs Do We Need for Whole-Genome Linkage Disequilibrium Mapping?

Maido Remm and Andres Metspalu. Estonian Biocentre and University of Tartu, ESTONIA.

mremm@ebc.ee

 

Using Chr21 data from Patil et al. Science 294:1719, we calculated haplotype block length distributions. Using these distributions, we simulated random haplotype blocks and estimated how many haplotype blocks and how many SNPs would be required to cover all exons in the whole human genome.

Long Abstract

 

 

76A. Bioinformatics in a Fully Automated Cellular Perturbation Environment for the Identification of Medically Relevant Genes.

S. Röhrig, A. Spychaj, R. Korn, A. Felber, R. Köckerbauer, B. Kesper and C. Hergersberg. Xantos Biomedicine AG, Fraunhoferstr. 22, D-82152, Martinsried, Germany.

s.roehrig@xantos.de

 

Discovery of medically relevant genes is our foremost interest. To this end we combine the screening of human cDNA libraries using our cell based and fully robotics assisted XantoScreen™ technology with expression profiling techniques and the development of an innovative bioinformatics platform.

Long Abstract

 

 

77A. The Rice Growth Monitoring System for Phenotypic Functional Analysis.

Takanari Tanabata1, Toru Ishizuka2, Makoto Takano3 and Tomoko Shinomura2. 1Hitachi Research Laboratory, 2Hitachi Central Research Laboratory and 3National Institute of Agrobiological Sciences.

ttanaba@hrl.hitachi.co.jp

 

We are developing an automatic digital imaging system for acquiring the plant growth measurements necessary for detailed physiological/phenotypic analysis of germinating rice seedlings. By comparing WT and a phyA mutant, we were able to calculate differential growth rates of the coleoptile and the 1st and 2nd leaves.

Long Abstract

 

 

78A. Mapping DNA Regulatory Sequences to a Metabolic Network.

Laurence Ettwiller, Johan Rung and Ewan Birney. EBI, Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

ettwille@ebi.ac.uk

 

We present a method that uncovers mappings between yeast regulatory sequences and protein function. About a hundred patterns show significant overlap between the network of genes linked by a pattern and the metabolic network, suggesting that proteins acting on the same compounds have significantly higher chances of being regulated in synergy.

Long Abstract

 

 

79A. ADDA: A Novel Method for Partitioning Protein Sequences into Domains.

Andreas Heger and Liisa Holm. EMBL-EBI.

heger@ebi.ac.uk

 

ADDA is an algorithm for delineating domain boundaries in protein sequences. Domain boundaries in a query sequence are optimised in context with its BLAST neighbours. This allows avoidance of oversplitting due to truncated local alignments. http://www.ebi.ac.uk/sgg.

Long Abstract

 

 

80A. Protein-Protein Interaction Analysis of Transcription Factors and Its use for the Identification of Cooperatively Acting Transcription Factors.

Ricardo Bringas. Centro de Ingenieria Genetica y Biotecnologia, Ave 31 e/ 158 y 190, Cubanacan, Playa, Havana, Havana, 10600, Cuba.

bringas@cigb.edu.cu

 

The publicly available information on protein-protein interactions in the yeast Saccharomyces cerevisiae is analyzed with a focus on transcription factors. Networks of transcription factor interactions are identified, as well as potential pairs of transcription factors that co-regulate gene expression. Additionally, we have clustered genes according to their interaction patterns and identified transcription regulation complexes.

Long Abstract

 

 

81A. The Eukaryotic Core Proteome.

Roland Krause, Karin Schleinkofer, Anne-Claude Gavin and Georg Casari. Cellzome AG, Meyerhofstr. 1, Heidelberg, 69117, Germany.

roland.krause@cellzome.com

 

Recent large-scale studies of protein-protein interactions in Saccharomyces cerevisiae have expanded our knowledge to a comprehensive map of protein cooperation. Using the genome sequences of many eukaryotes, we can extrapolate the shared eukaryotic core proteome to other organisms. In particular, we studied human disorders for potential points of intervention.

Long Abstract

 

 

82A. Alternative Splice Variants as Natural Competitive Inhibitors of Known Proteins.

Erez Levanon, Dvir Dahary and Zurit Levine. Compugen LTD, Tel-Aviv, Israel.

erez@compugen.co.il

 

Alternative splicing variants that lack functional domains that exist in other variants of the same gene may act as natural competitive inhibitors. We use Compugen's LEADS platform with the human genome and the EST database to find such alternative splicing variants.

Long Abstract

 

 

 

Structural Biology.

 

83A. The SRS 3D Module: a New View of Structures, Integrating Sequences and Annotations.

84A. Consistency Matrices: Quantified Structure Alignments for Sets of Related Proteins.

85A. Exploration of Functional Sites in Complex RNA Folds and Macromolecular Assemblages.

86A. Structural Modeling for the Exploration of the Evolution of the Basic Helix-Loop-Helix Proteins.

87A. A Pattern-based Approach to Protein Feature Space: Use in Discrimination of Protein Fold.

88A. Side-Chain Freedom Analysis of Protein-Protein Interactions.

89A. ICBS: A Database of Protein-protein Interactions Mediated by Interchain Beta-sheet Formation.

90A. An Investigation of Domain: Domain Interactions Using The Pfam Database.

91A. TOPS: The database of the Topology of Protein Structures.

92A. Side chain flexibility for 1:n protein-protein docking.

93A. Protein Substructure Comparison: An Efficient Combinatorial Approach.

94A. Identification and Automating Calculation of Homologous Core Structures.

95A. Kernel Model Derived from Simplicial Contact Edges for Protein Folding.

96A. MALECON: Multiple Protein Structural Alignment by a Step-Wise Multi-Solution Approach that Maximizes the Number of Spatially Equivalent Residues.

97A. Homology Modelling of the AdoMetDC Domain of the Bi-functional Ornithine-Decarboxylase / S-adenosylmethionine Decarboxylase Enzyme from Plasmodium falciparum.

98A. An Analysis of Protein Domain Linkers: their Classification and Role in Protein Folding.

99A. Chemical Shift Threading – A Direct Approach to Determining Protein Structure from Chemical Shift Data.

100A. Hidden Markov Models for Protein Recurrent Core Packing Arrangements.

101A. ConSurf: A Server for the Identification of Functional Regions in Proteins by Surface-Mapping of Phylogenetic Information.

102A. Incorporating Sequence and Biochemical Information in Topological Models of Protein Structure Towards The Structural and Functional Genomics.

103A. Incorporation of Biochemical Knowledge in Geometric Hashing: a Statistical Assessment.

 

83A. The SRS 3D Module: a New View of Structures, Integrating Sequences and Annotations.

Sean O’Donoghue, Joachim E.W. Meyer, Andrea Schafferhans and Karsten Fries. LION Bioscience AG, Waldhoferstr. 98, Heidelberg, 69117, Germany.

sean.odonoghue@lionbioscience.com

 

SRS 3D allows users to easily find all structural data for a target sequence, select appropriate structures, and visualize structures with annotations from other databases. Currently, SRS 3D provides structural information for 120,000 sequences, 330,000 SwissProt sequence features, and 1.5 million InterPro domain annotations. Other annotation databases can be integrated.

Long Abstract

 

 

84A. Consistency Matrices: Quantified Structure Alignments for Sets of Related Proteins.

Ivo Van Walle1, I. Lasters2 and L. Wyns1. 1Dept. of Ultrastructure, Vrije Universiteit Brussel, Paardenstr. 65, Sint-Genesius-Rode, 1640, Belgium, 2Algonomics NV, www.algonomics.com.

ivwalle@vub.ac.be

 

Consistency matrices describe the comparability of two proteins in a more informative way than an alignment of their residues. They are derived from a pseudo multiple structure alignment and can quantify the spatial conservation of residue positions. Among other things, they can be used for threading and protein structure classification.

Long Abstract

 

 

85A. Exploration of Functional Sites in Complex RNA Folds and Macromolecular Assemblages.

D. Rey Banatao. UCSF/Stanford University, 19 Belleau Ave., Atherton, CA, 94027, USA.

banatao@smi.stanford.edu

 

We describe a novel approach for characterization of functional sites, particularly metal binding sites in complex RNA structures and assemblies. Identifying metal binding sites in RNA is crucial to understanding its structure and function. This method could potentially be applied to characterization of other RNA sites such as RNA-protein and RNA-small molecule interactions.

Long Abstract

 

 

86A. Structural Modeling for the Exploration of the Evolution of the Basic Helix-Loop-Helix Proteins.

Michael J. Buck and William R. Atchley. Department of Genetics, North Carolina State University, Raleigh, NC 27695-7614.

mjbuck@unity.ncsu.edu

 

We address whether 3D models can reveal detailed function for uncharacterized members of the bHLH family, beyond what can be learned from sequence alone. We have developed two structural comparison techniques for comparing models, which allow us to detect structural and functional relationships that remain hidden when only sequence is used.

Long Abstract

 

 

87A. A Pattern-based Approach to Protein Feature Space: Use in Discrimination of Protein Fold.

Josef Panek. Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia.

j.panek@imb.uq.edu.au

 

A feature-space approach for automated investigation of the properties of protein groups was developed. The approach constructs the feature space from patterns of feature attributes of protein sequences, based on the physical, chemical and structural properties of amino acids. The approach allows fold recognition and produces folding rules.

Long Abstract

 

 

88A. Side-Chain Freedom Analysis of Protein-Protein Interactions.

Christian Cole and Jim Warwicker. Department of Biomolecular Sciences, UMIST, Manchester, United Kingdom.

c.cole@umist.ac.uk

 

Protein-protein interactions are crucial to many biological processes. However, our understanding of specific contacts is limited, and such contacts are difficult to predict. Side-chain conformational entropy is determined and employed to extend current shape complementarity methods. This yields further information and potential predictive power regarding the driving forces and specificity of these interactions.

Long Abstract

 

 

89A. ICBS: A Database of Protein-protein Interactions Mediated by Interchain Beta-sheet Formation.

Pierre-Francois Baisnée1, Gianluca Pollastri1, Yann Pécout2, James S. Nowick3 and Pierre Baldi1. 1Department of Information and Computer Science, University of California, Irvine, CA 92697-3430, 2IUP Génie Physiologique et Informatique, University of Poitiers, 86000 Poitiers, France and 3Department of Chemistry, University of California, Irvine, CA 92697-2025.

1pbaisnee@uci.edu, 2gpollast@uci.edu, 3upecout@ics.uci.edu, 4jsnowick@uci.edu, 5pfbaldi@uci.edu.

 

Contacts between the edges of protein beta-sheets play a role in protein-protein interactions that are central to healthy biological function and diseases ranging from AIDS to Huntington's disease. The ICBS database identifies, characterizes and ranks interchain beta-sheet interactions within entries in the Protein Data Bank. The database is available at: http://www.igb.uci.edu/servers/icbs/.

Long Abstract

 

 

90A. An Investigation of Domain: Domain Interactions Using The Pfam Database.

Robert D Finn, Mhairi Marshall and Alex Bateman. The Wellcome Sanger Institute, The Wellcome Genome Campus, Hinxton, Cambs, England CB10 1SA.

rdf@sanger.ac.uk

We have used domains defined in Pfam, a protein domain family database, to investigate structurally interacting domains. The interaction data has been incorporated into Pfam to allow investigation of potentially interacting sequences and the visualisation of interacting residues within a multiple sequence alignment.

Long Abstract

 

 

91A. TOPS: The database of the Topology of Protein Structures.

Ioannis Michalopoulos1, David R. Gilbert2, Gilleain M. Torrance2 and David R. Westhead1. 1School of Biochemistry and Molecular Biology University of Leeds, Leeds LS2 9JT, UK, and 2Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK.

ioannis@bioinformatics.leeds.ac.uk

 

TOPS is a database containing information on protein structure topology. Our relational database of extended protein topology data for all solved structures automatically generates TOPS cartoons (topological abstractions of protein structures) and enables a machine-learning technique to define target topological patterns and match domains. TOPS is available at http://www.tops.leeds.ac.uk/.

Long Abstract

 

 

92A. Side chain flexibility for 1:n protein-protein docking.

Kerstin Koch, Steffen Neumann, Frank Zoellner and Gerhard Sagerer. Technical Faculty, Applied Computer Science Department, Bielefeld University.

kerstin@techfak.uni-bielefeld.de

 

During docking, proteins undergo conformational changes. We are investigating bound and unbound structures of proteins to introduce a new measurement for flexibility. New rotamer libraries for complexed and unbound structures are compiled. The probabilities for rotamer changes are investigated according to the likelihood of the unbound rotamer, secondary structure and rotamericity.

Long Abstract

 

 

93A. Protein Substructure Comparison: An Efficient Combinatorial Approach.

Andrew Binkowski, Bhaskar DasGupta and Jie Liang. University of Illinois at Chicago.

dasgupta@cs.uic.edu

 

An alternative approach to heuristic methods such as Monte Carlo search is developed for detecting substructure similarity of proteins. Based on the two-phase algorithm, this combinatorial method has a theoretical performance guarantee and runs quickly. Examples from the PDB will be shown illustrating the effectiveness of this algorithm.

Long Abstract

 

 

94A. Identification and Automating Calculation of Homologous Core Structures.

Jie Chen, Yanli Wang, Aron Marchler-Bauer and Steve H. Bryant. NCBI/NLM/NIH.

chenj@ncbi.nlm.nih.gov

 

A Homologous Core Structure (HCS) is defined, based on a comparative method, as an indicator of evolutionary distance. The goal of calculating an HCS automatically is to allow fully automatic distinction between homologs and analogs.

Long Abstract

 

 

95A. Kernel Model Derived from Simplicial Contact Edges for Protein Folding.

Changyu Hu, Xiang Li and Jie Liang. Dept of Bioengineering, University of Illinois at Chicago.

jliang@uic.edu

 

Pairwise contact potentials cannot stabilize native proteins against decoys. Using edge simplices from alpha shape, we have developed a kernel model by SVM training. Our method succeeds in stabilizing a set of 456 proteins against 15 million decoys. It also has good performance on a test set of 204 proteins.

Long Abstract

 

 

96A. MALECON: Multiple Protein Structural Alignment by a Step-Wise Multi-Solution Approach that Maximizes the Number of Spatially Equivalent Residues.

María Elena Ochagavía1,2 and Shoshana Wodak2. 1Center for Genetic Engineering and Biotechnology, Apartado Postal 6162. Ave. 31 e/ 158 y 190, Cubanacán, La Habana 10600, Cuba and 2Service de Conformation de Macromolecules Biologiques et Bioinformatique, Av. F.D. Roosevelt 50, P2- CP 160/16, B-1050 Brussels, Belgium.

ocha@cigb.edu.cu

 

MALECON is a new combinatorial procedure for multiple structural alignments, yielding several alternative solutions. In comparison to other methods, it produces improved definitions of the common structural core in structurally diverse proteins, and if the proteins are too diverse, distinct cores are automatically derived for different protein subsets.

Long Abstract

 

 

97A. Homology Modelling of the AdoMetDC Domain of the Bi-functional Ornithine-Decarboxylase / S-adenosylmethionine Decarboxylase Enzyme from Plasmodium falciparum.

G. Wells, F. Joubert, LM. Birkholtz and A.I. Louw. Department of Biochemistry, University of Pretoria.

gordon@tuks.co.za

 

The two regulatory activities of polyamine biosynthesis (ornithine decarboxylase / S-adenosylmethionine decarboxylase) are usually present in separate proteins. However, in Plasmodium falciparum both activities occur as part of a bi-functional enzyme. The AdoMetDC domain has been modelled using MODELLER 6v1, based on the human crystal structure.

Long Abstract

 

 

98A. An Analysis of Protein Domain Linkers: their Classification and Role in Protein Folding.

Richard A. George and Jaap Heringa. National Institute for Medical Research, The Ridgeway, London, NW7 3RY, UK.

rgeorge@nimr.mrc.ac.uk

 

Recent advances in protein engineering have come from creating multi-functional chimeric proteins containing modules from various proteins. These modules are typically joined via an oligopeptide linker. Here we analyse the properties of naturally occurring inter-domain linkers with the aim of designing linkers for domain fusion. A database of linkers is available via the Internet at http://mathbio.nimr.mrc.ac.uk.

Long Abstract

 

 

99A. Chemical Shift Threading – A Direct Approach to Determining Protein Structure from Chemical Shift Data.

Haiyan Zhang, Albert Leung and David Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Canada.

hzhang@redpoll.pharmacy.ualberta.ca

 

A method for the rapid determination of protein structures that uses only chemical shift data is described. This approach extends the concept of sequence threading and comparative model building to the realm of NMR spectroscopy. The program (known as THRIFTY) is available as a web-based server at http://redpoll.pharmacy.ualberta.ca.

Long Abstract

 

 

100A. Hidden Markov Models for Protein Recurrent Core Packing Arrangements.

Xin Yuan and Christopher Bystroff. Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA.

yuanx2@rpi.edu

 

Recurrent non-local packing arrangements in proteins (hydrophobic cores) can be modeled using modified Hidden Markov Models (HMM) with self-avoiding state pathways, state pair emissions and multiple-emission states. Using a simulated annealing approach, state-state connectivities are defined so that the states have three-dimensional meaning.

Long Abstract

 

 

101A. ConSurf: A Server for the Identification of Functional Regions in Proteins by Surface-Mapping of Phylogenetic Information.

Fabian Glaser1, Tal Pupko2, Inbal Paz1, Eric Martz3 and Nir Ben-Tal1. 1Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel, 2The Institute of Statistical Mathematics, Minami-Azabu, Minato-ku, Tokyo, Japan and 3Department of Microbiology, University of Massachusetts, Amherst MA, USA. http://consurf.tau.ac.il.

fabian@ashtoret.tau.ac.il

 

ConSurf is a web server for the identification of functional regions in proteins of known 3D structure. It uses advanced phylogenetic algorithms to estimate the evolutionary rate of each amino acid site; functional regions are usually composed of slowly evolving residues. ConSurf is available at http://consurf.tau.ac.il.

Long Abstract

 

 

102A. Incorporating Sequence and Biochemical Information in Topological Models of Protein Structure Towards The Structural and Functional Genomics.

Mallika Veeramalai and David Gilbert. Bioinformatics Research Centre, Department of Computing Science, Glasgow University.

mallika@brc.dcs.gla.ac.uk

 

We describe significant algorithm development for the TOPS database that enhances topological protein models with sequence information (structure-based annotated sequence) and with important biochemical features such as ligand binding sites and active sites, with the aim of establishing structure-sequence-function relationships. The results should provide valuable information for predicting protein structure and function from sequence, as these problems remain key challenges of direct relevance to projects in structural and functional genomics.

Long Abstract

 

 

103A. Incorporation of Biochemical Knowledge in Geometric Hashing: a Statistical Assessment.

Jiménez-Lozano N1, Rodríguez A2, Chagoyen M1, Pascual-Montano A1, Carazo JM1 and Trelles O2. 1Unidad de Biocomputación, Centro Nacional de Biotecnología-CSIC and 2Departamento de Arquitectura de Computadores, Universidad de Málaga.

natalia@cnb.uam.es

 

Geometric Hashing is a structural comparison algorithm based only on geometric criteria. We improve this method by introducing three environmental parameters: buried area, polar fraction and local secondary structure. Our objective is to reduce the number of similarities lacking reliable structural meaning.

Long Abstract

 

 

 

Data Visualization.

 

104A. Homograph: A Genome-Wide Protein Homology Visualizer.

105A. Visualization Techniques for Genomic Data.

106A. A Fast Algorithm for Visualizing and Analyzing Protein-Protein Interactions.

107A. A Partitioned Approach to Protein Interaction Mapping.

108A. XdomView: A Graphical Tool for Protein Domain and Exon Position Visualization.

109A. GoSurfer: A visualized Tool to Utilize Gene Ontology in Comparative Gene Analysis.

110A. Pattern Matching NMR Metabolic Profiling Data.

111A. WebGen-Net: A System for Support of Genetic Network Construction.

112A. A Scoring Algorithm for Ontology Information Extraction.

113A. No poster.

114A. Web-Based Biological Discovery using an Integrated Database.

 

104A. Homograph: A Genome-Wide Protein Homology Visualizer.

Cei Abreu-Goodger and Enrique Merino. Instituto de Biotecnologia, Universidad Nacional Autonoma de Mexico, Av. Universidad 2001, Cuernavaca, Morelos, 62210, Mexico.

cei@ibt.unam.mx

 

Homograph is an X Window graphical interface for visualizing genome-wide protein homology. A dot-plot is used to represent every pair of proteins that passes a certain similarity threshold. The dots can be selected and colored by user-determined categories, by searching the gene descriptions, or by a similarity score. Homograph is available at http://www.ibt.unam.mx/paginas/cei/homograph.html.
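The thresholded dot-plot idea can be illustrated with a minimal Python sketch. This is not Homograph's code: the hit format (protein index A, protein index B, similarity score), the function names and the example scores are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical hit record: (index of protein A, index of protein B, similarity score).
Hit = Tuple[int, int, float]


def dotplot_points(hits: Iterable[Hit], threshold: float) -> List[Tuple[int, int]]:
    """Return one (x, y) dot for every protein pair whose score meets the threshold."""
    return sorted((a, b) for a, b, score in hits if score >= threshold)


def render_ascii(points: List[Tuple[int, int]], size: int) -> str:
    """Draw the dots on a size x size text grid ('*' marks a homologous pair)."""
    grid = [["." for _ in range(size)] for _ in range(size)]
    for x, y in points:
        grid[y][x] = "*"
    return "\n".join("".join(row) for row in grid)


# Made-up example data: four proteins, four pairwise hits.
example_hits = [(0, 0, 100.0), (0, 2, 45.0), (1, 3, 12.0), (2, 0, 45.0)]
points = dotplot_points(example_hits, threshold=30.0)
print(render_ascii(points, size=4))
```

Raising the threshold removes weaker pairs from the plot, which is the filtering behaviour the abstract describes.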

Long Abstract

 

 

105A. Visualization Techniques for Genomic Data.

Ann E. Loraine and Gregg A. Helt. Affymetrix, Inc.

ann_loraine@affymetrix.com

 

The high frequency of alternative splicing in human genes requires specialized visualization tools that reveal how variations in transcript structure affect the encoded proteins. Techniques for visualizing alternative splicing are presented, including semantic zooming, visual encoding of translation frame, and display of protein domains in the context of genomic sequence.

Long Abstract

 

 

106A. A Fast Algorithm for Visualizing and Analyzing Protein-Protein Interactions.

Byong-Hyon Ju, Byungku Park, Kyungsook Han and Jong H. Park. Department of Computer Science and Engineering, Inha University, Inchon 402-751, South Korea.

khan@inha.ac.kr

 

We have developed a new algorithm for visualizing large-scale protein-protein interactions, and implemented it in a program called InterViewer. InterViewer provides an integrated framework for querying databases and directly visualizes the query results. InterViewer is an order of magnitude faster than other force-directed programs, yet generates aesthetically pleasing drawings.

Long Abstract

 

 

107A. A Partitioned Approach to Protein Interaction Mapping.

Yanga Byun, Euna Jeong and Kyungsook Han. Department of Computer Science and Engineering, Inha University.

khan@inha.ac.kr

 

A common problem with many graph-drawing programs is that they become very slow when dealing with large-scale graphs such as protein interaction networks. We propose a new algorithm for efficiently visualizing large-scale protein interaction networks. It partitions nodes into three groups based on their interaction characteristics. An implementation of the algorithm is available at http://wilab.inha.ac.kr/protein.

Long Abstract

 

 

108A. XdomView: A Graphical Tool for Protein Domain and Exon Position Visualization.

Gopalan Vivek1, Tin Wee Tan1 and Shoba Ranganathan1,2. 1Department of Biochemistry and 2Department of Biological Sciences, National University of Singapore, Singapore 119260.

vivek@bic.nus.edu.sg

 

XdomView is a web-based graphical tool that maps protein structural domains and intron positions in eukaryotic homologues to the tertiary structure of a given protein. Since it visualizes the association of sequence signals with 3D structure, XdomView provides a valuable visualization environment for scientists working on eukaryotic gene organization, gene evolution, protein folding and protein structure classification. XdomView is available at http://surya.bic.nus.edu.sg/xdom.

Long Abstract

 

 

109A. GoSurfer: A visualized Tool to Utilize Gene Ontology in Comparative Gene Analysis.

Sheng Zhong1, Ovidiu Lipan1, Kai-Florian Storch3, Charles J. Weitz3 and Wing H. Wong1,2. 1Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA, 2Department of Statistics, Harvard University and 3Department of Neurobiology, Harvard Medical School.

szhong@hsph.harvard.edu

 

The GoSurfer software uses the Gene Ontology (GO) structured vocabulary to perform comparative gene analysis. GoSurfer visualizes gene ontology information as a tree, with nodes and branches representing GO terms and paths. Different sets of genes can be mapped onto the tree with different colors. GoSurfer is available at http://biosun1.harvard.edu/~szhong/GoSurfer.htm.

Long Abstract

 

 

110A. Pattern Matching NMR Metabolic Profiling Data.

Robert Stones, Adrian Charlton, Paul Brereton and Sarah Oehlschlager. Central Science Laboratory, Sand Hutton YO41 1LZ UK.

r.stones@csl.gov.uk

 

Metabolomics provides a powerful new tool for acquiring insight into functional biology. Snapshots of the levels of abundant small molecules within a cell, and of how those levels change under different conditions, are highly complementary to gene expression and proteomic studies. We are currently developing computer tools for the acquisition of NMR metabolic profiling data, and are using computational approaches to analyse this type of data.

Long Abstract

 

 

111A. WebGen-Net: A System for Support of Genetic Network Construction.

Mikio Yoshida1, Yukari Shibagaki1, Hideaki Shimano1, Mariko Shima1, Tatsuo Kitahashi1, Yasutaro Fujita2 and Takashi Ito3. 1INTEC Web and Genome Informatics Corporation, Tokyo, Japan, 2Faculty of Engineering, Fukuyama University, Hiroshima Japan and 3Cancer Research Institute, Kanazawa University, Ishikawa, Japan.

yoshida@gic.intec.co.jp

 

WebGen-Net is a system for supporting the construction of genetic networks. It provides a graphical user interface that lets users interactively reconstruct genetic networks by referring to biological relations collected from public databases and experimental results. A prototype of WebGen-Net is freely available from http://genome.c.kanazawa-u.ac.jp/webgen.

Long Abstract

 

 

112A. A Scoring Algorithm for Ontology Information Extraction.

David Outteridge. Department of Pharmacology, University of Colorado Health Sciences Center.

david.outteridge@uchsc.edu

 

Associating genes with ontology entries enables a reversed association from entries to genes. Extracting subsets of interesting entries, each describing many genes, is achieved by scoring. These scores are mapped to visual effects (coloured graphs) for clear identification of interesting entries.
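The reversed association and scoring step can be sketched in Python as follows. The abstract does not specify the scoring function, so the score used here (the fraction of an entry's genes that fall in a set of interest) is an assumption chosen only for illustration, as are all names and the example annotations.

```python
from collections import defaultdict
from typing import Dict, Set


def invert(gene_to_entries: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn gene -> ontology entries into the reversed map entry -> genes."""
    entry_to_genes: Dict[str, Set[str]] = defaultdict(set)
    for gene, entries in gene_to_entries.items():
        for entry in entries:
            entry_to_genes[entry].add(gene)
    return dict(entry_to_genes)


def score_entries(entry_to_genes: Dict[str, Set[str]],
                  genes_of_interest: Set[str]) -> Dict[str, float]:
    """Hypothetical score: fraction of an entry's genes that are in the set of interest."""
    return {entry: len(genes & genes_of_interest) / len(genes)
            for entry, genes in entry_to_genes.items()}


# Made-up annotations for three genes.
annotations = {
    "geneA": {"kinase", "membrane"},
    "geneB": {"kinase"},
    "geneC": {"membrane"},
}
entry_to_genes = invert(annotations)
scores = score_entries(entry_to_genes, genes_of_interest={"geneA", "geneB"})
for entry in sorted(scores, key=scores.get, reverse=True):
    print(entry, scores[entry])
```

Scores of this kind could then be mapped to node colours in a graph display, as the abstract suggests.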

Long Abstract

 

 

113A. No poster.

 

114A. Web-Based Biological Discovery using an Integrated Database.

D.F. Pinney, the Allgenes.org Development Group, the EPConDB Development Group, the Plasmodium Genome Database Collaborative and C.J. Stoeckert. Computational Biology and Informatics Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania.

pinney@pcbi.upenn.edu

 

Allgenes.org, PlasmoDB, and EPConDB are web-based discovery tools relying on a single platform, GUS, which warehouses and integrates biological data from heterogeneous sources. Allgenes.org provides access to data for the human and mouse genomes, and PlasmoDB to data for the Plasmodium falciparum genome. EPConDB provides access to data for genes expressed in the endocrine pancreas.

Long Abstract

 

 

 

Phylogeny and Evolution.

 

115A. The Relative Importance of Segmental and Tandem Duplications in Gene Family Evolution in Arabidopsis thaliana.

116A. Prediction of DNA-protein Interaction Domains for Transcription Factors using an Evolutionary Filtering Technique.

117A. Intra-genomic Comparison of Plant Genomes.

118A. Nucleotide Bias Affects Amino Acid Composition in Angiosperms.

119A. Aspartyl Proteases in Human and Model Organisms.

120A. RTKdb: Database of Receptor Tyrosine Kinase.

121A. Identification of Human-Mouse Orthologs at Evolutionary Conserved Locations from Pairwise Genome Comparison.

122A. Determining Factors for the Distribution of Simple (AC)n Microsatellites in the Rat Genome.

123A. Use LumberJack to Create and Compare a Forest of Phylogenetic Trees.

 

115A. The Relative Importance of Segmental and Tandem Duplications in Gene Family Evolution in Arabidopsis thaliana.

Steven B. Cannon, Andrew Baumgarten, Georgiana May and Nevin D. Young. University of Minnesota, USA.

cann0010@tc.umn.edu

 

We describe software to determine which genes in Arabidopsis thaliana have arisen through large segmental or local tandem duplications. We find that contributions made by these two processes differ greatly among gene families. We discuss the possible biological significance of these differences in gene family evolution. http://www.tc.umn.edu/~cann0010.

Long Abstract

 

 

116A. Prediction of DNA-protein Interaction Domains for Transcription Factors using an Evolutionary Filtering Technique.

Li Jia, Michael Clegg and Tao Jiang. Department of Computer Science, Department of Botany and Plant Science, University of California, Riverside, CA 92521.

lijia@cs.ucr.edu

 

R2R3-AtMYB is one of the largest transcription factor gene families in Arabidopsis. Using inferred ancestral sequences, we have found that several lineages in the R2R3-AtMYB phylogeny were subjected to excess nonsynonymous substitutions, which is evidence of episodes of positive selection.

Long Abstract

 

 

117A. Intra-genomic Comparison of Plant Genomes.

Aoife McLysaght, Steve Hampson, Brandon Gaut and Pierre Baldi. Department of Ecology and Evolutionary Biology, Department of Information and Computer Science, Institute for Genomics and Bioinformatics, University of California, Irvine.

amclysag@uci.edu

 

LineUp is a heuristic algorithm designed to tackle the computationally intensive problem of identifying collinear regions within or between complex genomes. The method makes allowances for map error in the genome, and for the existence of multiple paralogues. LineUp was applied to the maize genome and results are shown.

Long Abstract

 

 

118A. Nucleotide Bias Affects Amino Acid Composition in Angiosperms.

Huai-chun Wang, Greg Singer and Donal Hickey. Department of Biology, University of Ottawa.

dhickey@uottawa.ca

 

We compared the amino acid composition of homologous protein sequences between rice and Arabidopsis and found that the amino acid substitution pattern is predictable from the overall difference in G+C content between these two genomes. We also found corresponding, predictable differences in synonymous codon usage between the two genomes. The results demonstrate that changes in nucleotide composition have significant effects on the pattern of protein evolution.
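The G+C content on which this comparison rests is a simple ratio, sketched below in Python. The function name and the toy sequences are illustrative assumptions and are not taken from the poster; ambiguous bases such as N are ignored here, which is one convention among several.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    return sum(base in "GC" for base in counted) / len(counted)


# Toy sequences standing in for a GC-rich and an AT-rich genome.
gc_rich = "ATGCGCGGCC"
at_rich = "ATGAATTTAC"
difference = gc_content(gc_rich) - gc_content(at_rich)
print(gc_content(gc_rich), gc_content(at_rich), difference)
```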

Long Abstract

 

 

119A. Aspartyl Proteases in Human and Model Organisms.

Alla M. Karnovsky and Cara L. Ruble. Pharmacia Corporation.

alla.karnovsky@pharmacia.com

 

Aspartyl proteases are a widely distributed and diverse protein family involved in a variety of cellular and biochemical processes, ranging from digestion to cleavage of amyloid precursor protein. We used BLAST and profile HMMs to identify aspartyl proteases in human, worm, fly, and other model organisms, and inferred the intron-exon structure of the aspartyl protease genes in C. elegans, D. melanogaster and H. sapiens. We use protein homology and splicing patterns to investigate the evolution of aspartyl proteases.

Long Abstract

 

 

120A. RTKdb: Database of Receptor Tyrosine Kinase.

Julien Grassot1, Guy Perrière2 and Guy Mouchiroud1. 1Centre de Génétique Moléculaire et Cellulaire, UMR CNRS 5534, Université Claude Bernard – Lyon 1, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France and 2Laboratoire de Biométrie et Biologie Évolutive UMR CNRS 5558, Université Claude Bernard – Lyon 1, 43 bd. du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.

grassot@biomserv.univ-lyon1.fr

 

A collection of RTK sequences would provide a good starting point, as a new model, for comparative and evolutionary studies of multigene families. In this context, we are developing the Tyrosine Kinase Receptors database (RTKdb), which is the only database on these proteins currently available, and can be accessed at http://pbil.univ-lyon1.fr.

Long Abstract

 

 

121A. Identification of Human-Mouse Orthologs at Evolutionary Conserved Locations from Pairwise Genome Comparison.

Fu Lu, Xiangqun Holly Zheng, Zhenyuan Wang, Zhong Fei, Jian Wang, Eric Zheng, Aaron Halpern, Vivien Bonazzi and Richard Mural. Celera Genomics, 45 W. Gude Dr. Rockville, MD 20850 USA.

fu.lu@celera.com

 

Comparative gene mapping provides an invaluable way to understand human biology and diseases. Here we describe an algorithm that identifies mouse-human orthologs at evolutionarily conserved locations by pairwise genome comparison. A total of about 23,000 pairs of orthologous genes were identified by this novel approach.

Long Abstract

 

 

122A. Determining Factors for the Distribution of Simple (AC)n Microsatellites in the Rat Genome.

Chin-Fu Chen1, Michael I. Jensen-Seaman2, Michael A. Thomas1, Jian Lu1, Simon N. Twigger1 and Peter J. Tonellato1. 1Bioinformatics Research Center and 2Human and Molecular Genetics Center Medical College of Wisconsin, Milwaukee, WI 53226, USA.

cfchen@mcw.edu

 

The distribution of rat (AC)n microsatellite DNA is bell-shaped. The heterozygosity of (AC)n repeats is positively correlated with the number of repeats, and a higher (AC)n repeat number is associated with a higher GC content of the surrounding sequences. There are significant differences in the amount of heterozygosity among chromosomes.
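Locating (AC)n runs and counting their repeat number can be sketched with a regular expression. This is an illustrative Python sketch rather than the authors' pipeline; the minimum repeat count of 3 is an assumed cutoff and the example sequence is made up.

```python
import re
from typing import List, Tuple


def find_ac_repeats(sequence: str, min_repeats: int = 3) -> List[Tuple[int, int]]:
    """Return (start position, repeat number n) for each (AC)n run with n >= min_repeats."""
    pattern = re.compile(r"(?:AC){%d,}" % min_repeats)
    return [(match.start(), (match.end() - match.start()) // 2)
            for match in pattern.finditer(sequence.upper())]


# Made-up sequence with one (AC)4 run at position 2 and one (AC)2 run at position 12.
example = "GGACACACACTTACACGG"
print(find_ac_repeats(example))
print(find_ac_repeats(example, min_repeats=2))
```

The repeat numbers returned this way are the quantity that the abstract relates to heterozygosity and to flanking GC content.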

Long Abstract

 

 

123A. Use LumberJack to Create and Compare a Forest of Phylogenetic Trees.

Carolyn J. Lawrence1, R. Kelly Dawe1,2, and Russell L. Malmberg1. Departments of 1Plant Biology and 2Genetics, University of Georgia, Athens, GA USA.

carolyn@dogwood.botany.uga.edu

 

The ML heuristic search algorithms currently available are computationally impractical for large datasets (especially those consisting of protein sequences). We are developing an ML heuristic search tool called LumberJack that progressively jackknifes an alignment to generate multiple NJ trees, and then compares them based upon likelihood scores.
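The jackknifing step alone can be sketched in Python as column deletion from an alignment. This is not LumberJack's implementation: the drop fractions, the function name and the toy alignment are assumptions, and the NJ tree building and likelihood scoring that follow in the described method are deliberately left out.

```python
import random
from typing import Dict


def jackknife_alignment(alignment: Dict[str, str], drop_fraction: float,
                        rng: random.Random) -> Dict[str, str]:
    """Delete a random subset of alignment columns, the same columns in every sequence."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    n_keep = n_cols - int(n_cols * drop_fraction)
    kept = sorted(rng.sample(range(n_cols), n_keep))
    return {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}


# Toy alignment of three sequences, ten columns each.
toy = {"taxon1": "ACGTACGTAC", "taxon2": "ACGTTCGTAC", "taxon3": "ACG-ACGTAA"}
rng = random.Random(0)  # fixed seed so the resampling is reproducible
# Assumed schedule: remove progressively larger fractions of the columns.
replicates = [jackknife_alignment(toy, fraction, rng) for fraction in (0.2, 0.5)]
for replicate in replicates:
    print(replicate)
```

Each replicate would then be passed to a tree-building program, and the resulting trees compared by likelihood.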

Long Abstract

 

 

 

Data Mining.

 

124A. Genomewide Analysis of Bkm Sequences (GATA repeats): Predominant Association with Sex Chromosomes and Potential Role in Higher Order Chromatin Organization and Function.

125A. Hierarchical Machine Learning for Characterising Protein Families.

126A. In Silico Comparison of the Transcriptome Derived from Purified Normal Breast Cells and Breast Tumor Cell Lines Reveals Candidate Upregulated Genes in Breast Tumor Cells.

127A. Extraction and Dynamic View of Biomolecular Interactions in Large Biomedical Text Database.

128A. Mining the literature for enzyme-disease associations.

129A. Search for Gene Regulatory cis-Elements in Arabidopsis thaliana.

130A. Semantic Similarity Measures Across the Gene Ontology: Relating Sequence to Annotation.

131A. Patterns, Pairings and Predictions of Catalytic DNA.

132A. GIMS a Data Warehouse for Management and Analysis of Complex Biological Data.

133A. A Simple Statistical Test for Evaluating Differences between Database Retrieval Methods.

134A. Proteome Databases: An Information Source for Bacterial Immunology.

135A. RED: a web-based system for the analysis, management, and dissemination of expressed sequence tags.

136A. Searching Microarray Time Series Data for Yeast Cell-Cycle Regulatory Genes.

137A. Application of Relational Database Tools for the Analysis of Large Proteomic Data Sets from Tandem Mass Spectrometry.

138A. Hierarchical Cluster Analysis and Classification of SAGE data.

139A. GEA: a Toolkit for Gene Expression Analysis.

140A. A Method for Detecting Protein-Protein Interaction Rules.

141A. G-language Genome Analysis Environment.

142A. Data Handling for Detailed Phenotypic Characterization of Novel Mouse Phenotypes.

143A. Willo and Wisp: Data Management Systems for Mouse Genome Mapping and Sequencing.

144A. Novel Opportunities and Challenges in the Human Proteome: A Bioinformatics Strategy to Identify Splice Variants of Druggable Gene Targets.

145A. DrugBank: An Integrated Database for Drug Discovery and Pharmacogenomics.

146A. The Genomics Unified Schema (GUS).

147A. Compensation for Nucleotide Bias in a Genome by Representation as a Discrete Channel with Noise.

148A. Integrating Eukaryotic Genomes by Orthologous Groups: What is Unique about Apicomplexan Parasites?

149A. The CyberCell Database (CCDB).

150A. Functional Database System of Olfactory Receptors.

151A. A Standard Corpus for Evaluating Extraction of Molecular Interaction Pathway Information from Scientific Abstracts.

152A. A First Study of the Central Role of the Analyst in the Knowledge Discovery Process in Biology.

153A. Assessing the Compactness and Isolation of Individual Clusters Observed in Microarray Data.

154A. An Amino Acid Centered Database to Facilitate Protein Crystallisation.

155A. In silico reconstruction of metabolic network from unannotated raw genome sequences.

156A. AFLP® Nucleotide Sequence Quality Assessment and Improvement Tool.

157A(i). Schema Mapping and Data Integration with Clio.

157A(ii). The GENIA Corpus: an Annotated Corpus in Molecular Biology Domain.

 

124A. Genomewide Analysis of Bkm Sequences (GATA repeats): Predominant Association with Sex Chromosomes and Potential Role in Higher Order Chromatin Organization and Function.

Subbaya Subramanian, R.K. Mishra and L. Singh. Centre for Cellular and Molecular Biology, W413 CCMB, Uppal Road, Hyderabad, Andhra Pradesh, 500007, India.

subree@gene.ccmbindia.org

 

Genomewide analysis of GATA repeats revealed that they are absent in prokaryotes and have gradually accumulated in higher organisms during the course of evolution. In humans, the Y chromosome has the highest GATA repeat density, which is predominantly present in the Yq pericentric region. GATA repeats along the Y chromosome and their close proximity to Matrix Associated Regions (GATA-MAR) may be demarcating chromatin domains.

Long Abstract

 

 

125A. Hierarchical Machine Learning for Characterising Protein Families.

Aik Choon Tan and David Gilbert. Bioinformatics Research Centre, Department of Computer Science, University of Glasgow, Glasgow, U.K.

actan@brc.dcs.gla.ac.uk

 

The aim of this research is to construct a novel approach to induce comprehensive patterns from various data sources using a knowledge discovery and hierarchical machine learning approach. We have applied this technique to characterise several protein families, and our classifiers show higher accuracy and are more informative than conventional methods.

Long Abstract

 

 

126A. In silico Comparison of the Transcriptome Derived from Purified Normal Breast Cells and Breast Tumor Cell Lines Reveals Candidate Upregulated Genes in Breast Tumor Cells.

Leerkes MR, Caballero OL, Mackay A, Torloni H, O'Hare MJ, Simpson AJ, and de Souza SJ. Ludwig Institute for Cancer Research, Rua Prof. Antonio Prudente, 109, 4 andar, Sao Paulo, SP, CEP 01509-010, Brazil.

leerkes@compbio.ludwig.org.br

 

We report here the combined use of ORESTES sequences generated in the FAPESP/LICR Human Cancer Genome Project and information available in the UniGene and SAGE databases to characterize the transcriptome of normal and breast tumor cells. We have identified 154 genes as candidates for overexpression in breast tumor cells.

Long Abstract

 

 

127A. Extraction and Dynamic View of Biomolecular Interactions in a Large Biomedical Text Database.

Yoshihiro Ohta1 and Shigeo Ihara2. 1Hitachi Central Research Laboratory and 2Research Center for Advanced Science and Technology, University of Tokyo.

yoh@crl.hitachi.co.jp

 

We constructed a biomolecular interaction detection system that can practically handle the recent massive increase in literature on molecular biology. We comprehensively considered every needed element: large-scale dictionary construction, biomolecular name detection, interaction detection and an effective user interface for the network viewer. With these elements, our system can extract over 550,000 interactions.

Long Abstract

 

 

128A. Mining the literature for enzyme-disease associations.

Hofmann O. and Schomburg D. Department of Biochemistry, University of Cologne, Germany.

o.hofmann@smail.uni-koeln.de

 

A network of enzyme and disease correlations was built by automatically extracting relevant information from the abstracts of biomedical literature. The concept-based data and implemented visualization techniques allow easy navigation by researchers to explore knowledge available in literature databases and develop new theories.

Long Abstract

 

 

129A. Search for Gene Regulatory cis-Elements in Arabidopsis thaliana.

Judith Lucia Gomez, Ingo Dreyer and Bernd Mueller-Roeber. University of Potsdam, Institute for Biochemistry and Biology, Dept. Molecular Biology, Karl-Liebknechtstrasse 24/25, Haus 20, 14476 Golm, Germany.

jgomez@rz.uni-potsdam.de

 

The regulation of gene expression in plants is thought to result from the binding of different sets of transcription factors to promoter cis-elements. We tested HMM-based methods to search the model plant Arabidopsis thaliana for target genes harbouring putative binding sites for transcription factors in their promoter regions.

Long Abstract

 

 

130A. Semantic Similarity Measures Across the Gene Ontology: Relating Sequence to Annotation.

P.W. Lord, R.D. Stevens, A. Brass and C.A. Goble. Dept. of Computer Science, Manchester University.

p.lord@russet.org.uk

 

The Gene Ontology (GO) represents knowledge of a gene product's function, process and location in a computationally amenable form. We present metrics for measuring the similarity between GO terms, and therefore semantic similarity of gene products annotated with them. We validate these metrics by comparing them with measures of sequence similarity, and show several uses for the measure.
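
The abstract does not say which metrics are used, so the following is only a rough sketch of one widely used information-content measure (Resnik similarity): the similarity of two terms is the information content of their most informative common ancestor. The toy hierarchy, the term names and the annotation counts are all invented for illustration and are not taken from GO.

```python
import math

# Toy is-a hierarchy (child -> parents); term names are invented for illustration.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical counts of gene products annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "binding": 2, "catalysis": 10,
                 "dna_binding": 4, "rna_binding": 4}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probability(term):
    """Share of all annotations made to the term or to any of its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(c for t, c in DIRECT_COUNTS.items() if term in ancestors(t))
    return covered / total


def resnik_similarity(a, b):
    """Information content (bits) of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(-math.log2(term_probability(t)) for t in common)
```

In this toy data, two terms that share only the root score zero, while siblings under a more specific parent score higher, which is the behaviour one would want before comparing the measure against sequence similarity.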

Long Abstract

 

 

131A. Patterns, Pairings and Predictions of Catalytic DNA.

Gopinath Ganji 1,2, Yingfu Li 3, T. Chiang 2 and A. Jamie Cuticchia 1. 1 Department of Medical Biophysics, University of Toronto, 610 University Avenue, Toronto, Ontario, CANADA M5G 2M9, 2Center for Computational Biology, Hospital for Sick Children, 555 University Avenue, Toronto, Ontario, CANADA M5G 1X8, 3Department of Biochemistry and Department of Chemistry, McMaster University, 1200 Main St. W., Hamilton, Ontario, CANADA L8N 3Z5.

gopi.ganji@utoronto.ca

 

We hypothesize that catalytic nucleic acids containing characteristic structural/functional sequence features can be probabilistically modeled and experimentally verified. By employing pattern discovery algorithms, structure prediction tools and machine learning methods, we have attempted to characterize various classes of SELEX-generated 'DNA kinases' (self-phosphorylating DNA) that recruit specific divalent metal cations and NTPs.

Long Abstract

 

 

132A. GIMS: a Data Warehouse for Management and Analysis of Complex Biological Data.

Michael Cornell, Paul Kirby, Cornelia Hedeler and Norman W Paton. Dept of Computer Science, Kilburn Building, University of Manchester, M13 9PL.

mcornell@cs.man.ac.uk

 

GIMS is an object database that integrates genome sequence data with functional data (transcriptome, metabolome, metabolic pathway, proteome and protein-protein interactions) in a single data warehouse. GIMS can be browsed or analysed using canned queries. GIMS can be queried remotely using a Java application that can be downloaded from www.cs.man.ac.uk/~norm/gims.

Long Abstract

 

 

133A. A Simple Statistical Test for Evaluating Differences between Database Retrieval Methods.

John L Spouge and Eva Czabarka. National Center for Biotechnology Information, National Institutes of Health, Bethesda MD USA.

spouge@ncbi.nlm.nih.gov

 

One key problem in designing intelligent systems for molecular biology is to determine which of two database retrieval methods is better. We give a simple statistical test based on z-scores to calculate the significance of differences in ROC[n] scores and apply the method to assess putative improvements to PSI-BLAST.
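
As a minimal sketch of a z-score comparison of two scores, the function below assumes that each method's score comes with a variance estimate and that the two scores are independent. How the variances of ROC[n] scores are obtained is not described in the abstract, so they are plain inputs here, and the function name is hypothetical.

```python
import math


def z_test(score_a, var_a, score_b, var_b):
    """Two-sided z-test for the difference of two independent scores.

    The variances are supplied by the caller (an assumption of this sketch).
    Returns the z-score and the two-sided p-value.
    """
    z = (score_a - score_b) / math.sqrt(var_a + var_b)
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

If both methods are evaluated on the same queries, the scores are likely correlated, and a paired variance estimate would be more appropriate than the simple sum used above.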

Long Abstract

 

 

134A. Proteome Databases: An Information Source for Bacterial Immunology.

Klaus-Peter Pleissner1, Till Eifert2, Frank Schmidt1, Stefan H.E. Kaufmann1 and Peter R. Jungblut1. 1Max Planck Institute for Infection Biology, 2Algorithmus GmbH.

pleissner@mpiib-berlin.mpg.de

 

A collection of proteome databases which comprises 2-D gel proteins, Isotope Coded Affinity Tag (ICAT) and functional classification databases for Mycobacterium tuberculosis and Helicobacter pylori is presented. Information about genes, proteins and metabolic pathways serves as an information source for bacterial immunology. http://www.mpiib-berlin.mpg.de/2D-PAGE.

Long Abstract

 

 

135A. RED: a Web-based System for the Analysis, Management, and Dissemination of Expressed Sequence Tags.

Everitt R.#, Minnema S.E.#, Koster C.S., Olson R.A., Wride M.A. and Rancourt D.E. Department of Biochemistry and Molecular Biology, University of Calgary, Alberta, Canada.

#These authors contributed equally to this work.

seminnem@ucalgary.ca, reveritt@ucalgary.ca

 

The Rancourt EST Database (RED) is a web-based system for the analysis, management, and dissemination of expressed sequence tags (ESTs). RED represents a flexible template DNA sequence database that can be easily manipulated to suit the needs of other labs undertaking mid-size sequencing projects. Source code for RED and the associated tools is available from reveritt@ucalgary.ca. RED is publicly accessible via www.ucalgary.ca/~rancourt.

Long Abstract

 

 

136A. Searching Microarray Time Series Data for Yeast Cell-Cycle Regulatory Genes.

Holger Hoos, Andrew Kwon and Raymond Ng. Department of Computer Science, University of British Columbia.

tjkwon@cs.ubc.ca

 

We propose a new method for analyzing microarray time series data. We apply the method to yeast cell-cycle time series data to find potential regulatory pairs. The results indicate that our algorithm finds true positive pairs different from those found by the correlation and edge detection methods of Filkov et al.

Long Abstract

 

 

137A. Application of Relational Database Tools for the Analysis of Large Proteomic Data Sets from Tandem Mass Spectrometry.

Ioannis K. Moutsatsos, Yongchang Qiu, Rod Hewick, Joseph Wooters, Steve Howes, Gary Van Domselaar and Patrick Cody. Wyeth Research Inc. 35 Cambridgepark Drive, Cambridge, MA 02140, USA.

gvandomselaar@Wyeth.com

 

TurboSEQUEST is a search engine used for protein prediction from MS/MS spectra of protein digests. We have developed a custom application, SequestOnOracle, that extends TurboSEQUEST with the data management and analysis tools of a relational database. SequestOnOracle’s unique capabilities derive from its ability to summarize and compare the protein and peptide content from multiple TurboSEQUEST searches.

Long Abstract

 

 

138A. Hierarchical Cluster Analysis and Classification of SAGE data.

Raymond T. Ng, Jorg Sander, Monica C. Sleumer and Man Saint Yuen. University of British Columbia.

myuen@cs.ubc.ca

 

Under the assumption that cells that look morphologically similar may behave very differently at a molecular level, we present a method for clustering and classifying SAGE libraries to detect the similarities and differences between various tissue types and neoplastic states.

Long Abstract

 

 

139A. GEA: a Toolkit for Gene Expression Analysis.

Jessica M. Phan, Raymond Ng and Steve Jones. University of British Columbia.

myuen@cs.ubc.ca

 

We demonstrate the Gene Expression Analyzer (GEA) toolkit, intended particularly for high-dimensional data such as SAGE. GEA provides a graphical interface with operations for clustering, comparing and contrasting gene expression in different SAGE clusters. GEA will eventually be linked to various bioinformatics databases for integrated genomic analysis.

Long Abstract

 

 

140A. A Method for Detecting Protein-Protein Interaction Rules.

Takuya Oyama1,4, Kagehiko Kitano1,4, Kenji Satou 2,4 and Takashi Ito3,4. 1INTEC Web and Genome Informatics Corporation, 2School of Knowledge Science, Japan Advanced Institute of Science and Technology, 3Cancer Research Institute, Kanazawa University and 4Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Corporation (JST).

oyama@isl.intec.co.jp

 

We studied a method that uses data mining to discover rules related to protein-protein interactions from accumulated protein-protein interaction data. The method reveals relations between the features of mutually interacting proteins, for example that a protein having feature F1 interacts with a protein having feature F2.
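
A rule of the form "F1 interacts with F2" can be scored in the usual association-rule way, with support and confidence. The sketch below is only an illustration under that assumption: the abstract does not state which scoring is used, and the proteins, features and interactions are invented.

```python
# Hypothetical protein features and interacting pairs, invented for illustration.
FEATURES = {
    "P1": {"kinase"},
    "P2": {"SH3"},
    "P3": {"kinase", "SH2"},
    "P4": {"SH3"},
    "P5": {"SH2"},
}
INTERACTIONS = [("P1", "P2"), ("P3", "P4"), ("P3", "P5"), ("P1", "P5")]


def rule_stats(f1, f2):
    """Support and confidence of 'a protein with f1 interacts with a protein with f2'."""
    def matches(a, b):
        return f1 in FEATURES[a] and f2 in FEATURES[b]

    # Interactions are undirected, so both orientations are tested.
    both = sum(1 for a, b in INTERACTIONS if matches(a, b) or matches(b, a))
    # Interactions in which at least one partner carries f1.
    has_f1 = sum(1 for a, b in INTERACTIONS
                 if f1 in FEATURES[a] or f1 in FEATURES[b])
    support = both / len(INTERACTIONS)
    confidence = both / has_f1 if has_f1 else 0.0
    return support, confidence
```

Calling `rule_stats("kinase", "SH3")` on this toy data gives the share of all interactions that fit the rule (support) and the share of kinase-containing interactions whose partner carries SH3 (confidence).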

Long Abstract

 

 

141A. G-language Genome Analysis Environment.

Kazuharu Arakawa1,2, Koya Mori1,3 and Masaru Tomita1,2. 1Institute for Advanced Biosciences, Keio University, 2Department of Environmental Information and 3Graduate School of Media and Governance.

gaou@g-language.org

 

G-language Genome Analysis Environment (G-language GAE) is a generic software package aimed at higher efficiency in bioinformatics analysis. G-language GAE offers a set of Perl libraries as an interface for software development, and a graphical user interface for easy manipulation. It is distributed freely under the GPL at http://www.g-language.org/.

Long Abstract

 

 

142A. Data Handling for Detailed Phenotypic Characterization of Novel Mouse Phenotypes.

E. C. J. Green1, J. Airey1, R. Cox1, Y. Hashim1, T. Hough1, Z. Lalanne1, K. E. Logan1, P. Nolan1, L. Visor1, A-M. Mallon1, P. Jones1, R. Selley1, A. Blake1, S. Greenaway1, H. J. Kirkbride1, J. Hunter2 and S. D. M. Brown1. 1Mouse Genome Center and Mammalian Genetics Unit, MRC, Harwell, Oxfordshire, OX11 0RD, UK and 2GlaxoSmithKline, New Frontiers Science Park, Harlow, CM19 5AW, UK.

e.green@har.mrc.ac.uk

 

A system is described for the management of data produced from the characterization of novel phenotypes, observed from a large scale ENU mutagenesis programme. A diversity of data is being produced from sources such as microarray technology, in situ hybridization studies, animal husbandry, candidate gene identification, DHPLC and sequencing.

Long Abstract

 

 

143A. Willo and Wisp: Data Management Systems for Mouse Genome Mapping and Sequencing.

M. Simon, S. Greenaway, A-M. Mallon, R. Selley, P. Jones, Z. Tymowska-Lalanne, S. Breeds, S. Smythe, H. Kirkbride, S. Webb, A. Blake, J. Weekes, E. Green, E. Mollison, P. Denny, P. Nolan, M. Goldsworthy, M. Strivens and S.D.M. Brown. Medical Research Council, Harwell, Oxon, OX11 0RD, England.

m.simon@har.mrc.ac.uk

 

A vital element of high-throughput genetics is to capture the data generated from experimental procedures and to integrate and disseminate these results. Two data management systems have been developed to capture this data at the point of generation - Wisp and Willo. These capture data specifically generated from sequencing and genotyping.

Long Abstract

 

 

144A. Novel Opportunities and Challenges in the Human Proteome: A Bioinformatics Strategy to Identify Splice Variants of Druggable Gene Targets.

Chandra Ramanathan1, Shuba Gopal2, Bob Bruccoleri1, John Feder1, Gabe Mintier1 and Terry Gaasterland2. 1Bristol-Myers Squibb and 2The Rockefeller University.

Chandra.Ramanathan@bms.com

 

Identification, verification and biological characterization of splice variants are challenging tasks but essential to understanding the observed biological complexity in humans. A systematic bioinformatics method is being developed to mine human genomic and EST data to identify splice variant forms of druggable gene targets and to correlate these variants with disease/tissue expression information available in various proprietary databases.

Long Abstract

 

 

145A. DrugBank: An Integrated Database for Drug Discovery and Pharmacogenomics.

Kavoos Basmenji, Zhan Chang, Bahram Habibi-Nazhad and David Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB, T6G 2N8.

zchang@ualberta.ca

 

DrugBank is a web-enabled database developed to facilitate drug discovery and drug analysis. It combines drug information with drug target information to allow users the possibility of linking small molecule data with protein sequence/structure data. DrugBank can be accessed freely at http://redpoll.pharmacy.ualberta.ca/~zchang/cgi-bin/welcome.cgi.

 

Long Abstract

 

 

146A. The Genomics Unified Schema (GUS).

V. Babenko, B. Brunk, J. Crabtree, S. Diskin, S. Fischer, G. Grant, Y. Kondrahkin, L. Li, J. Liu, J. Mazzarelli, D. Pinney, A. Pizarro, E. Manduchi, S. McWeeney, J. Schug and C. Stoeckert. Center for Bioinformatics, University of Pennsylvania.

stevef@pcbi.upenn.edu

 

GUS is a comprehensive, strongly typed relational schema and object-based software platform for integration, analysis, curation and presentation of sequence-based genomics information. It has been used to model and/or mine human, mouse, Plasmodium and the pancreas, and is suitable for model organisms in general. It is freely available.

Long Abstract

 

 

147A. Compensation for Nucleotide Bias in a Genome by Representation as a Discrete Channel with Noise.

Mark Schreiber1,2 and Chris Brown1. 1AgResearch NZ, PO Box 50034, Dunedin, New Zealand and 2Dept of Biochemistry, University of Otago, PO Box 56 Dunedin, New Zealand.

mark.schreiber@agresearch.co.nz

 

Calculation of the information content of motifs in genomes highly biased in nucleotide composition leads to overestimates of the amount of useful information in the motif. By treating a biased genome as a discrete channel with noise, in accordance with Shannon Information Theory, we were able to remove both ‘Distortion’ and ‘Noise’ from the motif and recover a more instructive biological ‘signal’.
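
The overestimate can be illustrated with a small calculation. The sketch below is not the authors' channel model; it only contrasts the information content of one motif column computed against equiprobable bases with the relative entropy computed against the genome's actual composition. The AT-rich background and the motif column are hypothetical.

```python
import math

UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def column_information(freqs, background=UNIFORM):
    """Relative entropy (bits) of one motif column against a background composition."""
    return sum(p * math.log2(p / background[base])
               for base, p in freqs.items() if p > 0)


# Hypothetical AT-rich genome and a motif column that is mostly A.
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
column = {"A": 0.8, "C": 0.0, "G": 0.0, "T": 0.2}

naive = column_information(column)               # assumes equiprobable bases
corrected = column_information(column, at_rich)  # accounts for genome composition
```

With these numbers the naive value is roughly twice the corrected one, because much of the apparent preference for A and T is already explained by the genome's composition.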

Long Abstract

 

 

148A. Integrating Eukaryotic Genomes by Orthologous Groups: What is Unique about Apicomplexan Parasites?

Li Li, Brian Brunk, Christian J. Stoeckert Jr and David S. Roos. Department of Biology, University of Pennsylvania, Philadelphia, USA and Center for Bioinformatics, University of Pennsylvania, Philadelphia, USA.

lili4@sas.upenn.edu

 

To integrate eukaryotic sequence data with information on biological processes, we sought to identify orthologous groups by combining sequence similarity comparisons with graph clustering algorithms. Queries based on user-defined species distribution provide a snapshot of shared/diversified processes, facilitating (for example) the identification of targets for broad-spectrum antibiotics targeting apicomplexan parasites.

Long Abstract

 

 

149A. The CyberCell Database (CCDB).

Bahram Habibi-Nazhad, Melania Ruaini, Kavoos Basmenji and David S. Wishart. Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton AB T6G 2N8, Canada.

bahram@redpoll.pharmacy.ualberta.ca

 

The CyberCell Database (CCDB) is a web-enabled, user-friendly database containing previously published and electronically archived information on nearly every aspect of E. coli molecular biology and enzymology. We have also constructed CC3D which contains E. coli structural proteomic data and CCMD which contains the chemical database of metabolites and other small molecules used to support metabolic analysis.

Long Abstract

 

 

150A. Functional Database System of Olfactory Receptors.

Kazunori Miyazaki and Satoshi Itoh. Advanced Materials and Devices Laboratory, Corporate Research and Development Center, TOSHIBA CORPORATION.

kazun.miyazaki@toshiba.co.jp

 

We have developed a Java/XML-based functional database system of olfactory receptors (OR) built from databases that can be accessed via the Internet. The distinguishing feature of our system is that it analyzes the XML data for ORs using predictive tools on the Web, and then semi-automatically accumulates the resulting annotations alongside the analyzed data.

Long Abstract

 

 

151A. A Standard Corpus for Evaluating Extraction of Molecular Interaction Pathway Information from Scientific Abstracts.

Soon Heng Tan and See-Kiong Ng. Laboratories for Information Technology, Singapore.

soonheng@lit.org.sg

 

Vast amounts of molecular interaction pathway information can be extracted automatically from MEDLINE's abstracts using natural language processing, but progress has been hindered by a lack of a standard corpus for evaluation. We describe a test corpus we have created from our Pathweaver project that is suitable for such evaluation.

Long Abstract

 

 

152A. A First Study of the Central Role of the Analyst in the Knowledge Discovery Process in Biology.

Sandy Maumus1,2, Amedeo Napoli2, Rafik Taouil2 and Sophie Visvikis1. 1INSERM U525, Université Henri Poincaré (Nancy 1) – Faculté de Pharmacie, 30 rue Lionnois, 54000 Nancy, France and 2LORIA – UMR 7503, B.P. 239, 54506 Vandoeuvre-Lès-Nancy, France.

sandy.maumus@nancy.inserm.fr

 

Based on an application of symbolic data mining methods to a test database, we underline the role played by the analyst in the knowledge discovery process. Encouraged by positive results, we plan to apply these methods to a large database to investigate the relationships between gene polymorphisms and intermediate phenotypes of cardiovascular diseases.

Long Abstract

 

 

153A. Assessing the Compactness and Isolation of Individual Clusters Observed in Microarray Data.

Per-Olof Fjallstrom. Affibody.

perfj@affibody.com

 

The “clusters” returned by standard clustering methods applied to microarray data are not necessarily biologically relevant. We present a method for assessing if such clusters are unusually compact and isolated. The method has been successfully applied to several microarray data sets. It does not require estimates of the variance of experimental error.

Long Abstract

 

 

154A. An Amino Acid Centered Database to Facilitate Protein Crystallisation.

K. MacLeod and E. Westwick. Astex Technology Ltd.

e.westwick@astex-technology.com

 

An amino acid centered relational database has been designed to store sequences of P450 proteins that have been engineered in order to optimise crystallisation behavior. Amino acids are stored as individual entities, allowing the physical and chemical properties of the residues to be correlated with experimental outcome, using SQL queries.
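
To make the idea of residue-level storage concrete, here is a minimal sketch using SQLite. The table names, columns, constructs and outcome values are all hypothetical; the abstract does not describe the actual schema or database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE construct (id INTEGER PRIMARY KEY, name TEXT, crystallised INTEGER);
CREATE TABLE residue (
    construct_id INTEGER REFERENCES construct(id),
    position INTEGER,
    amino_acid TEXT,
    hydrophobic INTEGER
);
""")

# Two invented engineered constructs, one row per residue.
conn.executemany("INSERT INTO construct VALUES (?, ?, ?)",
                 [(1, "variant_a", 1), (2, "variant_b", 0)])
conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?)", [
    (1, 1, "K", 0), (1, 2, "E", 0), (1, 3, "L", 1),
    (2, 1, "L", 1), (2, 2, "F", 1), (2, 3, "K", 0),
])

# Correlate a residue property with the experimental outcome.
rows = conn.execute("""
    SELECT c.crystallised, AVG(r.hydrophobic) AS hydrophobic_fraction
    FROM residue r JOIN construct c ON c.id = r.construct_id
    GROUP BY c.crystallised
    ORDER BY c.crystallised
""").fetchall()
```

Because each residue is its own row, per-residue properties can be aggregated or filtered by position in plain SQL, without parsing sequence strings in application code.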

Long Abstract

 

 

155A. In silico reconstruction of metabolic network from unannotated raw genome sequences.

Jibin Sun and An-Ping Zeng. Microbial Systems and Genome Analysis, GBF.

AZE@GBF.de

 

A method is proposed for in silico reconstruction of metabolic networks directly from unannotated genome sequences. A comparison of data from different sequencing stages (3.9-fold vs. 7.9-fold coverage) for one organism revealed that 3.9-fold coverage of the genome is sufficient (with 99.3% identity) for reconstructing the metabolic network.

Long Abstract

 

 

156A. AFLP® Nucleotide Sequence Quality Assessment and Improvement Tool.

Antoine Janssen1, Jan van Oeveren1, Pieter Vos1, Gert Vriend2, Roland Siezen2, Rene van Schaik3 and Jack Leunissen2. 1Keygene N.V., Wageningen, The Netherlands, 2Center for Molecular and Biomolecular Informatics, University of Nijmegen, Nijmegen, The Netherlands and 3Organon, Oss, The Netherlands.

antoine.janssen@keygene.com

 

The Keygene/CMBI AFLP® quality assessment and improvement tool is a web based application that automates quality assessment and visualization of (cDNA-)AFLP® data. It improves proprietary data by use of public data. The analysis includes coverage / redundancy calculation, internal contig building, full length discovery and potential SNP discovery. http://www.cmbi.nl/kg_bin/dataset_annotator.pl.

Long Abstract

 

 

157A(i). Schema Mapping and Data Integration with Clio.

Barbara Eckman, Mauricio Hernández, Howard Ho, Felix Naumann and Lucian Popa. IBM.

felix@us.ibm.com

 

Bioinformatics data sources typically have large, complex structures, reflecting the richness of the scientific concepts they model. Clio is an information integration tool that helps users define mappings between disparate schemas, thus providing an integrated view of all related data sources and enabling data transformations between the sources.

Long Abstract

 

 

157A(ii). The GENIA Corpus: an Annotated Corpus in Molecular Biology Domain.

Tomoko Ohta1, Yuka Tateisi2, Jin-Dong Kim2 and Jun-ichi Tsujii1,2. 1Univ. of Tokyo and 2CREST, JST.

okap@is.s.u-tokyo.ac.jp

 

We are developing the necessary resources, including a domain ontology and an annotated corpus built from MEDLINE abstracts. We have already annotated 2,500 abstracts with 31 different semantic classes. Part-of-speech annotation of the same set of abstracts annotated for named entities is under way using the Penn Treebank tag set. In this poster, we report on the current status of our corpus.

Long Abstract

 

 

Genome Annotation.

 

158A. An Integrated Approach to High-Throughput Genetic Sequence Analysis.

159A. Annotation of Potential Transcription Factor Binding Sites in Orthologous, Paralogous and Pseudogenes by Statistical Analysis Comparing Them with the Sites with Known SNP's-Disease Associations.

160A. The Cellular Immune System as a Gene Prediction Resource.

161A. A High Throughput System for Mining EST and cDNA Databases.

162A. GenDB is an Open Source Framework for Genome Annotation.

163A. Biomax PEDANT™ Human Genome Database - Automatic and Manual Functional Annotation of the Human Genome.

164A. Expert-system based annotation strategies using GenDB, an open source genome annotation system.

165A. Analysis of the Replication Origin of the Middle-Sized Linear Plasmid pSCL2 of Streptomyces clavuligerus.

166A. Enhanced Functional Annotation by the EBI Sequence Database Group.

167A. An Automated Prediction System for Gene Functions Combined with RiceGAAS (Rice Genome Automated Annotation System).

168A. Global Open Biology Ontologies.

169A. Exon Structure Analysis, Ortholog Identification, and SNP Candidate Screening by Mapping RIKEN Mouse cDNA Clones to Multiple Genome Assemblies.

170A. TESS-II: Describing and Finding Gene Regulatory Sequences with Grammars.

171A. An Extensible Gene-centric Architecture for Querying across Multiple Databases Using J2EE.

172A. Top-Down EST Clustering Using the Draft Human Genome Map.

173A. Analysis of the del(13)SVEA36H Region on Mouse Chromosome 13.

174A. Construction of a Genome-Wide, Fine-Grained Human-Mouse Synteny Map by Identifying Conserved Stretches in the Translations of Both Genomes.

175A. No poster.

176A. Sequence and Structural Integration within the InterPro, Proteome Analysis and SWISS-PROT Databases.

177A. eProteome: a Resource for Proteome Annotation.

178A. PyFACT: A Tool for Function Assignment and Classification to a Sequence using Dictionary-Based Approach.

179A. The EnsEMBL Annotation Process.

180A. CGP: A Tool for the Selection of Candidate Disease Genes.

181A. Literature and its Referents: Analyzing PubMed Citations Across PFAM.

182A. No poster.

183A. Identification of Potentially Functional Reverse Transcriptase Sequences in the Human Genome.

184A. Poxvirus Orthologous Clusters (POCs) Software Package.

185A. Selection of SNPs for a Genome-Wide Linkage Disequilibrium Mapping Set.

186A. Semantic Integration in the Mouse Genome Informatics System.

 

158A. An Integrated Approach to High-Throughput Genetic Sequence Analysis.

Gong-Xin Yu, E. Marland, A. Rodriguez and N. Maltsev. Argonne National Lab., 9700 S. Cass Ave., Argonne, IL, 60439, USA.

gxyu@mcs.anl.gov

 

We report a system that consists of a scalable pipeline for parallel sequence analysis; a rule-based knowledge base; a voting algorithm for functional assignments and a web browser. The knowledge base is the core of the system, which enables users to resolve conflicts among different computational tools, enhance confidence, and avoid over-interpretation.
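
The abstract does not describe how the voting algorithm works. As a rough sketch of the general idea only, the function below takes a weighted vote across tools and declines to make an assignment when no function wins a clear share of the weight. The function name, the tool labels, the weights and the threshold are all assumptions.

```python
from collections import defaultdict


def vote(predictions, weights=None, min_share=0.5):
    """Pick a functional assignment by weighted vote across tools.

    predictions maps a tool label to the function it predicts; weights maps a
    tool label to an assumed reliability (default 1.0). Returns None when no
    function wins more than min_share of the total weight.
    """
    if not predictions:
        return None
    weights = weights or {}
    tally = defaultdict(float)
    for tool, function in predictions.items():
        tally[function] += weights.get(tool, 1.0)
    best = max(tally, key=tally.get)
    share = tally[best] / sum(tally.values())
    return best if share > min_share else None
```

Returning None on a tie or a weak majority is one simple way to avoid over-interpretation: conflicting evidence is left unresolved for a curator or a rule base instead of being forced into an assignment.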

Long Abstract

 

 

159A. Annotation of Potential Transcription Factor Binding Sites in Orthologous, Paralogous and Pseudogenes by Statistical Analysis Comparing Them with the Sites with Known SNP's-Disease Associations.

Julia V. Ponomarenko, Galina V. Orlova, Tatyana I. Merkulova, Elena V. Gorshkova, Oleg N. Fokin, Gennady V. Vasiliev, Anatoly S. Frolov and Mikhail P. Ponomarenko. Institute of Cytology and Genetics, 10 Lavrentyev Ave., Novosibirsk, 630090, Russia.

jpon@bionet.nsc.ru

 

A database-tools system, rSNP_Guide, http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/, analyzes SNPs in regulatory gene regions. Based on seventeen transcription factor (TF) binding sites with disease-associated mutations, we have localized 148 potential TF sites in orthologous genes, paralogous genes and pseudogenes. Statistical significance of the strength of each potential site was estimated as "presence", "absence", or "weakness".

Long Abstract

 

 

160A. The Cellular Immune System as a Gene Prediction Resource.

Gila Lithwick, Yael Altuvia and Hanah Margalit. The Hebrew University, Jerusalem.

gilal@md.huji.ac.il

 

We carried out a comprehensive search comparing peptides eluted from major histocompatibility complex (MHC) molecules to human sequence data. Our findings illustrate how these peptides are informative for the identification of new genes, for hypothetical gene verification, for verifying gene expression at the protein level and for supporting splice junctions.

Long Abstract

 

 

161A. A High Throughput System for Mining EST and cDNA Databases.

Jonathan Segal and Hui Huang. Genome Therapeutics Corporation.

jsegal@genomecorp.com

 

We present a high-throughput system for mining EST and cDNA databases (dbEST, gb_new_est, etc.) to find possible extensions for known cDNAs and to find previously unmapped genes over a region of interest. Efficient use of our computational cluster lets us search for extensions for thousands of genes in several hours.

Long Abstract

 

 

162A. GenDB is an Open Source Framework for Genome Annotation.

Folker Meyer, Daniela Bartels, Thomas Bekel, Alexander Goesmann, Burkhard Linke, Alice McHardy and Oliver Rupp. Center for Genome Research, Bielefeld University.

fm@Genetik.Uni-Bielefeld.DE

 

Centered around a relational database management system, the GenDB system provides the software infrastructure for genome annotation projects. URL: http://GenDB.Genetik.Uni-Bielefeld.DE.

Long Abstract

 

 

163A. Biomax PEDANT™ Human Genome Database - Automatic and Manual Functional Annotation of the Human Genome.

Matthias Fellenberg1, Erik Kimpel1, Andreas Fritz1, Oliver Heinrich1, Victor Solovyev2, Michael Firsov3, and Christine M.E. Schüller1. 1Biomax Informatics AG, Lochhamer Straße 11, 82152 Martinsried, Germany, 2Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY 10549, USA and 3Petrogen Ltd., Engels Pr., 128/A, #4-H, 194356 St. Petersburg, Russia.

matthias.fellenberg@biomax.de

 

The Pedant-Pro™ Sequence Analysis Suite from Biomax was used to perform systematic analysis for in-depth functional and structural characterization of the predicted human proteome resulting in the PEDANT™ Human Genome Database. An expert annotation team continues to refine the analysis data by verifying automatically predicted features, including literature references.

Long Abstract

 

 

164A. Expert-system based annotation strategies using GenDB, an open source genome annotation system.

Alice McHardy, Jan Kleinluetzum and Folker Meyer. Center for Genome Research, Bielefeld University

alice@genetik.uni-bielefeld.de

 

Simulating the decision process of a human expert, rule-based annotation strategies for interpretation of functional evidence can be formulated to automate the annotation process. Using GenDB (http://gendb.genetik.uni-bielefeld.de), an open source genome annotation system relying on a relational database backend, different rule-based strategies allowing (partial) automation of the annotation process are implemented.

Long Abstract

 

 

165A. Analysis of the Replication Origin of the Middle-Sized Linear Plasmid pSCL2 of Streptomyces clavuligerus.

Wei Wu and Kenneth L. Roy. Dept. of Biol. Sci., University of Alberta, Edmonton, Canada.

wwu@ualberta.ca

 

The putative replication origin region of the linear plasmid pSCL2 of Streptomyces clavuligerus has been sequenced and analyzed. Two ORFs, encoding RepC1 and RepC2, downstream of the origin of replication were highly homologous to RepL1 and RepL2 of pSLA2-L of Streptomyces rochei. IS116 was found further downstream of the repC genes.

Long Abstract

 

 

166A. Enhanced Functional Annotation by the EBI Sequence Database Group.

Manuela Pruess, Rolf Apweiler, Ernst Kretschmann and Michele Magrane. European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

mpr@ebi.ac.uk

 

At the EBI, reliable gene and protein databases, indispensable tools for computational analysis and data mining, are developed and maintained. High quality data annotation comprises both manual and automated methods, the latter not being a substitute for manual annotation, but providing strong support. All databases are publicly available at http://www.ebi.ac.uk/.

Long Abstract

 

 

167A. An Automated Prediction System for Gene Functions Combined with RiceGAAS (Rice Genome Automated Annotation System).

Hiroyuki Watanabe1, Yuji Shimizu1, Katsumi Sakata1, Hiroshi Ikawa1, Yoshiaki Nagamura2, Takashi Matsumoto2 and Kenichi Higo2. 1Tsukuba Division, Mitsubishi Space Software Co., Ltd., Tsukuba, Japan and 2National Institute of Agrobiological Sciences, Tsukuba, Japan.

hiroyuki@tkb.mss.co.jp

 

A new annotation program was developed to predict rice gene functions automatically using homology searches, for integration with RiceGAAS (Rice Genome Automated Annotation System, http://RiceGaas.dna.affrc.go.jp/). The prediction results were compared with the manual annotation described in GenBank flat files and are shown at http://alnilam.mi.mss.co.jp/rgadb/.

Long Abstract

 

 

168A. Global Open Biology Ontologies.

Midori Harris. EMBL-EBI.

midori@ebi.ac.uk

 

The Gene Ontology (GO) Consortium supports the development of bio-ontologies describing domains not covered by the three GO vocabularies. Criteria for inclusion and links to existing and planned ontologies are available on the Global Open Biology Ontologies (GOBO) web site: http://www.geneontology.org/doc/gobo.html.

Long Abstract

 

 

169A. Exon Structure Analysis, Ortholog Identification, and SNP Candidate Screening by Mapping RIKEN Mouse cDNA Clones to Multiple Genome Assemblies.

Serge Batalov 1 and Colin F. Fletcher 2. 1 Computational Biology Department and 2 Mouse Genetics Program, Genomics Institute of the Novartis Research Foundation (GNF), San Diego, CA 92121, USA.

batalov@gnf.org

 

Mapping to Celera and public genome assemblies (sequenced from different strains) allows one to determine exon structure, identify alternatively spliced forms and localize the correct human ortholog for functional annotation. Intronless genes can be flagged for determination of retrotransposition events, pseudogenes, or genomic contamination. High quality sequence discrepancies can lead to SNP identification.

Long Abstract

 

 

170A. TESS-II: Describing and Finding Gene Regulatory Sequences with Grammars.

Jonathan Schug and Christian J. Stoeckert, Jr. Center for Bioinformatics at the University of Pennsylvania.

jschug@pcbi.upenn.edu

 

We present a grammar formalism and a parser to find gene regulatory sites in genomic sequence and annotation from DAS-compliant genome resources. The grammar can match against both sequence and existing annotation and makes it easy to express complex and flexible relationships between binding sites in a very concise form.

Long Abstract

 

 

171A. An Extensible Gene-centric Architecture for Querying across Multiple Databases Using J2EE.

David Block, Serge Batalov and Hilmar Lapp. The Genomics Institute of the Novartis Research Foundation (GNF), San Diego, California.

dblock@gnf.org, batalov@gnf.org and lapp@gnf.org

 

Current genomic databases are sequence-centric, while much of present and future biology is concerned with genes: their functions, roles, modifications, and expression. A gene-centric database architecture is described, along with an implementation using J2EE. A virtual “Platonic” set of genes is created using synonymous, homologous and syntenic relationships.

Long Abstract

 

 

172A. Top-Down EST Clustering Using the Draft Human Genome Map.

Namshin Kim1, Seokmin Shin1 and Sanghyuk Lee2. 1School of Chemistry, Seoul National University and 2Division of Molecular Life Science, Ewha Womans University.

deepreds@hanmail.net

 

A new EST clustering algorithm that uses the draft genome map has been developed. Human ESTs are mapped onto the UCSC assembly of the human genome (the so-called Golden Path) using the BLAT program, and their alignments are clustered in a top-down fashion. The resulting clusters are compared with the UniGene and TIGR Gene Indices.
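
One plausible reading of clustering genome-mapped ESTs from the top down can be sketched as follows. This is an illustration only, not the authors' algorithm: the tuple layout, the function name `cluster_alignments`, and the rule "split by chromosome, then by strand, then by overlapping spans" are all assumptions made for the example.

```python
from collections import defaultdict

def cluster_alignments(alignments):
    """Group EST alignments into clusters of overlapping genomic spans.

    alignments: iterable of (est_id, chrom, strand, start, end), half-open coordinates.
    Returns a list of (chrom, strand, start, end, sorted_est_ids).
    """
    # Top level: partition by chromosome and strand (hypothetical splitting rule).
    by_locus = defaultdict(list)
    for est_id, chrom, strand, start, end in alignments:
        by_locus[(chrom, strand)].append((start, end, est_id))

    clusters = []
    for chrom, strand in sorted(by_locus):
        cur_start = cur_end = None
        members = []
        # Next level: sweep left to right and split wherever the spans stop overlapping.
        for start, end, est_id in sorted(by_locus[(chrom, strand)]):
            if members and start < cur_end:
                cur_end = max(cur_end, end)
                members.append(est_id)
            else:
                if members:
                    clusters.append((chrom, strand, cur_start, cur_end, sorted(members)))
                cur_start, cur_end, members = start, end, [est_id]
        if members:
            clusters.append((chrom, strand, cur_start, cur_end, sorted(members)))
    return clusters
```

In practice the input tuples would be parsed from BLAT output, and a spliced alignment would contribute its exon blocks rather than a single span; both are left out of this sketch.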

Long Abstract

 

 

173A. Analysis of the del(13)SVEA36H Region on Mouse Chromosome 13.

AM Mallon1, J Weekes1, P Denny1, MRM Botcherby2, P Gautier4, H Hummerich5, S Cross4, V van Heyningen4, R Edgar4, N Leaves2, J Greystrong2, L Greenham2, S Jones2, K Maggott2, S Manjunath2, E Russell2, G Strachan2, M Strivens1, P North2, E Boal2, V Cobley2, G Hunter2, G Kimberley2, L Cave-Berry2, L Mathews3, S Simms3, S Gregory3, R Evans3, T Hubbard3, R Durbin3, M Cadman1, R Mc Keone1, A Southwell1, C Sellick1, M Iravani5, S White4, P Little6, I Jackson4, J Rogers3, RD Campbell2 and SDM Brown1. 1MRC UK Mouse Genome Centre and Mammalian Genetics Unit, Harwell, Oxfordshire, OX11 0RD, UK, 2MRC UK-HGMP Resource Centre, Hinxton Genome Campus, Cambridge CB10 1SB, UK, 3Sanger Centre, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK, 4MRC Human Genetics Unit, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK, 5Imperial College, Exhibition Road, South Kensington, London SW7 2AZ, UK and 6School of Biochemistry and Molecular Genetics, University of New South Wales, Sydney 2052, Australia.

a.mallon@har.mrc.ac.uk

 

A regional and functional approach has been adopted by the MRC UK mouse-sequencing programme, which will improve the efficiency of mutation scanning and the identification of genes underlying mutations of interest. Detailed annotation of 14Mb of mouse finished sequence from the Del36H region on mouse chromosome 13 will be described.

Long Abstract

 

 

174A. Construction of a Genome-Wide, Fine-Grained Human-Mouse Synteny Map by Identifying Conserved Stretches in the Translations of Both Genomes.

Adrian Bruengger and John Markus. Novartis Pharma AG., WKL-125.13.58, Basel, BS4053, Switzerland.

adrian.bruengger@pharma.novartis.com

 

Using distributed computing, we have identified all conserved segments in the translations of both genomes whose ungapped alignments contain a short identical stretch and score above a certain threshold. The resulting dataset allows the rapid identification of orthologs and homologs and can be visualized as a genome-wide, fine-grained synteny map.
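
The selection criterion described above (an ungapped alignment that contains a short identical stretch and scores above a threshold) can be illustrated with a small seed-and-extend sketch. Everything concrete here is an assumption for the example: the seed length `k`, the flat match/mismatch scores, the X-drop cutoff and the function names are hypothetical, and the abstract does not say which scoring scheme the authors used.

```python
from collections import defaultdict

def _extend(a, b, i, j, step, match, mismatch, xdrop):
    """Extend an ungapped alignment from (i, j), exclusive, in direction step.

    Returns (best_score_gain, length_at_best)."""
    score = best = 0
    length = best_len = 0
    i += step
    j += step
    while 0 <= i < len(a) and 0 <= j < len(b):
        score += match if a[i] == b[j] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > xdrop:
            break  # score fell too far below the best seen so far
        i += step
        j += step
    return best, best_len

def ungapped_hits(a, b, k=4, match=2, mismatch=-1, min_score=12, xdrop=5):
    """Return sorted (a_start, b_start, length, score) for conserved ungapped segments."""
    # Index every k-mer of b so identical stretches can be looked up directly.
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)

    hits = set()
    for i in range(len(a) - k + 1):
        for j in index.get(a[i:i + k], ()):
            left, left_len = _extend(a, b, i, j, -1, match, mismatch, xdrop)
            right, right_len = _extend(a, b, i + k - 1, j + k - 1, 1, match, mismatch, xdrop)
            score = k * match + left + right
            if score >= min_score:
                hits.add((i - left_len, j - left_len, left_len + k + right_len, score))
    return sorted(hits)
```

A real run over translated genomes would use a substitution matrix instead of flat scores and would split the k-mer index across machines; the sketch only shows the filter itself.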

Long Abstract

 

 

175A. No poster.

 

176A. Sequence and Structural Integration within the InterPro, Proteome Analysis and SWISS-PROT Databases.

Virginie Mittard, Rolf Apweiler, Daniel Barrell, Ujjwal Das, Wolfgang Fleischmann, Alexander Kanapin, Paul Kersey, Evgenia Kriventseva, Phil McNeil, Nicola Mulder and Florence Servant. EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

virginie@ebi.ac.uk

 

The challenge for the next decade is to integrate the sequence and structural data and to provide structural and functional annotations for both protein families and entire genome sequences. This project will be applied within the InterPro, Proteome analysis and SWISS-PROT databases.

Long Abstract

 

 

177A. eProteome: a Resource for Proteome Annotation.

Serge Saxonov, Peter Tan and Douglas L. Brutlag. Stanford Biochemistry, 251 Campus Drive X 215, Stanford, CA 94305, USA.

saxonov@smi.stanford.edu

 

eProteomes is a database of motif hits in proteins from the 72 publicly sequenced genomes as well as other sequence collections. The motifs were generated from the Blocks+ database and were represented as both regular expressions and PSSMs. eProteomes can be accessed through a powerful web-based interface at http://fold.stanford.edu/proteome.
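
The two motif representations mentioned above, regular expressions and PSSMs, can be compared on a toy example. This is a minimal sketch under stated assumptions, not the eProteomes implementation: the function names, the uniform background, the pseudocount and the neutral score for non-standard residues are all choices made for illustration.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def regex_hits(protein, pattern):
    """Return (position, matched_text) for every match, including overlapping ones."""
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(protein)]

def build_pssm(instances, pseudocount=1.0):
    """Build log-odds columns from equal-length motif instances (standard residues only)."""
    width = len(instances[0])
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    total = len(instances) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for inst in instances:
            counts[inst[col]] += 1
        pssm.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pssm

def pssm_hits(protein, pssm, threshold):
    """Return (position, score) for every window scoring at least threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(protein) - width + 1):
        # Residues outside the 20 standard amino acids score 0 (an assumption).
        score = sum(pssm[j].get(protein[i + j], 0.0) for j in range(width))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits
```

The regular expression gives a yes/no answer per position, while the PSSM gives a graded score, so the threshold controls how much variation from the training instances is tolerated.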

Long Abstract

 

 

178A. PyFACT: A Tool for Function Assignment and Classification to a Sequence using Dictionary-Based Approach.

Jee-Hyub Kim1, Sung-Ho Goh1, Cheol-Goo Hur1 and Doil Choi2. 1National Center for Genome Information, KRIBB and 2Plant Diversity Research Center, KRIBB.

hurlee@mail.kribb.re.kr

 

PyFACT was developed to meet the need for function assignment and classification of ESTs; it uses a dictionary-based approach and is written in Python. PyFACT provides useful graphics and statistics as well as function assignment using MIPS and GO function codes. Additional dictionaries will make it more valuable.

Long Abstract

 

 

179A. The EnsEMBL Annotation Process.

M. Clamp1, D. Barker1, E. Birney2, G. Cameron2, Y. Chen2, L. Clarke1, G. Coates1, T. Cox1, J. Cuff1, V. Curwen1, T. Cutts1, T. Down1, R. Durbin1, E. Eyras1, J. Gilbert1, M. Hammond2, A. Kasprzyk2, D. Keefe2, S. Keenan1, H. Lehväslaiho2, C. Melsopp2, E. Mongin2, R. Pettett1, S. Potter1, A. Rust2, E. Schmidt2, S. Searle1, G. Slater2, J. Smith1, W. Spooner1, A. Stabenau2, J. Stalker1, A. Ureta-Vidal2, I. Vastrik2, T. Hubbard1. 1Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambs, CB10 1SA, UK and 2European Bioinformatics Institute (EMBL-EBI), Genome Campus, Hinxton, Cambs, CB10 1SA, UK.

lec@sanger.ac.uk

 

The EnsEMBL Genome annotation process consists of four main stages: 'raw compute', gene build, protein analysis and comparative analysis. These processes are run across a variety of genomes from Human and Mouse to Worm and Mosquito. The data produced is then displayed on the web at www.ensembl.org.

Long Abstract

 

 

180A. CGP: A Tool for the Selection of Candidate Disease Genes.

Damian Smedley, Janet Kelso, Soraya Bardien-Kruger, Johann Visagie, Winston Hide and Mark McCarthy. 1Imperial College School of Medicine, Du Cane Road, London, UK and 2South African National Bioinformatics Institute, Cape Town, South Africa.

d.smedley@ic.ac.uk

 

The Candidate Gene Profiler (CGP) allows researchers to select candidate human disease genes based on their genomic location, expression pattern and protein function. The tool utilizes a novel expression ontology covering all the tissues represented in dbEST (VocabProbe).

Long Abstract

 

 

181A. Literature and its Referents: Analyzing PubMed Citations Across PFAM.

Richard K. Belew1, Robert Finn2 and Alex Bateman2. 1Computer Science & Engr. Dept., Univ. California - San Diego and 2Wellcome Trust Sanger Institute.

rik@cs.ucsd.edu

 

In this poster we consider a coarser analysis of reference patterns rather than the text of individual articles. This analysis is motivated by the early bibliometric analyses of citation patterns across the scientific literature, and by more recent linkage analyses of WWW pages. In a corpus of approximately 600,000 TREMBL/SwissPROT protein entries, the number of references made to particular articles follows the ubiquitous Zipfian distribution. We also performed a correlational analysis of the frequency-rank of a particular document's references, as a function of how many different PFAM families contain a protein mentioning this reference. As expected, the approximately 80 most frequent, "generic" publications are indeed scattered across the most PFAM families.

Long Abstract

 

 

182A. No poster.

 

183A. Identification of Potentially Functional Reverse Transcriptase Sequences in the Human Genome.

E.F. Donaldson, D.W. Lee, A.R. Juntunen and M.A. McClure. Montana State University, Department of Microbiology and Center for Computational Biology.

donaldso@parvati.msu.montana.edu

 

The Genome Parsing Suite provides prototype software that identifies Reverse Transcriptase sequences in a genome, filters hits to retain probable homologues, and scores these sequences by evaluating the ordered-series-of-motifs indicative of Reverse Transcriptase. Flanking regions for each homologue are then analyzed to determine the genomic content of the Retroid Agent.

Long Abstract

 

 

184A. Poxvirus Orthologous Clusters (POCs) Software Package.

Angelika Ehlers, Stephanie Slack, Rachel L. Roper and Chris Upton. Department of Biochemistry and Microbiology, University of Victoria.

cupton@uvic.ca

 

POCs is a JAVA client-server application that accesses a database containing all poxvirus genomes; it automatically groups genes into families. POCs has a user-friendly interface permitting complex SQL queries to retrieve groups of DNA/protein sequences and gene families for use with a variety of integrated tools. Access at http://www.poxvirus.org.

Long Abstract

 

 

185A. Selection of SNPs for a Genome-Wide Linkage Disequilibrium Mapping Set.

Charles R. Scafe, Hadar Avi-Itzhak, Ryan T. Koehler, Marion Laig-Webster, Yu Wang, Eugene G. Spier, and Francisco M. De La Vega. Applied Biosystems, Foster City, CA, USA.

scafecr@appliedbiosystems.com

 

We describe a triage strategy for selecting SNPs for a genome-wide Linkage Disequilibrium mapping assay set. SNPs with multiple independent observations of the minor allele are put through assay design QC and positional selection steps. This procedure allows efficient construction of >150,000 validated assays for LD mapping.

Long Abstract

 

 

186A. Semantic Integration in the Mouse Genome Informatics System.

JA Blake, M Ringwald, J.E. Richardson, D.P. Hill, J.A. Kadin, H.J. Drabkin, J. Beal, C. Smith, C.M. Lutz, C. Bult, L.E. Corbani, A. Planchart, T.F. Hayamizu and J.T. Eppig. Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME USA.

jblake@informatics.jax.org

 

The Mouse Genome Informatics system (www.informatics.jax.org) depends upon strict attention to object identity and semantic integrity to support an extensive and fully integrated genetics and genomics resource. Incorporation of the Gene Ontologies, the Mouse Anatomical Dictionary and the development of the MGI Phenotype classifications empower data exploration and system interoperability.

Long Abstract

 

 

 

Sequence Comparison.

 

187A. PatternHunter: Faster and More Sensitive Homology Search.

188A. Discovery of Biological Sequence Motifs using a Stochastic Dictionary Model.

189A. Comparative Sequence Analysis of Non-Coding DNA in Orthologous Gene Loci.

190A. A Bioinformatic Pipeline for In-Silico High-Throughput Discovery of Single Nucleotide Polymorphisms.

191A. Maximum Score with Group Selection Method for BLAST post-processing.

192A. combAlign: A Protein Sequence Alignment Algorithm Considering Recombinations.

193A. Classifying Alignment Significance with Support Vector Machines.

194A. A Clustering Algorithm for Testing Interval Graphs on Noisy Data.

195A. Peptide Sequencing using Natural Abundance Isotope Information and de novo Spectral Analysis.

196A. Sequence Variation in the C-terminal Merozoite Surface Protein-1 Gene of Plasmodium falciparum and Epitope-Specific Human Antibody Response.

197A. Clustal-G: ClustalW Analysis Using Grid and Parallel Computing.

198A. The Distance Function for Computing the Continuous Distance of Biopolymer Sequences.

199A. BLAST Merge/Split Modules for BLAST Accelerator.

200A. Biological Significance of Jumping Alignments.

201A. Augmenting Physical Maps with Sequence.

202A. Towards a Vaccine for Scabies.

203A. Distributed BLAST System Based on Web: GOST BLAST.

204A. A Database on Alternative Splice Forms.

205A. The Enhanced Suffix Array and Its Applications to Genome Analysis.

206A. Automated Generation of Heuristics for Biological Sequence Comparison.

 

187A. PatternHunter: Faster and More Sensitive Homology Search.

Bin Ma, John Tromp and Ming Li. Bioinformatics Solutions Inc., 145 Columbia Street West, Waterloo, Ontario, Canada, N2L 3L2.

mli@bioinformaticssolutions.com

 

Genomics and proteomics studies routinely depend on homology searches based on the strategy of finding short seed matches which are then extended. The exploding genomic data growth presents a dilemma for DNA homology search techniques: increasing seed size decreases sensitivity whereas decreasing seed size slows down computation.

Long Abstract

 

 

188A. Discovery of Biological Sequence Motifs using a Stochastic Dictionary Model.

Mayetri Gupta and Jun S. Liu. Dept. of Statistics, 1 Oxford St., Harvard University, Cambridge, MA 02138, U.S.A.

gupta@stat.harvard.edu

 

We present a novel method for detecting conserved sequence motifs using a stochastic dictionary model. An MCMC strategy is devised with recursive techniques for increased efficiency. Our approach can find multiple motifs of unknown widths, and with insertions and deletions. Polynucleotide repeat traps are tackled without the necessity of masking.

Long Abstract

 

 

189A. Comparative Sequence Analysis of Non-Coding DNA in Orthologous Gene Loci.

Christoph Dieterich, Brian Cusack, Haiyan Wang, Katja Rateitschak, Antje Krause and Martin Vingron. Max-Planck-Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany.

christoph.dieterich@molgen.mpg.de

 

Non-coding DNA segments that are conserved between human and mouse genomic sequences are good indicators of regulatory sequences. We use a systematic approach to detect conserved elements in non-coding regions of orthologous gene pairs. Our results will be made available via the Distributed Annotation System of the ENSEMBL consortium.

Long Abstract

 

 

190A. A Bioinformatic Pipeline for In-Silico High-Throughput Discovery of Single Nucleotide Polymorphisms.

Dipinder S. Keer Mentored by: Dr. Marie-Michèle Cordonnier-Pratt1, Dr. Mark Huber2, Mr. Manish Shah1, Dr. Chun Liang1, Mr. Robert Sullivan1, Mrs. Aynsley Eastman1 and Dr. Lee Pratt1. 1Department of Botany and 2Department of Management Information Systems, University of Georgia, Athens, GA 30602, USA.

dskeer@uga.edu

 

We present a bioinformatic pipeline for in-silico high-throughput discovery of Single Nucleotide Polymorphisms in Expressed Sequence Tags. SNP detection was carried out using PolyPhred. The SNP discovery process attempts to eliminate deficiencies inherent to PolyPhred while integrating SNP discovery using PolyPhred with the EST generation pipeline already in place.

Long Abstract

 

 

191A. Maximum Score with Group Selection Method for BLAST post-processing.

Yuri Kapustin1, Vyacheslav Chetvernin2 and Tatiana Tatusova2. 1Informax Inc. and 2NCBI.

kapustin@ncbi.nlm.nih.gov

 

Maximum Score with Group Selection is a method for BLAST post-processing that filters out noise and ambiguities, producing the best reconcilable alignment combinations. The method is based on a greedy approach and a specialized group selection technique.

Long Abstract

 

 

192A. combAlign: A Protein Sequence Alignment Algorithm Considering Recombinations.

Katja Wegner, Stephan Jansen, Stefan Wuchty and Ursula Kummer. European Media Laboratory GmbH, Schloss-Wolfsbrunnenweg 33, D-69118 Heidelberg, Germany.

wegner@eml.villa-bosch.de

 

The combAlign algorithm aligns pairs of protein sequences, taking both point mutations and recombinations into account. combAlign generates lists of local alignments, which are subsequently mapped to a graph. The path with the maximal score denotes the best attainable combAlignment. Compared to existing algorithms, sequences arranged in a recombinative manner are aligned significantly better.

Long Abstract

 

 

193A. Classifying Alignment Significance with Support Vector Machines.

Lars Arvestad1, Alexander Schliep2 and Olaf Wendisch3.

1Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden, 2Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany and 3ZAIK, University of Cologne, Germany.

schliep@molgen.mpg.de

 

A simple method for recognizing sequence homology using Support Vector Machines has been investigated. By utilizing features such as sequence composition in addition to alignment score, pairwise sequence comparisons compare favorably with methods that use information from multiple intermediate sequences.

Long Abstract

 

 

194A. A Clustering Algorithm for Testing Interval Graphs on Noisy Data.

Wen-Lian Hsu1, Kuen-Pin Wu1 and Wei-Fu Lu2. 1Institute of Information Science, Academia Sinica, Taipei, Taiwan, ROC and 2Institute of Computer and Information Science, National Chiao Tung University, Hsin-chu, Taiwan, ROC.

hsu@iis.sinica.edu.tw

 

An important problem in DNA sequence analysis is to reassemble the clone fragments to determine the structure of the entire molecule. An error-free version of this problem can be modeled as an interval graph recognition problem. However, lab data is almost never flawless. We present a clustering algorithm to treat data containing errors, which can accommodate some probabilistic assumptions about the overlapping relationships.

Long Abstract

 

 

195A. Peptide Sequencing using Natural Abundance Isotope Information and de novo Spectral Analysis.

William R. Cannon1 and Kenneth D. Jarman2. 1Computational Biology, Biochemistry and Biophysics and 2Applied Mathematics.

william.cannon@pnl.gov

 

We demonstrate the use of natural abundance isotopic "labels" to aid in the identification of peptides with a novel de novo algorithm. The data are from ion trap MS/MS analysis of tryptic peptides. Isotopic resolution of ion series leads to an increased confidence in the identification of the precursor peptide.

Long Abstract

 

 

196A. Sequence Variation in the C-terminal Merozoite Surface Protein-1 Gene of Plasmodium falciparum and Epitope-Specific Human Antibody Response.

Stanley Adoro, Roseangela Nwuba, Chiaka Anumudu, Yusuf Omosun and Mark Nwagwu. Cellular Parasitology Programme, Department of Zoology, University of Ibadan, Ibadan, Nigeria.

bodijahouse@skannet.com

 

Computational and immunological methods were used to analyze the influence of variations in the C-terminal merozoite surface protein (MSP)-1 gene encoding MSP1(19kDa) of isolates of Plasmodium falciparum. While most amino acid variations were located in the loop regions, the human antibody response is to more conserved beta-sheet regions of MSP1(19kDa).

Long Abstract

 

 

197A. Clustal-G: ClustalW Analysis Using Grid and Parallel Computing.

Kuo-Bin Li. Bioinformatics Institute, Singapore 117609, Republic of Singapore.

kuobin@bii-sg.org

Clustal-G is an MPI and GRID-aware implementation of ClustalW. Based on ClustalW version 1.82, Clustal-G parallelizes both the pairwise and the progressive alignment code of the original program. Any PC/workstation cluster with MPI installed should be able to run Clustal-G. In addition, the computation can be performed in a computational GRID environment using GLOBUS or MPICH-G2. The software is available at http://www.bii.a-star.edu.sg/~kuobin/clustalg/.

Long Abstract

 

 

198A. The Distance Function for Computing the Continuous Distance of Biopolymer Sequences.

G.O. Hakobyan and T.V. Margaryan. Chair of Higher Mathematics, Dept. of Physics, Yerevan State University, Armenia.

gaghakob@ysu.am

In some applications of sequence comparison theory, the items to be compared are not successions of discrete elements but "continuous" functions. The central role here is played by a distance function of two independent variables. This paper aims to construct such a distance function from a given "distance" matrix D. The constructed function is itself continuous and retains all the properties of the given distance matrix.

Long Abstract

 

 

199A. BLAST Merge/Split Modules for BLAST Accelerator.

Kiejung Park, Daesang Lee and Ki-Bong Kim. Information and Technology Institute, SmallSoft Co., Ltd., Daejeon, 305-811, South Korea.

kjpark@smallsoft.co.kr

 

To overcome the calculation time problem of BLAST in genome projects, we developed the sequence merger and BLAST output splitter modules which merge a set of BLAST query sequences into a smaller number of query sequences and split BLAST results to generate the results for the initial set of query sequences, respectively.
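
The merge and split steps described above can be sketched as a pair of coordinate-mapping helpers. This is a hedged illustration rather than the modules presented in the poster: the spacer of `N` characters, the offset table, and the names `merge_queries` and `split_hit` are assumptions, and parsing of real BLAST output is left out.

```python
from bisect import bisect_right

SPACER = "N" * 50  # hypothetical spacer that keeps hits from running across query boundaries

def merge_queries(queries, spacer=SPACER):
    """Concatenate (name, sequence) pairs into one query.

    Returns (merged_sequence, offsets), where offsets holds one
    (start, end, name) entry per original query, half-open coordinates."""
    parts, offsets, pos = [], [], 0
    for name, seq in queries:
        offsets.append((pos, pos + len(seq), name))
        parts.append(seq)
        pos += len(seq) + len(spacer)
    return spacer.join(parts), offsets

def split_hit(offsets, start, end):
    """Map a hit [start, end) on the merged query back to its original query.

    Returns (name, local_start, local_end), or None if the hit touches a
    spacer or spans more than one original query."""
    starts = [o[0] for o in offsets]
    i = bisect_right(starts, start) - 1
    if i < 0:
        return None
    q_start, q_end, name = offsets[i]
    if start >= q_end or end > q_end:
        return None
    return name, start - q_start, end - q_start
```

With this layout, one BLAST run on the merged sequence replaces many small runs, and each reported hit is reassigned to its source query by a single binary search over the offset table.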

Long Abstract

 

 

200A. Biological Significance of Jumping Alignments.

Constantin Bannert and Jens Stoye. University of Bielefeld, Faculty of Technology, Genome Informatics.

bannert@TechFak.Uni-Bielefeld.DE

 

The "jumping alignments" algorithm matches a query sequence to a protein family represented by a multiple alignment. A reference sequence may change (jump) between sites of the alignment. We present results on the structural and functional significance of these 'jumps' and ways to improve the alignment process with this information.

Long Abstract

 

 

201A. Augmenting Physical Maps with Sequence.

F. Engler, S. Blundy, S. Pearson and C. Soderlund. Arizona Genomics Computational Laboratory.

efriedr@genome.clemson.edu

 

The mapping of Tentative Consensus sequences and Gene Ontologies to FPC contigs with BSS (BLAST Some Sequence) identifies regions of function along the contigs. Furthermore, using STC libraries and low-coverage whole genome shotgun sequence allows for automatic selection of a Minimal Tiling Path with markedly less redundancy than existing methods.

Long Abstract

 

 

202A. Towards a Vaccine for Scabies.

Pearly Harumal1, Deborah Holt3, Katja Fischer2, Shelley Walton1, Bart Currie1, David Kemp2, Matt Johnson3, Peter Wilson3, Vicky Hewitt3, John Davis3, Annette McGrath3 and Elizabeth Kuczek3. 1Menzies School of Health Research 2Queensland Institute of Medical Research and 3Australian Genome Research Facility, Level 5 Gehrmann Laboratories, University of Queensland, St Lucia, QLD 4072, Australia.

annette@agrf.org.au

 

A set of 12288 sequences from normalised and unnormalised cDNA libraries made from scabies mite were subjected to clustering using PHRAP and BLASTed to public domain databases (Swissprot and GenBank). We focus on gene discovery and protein family analysis of homologues of these sequences to house dust mite antigens.

Long Abstract

 

 

203A. Distributed BLAST System Based on Web: GOST BLAST.

Wan-Seon Lee1, Pan-Gyu Kim3, Mi-Ae Yoo1,2 and Hwan-Gue Cho1,3. 1Bioinformatics and Biocomplexity Research Center, 2Molecular Biology, Pusan National University and 3School of Computer Sci. and Eng., Pusan National University, Pusan 609-735, Korea.

bioinfos@korea.com

 

GOST BLAST is a distributed BLAST system for local area network environments. It consists of two parts: a master server and several client servers. The system supports multiple BLAST queries (sequences and database) and can reduce the turn-around time of BLAST results linearly.

Long Abstract

 

 

204A. A Database on Alternative Splice Forms.

Heike Pospisil, Alexander Herrmann, Harald Pankow and Jens Reich. Max-Delbrueck-Center for Molecular Medicine, Robert-Roessle-Str.10, 13125 Berlin, Germany.

alexander.herrmann@mdc-berlin.de

 

The Alternative Splice Database represents splice forms of 7 different organisms from ESTs and mRNA GenBank sequence records. The algorithm defines a possible alternative splice form by comparing high-scoring ESTs to mRNA sequences using BLAST. It is available at http://www.bioinf.mdcberlin.de/splice/db/.

Long Abstract

 

 

205A. The Enhanced Suffix Array and Its Applications to Genome Analysis.

Mohamed Ibrahim Abouelhoda, Stefan Kurtz, Enno Ohlebusch and Michael Hoehl. Faculty of Technology, University of Bielefeld, PO Box 10 01 31, 33501 Bielefeld, Germany.

mibrahim@TechFak.Uni-Bielefeld.DE, kurtz@TechFak.Uni-Bielefeld.DE, enno@TechFak.Uni-Bielefeld.DE

 

We enhance suffix arrays with additional tables so that algorithms using suffix trees can be replaced with corresponding ones over suffix arrays with the same time complexity. Our algorithms are much faster and more space-efficient: detecting repeats requires 5n bytes (5 bytes per nucleotide) and exact pattern matching requires 6n bytes.
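
The two applications named above, repeat detection and exact pattern matching, can be illustrated on a plain suffix array with one additional table (the longest-common-prefix table). This is a teaching sketch only: the construction below is the naive sort-based one, it does not reproduce the authors' tables or their space bounds, and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def build_lcp(text, sa):
    """lcp[i] = length of the longest common prefix of suffixes sa[i-1] and sa[i]."""
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b, k = sa[i - 1], sa[i], 0
        while a + k < len(text) and b + k < len(text) and text[a + k] == text[b + k]:
            k += 1
        lcp[i] = k
    return lcp

def find_occurrences(text, sa, pattern):
    """Exact pattern matching by two binary searches over the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])

def longest_repeat(text, sa, lcp):
    """The longest substring occurring at least twice is the maximum LCP entry."""
    if not sa:
        return ""
    i = max(range(len(sa)), key=lambda idx: lcp[idx])
    return text[sa[i]:sa[i] + lcp[i]]
```

All suffixes that start with the pattern sit next to each other in the sorted order, which is why two binary searches are enough to find every occurrence.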

Long Abstract

 

 

206A. Automated Generation of Heuristics for Biological Sequence Comparison.

Guy Slater and Ewan Birney. EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.

guy@ebi.ac.uk

 

We describe a system for automated generation of sequence comparison heuristics which operate by manipulation of the underlying alignment model. This allows rapid implementation of alignment algorithms which exhibit both favourable speed and accuracy. Examples of their use are given.

Long Abstract

 

 

 

Predictive Methods.

 

207A. Specificity and Predictability of DNA Binding from Minimal Structure Information.

208A. Identification of –1 Programmed Ribosomal Frameshift Signals in Saccharomyces cerevisiae.

209A. A New Method for Identifying Large Number of Contaminated ESTs.

210A. Predicting Functions of Novel Protein Motifs by Mining the Knowledge-Base of Gene Ontology.

211A. An Efficient Method for Predicting the Membrane Spanning β-strands.

212A. Improving the Secondary Structure Prediction of the N-termini of α-helices Using Empirical and Evolutionary Data.

213A. A New Method for Iterative Multiple Sequence Alignment using Secondary Structure Prediction.

214A. Using Structure and Sequence Information for Predicting Transcription Factor Binding Sites.

215A. Computational Identification between N-Terminal Transmembrane Domains and Signal Peptides.

216A. SAM_T02 Protein Structure Prediction Webserver.

217A. Hybrid HMM and Naive Bayes Models.

218A. N-Myristoylation in Plants: a Computational Prediction of N-Myristoylated Protein Kinases in Arabidopsis.

219A. Predicting Microbial Metabolism: A Functional Group Approach.

220A. Species-Specific Protein Sequence and Fold Optimizations.

221A. Integrated primer design strategy for PCR amplification of bisulphite treated DNA.

222A. Target Explorer: An Automated Tool for Identification of New Target Genes for Specified Set of Transcription Factors.

223A. Prediction of Protein Subcellular Localization in Gram-negative Bacteria: An Updated Version of PSORT.

224A. ScanPromW: A Windows Program Searching for Promoter Patterns against a Genome Sequence.

225A. Target Prediction of Transcription Factors: Application of Structure-Based Method to Yeast Genome.

226A. Learning Better Motif Discrimination using Generative Models.

227A. How Far Can We Trust Membrane Protein Topology Predictions?

228A. Detection and Classification of Sequence and Structural Patterns in DNA Using Genetic Algorithm Neural Networks.

229A. An Algorithm for Late-Onset Disease Gene Mapping using Partially Diagnosed Pedigrees.

230A. A Neural Network Approach for Studying the Relationship between Protein Sequences and Protein-Protein Interactions.

231A. No poster.

232A. PSAML: A Representation of Protein Data for Structure Comparison.

233A. Using Small-World Topology to Refine Networks Derived via High-Throughput Methods.

234A. Prediction of snoRNAs in the Human Genome.

235A. Computational Localization of Clusters of Transcription Factor Binding Sites in Promoter Sequences.

 

207A. Specificity and Predictability of DNA Binding from Minimal Structure Information.

Shandar Ahmad, M. Michael Gromiha and Akinori Sarai. RIKEN Tsukuba Institute, Tsukuba 305 0074, Japan.

shandar@rtc.riken.go.jp

 

Several one-dimensional properties of proteins have been investigated to determine their specificity towards DNA binding. Solvent accessibility of residues was found to be the most significant factor after sequence neighbour information. Overall, no significant preference for any secondary structure type was found. A neural network has been designed to implement a prediction method based on these findings.


 

 

208A. Identification of –1 Programmed Ribosomal Frameshift Signals in Saccharomyces cerevisiae.

Jonathan L. Jacobs and Jonathan D. Dinman, Ph.D. Department of Cell Biology and Molecular Genetics, University of Maryland, 2135 Microbiology Bldg., College Park, MD 20742.

jacobsjo@wam.umd.edu

 

Programmed ribosomal frameshifting (PRF) is a phenomenon usually associated with the viral biogenesis of alternatively coded proteins. Recently, it has been shown that these signals are functionally present in eukaryotic genomes. In this work, we present a bioinformatics approach for identifying putative –1 PRF sites in the genome of Saccharomyces cerevisiae.
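
As a rough illustration of what a first-pass scan for candidate -1 PRF sites might look like, the sketch below searches a DNA sequence for heptamers matching the commonly cited "slippery site" pattern X XXY YYZ (X any base, Y = A or T, Z = A, C or T). The pattern choice and the name `find_slippery_sites` are assumptions made here for illustration; the abstract does not state which signal definition the authors used, and a realistic search would also need to check the reading frame and downstream RNA structure.

```python
import re

# Assumed slippery-site pattern X XXY YYZ, written for DNA (T in place of U):
# three identical bases, then three identical A/T bases, then any base but G.
# The lookahead makes the search report overlapping candidates as well.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2([AT])\3\3[ACT]))")


def find_slippery_sites(seq):
    """Return (zero-based position, heptamer) for every candidate site."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(seq)]


print(find_slippery_sites("GGTTTAAACGG"))  # [(2, 'TTTAAAC')]
```

Each hit is only a candidate: the position modulo 3 still has to be compared with the annotated reading frame before a site is considered further.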


 

 

209A. A New Method for Identifying Large Number of Contaminated ESTs.

Rotem Sorek. Compugen Ltd.

rotem@compugen.co.il

 

We present a new method for identifying highly contaminated EST libraries. Using this method, we were able to identify EST libraries enriched with genomic contamination, partially spliced sequences and other types of contamination. This allowed us to discard about 25,000 ESTs that were otherwise inferred as new splice variants.


 

 

210A. Predicting Functions of Novel Protein Motifs by Mining the Knowledge-Base of Gene Ontology.

Xinghua Lu, Chengxiang Zhai, Vanathi Gopalakrishnan and Bruce G. Buchanan. Center for Biomedical Informatics, University of Pittsburgh, and Language Technologies Institute, Carnegie Mellon University.

xil3@pitt.edu

 

To predict the function of novel sequence patterns, a system was developed to mine the Gene Ontology knowledge-base and use association of GO terms with motifs to predict the function of patterns. The system was tested using patterns from PROSITE, and a statistical framework was developed to predict the confidence of function prediction.


 

 

211A. An Efficient Method for Predicting the Membrane Spanning β-strands.

M. Michael Gromiha1, Shandar Ahmad2 and Makiko Suwa1. 1Computational Biology Research Center (CBRC), AIST, 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, Japan and 2RIKEN Tsukuba Institute, 3-1-1 Koyadai, Tsukuba 305-0074, Japan.

michael-gromiha@aist.go.jp

 

A new method is proposed to predict the membrane spanning β-strands in outer membrane proteins by combining a rule-based approach with neural networks. We observed a reasonable improvement in prediction accuracy.


 

 

212A. Improving the Secondary Structure Prediction of the N-termini of α-helices Using Empirical and Evolutionary Data.

Claire L. Wilson, Simon J. Hubbard and Andrew J. Doig. Department of Biomolecular Sciences, U.M.I.S.T.

clw@bms.umist.ac.uk

 

Current secondary structure prediction methods perform well at identifying helical locations, but often fail to correctly identify N-terminal positions. Empirically-derived free energies are used to represent residue preferences at the N-terminal positions. Analysis of neighbouring N-terminal positions reveals the true sequence is close by and often energetically more favourable.


 

 

213A. A New Method for Iterative Multiple Sequence Alignment using Secondary Structure Prediction.

Simossis V.A. and Heringa J. Division of Mathematical Biology, The National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA UK.

vsimoss@nimr.mrc.ac.uk

 

We present an iterative method that integrates secondary structure prediction and multiple sequence alignment. This iterative scheme includes SymSSP, a routine that optimises predicted secondary structure information, and a new optimal segmentation algorithm to yield a consensus prediction. The complete process iteratively optimises multiple alignment quality as well as secondary structure prediction and is implemented in the PRALINE method (Heringa J., 1998, 2000). All methods are available at http://mathbio.nimr.mrc.ac.uk.


 

 

214A. Using Structure and Sequence Information for Predicting Transcription Factor Binding Sites.

Tommy Kaplan, Nir Friedman and Hanah Margalit. The Hebrew University, Jerusalem, Israel.

tommy@cs.huji.ac.il

 

We describe an EM-based approach that uses solved protein-DNA complexes and genomic sequences to predict the binding specificity of transcription factors, based on their DNA-binding domain. We demonstrate the potential of our method by its application to the Cys2His2 zinc-finger DNA-binding family.


 

 

215A. Computational Identification between N-Terminal Transmembrane Domains and Signal Peptides.

Zheng Yuan1 and Rohan D Teasdale2. Institute for Molecular Bioscience and Special Research Centre for Function and Applied Genomics, University of Queensland, QLD 4072, Australia.

1z.yuan@imb.uq.edu.au , 2r.teasdale@imb.uq.edu.au

 

A new method has been developed based on the sequence features (amino acid composition, hydrophobicity and position) of the hydrophobic regions of N-terminal transmembrane domains and signal peptides. Using Fisher's linear discriminant functions, we can discriminate well between the two types of peptides. This method can complement current transmembrane protein prediction and signal peptide prediction methods to obtain more accurate predictions.
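
To make the idea of a Fisher linear discriminant concrete, the sketch below fits one on two made-up features per hydrophobic region (mean hydrophobicity and region length) and classifies new regions by which side of the midpoint threshold they project onto. The feature choice, the toy numbers and all function names are assumptions for illustration only; they are not the authors' features, data or implementation.

```python
def mean_vector(rows):
    """Column-wise mean of a list of 2-element feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mu):
    """2x2 scatter matrix: sum over rows of (x - mu)(x - mu)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def fit_fisher(class_a, class_b):
    """Return (w, c): direction w = Sw^-1 (mu_a - mu_b) and threshold c,
    the midpoint between the two projected class means."""
    mu_a, mu_b = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    c = 0.5 * (dot(w, mu_a) + dot(w, mu_b))
    return w, c


def classify(x, w, c):
    """Class A projects above the threshold by construction of w."""
    return "signal" if dot(w, x) > c else "tm"


# Hypothetical training data: (mean hydrophobicity, region length).
signal_peptides = [(1.8, 9), (2.0, 11), (1.6, 10), (2.2, 12)]
tm_domains = [(2.4, 20), (2.6, 22), (2.1, 19), (2.8, 23)]

w, c = fit_fisher(signal_peptides, tm_domains)
print(classify((1.9, 10), w, c))  # signal
print(classify((2.5, 21), w, c))  # tm
```

Restricting the sketch to two features keeps the matrix inverse explicit; with more features the same formula applies but a linear algebra library would be the natural choice.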


 

 

216A. SAM_T02 Protein Structure Prediction Webserver.

Rachel Karchin, Mark Diekhans, Jonathan Casper, Spencer Tu, Richard Hughey and Kevin Karplus. University of California, Santa Cruz Computer Science and Computer Engineering Departments.

rachelk@soe.ucsc.edu

 

SAM_T02 predicts the fold and secondary structure of a target protein sequence, using multi-track hidden Markov models and neural nets trained on SAM-T2K multiple alignments. SAM_T02 is available at http://www.soe.ucsc.edu/research/compbio/HMM-apps/T02-query.html.


 

 

217A. Hybrid HMM and Naive Bayes Models.

Beverly Seavey1, David Page2 and Brian Kay3. 1University of Wisconsin Dept. of Computer Sciences, 2University of Wisconsin Dept. of Computer Sciences and Dept of Biostatistics and Medical Informatics, 3Argonne National Laboratory.

seavey@cs.wisc.edu

 

Analysis of sequence data and analysis of feature data are currently disparate fields within bioinformatics, with little work in combining them. This is unfortunate because data sets with mixtures of sequence and feature data are likely to become the norm in the near future of bioinformatics. The primary purpose of the work described here is to introduce and evaluate an algorithm that is a hybrid of HMMs and the simple feature based approach of naive Bayes. We apply it to the task of predicting peptide ligand binding specificity of various Src-homology 3 (SH3) domains.


 

 

218A. N-Myristoylation in Plants: a Computational Prediction of N-Myristoylated Protein Kinases in Arabidopsis.

Tobey M. Tam and Michael Gribskov. San Diego Supercomputer Center, University of California at San Diego.

ttam@sdsc.edu

 

The N-terminal N-myristoylated consensus sequence has previously been determined for animals and yeast; however, little is known about it in plants. We have developed a prediction program based on pairwise alignment that identifies an N-myristoylation consensus motif for plants. The program is based on a log-odds matrix scoring system.
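
One possible reading of a log-odds matrix scoring system is a position-specific matrix built from aligned motif instances, where each entry is the log ratio of the observed residue frequency to a background frequency. The sketch below follows that reading. The uniform background, the pseudocount, the toy peptides and the function names are all assumptions; the abstract does not describe the matrix the program actually uses.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds(aligned, pseudocount=1.0):
    """Build a position-specific log-odds matrix (one dict per column)
    from equal-length peptides, against a uniform background (assumed)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all training peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for i in range(length):
        column = [s[i] for s in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def score(matrix, peptide):
    """Sum the per-position log-odds values for a peptide."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the matrix")
    return sum(column[aa] for column, aa in zip(matrix, peptide))


# Hypothetical N-terminal hexapeptides (after removal of the initial Met).
training = ["GCSKSK", "GNSASK", "GSSKTK", "GCTKSR"]
matrix = build_log_odds(training)
print(score(matrix, "GCSKSK") > score(matrix, "AWWWWW"))  # True
```

A positive total score means the peptide looks more like the training motif than like background under these assumptions; where to set a decision threshold would have to be calibrated on real data.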


 

 

219A. Predicting Microbial Metabolism: A Functional Group Approach.

Bo Kyeng Hou, Wenjun Kang, Larry P. Wackett and Lynda B.M. Ellis. Center for Environmental Molecular Science University of Minnesota.

bkher71@dreamwiz.com

 

A pathway prediction system has been developed to predict microbial catabolism, based on the UM-BBD, which contains a broad variety of reactions of organic functional groups. The biodegradation of a query compound whose metabolism is not yet known can be predicted based on the functional groups it contains.


 

 

220A. Species-Specific Protein Sequence and Fold Optimizations.

Michel Dumontier and Christopher W.V. Hogue. Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto, ON M5G 1X5.

micheld@mshri.on.ca

 

An organism’s ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. We have identified species-specific protein sequence and structural domain optimizations, which we exploited to generate predictive scoring functions. These scoring functions performed well in their species-specific protein identification ability and may be used in future protein-engineering experiments.


 

 

221A. Integrated primer design strategy for PCR amplification of bisulphite treated DNA.

Tamas Rujan, Reinhold Wasserkort and Armin O. Schmitt. Epigenomics AG, Berlin, Germany. www.epigenomics.com, rujan@epigenomics.com

 

We developed an integrated strategy for primer design for single and multiplex PCR amplification of bisulphite treated DNA. In addition to criteria usually used for primer design on genomic DNA we perform further tests, for example to avoid unwanted priming. Experiments suggest a high success rate for sPCR and mPCR.
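
Primer design on bisulphite treated DNA starts from the converted template rather than the genomic sequence. The sketch below simulates the conversion of one strand (unmethylated C is read as T after PCR, methylated C is kept) and applies one plausible extra criterion, namely that a primer site contains no CpG so that it is identical whatever the methylation state. Treating methylation as all-or-nothing at CpGs, the CpG-free criterion and the function names are assumptions, not the tests the authors describe.

```python
def bisulphite_convert(seq, methylated_cpg=True):
    """Simulate bisulphite treatment of a single strand, read 5' to 3'.

    Every C becomes T, except that a C followed by G is kept when
    methylated_cpg is True (all CpGs assumed methylated)."""
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if methylated_cpg and in_cpg else "T")
        else:
            converted.append(base)
    return "".join(converted)


def is_methylation_independent(site):
    """True if the converted site is the same with and without CpG
    methylation, which is the case exactly when it contains no CpG."""
    return bisulphite_convert(site, True) == bisulphite_convert(site, False)


print(bisulphite_convert("ACGTCCA", methylated_cpg=True))   # ACGTTTA
print(bisulphite_convert("ACGTCCA", methylated_cpg=False))  # ATGTTTA
print(is_methylation_independent("ACCTGA"))                 # True
```

The converted template has reduced sequence complexity, which is one reason additional checks against unwanted priming are needed.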


 

 

222A. Target Explorer: An Automated Tool for Identification of New Target Genes for Specified Set of Transcription Factors.

Sosinsky A., Wildonger J., Bonin K., Mann R. and Honig B. Howard Hughes Medical Institute and Department of Biochemistry and Molecular Biophysics, Columbia University, New York, USA.

as1689@columbia.edu

 

Target Explorer creates a customized library of binding site matrices, searches for clusters of transcription factor binding sites and retrieves annotation for potential target genes that surround the identified clusters. It was successfully applied to identify new target genes for the cooperative factors Lozenge and Pointed. Target Explorer is available at http://trantor.bioc.columbia.edu/search_for_BS.


 

 

223A. Prediction of Protein Subcellular Localization in Gram-negative Bacteria: An Updated Version of PSORT.

Jennifer L. Gardy, Cory A. Spencer, Shannan J. Ho Sui and Fiona S.L. Brinkman. Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, B.C., Canada.

jlgardy@sfu.ca

 

In consultation with Kenta Nakai, the developer of PSORT, we are improving the algorithm for prediction of subcellular localization of proteins in Gram-negative bacteria. This algorithm, which currently represents the only comprehensive predictive tool for prokaryotic subcellular localization prediction, has not been updated since 1991. For further information see http://www.pathogenomics.bc.ca/brinkman/research.html.


 

 

224A. ScanPromW: A Windows Program Searching for Promoter Patterns against a Genome Sequence.

Daesang Lee, Chankyu Park and Kiejung Park. Information and Technology Institute, SmallSoft Co., Ltd. and Department of Biological Science, KAIST, Daejeon, 305-701, South Korea.

dslee@bioneer.kaist.ac.kr

 

The ScanPromW program, which runs in a Windows environment, allows users to search a microbial genome database for promoter consensus elements. It uses a consensus sequence profile optimized through position-specific similarity assessment, and the genome scanning results are listed according to their score against the consensus profile.


 

 

225A. Target Prediction of Transcription Factors: Application of Structure-Based Method to Yeast Genome.

Akinori Sarai1, Samuel Selvaraj1, Michael M. Gromiha1, Joerg-Gerald Siebers1, Ponraj Prabakaran1 and Hidetoshi Kono2. 1RIKEN Tsukuba Institute 3-1-1 Koyadai, Tsukuba 305-0074 Japan and 2University of Pennsylvania 231 South 34 St. Philadelphia PA 19104 USA.

sarai@rtc.riken.go.jp

We have developed a structure-based method for the target prediction of transcription factors. We have applied the method to the analysis of the yeast genome, predicting the targets of particular transcription factors. The results suggest that the method is capable of correctly predicting experimentally known target genes and binding sites.


 

 

226A. Learning Better Motif Discrimination using Generative Models.

Gal Elidan, Yoseph Barash, Tommy Kaplan and Nir Friedman. Hebrew University, Ross Bldg., Givat Ram, Jerusalem, 91904, Israel.

galel@cs.huji.ac.il

A common model for representing transcription factor binding sites is the position-specific score matrix, which assumes independence between positions. We explore several extensions that relax this assumption. These include factorized mixture models, context-specific mixture models, and Bayesian networks. We evaluate these variants on synthetic and real-life datasets.


 

 

227A. How Far Can We Trust Membrane Protein Topology Predictions?

Karin Melén1, Gunnar von Heijne1 and Anders Krogh2. 1Stockholm Bioinformatics Center, Stockholm University, Sweden and 2Bioinformatics Centre, University of Copenhagen, Denmark.

karin@sbc.su.se

 

Methods for predicting the topology of membrane proteins usually reach an accuracy of between 60 and 75%. However, it is generally not clear how reliable a specific prediction is. We have analyzed different ways of assessing reliability and have derived reliability scores for five common methods.


 

 

228A. Detection and Classification of Sequence and Structural Patterns in DNA Using Genetic Algorithm Neural Networks.

Robert G. Beiko and Robert L. Charlebois. Department of Biology, University of Ottawa.

rbeiko@science.uottawa.ca

 

We present a method to detect and classify patterns in DNA, by subdividing a sequence of DNA into smaller windows, then converting these windows into different measures of structure and sequence composition. By selecting different groups of these variables, and using them to train neural networks, we were able to identify conserved patterns in upstream regions of Escherichia coli and Sulfolobus solfataricus.


 

 

229A. An Algorithm for Late-Onset Disease Gene Mapping using Partially Diagnosed Pedigrees.

Guo-Yun Yu and Christopher M. Gomez. Department of Neurology, University of Minnesota, Minneapolis, Minnesota, USA.

gyy@tc.umn.edu

 

Late-onset disease gene mapping is often a challenge because of diagnosis uncertainty. Technical advances make it possible to take a fundamentally different approach to discover such disease genes. We will present an algorithm to use partially diagnosed pedigrees to map disease genes.


 

 

230A. A Neural Network Approach for Studying the Relationship between Protein Sequences and Protein-Protein Interactions.

Richard Chang and David Page. Department of Computer Sciences, University of Wisconsin-Madison.

richard.chang@abbott.com

 

The protein interactions of the SH3 domain are translated to an artificial intelligence multiple instance problem. A solution using neural networks as part of an Expectation Maximization algorithm is described. This leads to improved accuracy over a simple neural network.


 

 

231A. No poster.

 

232A. PSAML: A Representation of Protein Data for Structure Comparison.

Su-Hyun Lee1, Jin-Hong Kim2, Geon-Tae Ahn2 and Myung-Joon Lee2. 1Changwon National University and 2University of Ulsan.

suhyun@sarim.changwon.ac.kr

 

We present an XML representation of protein data named PSAML, which can be used for comparing protein structures and detecting their similarities. The PSAML language is designed on the protein data model named PSA, which describes a protein structure as the secondary structures of the protein and their relationships.


 

 

233A. Using Small-World Topology to Refine Networks Derived via High-Throughput Methods.

Debra S. Goldberg and Frederick P. Roth. Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School.

debg@hms.harvard.edu

 

Many biological networks are small-world networks. To exploit the small-world properties of the S. cerevisiae protein interaction network, we developed measures of local neighborhood cohesiveness around potentially interacting protein pairs. Using these measures, we are able to accurately assess the reliability of protein-protein interactions observed in error-prone yeast two-hybrid studies.
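
A measure of local neighborhood cohesiveness can be sketched as the overlap between the interaction partners of the two proteins in a putative pair: in a small-world network, true interactors tend to share neighbors. The example below uses the Jaccard index of the two neighbor sets as a simple stand-in. The choice of Jaccard, the toy edge list and the function names are assumptions; the abstract does not specify which measures the authors developed.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (protein, protein) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def jaccard_cohesiveness(adjacency, u, v):
    """Shared neighbors of u and v divided by all their neighbors,
    ignoring u and v themselves. Returns 0.0 if neither has others."""
    neighbors_u = adjacency.get(u, set()) - {v}
    neighbors_v = adjacency.get(v, set()) - {u}
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0
    return len(neighbors_u & neighbors_v) / len(union)


# Hypothetical two-hybrid interactions.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
         ("B", "D"), ("D", "E"), ("E", "F")]
adjacency = build_adjacency(edges)
print(jaccard_cohesiveness(adjacency, "A", "B"))  # 1.0
print(jaccard_cohesiveness(adjacency, "D", "E"))  # 0.0
```

Under this stand-in, the A-B interaction sits inside a tightly connected neighborhood and would be rated more reliable than D-E, which bridges otherwise unrelated proteins.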


 

 

234A. Prediction of snoRNAs in the Human Genome.

SAGARA Jun-Ichi1, NAKAMURA Shugo2, KENMOCHI Naoya3, SATO Tomoyuki1,4, OKOUCHI Ikuo1,4, SUWA Makiko1 and ASAI Kiyoshi1. 1Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2Department of Biotechnology, The University of Tokyo, 3Central Research Laboratories, Miyazaki Medical College and 4Fuji Research Institute Corporation.

jun@ni.aist.go.jp

 

We predict snoRNAs in the human genome using a statistical method which investigates sequential motifs. We have also developed a human intron database produced from exons predicted by Gene Decoder*2 which is a gene finding technology based on HMMs. We show the prediction data of snoRNAs and the human intron database.


 

 

235A. Computational Localization of Clusters of Transcription Factor Binding Sites in Promoter Sequences.

Szymon M. Kielbasa, Nils Bluethgen and Hanspeter Herzel. Theoretical Biology, Humboldt University Berlin.

s.kielbasa@itb.biologie.hu-berlin.de

 

We present an iterative algorithm to detect over-represented pairs of motifs in promoter sequences. Results of applying the method to promoters of coregulated genes of yeast and human promoters regulated by the Ras pathway are shown. A comparison to the composite elements available in the Compel database is given.


 

 

 

New Frontiers.

 

236A. Trends in Molecular Bioinformatics.

237A. Discovery Genome Functions with Modeling Five-Elements Computing.

238A. Discovery Genome Mechanisms for Cancer Therapy with Agents.

239A. BioBhasha - A Programming Language for Biologists.

240A. A Rule-Based Approach for Automatically Identifying Gene and Protein Names in MEDLINE Abstracts.

241A. Java Library of Generic Internet Robot Algorithms.

242A. Novel Approach in Computational Analysis of Biological Complexity.

243A. AbML: an XML Schema Description for Antibody Information.

244A. Bijective Mapping of Discrete Biological Sequences.

245A. A High-speed Similar Protein Retrieval Method using the Distance between Molecular Surface Data.

246A. Eukaryotic Linear Motif Resource for Functional Sites in Proteins.

247A. A Database for the Management of Gene Expression Data in situ.

248A. A Comprehensive Portal to Bioinformatics Training.

249A. Semi-Automated Calling System for SNPs (Single Nucleotide Polymorphisms).

250A. Analyzing Protein Sequence Tags (PSTs) of Complex Protein Mixtures: A Fast Deisotoping and Deconvolution Algorithm for ESI-MS spectra.

 

236A. Trends in Molecular Bioinformatics.

Ashwin Sivakumar1, R. Balaji2, Vidhya Gomathi Krishnan1 and John Howard Parish1. 1School of Biochemistry and Molecular Biology, The University of Leeds, Leeds LS2 9JT, UK and 2Indian Institute of Sciences, Bangalore, India.

bmbasi@bmb.leeds.ac.uk

 

"Post Genome Informatics" takes bioinformatics beyond its original boundaries. With the concurrent advances in computer technology, biological data are amenable to analysis and pattern recognition. This paper is a collection of logistic and statistical analyses of the trends in the area of molecular bioinformatics.


 

 

237A. Discovery Genome Functions with Modeling Five-Elements Computing.

Jianwei Sun. Independent Researcher and Developer, Room 1608, No.5, Lane 500, MaoTai Road, Shanghai, 200336, China.

jianwei_sun@hotmail.com

The aim is to discover the genome functions that result in a physiological effect. In this investigation, a mathematical model of the Five-Elements system is studied by building an evolutionary computing platform that examines the responses, produced by signal transduction, of functional operations in the Five-Elements system, under the paradigm of discovering genome functions with therapy systems.


 

 

238A. Discovery Genome Mechanisms for Cancer Therapy with Agents.

Jianwei Sun. Independent Researcher and Developer, Room 1608, No.5, Lane 500, MaoTai Road, Shanghai, 200336, China.

jianwei_sun@hotmail.com

Informatics models for discovering genome mechanisms for cancer therapy are best built by analysing the data generated by microarrays. In this investigation, the modeling directly uses computing power and the software agent paradigm, with concrete cases of breast cancer therapy as the discovery task. Meanwhile, an evolutionary computing platform is being built for the investigation.


 

 

239A. BioBhasha - A Programming Language for Biologists.

BVLS Prasad. Indian Institute of Science, Molecular BioPhysics Unit, Bangalore, Karnataka, 560012, India.

shiva@mbu.iisc.ernet.in

 

'BioBhasha - A Programming Language for Biologists' is developed in C++ using the object-oriented paradigm (OOP). BioBhasha provides a set of Biological Abstract Data Types (BioADTs), which a programmer can use to write code in biological terminology. This design makes BioBhasha extensible, maintainable, reusable and biologist friendly. It is designed to provide a bio-programming environment that encourages creativity in exploratory research and flexibility in developing novel bio-computational applications.


 

 

240A. A Rule-Based Approach for Automatically Identifying Gene and Protein Names in MEDLINE Abstracts.

Hong Yu, M.S., M.Phil.1, Vasileios Hatzivassiloglou, PhD2, Carol Friedman, PhD1,4, Ivan H. Iossifov3, and Andrey Rzhetsky, PhD1,3. 1Dept. Medical Informatics, 2Dept. Computer Science, 3Columbia Genome Center, Columbia University and 4Department of Computer Science, Queens College, City University of New York, New York 10032, USA.

hy52@columbia.edu

 

Identifying gene and protein terms is important for obtaining biological knowledge from literature. We have developed GPmarkup (for gene and protein-name mark up), a system that automatically identifies gene and protein terms and maps gene and protein symbols (e.g., DR3) to names (e.g., Death Receptor 3) in MEDLINE abstracts.
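
Mapping a symbol to its name can be illustrated with a single hand-written rule: when a symbol appears in parentheses, take as many preceding words as the symbol has characters and accept them as the name if their initial characters spell the symbol. The sketch below implements only this rule. It is an illustrative assumption, not GPmarkup's rule set, and the function name and example sentence are made up.

```python
import re

# A symbol in parentheses with no spaces, e.g. "(DR3)".
SYMBOL_IN_PARENS = re.compile(r"\(([A-Za-z0-9]+)\)")


def map_symbols_to_names(text):
    """Return {symbol: name} for 'name (SYMBOL)' patterns where the
    initial characters of the preceding words spell the symbol."""
    mapping = {}
    for match in SYMBOL_IN_PARENS.finditer(text):
        symbol = match.group(1)
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        candidate = words[-len(symbol):]
        if len(candidate) < len(symbol):
            continue  # not enough preceding words to spell the symbol
        initials = "".join(word[0] for word in candidate)
        if initials.upper() == symbol.upper():
            mapping[symbol] = " ".join(candidate)
    return mapping


sentence = "Expression of death receptor 3 (DR3) was increased."
print(map_symbols_to_names(sentence))  # {'DR3': 'death receptor 3'}
```

A rule this simple misses symbols that are not plain initialisms, which is why a practical system needs a larger set of rules.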


 

 

241A. Java Library of Generic Internet Robot Algorithms.

Audrius Meskauskas, Frank Lehmann-Horn and Karin Jurkat-Rott. Department of Applied Physiology, University of Ulm, Einsteinallee 11, D-89069 Ulm.

Audrius.Meskauskas@medizin.uni-ulm.de

 

We propose a package (a library and Java code generators) for creating bioinformatics internet robots. The library provides a cache, a connection strategy, a security system against improper use and an organizing system that gives a view of the running program. This system can integrate new analysis tools as soon as they appear on internet pages.


 

 

242A. Novel Approach in Computational Analysis of Biological Complexity.

Ahmed Fadiel1 and Stuart Lithwick2. 1The Centre for Applied Genomics and 2The Bioinformatics Supercomputing Centre. The Hospital for Sick Children, Toronto, Ontario, Canada.

afadiel@bioinfo.sickkids.on.ca

 

We hypothesized that gene order/location is genome-specific and is correlated with genome evolution and its complexity. We tested this hypothesis using non-conventional computational approaches based on complexity analysis. Our results indicated that gene/ORF distribution patterns are genome-specific and are largely conserved within chromosomes of each species. In addition, we found that genome complexity is correlated with the evolutionary distance between species.


 

 

243A. AbML: an XML Schema Description for Antibody Information.

Uwe Plikat, Adrian Bruengger and Christoph Wanke. Novartis Pharma, Basel, Switzerland.

uwe.plikat@pharma.novartis.com

 

We propose an XML schema, AbML, for the standardized description and exchange of antibody information. We make use of existing standards for the description of certain data objects as well as provide controlled vocabulary wherever feasible.


 

 

244A. Bijective Mapping of Discrete Biological Sequences.

Jonas S Almeida1,2 and Susana Vinga2. 1Dept. of Biometry and Epidemiology, Medical Univ. South Carolina, 135 Cannon Street, Suite 303, P.O. Box 250835, Charleston, SC 29425, USA and 2Inst. Tecnologia Química e Biológica, Univ. Nova Lisboa, Av. da República (EAN), P.O. Box 127, 2781-901 Oeiras, Portugal.

almeidaj@musc.edu

 

Universal Sequence Maps (USM, http://bioinformatics.musc.edu/~jonas/usm/) are novel iterative mapping functions, derived from Chaos Game Representation (CGR), that enable bijective mapping of any discrete sequence into continuous unitary hypercubes. USM enables scale independent representation of transition matrices and, as such, offers an advantageous platform for discriminant analysis of biological sequences.
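
The bijective property can be seen in the two-dimensional Chaos Game Representation of DNA from which USM is derived: each base moves the current point halfway towards its corner of the unit square, and the sequence can be read back from the final point alone. The sketch below shows this for the plain CGR only, using exact fractions so that decoding is not limited by floating-point precision. The corner assignment, the starting point and the function names are assumptions, and the code is not the USM implementation.

```python
from fractions import Fraction

# One common corner assignment for the CGR unit square (assumed here).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BASES = {corner: base for base, corner in CORNERS.items()}
HALF = Fraction(1, 2)
START = (HALF, HALF)


def encode(seq):
    """Iterate the CGR map: move halfway towards the corner of each base."""
    x, y = START
    for base in seq:
        cx, cy = CORNERS[base]
        x = (x + cx) / 2
        y = (y + cy) / 2
    return x, y


def decode(point):
    """Invert the map: the quadrant gives the last base, and doubling the
    coordinates minus the corner recovers the previous point."""
    x, y = point
    bases = []
    while (x, y) != START:
        cx = 1 if x > HALF else 0
        cy = 1 if y > HALF else 0
        bases.append(BASES[(cx, cy)])
        x = 2 * x - cx
        y = 2 * y - cy
    return "".join(reversed(bases))


point = encode("GATTACA")
print(decode(point))  # GATTACA
```

Because every step halves the distance to a corner, sequences sharing a suffix land close together, which is what makes the coordinates usable for comparing sequences at different scales.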


 

 

245A. A High-speed Similar Protein Retrieval Method using the Distance between Molecular Surface Data.

Yoshikazu Kaneta. Graduate School of Information Science and Technology Engineering, Osaka University, Japan.

kaneta@ise.eng.osaka-u.ac.jp

 

An efficient similar protein retrieval method based on the distance space constructed by calculating the dissimilarity between any two protein molecular surfaces is proposed. The application to the enzyme protein database shows that the retrieval time is reduced to 9.5% of the sequential retrieval.


 

 

246A. Eukaryotic Linear Motif Resource for Functional Sites in Proteins.

Rune Linding1, Pal Puntervoll2, Christine Gemund1, Scott Cameron3 and Toby Gibson1. 1EMBL, Germany and 2University of Bergen, Norway and 3University of Dundee, UK.

linding@EMBL-Heidelberg.DE

 

The ELM (Eukaryotic Linear Motif) resource is a set of tools to detect functional sites within protein sequences. Context-based discriminatory rules will be applied to filter out false hits, giving end users a small number of plausible functions. ELM information is available at http://elm.eu.org.


 

 

247A. A Database for the Management of Gene Expression Data in situ.

M. Samsonova1, A.Pisarev1, E.Pustel'nikova1 and P.Baumann2. 1Bioinformation Systems Lab, St.Petersburg State Technical University, St.Petersburg, Russia and 2Active Knowledge GmbH, Kirchenstr. 88, D-81675 Munich, Germany.

samson@fn.csa.ru

 

We propose a novel strategy for managing information on gene expression in situ. It consists of applying the array DBMS RasDaMan to database design. We have developed a database named Mooshka (http://urchin.spbcas.ru/Mooshka), which stores in situ data on the expression of segmentation genes in the Drosophila blastoderm, as well as numerical results from analysis of the structure and behavior of the segmentation genetic network. Mooshka makes it possible to search and analyze information within an image and allows a wide range of data processing operations to be implemented as internal database queries.


 

 

248A. A Comprehensive Portal to Bioinformatics Training.

Bernhard Haubold, Manon von Bülow, Jayshree Mistry, Karin Maslen and Monika Haas. LION bioscience AG, Heidelberg, Germany.

monika.haas@lionbioscience.com

 

Web-Based Training Bioinformatics is a comprehensive course in the application of information technology to biomedical research. It currently covers six themes, ranging from databases to comparative genomics. Each theme has a modular structure with each module consisting of a theoretical introduction, a tutorial on representative tools, and worked exercises.


 

 

249A. Semi-Automated Calling System for SNPs (Single Nucleotide Polymorphisms).

Yoko Higashi1, Arata Sato1, Hirotaka Higuchi1, Hitoshi Sakano1, Toshihiko Morimoto1, Tsutomu Matsunaga1, Keisuke Ishii2 and Masaaki Muramatsu2. 1NTT Data Co. Ltd. and 2Hubit Genomix, Inc.

higashiy@rd.nttdata.co.jp

 

In SNP calling with ordinary software, laboratory staff need to look at the plots and manually review each genotype, which is a major bottleneck for genotype analysis. We developed software which makes it possible to call genotypes semi-automatically. It attained 80% accuracy and reduced working hours by 80%.


 

 

250A. Analyzing Protein Sequence Tags (PSTs) of Complex Protein Mixtures: A Fast Deisotoping and Deconvolution Algorithm for ESI-MS spectra.

U. Bauer, R. Moraga, C. Baumann and J. Schwarz. Xzillion GmbH & CoKG, Bioinformatics/Mass Spectrometry.

Ute.Bauer@xzillion.com

 

Recently, proteomics technologies for the analysis of protein expression based on ESI LC-MS/MS have emerged. The PST approach reduces the complexity of the protein mixture digest by isolating C-/N-terminal peptides. Hence, interpretation of LC-MS data becomes possible by combining a fast and sensitive deconvolution algorithm with a special PST peptide database search.


 

Session B.

 

 

Microarrays.

 

1B. Quality Assurance Methods for Processing Microarray Imagery.

2B. Data Management and Analysis for Gene Expression Array.

3B. An Empirical Comparison Of Methods For Detecting Differentially Expressed Genes In Cancer Datasets.

4B. No Poster.

5B. ArrayExpress, a Public Repository for Microarray Gene Expression Data.

6B. No Poster.

7B. cDNA Microarray Images synthesized from the Real Spot Edge Templates.

8B. SOURCE: The Stanford Online Universal Resource for Clones and ESTs.

9B. Global Gene Expression Profiling of E. coli with Interruption of Acetate Production.

10B. Gene Expression Profiling Analysis Augmented by Mathematically Transformed Gene Ontology.

11B. Higher-Order Gene Interaction Revealed by Log-Linear Analysis of DNA Microarrays.

12B. A New Normalization Method for cDNA Microarray Data using Clustering Background.

13B. Genetic Network Analysis from Microarray Gene Expression Data via Bayesian Network and Nonparametric Regression.

14B. Correlation of Hypoxia-Regulated Genes with Hypoxia-Enhanced Metastatic Ability by Gene Expression Profiling.

15B. ISAcycle: Independent Subspace Analysis of Gene Expression Data.

16B. Computational Tools, Floral Primordial Development pathways and Its regulation using DNA chip Technology in Arabidopsis.

17B. No poster.

18B. Linearization of DNA Macroarray data.

19B. Statistical Analysis of Multi-Center Microarray Data.

20B. Bayesian Networks and Perturbation Experiments.

21B. Quality Measures in cDNA Microarray Experiments.

22B. Large-scale Analysis of the Human and Mouse Transcriptomes.

23B. Experimental Design and Analysis of Microarray Data.

24B. Robustness of Ensemble Learning for Independent Component Analysis of Micro-Array Channel-Ratio Data.

25B. Comparative Analysis of Algorithms for Signal Quantitation from Oligonucleotide Microarrays.

26B. An Experimentally Optimized Algorithm for High Throughput, Parallel Determination of Gene Structure using Microarrays.

27B. Comparison of Transcript Abundance Models in Gene Expression Microarrays.

28B. Generation of a Human Fovea Gene Expression Database.

29B. Testing Non-Linearity and Adjusting Lowess Transformation in cDNA Microarray Data Sets.

30B. Molecular Class Discovery in Cancer and Clinical Outcome Prediction using Genome-Wide Gene Expression Profiling: A Case Study on Ovarian Carcinomas.

31B. "Spotting Error" in Microarray Data.

32B. An Algorithm for Identifying Regulatory Relationships in Single-mutant Gene Expression Data.

33B. Microarray Analysis of the Developing Mouse Cerebellum.

34B. Elucidation of Genes Involved in HTLV-I-induced Transformation using the K-harmonic Means Algorithm to Cluster Microarray Data.

35B. Between Group Eigen Analysis: A Simple and Flexible Class Prediction Method for Gene Expression Data.

36B. MAP: Microarray Annotation Program.

37B. Development of a Relational Toxicogenomics Database for the Prediction of Chemical Toxicity.

38B. No poster.

 

1B. Quality Assurance Methods for Processing Microarray Imagery.

Peter Bajcsy, Ph.D.1, Zonglin L. Liu, Ph.D.2, and Lei Liu, Ph.D.3. 1National Center for Supercomputing Applications, 605 East Springfield Avenue, Champaign, IL 61820, 2Department of Animal Sciences, University of Illinois, Urbana, IL 61801 and 3The W. M. Keck Center for Comparative and Functional Genomics, University of Illinois at Urbana-Champaign, 330 ERML, 1201 W. Gregory Dr., Urbana, IL 61801.

pbajcsy@ncsa.uiuc.edu, z-liu@staff.uiuc.edu and leiliu@uiuc.edu

 

We present three quality assurance (QA) methods for processing DNA microarray images. The three methods are based on signal-to-noise ratio, topology of a microarray dot and statistical distributions of background and foreground (signal) pixel intensities inside of a microarray grid cell. The methods are applied to DNA microarray images to detect systematic errors and remove any unreliable information from further analysis.
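
The first of these criteria, a signal-to-noise check on one grid cell, might be sketched as follows. This is only an illustration under stated assumptions: the SNR definition (mean foreground minus mean background, divided by the background standard deviation), the function names, and the threshold of 3.0 are hypothetical and are not taken from the abstract.

```python
from statistics import mean, pstdev

def spot_snr(cell, mask):
    """Assumed SNR for one grid cell.

    cell: 2-D list of pixel intensities; mask: 2-D list of booleans,
    True where a pixel belongs to the spot (foreground).
    """
    fg = [v for row, mrow in zip(cell, mask) for v, m in zip(row, mrow) if m]
    bg = [v for row, mrow in zip(cell, mask) for v, m in zip(row, mrow) if not m]
    noise = pstdev(bg)  # population standard deviation of the background
    if noise == 0:
        raise ValueError("background has zero variance; SNR is undefined")
    return (mean(fg) - mean(bg)) / noise

def is_reliable(cell, mask, min_snr=3.0):
    """Flag a spot as usable when its SNR reaches an assumed threshold."""
    return spot_snr(cell, mask) >= min_snr

# Toy 4x4 grid cell with a 2x2 spot in the centre.
cell = [
    [12, 12, 12, 12],
    [12, 21, 21, 10],
    [12, 21, 21, 10],
    [10, 10, 10, 10],
]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]
```

Spots that fail such a check would be excluded from further analysis; the other two criteria (spot topology and intensity distributions) would need their own tests.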

Long Abstract

 

 

2B. Data Management and Analysis for Gene Expression Array.

Olga Krebs, Rolf Kabbe, Karlheinz Gross and Roland Eils. German Cancer Research Center, Heidelberg, Germany.

o.krebs@dkfz.de

 

We have designed and implemented an array informatics system that integrates data management and analysis and is intended to support and integrate RNA expression data with other kinds of functional genomics data. Its functionality ranges from storage of the data in relational database management systems (currently Oracle RDBMS running on a Unix system) and a data warehouse to front-end tools for the presentation and maintenance of the data.

Long Abstract

 

 

3B. An Empirical Comparison Of Methods For Detecting Differentially Expressed Genes In Cancer Datasets.

Soumyaroop Bhattacharya, Tue Tri Nguyen, Satish Patel, Jia Ke and James Lyons-Weiler. Center for Bioinformatics and Computational Biology, University of Massachusetts Lowell, United States of America.

Soumyaroop_Bhattacharya@student.uml.edu

 

We compared the performance of a number of simple methods for the analysis of microarray data, using empirical and simulated datasets, on the basis of their Predictive Utility: the proportion of datasets for which a method returns a classification that matches the pre-defined bipartition (i.e., all samples in Group A separated from all samples in Group B).
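
The Predictive Utility measure described above can be sketched in a few lines. The function names and the representation of a classification as a list of labels are assumptions made for illustration; the abstract does not specify how the authors computed it.

```python
def recovers_bipartition(predicted, truth):
    """True if the predicted two-way split matches the known groups.

    Label names are ignored: [0, 0, 1, 1] and [1, 1, 0, 0] both match
    ["A", "A", "B", "B"].
    """
    pairs = set(zip(predicted, truth))
    predicted_labels = {p for p, _ in pairs}
    true_labels = {t for _, t in pairs}
    # A perfect split pairs each predicted label with exactly one true label.
    return len(pairs) == 2 and len(predicted_labels) == 2 and len(true_labels) == 2

def predictive_utility(results):
    """Proportion of datasets on which the bipartition was recovered.

    results: list of (predicted_labels, true_labels), one entry per dataset.
    """
    hits = sum(recovers_bipartition(p, t) for p, t in results)
    return hits / len(results)

truth = ["A", "A", "B", "B"]
results = [
    ([0, 0, 1, 1], truth),  # exact match
    ([1, 1, 0, 0], truth),  # match with labels swapped
    ([0, 1, 1, 1], truth),  # one Group A sample misplaced
]
```

On these three toy datasets the method recovers the bipartition twice, so its Predictive Utility is 2/3.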

Long Abstract

 

 

4B. No Poster.

 

 

5B. ArrayExpress, a Public Repository for Microarray Gene Expression Data.

Helen Parkinson, Niran Abeygunawardena, Ele Holloway, Misha Kapushesky, Gaurab Mukherjee, Philippe Rocca-Serra, Susanna Sansone, Ugis Sarkans, Mohammad Shojatalab, Jaak Vilo and Alvis Brazma. European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

parkinson@ebi.ac.uk

 

ArrayExpress is a public repository for gene expression data based at the EBI. ArrayExpress uses the MAGE-OM model (an OMG standard) and accepts submissions in MAGE-ML format and via the MIAME-compliant submission and annotation tool MIAMExpress. MIAMExpress incorporates terms from the ontology developed by the MGED ontology working group.

Long Abstract

 

 

6B. No Poster.

 

 

7B. cDNA Microarray Images synthesized from the Real Spot Edge Templates.

Hye Young Kim1, Yong Sung Lee2, Young Seek Lee3 and Jin Hyuk Kim1. 1Dept. of Physiology and 2Dept. of Biochemistry, College of Medicine, Hanyang University, Seoul, 133-791, Korea and 3Dept. of Biochemistry and Molecular Biology, College of Science and Technology, Hanyang University, Ansan, 425-791, Korea.

hykim121@hanyang.ac.kr

 

Synthetic cDNA microarray images can be generated with a mathematical model of the cDNA distribution in the spots. Because these images are based on random ratios of specific RNA in the control and experimental groups, they can be used to evaluate the accuracy of microarray experiments.

Long Abstract

 

 

8B. SOURCE: The Stanford Online Universal Resource for Clones and ESTs.

T. Hernandez-Boussard, M. Diehn, A. Alizadeh, J.C. Matese, G. Binkley, H. Jin, J. Gollub, J. Demeter, J. Hebert, C.A. Ball, P.O. Brown, D. Botstein and G. Sherlock. Stanford University, Stanford, CA, USA.

boussard@genome.stanford.edu

 

The Stanford Online Universal Resource for Clones and ESTs (SOURCE - http://genome-www.stanford.edu/source) compiles information from biological public databases that can be used to annotate human, mouse or rat clones. SOURCE produces 'gene reports' to facilitate the analysis of large datasets, including gene expression patterns and physical mapping for a given gene.

Long Abstract

 

 

9B. Global Gene Expression Profiling of E. coli with Interruption of Acetate Production.

Won Jae Heo, Sung Ho Yoon and Sang Yup Lee. Korea Advanced Institute of Science and Technology, Daejeon, 305-701, Korea.

wanja@mail.kaist.ac.kr

 

The physiological functions of the acetate synthetic pathway in Escherichia coli have been studied for many years. We have now performed transcriptome analysis using DNA microarrays to gain a global understanding of the role of the acetate-producing pathway.

Long Abstract

 

 

10B. Gene Expression Profiling Analysis Augmented by Mathematically Transformed Gene Ontology.

Jill Cheng, John Martin, Melissa Cline, Tarif Awad and Michael A. Siani-Rose. Affymetrix, Inc., Santa Clara, California, USA.

jill_cheng@affymetrix.com

To improve current methodology for expression analysis, novel applications were developed based on the graph structure of the Gene Ontology. These include automatic biological interpretation of microarray results and a knowledge-guided clustering algorithm in which expression profiles and functional annotations are combined. Analysis of results from a mouse hematopoietic time-series microarray experiment will be presented.

Long Abstract

 

 

11B. Higher-Order Gene Interaction Revealed by Log-Linear Analysis of DNA Microarrays.

Hiroyuki Nakahara1, Shin-ichi Nishimura1, Masato Inoue1, Gen Hori2 and Shun-ichi Amari1. 1Lab for Mathematical Neuroscience and 2Lab for Advanced Brain Signal Processing, RIKEN Brain Science Institute.

nishi@mns.brain.riken.go.jp

 

We introduce a log-linear decomposition of higher-order interactions, based on the weak definition of conditional independence. Our measure, based on information geometry, can estimate the fine structure of higher-order interactions. Using real datasets, we show that our method can reveal genetic 'switches' that modulate cellular functions.

Long Abstract

 

 

12B. A New Normalization Method for cDNA Microarray Data using Clustering Background.

Dong Mi Shin1, Hye Young Kim2, Myung Guen Chung3, Jin Hyuk Kim2, Young Seek Lee3 and Yong Sung Lee1. 1Department of Biochemistry, 2Department of Physiology, College of Medicine, Hanyang University, Seoul, Korea and 3Department of Biochemistry and Molecular Biology, College of Science and Technology, Hanyang University, Ansan, Korea.

yongsung@hanyang.ac.kr

 

We studied a new normalization method that is carried out after clustering the segments by the ratio of spot intensity to background intensity. This background-dependent normalization decreased the number of genes whose expression levels changed significantly and made their distribution more consistent across the whole range of signal intensities.

Long Abstract

 

 

13B. Genetic Network Analysis from Microarray Gene Expression Data via Bayesian Network and Nonparametric Regression.

Seiya Imoto1, Kim Sunyong1, Takao Goto1, Sachiyo Aburatani2, Kousuke Tashiro2, Satoru Kuhara2 and Satoru Miyano1. 1Human Genome Center, Institute of Medical Science, University of Tokyo and 2Graduate School of Genetic Resources Technology, Kyushu University.

imoto@ims.u-tokyo.ac.jp

 

We present a method for inferring genetic networks from cDNA microarray data using a Bayesian network model, which can capture even nonlinear relationships between genes. Nonparametric regression models with B-splines and a new criterion, BNRC, for evaluating the network are defined. We demonstrate the method's high performance in computational experiments.

Long Abstract

 

 

14B. Correlation of Hypoxia-Regulated Genes with Hypoxia-Enhanced Metastatic Ability by Gene Expression Profiling.

Patrick Subarsky and Richard P. Hill. Medical Biophysics, University of Toronto and Ontario Cancer Institute, Princess Margaret Hospital.

p.subarsky@utoronto.ca

 

Experimental metastatic potential of some tumor cell lines is transiently increased following hypoxic exposure. Hierarchical clustering of cDNA microarray time series data representing hypoxic exposure followed by oxic recovery demonstrated a globally repressed sub-set of genes and further discrete gene sub-sets with unique patterns of induction.

Long Abstract

 

 

15B. ISAcycle: Independent Subspace Analysis of Gene Expression Data.

Hyejin Kim, Seungjin Choi and Sung-Yang Bang. Dept. of Computer Science and Engineering, POSTECH.

marisan@postech.ac.kr

 

ISAcycle is an unsupervised learning method for gene expression data analysis based on independent subspace analysis (ISA), which aims to find independent feature subspaces of multivariate data by maximizing the independence between the norms of projections onto linear subspaces. We apply ISA to cell cycle-related gene expression data and show its usefulness: (1) the ability to assign genes to multiple coexpression pattern groups; and (2) the capability to cluster key genes that determine each critical point of the cell cycle.

Long Abstract

 

 

16B. Computational Tools, Floral Primordial Development Pathways and Its Regulation using DNA Chip Technology in Arabidopsis.

Varsha Raja. Xintra, Bioinformatics, Toronto, Ontario, Canada.

 

The primary goal of sequencing the entire Arabidopsis genome is to use this information to understand overall cellular, molecular and developmental processes. The sequence can be further explored to understand flowering mechanisms and their regulation at the molecular level. Accomplishing this goal will require new experimental and computational tools. With the advent of structural and functional computational genomics, plant genome research has changed, and promising tools such as high-throughput screening, imaging systems, microarrays and chip technology will become powerful means of understanding biological processes such as flower development and of analyzing gene expression patterns.

 

 

17B. No poster.

 

 

18B. Linearization of DNA Macroarray data.

Yi Xie, Adele Cutler and Bart Weimer. Utah State University, Logan, Utah 84322.

yixie@cc.usu.edu

 

The problem of data nonlinearity limits the usefulness of DNA expression arrays in functional genomic research. In this report, we demonstrate that we were able to linearize the raw data on a membrane-based DNA macroarray. The accuracy of these linear transformations was validated by a serial dilution experiment.

Long Abstract

 

 

19B. Statistical Analysis of Multi-Center Microarray Data.

Taesung Park1, Sung-Gon Yi, Hosik Choi1, Seung-Yeoun Lee2, Kee-Ho Lee3, Jung Kyoon Choi4, Sangsoo Kim4, Yeom Young Il4, Choi Jong Young5 and Daeghon Kim6. 1Department of Statistics, Seoul National University, Seoul, Korea, 2Department of Applied Mathematics, Sejong University, Seoul, Korea, 3Laboratory of Molecular Oncology, Korea Cancer Center Hospital, 4Korea Research Institute of Bioscience and Biotechnology, Taejon, Korea, 5The Catholic University of Korea, Seoul, Korea and 6Chonbuk National University, Jeonju, Chonbuk, Korea.

skon@kr.FreeBSD.org

 

When the same type of microarray is used in different labs or clinical centers, we propose a statistical model to account for the additional variability caused by the different centers. The proposed model is based on the ANOVA model.
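
To illustrate the general idea of an ANOVA-type model with a center term, the sketch below fits an additive model, value = grand mean + center effect + condition effect + error, to the log-expression values of one gene. The additive form, the balanced design, the data layout, and all names are assumptions for illustration; the abstract does not give the terms of the authors' model.

```python
from statistics import mean

def additive_effects(data):
    """Estimate main effects for one gene in a balanced design.

    data maps (center, condition) -> list of log-expression values.
    Each effect is the corresponding group mean minus the grand mean.
    """
    grand = mean(v for vals in data.values() for v in vals)
    centers = sorted({c for c, _ in data})
    conditions = sorted({k for _, k in data})
    center_eff = {
        c: mean(v for (ci, _), vals in data.items() if ci == c for v in vals) - grand
        for c in centers
    }
    condition_eff = {
        k: mean(v for (_, ki), vals in data.items() if ki == k for v in vals) - grand
        for k in conditions
    }
    return grand, center_eff, condition_eff

# Toy data: two centers, two conditions, two replicate arrays each.
data = {
    ("center1", "normal"): [1.0, 1.0],
    ("center1", "tumor"): [3.0, 3.0],
    ("center2", "normal"): [2.0, 2.0],
    ("center2", "tumor"): [4.0, 4.0],
}
grand, center_eff, condition_eff = additive_effects(data)
```

In this toy example the tumor-minus-normal difference is 2.0 at both centers, and the shift between centers is absorbed by the center effects rather than being mistaken for a biological difference.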

Long Abstract

 

 

20B. Bayesian Networks and Perturbation Experiments.

Iosifina Pournara and Lorenz Wernisch. Dept of Crystallography, Birkbeck College, University of London.

i.pournara@cryst.bbk.ac.uk

 

Bayesian Learning is used to construct genetic networks that describe how the expression level of each gene depends on external stimuli and on the expression levels of other genes. I am investigating the robustness of Bayesian Learning and the significance of perturbation experiments for constructing genetic networks.

Long Abstract

 

 

21B. Quality Measures in cDNA Microarray Experiments.

Taesung Park1, Ki-Woong Kim1, Sunggon Yi1, Seung-Yeoun Lee2, Jin-Hyuk Kim3, Hea Young Kim3 and Yong Sung Lee3. 1Department of Statistics, Seoul National University, Seoul, Korea, 2Department of Applied Mathematics, Sejong University, Seoul, Korea and 3Hanyang University College of Medicine, Seoul, Korea.

tspark@stats.snu.ac.kr

 

Although several spot quality measures have been proposed, it has not yet been investigated which of them are most sensitive in detecting poor-quality spots. We perform a systematic comparison of the sensitivity of these measures.

Long Abstract

 

 

22B. Large-scale Analysis of the Human and Mouse Transcriptomes.

Andrew I. Su1, Michael P. Cooke1, Keith A. Ching1, Yaron Hakak1, John R. Walker1, Tim Wiltshire1, Anthony P. Orth1, Raquel G. Vega1, Lisa M. Sapinoso1, Aziz Moqrich2, Ardem Patapoutian1,2, Garret M. Hampton1, Peter G. Schultz1,2, and John B. Hogenesch1. 1Genomics Institute of the Novartis Research Foundation (GNF), San Diego, CA and 2The Scripps Research Institute, La Jolla, CA.

asu@gnf.org

 

We present a preliminary description of the normal mammalian transcriptome comprised of gene expression measurements from 91 human and mouse samples. We have mined this dataset for insights into molecular and physiological gene function, mechanisms of transcriptional regulation, disease etiology, and comparative genomics. These data are accessible at http://expression.gnf.org.

Long Abstract

 

 

23B. Experimental Design and Analysis of Microarray Data.

Justin C. Fay and Michael B. Eisen. Department of Genome Sciences, Lawrence Berkeley National Laboratory.

jcfay@lbl.gov

 

Replicated DNA microarray experiments were used to identify the amount of error produced during the labeling, hybridization and scanning steps in microarray experiments. Analysis of this error indicated certain probes are more variable among replicates than others. Two statistical methods were used to account for this variance in error and the results were compared.

Long Abstract