Building Microarray Informatics Support

¹Duke Bioinformatics Shared Resource, Box 3958, Duke University Medical Center, Durham, NC 27710
²Institute of Statistics and Decision Science, Box 90251, Duke University, Durham, NC 27708

Synopsis

Large-scale gene expression data have placed increasing demands on informatics support. In this tutorial, we share our experience in the strategic planning and practical implementation of an institute-wide bioinformatics core at Duke for Affymetrix DNA chip and spotted cDNA microarray studies. We present a comprehensive review of existing practices reported in the current literature. These practices include clustering expression profiles, mapping expression data to metabolic pathways and to chromosome locations, and, most recently, modeling and simulating regulatory networks. The tutorial focuses on three issues: 1) storage, retrieval, and dissemination of expression data; 2) auxiliary domain-expert databases for data mining and knowledge discovery; and 3) data analysis and visualization tools. Both academic solutions and commercial packages are reviewed.

General-purpose image analysis software is usually adequate for applying commonly used algorithms to images of moderate size. Images acquired for gene expression analysis, however, are typically too large for commercial software to handle easily. It may also be desirable to analyse a set of gene expression images simultaneously, which is beyond the scope of most software designed to analyse single images. Analysis of gene expression data is further complicated by the prior knowledge that specific genes are measured at predetermined locations in the image. This information is valuable, but it comes at a cost: the image can no longer be treated as a plain signal, because some pixel values are more important than others, and their relative importance is subjective and not readily quantified. For these reasons, it is advantageous to shift analyses of gene expression images from an algorithmic approach to the object-oriented paradigm, which implements algorithms just as efficiently but makes it easy to retain the underlying data structures rather than only data summaries.
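
As a rough illustration of the object-oriented view described above, the following Python sketch stores each spot as an object that keeps its raw pixel values, so that any summary can be computed later. The class names (Spot, ArrayImage), the profile function, and the sample intensities are all hypothetical and chosen only for illustration; they are not taken from any existing package.

```python
from dataclasses import dataclass, field
from statistics import mean, median
from typing import Dict, List


@dataclass
class Spot:
    """One arrayed feature: a known gene at a predetermined grid location."""
    gene_id: str
    row: int
    col: int
    pixels: List[float]  # raw intensities are retained, not just a summary

    def intensity(self, summary=median) -> float:
        """Summarize the retained pixels on demand with any chosen statistic."""
        return summary(self.pixels)


@dataclass
class ArrayImage:
    """One hybridization, stored as spots rather than as a flat pixel signal."""
    name: str
    spots: Dict[str, Spot] = field(default_factory=dict)

    def add(self, spot: Spot) -> None:
        self.spots[spot.gene_id] = spot


def profile(images: List[ArrayImage], gene_id: str, summary=median) -> List[float]:
    """Follow one gene across a set of images analysed together."""
    return [img.spots[gene_id].intensity(summary) for img in images]


# Made-up example data: the same gene on two arrays.
a = ArrayImage("control")
a.add(Spot("geneA", 0, 0, [10.0, 12.0, 11.0, 50.0]))
b = ArrayImage("treated")
b.add(Spot("geneA", 0, 0, [20.0, 22.0, 21.0, 23.0]))

print(profile([a, b], "geneA"))        # [11.5, 21.5]
print(profile([a, b], "geneA", mean))  # [20.75, 21.5]
```

Because the pixels stay attached to the gene they measure, switching from a median to a mean (or to any other statistic) does not require reprocessing the image; only the summary function changes.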

Due to the rapid advance of array informatics, the contents of this tutorial will be updated regularly. To share these updates with the scientific community, we have developed a Microarray Informatics Portal (www.array.tech.nu). Please refer to this portal for the latest information on this ISMB tutorial.

Details

Microarray Informatics

Scope of the Tutorial

In functional genomics today, understanding and applying informatics technology in microarray studies is crucial to staying competitive. This tutorial explains how to build microarray informatics support from an applied perspective, without undue bias toward vendor-specific implementations.

Objective of the Tutorial

Upon completion of the tutorial, you will be able to:

Understand what microarray informatics is and how you can use it to make objective inferences about biological processes measured in your data.
Understand the limitations of microarray technology in order to avoid costly and unrealistic expectations.

Tutorial Outline

INTRODUCTION

Microarray Informatics: The Road Map From the Collection of Genes of Interest to the Fabrication of the Array
Image Analysis and Pattern Recognition
From Statistics to Knowledge: Knowledge Discovery in Databases (KDD)

Review of Related Subjects

High-throughput Instrumentation: DNA Chips and Spotted cDNA Arrays
Data Warehouse and Knowledge Base
Data Abstraction, Dimension, and Vector Space
Classification, Pattern Recognition, and Machine Learning
Electronic Publication of Massive Data Sets

Comparative Costs of Gene Expression Data

Hardware
Software
Labor

Image Analysis

Interpretation of pixel locations in images of gene expression
Interpretation of pixel values according to data acquisition
Transformations and decompositions of image data
Image file formats

Exploratory Data Analysis (EDA) and Data Mining (DM)

Introduction: Data-driven vs. Hypothesis-driven Research
Data Validation, Transformation, and Standardization
Quantifying Similarity and Difference
Clustering and Pattern Recognition
Hierarchical Clustering
K-means
Kohonen Self-Organizing Maps (SOM)
Principal Component Analysis (PCA)
Multidimensional Scaling (MDS)
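
To make two of the topics above concrete (quantifying similarity and hierarchical clustering), here is a minimal Python sketch. It assumes Pearson correlation as the similarity measure and average linkage as the merging rule; both are common choices rather than the only ones, and the function names and expression profiles are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def distance(x, y):
    """Correlation distance: 0 for perfectly correlated, 2 for anti-correlated."""
    return 1.0 - pearson(x, y)


def mean_distance(profiles, a, b):
    """Average pairwise distance between the members of clusters a and b."""
    total = sum(distance(profiles[g], profiles[h]) for g in a for h in b)
    return total / (len(a) * len(b))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: mean_distance(profiles, *pair))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters


# Made-up expression profiles over four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 6.0, 8.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.1, 4.0, 2.2],
}
clusters = average_linkage(profiles, 2)
print(sorted(sorted(c) for c in clusters))
```

Correlation distance groups genes by the shape of their profiles rather than by absolute level, which is why g1 and g2 end up together despite their different magnitudes. This brute-force version recomputes all pairwise distances at every merge and is only suitable for small examples.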

Data Visualization

Computational Approaches

Limitations of canned software
Shifting analyses from focusing on algorithms to focusing on data structures
Storage of expensive computations
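
One possible reading of "storage of expensive computations" is to save a result to disk the first time it is computed and reload it afterwards. The Python sketch below assumes that reading; the helper name cached, the cache layout, and the stand-in computation pairwise_sums are all hypothetical.

```python
import hashlib
import json
import os
import pickle
import tempfile


def cached(cache_dir, func, *args):
    """Return func(*args), reusing a result stored on disk when one exists.

    Assumes the arguments are JSON-serializable so they can form the cache key.
    """
    key_source = json.dumps([func.__name__, args])
    key = hashlib.sha256(key_source.encode()).hexdigest()
    path = os.path.join(cache_dir, key + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = func(*args)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []  # records how often the expensive function really runs


def pairwise_sums(values):
    """Stand-in for an expensive all-against-all computation."""
    calls.append(1)
    return [[a + b for b in values] for a in values]


cache_dir = tempfile.mkdtemp()
first = cached(cache_dir, pairwise_sums, [1, 2, 3])
second = cached(cache_dir, pairwise_sums, [1, 2, 3])  # loaded from disk
print(first == second, len(calls))
```

A real system would also need a way to invalidate stored results when the input data or the algorithm changes; that is deliberately left out of this sketch.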

Knowledge Bases for Functional Genomics

Metabolic Pathways: KEGG, EcoCyc, WIT, Boehringer Mannheim
Model Organisms: GeneCard, MGD, YGD, YPD, WormPD
Literature: ePubCentral
Knowledge Discovery in Databases (KDD) and Text Mining

Deciphering the Genetic Network

Gene Interaction Knowledge Bases
The Difficulties of Reverse Engineering
Boolean Networks and Mathematical Challenges
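
For readers unfamiliar with Boolean network models, the Python sketch below simulates a toy three-gene network with synchronous updates and finds the cycle (attractor) that each starting state falls into. The genes and their rules are entirely hypothetical and serve only to show the mechanics.

```python
from itertools import product

# Hypothetical three-gene network; the rules are illustrative only.
RULES = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B jointly activate C
}
GENES = sorted(RULES)  # fixed gene order: A, B, C


def step(state):
    """Synchronous update: every gene reads the same previous state."""
    current = dict(zip(GENES, state))
    return tuple(bool(RULES[g](current)) for g in GENES)


def attractor(state):
    """Iterate until a state repeats, then return the cycle that was reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]


for start in product([False, True], repeat=len(GENES)):
    print(start, "-> cycle of length", len(attractor(start)))
```

With n genes there are 2^n states, so exhaustive enumeration like this is feasible only for very small networks. Inferring the rules from expression data is harder still, since a gene with k inputs admits 2^(2^k) possible Boolean functions.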

e-Publication

Deposit the Data in Public Databases
Complementary Website for Publication
Make the Website Interactive and Informative

Research Collaboration Environment

Online Community and Information Portal
Knowledge Sharing

Appendices

Public Domain Solutions
Commercial Products
Literature Collection
Contractual Agreements with Commercial Suppliers