Isaac S. Kohane, Atul Butte, and Ben Reis
Children's Hospital Informatics Program (Harvard)
The massively parallel acquisition of RNA expression data is rapidly becoming streamlined and dropping in price. In the near future we can expect that biologists and clinicians in many institutions will be routinely measuring such data. Analysis of these data sets to characterize biological systems, identify high-yield candidate genes/ESTs for further biological investigation, or quantify a patient's health risks, to name just a few tasks, will therefore become a standard part of the investigational armamentarium. Many algorithms have been developed to take RNA expression data sets and generate clusters that are putatively reflective of functional dependencies. These algorithms range in complexity from simple fold-difference calculations to comprehensive pair-wise comparisons and model construction. This tutorial is designed to teach the basics of the various bioinformatics methodologies available to analyze RNA expression data sets, but it approaches the subject from a practical standpoint, so that attendees can immediately put these algorithms to use.
By the end of the session, attendees will be able to:
1. Understand the formats of expression data files produced by Affymetrix software and Incyte software.
2. Explain the different types of genomic clustering available, including intervention fold differences, self-organizing maps, and phylogenetic-type trees, and describe the advantages and disadvantages of each.
3. Calculate correlation coefficients, mutual information, entropy, and other measures of information.
4. Interpret the results of each clustering method and identify the possible next steps in analyzing the results.
5. Understand all the types of experiments done with microarrays to date, and the potential variety of experiments possible.
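As a rough illustration of the measures named in objective 3, the sketch below computes a Pearson correlation coefficient, Shannon entropy, and mutual information for two expression profiles. It is not taken from the tutorial materials: the function names, the toy expression values, and the use of equal-width binning to discretize expression levels before computing entropy are all assumptions made for this example.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def discretize(values, bins):
    """Map continuous values to bin indices using equal-width bins (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information I(A;B) = H(A) + H(B) - H(A,B) for discrete labels."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


if __name__ == "__main__":
    # Hypothetical expression levels for two genes across four experiments.
    gene_a = [1.0, 2.0, 3.0, 4.0]
    gene_b = [2.0, 4.0, 6.0, 8.0]
    bins_a = discretize(gene_a, bins=2)
    bins_b = discretize(gene_b, bins=2)
    print("correlation:", pearson(gene_a, gene_b))
    print("entropy of gene_a:", entropy(bins_a))
    print("mutual information:", mutual_information(bins_a, bins_b))
```

The number of bins strongly affects the entropy and mutual information estimates, so in practice it would be chosen with the number of experiments in mind.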
1. Review
The first part of the tutorial will be the most didactic. It will include a review of:
2. Questions and Answers
During this segment of the tutorial, participants will be encouraged to explore how they might use these techniques in domains that are of interest to them. Also, the instructors will moderate a more detailed discussion of the problems associated with each of the techniques reviewed and where the current research challenges lie.
3. Example Analysis
A publicly available data set will be introduced. The instructors will lead the participants step by step through several analyses of this data set. This will provide a very concrete sense of what is involved in performing the analyses introduced in the Review part of the tutorial.
Instructors:
The instructors for this course, listed below, are involved in investigations of gene expression with collaborators in multiple academic centers in Boston and elsewhere. These collaborations involve the study of the functional genomics of organ transplantation and rejection, cardiac disease, angiogenesis, tumorigenesis, neurodevelopment, neuromuscular disease, and neuroendocrine circadian rhythmicity, to mention just a few of the established application domains.
Isaac S. Kohane, MD, PhD
Associate Professor of Pediatrics, Harvard Medical School; Director, Children's Hospital Informatics Program
Atul Butte, MD
Fellow in Informatics
Ben Reis, PhD
Fellow in Informatics; Division of Health Sciences and Technology, Harvard/MIT