Picture: 2008 ISCB ASSA Winner
"[David] Haussler’s group was one of the pioneers of machine learning in bioinformatics, introducing Hidden Markov Models for the statistical analysis of patterns in biological data," says Brunak. However, Haussler’s recent achievements have been more in the application of bioinformatics methods than in their development. Since 1999, he has been one of the principal figures in sequencing, and later analysing, the human genome and those of other mammals, and in mining this genomic information for insight into vertebrate evolutionary history.
Haussler originally trained as a mathematician. His first encounter with computational biology came in graduate school, at the University of Colorado at Boulder, where he had the good fortune to study for his Ph.D. under Andrzej Ehrenfeucht. "He taught me that I should never be constrained by disciplinary boundaries, and never be frightened to tackle big problems. The word ‘bioinformatics’ didn’t exist when I was a graduate student, but we were doing it."
Haussler’s first years as an independent investigator were devoted to studies in pattern recognition and machine learning, focusing on modelling the way the brain learns. He shifted from computational neuroscience back to bioinformatics when Anders Krogh joined him at Santa Cruz as a post-doc. "He [Anders] came to my lab to work on machine learning, but soon discovered that these methods could be applied to biological sequence analysis, to classifying proteins into families and recognising genes in fragments of DNA."
Late in 1999, Haussler was called by Eric Lander, one of the leaders of the public human genome sequencing project, and asked to apply his HMM methodology to identifying the genes in the then newly sequenced human DNA. At that time, the public project was in a "full-on race" with Celera to publish an initial working draft of the sequence.
Barely six months after Haussler joined the project, both teams were ready to release their first genome drafts. Haussler well recalls July 7, 2000, when the complete draft genome sequence was posted on the Web server of the University of California, Santa Cruz. "Seeing the waterfall of As, Gs, Cs, and Ts pouring off our server was an emotional moment," he says. "We were witnessing the product of more than three billion years of evolution, sequences passed down from the beginning of life to present-day humans." This excitement was shared by the worldwide scientific community; Internet traffic on the Santa Cruz server reached 0.5 terabytes per day, a record that still stands.
Haussler has dedicated the first years of the new millennium to mapping and analysing that sequence. Other questions that have attracted Haussler’s attention include the analysis of hyper-conserved DNA sequences that remain virtually unchanged in divergent species, and the genetic changes that distinguish humans from apes. While most researchers in this field have concentrated on gene gain during evolution, Haussler and his team recently identified twenty-six genes that are well established in the vertebrate lineage but that were lost in the later stages of human evolution.
This article is excerpted from the July 2008 issue of PLoS Computational Biology. To link to the full journal article please visit www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000101.