Picture: 2011 ISCB ASSA Winner. Photo courtesy of the European Molecular Biology Laboratory.
If computational biology seems challenging in the second decade of the 21st century, spare a thought for those who pioneered the discipline in the 1980s. Michael Ashburner at the University of Cambridge was one of them. “His work is now seen as a landmark and an achievement in technology,” says Alfonso Valencia, chair of the ISCB awards committee.
Ashburner began his career with a degree in genetics from the University of Cambridge in 1964. He stayed on to do a PhD, studying Drosophila and, in particular, polytene chromosomes, which form when certain specialised cells undergo repeated rounds of DNA replication. Polytene chromosomes have a characteristic banded structure. In Drosophila there are some 5,000 bands, and during development a subset of these undergo a reversible structural modification as a result of transcription. This is known as puffing and can be considered an analog of gene activity. In the late 1960s and early 1970s, Ashburner studied puffing patterns and inferred the existence of a cascade of genetic controls under the influence of the hormone ecdysone during larval development.
In the late 1970s, Ashburner turned his attention to the Alcohol dehydrogenase (Adh) gene and its environs. By the mid-1980s, he had produced the most detailed genetic analysis of any small chromosomal region in any multicellular organism, and had the Adh gene sequences from several different species of Drosophila. “That drew me into bioinformatics because we needed a way of comparing sequences,” he says. “There was almost no software available to help.”
Two people came to his aid. The first was Walter Bodmer, director of the Imperial Cancer Research Fund, who gave Ashburner the use of a DEC computer with access to the early network. “We could access this machine by dial-up and do some analysis,” he says. The second was Doug Brutlag at Stanford University, who was developing MOLGEN, an early bioinformatics system, which he allowed Ashburner to access.
Reaching MOLGEN presented a significant obstacle, however. Getting a computer in the United Kingdom to speak to one at Stanford was not straightforward. Today, everybody uses the Internet, built on the TCP/IP protocol suite. But in the early 1980s, the UK and the United States used different systems. The US was pioneering TCP/IP while the UK had a standard called the Coloured Book protocols. “The only place that had an interface between the two protocols was University College London, and they were very helpful,” says Ashburner, “giving us 5 kb of disk space.”
The process of connecting to Stanford was far from simple. “The way you did it was to dial up your local packet switching exchange at the Post Office and connect to the Rutherford Appleton Laboratory. You then typed in some code which connected you to UCL where you could use TCP/IP,” he says. The signal was then routed via the Goonhilly satellite station in Cornwall to Carnegie Mellon University and from there to Stanford. “I had a dumb terminal, that is a box with no memory, so everything had to be captured by a printer in parallel.” Ashburner was far from deterred, however.
At about that time, the European Molecular Biology Laboratory (EMBL) in Heidelberg and GenBank in the US released the first nucleotide sequence libraries in quick succession. Using his network access, Ashburner and his colleagues, in collaboration with the MOLGEN group, set up one of the first bulletin boards, called BioNet, to keep people informed of changes to the libraries and to software. “This became well used and things evolved from there,” he says.
As the field of bioinformatics grew, so did the need for an institution to house the data and conduct research. In 1992, EMBL decided to set up such an institute to hold the sequence library and carry out research. It became the European Bioinformatics Institute (EBI), based in Hinxton, UK; Ashburner and John Sulston had led the UK bid to host it. “I was persuaded to become the first program coordinator and took half-time leave from Cambridge to do that,” he says. He eventually became joint director, a post he held until 2001. “At first, the finances were sticky and the politics were horrendous. But it has since gone from strength to strength,” he says.
At the same time, Ashburner maintained his interest in Drosophila genetics, a field with a long and rich history of collecting and sharing mutations. The first catalogue of mutations was published in 1925, and it was still being revised in paper form in the late 1980s. But the field was expanding quickly, and each edition was out of date as soon as it was published. “It became clear to me that we couldn't carry on publishing in paper form every 10 or 20 years,” he recalls.
So in 1989 he proposed that the community set up an electronic database to take over the role of the printed catalogue. In 1992, the NIH funded the project that became known as FlyBase, one of the first genetic (and now genomic) databases.
FlyBase was a crucial factor in triggering Ashburner's interest in a structured, controlled vocabulary, a formal representation of knowledge about genes and gene products. He began to define terms for gene products by their biological processes, such as wing development, and then defined the data structure in which these terms were related to each other. “It occurred to me that if you were able to do this for several model species, you'd have a fantastic tool,” he says.
But the idea initially met with little interest. “My first presentation, at ISMB in Greece in 1997, went down like a lead balloon,” he recalls. Eventually, he and three like-minded colleagues settled the matter in a bar at ISMB 1998 in Montreal.
Together, they decided to set up a cross-species ontology to be used by the Drosophila, yeast, and mouse databases. They called it the Gene Ontology, and it is now a major bioinformatics project that covers over 1,800 species. Their original paper describing the idea, published in Nature Genetics, is one of the most highly cited in the field. “His achievement is not just to have built this system but also to have organised the consortium behind it. It is now one of the most used resources in all of biology,” says Valencia.
He went on to collaborate with Gerry Rubin and Craig Venter in sequencing the Drosophila genome in 1999. “The process turned me into a nervous wreck,” he jokes. He published his account of this roller-coaster experience in a short but entertaining book called Won for All: How the Drosophila Genome Was Sequenced (Cold Spring Harbor Laboratory Press, 2006).
“We're lucky to have such an inspirational figure in the community,” says Valencia. “This award has been well deserved for a number of years.”
This article is excerpted from the June 2011 issue of PLoS Computational Biology. The full journal article is available at www.ploscompbiol.org/article/info%3Adoi/10.1371/journal.pcbi.1002081