2011 Overton Prize Winner
In the spring of 1997, Olga Troyanskaya was working on a degree in computer science and biology at the University of Richmond, Virginia, when she contacted Steven Salzberg, then at Johns Hopkins University, about a summer internship in his computational biology lab. "He took a chance on me, a random student from another school, and was tremendously inspirational," she says. She spent the following two summers in Salzberg's laboratory, first at Johns Hopkins and then at The Institute for Genomic Research.
And so began the career of one of the most promising young researchers in bioinformatics, and a deserving winner of this year's Overton Prize. "She is one of these forces of nature, full of energy," says Alfonso Valencia, chair of the ISCB awards committee.
Troyanskaya herself talks with infectious enthusiasm about her work. "I've always been fascinated by the problems of biology," she says. "I was just better at computer science and math than the wet lab research. And it seemed to me that there had to be a lot you could contribute with computer science that you couldn't do with experimental techniques alone."
From the University of Richmond, Troyanskaya moved to Stanford University to complete a PhD in biomedical informatics, under the supervision of Russ Altman, a bioinformatician, and David Botstein, a geneticist. "I wanted a setup that was close to real biological problems, and I got exactly that. I learned a great deal from both of them," she says.
In 2003, she moved to Princeton University as an assistant professor in the Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics. "I am fortunate that the computer science department appreciates the impact of computing in biology, and that I have many wonderful colleagues at both the department and in the Institute. I found several amazing collaborators, and this allowed me to begin a number of interesting projects."
One of the key problems she focuses on is making better use of the vast but unwieldy biological datasets held in databases around the world. "So instead of focusing on one study, we can take the entirety of published data. That allows you to ask very specific questions in a data-driven way and to develop novel biological hypotheses," she says.
An important goal is to predict the function of genes and proteins. Many experimental approaches have been developed to determine what genes do and how they are controlled inside the cell, but this work tends to produce datasets that are large and noisy. Troyanskaya's approach is to develop new ways of extracting useful information from these datasets, using techniques from computer science such as machine learning and data mining.
"Computation by itself is often not enough to discover new biology, but it can direct experimental work," she says. She has also set up a wet lab to test and validate the hypotheses that her computational methods help generate. In 2009, for example, she used this approach to identify 109 new proteins involved in mitochondrial biogenesis in yeast.
This combined approach is one of the things that sets Troyanskaya apart, says Valencia. "She is one of the first to have come from the computational side and then moved into the experimental area to combine both," he says.
Understanding the function of individual genes is only a small part of a much bigger story. Many genes and proteins play multiple roles within a cell as parts of various networks of biological processes. Mapping out these networks and understanding how they work and interact with each other is yet another strand of her research. "She has made important contributions to systems biology," says Valencia.
Evaluating and validating computational predictions requires broad collaboration to develop standards and methods on which the community can reach consensus. To this end, Troyanskaya is collaborating with the curators of model organism databases and members of the Gene Ontology Consortium.
Another problem that many researchers face is handling the avalanche of data currently being generated. So Troyanskaya, in collaboration with Princeton colleagues Kai Li and Moses Charikar, is looking at ways to better search and visualize these huge datasets, a task made challenging by their high noise levels and enormous volume. "We are developing better ways to do this," she says.
The awards committee was also impressed by Troyanskaya's service to the community. She is involved with the Society's two official journals, PLoS Computational Biology and Bioinformatics. She is also active in conferences, organizing them and chairing tracks and program committees. "That is something that is very much appreciated," says Valencia. "We are lucky to have her."
And there is surely more to come. Troyanskaya points to numerous questions that are driving her research forward. She wants to know, for example, how we can predict which genes are involved in kidney disease, to understand their function and their clinical role on a molecular level. She works on these questions in close collaboration with experimental researchers, such as Matthias Kretzler and his group from the University of Michigan, Ann Arbor. And she is passionate about finding ways to ask questions in a data-driven way, not just in a knowledge-driven way that relies on what we already know about biology. "These are the questions that I'm really interested in," she says. "And we really haven't yet harnessed the full potential of our data collections."
This article is excerpted from the June 2011 issue of PLoS Computational Biology. To read the full journal article, please visit www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002081