Ziv Bar-Joseph loves to run. He rises early and hits the streets and trails around Pittsburgh, where he lives, often in training for a long-distance race. This dedication has paid off. He has the enviable distinction of having run a sub-three-hour marathon, a feat achieved by few amateurs. "Running is very important to me," he says.
But it is not just in his running that Bar-Joseph shows a willingness to go the distance. As a computer scientist and computational biologist at Carnegie Mellon University in Pittsburgh, Bar-Joseph shows a similar dedication as head of the Systems Biology Group at the School of Computer Science. "We have all been impressed by the novelty of the approaches he has developed," says Alfonso Valencia, chair of the ISCB Awards Committee.
Bar-Joseph gained a PhD in computer science from the Massachusetts Institute of Technology between 1999 and 2003. That time turned out to be hugely significant, not least because computational biology was undergoing a revolution. "For the first time, we were getting sequences for large species. First, the fly, then humans. It was very inspiring," he says.
Initially, Bar-Joseph knew little about computational biology but took a class to better understand the significance of these advances and the problems they posed. "It seemed to me that these types of problems were well-suited for the machine learning tools I had experience with," he says.
One of the key problems was how to compare sequences either within species or between them. Various researchers had developed methods to do this using a branch of computer science called combinatorics, which essentially counts the number of similar patterns.
But while this works well when comparing two sequences, it's not so good for comparing seven or eight sequences. It doesn't scale. Consequently, researchers began to experiment with probabilistic approaches that focus on the statistical properties of the patterns. In particular, computational biologists had significant successes with a statistical approach called a hidden Markov model. That attracted Bar-Joseph, who had studied this model.
He also recognised that earlier attempts to reconstruct networks in cells were significantly limited: the data were snapshots of a complex dynamic system, yet the methods treated that system as if it were static.
Clearly, biological systems change. "One thing I've been involved in is introducing dynamics into the algorithms so that they can cope with the way things change in time. That requires different tools," he says.
The approach has paid off when it comes to understanding regulatory networks and explaining how proteins control each other. For example, yeast has about 6,000 proteins. Of these, some 250 are control proteins and each of these, on average, controls 100 or so other proteins. However, each control protein is itself controlled by a handful of other proteins.
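The figures above can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that the regulatory links are spread evenly across all yeast proteins; the variable names are hypothetical and the numbers are the approximate ones quoted in the text.

```python
# Approximate figures quoted in the text for yeast.
total_proteins = 6000
control_proteins = 250
targets_per_controller = 100  # average number of proteins each controller regulates

# Total number of "A controls B" links in the network.
regulatory_links = control_proteins * targets_per_controller

# Assumption for illustration: links are spread evenly over all proteins,
# so this is the average number of controllers acting on any one protein.
mean_controllers_per_protein = regulatory_links / total_proteins

print(regulatory_links)
print(round(mean_controllers_per_protein, 1))
```

Under that even-spread assumption there are about 25,000 links, and a typical protein answers to roughly four controllers, which fits the "handful" described above.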
Understanding a system like this is a tricky business. Static data can tell you which proteins control which others, but not when or under what conditions they do so. Answering that requires more experiments.
Other types of data are more temporal and can reveal how protein levels change over time. "The question we asked was whether we can use this temporal data to try and recover the underlying network dynamics," he says. "We came up with methods to integrate these datasets in order to reconstruct the set of events over time and these have since been used in various other systems too."
Bar-Joseph has learnt to work closely with biologists who test the results. "If the algorithm predicts that 'a' controls 'b', for example, you can do the experiment to test whether that's true." That's important because the patterns that the algorithms reveal must be biologically relevant.
To better understand the challenges that experimentalists face, Bar-Joseph spent a sabbatical working in a wet lab doing exactly this kind of work. That taught him some valuable lessons. For example, wet lab work is not just a question of validating the model. "The results from the lab feed back into the model and enhance it. It's a two-way street," he says. Others have been impressed with Bar-Joseph's approach to experimental work. "Ziv is an example of somebody coming from the theoretical side of things and completely embracing the experimental approach," says Burkhard Rost, president of the ISCB. "It's stunning how he is able to handle such a diverse set of technical methods."
This process of feedback from biology to computer science has become an important theme in Bar-Joseph's work. One of his recent successes is in explaining the way fruit flies develop bristles on their foreheads. These bristles are like aircraft sensors, measuring temperature, wind speed, and so on. To work well, they need to be spaced in a very precise way.
The bristles grow from cells but clearly only a small subset of cells. The cells do not know how many neighbours they have or the local density of bristles nearby. So what determines which cells grow into bristles and the spacing between them?
Bar-Joseph quickly realised that this was similar to a problem that computer scientists have wrestled with for 30 years. This is the problem of determining the subset of computers in a network that control all the others. When every computer outside the subset is connected to at least one computer in it, and no two computers in the subset are connected to each other, the subset is called a maximal independent set.
Finding maximal independent sets is hard, particularly in large distributed networks. Computer scientists do it by assuming that every computer knows who all its neighbours are.
Bar-Joseph realised that the fruit fly cells that eventually become bristles form a maximal independent set: every other cell is connected to one of them, but they are not connected to each other. However, the cells do not know who their neighbours are and so must solve this problem in a different way. His breakthrough was to work out how they did it and develop an algorithm that does the same thing while assuming no knowledge of the neighbours. "It takes a bit longer but that's the trade-off," he says.
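To make the idea concrete, here is a minimal simulation sketch of selecting such a subset (usually called a maximal independent set) without any node knowing its neighbours. It is not the published algorithm: the signalling rule, the starting probability, the doubling schedule and all function names are assumptions chosen for illustration. Each node only decides whether to signal and notices whether any neighbour signalled in the same round; the adjacency map is used by the simulator alone to deliver those signals.

```python
import random


def select_leaders(adjacency, seed=0, start_p=0.05):
    """Pick a maximal independent set of an undirected graph.

    adjacency maps each node to the set of its neighbours (symmetric).
    Nodes never inspect their neighbour lists; they only learn whether
    some neighbour signalled in the current round.
    """
    rng = random.Random(seed)
    active = set(adjacency)   # nodes still undecided
    leaders = set()           # nodes selected so far
    p = start_p               # signalling probability (illustrative schedule)
    while active:
        # Every undecided node signals independently with probability p.
        signalling = {v for v in sorted(active) if rng.random() < p}
        # A node wins if it signalled and heard no signalling neighbour.
        winners = {v for v in signalling if not (adjacency[v] & signalling)}
        for v in winners:
            leaders.add(v)
            active.discard(v)
            active -= adjacency[v]  # neighbours of a winner drop out
        p = min(0.5, 2 * p)         # grow more eager, capped at one half
    return leaders


def is_maximal_independent(adjacency, chosen):
    """Check that no two chosen nodes touch and every other node touches one."""
    independent = all(not (adjacency[v] & chosen) for v in chosen)
    covering = all(v in chosen or (adjacency[v] & chosen) for v in adjacency)
    return independent and covering
```

For example, on a ring of twelve nodes built as `ring = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}`, calling `select_leaders(ring)` returns a set for which `is_maximal_independent(ring, ...)` holds. The price of ignoring neighbour counts is extra rounds in which nobody wins, which echoes the trade-off in the quote above.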
This may have important applications for wireless sensor networks that researchers are using to monitor everything from ocean conditions to volcanic eruptions. "We only published at the beginning of 2011 so we don't know if it will penetrate the commercial world," he says.
Valencia is also impressed by Bar-Joseph's broader contribution to the computational biology community. "He is a member of the editorial board for the journal Bioinformatics, so clearly his contributions go beyond this theoretical and experimental work," says Valencia. "That's very good for a young scientist."
The future holds many promising problems for Bar-Joseph too. He is particularly interested in studying how pathogens interact with cells, how the proteins from flu viruses interact with cell proteins, for example. "If we can reconstruct the networks of interactions then we might be able to determine intervention points that will guide us to therapeutics," he says.
He also wants to study the interaction networks in different species. Many of the genes in humans and mice are similar, but drugs that work well in mice often don't work in humans because the pathways, levels, and interactions are different. "We want to get more insight into this," he says.
That's clearly a long game. These are problems that will require dedication, talent, and endurance to solve. Exactly the kind of qualities you might find in a marathon runner.
This article is excerpted from the May 2012 issue of PLoS Computational Biology. To link to the full journal article please visit www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002535.