Wednesday, February 27, 2008

Of Mice and Computers

In the last of my posts about New Zealand, I'll talk about Mike Langston's talks on computational biology. He talked a lot about the style of computational biology: the difficulty of getting data unless you form a real partnership with people (who view their data as very valuable), the noisiness of the underlying problems, the need to worry about entire computation systems with memory/compute power/specialized hardware rather than individual algorithms, the compromises one has to make between theory and making things actually work, and so on. As I listened, it dawned on me that there's another area of research where I hear about similar issues -- search engines and computational advertising. The similarities sounded uncanny.

The similarities reached the level of specific problems. The technical aspects of the talk were about heuristics/methods for maximum clique and biclique (bipartite clique) on real data. I've certainly heard of biclique coming up in search engine papers. Langston claimed to have the fastest bipartite clique algorithm in practice (at least on the biological data he was interested in). I wonder if there's room for crossover between these two fields?
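For readers unfamiliar with the biclique problem: given a bipartite graph, you want two vertex subsets, one from each side, such that every chosen left vertex is connected to every chosen right vertex, maximizing some measure such as the number of edges. Here's a minimal brute-force sketch (my own toy illustration of the problem, not anything resembling Langston's heuristics, which would need to scale to real biological data):

```python
from itertools import combinations

def max_edge_biclique(adj):
    """Brute-force maximum edge biclique in a bipartite graph.

    adj maps each left vertex to the set of right vertices it touches.
    This is exponential in the left side, so it's only suitable as an
    illustration on tiny graphs -- practical biclique codes rely on
    much smarter enumeration and pruning.
    """
    left = list(adj)
    best = (set(), set())
    for r in range(1, len(left) + 1):
        for subset in combinations(left, r):
            # Right vertices adjacent to *every* chosen left vertex.
            common = set.intersection(*(adj[u] for u in subset))
            if len(subset) * len(common) > len(best[0]) * len(best[1]):
                best = (set(subset), common)
    return best

# Tiny example: a, b, c on the left; 1, 2, 3 on the right.
graph = {"a": {1, 2}, "b": {1, 2, 3}, "c": {2, 3}}
print(max_edge_biclique(graph))  # a complete 2x2 subgraph, e.g. {a, b} x {1, 2}
```

In search-engine terms, the left side might be queries and the right side documents (or advertisers and keywords), which is exactly why the problem shows up in both fields.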

The talk left me with a feeling that I've had before when seeing computational biology talks. The field seems interesting, but it looks like, to have a real impact, you need a lot of devotion to learning the underlying biology and making connections with working biologists. It doesn't seem like a good field for dabbling. So far, I haven't found myself ready to take the plunge and try to get involved with the field. Luckily for me, I still have other interesting things to do.

1 comment:

Suresh said...

There is one extremely important respect in which bio is very different from comp. advertising and search: the fact that there's an underlying natural process that we must model with our methods.

This to me is the most frustrating aspect of doing work in comp. bio. In essence, everything we do is judged ultimately by nature, and nature (in this respect) is extremely (for lack of a better word) arbitrary and capricious. There are few general principles, and many exceptions to every rule, and that makes any kind of modelling/algorithmic work very difficult.

The underlying problems in search and advertising are also tricky, in that at a basic level you're trying to read people's minds and predict behaviour. However, there isn't a well-defined yet arbitrary adversary that you're going up against.