Wednesday, August 06, 2008

On Simulations

I've been coding up some simulations for some Allerton papers that are due all too soon. Of late I've depended far too much on my (now former) student Adam Kirsch to take care of doing our simulations, but he's graduated, and they're needed, so off I go. (Adam's graduating is all to the good, but clearly, I'll be missing him, especially when it comes time to write simulations.)

I'm always amazed at how averse theory people seem to be to doing simulations. I find them useful for generating ideas and thinking about problems in the early stages -- cutting off wrong directions and giving insight into the right ones. If you don't like doing simulations for such purposes, because it doesn't work for you, or you're clever enough to not need data, I have no issue with that -- people work differently.

But I also use simulations as a way of checking my work. If I have a theorem that says that a random process will behave a certain way, and it's possible to code a simulation of the process, I'll check my theorem with code. If the theory and the code don't match up, my assumption is that something is wrong somewhere, and the result is not ready until the two match or I know why they don't. Surprisingly, I think it's about 50-50 as to which I end up finding is wrong, the code or the theorem. (By the same token, if I don't have a theorem, and there's more than one way to simulate a process, I'll code multiple simulations, and make sure they match!)

Of course not all results can be checked by coding something up -- but many can. Particularly in the study of random processes, which is my area. And it's clear to me that many researchers don't check by coding -- because I (or students working with me) have several times found mistakes by doing a simple implementation and finding that we get different numbers out than the paper gives. Generally the mistakes aren't "fatal" -- usually a constant is off somewhere, and often eventually the O notation will take care of it -- but of course it is grating as a reader when something in a paper is plain wrong and you're left to figure out why. When someone doesn't do a check-by-code, I must admit, it lowers my trust of and my overall opinion of the person's work. Sure, people make mistakes (myself included) -- but if you're ignoring a straightforward approach for checking your work, that doesn't inspire confidence.

I imagine some theory people are so out of practice coding they "can't" do a simulation. (But hey, that's not really an excuse, that's what grad students are for...) And others probably just consider it a waste of time. If you really are good enough not to need to check your work this way, more power to you. Me, I'll get back to the (admitted drudgery of) coding my things up and seeing if they work the way I think they should....

16 comments:

Anonymous said...

Full quotation. Simulations help
understanding the problem and moreover to prove you are at least
not wrong when you write theorems.
As a side note, since people publishes
theorems, I think they shall
also publish the source code
(e.g. on their webpage).
The purpose is twofold: to help
others take confidence with your
theoretical frameworks, saving
time for future workup, and to
keep alive the confrontation between theory and practice.

Mark Wilson said...

Worst-case bounds for algorithm performance can often be shown up as pretty flabby by some basic simulation, but that doesn't seem to be done by most authors.

I agree that when theoretical results are corroborated by computation, the source code and results should be made available publicly and referenced in the paper. Perhaps there should be some sort of policy by journals or SIGACT, for example, on this.

Craig said...

"Worst-case bounds can often be shown up as pretty flabby..."

My experience is somewhat different. As a practioner who uses theoretical work, the typical problem is that the simulation is much more optimistic than reality. My favorite rule is from colleagues who do switch design -- "the Devil gets to pick your input patterns" and the challenge, of course, is to figure out what the Devil would pick (usually it is the output of a nasty, hard to simulate or analyze, multi-state process). And it locks you right against the worst-case bound.

Anonymous said...

My cynical opinion is that the arguments in this thread beautifully explain why most theorists don't run any simulations:

(1) You might find theoretical mistakes. Why take that chance? In a field where conferences don't publish retractions, there's no shame in making a mistake in a conference paper. If nobody catches the mistake, then so much the better. If somebody catches a fatal mistake, then at least you got one extra conference paper out of it. (Non-experts will never know, and experts probably won't really hold it against you, since anybody could make a mistake.) If somebody catches a correctable mistake, then you silently fix it in the journal version (if any) while vaguely thanking the person who caught it for "helpful conversations" in the acknowledgments.

Overall, if you don't want people to make mistakes, then you have to penalize them for it. There's enough strategic behavior on the part of students seeking jobs that you can't just count on them to do the right thing. (Of course, most people try to do the right thing, but even a minority can have a big effect on the field.)

(2) Mark Wilson's comment about flabbiness is quite right. If you prove a quadratic bound while simulations suggest the algorithm really runs in n log n time, then your result looks a little less impressive. It's rare for the theoretical analysis to be quite as strong as the simulation results, so including simulations in your paper is practically an advertisement for the weaknesses of your analysis.

alex said...

Regarding (2): this should be
exactly the point of distinguishing
theory and practice and a step
towards living with the fact that
no big-Oh notation is as good
as it seems. Everybody lives with
it outside the theoretician community, why shouldn't the community itself.

Anonymous said...

Isn't (2) really a positive thing? If you can prove a quadratic bound, but simulations show n log n, you can say that we show a worst case n^2 bound, but the algorithm is even better in practice! It is only an advertisement for your algorithm.

Anonymous said...

^^

Haha, exactly! Thats what I thought too when I read it :)

Michael Mitzenmacher said...

I think it's a very interesting open question whether a paper that had results of the form: "We design an algorithm with O(n^2) performance. We do not have a corresponding lower bound. Experimental results suggest the performance is O(n log n) on many cases of interest." would be considered "desirable' at a high-level (FOCS/STOC) theory conference. I'd like to think so -- I certainly think it wouldn't be an issue for many more practically oriented theory conferences. But I wouldn't be surprised to see a review for such a paper that said something like "Remove the experiments. They're a waste of space and add nothing to the results," or even "While the O(n^2) result is nice, the authors themselves seem to suggest that the right answer might be O(n \log n). They should resolve this issue by either proving this bound or providing a lower bound before publication."

The problem is I don't think the community in general appreciates simulation work, and there's not really a standard as to when such work is desirable or acceptable as an adjunct to the theory. In networking, it's more clear -- such simulations are almost by default required (indeed, to the extreme of whether they actually add meaningfully to the results or not).

I'm enjoying the debate and hope for more comments on this.

Anonymous said...

Michael, I fail to see the debate that you are enjoying. There's no theorist is in this "debate", just some confused youngsters who like to bash what they think is "theory."

Your position on this is sufficiently well known in the theory circles, to the point that everyone is bored. It is also not a constructive position. You need to have a lot more standing in theory before your somewhat radical positions will be treated seriously. Say, if someone like Feige was telling us to include more experiments in STOC/FOCS papers, then there would be a debate.

PS: This comment is not meant as a personal attack, though I can see how it lends itself to such an interpretation. I'm just saying that any realistic attempt to change a field begins by convincing those guys that you can be one of the best by their own rules, and only then proposing new rules.

Michael Mitzenmacher said...

Anon: I suppose I believe it's more effective to persuade by argument than by force of personality. While I would agree with you on Feige's stature in the theory community, I doubt a pronouncement by him would be any more (or less) effective than a pronouncement by me about what FOCS/STOC should be like. Convincing the people who serve on PCs what is acceptable is the main way to enact change at the conference level -- and there I expect younger people will be either to persuade, again by argument and (where possible) by example.

Anonymous said...

"I doubt a pronouncement by [Feige] would be any more (or less) effective than a pronouncement by me about what FOCS/STOC should be like."

I'm not a great fan of proof-by-reputation either, but I think you are suffering from a bit of Harvarditis here ;)

Anonymous said...

Michael, I agree with you in general and it certainly makes sense to perform and/or include simulations for certain, specific types of results. However, I would guess that a large majority of theory papers are either not amenable to simulations at all (e.g., lower bounds, crypto, coding theory, complexity, learning theory) or would not benefit from having experimental results included (e.g., algorithms papers where the improvement doesn't show up until impractically large instance size).

In fact, here is an experiment: what fraction of papers in the last few STOC/FOCS proceedings would you say would be helped by simulations? (One can ask the same question for SODA, where I expect the percentage would be larger but still not a majority.)

Michael Mitzenmacher said...

Anon 12: I would certainly agree that some, or perhaps even many, theory papers are not suitable for simulations; but I'm skeptical it's a large majority. Of course, obviously I'm biased as I work in areas where simulations, I think, should be more common.

As for your experiment, I'm not following FOCS/STOC enough to comment on the fraction that could benefit from simulation. But based on your comment I went to the FOCS 2008 home page to take a look, since I feel you've "called me out". The first paper that caught my interest, k-wise independent graphs by Alon and Nussboim, would seem to be improved by simulation. Indeed, in their abstract they claim as a motivation for looking at k-wise independent graphs that handling large G(N,p) graphs becomes infeasible and "cheaper" graphs such as k-wise independence graphs must be used instead. (I'm essentially quoting them; please see their abstract.) It would have been nice to see some evidence to back this statement up somewhere in the paper, by simulation. Simulations might also have been useful to test the accuracy of their statement and check when the asymptotics kick in as well.

Going down further in the list, I saw the Succincter paper by Patrascu, a paper I had looked at before since it deals with compression, an area I like. (Indeed, Mihai, if you're reading this, you should cite my Compressed Bloom Filter paper in the journal version, where I essentially ask for the solution your paper provides...) This is a paper that (to me) screams for implementation results. I've put it on my personal short list of "good implementation projects for undergrads"; I'm curious as to how well the methods work in practice, and the paper, naturally, gives no real clue.

So of the papers I've looked at, which are in areas I'm interested in, I think 2 for 2 could benefit from some simulation work.

To be clear, I'm not saying these are bad papers in any way. Indeed, the point is that they're good papers -- they interest me. I think they could be better papers -- both more interesting, and more useful to more people -- if they had some additional simulation work as well.

Anonymous said...

I'm just saying that any realistic attempt to change a field begins by convincing those guys that you can be one of the best by their own rules, and only then proposing new rules.

This sort of attitude is exactly why so many things in academia that should change don't. Aside from giving a good rationalization for not listening to anybody, it's simply a false observation about change. Worse yet, it's false in both directions: this strategy is neither sufficient nor necessary for effecting change.

Establishing an incredible reputation doesn't help much in changing what a field values (witness Madhu Sudan and his aliens paper). If Feige announced that FOCS/STOC papers should feature more simulations, then a few people might listen to him, but most would ignore him, and graduate students would go around talking about how poor Uri was getting soft in his old age.

In the other direction, systemic change doesn't generally come from brilliant scientists convincing their colleagues through pure reason. Instead, it comes from political machinations. For example, the prevalence of open access publishing in biomedical research didn't come about by personally convincing scientists that it was ethically or scientifically necessary. Rather, it came about because NIH started requiring it for all grant-supported research, after a big fight.

The upshot is that if you want to change the way CS theory is done, it won't be easy to do it by convincing individual theorists that there's a better path (regardless of whether you are right, or for that matter whether you are Uri Feige). It may be better to focus on convincing hiring committees, or tenure committees, or NSF panels, that their standards and goals should shift a little. Wherever the funding and jobs go, the research community will follow.

There's no theorist is in this "debate", just some confused youngsters who like to bash what they think is "theory."


Actually, I'm commenter #5, and I'm a theorist, as well as no longer so young. I freely admit I'm no Uri Feige, though (I think it's safe to say the few Uri Feige-level theorists don't waste their time commenting anonymously on blogs).

Daniel Lemire said...

While I agree with Michael, I'd like to point out that in fields where people do run experiments, you get terrible mistakes also. And they are as baffling (or more baffling) than when no experiment was done.

Someone implements some algorithm and says that some measured constant is 1.222. You implement the same algorithm, and get a constant of 1.444. Where is the catch? If you cannot compare the implementations, it is very difficult to even publish a note saying there is a contradiction. You can try emailing the authors, but good luck getting access to their implementation! (There were studies done on this topic and most authors will refuse to provide you with their implementations!)

So, in my opinion, it is not enough to run experiments, you must also publish your source code. If you made a mistake, make sure everyone can see it. That is real science!

As for implementation being a waste of time... Knuth was not afraid to get his hands dirty and even maintain the code he published.

Finally, the idea that only people at the top of the food chain, the very people who benefit the most from the current system, can or should change the system is wrong in so many ways... that's called cultivating biases. That's also a fallacy because changes almost never occur because an old man decided to reform the system.

Craig Partridge said...

"Convincing the people who serve on PCs what is acceptable is the main way to enact change at the conference level..."

In my experience in ACM and IEEE, this approach sometimes works, but is a hard slog and often runs into institutional resistance.

The better solution is to create your own conference with like-minded fellows. If the paper produces wonderful papers and exciting results, it can change the field. I've seen it happen at least 4 times over the past 15 years in data comm research. (I'll note the hazard here is that the conference almost booms -- is big enough to be financially viable but never quite achieves enough oomph to become one of the top 3 or 4 conferences in the field -- and then it lingers on the calendar for years).