Friday, February 27, 2009

Hash Table with Preemptive Bloom Filter

There's a useful hash table/Bloom filter trick I've been seeing across a number of papers. (If you don't know what a Bloom filter is, then I haven't been doing my job...) For the record, I'm not claiming this trick is mine; maybe it originated in the paper Longest Prefix Matching Using Bloom Filters, where it plays a key role. But I also can't recall a paper that clearly highlights the idea, so I'm doing it here.

Suppose you'll be doing lookups of (large) keys in a (slow) hash table, and many of your lookups will be for items that actually are not in the table. For example, you may be storing URLs on a blacklist, or dangerous packet signatures, or prefixes of addresses, or whatever else. Then it makes sense to keep a Bloom filter for the items in your hash table in (fast) memory. Before you do a lookup into the table, you check the Bloom filter. A false positive in the Bloom filter means you do a lookup when you didn't have to, but in exchange you avoid the lookup entirely for a key that's not in your hash table the large majority of the time. The approach works for any type of hash table.
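
Here's a minimal Python sketch of the idea -- the BloomFilter and GuardedTable classes and their parameters are purely my own illustration (a plain dict stands in for the "slow" table), not code from any of the papers:

import hashlib

class BloomFilter:
    """A minimal Bloom filter: k hash functions over an m-bit array."""
    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from the key; any decent hash functions would do.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # No false negatives; false positives with small probability.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

class GuardedTable:
    """A (slow) hash table fronted by a (fast, in-memory) Bloom filter."""
    def __init__(self, m_bits=1 << 20, k_hashes=5):
        self.filter = BloomFilter(m_bits, k_hashes)
        self.slow_table = {}  # stand-in for the real slow table (on disk, off-chip, ...)

    def insert(self, key, value):
        self.filter.add(key)          # keep the filter in sync with the table
        self.slow_table[key] = value  # the slow insert

    def lookup(self, key):
        # Check the fast filter first: a "no" is always right, so most keys
        # not in the table never touch the slow table at all.  A "yes" may be
        # a false positive, costing one unnecessary slow lookup.
        if not self.filter.might_contain(key):
            return None
        return self.slow_table.get(key)

With n keys stored, m bits, and k hash functions, the filter's false positive probability is roughly (1 - e^{-kn/m})^k, so with a modest amount of fast memory the fraction of wasted slow lookups for absent keys can be made quite small.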

This should be considered a standard approach in hash table design, so I suggest we come up with a catchy name for it, so that the idea doesn't have to be continually re-explained in future papers. I'll suggest it be called a "hash table with preemptive Bloom filter", but feel free to suggest a cooler sounding name for it in the comments.

This example highlights the need for coming up with names for nice ideas that generalize to many situations. I've seen this idea described repeatedly in multiple papers precisely because nobody has thought to give it a good name.

Thursday, February 26, 2009

Visiting London

Wednesday I continued my over-the-Atlantic trip at University College London, where I gave for the first time a talk on our network coding work (blog post, arxiv link) that appears as an INFOCOM 2009 paper. The slides are on my talks page. The talk was assisted by my host Brad Karp ("no relation", most famous for his work on Greedy Perimeter Stateless Routing), who asked several pointed and useful questions that will help as this work continues. Brad's part of a growing networking group at UCL that includes Mark Handley.

Whereas Cambridge's charm comes from its character as a large college town and its history, London's a great big city with plenty of charm and history of its own. Both parts of the trip were great fun, in both cases because of my ever-helpful hosts -- thanks Frank and Brad! I'm already scheming to find a good excuse (international research collaborations!) to go back for a longer time.

Wednesday, February 25, 2009

Visiting Cambridge

I spent Monday and Tuesday visiting Cambridge (the old one; the University is having its 800th year celebration this year). Despite my standard aversion to travel, I couldn't possibly refuse; I was invited to give a Kuwait lecture by one of my early mentors, Frank Kelly. I was fortunate enough to take his course on stochastic models the year I spent in Cambridge after Harvard. (It was very interesting on this trip hearing Frank's thoughts about the similarities between computer network traffic problems and road network traffic problems -- besides doing among many other things some of the important early work on economic models related to Explicit Congestion Notification for TCP, Frank served three years as Chief Scientific Adviser to the United Kingdom's Department for Transport.)

Let me gush a bit about Cambridge for a minute. Cambridge is sufficiently small that even though I've rarely been through since that year, it was very familiar as soon as I stepped off the train. Besides indulging in nostalgia, like walking through my "alma mater" Churchill College (the ugliest college of Cambridge!), I did my share of sightseeing, including viewing the new grasshopper clock and a walk along the Backs. I also got to see a bit of Frank's new life as the Master of Christ's College. This included a special look at the recently refurbished undergraduate quarters of Charles Darwin (part of the celebration of Darwin's 200th birthday), as well as a peek through the semi-hidden windows in the Master's quarters into the chapel and college dining room. And for the first time I experienced Cambridge's Centre for Mathematical Sciences, a beautiful and functional setting for mathematics, bringing people from a range of different areas to the same cluster of buildings. (It's located about 3 minutes from Churchill -- why couldn't they have had this when I was here? Back then, Churchill was out in the far outskirts of the Cambridge community; according to Frank, most of the new construction is moving out of the town center, expanding the University so that in a few decades' time, Churchill might actually be in the "center" of Cambridge.) I encourage interested math/CS students (who can get one of the fellowships for Cambridge) to take the Part III Tripos, which provides a solid mathematical background for any future research career.

For my talks, I tried to present surveys for a wide audience, so Monday's Kuwait lecture was on Bloom filters and applications, and Tuesday's talk covered my survey of results for deletion channels. A highlight at the talks was seeing another of my role models, David MacKay. Well known for his popular textbook on information theory, David has finally finished his new book on energy, entitled Sustainable Energy - without the hot air (and gave me a copy to read for the flight home). It's a popular science book giving a scientist's calculations on what the real numbers are for the potential future of sustainable energy. I've previously mentioned the draft on this blog; I'll present a more detailed review of the final work some time in the future, but if you're at all interested in a scientific take on the subject, you should buy it, and you can even check it out first by downloading it via the book's home page.

Friday, February 20, 2009

Other Blog Writings of Note

Just some pointers to some other stuff around the blogosphere:

If you like that kind of stuff, for the last two days (here and here) the amusing FemaleScienceProfessor has been writing about students who feel a bit more entitled than they should. My memorable story is from my first week at Harvard: just before classes started, a student e-mailed to ask me if I could change the class time because it interfered with baseball practice....

I seem to have not mentioned the blog for Blown to Bits, the book co-authored by Harvard's own Harry Lewis, who is writing most of the bits for the blog. If you're interested in computer science and society issues, it's a good place to keep track of amusing things going on. (You can also download the book under a Creative Commons license.)

I'm pleased to see people pointing out the STOC 2009 papers they are interested in seeing, like Luca and Sariel and Lance.

The FOCS 2009 call is up. Can't say I get the "brief description" thing myself -- as mentioned on other blogs, in the old days, we'd call that an introduction. Maybe now PC members who only read the introductions can save themselves some bandwidth and only read the brief descriptions! (Yes, that's a joke. Well, maybe.) But don't get me wrong, I applaud experimentation in our conference processes, and hope it goes well.

Hooray for more science funding in the stimulus package. Check out what the CRA blog and the ACM blog have to say.

Thursday, February 19, 2009

NSDI news

NSDI 2009 is coming up April 22-24 in Boston. Since I was on the PC, I'll plan on stopping by, but the problem with the conference actually being in Boston is that it provides an insufficient excuse for skipping out on all my other duties (most notably teaching class, but also tucking the kids into bed and helping with their homework).

I was reminded about the upcoming NSDI dates because the-powers-that-be just asked me to be on the PC again, for 2010. As we all know by now, I'm a sucker for conferences that use a 5-point rating scale and kick people out of the room for any passing reason -- and I enjoyed the PC for this NSDI -- and they managed to be the first 2010 conference to ask me -- so it looks like I'll be back again...

Looking back over my mail, I asked people to be on the STOC 2009 PC about a year ahead of the conference. They're a good few months ahead of that. Psychologically, 2010 seems just so far away, how could I say no?

Wednesday, February 18, 2009

New Blog : Matt Welsh

My colleague at Harvard, Matt Welsh, has started a new blog, Volatile and Decentralized. It will be focused on systems -- primarily sensor networks -- but will also have all of his other interesting insights on teaching, students, being an associate professor, Harvard, etc. Go check out the first few posts!

I've often wondered why more systems people don't blog. I think I asked someone once (Jennifer Rexford?) and was told, "We're too busy." I'm glad Matt's not too busy to share his work and insights with all of us.

Monday, February 16, 2009

Some Notes on Collaboration

[For part of tomorrow's CS 222 class, we'll be talking about collaboration, inspired by Gowers's Polymath project. It fits in with one of the themes in the class, which is getting young grad students or senior undergrads into research mode. While I swear I've written something like this up before, I can't find it, so I'm writing it down now and it will be part of my class "lecture" for CS 222 tomorrow. Think of this as a draft of lecture notes, which -- inspired in part by Luca's online blog notes -- I'm putting online. And following the theme of collaboration, feel free to add comments, suggestions, or advice to students that you think is useful.]

In this class, you'll be doing a final research project, most of you with one or more partners. I'd like to spend a little time talking about collaboration, some tricks for doing it, and how important it can be for you as a graduate student.

I think most undergraduates -- especially those who aren't scientists -- have the wrong impression that science, and especially computer science, isn't very collaborative. The common image is of a professor hiding away in his office, or a coder alone with the terminal. In contrast, those of you who have been in the terminal room late the night before a programming assignment is due probably know how collaborative, and social, coding can be. And the large majority of my papers, and most papers written in computer science, have more than one author.

In fact, as a graduate student, collaborating successfully is likely to be key to your success. Synergy is real. Research is inherently non-linear; it's not how much time you spend working on a problem, it's coming up with the right idea. And working with others, for many people, simply leads to better ideas more quickly. A key insight could require knowledge that no individual has alone; a different perspective can move someone forward when they're stuck; an idea one might be quick to discard as being the wrong path might prove fruitful in another's eyes. And beyond that, collaborating is often fun, and having fun while working on a problem can make people more productive on its own. So there are reasons House has his staff, Buffy has her Scooby gang, and even Holmes hangs out with Watson.

Some quick thoughts about collaborating. First, as a beginning graduate student, you are probably very concerned about who gets the credit. My advice is to put this aside. When you go into a project, you don't know who will hit on the right path -- and it's very possible that the thought you had that broke open the problem might not have happened if you hadn't been working with those other people. Think of the long term -- you'll be known for your body of work over your lifetime; people will see what you can do over a number of projects. You don't need to focus on specific credit for this specific project. And you want to foster an environment where you can collaborate with others easily and naturally. That becomes harder when you're always worried about who will get the credit. For more on this, see also Hardy and Littlewood's Four Axioms for Collaboration, or the end of Fan Chung's notes for graduate students.

Another thought about collaborating -- although, actually, you can use this idea even when you're working on a project by yourself! I've generally found that when working with others in research, people implicitly tend to take on natural pairs of contrasting roles. In fact, I think people taking on these different roles can greatly enhance the process -- so it may be worthwhile for people to explicitly take on these roles (and switch off from time to time)! The sort of thing I'm thinking about includes:

Optimist/Pessimist: One person can be trying to think 3 steps ahead, making intuitive leaps forward, assuming other details will work out or as-yet-unproven lemmas will be proven. And another person can try to be the skeptic, making sure that the assumptions being made to move things forward aren't completely out of line, that details eventually get filled in, and that the proofs don't break.

Writer/Editor: When one person writes, the other should read like a hypercritical reviewer.

Implementer/Debugger: It can be time-consuming watching over someone's shoulder as they code, trying to catch mistakes on the fly, but it can be an effective way for two people to code together.

There are other natural pairs of roles people can take on doing research, and I think it's useful to be aware of them so you and your collaborators can work together more smoothly.

Now we'll talk a bit about the Polymath project, which represents an extreme experiment in collaboration -- research via blog. Is this the future of research, as new communication tools make group research on this large a scale possible? Does this paradigm seem helpful or harmful to the research process? What is the right size for a collaborative group, and why? Let's discuss...

Saturday, February 14, 2009

Assignment Re-Use

For discussion: How much do you re-use assignments from year to year?

I think in computer science, assignment re-use is much more common than in other areas. In part that's because other sciences more naturally have "parametrized" problems -- you change the numbers, you get at least a nominally different problem. CS assignments don't always naturally fit that mold. Programming assignments can require a long time to design, and don't naturally parametrize. (Indeed, part of the goal in writing a program is that you can give the program different inputs!) With theoretical assignments, you can try to change the setting (the Kleinberg/Tardos book does a nice job trying to give different settings -- story problems -- in various exercises for specific techniques), but if you're asking for a proof, it can be hard to modify substantially. Good problems often get re-used.
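
As a toy illustration of what "parametrizing" a CS assignment could look like (my own made-up example, not an assignment I actually give), here's a Python sketch that derives a per-student problem instance -- say, a random weighted graph for a shortest-path exercise -- deterministically from the student's ID:

import random

def student_instance(student_id, n=12, p=0.3, seed_base="hw1-shortest-paths"):
    """Generate a per-student random weighted graph, keyed to the student ID,
    so each student gets a nominally different (but reproducible) instance."""
    rng = random.Random(f"{seed_base}:{student_id}")
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                edges.append((u, v, rng.randint(1, 20)))  # (endpoint, endpoint, weight)
    return edges

# Each student ID yields its own instance:
print(student_instance("alice")[:5])
print(student_instance("bob")[:5])

Of course, this only perturbs the numbers; as noted above, the harder part of an assignment -- a genuinely different proof or program -- doesn't parametrize so easily.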

A problem with assignment re-use that I'm unfortunately quite familiar with as an ex-member (and currently "on-call consultant") for Harvard's administrative board is that solutions start to circulate and copying can occur. When one student copies another within a class, it can already be difficult. (The question of whether the information was "given" or "taken" can be important in deciding the appropriate response.) What about when one student copies work from a student who has already finished the class? What responsibility, if any, does your institution expect people who have finished a class to take for not sharing their work inappropriately?

DCC 2009

The list of papers for the 2009 Data Compression Conference is available.

I haven't been to the DCC for some years. I've had some papers there, and served on the PC a few times. I remember it as a fun conference, with people from a mix of areas fairly loosely connected by the general theme of compression. It was also well organized, and pleasantly located at Snowbird, Utah, every year, so you could walk from the conference room in the hotel right out to the slopes.

I know they're very interested in papers from the "algorithmic" side of the world. With any luck, some of the projects from my graduate class this year will be on compression, and I may soon find myself with a good excuse to go again.

Thursday, February 12, 2009

Enrollments 2009, Update

Sometimes, enrollments drop the first week, as students see the first assignment, or the first few lectures convince them to switch out. (And some students start out taking an extra course and then drop one as they figure out what they like.)

This year, my numbers are holding remarkably steady.

CS 124: 84
CS 222: 24

Both classes have "officially" gained a student since the last post. In fact, more students will likely come in; I know there's paperwork in the pipeline for a few more students for CS 124; and one student in CS 222 brought in a couple of friends from MIT who may cross-register into CS 222. I suppose it will be at least another week before I have the final numbers.

Of course, assignment #1 for CS 124 is due today. That might have the effect of getting a few students to drop...

In any case, the huge jump from last year for CS 124 (again, described in the last post) has held up. I'd like to understand how much of that is due to cyclical increases in CS enrollments, and how much of it is due to our new introductory class bringing people into CS. Please comment on whether or not you're noticing increases in enrollments, and their size, in your classes (or department more generally).

Wednesday, February 11, 2009

Polymath Project

Reiterating some other blogs, I point to Tim Gowers's project, asking the question "Is massively collaborative mathematics possible?" Like others, I intend to show it to students, and discuss the ways in which it is like -- and unlike -- less massively collaborative research.

A complaint I often hear among faculty is that students in Computer Science (and, I would imagine, mathematics) don't realize how social research in CS (and mathematics) usually is. The stereotype of sitting alone in your cubicle working on code all day is out of touch. CS papers generally have many authors; it actually seems research in the liberal arts is less interactive, with people toiling, solo, on their own book projects. If the project does just a little to correct that impression, it will be a success in my mind.

Tuesday, February 10, 2009

Another New Blog

Another new blog, this time by Luis von Ahn. I'm deeply pained by the fact that besides being brilliant, he's apparently very funny. I laughed out loud from this post on Real Men of Genius, and have often wondered (out loud, even) why I don't get tips.

An Article on Genius

While I don't usually discuss things I've read, this article on genius from the New Yorker a few months back actually seems relevant to our community. Academic genius, in the stereotypes I know, burns brightly young. Can we foster the other type of genius discussed in this article within the competitive academic structure?

Comments from Jennifer Rexford

Just in case people might miss her comments, I asked Jennifer Rexford for her input on conflict of interest issues, and she added some of her thoughts to a previous post (end of comments of Part III).

Also, she pointed me to related discussions at the SIGCOMM blog.

http://blog.sigcomm.org/2008/09/fairness_of_sigcomm_conference.html
http://blog.sigcomm.org/2008/09/openness_of_the_sigcomm_confer.html

I think it's worth pointing out -- the SIGCOMM community has a blog. And on it they discuss, as a community, issues like how conferences should be organized. I admit, it all seems nascent still, but interesting nonetheless.

Monday, February 09, 2009

STOC PC Wrapup

I think -- hope! -- this will be my final post on STOC for a while.

As my posts have made clear, as STOC chair I chose to experiment with a few things. Indeed, even posting about the PC process as much as I have on this blog has been something of an experiment. So now's a time to look back and reflect on lessons learned.

To summarize some of my previous posts, here are some recommendations for the chairs of the future (or the steering committee):
1) Move to co-chairs instead of a single chair.
2) Increase the PC size -- a small difference is worth a lot in coverage and reducing papers/PC member.
3) Use a 1-5 scale; I found it helpful in clarifying the easy accepts/rejects and the papers with great differences in opinion, and it makes it easy to check that PC members are being consistent.
4) Decide your conflict of interest policy early, make it clear to the committee, and prepare for its implications.
5) Keep the meeting running smoothly; insist that if there's agreement on a paper that discussion is limited.
6) Automate as much as possible; find software you like working with.

I also have observations for authors and other members of the community.
1) When submitting a short abstract (like those that are being distributed with the list of accepted papers), just submit plain text. Your text is just being put in a database and moved around at various points. Its main use is for someone to skim it to see if they want to review the paper, or later, to attend your talk. Any special characters are likely to mess up some system at some point; LaTeX should be kept to a minimum.
2) Start earlier and have someone proofread your work. Errors in a proof -- even trivial errors -- are a sure way to reduce confidence in the acceptability of your paper. Other errors or typos create a bad impression. And a poorly written intro does hurt -- if we can't understand your contribution clearly, it's hard to accept the paper.
3) Most importantly to me, please don't ask annoying general questions like "When will accept/reject decisions come out", "When will we get the reviews?", and "When will the list of accepted papers go up?" Anyone who asked me one of those questions -- even anonymously on the blog -- please know that I figure you owe me a drink, and I expect you will pay up at the conference or your first opportunity. The answer to all these questions is "As soon as possible." In some cases there are administrative things going on you may not be aware of (reviews being finalized, authors changing titles or abstracts, etc.). In some cases the delay is simply that I have a life and a job that are taking priority. In any case, I can explain to you further why it's rude to ask such questions, but if you buy me the drink, then I'll probably happily spare you the explanation.
4) If you do have a special question or request -- like you need to schedule your talk on a specific day -- a little politeness, and patience, goes a long way. Remember, there are a few hundred of you, and just one of me... (And seriously, if you need to schedule your talk on a specific day because of a conflict, let me know NOW!)

Sunday, February 08, 2009

STOC PC Meeting : Part V (Software)

I used Shai Halevi's conference software for STOC. I must admit, I was swayed by Shai's willingness to host the software and help out in a pinch -- I didn't want to deal with Harvard's IT staff being off-hours should a disaster happen. And while no disasters happened, Shai was a great help whenever anything did come up. I (and the rest of the community) owe him thanks for volunteering his time specifically for STOC, as well as more generally for creating the software. I really can't imagine running a PC meeting without such software.

Shai's software seems very similar in spirit to HotCRP, which I've been using recently as a PC member for some other conferences. I like them both. I'd have to say HotCRP seems a bit more polished, like it has gone through some more iterations, and has some more features than Shai's software. On the other hand, if you like control, you can apparently readily modify Shai's code (and the resulting conference database) directly to get what you want. I prefer them both to my experiences with EasyChair. If you're going to chair a conference, you should definitely check out Shai's page.

It might be worthwhile for SIGACT (or some other appropriate group) to look into having a common hosting platform and support for this sort of stuff. (Someone else I knew was using HotCRP for her conference, and when I asked how she set it up, she said that USENIX, the sponsor organization, hosted the conference software for them...) A standard default configuration maintained centrally would avoid having to "re-invent the wheel" each time someone takes over the conference. Or perhaps in the end we'll just leave it to EasyChair, which offers a perhaps less appealing standard default, but at a good price (free, currently).

Friday, February 06, 2009

STOC PC Meeting : Part IV: (Conflicts of Interest II)

Back to STOC stuff, continuing my past post on conflicts. While the theory of how to handle conflicts is one thing, the practice is another. How did things work in practice?

For those who say it's mildly annoying, I admit I agree. People having to leave the room is obviously less appealing than people not having to leave the room -- I'm not going to argue that. But really, I think the annoyance is minor, at worst. In most cases, zero to two people had to leave the room. People seemed to pay attention and get up and leave when they had to. (Arguably, people pay more attention to what's coming up when there are conflicts, which is actually a plus.) We could have been better about remembering to call people back in promptly. With practice, I don't think that would be a big deal either.

I didn't feel that we lost out on needed expertise because of the policy, but again, the implementation was more flexible than in networking conferences I've been involved in. One PC member insisted on leaving the room for a discussion and vote because of a conflict, but we insisted they stay to answer our questions first!

Overall, I did not find it overly disruptive. And I think such a policy greatly reduces or even avoids worst-case scenarios where a PC member -- intentionally or unintentionally -- biases the outcome of a paper where they have a conflict. I've been on many PCs over the years, and I've seen it happen more than once. No, I'm not saying such happenings are rampant -- I think they're rare -- but I think they could be rarer still. To me, the annoyance involved is a small price to pay for that. Also, I think even if you disregard the possibility of someone intentionally pushing a paper where they have a conflict (which, actually, does happen), on the whole people underestimate the power of unconscious and unintentional bias.

Paul Beame (in comments to the previous post) says that people should be allowed to stay in the room, but remain silent and declare conflicts as they arise; and similarly, that software shouldn't block people from seeing reviews/discussions on conflict papers. I think he offers a consistent alternative, but I (as he knows :) ) disagree. His approach seems designed to avoid any possible manipulation of decisions by PC members that may have a conflict, which of course is all to the good. In practice, in my experience, that silence rule is not generally maintained. (For a variety of reasons, many of them laced with good intentions. It's hard for experts to stay quiet, particularly when a colleague/student/etc. is involved.) He ignores the issue that simply having the person in the room may stifle some discussion in the PC meeting. (Junior people can be and are often intimidated, to various degrees, by senior people in such a setting. We can argue about whether they should be, but in practice, they can be.) And finally (and most importantly), he's dismissing the issue of information in reviews or discussions getting back to the authors. (Yes, discussions are confidential -- as are the parts of the reviews labelled "to the committee" instead of "to the author" -- but as we've read in the previous comments, there can be a fair amount of leaking of information after the fact.) My thought is that when you say you have a conflict, that precisely means you SHOULD NOT see the reviews or hear the discussion for the paper -- the networking standard -- unless there's an important overriding reason for you to do so. That prevents you from intentionally or unintentionally leaking supposedly confidential information.

And to be clear, I think the "we need the expertise" argument is overhyped. In cases where it's needed, I would agree exceptions need to be made. In most cases, it's not needed (there are other people in the room capable of making the judgment), but people don't want to be left out of the decision process where they are also an expert. There's a difference.

I have no regrets about implementing things this way. Others disagree. Overall, I think it's a topic worthy of more community discussion. In particular, the use of conflicted subreviewers that also came up in comments on the previous post really deserves some attention. I think a more consistent and thought-through standard should apply. Our community may have its own standard, but as a community I think more discussion, including acknowledgment and understanding of the pros and cons of the various possible approaches, would be useful.

STOC Links

Here they are.

Titles and authors: http://www.umiacs.umd.edu/conferences/stoc2009/titles.shtml
Abstracts also (html): http://www.umiacs.umd.edu/conferences/stoc2009/abstracts.shtml
Abstracts (raw text): http://www.umiacs.umd.edu/conferences/stoc2009/abstracts-raw.txt

Thursday, February 05, 2009

Enrollments, 2009

CS-124, Algorithms and Data Structures

2008 : 36 students
2009: 83 students

That 83 will probably deviate by a few after the last add/drops shake out, but it's a somewhat surprising doubling+ of the class size. That's what we get for introducing a new, dynamic instructor for the intro CS course last year, who seems to have boosted CS enrollments across the board (taking effect this year). I was expecting 60-65.

By the way, 2008 was the lowest enrollment I've had, by far, although the previous few years had been in the 40's. My biggest year, back in the boom, was 90.

I'm double-teaching this semester, and my graduate "Algorithms at the End of the Wire" class has 23, which is pretty standard. (I think in its biggest year it had more than 40... and given this enrollment boom, that's what I'll expect NEXT time I offer it.) It might lose one or two but should otherwise hold steady. I admit I'm pleased to look around the room and see a lot of the top undergraduates I know taking the course (which was one of the reasons I arranged to double-teach this semester; I wanted them in the class before they all go and graduate...)

Wednesday, February 04, 2009

Rule Stupidity

Note: Update 1 Below

I thought I'd give my readers a STOC-break and discuss beginning-of-the-semester frustration with a stupid Harvard rule.

Setup: Math 123 (2nd semester algebra) and CS 124 (my algorithms/data structures course) are at the same time this year. Both are important prereq courses for many other courses. There are a small number of joint Math/CS majors, who, naturally, really want to take both. (They're both only offered spring semester.)

Now, since my class is recorded and put online for the Harvard Extension School -- usually now within 24 hours -- and Harvard students are given access to all these videos, this wouldn't seem to be a problem. The students take both classes and just watch the videos for mine. Is it ideal? Of course not. But it's better than making them wait a year to take one of these important classes. I do have Teaching Assistants and sections and office hours and TA office hours and all class notes online and such so they do have resources besides the recording. There are other details -- arranging final exams, for instance -- but I (and the math professor) have expressed our willingness to deal with that.

The administration says no.

Apparently, the rules on the books for "simultaneous enrollments" (taking two classes at the same time) won't allow for recorded lectures, and despite my plea for an exception in this clearly suitable case, the rule is, apparently, the rule.

Except (as I've also informed the rule-enforcing-body) it's really not. Because in this case the students will simply sign up for an independent study with me, CS91r, and have the content of their independent study be the class CS 124. This is less than ideal for everybody -- the students don't get proper credit for the course they're taking, I don't get the TA-resources associated with the students, and there's more paperwork for everyone. But it seems clear to me (and, apparently, the students, who asked me to do this) that this is in the students' best interest, so that's what I'll do.

But still, the rule-enforcing body says, the rule's the rule, and they'll enforce it.

I'm sending them all a link to this post so they can defend themselves if they choose. Feel free to comment with your own stories of stupid rule enforcement at your university...

Update 1:

I'm happy to say that I've had further conversations with the powers-that-be, and I think we've come up with a mutually acceptable interpretation of the rules that will allow the students to register for the class. I have to thank the powers-that-be for working with me, and on reflection, it undoubtedly wasn't because I got angry about it, but simply because I was consistent in asking if there was some kind of solution. It's still got to pass through a vote of a committee, but I'm optimistic, and I certainly feel they listened to my concerns -- which doesn't always happen with a bureaucracy!

Tuesday, February 03, 2009

STOC PC Meeting : Part III: (Conflicts of Interest)

I introduced another change for the PC meeting for STOC this year. My experience is that theory conferences are somewhat "loose" about conflicts of interest. What I mean is that while PC members can't submit papers, and it's expected that you don't review papers of current (or recent) students/advisors, other than that conflicts of interest aren't usually treated as a big deal. As a contrast, it's standard for the networking PCs I've been on to be at least as rigorous as an NSF panel; for example, if a paper has ANYONE at your institution as an author, you have no electronic access to the reviews and you're expected to leave the room during the discussion (and similarly for other "standard" conflicts of interest, including any collaborators from say the last two years). In my experience, on theory PCs, people don't leave the room just because they have a conflict. It might be expected they'd not comment, although it's not always clear to me this expectation is followed.

Indeed, I remember my first SIGCOMM PC -- the first hour or two I kept wondering why people kept coming and going in and out of the room every time a new paper was discussed. It seemed like a lot of people were off to get coffee when they weren't a reviewer on the paper. Someone explained to me that these were the people with conflicts, and I was shocked at what was considered a conflict. (I figure anyone from UCSD or Microsoft currently has to leave the room for at least 1/4 of the papers on any networking PC.)

Interestingly, when I mention this difference in "style" to people, networking people seem shocked (and maybe a little horrified) by how theory PCs handle conflicts, and theory people seem shocked by how networking PCs handle conflicts.

[Aside: it's a bit interesting to think further about this in light of the comments on my previous post on sending scores, where many people note that many authors can get "inside information" on their paper from PC members after the fact, even though the meeting is supposed to be "confidential". This post won't cover that further -- it was actually written before the other post!]

It's a bit hard to imagine utilizing such a strict conflict of interest policy for a theory PC. The community's a bit smaller, we're highly collaborative, and the papers can be so specialized that if you stick hard and fast to conflict rules you might not have enough PC members who are really suitable to judge a specific paper. Another negative is that it does cost time -- there's a switching cost when people enter and leave the room.

But I asked the PC for a straw vote, and a large majority of those with an opinion thought it was a good idea to have people with conflicts of interest leave the room, primarily for the obvious reason that it's a lot easier to have an open discussion when you're not worried what some people might hear. (Not surprisingly, younger PC members on average seemed to think this was a bigger concern.) I should be clear, however, that this idea was quite controversial; many PC members expressed a very clear and vocal dislike for a policy that had people leaving the room regularly. Indeed, it was definitely the most controversial decision I made as the PC chair.

In order to be practical, I made it clear that I understood there would have to be exceptions, as needed, to deal with special cases, since I didn't really plan for this policy from the beginning (including when making paper assignments, although I did ask PC members to mark conflicts with the software when ranking papers they wanted -- this did not seem to be treated uniformly and universally by the PC, again because it's probably not standard in theory conferences). But the policy I planned to implement was essentially the following: if you had any standard conflict on a paper, and there wasn't a very good overriding reason for you to be in the room (e.g., you were an original reviewer, or you had a specific expertise the committee needed to evaluate the paper), you should leave.

In the next post, I'll talk about how that worked.

But before that, I think this is a matter the community (or, to the point, conference steering committees) should address. My personal take over many, many PCs is that while the strict conflict interpretation used in networking conferences may not be completely workable for a STOC/FOCS type conference, if I had to choose, it's clearly better than the "We'll assume everybody will behave properly" approach taken at most theory conferences. (Many theory conferences take a strict line on PC members not submitting, but then do essentially nothing about other possible conflicts -- that just doesn't seem right.) I think something in between these two extremes is possible, where exceptions based on needed expertise can be made, and that's what I tried to implement, but it could use some more careful thinking through. What do you all think?

Monday, February 02, 2009

STOC PC Meeting : Interlude

First, I'd like to remind everybody that the day before STOC this year, instead of tutorials or other activities, we'll be having a special event for Les Valiant's 60th birthday on Saturday, May 30th. There will be talks by a number of people to commemorate his tremendous body of work. The organizers are currently trying to finalize arrangements so that the event is free to everyone, but if needed there might be a nominal registration fee to pay for coffee and such. Confirmed speakers include:

Jin-Yi Cai
Steve Cook
Vitaly Feldman
Mark Jerrum
Mike Kearns
Michael Rabin
Rocco Servedio
Paul Valiant
Vijay Vazirani
Avi Wigderson

We hope you will plan on attending.

Second, I sent out the accept/reject notices. Comments coming later this week.

Sunday, February 01, 2009

STOC PC Meeting : Part II

For this PC, I asked reviewers to use a 5 point scale, corresponding to
1: Bottom 1/2 of submissions.
2: Top 1/2 but not top 1/3 of submissions.
3: Top 1/3 but not top 1/5 of submissions.
4: Top 1/5 but not top 1/10 of submissions.
5: Top 1/10 of submissions.
I'd like to reflect on how that experiment worked.

Overall, I think it worked well. One plus is that I think the scale makes it very easy to find the bottom half of the papers (easy rejects) and top 10-15% of the papers (easy accepts), so that less time needs to be spent discussing those papers.

On the other hand, on day 2, we were left with a bunch of papers with scores averaging about 3. This makes sense -- since we accepted about 25% of the papers, papers with about a 3 average were, by definition, borderline. In short, a grade of 3 could mean, "I like the paper but it's a reject" or "I like the paper and it's an accept."

One solution might be to tweak those percentages (an experiment worth trying) to better match the acceptance boundary. But, at the end of the day, I think the fact is that borderline papers are hard -- that's why we still have face-to-face PC meetings. No matter what voting system you use, these papers are the hardest to deal with. When you get down to these papers, the real question is, "Do you want to accept the paper or not?" I think a mechanism in the review software to allow a second round of voting -- corresponding to the question, "Conditioned on this being one of the X papers left we have to decide on, do you think we should accept or reject?" -- would be useful and would have saved us some time. In practice, we just did that verbally (approximately) in the meeting (as part of the discussion).

I think there are other advantages of this 5 point scale. When a PC member isn't following the scale -- say assigning much less than 1/2 of their papers scores of 1, or much more than 20% scores of 4 and 5 -- it's essentially immediately apparent to everyone. That's more transparent than the 10 point scale. (One can always use software that "re-calibrates" individuals' scores to some sort of baseline -- that also works, but I think is much less transparent.)
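
As an aside, this kind of consistency check is easy to automate. Here's a small Python sketch of the idea (my own illustration, not a feature of any of the conference software discussed in these posts) that flags the score bands where a reviewer's histogram strays far from the fractions the scale prescribes:

from collections import Counter

# Fraction of a reviewer's papers that should land in each band if the
# reviewer follows the scale: 1 is the bottom 1/2, 2 is the next 1/6,
# 3 the next 2/15, 4 the next 1/10, and 5 the top 1/10.
TARGET = {1: 1/2, 2: 1/2 - 1/3, 3: 1/3 - 1/5, 4: 1/5 - 1/10, 5: 1/10}

def flag_miscalibrated(scores, slack=0.15):
    """Return the bands where the observed fraction of this reviewer's
    scores deviates from the scale's target fraction by more than slack."""
    counts = Counter(scores)
    n = len(scores)
    return {band: (counts.get(band, 0) / n, target)
            for band, target in TARGET.items()
            if abs(counts.get(band, 0) / n - target) > slack}

# A reviewer handing out too few 1s and too many 4s gets flagged on both bands.
print(flag_miscalibrated([5, 5, 4, 4, 4, 3, 2, 1, 1, 1]))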

To me it's just clear the 5-point scale approach must be better. At the end of the day, we have to make a binary decision on each paper. This scale gets us most of the way there, while giving enough room to distinguish the best papers and papers that need more discussion. I would use it again as a chair, and I prefer it as a PC member as well.

There's one downside to this scale -- which I'd appreciate comments on. Do we send the scores with the reviews, or not? It can be disheartening to get back scores of 1. On the other hand, it's always annoying when your paper is rejected, and I think scores provide useful feedback to the authors. (If you got all 1's and 2's, perhaps you should consider a conference other than FOCS for re-submission.) Several PC members said we shouldn't send the scores with the comments. I think we should -- of course, I'm used to getting scores of this form back from networking conferences. What do you think?