Wednesday, March 11, 2009

New Cuckoo Hashing Paper

Moni Naor nicely pointed me to a new preprint he has up on de-amortizing cuckoo hashing. (The paper is entitled De-amortized Cuckoo Hashing: Provable Worst-Case Performance and Experimental Results, by Arbitman, Naor, and Segev.)

The "starting point" of the paper was the technique for de-amortizing cuckoo hashing Adam Kirsch and I suggested in our 2007 Allerton paper, which involved using a content addressable memory as a queue for move operations when doing cuckoo hashing. (See the blog post I wrote way back when here.) Our work was experimental, as we didn't analyze the queueing process, which is based on the number of moves needed when doing an insertion.

Their new paper obtains provable results, showing that there a cuckoo hashing scheme that has constant worst-case insertion time (with high probability over any polynomial number of steps), by utilizing a logarithmic-sized spare space (to queue pending moves, and stash items that can't be placed in the table, a la the Kirsch/Mitzenmacher/Wieder stash technique described here). They even show that only poly-logarithmic independent hash functions suffice, using a new result from Braverman. The only big caveat is that they analyze only the case of 2-choice cuckoo hashing, where the structure of the underlying random graph related to the cuckoo hashing process is better understood. Because of this, the result only allows loads of up to 50%.

It's a very, very nice result with a lot of theoretical pieces to put together. The use of the Braverman result appears to be a general technique for lessening the randomness requirements of these sorts of problems that is interesting in its own right, never mind the useful advance in the theory of cuckoo hashing. I was also excited to see that they used simulations to enhance their theoretical results (for example, providing a conjecture on how small a constant number of moves would be required at each step -- apparently 3 suffices, a quite reasonable number!) and show off how practical the scheme is. Imagine, a theoretical paper with simulations! (Hooray!)

Overall I think the paper is a strong step forward in demonstrating the practical applicability of cuckoo hashing, while at the same time providing strong formal theoretical guarantees. I still believe cuckoo hashing (along with multiple-choice hashing) will become a (the?) standard approach for hashing in the years to come, so of course I'm excited to see results like this. There remain plenty of further open questions -- more than 2 choices? tight bounds on queue sizes? -- so don't worry, there are still interesting things left to do.


TheAlchemist said...

I can't wait to actually download some code! =)

Pat Morin said...
This comment has been removed by the author.