Will.Whim

A weblog by Will Fitzgerald

Producing n hash functions by hashing only once

To implement a bloom filter, you need a bunch of hash functions. In naive implementations (and I’ve seen plenty), programmers pick out, say, five cryptographic hash functions. One problem with this is that the hash functions for bloom filters have different requirements than hash functions for cryptography–the latter tend to be more than is required for the former. What you want for bloom filters is something that’s very, very fast, while maintaining that basic desiderata for bloom filter hashing, uniform spread.

There’s a good paper that reminds us that you can easily simulate n hash functions by having just two hash functions around. This can be as simple as this function to create the ith hash of a key, given the results a and b of hashing a key with these two functions:

hash(i) = (a + b * i ) % m

where m is the maximum value of the hash (for example, the number of buckets in a bloom filter).

But here’s a good trick not really worth a paper–but it’s still a good trick. Typically, it’s totally reasonable to limit the size to under the maximum size of an unsigned 32-bit number. These days, at least, it’s probably cheaper to calculate a base hash function on unsigned 64-bit numbers. So, you can take the upper half and the lower half of the 64-bit hashed value and return them as two 32 bit numbers.

Voila! Double hashing with one hash function. And using FNV means you have a very cheap hash function to start with, so really this can be very, very fast.

I implemented this in my bloom filter code for go.

For @mojombo.

Advertisements

5 responses to “Producing n hash functions by hashing only once

  1. Jason Davies September 4, 2011 at 2:26 pm

    Awesome tip! Is it possible to generate two suitable hashes for this using 32-bit FNV e.g. simply by taking the next iteration (XORed with zero) for the second hash? Or do they need to be more independent than that?

    I ask because I’m writing a version in JavaScript, which unfortunately doesn’t have 64-bit integer precision.

  2. Will Fitzgerald September 4, 2011 at 5:08 pm

    Jason, I think if you take the last iteration on the FNV function (save it as ‘a’) and simply do another iteration with some seed, you’ll get a second hash number (‘b’) almost for free. FNV is a very cheap function.

  3. Jason Davies September 4, 2011 at 5:24 pm

    Hi Will,

    Great, thanks, that’s what I was hoping! Do I need to worry about the seed, or is a byte value of 0 sufficient? Essentially I can skip another XOR step in this case as of course XORing with 0 is an identity transformation. Alternatively, I could use the lowest 8 bits of “a”, or something else…

  4. Jason Davies September 4, 2011 at 5:34 pm

    Here is my JavaScript bloom filter implementation, by the way: https://github.com/jasondavies/bloomfilter.js

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: