Bloom filters are widely used in modern data systems to accelerate membership tests in databases, storage engines, and networking stacks. On modern CPUs, Bloom filter performance is typically limited by memory latency rather than computation: each query issues k seemingly random probes over a bit array that is often much larger than the last-level cache (LLC). Most prior work on making Bloom filters faster has focused on changing the data structure itself to align with the modern cache hierarchy. We take a complementary approach and ask how far one can go by changing only the query strategy.
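For concreteness, a minimal sketch of the conventional baseline the abstract refers to, assuming double hashing (h1 + i·h2 mod m) to derive the k probe indices; the BloomFilter name, the mixer, and the parameters are illustrative, not the paper's implementation. The point is the access pattern: each of the k probes touches an effectively random word of the bit array, so once m greatly exceeds the LLC, most probes miss in cache.

```cpp
#include <cstdint>
#include <vector>

// Conventional Bloom filter: k probe indices per query, derived here
// via double hashing (h1 + i*h2 mod m).
struct BloomFilter {
    std::vector<uint64_t> bits;  // backing array; m = number of bits
    uint64_t m;
    int k;

    BloomFilter(uint64_t m_bits, int k_probes)
        : bits((m_bits + 63) / 64, 0), m(m_bits), k(k_probes) {}

    static uint64_t mix(uint64_t x) {  // cheap 64-bit finalizer
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        return x ^ (x >> 33);
    }

    void insert(uint64_t key) {
        uint64_t h1 = mix(key), h2 = mix(key ^ 0x9e3779b97f4a7c15ULL) | 1;
        for (int i = 0; i < k; ++i) {
            uint64_t idx = (h1 + i * h2) % m;
            bits[idx >> 6] |= 1ULL << (idx & 63);
        }
    }

    bool contains(uint64_t key) const {
        uint64_t h1 = mix(key), h2 = mix(key ^ 0x9e3779b97f4a7c15ULL) | 1;
        for (int i = 0; i < k; ++i) {
            uint64_t idx = (h1 + i * h2) % m;  // k scattered probes
            if (!(bits[idx >> 6] & (1ULL << (idx & 63)))) return false;
        }
        return true;  // possibly a false positive
    }
};
```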
First, we show that reordering the k probes of each query, either by fully sorting them (cache-oblivious) or by partitioning them around a cache-sized pivot (cache-aware), transforms the aggregate access distribution from uniform to strongly skewed toward low indices, greatly improving cache reuse without modifying the Bloom filter representation. Second, we refine the hash generation process to behave more like a low-discrepancy sequence over the filter, reducing the fraction of queries whose probes all fall outside the LLC-sized region.
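A sketch of what the two reordering strategies might look like on top of the baseline above. One way to read the mechanism: negative queries take the usual early exit at the first unset bit, so with probes issued in ascending index order the probes that actually execute concentrate near the front of the array, which can then stay cache-resident; the partitioned variant instead visits probes inside an LLC-sized prefix first. The names contains_sorted and contains_partitioned, the kMaxK bound, and the llc_bits pivot are assumptions, not the paper's API.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

constexpr int kMaxK = 16;  // assumed upper bound on probes per query

// Cache-oblivious variant: fully sort the k probe indices, then
// probe in ascending order with the usual early exit.
bool contains_sorted(const BloomFilter& f, uint64_t key) {
    std::array<uint64_t, kMaxK> idx;
    uint64_t h1 = BloomFilter::mix(key),
             h2 = BloomFilter::mix(key ^ 0x9e3779b97f4a7c15ULL) | 1;
    for (int i = 0; i < f.k; ++i) idx[i] = (h1 + i * h2) % f.m;
    std::sort(idx.begin(), idx.begin() + f.k);
    for (int i = 0; i < f.k; ++i)
        if (!(f.bits[idx[i] >> 6] & (1ULL << (idx[i] & 63)))) return false;
    return true;
}

// Cache-aware variant: partition the probes around a pivot (here an
// assumed llc_bits, the LLC capacity expressed in filter bits) so the
// likely-cached low region is probed first.
bool contains_partitioned(const BloomFilter& f, uint64_t key,
                          uint64_t llc_bits) {
    std::array<uint64_t, kMaxK> idx;
    uint64_t h1 = BloomFilter::mix(key),
             h2 = BloomFilter::mix(key ^ 0x9e3779b97f4a7c15ULL) | 1;
    for (int i = 0; i < f.k; ++i) idx[i] = (h1 + i * h2) % f.m;
    std::partition(idx.begin(), idx.begin() + f.k,
                   [llc_bits](uint64_t v) { return v < llc_bits; });
    for (int i = 0; i < f.k; ++i)
        if (!(f.bits[idx[i] >> 6] & (1ULL << (idx[i] & 63)))) return false;
    return true;
}
```

For the second idea, one plausible (assumed) realization of hashing as a low-discrepancy sequence is to stratify probe i into the i-th of k equal segments, as in a classic partitioned Bloom filter: probe 0 then always falls in the first m/k bits, so no query has all of its probes beyond an LLC-sized prefix whenever m/k fits in cache, and the indices come out pre-sorted. The paper's actual construction may differ.

```cpp
// Stratified probe layout (assumed realization, not the paper's
// scheme): probe i stays inside segment i of length m/k. insert()
// must use the same layout for queries to remain correct.
inline uint64_t stratified_index(const BloomFilter& f,
                                 uint64_t h1, uint64_t h2, int i) {
    uint64_t seg = f.m / f.k;              // segment length in bits
    return i * seg + (h1 + i * h2) % seg;  // probe i stays in segment i
}
```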
Across a range of filter sizes for negative-query workloads, our designs reduce LLC-load misses by up to 8.3× and speed up end-to-end execution by up to 1.45× relative to a conventional Bloom filter, while remaining competitive with specialized cache-aware structures such as split-block Bloom filters.