An Extensive Benchmark of C and C++ Hash Tables

Jackson Allan · Published 29 May 2024 · Updated 2 June 2024

Introduction: Why Another Hash-Table Benchmark?

Although thorough hash-table benchmarks have already been published by others in recent years, I have decided to contribute another benchmarking suite for several reasons. Firstly, the existing publications concentrate on C++ tables, whereas there is relatively little coverage of C tables. Consequently, C programmers tend to be directed towards older, well-known libraries despite the existence of newer libraries that may offer superior performance. Secondly, the results reported by the existing benchmarks are mostly cumulative, so they do not clearly show how the tables tested perform under specific conditions such as the table’s current load factor. Moreover, some existing benchmarks test only one key type or bucket size, which makes it hard to draw general conclusions about a given table’s performance. The final motivation is a personal interest: I recently released a novel generic hash table in C (via Verstable and CC) and would like to showcase how it measures up against both the leading C++ tables and other tables available in the C landscape.

Click here to skip to the results.

Benchmarking Setup

The benchmarks include three hash-table configurations designed to measure the tables’ performance under diverse conditions:

- 32-bit integer key, 32-bit value
- 64-bit integer key, 448-bit value
- 16-char c-string key, 64-bit value

All the tables benchmarked are configured to use the same hash functions, namely the MurmurHash3 finalizer for integer keys and FNV-1a for string keys.
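For reference, these two functions look roughly as follows (a sketch of the standard 64-bit variants; the exact widths and wrappers wired into each table may differ):

```cpp
#include <stdint.h>

// MurmurHash3 64-bit finalizer ("fmix64"): mixes the bits of an integer key
// so that entropy spreads across the whole hash code.
static inline uint64_t murmur3_fmix64( uint64_t key )
{
  key ^= key >> 33;
  key *= 0xFF51AFD7ED558CCDULL;
  key ^= key >> 33;
  key *= 0xC4CEB9FE1A85EC53ULL;
  key ^= key >> 33;
  return key;
}

// FNV-1a (64-bit): simple byte-at-a-time hash for null-terminated strings.
static inline uint64_t fnv1a_64( const char *key )
{
  uint64_t hash = 0xCBF29CE484222325ULL; // FNV offset basis.
  while( *key )
  {
    hash ^= (unsigned char)*key++;
    hash *= 0x100000001B3ULL; // FNV prime.
  }
  return hash;
}
```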

The maximum load factor for all tables is set to 0.875. Because all the tables except std::unordered_map under GCC grow by doubling the bucket count once the maximum load factor is reached, the results therefore show the performance of each table across the load-factor range of approximately 0.44 (the troughs in the graphs) to 0.875 (the peaks). I chose this limit because it is relatively high and because several tables with hard-coded, non-customizable maximum load factors (namely absl::flat_hash_map and boost::unordered_flat_map) use it.
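To make the trough-and-peak range concrete, here is a minimal simulation (my own sketch, not the benchmark code) of power-of-two growth with a 0.875 maximum load factor, recording the load factor just before and just after each rehash:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Simulate inserting keys one at a time into a table that doubles its bucket
// count whenever the next insertion would exceed a 0.875 maximum load factor.
// Returns (load factor before rehash, load factor after rehash) pairs.
std::vector<std::pair<double, double>> rehash_points( size_t total_keys )
{
  std::vector<std::pair<double, double>> points;
  size_t buckets = 8, keys = 0;
  for( size_t i = 0; i < total_keys; ++i )
  {
    if( (double)( keys + 1 ) / buckets > 0.875 )
    {
      points.push_back( { (double)keys / buckets, (double)keys / ( buckets * 2 ) } );
      buckets *= 2;
    }
    ++keys;
  }
  return points;
}
```

Every rehash fires at a load factor of 0.875 and drops the table to roughly 0.44, which is why the graphs oscillate between those two values.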

The integer-key data set consists of sequential integers shuffled randomly. The string-key data set consists of sequential null-terminated strings shuffled randomly.
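Generating such data sets might look like this (a sketch under my own assumptions — the fixed seed and the exact string format are illustrative, not taken from the benchmark code):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>
#include <vector>

// Sequential integer keys 0..n-1, shuffled with a fixed seed (hypothetical
// seed, chosen here only for reproducibility).
std::vector<uint32_t> make_int_keys( size_t n )
{
  std::vector<uint32_t> keys( n );
  for( size_t i = 0; i < n; ++i )
    keys[ i ] = (uint32_t)i;
  std::shuffle( keys.begin(), keys.end(), std::mt19937{ 12345 } );
  return keys;
}

// Sequential, fixed-width, null-terminated string keys ("0000000000000000",
// "0000000000000001", ...), shuffled the same way.
std::vector<std::string> make_string_keys( size_t n )
{
  std::vector<std::string> keys( n );
  for( size_t i = 0; i < n; ++i )
  {
    char buf[ 32 ];
    snprintf( buf, sizeof buf, "%016zu", i );
    keys[ i ] = buf;
  }
  std::shuffle( keys.begin(), keys.end(), std::mt19937{ 12345 } );
  return keys;
}
```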

The code was compiled as C++ using GCC 13.2.0, via MinGW-w64, with -O3 and -DNDEBUG. The benchmarks were run on an AMD Ryzen 7 5800H with the CPU frequency locked at 90%.

As for the benchmarks themselves, they consist of the following:

- Total time to insert N nonexisting keys
- Time to erase 1,000 existing keys with N keys in the table
- Time to replace 1,000 existing keys with N keys in the table
- Time to erase 1,000 nonexisting keys with N keys in the table
- Time to look up 1,000 existing keys with N keys in the table
- Time to look up 1,000 nonexisting keys with N keys in the table
- Time to iterate over 5,000 keys with N keys in the table

Of the aforementioned benchmarks, Time to replace 1,000 existing keys with N keys in the table, Time to erase 1,000 nonexisting keys with N keys in the table, Time to look up 1,000 existing keys with N keys in the table, Time to look up 1,000 nonexisting keys with N keys in the table, and Time to iterate over 5,000 keys with N keys in the table are performed side-by-side as part of the same process of inserting N unique keys.

This benchmarking setup has some limitations. Firstly, the range of load factors tested is—as previously mentioned—approximately 0.44 to 0.875. Hence, the results do not show the performance at lower load factors. As some of the tables have default maximum load factors as low as 0.5, they are being measured mostly outside the load-factor range in which their authors intended them to operate. Secondly, the benchmarks do not measure memory usage. However, the memory usage of each open-addressing table, at least, can be estimated based on the maximum load factor it can reasonably tolerate and its memory overhead per bucket. Thirdly, although the choice of the MurmurHash3 finalizer as the integer hash function ensures that all the tables compete on equal footing and none are impaired by a weak hash function, it may disadvantage tables that can tolerate a fast but weak hash function (e.g. the default integer hash function for std::unordered_map is usually an identity function, whereas ankerl::unordered_dense, absl::flat_hash_map, boost::unordered_flat_map, CC’s cc_map, and Verstable all require or benefit from a hash function that provides entropy across all bits of the hash code). Finally, the benchmarks do little to show the cumulative impact of tombstones on the performance of the tables that rely on them when erasures are frequent.
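The per-bucket memory estimate mentioned above reduces to a simple formula (an illustrative sketch with hypothetical numbers, not measured figures):

```cpp
// Rough steady-state bytes per key for an open-addressing table:
// (bucket size + per-bucket metadata) divided by the average load factor at
// which the table operates. All inputs here are illustrative assumptions.
double est_bytes_per_key( double bucket_bytes, double metadata_bytes,
                          double avg_load_factor )
{
  return ( bucket_bytes + metadata_bytes ) / avg_load_factor;
}
```

For example, a table storing 8-byte buckets with 1 byte of metadata per bucket at an average load factor of 0.75 would use roughly 12 bytes per key.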

The complete code of the benchmarks is available here.

Hash Tables Benchmarked

C++ Tables

C Tables

Results

Since the horizontal scale of the resulting graphs (each corresponding to one of the aforementioned benchmarks) is linear, not logarithmic, a single run cannot clearly show performance at both low and high key counts, so I ran the benchmarks for three different total key counts. Generally, the lower the key count is, the hotter the tables are in the cache. Here are the results:

As I ran the benchmarks 14 times for every key count, each data point shown in the graphs represents the average of 10 recordings (I excluded the two highest and two lowest recordings to limit the effect of outliers and any background processing).
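The trimming just described amounts to a simple trimmed mean; a sketch (my own restatement, not the benchmark code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Average the runs after discarding the two highest and two lowest
// recordings, to limit the effect of outliers and background processing.
// Assumes at least five recordings (the article uses 14).
double trimmed_mean( std::vector<double> runs )
{
  std::sort( runs.begin(), runs.end() );
  double sum = 0.0;
  for( size_t i = 2; i + 2 < runs.size(); ++i )
    sum += runs[ i ];
  return sum / (double)( runs.size() - 4 );
}
```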

The graphs are interactive. Hover over a table’s label to highlight the associated plot, and click it to toggle the plot’s visibility. The graphs automatically scale to the visible plots.

Displayed below are the graphs for 0 to 20,000,000 keys. Click here to skip to the heatmap summarizing the data shown in these graphs.

[Interactive graphs. Each plots time, on a linear scale, against the number of keys N in the table, from 0 to 20,000,000.]

32-bit integer key, 32-bit value: Total time to insert N nonexisting keys
64-bit integer key, 448-bit value: Total time to insert N nonexisting keys
16-char c-string key, 64-bit value: Total time to insert N nonexisting keys
32-bit integer key, 32-bit value: Time to erase 1,000 existing keys with N keys in the table
64-bit integer key, 448-bit value: Time to erase 1,000 existing keys with N keys in the table
16-char c-string key, 64-bit value: Time to erase 1,000 existing keys with N keys in the table
32-bit integer key, 32-bit value: Time to replace 1,000 existing keys with N keys in the table
64-bit integer key, 448-bit value: Time to replace 1,000 existing keys with N keys in the table
16-char c-string key, 64-bit value: Time to replace 1,000 existing keys with N keys in the table
32-bit integer key, 32-bit value: Time to erase 1,000 nonexisting keys with N keys in the table