An Extensive Benchmark of C and C++ Hash Tables

Jackson Allan · Published 29 May 2024 · Updated 2 June 2024

Introduction: Why Another Hash-Table Benchmark?

Although thorough hash-table benchmarks have already been published by others in recent years, I have decided to contribute another benchmarking suite for several reasons. Firstly, the existing publications concentrate on C++ tables, whereas there is relatively little coverage of C tables. Consequently, C programmers tend to be directed towards older, well-known libraries despite the existence of newer libraries that may offer superior performance. Secondly, the results reported by the existing benchmarks are mostly cumulative, so they do not clearly show how the tables tested perform under specific conditions such as the table’s current load factor. Thirdly, some existing benchmarks test only one key type or bucket size, which makes it hard to draw general conclusions about a given table’s performance. The final motivation is a personal interest: I recently released a novel generic hash table in C (via Verstable and CC) and would like to showcase how it measures up against both the leading C++ tables and other tables available in the C landscape.

Click here to skip to the results.

Benchmarking Setup

The benchmarks include three hash-table configurations designed to measure the tables’ performance under diverse conditions:

- 32-bit integer key, 32-bit value
- 64-bit integer key, 448-bit value
- 16-char c-string key, 64-bit value

All the tables benchmarked are configured to use the same hash functions, namely the MurmurHash3 finalizer for integer keys and FNV-1a for string keys.
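Both functions are only a few lines long. The sketches below show the standard 64-bit variants; the benchmarks instantiate whichever widths each table requires, so treat the exact types here as illustrative:

```c
#include <stdint.h>

// The 64-bit MurmurHash3 finalizer ("fmix64"): a bijective mixer that
// spreads entropy across all bits of the hash code.
static uint64_t murmur3_fmix64( uint64_t key )
{
  key ^= key >> 33;
  key *= 0xFF51AFD7ED558CCDULL;
  key ^= key >> 33;
  key *= 0xC4CEB9FE1A85EC53ULL;
  key ^= key >> 33;
  return key;
}

// 64-bit FNV-1a over a null-terminated string.
static uint64_t fnv1a( const char *key )
{
  uint64_t hash = 0xCBF29CE484222325ULL; // FNV offset basis.
  while( *key )
  {
    hash ^= (unsigned char)*key++;
    hash *= 0x100000001B3ULL; // FNV prime.
  }
  return hash;
}
```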

The maximum load factor for all tables is set to 0.875. Hence, and because all the tables—except for std::unordered_map under GCC—grow by doubling the bucket count once the maximum load factor is reached, the results show the performance of each table across the load-factor range of approximately 0.44 (the troughs in the graphs) to 0.875 (the peaks). I chose this limit because it is relatively high and several tables with hard-coded, non-customizable maximum load factors (namely absl::flat_hash_map and boost::unordered_flat_map) use it.
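To make the sawtooth load-factor pattern concrete, here is a minimal sketch of doubling growth under a 0.875 maximum load factor (the names are illustrative, not any table's API). Because growth doubles the bucket count, the load factor drops to half the maximum immediately after growth:

```c
#include <stddef.h>

#define MAX_LOAD 0.875

// Returns the bucket count after inserting key_count keys into a table
// that doubles its bucket count whenever the maximum load factor is
// reached. The load factor therefore oscillates between MAX_LOAD / 2
// (~0.44, the troughs) and MAX_LOAD (0.875, the peaks).
static size_t buckets_after_insert( size_t key_count, size_t bucket_count )
{
  while( (double)key_count / (double)bucket_count >= MAX_LOAD )
    bucket_count *= 2;

  return bucket_count;
}
```

For example, with 16 buckets, the 14th key reaches the 0.875 limit and triggers growth to 32 buckets, leaving the load factor at 14/32 = 0.4375.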

The integer-key data set consists of sequential integers shuffled randomly. The string-key data set consists of sequential null-terminated strings shuffled randomly.
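Such a data set can be generated with a sequential fill followed by a Fisher–Yates shuffle. The sketch below is illustrative only; the benchmark's actual RNG, seeding, and key widths may differ:

```c
#include <stdint.h>
#include <stdlib.h>

// Fills keys with 0, 1, ..., count - 1 and then applies a Fisher-Yates
// shuffle so that the keys are inserted in random order.
static void make_integer_keys( uint32_t *keys, size_t count )
{
  for( size_t i = 0; i < count; ++i )
    keys[ i ] = (uint32_t)i;

  for( size_t i = count; i > 1; --i )
  {
    size_t j = (size_t)rand() % i; // Pick a slot from the unshuffled prefix.
    uint32_t temp = keys[ i - 1 ];
    keys[ i - 1 ] = keys[ j ];
    keys[ j ] = temp;
  }
}
```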

The code was compiled as C++ with GCC 13.2.0 (via MinGW-w64) using -O3 and -DNDEBUG. The benchmarks were run on an AMD Ryzen 7 5800H with the frequency locked at 90%.

As for the benchmarks themselves, they consist of the following:

- Total time to insert N nonexisting keys
- Time to erase 1,000 existing keys with N keys in the table
- Time to replace 1,000 existing keys with N keys in the table
- Time to erase 1,000 nonexisting keys with N keys in the table
- Time to look up 1,000 existing keys with N keys in the table
- Time to look up 1,000 nonexisting keys with N keys in the table
- Time to iterate over 5,000 keys with N keys in the table

Of the aforementioned benchmarks, Time to replace 1,000 existing keys with N keys in the table, Time to erase 1,000 nonexisting keys with N keys in the table, Time to look up 1,000 existing keys with N keys in the table, Time to look up 1,000 nonexisting keys with N keys in the table, and Time to iterate over 5,000 keys with N keys in the table are performed side-by-side as part of the same process of inserting N unique keys.
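The interleaving can be pictured with the stand-in sketch below, where a dummy operation takes the place of real table calls and the constants are scaled down from the real benchmarks:

```c
#include <time.h>

#define N        100000 // Total keys inserted (20,000,000 in the real benchmarks).
#define INTERVAL 10000  // How often to pause and measure.
#define BATCH    1000   // Operations per timed batch.

static volatile unsigned sink;                        // Stand-in for table state.
static void dummy_op( unsigned key ) { sink += key; } // Stand-in for a table call.

// Inserts N keys and, every INTERVAL insertions, times a batch of BATCH
// operations against the table at its current size. Returns the number of
// measurement points recorded into timings.
static int run_benchmark( double *timings )
{
  int points = 0;
  for( unsigned i = 0; i < N; ++i )
  {
    dummy_op( i ); // A real insertion would occur here.
    if( ( i + 1 ) % INTERVAL == 0 )
    {
      clock_t start = clock();
      for( unsigned j = 0; j < BATCH; ++j )
        dummy_op( j ); // A real lookup/replacement/erasure would occur here.

      timings[ points++ ] = (double)( clock() - start ) / CLOCKS_PER_SEC;
    }
  }

  return points;
}
```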

This benchmarking setup has some limitations. Firstly, the range of load factors tested is—as previously mentioned—approximately 0.44 to 0.875. Hence, the results do not show the performance at lower load factors. As some of the tables have default maximum load factors as low as 0.5, they are being measured mostly outside the load-factor range in which their authors intended them to operate. Secondly, the benchmarks do not measure memory usage. However, the memory usage of each open-addressing table, at least, can be estimated based on the maximum load factor it can reasonably tolerate and its memory overhead per bucket. Thirdly, although the choice of the MurmurHash3 finalizer as the integer hash function ensures that all the tables compete on equal footing and none are impaired by a weak hash function, it may disadvantage tables that can tolerate a fast but weak hash function (e.g. the default integer hash function for std::unordered_map is usually an identity function, whereas ankerl::unordered_dense, absl::flat_hash_map, boost::unordered_flat_map, CC’s cc_map, and Verstable all require or benefit from a hash function that provides entropy across all bits of the hash code). Finally, the benchmarks do little to show the cumulative impact of tombstones on the performance of the tables that rely on them when erasures are frequent.
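For readers unfamiliar with tombstones: in open-addressing tables that use them, erasure marks the bucket as deleted rather than empty so that other keys' probe sequences stay intact, at the cost of lengthening future probes until the table is rehashed. A minimal linear-probing sketch (illustrative, not any benchmarked table's code):

```c
// Tombstone-based erasure under linear probing. Marking the bucket DELETED
// instead of EMPTY keeps other keys' probe sequences unbroken, but the
// tombstone remains in the probe path until a rehash. Assumes the table
// always contains at least one EMPTY bucket, so the probe loop terminates.
enum bucket_state { EMPTY, OCCUPIED, DELETED };

typedef struct
{
  enum bucket_state state;
  int key;
} bucket_ty;

static int erase_key( bucket_ty *buckets, int bucket_count, int key )
{
  // key % bucket_count stands in for a real hash function here.
  for( int i = key % bucket_count; buckets[ i ].state != EMPTY; i = ( i + 1 ) % bucket_count )
    if( buckets[ i ].state == OCCUPIED && buckets[ i ].key == key )
    {
      buckets[ i ].state = DELETED; // Tombstone, not EMPTY.
      return 1;                     // Erased.
    }

  return 0; // Key absent.
}
```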

The complete code of the benchmarks is available here.

Hash Tables Benchmarked

C++ Tables

The C++ tables benchmarked appear in the results under the labels absl (absl::flat_hash_map), ankerl (ankerl::unordered_dense), boost (boost::unordered_flat_map), emilib2, ska, std (std::unordered_map), and tsl.

C Tables

The C tables benchmarked appear in the results under the labels CC (CC’s cc_map), khash (klib’s khash), DICT and DICT_OA (M*LIB), stb_ds, STC (STC’s hmap), uthash, and Verstable.

Results

Since the horizontal scale of the resulting graphs (each corresponding to one of the aforementioned benchmarks) is linear, not logarithmic, a single run cannot clearly show performance at low key counts, so I ran the benchmarks for three different total key counts. Generally, the lower the key count is, the hotter the tables are in the cache. Here are the results:

As I ran the benchmarks 14 times for every key count, each data point shown in the graphs represents the average of 10 recordings (I excluded the two highest and two lowest recordings to limit the effect of outliers and any background processing).
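That trimmed averaging can be expressed as a short routine (a sketch; the benchmark's actual bookkeeping code may differ):

```c
#include <stdlib.h>

// Comparator for qsort over doubles.
static int cmp_double( const void *a, const void *b )
{
  double x = *(const double *)a;
  double y = *(const double *)b;
  return ( x > y ) - ( x < y );
}

// Sorts the recordings, discards the trim lowest and trim highest, and
// averages the rest (here: 14 recordings with trim = 2, leaving 10).
static double trimmed_mean( double *recordings, size_t count, size_t trim )
{
  qsort( recordings, count, sizeof *recordings, cmp_double );

  double sum = 0.0;
  for( size_t i = trim; i < count - trim; ++i )
    sum += recordings[ i ];

  return sum / (double)( count - 2 * trim );
}
```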

The graphs are interactive. Hover over a table’s label to highlight the associated plot, and click it to toggle the plot’s visibility. The graphs automatically scale to the visible plots.

Displayed below are the graphs for 0 to 20,000,000 keys. Click here to skip to the heatmap summarizing the data shown in these graphs.

[Interactive graphs omitted. Each graph plots time (on a linear scale, measured from zero) against the number of keys N, with tick marks at 4,000,000, 8,000,000, 12,000,000, and 16,000,000. There is one graph per key/value configuration (32-bit integer key with 32-bit value; 64-bit integer key with 448-bit value; 16-char c-string key with 64-bit value) for each of the following benchmarks: Total time to insert N nonexisting keys; Time to erase 1,000 existing keys with N keys in the table; Time to replace 1,000 existing keys with N keys in the table; Time to erase 1,000 nonexisting keys with N keys in the table; Time to look up 1,000 existing keys with N keys in the table; Time to look up 1,000 nonexisting keys with N keys in the table; and Time to iterate over 5,000 keys with N keys in the table.]

I have consolidated the results displayed above into a single heatmap that makes it easier to compare tables across benchmarks. Each cell contains the total of the averaged recordings that the corresponding table registered in the corresponding benchmark (except for the Total time to insert N nonexisting keys benchmark, which is already cumulative), normalized to the result of the table that performed the fastest in that benchmark. A value of 2.00, for example, indicates that the table is two times slower than the best performer. The color scale spans from 1.00 to 10.00 (all results that are 10 or more times slower than the best in the benchmark are colored black).
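The normalization step is straightforward (a sketch; the names are illustrative):

```c
#include <stddef.h>

// Divides each table's total by the benchmark's fastest total, so the best
// performer scores exactly 1.00 and a score of 2.00 means twice as slow.
static void normalize_row( double *totals, size_t count )
{
  double best = totals[ 0 ];
  for( size_t i = 1; i < count; ++i )
    if( totals[ i ] < best )
      best = totals[ i ];

  for( size_t i = 0; i < count; ++i )
    totals[ i ] /= best;
}
```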

Total time taken relative to the fastest table in the benchmark (lower/lighter is better):

Benchmark                                               absl    ankerl  boost   emilib2 ska     std     tsl     CC      khash   DICT    DICT_OA stb_ds  STC     uthash  Verstable
32-bit integer key, 32-bit value: Insert nonexisting    1.11    1.78    1.00    1.01    1.85    6.39    1.67    1.33    1.23    1.46    1.27    2.02    1.07    6.75    1.27
64-bit integer key, 448-bit value: Insert nonexisting   1.54    1.36    1.08    1.49    1.69    4.91    1.96    1.29    1.15    1.02    1.78    1.39    1.00    4.11    1.38
16-char c-string key, 64-bit value: Insert nonexisting  1.33    1.55    1.16    1.61    2.22    2.87    1.34    1.73    1.72    1.00    1.89    1.13    1.54    2.61    1.40
32-bit integer key, 32-bit value: Erase existing        2.17✝   2.77    1.36✝   1.33✝   1.51    11.26   2.47    1.50    1.28✝   1.23✝   1.00✝   4.80✝   3.81    7.79    1.49
64-bit integer key, 448-bit value: Erase existing       1.86✝   2.33    1.31✝   1.28✝   1.75    9.84    3.31    1.50    1.11✝   1.00✝   1.15✝   3.98✝   4.08    6.10    1.35
16-char c-string key, 64-bit value: Erase existing      1.29✝   1.53    1.04✝   1.00✝   1.26    3.21    1.71    1.08    1.27✝   1.01✝   1.30✝   2.00✝   4.68    2.41    1.06
32-bit integer key, 32-bit value: Replace existing      1.67    1.74    1.66    1.53    1.15    2.71    1.77    1.29    1.71    1.90    1.00    2.45    1.35    3.80    1.29
64-bit integer key, 448-bit value: Replace existing     1.38    1.33    1.48    1.24    1.28    2.34    1.49    1.29    1.49    1.00    1.13    1.75    1.26    2.90    1.46
16-char c-string key, 64-bit value: Replace existing    1.13    1.04    1.03    1.00    1.17    1.52    1.43    1.03    1.30    1.04    1.29    1.08    1.18    1.46    1.03
32-bit integer key, 32-bit value: Erase nonexisting     1.17    1.70    1.00    1.25    1.61    5.25    1.92    1.19    2.23    2.72    2.60    3.94    2.38    3.72    1.17
64-bit integer key, 448-bit value: Erase nonexisting    1.27    1.75    1.00    1.25    2.02    5.83    2.74    1.25    2.67    2.87    3.52    4.06    2.52    3.91    1.19
16-char c-string key, 64-bit value: Erase nonexisting   1.08    1.21    1.00    1.04    1.65    2.43    1.96    1.07    2.47    1.69    2.64    2.03    2.28    1.93    1.06
32-bit integer key, 32-bit value: Look up existing      1.63    1.15    1.28    1.29    1.19    2.28    1.20    1.12    1.48    1.07    1.00    1.94    1.16    2.58    1.09
64-bit integer key, 448-bit value: Look up existing     1.61    1.10    1.48    1.33    1.50    2.41    1.46    1.31    1.62    1.00    1.25    1.73    1.47    2.45    1.11
16-char c-string key, 64-bit value: Look up existing    1.12    1.02    1.05    1.00    1.24    1.59    1.53    1.03    1.39    1.02    1.33    1.05    1.27    1.38    1.02
32-bit integer key, 32-bit value: Look up nonexisting   1.00    1.33    1.02    1.26    1.38    3.67    1.55    1.01    1.84    2.33    2.26    3.23    2.00    3.22    1.00
64-bit integer key, 448-bit value: Look up nonexisting  1.10    1.34    1.11    1.23    1.65    3.96    2.10    1.04    2.03    2.31    2.91    3.12    2.34    3.24    1.00
16-char c-string key, 64-bit value: Look up nonexisting 1.01    1.33    1.18    1.22    1.67    2.32    1.86    1.01    2.28    1.61    2.69    1.91    2.13    1.83    1.00
32-bit integer key, 32-bit value: Iterate               3.87    1.00    2.44    2.38    7.37    87.83   5.34    4.12    6.68    16.79   5.45    1.17    4.89    37.39   3.48
64-bit integer key, 448-bit value: Iterate              1.80    1.00    1.88    1.85    2.21    24.67   1.82    1.87    1.91    5.45    1.71    1.10    1.59    10.53   1.78
16-char c-string key, 64-bit value: Iterate             3.22    1.00    2.30    2.33    5.71    79.94   4.63    3.74    5.53    16.87   4.79    1.09    4.06    30.64   2.70

✝ Relies on tombstones or a tombstone-like mechanism.

Analysis

Below is my analysis of the results, especially the 0-to-20,000,000-key graphs displayed above. Naturally, this (and the next) section is somewhat subjective. Hence, I encourage readers seeking a hash-table library to also examine the graphs and heatmaps themselves, bearing in mind their specific needs.

I would like to finish with a few general observations about hash-table designs. Firstly, these benchmarks show once again that node-based separate-chaining tables are slow. Secondly, conventional open-addressing tables that use linear or quadratic probing are very susceptible to load factor. The same goes for Robin Hood tables, despite the oft-repeated claim that a reasonable default maximum load factor for such tables is 0.9. Thirdly, SIMD and hybrid open-addressing/separate-chaining tables are mostly immune to high load factors. These two designs achieve this load-factor resilience in very different ways: the former maximizes the number of buckets probed at once, while the latter limits the buckets probed to the few containing a key that hashed to the lookup key’s ideal bucket. Finally, SIMD, hybrid, and Robin Hood tables all limit variance in the time needed to complete an operation. This is reflected by the steadiness of their graph plots compared to the plots for the other tables.
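To illustrate the first of those designs, here is a simplified sketch of SIMD group probing in the style of Swiss tables; absl::flat_hash_map and boost::unordered_flat_map use variations of this idea, but the code below is only an SSE2-based illustration, not either library's implementation:

```c
#include <emmintrin.h> // SSE2 intrinsics.
#include <stdint.h>

// Each bucket stores a one-byte fragment of its key's hash code in a
// control array. One SIMD comparison checks 16 fragments against the
// lookup key's fragment at once. Returns a bitmask in which bit i is set
// if bucket i of the group holds a matching fragment.
static uint16_t match_group( const uint8_t *ctrl, uint8_t fragment )
{
  __m128i group = _mm_loadu_si128( (const __m128i *)ctrl );
  __m128i match = _mm_cmpeq_epi8( group, _mm_set1_epi8( (char)fragment ) );
  return (uint16_t)_mm_movemask_epi8( match );
}
```

Only buckets whose bits are set in the returned mask need their full keys compared, which is why a high load factor adds few extra probes.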

Conclusion: Which Hash Table to Choose?

In C++, boost::unordered_flat_map appears to be the all-around best performer, especially when hot in the cache. Therefore, it makes an excellent default choice. However, ankerl::unordered_dense offers the best iteration speed, performs reasonably well in the other benchmarks, and is a standalone header. Hence, it is a good choice if iteration speed is the factor most important to you or you prefer a single-header solution. It also offers faster lookups of existing keys, especially when buckets are large and when the load factor is kept low. emilib2 shows great promise and could be a good alternative to boost::unordered_flat_map once it is ready for public consumption.

In C, the results show that Verstable and CC’s cc_map are all-around excellent performers, with the former being faster for iteration. However, they are not the best performers in every scenario. If your use case is skewed towards a particular operation or pair of data types, then it is worth also considering other options, especially if you are content to use a low maximum load factor. In particular, STC’s hmap is faster for inserting integer keys, M*LIB’s DICT has the fastest lookups of existing integer keys, and stb_ds’s hm and sh have the fastest iteration. Although not shown in the benchmarks, klib’s khash generally has the lowest memory overhead among the tables tested, so it—or khashl, which consumes even less memory—may be the best choice when memory is limited.

STC’s hmap, klib’s khash, and M*LIB’s DICT and DICT_OA are each part of a large, comprehensive library. If those libraries’ other features are useful to you, that is another compelling reason to choose one of them.

uthash, on the other hand, should be avoided due to all-around poor performance, unless you need the specific qualities of an intrusive or node-based table.

Finally, I must warn that some of the C tables raise portability concerns. uthash and stb_ds’s hm hash and compare keys byte-by-byte by default. This poses an issue not only for structs containing padding bytes but also—at least theoretically—for fundamental integer types, which may contain padding bits (C standard §6.2.6.2). The problem is especially apparent for stb_ds because it offers no way for users to provide their own hash and comparison functions (other than modifying the library).
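The padding problem is easy to demonstrate with a hypothetical struct (on common ABIs, three padding bytes follow the uint8_t member to align the uint32_t):

```c
#include <stdint.h>
#include <string.h>

// Two keys that are logically equal can still differ in their padding
// bytes, so a byte-by-byte comparison such as memcmp may report inequality
// even though every named member matches.
typedef struct
{
  uint8_t  a; // Padding bytes typically follow here to align b.
  uint32_t b;
} key_ty;

static int keys_equal_bytewise( const key_ty *x, const key_ty *y )
{
  return memcmp( x, y, sizeof( key_ty ) ) == 0;
}

static int keys_equal_memberwise( const key_ty *x, const key_ty *y )
{
  return x->a == y->a && x->b == y->b;
}
```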

Addendum: Verstable vs CC’s cc_map

The above conclusion may leave readers wondering about the differences between Verstable and CC’s cc_map, which both use the same hybrid design, outside the context of performance. In short, Verstable is a standalone hash-table library. It requires C99 or later, and it requires users to instantiate a pseudo-template for every combination of key and value data types (as is common among high-performance C container libraries). It also offers full support for custom memory allocators. CC, on the other hand, is a larger library that includes a range of generic containers. It uses novel techniques to provide a simpler API and avoid the need for boilerplate from users. However, it requires C11 or later and compiler support for typeof (which is standard as of C23), and its support for custom memory allocators is more limited.

Acknowledgments

I would like to thank Joaquín M López Muñoz, Martin Leitner-Ankerl, Patrick Pelissier, and Attractive Chaos for their valuable feedback during the drafting of this article.

Discussion

Comments on this article belong here.

The article was also discussed on Reddit at r/C_Programming and r/cpp.