{"id":189,"date":"2018-07-24T11:31:27","date_gmt":"2018-07-24T09:31:27","guid":{"rendered":"https:\/\/hpjansson.org\/blag\/?p=189"},"modified":"2023-04-10T20:37:22","modified_gmt":"2023-04-10T18:37:22","slug":"a-hash-table-re-hash","status":"publish","type":"post","link":"https:\/\/hpjansson.org\/blag\/2018\/07\/24\/a-hash-table-re-hash\/","title":{"rendered":"A hash table re-hash"},"content":{"rendered":"<p>Hash tables! They're everywhere. They're also pretty boring, <em>but<\/em> I've had <a href=\"https:\/\/gitlab.gnome.org\/GNOME\/glib\/issues\/1198\">GLib issue #1198<\/a> sitting around for a while, and the GNOME move to GitLab resulted in a helpful reminder (or two) being sent out that convinced me to look into it again with an eye towards improving GHashTable and maybe answering some domain-typical questions, like \"You're using approach X, but I've heard approach Y is better. Why don't you use that instead?\" and \"This other hash table is 10% faster in my extremely specific test. Why is your hash table so bad?\".<\/p>\n<p>And unfairly paraphrased as those questions may be, I have to admit I'm curious too. Which means benchmarks. But first, what exactly makes a hash table good? Well, it depends on what you're looking for. I made a list.<\/p>\n<h3>A miserable pile of tradeoffs<\/h3>\n<p>AKA hard choices. You have to make them.<\/p>\n<p><strong>Simple keys vs. complex keys<\/strong>: The former may be stored directly in the hash table and are quick to hash (e.g. integers), while the latter may be stored as pointers to external allocations requiring a more elaborate hash function and additional memory accesses (e.g. strings). If you have complex keys, you'll suffer a penalty on lookups and when the hash table is resized, but this can be avoided by storing the generated hash for each key. However, for simple keys this is likely a waste of memory.<\/p>\n<p><strong>Size of the data set<\/strong>: Small data sets may fit entirely in cache memory. 
For those, a low instruction count can be beneficial, but as the table grows, memory latency starts to dominate, and you'll get away with more complex code. The extra allowance can be used to improve cache efficiency, perhaps with bucketization or some other fancy scheme. If you know the exact size of the data set in advance, or if there's an upper bound and memory is not an issue, you can avoid costly resizes.<\/p>\n<p><strong>Speed vs. memory generally<\/strong>: There are ways to trade off memory against speed directly. Google Sparsehash is an interesting example of this; it has the ability to allocate the table in small increments using many dynamically growing arrays of tightly packed entries. The arrays have a maximum size that can be set at compile time to yield a specific tradeoff between access times and per-entry memory overhead.<\/p>\n<p><strong>Distribution robustness<\/strong>: A good hash distribution is critical to preventing performance oopsies and, if you're storing untrusted input, thwarting deliberate attacks. This can be costly in the average case, though, so there's a gradient of approaches with power-of-two-sized tables and bitmask moduli at the low end, <a href=\"http:\/\/burtleburtle.net\/bob\/hash\/integer.html\">more elaborate hash functions<\/a> and prime moduli somewhere in the middle, and cryptographically strong, salted hashes at the very high end. Chaining tables have an additional advantage here, since they don't have to deal with primary clustering.<\/p>\n<p><strong>Mode of operation<\/strong>: Some tables can do fast lookups, but are slower when inserting or deleting entries. For instance, a table implementing the Robin Hood probing scheme with a high load factor may require a large number of entries to be shifted down to make room for an insertion, while lookups remain fast due to this algorithm's inherent key ordering. 
An algorithm that replaces deleted entries with tombstones may need to periodically re-hash the entire table to get rid of those. Failed lookups can be much slower than successful ones. Sometimes it makes sense to target specific workloads, like the one where you're filtering redundant entries from a dataset and using a hash table to keep track of what you've already seen. This is pretty common and requires insertion and lookup to be fast, but not deletion. Finally, there are more complex operations with specific behavior for key replacement, or iterating over part of the dataset while permuting it somehow.<\/p>\n<p><strong>Flexibility and convenience<\/strong>: Some implementations reserve part of the keyspace for internal use. For instance, Google Sparsehash reserves two values, one for unused entries and one for tombstones. These may be -1 and -2, or perhaps \"u\" and \"t\" if your keys are strings. In contrast, GHashTable lets you use the entire keyspace, including storing a value for the NULL key, at the cost of some extra bookkeeping memory (basically special cases of the stored hash field &#8212; see \"complex keys\" above). Preconditions are also nice to have when you slip up; the fastest implementations may resort to undefined behavior instead.<\/p>\n<p><strong>GC and debugger friendliness<\/strong>: If there's a chance that your hash table will coexist with a garbage collector, it's a good idea to play nice and clear keys and values when they're removed from the table. This ensures you're not tying up outside memory by keeping useless pointers to it. It also lets memory debuggers like Valgrind detect more leaks and whatnot. There is a slight speed penalty for doing this.<\/p>\n<p><strong>Other externalities<\/strong>: Memory fragmentation, peak memory use, etc. goes here. 
Open addressing helps with the former, since the hash table then only makes a few big allocations which the allocator can satisfy with <code>mmap()<\/code> and simply <code>munmap()<\/code> when they're not needed anymore. Chaining hash tables make lots of heap allocations, which can get interspersed with other allocations leaving tiny holes all over when freed. Bad!<\/p>\n<h3>A note on benchmarks<\/h3>\n<p>This isn't the first time, or the second time, or likely even the 100th time someone posts a bunch of hash table benchmarks on the Internet, but it may be the second or third time or so that GHashTable makes an appearance. Strange maybe, given its install base must number in the millions &#8212; but on the other hand, it's quite tiny (and boring). The venerable <a href=\"http:\/\/incise.org\/hash-table-benchmarks.html\" rel=\"nofollow\">hash table benchmarks by Nick Welch<\/a> still rule the DuckDuckGo ranking, and they're useful and nice despite a few significant flaws: Notably, there's a bug that causes the C++ implementations to hash pointers to strings instead of the strings themselves, they use virtual size as a measure of memory consumption, and they only sample a single data point at the end of each test run, so there's no intra-run data.<\/p>\n<p>He also classifies GHashTable as a \"Joe Sixpack\" of hash tables, which is uh&#8230; pretty much what we're going for! See also: The Everyman's Hash Table (with apologies to Philip).<\/p>\n<p>Anyway. The first issue is easily remedied by using <code>std::string<\/code> in the C++ string tests. Unfortunately, it comes with extra overhead, but I think that's fair when looking at out-of-the-box performance.<\/p>\n<p>The second issue is a bit murkier, but it makes sense to measure the amount of memory that the kernel has to provide additional physical backing for at runtime, i.e. memory pages dirtied by the process. This excludes untouched virtual memory, read-only mappings from shared objects, etc. 
while still capturing the effects of allocator overhead and heap fragmentation. On Linux, it can be read from <code>\/proc\/&lt;pid&gt;\/smaps<\/code> as the sum of all <code>Private_Dirty<\/code> and <code>Swap<\/code> fields. <em>However<\/em>, polling this file has a big impact on performance and penalizes processes with more mappings unfairly, so in the end I settled on the <code>AnonRSS<\/code> field from <code>\/proc\/&lt;pid&gt;\/status<\/code>, which works out to almost the same thing in practice, but with a lower and much more fair penalty.<\/p>\n<p>In order to provide a continuous stream of samples, I wrote a small piece of C code that runs each test in a child process while monitoring its output and sampling <code>AnonRSS<\/code> a hundred times or so per second. Overkill, yes, but it's useful for capturing those brief memory peaks. The child process writes a byte to stdout per 1000 operations performed so we can measure progress. The timestamps are all in wall-clock time.<\/p>\n<p>The tests themselves haven't changed that much relative to Welch's, but I added some to look for worst-case behavior (spaced integers, aging) and threw in <a href=\"https:\/\/github.com\/attractivechaos\/klib\/blob\/master\/khash.h\" rel=\"nofollow\">khash<\/a>, a potentially very fast C hash table. I ran the below benchmarks on an Intel i7-4770 Haswell workstation. They should be fairly representative of modern 64-bit architectures.<\/p>\n<h3>Some results<\/h3>\n<p>As you may have gathered from the above, it's almost impossible to make an apples-to-apples comparison. Benchmarks don't capture every aspect of performance, and I'm not out to pick on any specific implementation, YMMV, etc. 
With that in mind, we can still learn quite a bit.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-01-sequential-memory-time.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-208 size-full\" title=\"Hash table benchmark | Inserting sequential integers | Memory used \/ time\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-01-sequential-memory-time.png\" alt=\"Hash table benchmark | Inserting sequential integers | Memory used \/ time\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>\"Sequential\" here means {1, 2, &#8230;}. The integer tests all map 64-bit integers to void *, except for the Python one, which maps <code>PyLong_FromLong()<\/code> PyObjects to a single shared PyObject value created in the same manner at the start of the test. This results in an extra performance hit from the allocator and type system, but it's a surprisingly good showing regardless. Ruby uses <code>INT2FIX()<\/code> for both, which avoids extra allocations and makes for a correspondingly low memory overhead. Still, it's slow in this use case. GHashTable embeds integers in pointers using <code>GINT_TO_POINTER()<\/code>, but this is only portable for 32-bit integers.<\/p>\n<p>When a hash table starts to fill up, it must be resized. The most common way to do this is to allocate the new map, re-insert elements from the old map into it, and then free the old map. Memory use peaks when this happens <strong>(1)<\/strong> because you're briefly keeping two maps around. I already knew that Google Sparsehash's sparse mode allows it to allocate and free buckets in small increments, but khash's behavior <strong>(2)<\/strong> surprised me &#8212; it <code>realloc()<\/code>s and rearranges the key and value arrays in-place using an eviction scheme. It's a neat trick that's probably readily applicable in GHashTable too.<\/p>\n<p>Speaking of GHashTable, it does well. 
<em>Suspiciously<\/em> well, in fact, and the reason is <code>g_direct_hash()<\/code> mapping input keys {1, 2, &#8230;} to corresponding buckets {1, 2, &#8230;} hitting the same cache lines repeatedly. This may sound like a good thing, but it's actually quite bad, as we'll see later in this post. Normally I'd expect khash to be faster, but its hash function is not as cache-friendly:<\/p>\n<p><code>#define kh_int64_hash_func(key) (khint32_t)((key)&gt;&gt;33^(key)^(key)&lt;&lt;11)<\/code><\/p>\n<p>Finally, it's worth mentioning Boost's unique allocation pattern <strong>(3)<\/strong>. I don't know what's going on there, but if I had to guess, I'd say it keeps supplemental map data in an array that can simply be freed just prior to each resize.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-02-sequential-memory-items.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-209 size-full\" title=\"Hash table benchmark | Inserting sequential integers | Memory used \/ ops\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-02-sequential-memory-items.png\" alt=\"Hash table benchmark | Inserting sequential integers | Memory used \/ ops\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>This is essentially the same plot, but with the time dimension replaced by insertion count. It's a good way to directly compare the tables' memory use over their life cycles. 
Again, GHashTable looks good due to its dense access pattern, with entries concentrated in a small number of memory pages.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-03-sequential-memoverhead-items.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-210 size-full\" title=\"Hash table benchmark | Inserting sequential integers | Memory used \/ ops\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-03-sequential-memoverhead-items.png\" alt=\"Hash table benchmark | Inserting sequential integers | Memory used \/ ops\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>Yet another way of slicing it. This one's like the previous plot, but the memory use is per item held in the table. The fill-resize cycles are clearly visible. Sparsehash (sparse) is almost optimal at ~16 bytes per item, while GHashTable also stores its calculated hashes, for an additional 4 bytes.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-04-sequential-throughput-time.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-226 size-full\" title=\"Hash table benchmark | Inserting sequential integers | Throughput \/ time\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-04-sequential-throughput-time.png\" alt=\"Hash table benchmark | Inserting sequential integers | Throughput \/ time\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>Still noisy even after gnuplot smoothing, this is an approximation of the various tables' throughput over time. 
The interesting bits are kind of squeezed in there at the start of the plot, so let's zoom in&#8230;<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-05-sequential-throughput-time-early.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-227 size-full\" title=\"Hash table benchmark | Inserting sequential integers | Throughput \/ time\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-05-sequential-throughput-time-early.png\" alt=\"Hash table benchmark | Inserting sequential integers | Throughput \/ time\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>Hash table operations are constant time, right? Well &#8212; technically yes, but if you're unlucky and your insertion triggers a resize, you may end up waiting for almost as long as all the previous insertions put together. If you need predictable performance, you must either find a way to set the size in advance or use a slower-and-steadier data structure. Also noteworthy: Sparsehash (dense) is by far the fastest at peak performance, but it struggles a bit with resizes and comes in second.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-06-sequential-memtime-items.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-213 size-full\" title=\"Hash table benchmark | Inserting sequential integers | Timespace \/ ops\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-06-sequential-memtime-items.png\" alt=\"Hash table benchmark | Inserting sequential integers | Timespace \/ ops\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>This one merits a little extra explanation: Some hash tables are fast and others are small, but how do you measure overall resource consumption? The answer is to multiply time by memory, which gets you a single dimension measured in byte-seconds. 
It's more abstract, and it's not immediately obvious if a particular Bs measurement is good or bad, but relatively it makes a lot of sense: If you use twice as much memory as your competitor, you can make up for it by being twice as fast. Since both resources are affected by diminishing returns, a good all-rounder like GHashTable should try to hit the sweet spot between the two.<\/p>\n<p>Memory use varies over time, and consequently it's not enough to multiply the final memory consumption by total run time. What you really want is the area under the graph in the first plot (memory use over time), and you get there by summing up all the little byte-second slices covered by your measurements. You plot the running sum as you sweep from left to right, and the result will look like the above.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-07-delete-memory-time.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-214 size-full\" title=\"Hash table benchmark | Deleting sequential integers | Memory used \/ time\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-07-delete-memory-time.png\" alt=\"Hash table benchmark | Deleting sequential integers | Memory used \/ time\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>It's not enough to just insert items, sometimes you have to delete them too. In the above plot, I'm inserting 20M items as before, and then deleting them all in order. The hash table itself is not freed. As you can probably tell, there's a lot going on here!<\/p>\n<p>First, this is unfair to Ruby, since I suspect it needs a GC pass and I couldn't find an easy way to do it from the C API. Sorry. Python's plot is weird (and kind of good?) &#8212; it seems like it's trimming memory in small increments. 
In GLib land, GHashTable dutifully shrinks as we go, and since it keeps everything in a handful of big allocations that can be kept in separate memory maps, there is minimal heap pollution and no special tricks needed to return memory to the OS. Sparsehash only shrinks on insert (this makes it faster), so I added a single insert after the deletions to help it along. I also put in a <code>malloc_trim(0)<\/code> without which the other C++ implementations would be left sitting on a heap full of garbage &#8212; but even with the trim in, Boost and std::unordered_map bottom out at ~200MB in use. khash is unable to shrink at all; there's a TODO in the code saying as much.<\/p>\n<p>You may have noticed there are horizontal \"tails\" on this one &#8212; they're due to a short sleep at the end of each test to make sure we capture the final memory use.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-08-random-memory-time.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-215 size-full\" title=\"Hash table benchmark | Inserting random integers | Memory used \/ time\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-08-random-memory-time.png\" alt=\"Hash table benchmark | Inserting random integers | Memory used \/ time\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>Another insertion test, but instead of a {1, 2, &#8230;} sequence it's now integers pseudorandomly chosen from the range [0 .. 2\u00b2\u2078-1] with a fixed seed. Ruby performs about the same as before, but everyone else got slower, putting it in the middle of the pack. This is also a more realistic assessment of GHashTable. 
Note that we're making calls to random() here, which adds a constant overhead &#8212; not a big problem, since it should affect everyone equally and we're not that interested in absolute timings &#8212; but a better test might use a pregenerated input array.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-09-random-memtime-items.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-216 size-full\" title=\"Hash table benchmark | Inserting random integers | Timespace \/ops\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-09-random-memtime-items.png\" alt=\"Hash table benchmark | Inserting random integers | Timespace \/ops\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>GHashTable is neck-and-neck with Sparsehash (dense), but they're both outmatched by khash.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-10-spaced-memory-time.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-217 size-full\" title=\"Hash table benchmark | Inserting spaced integers | Memory used \/ time\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-10-spaced-memory-time.png\" alt=\"Hash table benchmark | Inserting spaced integers | Memory used \/ time\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>With spaced integers in the sequence {256, 512, 768, &#8230;}, the wheels come off Sparsehash and khash. Both implementations use table sizes that are powers of two, likely because it makes the modulo operation faster (it's just a bitmask) and because they use quadratic probing, which is only really efficient in powers of two. 
The downside is that integers with equal low-order bits will be assigned to a small number of common buckets, forcing them to probe a lot.<\/p>\n<p>GHashTable <em>also<\/em> uses quadratic probing in a power-of-two-sized table, but it's immune to this particular pattern due to a twist: For each lookup, the initial probe is made with a prime modulus, after which it switches to a bitmask. This leaves some buckets unreachable to the initial probe, but their number is very small in practice. For example, for the first few table sizes (8, 16, 32, 64, 128, 256), the number of unreachable buckets is 1, 3, 1, 3, 1, and 5. Subsequent probing will reach these.<\/p>\n<p>So how likely is this kind of key distribution in practice? I think it's fairly likely. One example I can think of is when your keys are pointers. This is reasonable enough when you're keeping sets of objects around and you're only interested in knowing whether an object belongs to a given set, or maybe you're associating objects with some extra data. Alignment ensures limited variation in the low-order key bits. Another example is if you have some kind of sparse multidimensional array with indexes packed into integer keys.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-11-spaced-memtime-items.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-219 size-full\" title=\"Hash table benchmark | Inserting spaced integers | Timespace \/ ops\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-11-spaced-memtime-items.png\" alt=\"Hash table benchmark | Inserting spaced integers | Timespace \/ ops\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>Total resource consumption is as you'd expect. 
I should add that there are ways around this &#8212; Sparsehash and khash will both let you use a more complex hash function if you supply it yourself, at the cost of lower performance in the general case.<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-12-aging-memory-time.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-220 size-full\" title=\"Hash table benchmark | Integer aging | Memory used \/ time\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-12-aging-memory-time.png\" alt=\"Hash table benchmark | Integer aging | Memory used \/ time\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p><strong>Yikes!<\/strong> This is the most complex test, consisting of populating the table with 20M random integers in the range 0-20M and then aging it by alternately deleting and inserting random integers for another 20M iterations. A busy database of objects indexed by integer IDs might look something like this, and if it were backed by GHashTable or Sparsehash (sparse), it'd be a painfully slow one. It's hard to tell from the plot, but khash completes in only 4s.<\/p>\n<p>Remember how I said GHashTable did suspiciously well in the first test? Well,<\/p>\n<p align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full aligncenter\" src=\"https:\/\/i.imgur.com\/1g7WUlf.gif\" alt=\"\" width=\"360\" height=\"360\" \/><\/p>\n<p>We're paying the price. When there are a lot of keys smaller than the hash table modulus, they tend to concentrate in the low-numbered buckets, and when tombstones start to build up, the table gets clogged and probes take forever. Fortunately this is easy to fix &#8212; just multiply the generated hashes by a small, odd number before taking the modulo. 
The compiler reduces the multiplication to a few shifts and adds, but cache performance suffers slightly since keys are now more spread out.<\/p>\n<p>On a happier note&#8230;<\/p>\n<p align=\"center\"><a href=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-14-randomstring-memory-time.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-236 size-full\" title=\"Hash table benchmark | Inserting strings in random order | Memory used \/ time\" src=\"https:\/\/hpjansson.org\/blag\/wp-content\/uploads\/2018\/07\/ht2018-14-randomstring-memory-time.png\" alt=\"Hash table benchmark | Inserting strings in random order | Memory used \/ time\" width=\"1024\" height=\"600\" \/><\/a><\/p>\n<p>All the tests so far have been on integer keys, so here's one with strings of the form \"mystring-%d\" where the decimal part is a pseudorandomly chosen integer in the range [0 .. 2\u00b3\u00b9-1] resulting in strings that are 10-19 bytes long. khash and GHashTable, being the only tables that use plain C strings, come out on top. Ruby does okay with <code>rb_str_new2()<\/code>, Python and QHash are on par using <code>PyDict_SetItemString()<\/code> and <code>QString<\/code> respectively. The other C++ implementations use <code>std::string<\/code>, and Sparsehash is really not loving it. It almost looks like there's a bug somewhere &#8212; either it's copying\/hashing the strings too often, or it's just not made for <code>std::string<\/code>. My impression based on a cursory search is that people tend to write their own hash functions\/adapters for it.<\/p>\n<p>GHashTable's on its home turf here. Since it stores the calculated hashes, it can avoid rehashing the keys on resize, and it can skip most full key compares when probing. 
Longer strings would hand it an even bigger advantage.<\/p>\n<h3>Conclusion<\/h3>\n<p>This post is really all about the details, but if I had to extract some general lessons from it, I'd go with these:<\/p>\n<ul>\n<li>Tradeoffs, tradeoffs everywhere.<\/li>\n<li>Worst-case behavior matters.<\/li>\n<li>GHashTable is decent, except for one situation where it's patently awful.<\/li>\n<li>There's some room for improvement in general, cf. robustness and peak memory use.<\/li>\n<\/ul>\n<p>My <a href=\"https:\/\/github.com\/hpjansson\/hash-table-rehash\" rel=\"nofollow\">hash-table-shootout fork<\/a> is on Github. I was going to go into detail about potential GHashTable improvements, but I think I'll save that for another post. This one is way too long already. If you got this far, thanks for reading!<\/p>\n<p>Update: There's now a <a href=\"https:\/\/hpjansson.org\/blag\/2018\/08\/29\/what-ails-ghashtable\/\">follow-up post<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hash tables! They're everywhere. They're also pretty boring, but I've had GLib issue #1198 sitting around for a while, and the GNOME move to GitLab resulted in a helpful reminder (or two) being sent out that convinced me to look &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,3,6],"tags":[15,24,16],"class_list":["post-189","post","type-post","status-publish","format-standard","hentry","category-computing","category-gnome","category-technical","tag-benchmarks","tag-hash-tables","tag-performance"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>A hash table re-hash - Et tu, Cthulhu<\/title>\n<meta name=\"description\" content=\"What makes a good hash table? 