It’s gratifying to see GHashTable faring not-too-badly against an assortment of other hash table implementations in Nick Welch’s excellent benchmarks. I did some work on this previously, but there’s at least one thing that can still be done to reduce its memory footprint on 64-bit architectures. The table is a big array of structs that look like this:
The key_hash is useful, as it lets us skip over mismatches and resize the hash table quickly - and on a 32-bit arch, it adds only 4 bytes to every entry, for a total of 12 bytes. However, on a 64-bit arch, it causes entries to be padded so that each entry starts on a multiple of 8 bytes. That wastefulness can be remedied by packing key_hash values in a separate array - as a bonus, entry size becomes a power of two, which means offsets into the array can be calculated using a single shift. On the downside, we’ll incur the added overhead of maintaining two arrays and accessing an extra cache line for each lookup. I suspect it’s still worth it, though.
A couple of open questions/comments on the benchmarks themselves:
- Google’s dense_hash_map is suspiciously fast at integer deletion - is it shrinking the hash table at all, re-inserting the remaining items as it should? Or does it have some other trick up its sleeve?
- How robust are the different strategies with respect to poorly distributed keys? E.g. N*2^M spaced integers: 1024, 2048, 3072, 4096, etc.
- How about poor hash functions supplied by the API user? GHashTable attempts to tackle this by calculating the initial array index modulo a prime number, before applying a quadratic modulo on subsequent probes (faster, and required for quadratic probing).
- What’s the per-table overhead? How much memory will, say, some 100.000 tables with <100 items each consume? This is not uncommon in practice - nested hashes are often used to store complex properties, and given suitably large working sets, this can become a significant factor for some programs.
- Are the approaches affected by memory fragmentation? Do they cause it? This is hard to measure; maybe many tables could be grown simultaneously in the same address space.