Mouthfuls of UTF-8

Optimization historyFinal results Federico’s mention of g_utf8_offset_to_pointer() inspired me to take a look at this function; after all, it’s pretty low-level and bound to show up on profiles for a lot of widgets.

My initial reaction to the tiny bit of code there was that this cannot be optimized significantly. However, once I realized that loading and checking a single byte at a time is pretty wasteful when we know the minimum number of bytes we’re going to process, things started to get interesting.

It turns out that on a 32-bit arch, we can load 4 bytes at a time into a register, and then use that to skip several letters ahead per iteration for some common cases. These cases are: 4 ASCII letters, 2 2-byte letters and 2 3-byte letters (we only need the first byte of the sequence to know if it’s 3-byte, so we can fit two of these into our 4-byte register).

Initially I was a bit worried that unaligned accesses would slow us down, but it appears that at least on IA32 the penalty is much lower that what we’d get trying to ensure alignment (my initial try, which is not profiled here, did that). I hope people with access to other platforms can test those and fix pathological cases.

Thanks to Federico’s awesome plotting utility, I can now present to you the performance history of various approaches I tried to arrive at something that should be very close to the best we can do. There’s also a plot of the final gain, which is easier to read.

It appears I can’t attach text files to Wordpress posts, so I’ll just link to the patch. It’s a little half-baked currently - it needs to be made endianess-agnostic (currently it assumes little-endian).

Leave a Reply

Please copy the string qvvZAT to the field below: