Archive for November, 2005

Powered by Sorcery

Federico and I had one of our long, meandering discussions today, touching on a range of topics. One of the things we’ve been talking about recently is how to get beginners interested in the fundamentals of computing, including how said fundamentals should be taught, and how to enable exploration. We agreed that since the 80s, computers have become easier to use (for mundane tasks), but in many instances also harder to program.

For instance, one of the exciting things we could do when we were kids was make the computer draw pixels on the screen. When you learn how to do this, the possibilities for further exploration are endless (no I’m not coming out to play, I’m pushing pixels!). Maybe you’d show off your first wicked graphics hack (random pixel spray #5) to your dad, and he’d go “oooh,” and reward you with a floppy disk on which to save it. After a while, and with some help, you’d be exploring curves, fractals, automata or perhaps a Snake clone of some sort. Eventually you’d grow up and be excited about serious shit.

So how would you go about doing that today, on a PC running GNU/Linux? The answer is, you need to learn an API. If you’re lucky, someone will set you up with a sandbox, or you’ll run into SVGALib and get it to work on your graphics adapter. If you’re unlucky, you’ll try to do it with a GUI toolkit - and fail. Case in point, GtkDrawingArea takes some figuring out. You must be a very motivated beginner if you’re ever going to see your pixel using that.

Now, for fun-loving hackers who somehow do manage to produce pixels and such, I have a book recommendation; it’s called The Magic Machine, and is a gold mine of fun hacks, based on a column in Scientific American from back when it didn’t suck quite as much. Ignore the title and cheesy cover - the book introduces topics like fractals, artificial life and text processing (including markov chaining) in a way that makes for easy implementation, but without being tied to any specific architecture, technology or knowledge base. I lost my copy somewhere between Fredrikstad, Norway and Xalapa, Mexico, but I remember it as one of the books that got me excited about programming. Technology books like XML in a Nutshell don’t do that.

Ready, on, boil

Got a new boiler installed today. This one is automatic, so we now have hot water on demand - which means I can stumble out of bed and directly into the shower, instead of getting dressed, stumbling outside, fiddling with flammable gas supply, stumbling back inside, staring at my desk for 20 minutes, undressing, showering and remembering to turn the boiler back off. Quite an improvement.

Telmex has been dicking around with my precious Internet all day. After weeks of different technicians demanding I cycle the modem’s power, replace my filters, replace most of my phone cabling (which I, out of sheer desperation, let them do), &c, they’ve been slowly arriving at the conclusion I suggested to begin with: Something’s wrong at the central.

I’ve been patient, letting them do their thing by the book, remembering that if 1) you fail to listen to an engineer and 2) something bad happens, you may be subject to 3) additional screwage, even if 1) and 2) are unrelated.

The central may apparently take another couple of days to fix. To their credit, though, this hasn’t cost me a thing apart from my copious spare time.


Elves are overrated.



To which race of Middle Earth do you belong?
brought to you by Quizilla

Or is it?


Optimizing really tiny things is hard, and maybe my initial feeling was correct in that this cannot be optimized. However, I ran Behdad’s test on my computer, and it came out as above. This is the exact same test on the dataset that federico provided (however, I profiled only the original and Behdad’s code, as that should be sufficient to demonstrate any differences between his test setup and mine). It still shows a significant gain, especially for latin-1 languages.

I compiled the test with -g -O2, as is necessary to see the benefits of optimized C code. Behdad: I don’t know which flags you used. If we’re using the same compiler flags, the difference must be down to the fact that I’m testing on an Athlon and you’re on a P4. That would be pretty amusing.

Oh, and a profile of a GtkTextIter test case that I got from Larry the other day had g_utf8_offset_to_pointer() show up as using 50% of the CPU time. I don’t know if it’s a realistic case, though. At least we’re learning about optimization in the context of distributed efforts. Or something.

Mouthfuls of UTF-8

Optimization historyFinal results Federico’s mention of g_utf8_offset_to_pointer() inspired me to take a look at this function; after all, it’s pretty low-level and bound to show up on profiles for a lot of widgets.

My initial reaction to the tiny bit of code there was that this cannot be optimized significantly. However, once I realized that loading and checking a single byte at a time is pretty wasteful when we know the minimum number of bytes we’re going to process, things started to get interesting.

It turns out that on a 32-bit arch, we can load 4 bytes at a time into a register, and then use that to skip several letters ahead per iteration for some common cases. These cases are: 4 ASCII letters, 2 2-byte letters and 2 3-byte letters (we only need the first byte of the sequence to know if it’s 3-byte, so we can fit two of these into our 4-byte register).

Initially I was a bit worried that unaligned accesses would slow us down, but it appears that at least on IA32 the penalty is much lower that what we’d get trying to ensure alignment (my initial try, which is not profiled here, did that). I hope people with access to other platforms can test those and fix pathological cases.

Thanks to Federico’s awesome plotting utility, I can now present to you the performance history of various approaches I tried to arrive at something that should be very close to the best we can do. There’s also a plot of the final gain, which is easier to read.

It appears I can’t attach text files to Wordpress posts, so I’ll just link to the patch. It’s a little half-baked currently - it needs to be made endianess-agnostic (currently it assumes little-endian).