Archive for the 'Technical' Category

GNOME and Rust

I’ve been keeping an eye on Rust for a while now, so when I read Alberto’s statement of support for more Rust use in GNOME, I couldn’t resist piling on…

From the perspective of someone who’s quite used to C, it does indeed seem to tick all the boxes. High performance, suitability for low-level tasks and C ABI compatibility tend to be sticking points with new languages — and Rust kills it in those departments. Anyone who needs further convincing should read up on Raph Levien’s font renderer. The usual caveat about details vis-a-vis the Devil applies, but the general idea looks exactly right. Rust’s expressiveness and lack of baggage means it could even outperform C for non-trivial code, on top of all the other advantages.

There are risks too, of course. I’d worry about adoption, growth and the availability of bindings/libraries/other features, like a good optional GC for high-level apps (there is at least one in the works, but it doesn’t seem to be quite ready for prime time yet). Rust is on an upward trajectory, and there don’t seem to be many tasks for which it’s eminently unsuitable, so in theory, it could have a wide reach: operating systems, platform libraries, both client- and server-side applications, games and so on. However, it doesn’t appear to be the de facto language in many contexts yet. Consider the statement “If I learn language X, I will then be able to work on Y.” Substitute for X: Java, Javascript, Python, ObjC, C, C++, C# or even Visual Basic — and Y becomes obvious. How does Rust fare?

That is, of course, a very conservative argument, while in my mind the GNOME project represents, for better or worse, and C use notwithstanding, a more radical F/OSS philosophy. Its founding was essentially formulated as a revolt against the Qt license (and a limited choice of programming languages!), it was an early adopter of Git for version control, and it’s a driver for Wayland and Flatpak now. For what it’s worth, speaking as mostly a downstream integrator, I wouldn’t mind it if GNOME embraced its DNA yet again and fully opened the door to Rust.

Hash Table Shootout!

It’s gratifying to see GHashTable faring not-too-badly against an assortment of other hash table implementations in Nick Welch’s excellent benchmarks. I did some work on this previously, but there’s at least one thing that can still be done to reduce its memory footprint on 64-bit architectures. The table is a big array of structs that look like this:

struct _GHashNode
{
  gpointer   key;       /* key as supplied by the user */
  gpointer   value;     /* associated value */
  guint      key_hash;  /* cached hash of the key */
};

The key_hash is useful, as it lets us skip over mismatches and resize the hash table quickly - and on a 32-bit arch, it adds only 4 bytes to every entry, for a total of 12 bytes. However, on a 64-bit arch, it causes entries to be padded so that each one starts on a multiple of 8 bytes, inflating the struct from 20 to 24 bytes. That wastefulness can be remedied by packing the key_hash values in a separate array - as a bonus, entry size becomes a power of two (16 bytes), which means offsets into the array can be calculated using a single shift. On the downside, we’ll incur the added overhead of maintaining two arrays and accessing an extra cache line for each lookup. I suspect it’s still worth it, though.
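
To illustrate, here’s roughly what such a split layout could look like - hypothetical names, not actual GHashTable internals:

#include <glib.h>

/* Hypothetical split layout.  Keys and values stay together in
 * 16-byte entries (on 64-bit), while the cached hashes live in
 * their own tightly packed array. */
struct _GHashEntry
{
  gpointer key;
  gpointer value;
};

struct _GHashSplitTable
{
  gint                size;     /* a power of two */
  struct _GHashEntry *entries;  /* size * 16 bytes, no padding */
  guint              *hashes;   /* size * 4 bytes, no padding */
};

/* Since sizeof (struct _GHashEntry) == 16, the offset of entry i
 * is i << 4 - a single shift instead of a multiply by 24. */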

A couple of open questions/comments on the benchmarks themselves:

  • Google’s dense_hash_map is suspiciously fast at integer deletion - is it shrinking the hash table at all, re-inserting the remaining items as it should? Or does it have some other trick up its sleeve?
  • How robust are the different strategies with respect to poorly distributed keys? E.g. N*2^M spaced integers: 1024, 2048, 3072, 4096, etc.
  • How about poor hash functions supplied by the API user? GHashTable attempts to tackle this by calculating the initial array index modulo a prime number, before applying a power-of-two modulo (a simple bitmask) on subsequent probes (faster, and required for quadratic probing). A small demonstration follows this list.
  • What’s the per-table overhead? How much memory will, say, some 100,000 tables with <100 items each consume? This is not uncommon in practice - nested hashes are often used to store complex properties, and given suitably large working sets, this can become a significant factor for some programs.
  • Are the approaches affected by memory fragmentation? Do they cause it? This is hard to measure; maybe many tables could be grown simultaneously in the same address space.
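
As promised, here’s a tiny standalone demonstration (not part of the benchmarks) of why the prime helps with poorly distributed keys: integers spaced 1024 apart all collide in a 1024-slot table, but spread out when taken modulo the prime 1031.

#include <stdio.h>

int
main (void)
{
  unsigned int i;

  for (i = 1; i <= 5; i++)
    {
      unsigned int key = i * 1024;

      printf ("key %5u: %% 1024 -> %4u, %% 1031 -> %4u\n",
              key, key % 1024, key % 1031);
    }

  /* Every key maps to bucket 0 modulo 1024, but to the distinct
   * buckets 1024, 1017, 1010, 1003 and 996 modulo 1031. */
  return 0;
}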

Shoveling snow

The snow-shoveling I’ve been taking part in over the last couple of weeks is best described with a set of graphs:

[Graphs: openSUSE boot time improvements on netbook]

So far, we’ve been able to lop about 23 seconds - or 48% - off the time it takes to boot openSUSE 11.1 on this particular netbook, without sacrificing much in the way of functionality. It boots straight into GNOME and its usual trappings, including the panel, Nautilus, “slab” main menu, nm-applet, PackageKit updater, printing applet (written in Python…), CUPS, etc.

It’s important to note that this time is measured from the moment bootchart starts until everything settles and is ready to use, easily identified in the chart as the moment where CPU activity falls to the baseline of noise from bootchartd itself.

It’s also important to note that this is on a netbook with a slow CPU, slow-to-init X driver/graphics hardware and fast SSD I/O. I’m hearing a lot of numbers being bandied about these days, e.g. “distribution Foo boots in 10 seconds”, and these numbers are meaningless without hardware specifications and a list of features you get. GNOME delivers a different feature set from Xfce, and netbooks and workstations usually perform very differently. Then there are questions of flexibility; is the system open-ended? Can you get server features by just installing packages and configuring them?

IMO, openSUSE has had unacceptable boot times on workstations for a long time now. Hopefully these changes will make it into future releases, upstream where possible.

For more details, see the wiki page. Note that for various reasons I haven’t been able to keep the text up to date. The graphs are representative, though.

Evolution goodness (and some badness)

My laptop went south a couple of days ago, so I’m having to make do with a screen that is bigger but endowed with fewer pixels. This has been a source of frustration, especially in Evolution, where I depend on the efficiency afforded me by the tri-pane view. Crank down the resolution a bit, and it’s suddenly not so efficient - there isn’t enough space to display the subjects in the message list anymore. The problem is compounded by useless mailer “Re:” and mailing list prefixes.

So, since I don’t need to see the mailing list and reply status repeated for every single mail, I cooked up a little patch to trim the subjects in the message view. When applied, it makes available a new column in View -> Current View -> Define Views… -> Edit -> Fields Shown… -> Available Fields. This column implements the trimming, and can be used instead of the traditional Subject one:

[Screenshots: Evolution message list with traditional subjects and with trimmed subjects]
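
The patch itself lives in Evolution’s message list code, but the gist of the trimming is easy to sketch - here’s a hypothetical stand-in, not the actual patch:

#include <glib.h>
#include <string.h>

/* Hypothetical helper, not the actual patch: skip any leading
 * "Re:" prefixes and "[list-name]" tags, returning a pointer
 * into the original string. */
static const gchar *
trim_subject (const gchar *subject)
{
  gboolean trimmed;

  do
    {
      trimmed = FALSE;

      while (*subject == ' ')
        subject++;

      if (g_ascii_strncasecmp (subject, "Re:", 3) == 0)
        {
          subject += 3;
          trimmed = TRUE;
        }
      else if (*subject == '[')
        {
          const gchar *end = strchr (subject, ']');

          if (end)
            {
              subject = end + 1;
              trimmed = TRUE;
            }
        }
    }
  while (trimmed);

  return subject;
}

/* trim_subject ("[gtk-devel] Re: Re: GHashTable patch") yields
 * "GHashTable patch". */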

The patch applies to both Evolution 2.22 and 2.24, although unfortunately, a couple of nasty, new bugs are preventing me from running the latter. If you happen to be running openSUSE Factory like me, and Evolution 2.24 is preventing you from getting work done, you can get my unofficial 2.22 build for Factory from the build service. It includes the above patch as an added bonus.

GDK indirect rendering backend online

I just finished touching up the GDK backend for rendering toplevel windows to Cairo image surfaces that I briefly presented at GUADEC. It now compiles with the latest GTK+ trunk, and it even has a page with instructions on how to check it out and build it. There are some examples too.

With time, I can hopefully get this to production quality levels.

It’s not a picnic table, part 2

As a followup to my previous GHashTable post, Benjamin encouraged me to time the swfdec test suite - “make check” in the tests dir is consistently 4% faster with the patch.

Also, I have patched packages for our brave openSUSE Factory users: 1-click install

It’s not a picnic table

 
[Picture by Mike Bitzenhofer]
Because I’m just as cool as Luis.

For some time now, I’ve been considering the possibility of tightening the bolts on GHashTable, turning the idea over and over in my mind but not having had a chance to put it into practice. This can be pretty frustrating, since it’s hard for me to stop thinking about a salient idea like that without at least investing the work needed to convince myself that it’s not worth doing. It’s like an unidentified glimmer at the bottom of a lake. A silver hoard or an old bicycle? Probably the latter. But who knows?

So today I finally dove in and brought back some results.

First off, why optimize GHashTable? A hash table is a trivial data structure, so it should already be pretty optimal, right?

Let’s take a look.

As we know, any hash table will have collisions, where different keys hash to the same value. GHashTable resolves such collisions using chaining; each bucket points to the head of a linked list of elements that hash to that location. This is simple and robust, but it has some disadvantages: one is poor locality of reference (oh, the locality!), and another is that every insertion and removal has to allocate or free a node, contributing to memory fragmentation over time. You also lose some memory to the extra pointers stored (one per item plus one per chain).
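
In sketch form, a chained table looks something like this (illustrative types, not GHashTable’s actual ones):

#include <glib.h>

/* Illustrative sketch of chaining, not GHashTable's real layout.
 * Every item is a separately allocated node; collisions extend
 * the list hanging off the bucket. */
typedef struct ChainNode ChainNode;

struct ChainNode
{
  gpointer   key;
  gpointer   value;
  guint      key_hash;
  ChainNode *next;      /* the extra pointer per item */
};

typedef struct
{
  gint        size;
  ChainNode **buckets;  /* one head pointer per chain */
} ChainedTable;

/* Each lookup chases next pointers across the heap - hence the
 * poor locality - and every insert/remove hits the allocator. */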

To improve on this, we can use open addressing: Instead of storing pointers to lists of items, we store the items in the table itself. Whenever there’s a collision, we use an algorithm to determine another suitable slot for the item, and if that slot is also occupied, we run the algorithm again until an empty slot is found. This is called probing and is obviously a more complicated business, considering that we want a) a relatively simple algorithm, b) a predictably low amount of probing and c) the ability to quickly remove items without affecting the probe sequences for other items in the table.

The simplest possible probing algorithm is called linear probing, and consists of just looking at the next slot in the table (modulo the table size) until you find what you’re looking for. Since this also has the best locality, it’s the first approach I tried. Unfortunately, due to clustering, the performance degrades too much before you reach a satisfactory load factor, even with good locality.
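
A lookup then becomes a loop over consecutive slots. Here’s a minimal sketch, assuming a power-of-two table for a cheap modulo and plain pointer keys - the real thing would compare cached hashes and use the table’s equality function:

#include <glib.h>

/* Minimal linear-probing sketch: return the index where key
 * lives, or the empty slot where it would be inserted.  slots
 * has mask + 1 entries; NULL marks an empty slot.  Assumes the
 * table is never completely full. */
static guint
linear_find (gpointer *slots, guint mask, gpointer key, guint hash)
{
  guint index = hash & mask;

  while (slots [index] != NULL && slots [index] != key)
    index = (index + 1) & mask;  /* try the next slot, wrapping */

  return index;
}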

Quadratic probing is the next level of complexity and the one I settled on. Here, you use a quadratic polynomial to calculate the next location, which means you take bigger and bigger steps, avoiding clustering. This also affects cache performance somewhat, but considering that the first few steps are small, it’s still much better than chaining.
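
The difference from linear probing is just the step calculation. One common formulation - consistent with the description above, though not necessarily identical to the patch - takes steps of 1, 2, 3, ..., so probe i lands at a triangular-number offset from the initial bucket:

#include <glib.h>

/* Triangular-step probing sketch: successive offsets from the
 * initial bucket are 1, 3, 6, 10, ... - i.e. probe i sits
 * i * (i + 1) / 2 slots past the start, modulo the table size. */
static guint
quadratic_next (guint index, guint *step, guint mask)
{
  *step += 1;
  return (index + *step) & mask;
}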

Since an element can be part of many probe sequences, you have to leave a special marker - commonly called a tombstone - when you remove it, so lookups don’t terminate prematurely. This isn’t a big problem, since tombstones can be reused, but you have to clean them up now and then so you don’t accumulate too many - according to my benchmarks, such cleanups will almost never be needed in GNOME apps, so they shouldn’t affect performance much.
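
Here’s a simplified sketch of how tombstones interact with lookup and insertion (pointer-compared keys again; the real code would also check cached hashes and the user’s equality function):

#include <glib.h>

typedef enum { SLOT_EMPTY, SLOT_TOMBSTONE, SLOT_USED } SlotState;

typedef struct
{
  SlotState state;
  gpointer  key;
} Slot;

/* Probe until we hit the key or a truly empty slot.  Tombstones
 * are skipped on the way, but the first one seen is remembered
 * so an insertion can recycle it.  Assumes the table is never
 * completely full. */
static guint
find_slot (Slot *slots, guint mask, gpointer key, guint hash)
{
  guint index = hash & mask;
  guint step = 0;
  gint first_tombstone = -1;

  while (slots [index].state != SLOT_EMPTY)
    {
      if (slots [index].state == SLOT_TOMBSTONE)
        {
          if (first_tombstone < 0)
            first_tombstone = (gint) index;
        }
      else if (slots [index].key == key)
        {
          return index;  /* existing key found */
        }

      step++;
      index = (index + step) & mask;
    }

  return first_tombstone >= 0 ? (guint) first_tombstone : index;
}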

A bigger problem is that in order to guarantee that a probe will find an empty slot at load factors greater than 0.5, the table’s size has to be a power of two, which means the modulo operation will simply discard the high bits of the computed hash values. GHashTable currently uses prime-sized tables, which yield a better distribution with poor hash functions. On the other hand, taking the modulo of a power of two is a logical AND with a mask, which is much faster than the division implied by a prime modulo. Anyway - again, I couldn’t find any existing use cases where this caused performance problems.
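
Incidentally, the power-of-two requirement buys a neat guarantee: with triangular steps, the probe sequence visits every slot of the table exactly once before repeating. A quick standalone check:

#include <stdio.h>

int
main (void)
{
  unsigned int size = 16, mask = size - 1;
  unsigned int index = 0, step, visited = 0;

  for (step = 0; step < size; step++)
    {
      visited |= 1u << index;
      index = (index + step + 1) & mask;
    }

  /* Prints 0xffff: 16 probes marked all 16 slots, so each slot
   * was visited exactly once. */
  printf ("visited = 0x%04x\n", visited);
  return 0;
}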

Which brings me to the fact that this exercise would be meaningless without proper benchmarking. Now, I admit that due to a lack of time and good tools, I haven’t been able to measure the effect on fragmentation, but it’s fairly obvious that allocating one big chunk of memory is better than one big chunk plus a variable number of small chunks.

Apart from reducing fragmentation, which was my primary goal, it would be nice if speed and memory use weren’t affected too badly. And that, at least, I’ve measured.

Since synthetic tests are rarely representative, my test case was “Starting up Evolution offline with 3 gigabytes of local mail and 13 vfolders, waiting for it to finish setting up the vfolders, then quitting by closing the window with no interaction otherwise”. I measured memory use before quitting, using pmap, and total CPU time spent, using time. I realize you can’t repeat this test, it being tied to my local mail setup, but tests are rarely repeatable across computers anyway, since they have differing specs and configurations. Get the patch from the bug I filed, and run your own tests. I’d love to hear about your results!

Comparing the best out of 5 Evolution runs with old and new hash table code yielded the following numbers:

  • Time on CPU (user+sys): 956.3s, down from 1023.1s (6.5% improvement)
  • Private, writable memory: 623.5MiB, down from 645.8MiB (3.5% improvement)
  • Heap size ([heap] mapping): 310.6MiB, down from 350.0MiB (11.3% improvement)

It looks modest, but considering hash tables account for only a small part of Evolution’s resource usage, I’m rather pleased with this. It cuts memory use of most GNOME processes by a slight amount - for instance, the GNOME calculator in scientific mode gave up 140KiB of private, writable memory. You may see greater returns in apps that use Glade, because it keeps widget lookup tables around.

End of hack week

The second Novell hack week is at an end, and I’ve set up a public Trac instance for Sterling. Development will go on there. I’ll devote some ITO (“innovation time off”) to it next time I get a chance.

Novell hack week, day 4

Four days down. How time flies when you’re having fun. I must admit that Sterling is barely past the point where it creates a panel applet that does nothing - but on the other hand, I’ve absorbed a bit of how-to on C# and Mono tools. It’s rather pleasant so far.

Apart from the panel applet business, I’ve gotten the build framework for GConf schemas, gettext, icons and UI files out of the way, so I think it’s fair to say it’s getting ready to administer kicks to the proverbial hindside. I’ll have a public Git repo up sometime tomorrow.

[Screenshot: Sterling panel applet]
Hey, it’s a start.

Novell hack week, day 3

Federico and Tambet came and hacked in addition to Mike and me. Got some stuff done, but not as much as I’d have liked. The arrachera I had for lunch sort of knocked me out.

I’d better get some sleep now. More tomorrow.