Archive for the 'Positive' Category
My sister made me a wonderful laptop case for my birthday - complete with zipper and everything! Thanks, sis.
It’s gratifying to see GHashTable faring not-too-badly against an assortment of other hash table implementations in Nick Welch’s excellent benchmarks. I did some work on this previously, but there’s at least one thing that can still be done to reduce its memory footprint on 64-bit architectures. The table is a big array of structs that look like this:
The key_hash is useful, as it lets us skip over mismatches and resize the hash table quickly - and on a 32-bit arch, it adds only 4 bytes to every entry, for a total of 12 bytes. However, on a 64-bit arch, it causes entries to be padded so that each entry starts on a multiple of 8 bytes. That wastefulness can be remedied by packing key_hash values in a separate array - as a bonus, entry size becomes a power of two, which means offsets into the array can be calculated using a single shift. On the downside, we’ll incur the added overhead of maintaining two arrays and accessing an extra cache line for each lookup. I suspect it’s still worth it, though.
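To make the trade-off concrete, here is a minimal sketch of the two layouts. It is not GLib's actual code: the type and function names (CombinedEntry, PackedEntry, SplitTable, split_slot_matches) are made up for illustration, and I'm assuming two pointers plus a 32-bit hash per entry, which matches the 12-byte figure above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combined layout (assumed shape): the hash lives next to key and value.
 * With 8-byte pointers, the compiler pads the struct so that the next
 * entry in an array starts on an 8-byte boundary. */
typedef struct {
    void    *key;
    void    *value;
    uint32_t key_hash;
} CombinedEntry;

/* Split layout: key/value pairs in one array, hashes in a second one. */
typedef struct {
    void *key;
    void *value;
} PackedEntry;

typedef struct {
    PackedEntry *entries;  /* size elements */
    uint32_t    *hashes;   /* size elements, same indices as entries */
    size_t       size;
} SplitTable;

/* Memory used per slot in each layout, including any padding. */
static size_t combined_bytes_per_slot(void)
{
    return sizeof(CombinedEntry);
}

static size_t split_bytes_per_slot(void)
{
    return sizeof(PackedEntry) + sizeof(uint32_t);
}

/* Check one slot: compare the cheap hash first and only touch the
 * entries array (a second cache line) when the hashes agree. */
static int split_slot_matches(const SplitTable *t, size_t i,
                              uint32_t hash, const void *key)
{
    assert(i < t->size);
    return t->hashes[i] == hash && t->entries[i].key == key;
}
```

On a typical 64-bit system this should come out at 24 bytes per slot for the combined layout against 16 + 4 = 20 for the split one, while on 32-bit both are 12. The hash-first comparison in split_slot_matches is where the extra cache line access shows up.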
A couple of open questions/comments on the benchmarks themselves:
- Google’s dense_hash_map is suspiciously fast at integer deletion - is it shrinking the hash table at all, re-inserting the remaining items as it should? Or does it have some other trick up its sleeve?
- How robust are the different strategies with respect to poorly distributed keys? E.g. N*2^M spaced integers: 1024, 2048, 3072, 4096, etc.
- How about poor hash functions supplied by the API user? GHashTable attempts to tackle this by calculating the initial array index modulo a prime number, before applying a quadratic modulo on subsequent probes (faster, and required for quadratic probing).
- What’s the per-table overhead? How much memory will, say, some 100,000 tables with <100 items each consume? This is not uncommon in practice - nested hashes are often used to store complex properties, and given suitably large working sets, this can become a significant factor for some programs.
- Are the approaches affected by memory fragmentation? Do they cause it? This is hard to measure; maybe many tables could be grown simultaneously in the same address space.
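The questions about badly spaced keys and the prime-modulo trick can be illustrated with a small sketch. It is not GHashTable's code: the table size of 1024, the prime 1021, the identity hash and all function names are assumptions chosen for the example, and I'm reading the "quadratic modulo" above as a power-of-two mask with triangular probe steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table geometry: 1024 slots; 1021 is the largest prime
 * below 1024. */
enum { TABLE_SIZE = 1024, TABLE_MASK = TABLE_SIZE - 1, TABLE_PRIME = 1021 };

/* First slot when the hash is simply masked to the table size. */
static uint32_t first_index_mask(uint32_t hash)
{
    return hash & TABLE_MASK;
}

/* First slot when the hash is reduced modulo a prime instead. */
static uint32_t first_index_prime(uint32_t hash)
{
    return hash % TABLE_PRIME;
}

/* Slot visited on the n-th probe (n = 0 is the first slot): start at
 * the prime-modulo index, then advance by 1, 2, 3, ... under the mask. */
static uint32_t probe_index(uint32_t hash, uint32_t n)
{
    uint32_t index = first_index_prime(hash);
    for (uint32_t step = 1; step <= n; step++)
        index = (index + step) & TABLE_MASK;
    return index;
}

/* Count distinct first slots for the keys spacing, 2*spacing, ...,
 * count*spacing, using the key itself as its hash. */
static size_t distinct_first_slots(uint32_t spacing, size_t count,
                                   int use_prime)
{
    unsigned char seen[TABLE_SIZE] = {0};
    size_t distinct = 0;
    for (size_t k = 1; k <= count; k++) {
        uint32_t hash = (uint32_t)k * spacing;
        uint32_t idx = use_prime ? first_index_prime(hash)
                                 : first_index_mask(hash);
        assert(idx < TABLE_SIZE);
        if (!seen[idx]) {
            seen[idx] = 1;
            distinct++;
        }
    }
    return distinct;
}

/* Return 1 if TABLE_SIZE consecutive probes visit every slot once. */
static int probes_cover_table(uint32_t hash)
{
    unsigned char seen[TABLE_SIZE] = {0};
    uint32_t index = first_index_prime(hash);
    size_t visited = 0;
    for (uint32_t step = 1; step <= TABLE_SIZE; step++) {
        if (!seen[index]) {
            seen[index] = 1;
            visited++;
        }
        index = (index + step) & TABLE_MASK;
    }
    return visited == TABLE_SIZE;
}
```

With the keys 1024, 2048, 3072, ... a plain mask sends every key to slot 0, whereas the prime modulo spreads the first 100 of them over 100 different slots. The triangular steps only reach every slot because the table size is a power of two, which is why the mask is needed for the later probes.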
Since we’re leaving the country for good (or at least for a very long time), we thought it’d be nice to do the full-on tourist thing and take a bunch of pictures - actually the only time we’ve done so in the ten years I’ve lived here - taking a little piece of Mexico with us.
So Maru and I just got back from two weeks of non-essential travel. We’ve had an excellent time, spending the first week in the northern states of Chihuahua and Sinaloa - taking the Chepe train through Copper Canyon territory and reaching an altitude of about 2600m - and the second week on the southern island of Cozumel, scuba diving down to -8m.
The influenza outbreak took us by surprise - we’ve passed through the Mexico City airport three times since the 18th of April, and hope to do so again in another couple of days - but we are apparently both healthy at this point. It’d be a bummer if our flight out gets cancelled or - even worse - if we’re quarantined in Europe, though. Fortunately, the way things are looking now, there isn’t a huge chance of that happening.
On the upside, we had Cozumel almost to ourselves (we were referred to as “the only two tourists left on the island” at least once), as people kept leaving and no more were arriving. I feel bad for anyone working in the tourist business here, though, especially our friend Hilda who lent us her charming open VW beetle so we could cruise around the island in style.
One of the rivers winding through the Copper Canyon
SUSE rocks (I suspect Bryen will love this)
Ghost island Cozumel
Can you believe they actually let us through the security checkpoints dressed like this?
The snow-shoveling I’ve been taking part in over the last couple of weeks is best described with a set of graphs:
So far, we’ve been able to lop about 23 seconds - or 48% - off the time it takes to boot openSUSE 11.1 on this particular netbook, without sacrificing much in the way of functionality. It boots straight into GNOME and its usual trappings, including the panel, Nautilus, “slab” main menu, nm-applet, PackageKit updater, printing applet (written in Python…), CUPS, etc.
It’s important to note that this time is measured from the moment bootchart starts until everything settles and is ready to use, easily identified in the chart as the moment where CPU activity falls to the baseline of noise from bootchartd itself.
It’s also important to note that this is on a netbook with a slow CPU, slow-to-init X driver/graphics hardware and fast SSD I/O. I’m hearing a lot of numbers being bandied about these days, e.g. “distribution Foo boots in 10 seconds”, and these numbers are meaningless without hardware specifications and a list of features you get. GNOME delivers a different feature set from Xfce, and netbooks and workstations usually perform very differently. Then there are questions of flexibility; is the system open-ended? Can you get server features by just installing packages and configuring them?
IMO, openSUSE has had unacceptable boot times on workstations for a long time now. Hopefully these changes will make it into future releases, upstream where possible.
For more details, see the wiki page. Note that for various reasons I haven’t been able to keep the text up to date. The graphs are representative, though.
My talk, La comunidad GNOME para principiantes (The GNOME community for beginners), seems to have gone over well here at ENLi 2008 (the 2008 National Linux Meeting in Puebla, Mexico), with a big audience and interesting followup questions. The slides are available as a collection of plain PNG and JPEG images in a zip archive (use the link above).
I’m having an excellent time. Will post some pictures from the conference later.
My wonderful audience
I clearly didn’t bring enough openSUSE discs
My laptop went south a couple of days ago, so I’m having to make do with a screen that is bigger but endowed with fewer pixels. This has been a source of frustration, especially in Evolution, where I depend on the efficiency afforded me by the tri-pane view. Crank down the resolution a bit, and it’s suddenly not so efficient - there isn’t enough space to display the subjects in the message list anymore. The problem is compounded by useless mailer “Re:” and mailing list prefixes.
So, since I don’t need to see the mailing list and reply status repeated for every single mail, I cooked up a little patch to trim the subjects in the message view. When applied, it makes available a new column in View -> Current View -> Define Views… -> Edit -> Fields Shown… -> Available Fields. This column implements the trimming, and can be used instead of the traditional Subject one:
The patch applies to both Evolution 2.22 and 2.24, although unfortunately, a couple of nasty, new bugs are preventing me from running the latter. If you happen to be running openSUSE Factory like me, and Evolution 2.24 is preventing you from getting work done, you can get my unofficial 2.22 build for Factory from the build service. It includes the above patch as an added bonus.
A friend of mine, Vegard Munthe, works for FAIR, an aid organization that ships used but working computers from rich (or industrialized, or first world, or whatever you want to call it) countries to poorer countries for re-use in school labs there. As part of the deal, the computers are shipped back for reprocessing when they are no longer working, to avoid them piling up and causing all kinds of environmental problems. Not so long ago they received their first return shipment - according to Vegard, getting the permits to ship and import what basically amounts to a pile of toxic waste was quite the challenge.
Fun fun fun! Congratulations to Vegard & crew on this important milestone.