GVFS Benchmarking

For Novell’s hack week, I wrote some benchmarking code for GVFS. It may not sound that exciting, but performance interests me, and it needed to be done. So far, the results are much better than I feared – for remote URIs, requests are proxied through a daemon over a D-Bus bus, and that had me worried.

In my particular setup, creating a file on a remote SMB share, filling it with 50MB of data and reading it back took 16% longer using GVFS calls than bare POSIX on a kernel mount, and used about twice as much CPU. As expected, for local FS operations the performance is pretty much equal.
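
For the curious, the GVFS half of the test boils down to something like the sketch below. It’s written against the GIO-style GFile API, so the exact client calls in the GVFS tree may differ a bit, and error handling and timing are left out; the POSIX half is the same loop with open(), write() and read() on the kernel mount.

/* Sketch of the GVFS side of the benchmark, using the GIO-style GFile
 * API. Error handling, timing and CPU accounting are omitted. */
#include <gio/gio.h>
#include <string.h>

#define CHUNK_SIZE (64 * 1024)
#define TOTAL_SIZE (50 * 1024 * 1024)

static void
create_fill_and_read_back (const gchar *uri)
{
  GFile *file;
  GFileOutputStream *out;
  GFileInputStream *in;
  gchar buf [CHUNK_SIZE];
  gsize total;

  memset (buf, 'x', CHUNK_SIZE);
  file = g_file_new_for_uri (uri);  /* e.g. "smb://server/share/benchmark.dat" */

  /* Create the file and fill it with 50MB of data */
  out = g_file_replace (file, NULL, FALSE, G_FILE_CREATE_NONE, NULL, NULL);
  for (total = 0; total < TOTAL_SIZE; total += CHUNK_SIZE)
    g_output_stream_write_all (G_OUTPUT_STREAM (out), buf, CHUNK_SIZE,
                               NULL, NULL, NULL);
  g_output_stream_close (G_OUTPUT_STREAM (out), NULL, NULL);
  g_object_unref (out);

  /* Read it all back */
  in = g_file_read (file, NULL, NULL);
  while (g_input_stream_read (G_INPUT_STREAM (in), buf, CHUNK_SIZE,
                              NULL, NULL) > 0)
    ;
  g_object_unref (in);
  g_object_unref (file);
}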

There’s also a many-small-files test, in which I suspect GVFS will fare a lot worse, but I haven’t been able to make a good comparison due to some incomplete code paths in GVFS.

The code is on my GVFS branch in the new ‘test’ directory.

GVFS Progress

Alexander Larsson has been hacking like a whirlwind, bringing us the next generation in VFS services for the desktop, GVFS. By now, a lot of the planned functionality is done, and we even have a partially done FUSE frontend which will let legacy apps that can’t or won’t link with GVFS access the user’s mounts under ~/.vfs/.

Alex’ master repository does not have the FUSE module yet, so you can get it from my repository in the meantime.

Unfortunately, the SMB backend is pretty flaky, frequently locking up when reading directory information or file data from remote shares. So if you’re a debugging hotshot and you want to help bring desktop file browsing to the next level, here’s your chance!

GVFS is pretty easy to set up and test:

  • Clone, build, install to $prefix.
  • Make sure you have a working D-Bus setup.
  • Make sure you have the FUSE kernel module installed.
  • mkdir ~/.vfs
  • $prefix/libexec/gvfs-daemon
  • $prefix/libexec/gvfs-fuse-daemon -f ~/.vfs
  • $prefix/bin/gvfs-mount smb://server/share

The new mount should show up under ~/.vfs/, and you can explore it from there.
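
And since the mount is exposed through FUSE, plain POSIX code can get at it too, with no GVFS linkage at all. A trivial listing looks like the sketch below; note that the path is just a stand-in, since the exact directory name under ~/.vfs/ depends on how the FUSE daemon maps your mount.

/* List a GVFS mount through the FUSE frontend using nothing but POSIX.
 * The path below is a placeholder; use whatever directory shows up
 * under ~/.vfs/ for your share. */
#include <stdio.h>
#include <dirent.h>

int
main (void)
{
  const char *path = "/home/you/.vfs/server-share";  /* placeholder */
  struct dirent *entry;
  DIR *dir;

  dir = opendir (path);
  if (!dir)
    {
      perror ("opendir");
      return 1;
    }

  while ((entry = readdir (dir)) != NULL)
    printf ("%s\n", entry->d_name);

  closedir (dir);
  return 0;
}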

Priorities

In addition to the kernel’s per-process CPU and I/O priorities, it would be nice to have memory residency priorities. That way, we could hint the kernel into keeping proportionally more pages of latency-sensitive desktop processes in RAM – like the main menu, the taskbar, the file manager and maybe some applets. Disk cache could have its own priority, a la the swappiness setting in /proc/sys/vm/swappiness. It could be a practical* way to mitigate the “my main menu takes several seconds from click to paint” problem.

* Since I’m not a kernel dev, I’m just guessing here.

Well Situated, Friendly Neighborhood

Ben has some good news for anyone interested in reducing the memory usage of Linux programs. Smaps is great. But does it tell the whole truth?

For PCs with (basically) limitless swap space, it’s possible that it doesn’t even come close. The reason: Under memory pressure, only memory pages that are actually being accessed will be kept in main memory. These pages are usually 4kB each. So how much of a hog your process will be is determined by its access pattern. Which brings us to our dear old friend, Locality. It’s easy enough to deduce that a process with many tiny, frequently accessed memory blocks sprinkled in between “low urgency” blocks over a lot of pages will be a disaster compared to a process with frequently-accessed data allocated close together.
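
To put the same idea in code terms, here’s a toy sketch. It’s purely illustrative (it assumes 4kB pages and a simple, linearly growing heap), but it shows how the same 16000 bytes of hot data can pin either ~1000 pages or ~4 pages depending on layout.

/* Toy illustration, not a measurement tool: the same amount of hot data
 * can keep ~1000 pages or ~4 pages resident depending on layout.
 * Assumes 4kB pages and a simple, linearly growing heap. */
#include <stdio.h>
#include <stdlib.h>

#define N_HOT     1000
#define HOT_SIZE  16      /* bytes of frequently accessed data per block */
#define COLD_SIZE 8192    /* "low urgency" data allocated in between */
#define PAGE_SIZE 4096

int
main (void)
{
  char *scattered [N_HOT];
  char *packed;
  char *cold = NULL;
  int i;

  /* Scattered: every hot block is separated by a big cold allocation,
   * so each one typically lands on its own page. Touching all the hot
   * data keeps roughly N_HOT pages resident. */
  for (i = 0; i < N_HOT; i++)
    {
      scattered [i] = malloc (HOT_SIZE);
      cold = malloc (COLD_SIZE);  /* spacer we never look at again */
    }

  /* Packed: the same hot data in one contiguous block spans only a
   * handful of pages. */
  packed = malloc (N_HOT * HOT_SIZE);

  printf ("scattered hot working set: ~%d pages\n", N_HOT);
  printf ("packed hot working set:    ~%d pages\n",
          (N_HOT * HOT_SIZE + PAGE_SIZE - 1) / PAGE_SIZE);

  (void) cold;
  (void) packed;
  return 0;
}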

So, it’s possible that an even better measure of a process’ memory badness would be a running count of “number of different pages accessed in the last 10 seconds”, possibly with a falloff function (think load average). Maybe this could be done with a Valgrind tool, a la Cachegrind but for VM pages instead of CPU cache lines? Maybe it could even tell you where the N most frequently accessed memory blocks were allocated, and for each block, provide a list of locations where it was accessed (sorted by time last accessed or by frequency).

That sounds like a fun project!

Session-wide valgrind

Now that we know that even Evolution runs under Valgrind, we need a bigger challenge. So, how about the entire GNOME session?

I’ve written a couple of tiny scripts that lower the threshold to doing this. They take care of properly launching your session in Valgrind, collecting (and filtering slightly) the resulting logs and cleaning up (important, since you can get lots of lingering valgrind processes if you don’t).

I made an openSUSE package (for the 10.2 beta, possibly 10.1) that integrates this functionality as a pair of standard GDM sessions that you can select on login. Just click “Session” in the login screen and select one of the GNOME Valgrind ones, then log in using that. When you log out, it’ll generate a log file in "$HOME/valgrind-session.$N".

The generated reports from a typical session say something about our code quality. The leak report is especially interesting: with everything but “definite leaks” (i.e. allocated blocks to which no pointers exist, neither to the beginning nor to an internal offset) filtered out, the log file is about 2MB for a login + immediate logout here. Even though there are many repetitions and fairly harmless leaks, there are some serious-looking ones in there:

[~] grep "definitely" valgrind-session.0 | wc -l
1104

Just install and restart GDM. At least one billion bytes of RAM recommended to run.

Not everything works in an instrumented session (su, sudo definitely don’t, and I’ve had problems with “recent items” and logging out using the slab – remedied by adding a logout button to your panel), but overall it’s not bad. You can browse the web, read mail, use Nautilus, customize your desktop, launch applications (which will themselves be instrumented) etc.

If you like proactive bug fixing (and have fairly powerful hardware), I encourage you to check it out and maybe even improve on the concept (there’s a lot that could be done).

Evolution is da logic bomb

You know the story. Random crashes preventing you from reading your mail all morning. This time, though, there’s a twist (and a moral).

The twist is that instead of complaining on IRC – ok, I mean in addition to complaining on IRC – I actually ran the crashy bugger through valgrind, much like you would run a zombie head through a blender. Sifting through the resulting goop provided me with enough information to file patches for buffer overrun 1, buffer overrun 2 and a bug of the theoretical variety. All three bugs have been around for a really long time (several years).

As for the moral:

1) Valgrind works extremely well these days, even on large and complex programs like Evolution. It is nothing short of a masterpiece. It did not interfere with operation apart from the expected slowdown, and pinpointed the bug I was looking for (and then some) in a matter of minutes. It is highly recommended that programs be valground regularly with a “typical use” regimen, even if they appear to work fine. At the very least, this should be on all maintainers’ pre-release checklists.

2) If you’re a programmer, and a particular program is misbehaving for you, take the time to actually look for the bug. Valgrind makes it easy, and you’ll find trivial bugs even in large and complex programs. So there’s no reason to be intimidated. Even if you can’t immediately say what’s causing the problem, valgrind logs make for valuable bugzilla attachments.

3) Valgrind’s performance isn’t too bad, but it’s still the best excuse today for getting a faster computer. Start using it so you can justify the expense.

4) With a little time investment, Evolution is totally salvageable. If you were thinking of giving up on it, don’t. Version 2.8 has a tri-pane mail view and global search, making it an awesome mailer.

Flow in CVS

I just imported Flow to GNOME CVS (module “flow”). It has a fairly detailed HACKING file for those interested.

The low-level I/O is done (for Unix), modulo a little polish. Mid-level stream fundamentals are mostly done, but I need to write stream elements encapsulating all the low-level features. Work on the high-level easy-to-use interfaces has not yet started.

FIXME: Article summary here

How many FIXMEs do the various projects in my jhbuild directory have? Curiosity got the better of me today:

hpj [~/work/jhbuild/gnome-2.16] for PKG in $(find . -maxdepth 1 -mindepth 1 -type d | cut -b 3- | sort); do N_FIXME=$(find $PKG -iregex '.*\.\(c\|h\|py\|cpp\|cc\|hh\|cs\)$' -exec grep -i FIXME {} \; | wc -l); printf "% 5d %s\n" $N_FIXME $PKG; done | sort -n -r

718 evolution
444 evolution-data-server
429 gtk+
151 gstreamer
148 gtkhtml
147 nautilus
146 gst-plugins-base
125 gnome-vfs
123 ORBit2
107 gst-plugins-good
104 mozilla
89 metacity

…and so on. Try it yourself!

I expect these come in all difficulty levels – from the usually quite easy “should I free this?” to the harder “oh my god I’ve painted myself into a corner and need to re-architect a large chunk of the program”. Some will be worth fixing, others will not. The good thing is that they pinpoint potential problems for free, without having to run a debugger. Something for those oh-so-frequent idle moments?

Garbage in, garbage out

Or with verbs: Connect. Send. Receive.

For a while now, I’ve been of the opinion that we need a generic streaming/networking library for the GNOME core platform, for two reasons:

1. The Unix I/O API is hard to use correctly, and many applications get it wrong.

For instance, a “readable” condition followed by a zero-byte read() means that a socket was disconnected in an orderly fashion. This is not good API.
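
In code, that idiom looks something like this (a minimal sketch, with most of the error handling trimmed):

#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Returns 0 on orderly disconnect, -1 on a real error, 1 otherwise. */
static int
check_socket (int fd)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  char buf [4096];
  ssize_t n;

  if (poll (&pfd, 1, -1) < 0)
    return -1;

  n = read (fd, buf, sizeof (buf));
  if (n == 0)
    {
      /* "Readable" followed by zero bytes: the peer closed the
       * connection cleanly. Not an error, but easy to misread as one. */
      return 0;
    }
  if (n < 0)
    {
      if (errno == EINTR || errno == EAGAIN)
        return 1;   /* harmless; poll and try again later */
      return -1;    /* a real error */
    }

  /* n > 0: actual data arrived in buf */
  return 1;
}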

Problems caused by this API include the following (there’s a sketch of the defensive code some of them call for right after the list):

  • Blocking the UI while waiting for a network or file operation. People don’t want to implement a buffering async pump every single time. By the way, did you know that local file I/O always blocks on Linux (and probably most other Unices), even if you set O_NONBLOCK?
  • Spinning on a slow, non-blocking FD, pegging the CPU.
  • Failure to save errno early. This leads to the infamous “Connection failed: Success” and other equally confusing error messages.
  • Not checking for the EINTR non-error, terminating a perfectly good FD and potentially losing data.
  • Writing to a socket that was closed in the remote end, resulting in an unhandled SIGPIPE and terminating the process.
  • Not checking the return value from close(). Very few apps do this. See the close(2) man page for why you should.
  • Closing an already closed FD. Since nobody checks the return value from close(), this goes largely undetected until the code is used in a threaded app. Then it will cause heisenbugs when the FD is re-used by another thread between the first and second close() statements. Unless you know what you’re looking for, this is incredibly hard to pin down.
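
To make a few of those concrete, here is roughly what a single careful write-and-close on a non-blocking socket ends up looking like once you try to dodge the traps above. It’s only a sketch, not library code; send() with MSG_NOSIGNAL is a Linux-ism used here to sidestep SIGPIPE, and elsewhere you’d have to ignore the signal instead.

#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

static int
careful_send_and_close (int fd, const char *buf, size_t len)
{
  size_t written = 0;

  while (written < len)
    {
      ssize_t n = send (fd, buf + written, len - written, MSG_NOSIGNAL);

      if (n < 0)
        {
          int saved_errno = errno;  /* save it before anything clobbers it */

          if (saved_errno == EINTR)
            continue;               /* not an error; just try again */

          if (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK)
            {
              /* Don't spin: wait until the fd is writable again */
              struct pollfd pfd = { .fd = fd, .events = POLLOUT };
              poll (&pfd, 1, -1);
              continue;
            }

          fprintf (stderr, "send failed: %s\n", strerror (saved_errno));
          close (fd);
          return -1;
        }

      written += n;
    }

  /* close() can report errors from deferred writes; check it, and never
   * call it twice on the same fd. */
  if (close (fd) != 0)
    {
      fprintf (stderr, "close failed: %s\n", strerror (errno));
      return -1;
    }

  return 0;
}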

It would be nice to have a single place in which to fix this. Also, we can provide Windows portability for applications that insist on this (transparently handling the fact that on Windows, files and sockets are very different things).

2. There is too little code reuse in stream implementations.

We need a way to compartmentalize stream elements in C, so that they can be re-combined to achieve specific tasks. For instance, you may want to add a rate control/throttling element to a file transfer protocol, or construct a pipeline of exec() subprocesses – or you may want to do something more adventurous, like SSL + uuencode + unencrypted IM protocol. We can also provide policy to help ease the pain of shunting data back and forth. If you’ve written code that asynchronously forwards data from a pipe to another, handling slow consumer/fast producer and vice versa, you know how ridiculously complex such code can get. What we want (or at least, what I want) is the ability to just connect the black boxes and call it a day.

GStreamer has had success with such an element-based architecture, and I think it makes sense in the more generic case too.
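
For a flavor of what element composition buys you, here’s the connect-the-boxes idea as it already exists in GStreamer (1.x API): a trivial pipeline that copies a file through a queue. A generic library would let you do the same kind of thing with arbitrary byte streams, sockets and protocols rather than media.

/* Element composition, GStreamer-style: filesrc -> queue -> filesink.
 * Copies input.dat to output.dat; the framework handles the buffering
 * and the producer/consumer shunting. */
#include <gst/gst.h>

int
main (int argc, char **argv)
{
  GstElement *pipeline, *src, *queue, *sink;
  GstBus *bus;
  GstMessage *msg;

  gst_init (&argc, &argv);

  pipeline = gst_pipeline_new ("copy");
  src      = gst_element_factory_make ("filesrc",  NULL);
  queue    = gst_element_factory_make ("queue",    NULL);
  sink     = gst_element_factory_make ("filesink", NULL);

  g_object_set (src,  "location", "input.dat",  NULL);
  g_object_set (sink, "location", "output.dat", NULL);

  /* Connect the black boxes */
  gst_bin_add_many (GST_BIN (pipeline), src, queue, sink, NULL);
  gst_element_link_many (src, queue, sink, NULL);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  /* Wait for the pipeline to finish (or fail) */
  bus = gst_element_get_bus (pipeline);
  msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
                                    GST_MESSAGE_EOS | GST_MESSAGE_ERROR);
  gst_message_unref (msg);
  gst_object_unref (bus);

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (pipeline);
  return 0;
}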

A solution?

I’m working on a library to resolve this situation, with the working title “Flow”. Currently it’s about 1/3 finished, with 10k LOC written. It depends on GLib, GObject and GThread. I’m reusing old network code of mine that is known to compile and run on Linux, FreeBSD and Windows, which should ease portability once the initial API is done.

You can get more details about the implementation I’m aiming for, as well as a preliminary tarball with totally unfinished code in it (but it passes distcheck!). If the response is positive, I’ll transfer it to the GNOME wiki.

I’m aware of other efforts, like gnet and gnetwork/gio, but for various reasons I don’t like them. gnet is too naive, and although James is a great guy, I disagree with him about some of his gio/gnetwork design goals – specifically the decisions to make elements always two-way and to implement loadable modules with an XML registry. So rather than spend my time arguing, I’m doing this. I hope I’m not offending anyone too much by doing so.