The GNOME project turned 23 this year, and despite equally persistent rumors to the contrary, it's still alive and kicking.
Just how alive, though? All I know is this: Where the topic of GNOME's health goes, accurate data rarely follows. Of course, there is data — lots of it in fact, in public source code repositories. Though flawed in many ways, it allows us to make comparisons to the past — and maybe predictions for the future: Are a few organizations carrying most of the workload, making them critical points of failure? Are new contributors able to pick up the slack from those who leave? Is the project graying (i.e. increasingly dominated by veterans)?
In one of my occasional fits of hubris, I set out to process this data to see if I could shake out anything meaningful. I'm usually fine with just satisfying my own curiosity and leaving it at that, but it's one of those times where the results seem interesting enough for a blog post. So here we are.
I'm going to lead with the nice graphs and follow on with a section on methodology. The latter is long, boring, and mandatory reading.
The stacked histogram above shows the number of contributors who touched the project on a yearly basis. Each contributor is assigned to a generational cohort based on the year of their first contribution. The cohorts tend to shrink over time as people leave.
There's a special "drive-by" cohort (in a fetching shade of off-white) for contributors who were only briefly involved, meaning all their activity fits in a three-month window. It's a big group. In a typical year, it numbers 200-400 persons who were not seen before or since. Most of them contribute a single commit.
According to this, GNOME peaked at slightly above 1,400 contributors in 2010 and went into decline with the GNOME 3.0 release the following year. However, 2020 saw the most contributors in a long time, even with preliminary data — there's still two weeks to go. Who knows if it's an anomaly or not. It's been an atypical year across the board.
This is the same histogram, but with per-month bins. There's a clear periodicity caused by the semiannual release cycle. The peak month was March 2011, right before the GNOME 3.0 release. About 450 contributors got involved that month.
The drive-by cohort is relatively smaller on a monthly basis. This makes sense, as it has little overlap from month to month, and the per-year bins tend to add them all up.
Above, the top 15 affiliations of active contributors. I've excluded personal accounts. This is pretty flawed (details below), but interesting nonetheless. For what it's worth, it mostly lines up with my memory of things.
The pattern tracks well with the total despite only capturing a minority portion of it. I think this means that paid and unpaid contributions are driven by the same underlying trends, or that there's a lot of the former hiding in the latter.
Here I'm counting the number of commits per year in the various cohorts.
At first glance, this looks much less dire. However, note how newcomers are having a smaller impact, especially from 2014 on. And the 2018-2020 bounce is entirely due to a handful of veterans making a comeback.
Half the commits in 2020 were made by contributors who've been with the project for ten years or more. Also noteworthy, drive-by commits are a vanishingly small portion of the total.
Top 15 affiliations again, but now ordered by commit counts. It's safe to say that GNOME is dependent on paid developers in a big way. Specifically, and to no one's surprise, it leans heavily on Red Hat.
A few observations can be made with confidence:
- By F/OSS standards, the project is not unhealthy. It has hundreds of experienced and first-time contributors every year. It is well-organized and arguably well-funded compared to its peers. But:
- Every metric has the project peaking around 2010.
- A diminishing number of veterans is doing an increasing share of the work.
- Although recruitment is stable, newcomers don't seem to be hitting their stride in terms of commits.
- Corporate sponsorship is probably necessary to keep the project going, but the field of sponsors has kept thinning.
I think GNOME is addressing the risk factors competently by modernizing infrastructure (GitLab, Discourse). This has obvious value even in the absence of quantifiable results, but it'll be interesting to see if the effect can be measured over the next couple of years.
Diminished enthusiasm may also be due to there being fewer ways for a new contributor to make their mark or assume a role of responsibility. GNOME has become more conservative, certainly much more so than it was a decade ago in the run-up to GNOME 3. The rationale and phrasing in the announcement of the new versioning scheme (e.g. "Radical technological and design changes are too disruptive for maintainers, users, and developers") seems indicative of this trend1.
Notes on methodology
So what's wrong with this analysis? If you're so inclined, you can find the details under the next couple of subheadings and pass harsh, harsh judgement.
I've set the unscientific rigor bar high enough to hopefully yield something useful, but low enough that I could do it in my spare time and not get stuck in the dreaded state commonly known as "90% done".
I aggregated data from 189 Git repositories. The vast majority of these are hosted on
gnome.org, with a handful from
github.com. Commits are uniquely identified by their commit hash, meaning trivial duplicates are counted only once.
GNOME has always been a decentralized, big-tent project, so it's not obvious how to delineate it. I've tried to be fair by including most of the repositories from a full meta-gnome-desktop jhbuild, including fairly low-level dependencies like Cairo, Pango, and Pipewire, as well as past, present and would-be flagship applications under the GNOME umbrella. Documentation and infrastructure is represented, as are many archived projects (e.g. ORBit2, Bonobo, Sabayon, GAL).
I was a little uncertain about what to do with X.Org and Wayland. In the end I decided to include the latter, but not the former, since Wayland has close ties to GNOME (it even references GTK+ in its TODO file), while X.Org has its roots in the much older XFree86.
Mono is another project I resisted including; its development was tangential to GNOME proper, diverging completely in the most recent decade. However, I did include GtkSharp and several GNOME-hosted C# applications common on desktops in the 2005-2010 time frame.
Since I haven't established hard criteria for module selection, it's subject to various biases. Older code is probably underrepresented, since providers of important functionality were more loosely attached to the project early on (e.g. GNOME Online Accounts and Telepathy got pulled in, should I have included Gaim or Pidgin too? How about XChat?).
Anyway, the list isn't terrible, but there's room for improvement.
Similar studies often identify contributors by their e-mail addresses. I used full author names instead, since there's good reason to think they're more stable over a 20-year time span. We're fairly consistent in spelling our own names, and we change them rarely (often never). On the other hand, e-mail addresses come and go with different hosting arrangements, employers, etc.
An added challenge with this approach is that sometimes different people have the exact same name. In practice, I'm not aware of any instances of this happening in GNOME. It seems to be rare enough that I doubt it'd introduce significant error in most projects.
I should add here that the drive-by cohort depends on a fair amount of hindsight (you never know when someone might come back with more contributions, but the likelihood drops off quickly as time passes). This means the cohorts for 2020 are preliminary. They'll be a lot more accurate with another run late next year.
I'm using e-mail domain names as a proxy for organizations in some of the graphs. This is a notoriously unreliable approach for at least three reasons:
- Contributors often use personal e-mail addresses for paid work, leading to significant undercounting in general.
- Specific companies may require their employees (or ask them nicely) to use company e-mail for collaboration. Out of the listed companies, I know of at least one that definitely did this. However, there are many that don't, and these will be comparatively less well represented.
- The mapping between DNS and organizations isn't one-to-one. A company may operate under multiple names or TLDs (e.g.
Despite these weaknesses, it's common to slice the data this way. It's difficult to do better without access to semi-closed data troves, and depending on your views on privacy and ability to handle PII safely, it might not be something you'd want to get into anyway. But I bet you'd be well-positioned for it if you were, say, the corporate owner of both LinkedIn and GitHub.
When grouping by organization, the goal is to get an idea of which outside entities are sponsoring contributions. Therefore, I've filtered out addresses from the biggest mass e-mail providers like
@gmail.com and project-centric providers of personal accounts (e.g.
I took the liberty of reassigning the personal domains of a few extra prolific authors who would've otherwise showed up as individual organizations. Since there's no way I'm doing it for everyone, this introduces some bias. The full details are in the project's metadata file (see: code).
Version control systems
Changeovers in version control systems divide GNOME's VCS history into three eras with noticeable discontinuities between them.
Before 1998: Dark ages
In the Bad Old Days, Free Software would often use plain RCS or no version control at all. I have basically no data for this era: The GIMP, being the ur-project from which GTK+ spawned, was imported to CVS in November 1997, but by then it had already been in development since at least mid-1995. It may be possible to reconstruct it somewhat by diffing old tarball releases. Linux historians have done this for the kernel.
GNOME projects were mostly maintained in CVS from 1998 on, with infrastructure provided by Red Hat. A few companies (e.g. Ximian) maintained projects in their own CVS instances that were later consolidated under GNOME.
CVS had many limitations. For instance, history edits and other complex operations — like, oh, renaming a file — fell under the technical term "surgery" and the auspices of a competent server-side surgeon. The centralization of accounts also fostered a workflow where outside contributions were committed without any formal authorship metadata. This shows up in my plots as undercounting of active contributors.
GNOME moved to Subversion in 2007. While technically superior to CVS, it was still a centralized file-tracking solution and didn't change the workflow very much.
Subversion didn't last long; 2009 saw the move to Git. The active contributor count shot up that year, and part of this is due to more accurate authorship metadata. I think there's a case to be made that involvement had been gradually increasing even before Git's introduction, but moving to a proper DVCS certainly didn't hurt.
Since a lot of contributors moved off
@gnome.org in this switch, and affiliations are assigned based on e-mail addresses, the discontinuity is most visible in these graphs.
I expected the improved history management (and reduced commit anxiety) in Git's wake would also have produced more numerous commits. The data doesn't really bear this out — the count did increase the following year, but it's hard to distinguish from the general momentum leading up to GNOME 3.
I wrote a small program to automate this somewhat. It's nothing much, but at least it can serve as a humorous example of what can happen when your reach starts to exceed awk's grasp and it occurs to you that hey, I should use Rust for scripting!
I've uploaded the report data used in the charts in CSV format. It should be fairly self-explanatory and can be imported directly in LibreOffice (UTF-8, comma-separated).
According to a quick tally, I've done enough work on GNOME projects for a place in the top 3% of committers2. That's decent enough, but the lion's share of it is, shall we say, not very recent. I don't presume to speak for the project or, in fact, any group at all.
1 Not necessarily a bad thing. There's something to be said for not constantly yanking the rug out from beneath everyone's feet.
2 Humblebrag aside, I'd like to emphasize that since there are so many small contributions ("long tail"), it's easy to end up in a high percentile even with a modest commitment.