February 2010 – LSD::RELOAD

For the last couple of years I’ve been using Subversion on all the commercial software projects I’ve done. At Joost and after that at the BBC we’ve usually used long-lived stable branches for most of the codebases. Since I cannot find a good explanation of the pattern online I thought I’d write up the basics.

Working on trunk

Imagine a brand new software project. There are two developers: Bob and Fred. They create a new project with a new trunk and happily code away for a while.

Stable branch for release

At some point (after r7, in fact) the project is ready to start getting some QA, and it’s Bob’s job to cut a first release and get it to the QA team. Bob creates the new stable branch (svn cp -r7 ../trunk ../branches/stable, resulting in r8). Then he fixes one last thing (r9), which he merges to stable (using svnmerge, r10). Not paying much attention to the release work, Fred has continued working and fixed a bug in r11. Bob then makes a tag of the stable branch (svn cp branches/stable tags/1.0.0, r12) to create the first release.
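
Spelled out as commands, Bob's first release might look roughly like the sketch below. It is only an illustration: REPO is a made-up repository URL, stable-wc is a hypothetical working copy of the stable branch, and plain svn merge -c stands in for svnmerge. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch of the first release, assuming a trunk/branches/tags layout.
# REPO is a hypothetical repository URL; stable-wc is a hypothetical
# working copy of branches/stable.
REPO="${REPO:-http://svn.example.com/project}"
DRY_RUN="${DRY_RUN:-1}"

# Print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

cut_first_release() {
    # Create stable from trunk as of r7 (this commit becomes r8).
    run svn cp -r 7 "$REPO/trunk" "$REPO/branches/stable" -m "Create stable from trunk r7"
    # Merge the last fix (r9) from trunk into the stable working copy (r10).
    # Assumption: plain 'svn merge -c' instead of the svnmerge tool.
    run svn merge -c 9 "$REPO/trunk" stable-wc
    run svn commit stable-wc -m "Merge r9 from trunk to stable"
    # Tag the stable branch as the first release (r12).
    run svn cp "$REPO/branches/stable" "$REPO/tags/1.0.0" -m "Tag 1.0.0"
}

cut_first_release
```

Merging Fred's r11 and tagging 1.0.1 would follow the same merge, commit, tag steps with different revision and tag names.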

QA reproduce the bug Fred has already fixed, so Bob merges that change to stable (r14) and tags 1.0.1 (r15). 1.0.1 passes all tests and is eventually deployed to live.

Release branch for maintenance

A few weeks later, a problem is found on the live environment. Since it looks like a serious problem, Bob and Fred both drop what they were doing (working on the 1.1 release) and hook up on IRC to troubleshoot. Fred finds the bug and commits the fix to trunk (r52), tells Bob on IRC, and then continues hacking away at 1.1 (r55). Bob merges the fix to stable (r53) and makes the first 1.1 release (1.1.0, r54) so that QA can verify the bug is fixed. It turns out Fred did fix the bug, so Bob creates a new release branch for the 1.0 series (r56), merges the fix to the 1.0 release branch (r57) and tags a new release 1.0.2 (r58). QA run regression tests on 1.0.2 and tests for the production bug. All seems ok so 1.0.2 is rolled to live.
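
The maintenance release can be sketched the same way. The text above does not say where the 1.0 release branch is copied from, so this sketch assumes it starts from the 1.0.1 tag; the branch name branches/1.0, the working copy rel-1.0-wc and the REPO URL are also invented for illustration. As before, commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the 1.0.2 maintenance release. REPO, "branches/1.0" and
# "rel-1.0-wc" are hypothetical names.
REPO="${REPO:-http://svn.example.com/project}"
DRY_RUN="${DRY_RUN:-1}"

# Print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

maintenance_release() {
    # Assumption: the 1.0 release branch starts from the last 1.0.x tag (r56).
    run svn cp "$REPO/tags/1.0.1" "$REPO/branches/1.0" -m "Create 1.0 release branch"
    run svn checkout "$REPO/branches/1.0" rel-1.0-wc
    # Bring Fred's trunk fix (r52) onto the release branch (r57).
    run svn merge -c 52 "$REPO/trunk" rel-1.0-wc
    run svn commit rel-1.0-wc -m "Merge r52 from trunk to the 1.0 release branch"
    # Tag the maintenance release (r58).
    run svn cp "$REPO/branches/1.0" "$REPO/tags/1.0.2" -m "Tag 1.0.2"
}

maintenance_release
```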

Interaction with continuous integration

Every commit on trunk may trigger a trunk build. The trunk build has a stable period of just a few minutes. Every successful trunk build may trigger an integration deploy. The integration deploy has a longer stable period, about an hour or two. It is also frequently triggered manually when an integration deploy failed or deployed broken software.

Ideally the integration deploy takes the artifacts from the latest successful trunk build and deploys those, but due to the way maven projects are frequently set up it may have to rebuild trunk before deploying it.

Every merge to stable may trigger a stable build. The stable build also has a stable period of just a few minutes, but it doesn’t run as frequently as the trunk build simply because merges are not done as frequently as trunk commits. The test deploy is not automatic – an explicit decision is made to deploy to the test environment and typically a specific version or svn revision is deployed.

Reflections

Main benefits of this approach

Reasonably easy to understand (even for the average java weenie that’s a little scared of merging, or the tester that doesn’t touch version control at all).
Controlled release process.
Development (on trunk) never stops, so that there is usually no need for feature branches (though you can still use them if you need to) and communication overhead between developers is limited.
Subversion commit history tells the story of what actually happened reasonably well.

Why just one stable?

A lot of people seeing this might expect to see a 1.0-STABLE, 1.1-STABLE, and so forth. The BSDs and Mozilla do things that way, for example. The reason not to have those comes down to tool support – with a typical svn / maven / hudson / jira toolchain, branching is not quite as cheap as you’d like it to be, especially on large crufty java projects. It’s simpler to work with just one stable branch, and you can often get away with it.

From a communication perspective it’s also just slightly easier this way – rather than talk about “the current stable branch” or “the 1.0 stable branch”, you can just say “the stable branch” (or “merge to stable”) and it is not ever ambiguous.

Why a long-lived stable?

In the example above, Bob and Fred have continued to evolve stable as they worked on the 1.1 release series – for example we can see that Bob merged r46,47,49 to stable. When continuously integrating on trunk, it’s quite common to see a lot of commits to trunk that in retrospect are best grouped together and considered a single logical change set. By identifying and merging those change sets early on, the history of stable tells a neat story of which features were code complete when, and it allows for providing QA with reasonably stable code drops early on.

This is usually not quite cherry-picking — it’s more likely melon-picking, where related chunks of code are kept out of stable for a while and then merged as they become stable. The more coarse-grained chunking tends to be rather necessary on “agile” java projects where there can be a lot of refactoring, which tends to make merging hard.
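
Merging one such change set, like r46,47,49 above, could look something like the following sketch. It assumes Subversion 1.5 or later with built-in merge tracking (the workflow described here used svnmerge instead), and REPO, stable-wc and the description text are placeholders. Commands are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of "melon-picking" a group of trunk revisions onto stable.
# REPO and stable-wc are hypothetical; assumes svn 1.5+ merge tracking.
REPO="${REPO:-http://svn.example.com/project}"
DRY_RUN="${DRY_RUN:-1}"

# Print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# $1: comma-separated trunk revisions, $2: short description of the change set.
merge_changeset() {
    revs="$1"
    description="$2"
    # Show which trunk revisions have not been merged to stable yet.
    run svn mergeinfo --show-revs eligible "$REPO/trunk" stable-wc
    # Merge the related revisions together and commit them as one change set.
    run svn merge -c "$revs" "$REPO/trunk" stable-wc
    run svn commit stable-wc -m "Merge r$revs from trunk to stable: $description"
}

merge_changeset 46,47,49 "hypothetical feature X"
```

Committing the group as a single revision on stable is what keeps the stable history readable as a list of features rather than a list of individual commits.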

Why not just release from trunk?

The simplest model does not have a stable branch, and it simply cuts 1.0.0 / 1.0.1 / 1.1.0 from trunk. When a maintenance problem presents itself, you then branch from the tag for 1.0.2.

The challenge with this approach is sort-of shown in these examples — Fred’s commit r13 should not make it into 1.0.1. By using a long-lived stable branch Bob can essentially avoid creating the 1.0 maintenance branch. It doesn’t look like there’s a benefit here, but when you consider 1.1, 1.2, 1.3, and so forth, it starts to matter.

The alternative trunk-only approach (telling Fred to hold off committing r13 until 1.0 is in production) is absolutely horrible for what are hopefully obvious reasons, and I will shout at you if you suggest it to me.

For small and/or mature projects I do often revert to having just a trunk. When you have high quality code that’s evolving in a controlled fashion, with small incremental changes that are released frequently, the need to do maintenance fixes becomes very rare and you can pick up some speed by not having a stable branch.

What about developing on stable?

It’s important to limit commits (rather than merges) that go directly to stable to an absolute minimum. By always committing to trunk first, you ensure that the latest version of the codebase really has all the latest features and bugfixes. Secondly, merging in just one direction greatly simplifies merge management and helps avoid conflicts. That’s relatively important with subversion because its ability to untangle complex merge trees without help is still a bit limited.

But, but, this is all massively inferior to distributed version control!

From an expert coder’s perspective, definitely.

For a team that incorporates people that are not all that used to version control and working with multiple parallel versions of a code base, this is very close to the limit of what can be understood and communicated. Since 80% of the cost of a typical (commercial) software project has nothing to do with coding, that’s a very significant argument. The expert coders just have to suck it up and sacrifice some productivity for the benefit of everyone else.

So the typical stance I end up taking is that those expert coders can use git-svn to get most of what they need, and they assume responsibility for transforming their own many-branches view back to a trunk+stable model for consumption by everyone else. This is quite annoying when you have three expert coders that really want to use git together. I’ve not found a good solution for that scenario; the cost of setting up decent server-side git hosting is quite difficult to justify even when you’re not constrained by audit-ability rules.

But, but this is a lot of work!

Usually when I explain this model to a new group of developers, they realize at some point that someone (like Bob) or some people will have to do the work of merging changes from trunk to stable, and that the tool support for stuff like that is a bit limited. They’ll also need extra hudson builds, and they worry a great deal about how on earth to deal with maven’s need to have the version number inside the pom.xml file.

To many teams it just seems easier to avoid all this branching mess altogether, and instead they will just be extra good at their TDD and their agile skills. Surely it isn’t that much of a problem to avoid committing for a few hours and working on your local copy while people are sorting out how to bake a release with the right code in it. Right?

The resolution usually comes from the project managers, release managers, product managers, and testers. In service-oriented architecture setups it can also come from other developers. All those stakeholders quickly realize that all this extra work that the developers don’t really want to do is exactly the work that they do want the developers to do. They can see that if the developers spend some extra effort as they go along to think about what is “stable” and what isn’t, the chance of getting a decent code drop goes up.

Using long-lived stable branches