Control the source

February 17, 2004

MurphysLaw-February 17, 2004

A version control system is indispensable for large development teams, and even on small projects it offers advantages.

If you're the sole programmer on a small project, you may never stop and consider if your build procedures are good enough to ensure that the executable delivered is the one intended. One build directory and a backup made at release time may be all that's required to manage builds and releases.

However, if you're managing a large project with multiple programmers, your build procedures are much more complex. This month I'll talk about building a release—specifically how to use version control, and how to avoid excessive branching within that version control system.

Tight control
An integrated development environment (IDE) enables you to add all of the C files to a list, check a few tick boxes to indicate what memory model and optimization options you want, and select "build" from a menu. Voilà! The executable is produced. If you have multiple programmers working on the project, how do you know that they all have the same options set in their environment? Well, you don't. Most tools store these options in a configuration file. You could mandate that all developers use exactly the same configuration file, but that's not an appropriate solution because developers often use different options to get, for example, different levels of debug information during development. Instead, you need to ensure that there's one configuration file designated as the configuration appropriate to a product release.

It's important that the build is tightly controlled. Using a makefile or a script helps you control the build more easily than managing configuration files from an IDE. Even with scripts, environment variables can have an impact on the build. For example, the location where the compiler looks for include files can sometimes be set in an environment variable. The release build script should override all such options, and the same user identity should always be used to build the release. Preferably this user should be one created solely for the purpose of building the release software.

If a problem crops up with the build, you may wish to investigate afterwards what options were used. Perhaps some debugging messages appeared on the serial port of the device, leading to a suspicion that some conditional code was unintentionally included in the release. Checking the build script will help. The script, however, could have changed since the build took place. Use of version control will help here, and we'll return to that topic shortly. However, no version control system will tell you which user ran the build, or what environment variables they had set. Nor can you tell what arguments were passed into the script when it was called. I find it useful to redirect all of the output from the build to an output file that can later be examined to see exactly what command line options were passed to the compiler, what directories were used, and what compiler warnings were generated.
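As a rough illustration of these two ideas (overriding the environment and keeping a log of the build), a release build wrapper might look something like the following Python sketch. The list of inherited variables, the function names, and the log layout are all assumptions made for the example; they are not taken from any particular toolchain.

```python
import os
import subprocess
import sys
from datetime import datetime, timezone

# Hypothetical whitelist: the only variables the release build may inherit.
ALLOWED_VARS = ("PATH", "HOME", "SYSTEMROOT")


def clean_environment(parent_env, overrides):
    """Build an environment that ignores the caller's stray settings."""
    env = {k: v for k, v in parent_env.items() if k in ALLOWED_VARS}
    env.update(overrides)  # the release settings always win
    return env


def run_logged(command, env, log_path):
    """Run one build command, appending its output and context to a log."""
    user = os.environ.get("USER") or os.environ.get("USERNAME", "unknown")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"--- {datetime.now(timezone.utc).isoformat()}\n")
        log.write(f"user: {user}\n")
        log.write(f"script arguments: {sys.argv[1:]}\n")
        log.write(f"command: {command}\n")
        for name in sorted(env):
            log.write(f"env {name}={env[name]}\n")
        log.flush()  # write our header before the child starts printing
        result = subprocess.run(command, env=env, stdout=log,
                                stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

A release script would call clean_environment() once, with the include path and any similar settings passed as overrides, and then pass every compiler and linker invocation through run_logged(). The resulting log records who ran the build, which arguments and variables were in effect, and every warning the tools printed.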

Version control
A version control system (VCS) is vital for managing projects where multiple programmers need to edit the same source files. The basic principle of any good VCS is to enable programmers to apply a lock to a file, so that if another programmer wishes to edit it the second programmer will be warned that someone is already editing that file. When the first programmer is finished with the file he can check it into the VCS, and this new version becomes the master copy of that file. Another programmer can subsequently check it out and add his improvements.

For the files that aren't being changed, the VCS can provide the developer with a read-only copy. By tagging the files as read only, the VCS avoids the case where a programmer edits a file thinking that he has it locked, when in fact he has not.

Each time a file is checked in, the developer is asked to describe the changes. This change history is maintained for each file. It's possible to have this history automatically inserted in the comments in the file, though I find it clutters up the file. The history can always be requested directly from the VCS.

Revision Control System (RCS), which has a GNU license, is the most popular VCS on Unix systems. Free versions of this tool are also available for Microsoft Windows. Commercial VCS products offer GUI clients and web-based interfaces.

Is it important to have a GUI? I've found that most of the operations that I need to do with a VCS are as easily executed from the command line. However, because simple slips can be expensive it's helpful to see your last command on the screen so you know what you've done. A GUI provides a nice way to see an overall view of what files are locked and by whom, but it's hard to beat the command line when it comes to giving instructions. It's often important to write scripts that perform actions on the repository, so even if you have a GUI available, you'll still need to know the command line equivalents. Some tools offer a nice compromise of allowing you to check files in and out via the GUI and also display the command line equivalent of the action performed.

Using a command-line interface makes it possible to check files in and out from a script, which you can then modify to apply other checks to the code. For instance, one company was trying to encourage the use of lint. Lint generates a list of warnings for each source file. The programmers at this company set the VCS to automatically run lint each time a file was checked out and record the number of warnings for that file. When the programmer checked in the file, the number of warnings had to be equal to or smaller than when the file was checked out. If the file failed this test it was rejected. This approach is useful for projects that were originally developed without lint checks, where removing every lint warning in one pass would be impractical.
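The check itself is simple enough to sketch. The Python below is only an illustration of the rule described above. It assumes that each lint warning appears on its own line containing the word "warning", which varies between lint tools, and the function names are invented for the example.

```python
def count_warnings(lint_output):
    """Count the lines of a lint report that look like warnings.

    Assumes one warning per line containing the word 'warning'; adapt the
    match to whatever format your lint tool actually prints.
    """
    return sum(1 for line in lint_output.splitlines()
               if "warning" in line.lower())


def check_in_allowed(warnings_at_checkout, warnings_at_checkin):
    """Accept the check-in only if the warning count has not grown."""
    return warnings_at_checkin <= warnings_at_checkout


# Made-up lint reports for one file, before and after editing.
before = "uart.c:10: warning: unused variable\nuart.c:42: warning: sign loss\n"
after = "uart.c:42: warning: sign loss\n"
print(check_in_allowed(count_warnings(before), count_warnings(after)))
```

The check-out script would store the first count alongside the lock, and the check-in script would compare it against a fresh lint run before passing the file on to the VCS.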

Solo run
For large teams a VCS is indispensable. On smaller projects with just one or two programmers, it's not as necessary but still has advantages. The problem with a single programmer working in one directory is that there's no distinction between the software being edited and the latest, best code. It's possible to save off a copy of the software at certain points when it's considered stable, though this requires some discipline.

Consider the case where you're trying to fix a bug and this leads you to comment out a line of code as an experiment to see if the code behaves differently in some way. Several tests and experiments later you discover and fix the problem. The code that was commented out is long forgotten. You've inserted a bug that might not get discovered until much later.

If you use VCS, however, you'll have a couple of opportunities to catch this problem. If you didn't intend to alter the file permanently, the damaged file wouldn't be checked out of VCS and would, therefore, never be checked in. The other possibility is that the file was checked out, since other changes within the file might have been legitimately required. In this case the problem will get spotted if you follow this procedure. Whenever a file is ready to be checked in, examine the differences between the new file and the version in VCS. Most VCS products make this a straightforward task. A quick glance will let you approve all of the changes you're about to check in. You could then quickly spot errors, such as the one I described earlier. If you get into the habit of checking the differences just before checking in the code, you'll regularly catch more minor offences.
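Most VCS tools have their own difference command, so you would not normally write one. Purely to show what the review step catches, here is a small Python sketch that uses the standard difflib module. The C fragment and the function name review_diff are invented for the example.

```python
import difflib


def review_diff(repository_text, working_text, name="file.c"):
    """Return a unified diff between the VCS copy and the edited copy."""
    return list(difflib.unified_diff(
        repository_text.splitlines(),
        working_text.splitlines(),
        fromfile=f"{name} (repository)",
        tofile=f"{name} (working copy)",
        lineterm=""))


# The version held in the VCS.
repository = """void loop(void)
{
    watchdog_kick();
    read_sensor();
}
"""

# The working copy: one intended fix plus a forgotten experiment.
working = """void loop(void)
{
    /* watchdog_kick(); */
    read_sensor_filtered();
}
"""

for line in review_diff(repository, working, "loop.c"):
    print(line)
```

The intended change to the sensor call shows up in the output, and so does the commented-out watchdog call that was only ever meant as an experiment. Seeing both side by side is what gives you the chance to restore the line before checking in.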

Using VCS creates a strict barrier between the code intended for the product and the work-in-progress. Most developers find that once they use it on one project, they never want to work without it.

Labeling
As a file is checked in and out, its version number increases and the number for each file progresses independently. Sometimes it's nice to unite a set of files at one point in time by giving them a text label—say "customer_beta_release," making all of those files easy to access as a group.
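One way to picture a label is as a table that records, for every file, the revision it had when the label was applied. The Python sketch below models only that idea; the file names, revision numbers, and function name are made up, and real tools store this information in their own formats.

```python
def apply_label(labels, label, current_revisions):
    """Record the current revision of every file under a text label."""
    # Copy the table so that later check-ins cannot alter the snapshot.
    labels[label] = dict(current_revisions)


# Each file's revision number progresses independently.
revisions = {"main.c": 7, "uart.c": 3, "display.c": 12}
labels = {}

apply_label(labels, "customer_beta_release", revisions)

# Work continues after the release; only main.c changes.
revisions["main.c"] = 8

print(labels["customer_beta_release"])
print(revisions)
```

The label still points at revision 7 of main.c even though the latest revision has moved on, which is what makes the set of files easy to retrieve as a group later.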

Some of you might conclude that this snapshot in time means that you can regenerate the release build completely from the source. While this is theoretically possible and will work most of the time, it's no replacement for keeping the exact directory used to build the release, the binaries, and any generated files such as a memory map or symbol file. Each release needs to be uniquely backed up regardless of how good a record is kept in VCS.

Integration
In an effort to impress us, many of the VCS vendors have merged some of their functionality with bug-tracking tools or with the programmer's IDE. I've seen a lot of these integrations, and they always fail miserably. The VCS GUI provides all of the options and commands necessary for managing the files. The other GUI client, for example the bug-tracking tool, provides some commands, but never the full set, and many of the user options aren't visible. Sometimes you're not sure if the options you picked in the VCS client will still be applied if you check in or out from the secondary tool.

In theory it's nice to be able to check out files from the bug-tracking tool and have the files automatically link to that defect—perhaps with some text automatically inserted to the change history. In practice the advantages are minimal, and the risk of making a mistake, like checking out to the wrong directory, is increased because of the more restrictive interface. I believe that these integrations were invented for marketing reasons but have little to offer the developer.

Branching
At times you'll want a source tree to follow two different paths. For example, after you've given a release to your customer, your coworker Anne starts developing lots of cool new features for the next major release. In the meantime, Bob has been given the job of fixing some minor bugs in the current release. He doesn't want to use the same files as Anne because none of her new features should be in the minor bug-fix release. But Anne has already checked in some of those changes before Bob has even started on his fixes. The solution is to branch the source. One branch is the release-one-bug-fix branch and the other is the mainline.

The thing to watch out for when branching is that it's far too easy to check a file out from the wrong branch. If you discover when checking in the file that you've made this mistake, you may have to redo the changes to the correct branch—or worse, you may not notice at all. Branching is a great feature of VCS, but too many branches can cause headaches.

Most VCS tools allow you to merge a branch back into the mainline. Say Anne and Andy are working on one new feature while Bob and Bill are working on a different feature. If each group takes a branch from the mainline for each feature, then Anne and Andy can make changes on one branch while Bob and Bill make changes on the other branch. When one pair has the feature working well enough for general consumption, they can merge the changes back into the mainline.

If the set of files changed by each pair is separate and no other changes are happening on the mainline, merging the files back into the mainline will be seamless and painless. This, however, is rarely the case. So when the final branch is being merged, you'll probably need Anne, Andy, Bob, and Bill all sitting around the same screen reviewing the differences and deciding which pieces to merge and which to modify to a new form that satisfies both features. This process is very error prone.

You can reduce the need for branching by managing your team well. Consider the feature that Anne and Andy are working on. If it's one feature, having two programmers working on it increases the number of interactions between those programmers. If the task can be performed by one programmer in less than twice the time it takes two programmers, you can reduce the total number of developer-hours by assigning just one programmer to the task. If a second programmer's input is necessary, have that programmer give design input at the start and review the changes at the end.

Now that you've reduced the feature to one programmer, that programmer can check out the files to be changed. Since Anne doesn't need to make these files available to Andy for changes via the VCS, she can simply check the files out with a lock and not check them in until the new feature is complete. The need for a branch and the intermediate checking in of parts of the new feature has vanished. We're left with the issue that no one else on the team can alter the files while Anne has them locked, but that disadvantage is usually outweighed by the advantages of avoiding the branch. The following example explores further how we can reduce the contention for editing the same files within a team.

Now consider a much bigger feature that requires more than two programmers and a lot of time to implement. We can no longer shrink our subteam to one programmer. Another option here is to finish one major feature completely before allowing anyone to begin on the next feature. That way the second feature is built on top of the first feature on the mainline, rather than branching. There may be scheduling reasons for attempting to develop both features in parallel, but I would argue that scheduling reasons that drive programmers to work in a less optimal way are usually bad reasons.

If you have that many programmers queuing up to edit the same code then you have a more fundamental problem. If this number of developers is available, their duties should be divided to match the design divisions in the software. Divide the project into subunits with strict interfaces between them. Get Anne and Andy to work on the communications library, while Bob and Bill work on the user interface. If these two areas can be kept distinct, they can exist as separate software units each with their own VCS area. If the changes to the communications library are dramatic enough that the function-call interface to that library has to be updated, then that constitutes a new release of the library to the "customers," Bob and Bill. Now the transfer of code from Anne and Andy to Bob and Bill is done as an entire library release, rather than as individual source files within a shared VCS area.

This decomposition of the design has other advantages. Forcing a stricter interface on the communications part of the software will make it possible to write a suite of test software that tests the communications in isolation from the user interface—and do not forget to keep your test code under VCS as well.

The need for merging branches is often a symptom of other problems in the system. Always consider merging branches to be a bit like cutting off your own leg: only to be done as a last resort. If you have to do it, a good VCS tool is like a sharp scalpel. It may make the procedure less painful, but you'd prefer not to do it at all.

The branches I'm warning against here are those that have to be merged back to the mainline at some point. It's the merging back that's troublesome, not the splitting out. Earlier I gave an example of Bob doing some bug fixes on a release that was already with customers while other developers worked on the next major release. The release-one-bug-fix branch will have a limited lifespan, because you intend to stop supporting release one sometime after release two becomes available. The branch would then be frozen and never merged to the mainline.

The fundamental property of a VCS is that it prevents multiple programmers from changing the same file at the same time. The fundamental property of a branch is that it allows multiple developers to edit the same file at the same time. This basic contradiction is the reason why I discourage branching for changes that will need to be merged later.

Parallel universes
Sometimes a branch is formed to support a different platform—maybe different hardware, maybe a different communications protocol. In this case you can build two different releases, one from each branch. If the need to support both platforms is short term and one of the branches can later be frozen, this approach is manageable. It has the advantage of not polluting the mainline with code for a platform that will only be supported for a brief period.

However, if both environments need to be supported in the long term, you'll find yourself applying many changes independently to each branch and that duplication of labor is wasteful. In these cases consider whether both code streams can be supported in one source set with careful use of conditionally compiled code. If some of the files are dramatically different for each version, consider whether an interface can be placed on top of each of the two, or more, platforms and dedicate a few modules to performing the work specific to each platform. All of the common code would then call this newly defined interface and at link time the library that matched the desired platform is used. You then have a library for each platform which implements an abstraction layer. This is another example of design decomposition being used to avoid branching.

Managing the process
Many advocates of branching will consider the previous few paragraphs to be unjustified. Some companies apply a process in which a new branch is created for each work item (bug fix or new feature), and that item is only merged when it has been decided that the mainline is ready for that fix. This divides the integration of the feature into two steps. The first involves checking in completed work. The second, the merge, is then a decision based on which features and bug fixes the project is ready to release. This may be helpful from a change management point of view, but it still leaves the system vulnerable to situations where two programmers can make changes to the same original file, and those changes need a later merge. Anne is given no warning that the file she has just checked out is already being edited by Bob in another branch. The warning would have been given if Anne and Bob were working in the same branch, but that's no longer the case. Now either Bob or Anne will discover that a merge is necessary at the time the branch is reunited with the mainline. If Anne had known this ahead of time, she might have chosen to defer this work until Bob had finished his.

If you want to make the integration of changes into a two-step process, I would advocate the rule that the only files that can be changed on a branch are the files that have not been changed on another branch.
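A rule like this is easy to check mechanically if you can list the files changed on each open branch. The Python sketch below assumes that such a list is available from your VCS as a mapping from branch name to file names. The branch names, file names, and function name are hypothetical.

```python
def files_changed_on_multiple_branches(changes_by_branch):
    """Map each file touched on more than one branch to those branches."""
    branches_by_file = {}
    for branch, files in changes_by_branch.items():
        for name in files:
            branches_by_file.setdefault(name, []).append(branch)
    return {name: sorted(branches)
            for name, branches in branches_by_file.items()
            if len(branches) > 1}


# Hypothetical open work items and the files each one has modified.
open_branches = {
    "fix_baud_rate": {"uart.c", "uart.h"},
    "new_menu": {"display.c", "menu.c"},
    "fix_timeout": {"uart.c", "timer.c"},
}

print(files_changed_on_multiple_branches(open_branches))
```

Running a check of this kind before a file is checked out on a branch would give Anne the warning that the per-branch lock no longer provides.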

Niall Murphy has been writing software for user interfaces and medical systems for ten years. He is the author of Front Panel: Designing Software for Embedded User Interfaces. Murphy's training and consulting business is based in Galway, Ireland. He welcomes feedback and can be reached at nmurphy@panelsoft.com. Reader feedback to this column can be found at www.panelsoft.com/murphyslaw.
