Although Agile methodologies have been in use since the late 1990s, it is still rare to find more than anecdotal evidence for how well they really work. In evaluating whether to use agile methods, engineers have very little on which to base a judgment unless they happen to have some direct knowledge of an agile software project.
This paper describes a developer-led conversion to agile methods where the software team themselves recorded detailed data throughout the project. They used a very simple home-made unit test framework for development in C. Since the close of that project, the senior members of the software team built a better unit test framework intended for doing agile software development in C. This paper gives a brief overview of the Catsrunner framework (CATS = C Automated Test System).
More analysis has been done on the data collected during the project, and some additional work has been completed to compare the team's results with the software industry in general. The purpose of this paper is two-fold:
1) Close the gap in quantitative understanding of Agile methods for embedded software development (at least, as much as can be done given that the scope covers only one project).
2) Describe a test framework for embedded C that was developed based on our agile experience.
The Grain Monitor System (GMS) project entailed building a ruggedized, mobile spectrometer, initially for farming applications. Using spectroscopy principles, the technology could quantify the components of a material, e.g. how much protein is in a wheat sample.
The team size varied between 4 and 6 members, and development went on for three years. The initial field units were ready about 6 months into the project, but so much was learned in the process of deploying them with a partner farm equipment company that the team continued on to support further work and implement many more new features.
At the start of the project there were many unknowns and technology risks that made it impossible to use waterfall techniques for this work. They include:
1) New scientific algorithm to decode near infra-red signals from grain samples
2) Early customer for the new MPC555 microprocessor
3) First use of this operating system by the team
4) First customer of the operating system port to the 555
5) New prototype near infra-red sensor hardware
6) Early algorithms use too much MIPS for any known microprocessor
7) Must handle extremes of temperature and vibration
8) Very low-noise circuitry required
9) No experience with the CAN bus protocol
10) CAN bus protocol standard not in finalized form
11) Difficulty getting early MPC555 chips
12) Team lacked experience in multitasking applications
The team used generic agile practices at first (strong unit tests, iterations, common ownership of the code) and transitioned to full Extreme Programming methods during the project. As with the vast majority of agile teams, this one didn't implement every practice fully or flawlessly.
This was a “green field” project. Using the practices described here for legacy software would not be easy but might be worthwhile, especially if it is a safety-critical application. The advantages of agile methods for safety-critical applications are covered in another paper. Whether to back-fit agile unit tests to a legacy code base is a question that can only be answered case by case, and is outside the scope of this paper.
Data Gathering Methods
My role in the project was as Software Technical Lead. As such, I compiled a list of all the defects that were found in integration test or later stages, including any found after delivery to our customer (the partner company that was conducting field trials of the GMS units on real farm machinery). For each defect I wrote up a root cause analysis at the time the defect was resolved.
We had independent testers engaged for the later part of the project, but for most of it, the software team delivered their code directly to internal users and to the partner company. Labor data was reported weekly by the team members themselves, and tracked by the team. The company's official time records were not available to us, and weren't broken out in categories useful to us. Data on source code size and cyclomatic complexity was obtained using C-Metric v. 1.0 from Software Blacksmiths.
At the end of three years of development the product was fully ready for manufacturing. There had been a grand total of 51 software defects since the start of the project (see Figure 1, below). There were never more than two open defects at any one time throughout the project. The team had produced 29,500 (non-comment) lines of tested, working embedded code, plus several sets of related utility software that are outside the scope of this paper.
|Figure 1. Defects and software releases over three years|
The embedded GMS C code was equivalent to 230 function points (per the conversion given here). The team's productivity in the first iteration was just under three times the industry norm for embedded software teams. The team became increasingly adept at delivering code on time according to the commitments made at the start of each iteration. Early iteration lengths varied from two to eight weeks, but two weeks became typical, and toward the end of the project a new release could be turned around in one day.
Team size varied during the project from 4 to 6 people, and is shown in the staffing profile in Figure 2, below.
|Figure 2. Staffing profile for the GMS project|
The team used the set of categories shown in Figure 3, below, to track labor through the first two years of the project. Year three's labor was not tracked, but there is no reason to think it would vary much from the rest.
|Figure 3. Labor tracking categories used for GMS software|
The labor distribution charts below, in Figure 4, Figure 5, and Figure 6, give a view into the activities of the first iteration, the first full year (including the first iteration), and the second full year. Note that the labor in iteration 1 reflects the activities of a new team that has not worked together before, e.g. much time spent working out team processes.
|Figure 4. Team's labor distribution for first iteration|
|Figure 5. Team's labor distribution for first year of the project|
|Figure 6. Team's labor distribution for second year of the project|
The code base for the GMS embedded software grew from zero to a raw line count of 60,638 (see Figure 7, below). C-Metric does a count that omits blank lines, comment lines, and lines with a single brace “}” on them. That filtered count of “effective source lines of code” (ESLOC) was 29,500 for the software at the end of the project. Short header files with long preambles, and lengthy change history blocks in all files, are the main cause of the high percentage of non-code lines.
|Figure 7. Growth of the code base over three years|
It isn't possible to directly compute a figure for labor per line of code, for two reasons: much of the coding was change activity, not net additional code; and the team worked on utility applications to let users create and load calibration tables, exercise the hardware for test, or import new algorithm test data into our test harness. The labor for those utility code bases was not broken out separately.
Early in the project, before changing to Extreme Programming methods, the team had difficulty delivering by a target date. There aren't any figures to illustrate this. One of the reasons that Extreme Programming seemed appealing is its practice called “the Planning Game”, which brings developers and management into partnership to negotiate the deliverables for each iteration.
The early use of the Planning Game gave us some difficulty. That experience is described in detail in an earlier paper. Once the team mastered the Planning Game technique, their releases were never more than a couple of days late unless there was some drastic unforeseen circumstance (which happened only once).
The defect rate remained fairly constant over the development period, despite the growing size of the code base. The team averaged about 1.5 defects per month. The open bug list never held more than two items all through development. In Figure 8, below, defects are grouped according to the quarter in which they were reported.
|Figure 8. Absolute number of defects per quarter|
Because the defect rate stayed low, independent of the code size, I conclude that the team's techniques of software development were effective at handling complexity. C-Metric was used to take a look at cyclomatic complexity. Four of the later releases were analyzed, and the result was an average cyclomatic complexity of 6 or 7 for each of the releases. More detail on this metric in our code is given in the references.
Although we used agile development, the software still had phases such as detailed design, coding, test, and so on. In agile there is a tight loop of doing requirements, design, and coding, all in short increments of time, so that you can re-run your unit tests about every 10 to 30 minutes.
Illustrated in Figure 9, below, is a look at the phases where bugs were inserted and where they were found. This information comes from the root cause analysis of each defect. More discussion on the nature of the defects found is given in the references.
|Figure 9. Defect life span, year 1 of project|
It should be mentioned that the numerous software releases shown toward the end of the project (in Figure 1) do not represent panicky bug-fix activity. Rather, this was the software team creating custom releases to help electrical and optics engineers isolate difficult system-level problems that only appeared when the whole system was running. The software was very stable, and the team could deliver well-tested releases on a one-day turnaround.
Comparison With Software Industry
I was able to make use of three industry sources of data for comparison of this team's performance. The first two are covered briefly, since they will not be generally available to readers for measuring their own team's capability. The third (the data from Capers Jones) is something that anyone can make use of if their code can be characterized in terms of function points. This paper will therefore discuss that in some detail.
SEER SEM Estimation Data. Before the start of the project, our management considered an estimating tool called SEER SEM from Galorath. Consultants from that company did an estimate as part of demonstrating the tool. It gave a breakdown of staffers needed for each waterfall-style phase and the hours that would be used by each, all based on a figure for lines of code at completion, which they got from me.
The one thing the prediction software could not foresee was the completed size of the application. The point is that with this data I could figure out the value for ESLOC/developer-hour that their database uses for this type of project. It was 1.2 ESLOC/hour, for fully tested, working embedded code in C. When iteration 1 was complete, the numbers showed the team had delivered 3.5 ESLOC/hour, or 292% of the industry norm as given by Galorath's database.
QSM Industry Data. QSM Associates Inc. also supplies software planning tools, and used to offer a free service via their website to compare your team's project data with their database of thousands of projects. I took the opportunity to input data for our iteration 1, such as the number of people on the team, duration, lines of code delivered, defects found, etc. The result was that the “Productivity Index” they calculated for GMS Iteration 1 ranked us in the 90th percentile! This index, as they compute it, covers code complexity (based on size), schedule, efficiency, effort, and reliability.
Capers Jones Industry Data. Capers Jones, a principal at Software Productivity Research, has accumulated data from a wide variety of software projects, expressed in Figure 10, below.
|Figure 10. Software defect data from Capers Jones with GMS data point added|
The only thing necessary for anyone to compare their team's data with the information from Capers Jones is to be able to state their defects per function point. We did not count function points in our project. Knowing the ESLOC, you can simply look up a conversion to function points on the SPR website. See http://www.theadvisors.com/langcomparison.htm
The data in Figure 10 can be expressed in terms of defects delivered to the customer. The “Best In Class” software teams had 2.0 defects per function point (FP), and a defect removal efficiency of 95%. Defects to customer = Total FP * defects per FP * (1.0 - defect removal efficiency).
|Table 1. Defects delivered to customer per Capers Jones, tabular form|
Let's look at how the “Best In Class” teams would perform if their code was the same size as GMS, that is, 230 function points. Their total number of defects would be 230 * 2, or 460. Then they'd remove 95% of those: 460 * (1.0 - 0.95) = 23, per Table 1, above. They would deliver 23 defects to the customer. The GMS embedded team delivered 21 bugs to their customer, according to Figures 9-11.
How to Achieve These Results for Your Team
Lean Thinking is the fundamental concept underlying Agile software development practices. The two essentials you must have in place to succeed with this approach are:
1) You must match the amount of work undertaken to your capacity
2) You must mistake-proof the steps you use to produce the work
The first item is satisfied by using agile iteration planning techniques, and is outside the scope of this paper. For a developer-led agile conversion, regulation of the work stream is often very difficult to achieve because management must support it, or at least tolerate it. The second item is covered by a previous paper on agile test techniques for embedded software.
The remaining sections of this paper discuss the most powerful way of mistake-proofing your software: the use of an appropriate test harness to efficiently catch bugs early.
Dual-platform Unit Testing as Key
For embedded software, the hardware represents an extra dimension that must be addressed in the testing strategy. The GMS team built all the code as “dual target” software. It could run on a desktop PC as well as on the target MPC555 microprocessor, through the use of compile-time switches.
This strategy allowed the software to be tested first on the PC, where the hardware was stable. Timing would be incorrect, but the logic could be fully exercised. Other compile-time switches would bypass sensor hardware and inject dummy grain data to drive computations.
The team's unit tests consisted of a conditionally compiled “main()” within each file that held a set of related functions. This 'tester main' had calls to each function in the module, often multiple calls to the same function but with parameters intended to test boundary cases.
There were Perl scripts to execute the 'tester main' routines of all the modules and report the pass/fail status of all the tests. This simple test framework had tests designed to run on both platforms, and was used throughout the duration of the project.
Catsrunner: A Better Technique
The experience gained via the simple unit test framework of GMS led, a few years later, to the development of Catsrunner and CATS (C Automated Test System) by the partners at Agile Rules, some of whom were on the GMS project. Catsrunner has a more consistent way of inputting test parameters, and its output is easier to interpret. It allows separation of test code from production code. Also, it behaves exactly the same on the PC and on the target platform.
In short, it's the test framework we wish we'd had time to write during the GMS project. Catsrunner is a C software unit and acceptance testing suite based on CATS (see Figure 11, below). CATS is a cross-platform testing framework for C, especially designed to work well in embedded and multi-platform environments. Catsrunner provides the wrapper that calls the test and reports the results. Catsrunner is open source software released under GPL 2.0, and is available for download.
|Figure 11. Top: Catsrunner executing on Host; Bottom: executing on Target|
Catsrunner does three basic things: 1) reads, from the host PC, a list of unit tests to be run; 2) runs each unit test in turn; and 3) sends the results of each test back to the host PC. The middle step, running each unit test, can occur on either platform. The platform is determined by environment variable settings when building the Catsrunner executable. The present version of Catsrunner runs on a PC and on an ARM7 core.
Catsrunner calls CATS, which looks up the name of the test in a table holding pointers to the testing functions. At the heart of the CATS unit testing framework is an array of structures associating the names of functions with pointers to those functions.
When the name of a test function and its input parameters are passed to CATS, it looks up the function name in this array. When Catsrunner executes on the target hardware, it must communicate with the host to know which test to run next, and then to store the result of the test.
A module named “Hostops” is part of Catsrunner, and in the case of the ARM7 target, Hostops makes use of the Angel background debug monitor to accomplish the data transfer to the host. A user wishing to port Catsrunner to a new target will have to create a version of Hostops that makes use of its I/O capabilities to do the equivalent data transfers.
A Catsrunner Test Examined
Catsrunner's approach to testing divides all the software into two categories: software that is inherently platform-independent, and software that “touches hardware”. Platform-independent code can easily be run in an automated fashion, but when software drives a motor or turns on an LED, the result of that test cannot be captured without special test hardware (which was out of the question for us).
When testing hardware-related code on the target platform, we used manual tests. That is, the test code is contained in the unit test file, but when testing actual hardware we'd step through it by hand to watch the behavior of the hardware. Catsrunner uses this philosophy, as illustrated in Figure 12, below (“pure software” indicates platform-independent code).
|Figure 12. Unit test concept for software that drives hardware|
When testing hardware-related software on the PC, we'd capture outputs that would otherwise go to hardware, and the tester code could validate their correctness. For sensor input data, we'd just bring in dummy data in order to let the software continue on.
These practices are reflected in the code by having some modules with layered directories. For an LED module, the main directory would contain the platform-independent parts of the code and be called “led”. Below that are directories for each platform, in this case ARM and PC, which contain functions having the same names but implemented differently on the two platforms.
The linker will bring in the platform-independent code from the “led” directory, and only one of the code sets from the lower directories, either “ARCH_ARM” or “ARCH_PC”. The prefix “ARCH_” indicates architecture-specific software. The directory layout is illustrated in Figure 13, below.
|Figure 13. Directory layout of the LED driver|
It would seem that manually stepping through hardware-related code would slow development unacceptably. In practice, the GMS team found it to be no problem, because those parts of the code changed little once they were written, and they were well encapsulated. (The team used a more primitive test framework that had this same philosophy for testing hardware-related code.)
This has been a brief introduction to the Catsrunner agile test framework. A complete user manual with much more detail is available with the open source download package.
The GMS team was a group of ordinary developers who achieved highly extraordinary results through the power of an idea. The team did not work excessive hours. Most needed to learn some significant skill on the job. They didn't follow the agile practices 100%, and didn't have any outside coaching or mentoring in how to use agile development practices.
It has been said that in order to do Extreme Programming you need a team of hand-picked gurus. Not so. All you need is people empowered to govern their work. The powerful idea is simply this: if you make it easier to find bugs than it is to create new ones, you have the possibility of producing bug-free software.
Bug-free software lets you build trust with your sponsors and customers, spend more of your time productively (troubleshooting is waste!), and stay in control of your project. These results are within reach for every software team whose management will support sufficient empowerment.
Nancy Van Schooenderwoert of Agile Rules / XP Embedded has extensive experience in building large-scale, real-time systems for flight simulation and ship sonars, as well as software development for safety-critical applications such as factory machine control and medical devices.