We're all superprogrammers or at least perform in the stratospheric top quartile of developers. Maybe Joe Coder down the hall is “average.” But us? We're the tops.
Somehow we make this assessment based on a qualitative “feel” driven more by ego than hard data. Nothing costs more than firmware, yet no other industry manages productivity so poorly. That used car salesman with the crooked teeth and untrustworthy smile watches his sales numbers like a hawk. The widgets we design are manufactured in a facility that tracks defects, rework, worker productivity and more, usually with graphs posted in the halls so everyone knows just how well things are going.
But us? We'll work hard. Real hard. It'll be done when it's done.
Most classical software estimation techniques naively yet wisely expect developers to design the software, decompose it into modules, and then figure the number of lines of code for each module. Then we solve one of the most interesting equations extant:
Schedule = (lines of code) / (lines of code produced per month)
when no one has a clue as to the size of the second term. Is it 50 lines/month? 500? 5 million?
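A toy sketch shows how wildly that unknown second term swings the answer; the rates below are the guesses from the text, and the 100k-LOC project size is a made-up example:

```python
# Naive schedule equation from the text: size divided by production rate.
# The rates tried below (50, 500, 5000 LOC/month) are the article's guesses.
def schedule_months(total_loc, loc_per_month):
    """Schedule = (lines of code) / (lines of code produced per month)."""
    return total_loc / loc_per_month

size = 100_000  # hypothetical mid-sized firmware project
for rate in (50, 500, 5_000):
    print(f"{rate:>5} LOC/month -> {schedule_months(size, rate):8.1f} months")
```

Same equation, three plausible-sounding rates, and the answers span two person-years to two person-centuries.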
At this point the usual chorus waltzes in from stage right to complain that “lines of code” (LOC) is a very poor metric, subject to huge variations depending on styles and more. I'll sing along, too, brother!
Academics, especially, fault the LOC metric and generally advocate some form of function points as a replacement. But in industry no one uses function points, so if I tell you a routine will be 50 FPs you'll have no gut feel for the module's complexity. If, on the other hand, I tell you it's 2500 LOC, you'll have that visceral understanding of its complexity. If tomorrow a charismatic leader took over the country and told us all gas mileage calculations would be figured in furlongs per dram, well, the numbers would be just as accurate as liters/kilometer or gallons/mile but would baffle consumers. Familiarity has value.
Oddly, the literature is full of conversions between function points and LOC. On average, across all languages, one FP burns around 100 lines of code. For C++ the number is in the 50s. So the two metrics are, for engineering purposes at least, equivalent.
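The conversion is simple enough to sketch. The factors below follow the paragraph above (roughly 100 LOC per FP averaged across languages, a value in the 50s for C++); the exact 53 is an assumed illustrative figure within that cited range, not a precise constant:

```python
# LOC-per-FP "backfiring" factors. "average" is the cross-language figure
# cited in the text; 53 for C++ is an assumed value within the cited 50s.
LOC_PER_FP = {"average": 100, "c++": 53}

def fp_to_loc(function_points, language="average"):
    """Convert a function-point count to an approximate LOC figure."""
    return function_points * LOC_PER_FP[language]

print(fp_to_loc(50))         # the 50-FP routine from the text, average language
print(fp_to_loc(50, "c++"))  # the same routine expressed in C++
```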
Sometimes the Function Point debate is merely FUD tossed in to avoid the real issue: that of measuring something, anything, to get some kind of assessment of our productivity. There are many ways to measure productivity; two companies I know quite literally monitor network traffic to log keystrokes per hour. Those engineers write a lot of long-winded comments.
Why take such data? If we're lucky enough to have the time to decompose a problem into subunits and LOC before estimating, then the numbers give us a reasonable shot at coming up with a realistic schedule. (For an alternative approach to estimation see “The Middle Way”.) Barry Boehm has shown us that the schedule is proportional to the number of thousands of lines of code raised to some exponent. His Constructive Cost Model (COCOMO) gives various exponents and scaling coefficients to predict development time of various kinds of software.
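A minimal sketch of Boehm's Basic COCOMO form of that relationship: the coefficients below are the published Basic-model values for his three project modes, while the 32-KLOC example is a made-up input.

```python
# Basic COCOMO coefficients (Boehm): effort = a * KLOC**b person-months,
# schedule = c * effort**d calendar months.
COCOMO_BASIC = {
    "organic":      (2.4, 1.05, 2.5, 0.38),
    "semidetached": (3.0, 1.12, 2.5, 0.35),
    "embedded":     (3.6, 1.20, 2.5, 0.32),
}

def cocomo_basic(kloc, mode="embedded"):
    """Return (effort in person-months, schedule in calendar months)."""
    a, b, c, d = COCOMO_BASIC[mode]
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule

effort, months = cocomo_basic(32, "embedded")  # hypothetical 32-KLOC job
print(f"{effort:.0f} person-months over {months:.0f} calendar months")
```

Note the superlinear exponent: doubling the code size more than doubles the effort, which is exactly why the naive size/rate division above falls apart on big projects.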
But even when we're just furiously coding to a capricious deadline, tracking LOC/hour tells us, measured over the long term, whether we're improving or just stagnating.
This is a whole new world, folks, one that was almost inconceivable just a few years ago. The Wal-Martization of software development means Joe Coder sitting comfortably in downtown San Jose is competing directly with a very smart, highly trained developer half a world away, thrilled to work for just a fraction of Joe's salary. If we're not constantly becoming more productive we'll be retreads on the grinding wheel of global capitalism.
Track LOC/hour, measuring total time invested from the beginning of the project till the day it's finally done. That encompasses a lot more than coding and debugging but all of the other development activities are real costs more or less proportional to the size of the project.
Track project size as well. You might crank 200 LOC/month on a 100k LOC project, but your rates will be much higher for smaller systems.
And log bug rates. It's easy to write lots of buggy code fast.
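Those three numbers per project (total hours, final size, bug count) fit in a very small log. A minimal sketch, with made-up project records standing in for real data:

```python
# A tiny long-term productivity log of the kind suggested above.
# The two records are fabricated examples, not measured data.
from dataclasses import dataclass

@dataclass
class ProjectRecord:
    name: str
    total_loc: int      # size at ship
    total_hours: float  # everything from kickoff to ship, not just coding
    bugs_found: int     # defects logged against the code

    @property
    def loc_per_hour(self):
        return self.total_loc / self.total_hours

    @property
    def bugs_per_kloc(self):
        return 1000 * self.bugs_found / self.total_loc

history = [
    ProjectRecord("pump_controller", 42_000, 9_500, 180),
    ProjectRecord("charger_fw",      18_000, 3_600,  60),
]
for p in history:
    print(f"{p.name}: {p.loc_per_hour:.1f} LOC/hr, "
          f"{p.bugs_per_kloc:.1f} bugs/KLOC")
```

Comparing those two derived rates across several shipped projects is the long-term trend the column is after; fast-but-buggy shows up immediately.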
Do you track any sort of programmer productivity figures? Why not?
Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges.

Reader Response
The following STL one-liner:

    remove_copy_if(a.begin(), a.end(), back_inserter(b), mem_fun(&A::delay));

is almost worth one Function Point, yet it costs the writer enormously on his LOC account balance. Those among us who refactor by removing dead code achieve negative LOC. No wonder software is getting bigger, buggier, and slower.
– David J. Liu
Jack Replies: Refactoring should be a pretty constant cost across projects within an organization. If you have a project where you're tossing out code at a furious rate, there's something else wrong with the project. Even if one embraces eXtreme Programming, when starting a new project you have 0 lines of code. At the end there are x lines. What happens to go from 0 to x is what we need to measure.
A side note on your software cost estimation: To assume that developers are allowed to virtually design the software to the point of knowing functions and estimated LOC before creating a schedule is a bad assumption. The schedule is often created by gut feel, artificial deadlines that must be met, and management changes (for better or worse, but usually the latter).
Developing embedded products for government applications often means a bid and proposal stage, where you not only include the estimated costs (which become fixed), but also the estimated schedule (which also becomes fixed). It is rare when we get a chance to actually create a rough design of the software from which we can then estimate functions, LOC, and schedule. More often than not, a number is picked because it sounds good, or “that's about what it took last time.”
Management is at fault for expecting any software engineer to estimate hours, costs, and schedule without doing any more design than back-of-the-envelope estimates. To them, software is intangible, something that is just done. The mechanical groups get to do preliminary designs; the electrical groups get to do some estimated block diagrams. Often we don't have any more hours assigned to the bid than to create the simplest of flowcharts to estimate from.
And, of course, if we blow our estimates, it's our fault. If we try to design more of it to create a more correct estimate, we get blamed for spending too much up-front time on a contract we might not win.
Damned if you do, damned if you don't.
– John Patrick
Jack, did you ever count LOC for your employees? If so, what did you do with the numbers? If not, why not?
If they went down or up, how did you modify your behavior? Do you punish people with lower LOC rates?
It seems to me that tracking lines of code is analogous to tracking the number of people a used car salesman talks with, rather than his sales. With both LOC and number of people, typically the more the better, but neither is a definite indicator of performance.
– Kyle Heironimus
I have to disagree with you on two points. First, very few projects start with 0 lines of code. Managers hate to throw away code they have already paid for, and often insist on using it even when it would be cheaper to start over. Second, refactoring is not constant across projects. How much refactoring you should do depends on how well your spec and your design match what your customer really wants.
– William Ames
Well, actually LOC-based estimation is really useful when applied with a particular set of parameters. LOC/day figures: if your team is the same as the last one and the figures are derived from the last project, you're pretty close. Phase-wise data: take the total LOC of your last project with respect to the development phase and the entire lifecycle, and you'll get two figures and so two choices: (a) an accurate estimate of the development phase, or (b) a reasonable figure for the current lifecycle.

What's the basis for choosing productivity figures, after all? A team has performers as well as liabilities and newbies. So it's better to have multiple sets of LOC figures for different team configurations.
– S Roy
Something obvious, but until software design and “manufacture” are as automated as, say, gearbox design, the numerator in the formula is also imprecise. We, particularly in embedded, need more “holistic” concepts and tools for this “manufacture” to be implementable.
– kalpak dabir
Sitting at the other half of Joe's world isn't easy either! The situation there is just as described by John Patrick. Bid and proposal is a constant practice not only in government organizations, but also in privately held companies looking to outsource. “Gut feel” and “winning bid estimate” are the only measures that I see working today in most cases.

Yes, LOC has been a useful metric for projects in the same domain with the same set of people. In fact we estimated a project with the same people working on the same domain perfectly, to the day! Our model could even accommodate a change in estimate based on a change in requirements. I am talking of team productivity, since estimates cannot depend only on an individual. But most software projects have a lot of unknowns – the reason for fickleness. I have not yet seen wideband Delphi in practice, because the manager hasn't even time to talk to the experts during the bid!
I would like to again state what I commented on “Analogies for Software Development”: “we are talking of tools (those futuristic) that can create tools (of the present)”. Unless we have such established, visualizable tools (that could also give and track metrics) for software development that filter out bursts in productivity, *estimates will remain fickle, whether based on LOC or FPs*.
Answer to your question – yes, we do track programmer productivity figures in LOC (mind – tested code), but for the sake of tracking it!
– Saravanan T S
If you have not been tracking LOC/MM etc. and realize you need to, it is pretty simple to check each baseline of code out into a directory structure and run a tool like C-DOC, or a LOC-counter utility. Then one can look at the dates on baselines and the features added, items refactored, and LOC changes for a better estimate of what something took to do. This does not work so well when changing core processors, moving from C to C++, or starting from scratch, but it can still give a quick rough idea of where the effort and the difficult spots are in something that is an improvement or a somewhat similar product, and it can let one play what-if with things like adding staff to an effort using a tool like REVIC, etc.
– William Murray
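A bare-bones stand-in for the kind of LOC-counter utility mentioned in the comment above might look like this sketch; it counts non-blank lines (skipping lines that are only a `//` comment) in the C sources under each baseline directory, while real tools such as C-DOC are far more thorough:

```python
# Minimal LOC counter: walks a baseline directory and counts lines in
# C sources that are neither blank nor a lone // comment. It does not
# handle /* ... */ block comments; a real counter would.
import os

def count_loc(root, exts=(".c", ".h")):
    total = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                with open(os.path.join(dirpath, name), errors="replace") as f:
                    for line in f:
                        s = line.strip()
                        if s and not s.startswith("//"):
                            total += 1
    return total

# Diffing successive baseline checkouts then gives the growth per release:
# for prev, cur in zip(baselines, baselines[1:]):
#     print(cur, count_loc(cur) - count_loc(prev))
```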
Several Thoughts on Productivity….
Do I get to count libraries I've written for a prior project in this project's LOC? I suppose you'd have to count the hours spent on a lib over its useful lifetime to properly account for the effort... but by definition, you can't do that as a running metric of productivity because you don't know the denominator till it's over.
And I must agree that LOC is a lousy metric except when applied to the same person using the same language — and all this in reference to a baseline and a coding standard (controlling white space, for example, and line complexity).
LOC is way too loose to be called a metric of anything other than the number of 0x0A characters in the source files.
So, okay, how do you measure productivity absent counting LOC? If you have good requirements, I suppose you could measure satisfied requirements per time worked. This would be a fine metric for evaluating a done thing, but likely not so good for projects underway which often do not reach critical mass (i.e., anything actually doing anything you can measure) until they are quite far along.
I'd have to say that a great deal more consideration is due to the subject of software productivity. We are dealing with very abstract concepts here – not beans (LOC) to be counted. Until we can identify and count the abstract concepts, we aren't measuring anything meaningful. Just like certain concepts could not be computed before the calculus, we, too, need a new math for measuring software.
Counting lines or bytes is only meaningful when estimating the paper consumed by the printer or the size of the flash memory you'll need to store the executable.
– Daniel Singer