Important in almost all embedded applications, and absolutely crucial for code written for SoC designs and for programs that must be delivered over slow networks, is the ability to produce code that is concise, small, and fast, with minimal start-up time.
This requires that conceptually constant data be optimized and that any internal preprocessing (pre-computing, setup) of them be moved from runtime to compile time. Unfortunately, these requirements make software maintenance more involved and more susceptible to human errors.
To optimize and “constant-ize” data and to automate the maintenance, an external tool is required, as programming languages are not expressive enough. A good preprocessor utility may be the tool of choice, and provide additional perks.
Example: displaying status messages on the LCD
As a very simple yet realistic illustration, consider a task of displaying pre-defined up-to-twenty-character “status” messages on the LCD according to some status bit array.
A message is displayed for, say, 5 seconds, provided that the corresponding bit is set, and then is replaced by the next (modulo the number of status bits) message with a status bit set.
A “naïve” implementation, say in C, would probably define a const array of pointers to const strings. The ordinal number of the status bit would also be the index into the array of pointers, so the corresponding message string can be accessed. There are two problems with this idea.
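A minimal sketch of that naive approach follows; the bit numbers and messages are just illustrations, and C99 designated initializers are assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_STATUS_BITS 16   /* hypothetical size of the status bit array */

/* One const pointer per status bit; undefined bits hold dummy NULLs. */
static const char *const status_msg[NUM_STATUS_BITS] = {
    [3] = "my fault",
    [8] = "mea culpa",
    /* all other entries are implicitly NULL */
};

/* Return the message for a given status bit, or NULL if none is defined. */
static const char *status_message(unsigned bit)
{
    return (bit < NUM_STATUS_BITS) ? status_msg[bit] : NULL;
}
```

Both problems described below are already visible here: the dummy NULL entries cost memory, and the bit numbers are hard-coded into the initializer.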
The first is memory consumption. Assuming four-byte pointers, we've got five bytes of overhead per string (a pointer and the terminating null character). For a twenty-byte payload, that's 25%.
If most messages are shorter than 20 characters, it's even more. For example, if the average length of the messages is 10 characters, we have 50% overhead. That's not counting any data alignment overhead. Dummy pointers corresponding to undefined status bits have not been counted either.
The second problem is maintainability. If, for some reason, a bit that corresponds to some status had its number changed from 15 to 4, the array of pointers has to be modified accordingly. If a previously undefined bit gets defined, or a previously defined bit is no longer defined, then again, the array of pointers needs to be updated.
Data optimization and growing maintainability problems
To address the first problem, we can choose a different data structure.
Let's have a (large) string comprising all message strings concatenated together (and even without the terminating null).
Let's further have an array of indices into this large string such that the nth element of the array is the index of the beginning of the nth message string. The last (extra) index in the array is the length of our large string. The length of message n to display is the difference between indices n+1 and n: no information is lost.
Typically, two bytes would be enough to hold an index. Since the large string has no terminating nulls, the overhead of this data structure is 10% (down from 25%) or, with average counting, 20% (down from 50%), plus a fixed two-byte expense on the last index.
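Here is a hand-written sketch of that structure, using the two messages from this article's example; in real life the preprocessor, not a human, would emit the pool and the indices:

```c
#include <assert.h>
#include <string.h>

/* All messages concatenated, with no terminating nulls between them. */
static const char msg_pool[] =
    "my fault"      /* message 0, bytes 0..7  */
    "mea culpa";    /* message 1, bytes 8..16 */

/* msg_index[n] is the offset of message n; the extra last element is the
   total pool length, so length of message n = msg_index[n+1] - msg_index[n]. */
static const unsigned short msg_index[] = { 0, 8, 17 };

/* Copy message n into buf (at least 21 bytes) and return its length. */
static unsigned msg_get(unsigned n, char *buf)
{
    unsigned len = msg_index[n + 1] - msg_index[n];
    memcpy(buf, msg_pool + msg_index[n], len);
    buf[len] = '\0';
    return len;
}
```

The hazard is plain to see: every index in msg_index was computed by hand, and changing any one message invalidates all the indices after it.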
Great. However, on the maintainability front things just got much worse: maintaining the array of indices is very error-prone. Even if it were not, it would still be yet another thing to maintain, thank you very much.
Preprocessor to the rescue
A zero-maintenance solution to both the original and the newly created maintainability problem is to define a status bit number and the corresponding text message in a single statement, like so:
DefineStatus(3, "my fault")
DefineStatus(8, "mea culpa")
…
We want these statements to execute at compile time and produce the source code with the bit array definition and the constant data structure we invented previously.
To achieve this goal, we are willing to do some extra work (once!) and describe, to some conversion tool, how to execute those statements and produce the C source snippets that we want. Guess what? We are talking about some preprocessor and about writing macros for it.
To reiterate: we naturally identified a need for a preprocessor in our effort to reduce (to zero if possible) error-prone maintenance work, especially in cases of optimized data structures.
A programming language may already have a built-in preprocessor of its own, as is the case with C and C++. If such a preprocessor exists and is expressive enough for the tasks, that's wonderful. Otherwise, we've got to use an external preprocessor.
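For this particular DefineStatus idea, the C preprocessor happens to be expressive enough to generate the naive pointer-array variant with zero extra maintenance, via the well-known "X-macro" technique. A sketch, with illustrative list contents:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_STATUS_BITS 16      /* hypothetical size of the status bit array */

/* The single point of maintenance: each status bit and its message, once. */
#define STATUS_LIST(X) \
    X(3, "my fault")   \
    X(8, "mea culpa")

/* Expand the same list once into the pointer table... */
static const char *const status_msg[NUM_STATUS_BITS] = {
#define X(bit, text) [bit] = (text),
    STATUS_LIST(X)
#undef X
};

/* ...and again into named bit-mask constants. */
enum {
#define X(bit, text) STATUS_MASK_##bit = 1 << (bit),
    STATUS_LIST(X)
#undef X
};
```

Note, however, that the standard C preprocessor cannot by itself compute the cumulative offsets needed for the compact pool-plus-index structure above; that kind of compile-time arithmetic is where an external preprocessor earns its keep.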
Some tasks for the preprocessor to do
Here are some of the tasks where a good preprocessor can be of great help:
Tabulated functions. A hard-to-compute function can be tabulated for faster performance. Tabulating at compile time removes the table-generating code from the final build. Additionally, the resulting table resides in ROM, which saves precious RAM and, in some applications, the need to test its integrity.
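As a deliberately simple stand-in for a genuinely expensive function, here is the sort of const table a preprocessor could emit at compile time; this one holds the bit count of every 4-bit value, and, being const, it can live in ROM:

```c
#include <assert.h>

/* Generated-style table: popcount of each nibble 0x0..0xF. */
static const unsigned char nibble_bits[16] = {
    0, 1, 1, 2,  1, 2, 2, 3,
    1, 2, 2, 3,  2, 3, 3, 4
};

/* Count the set bits of a byte with two table lookups instead of a loop. */
static unsigned byte_bits(unsigned char b)
{
    return nibble_bits[b & 0x0Fu] + nibble_bits[b >> 4];
}
```

With the table generated at build time, the code that computed it never ships in the final image.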
Preprocessed data. More generally, any data set may call for a processing algorithm that requires one-time preprocessing of the data set. Some of the examples include lookup tables, perfect hashes, dictionary trees of all sorts, etc.
When the data set is constant for the project, so is its associated preprocessed (derived) data. In this case, the derived data can be pre-computed at compile time. As with tabulated functions, the challenge is to find a tool capable of sufficiently complex compile-time processing.
Loop unrolling. A decision to unroll a time-critical loop should not be left to the compiler's heuristics: they have no knowledge of time criticality in your application. Unrolling a loop manually eliminates a runtime variable (the loop counter) but creates a maintainability challenge (and an implied constant parameter, the number of repetitions of the loop body).
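A hand-unrolled sketch: the repetition count (4 here, an arbitrary illustration) is exactly the implied constant that must be kept in sync with the data, and that a preprocessor's compile-time loop could maintain for us:

```c
#include <assert.h>

/* Sum exactly four samples with the loop unrolled by hand:
   no runtime loop counter, but the count 4 is now baked in. */
static long sum4(const int *p)
{
    long s = 0;
    s += p[0];
    s += p[1];
    s += p[2];
    s += p[3];
    return s;
}
```

If the buffer ever grows to five samples, every unrolled copy of the body must be found and updated by hand, which is precisely the error-prone work a compile-time loop automates.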
Project configuration management. In the context of a project family, a good architecture for software project configuration management is project-independent code processing project definition data, the latter being, of course, constant for a given project. The project-dependent data have to be shared across disparate languages (e.g., go to a C source and to the linker command file).
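A sketch of what such sharing might look like (file names and values are hypothetical): a preprocessor expands one set of project definitions into both a C header and a linker-command fragment.

```c
/* project_defs.h -- C-side output a preprocessor might generate
   from a single project-definition file. */
#define APP_RAM_START 0x20000000u
#define APP_RAM_SIZE  0x00008000u

/* The same pass would emit the matching linker-command fragment, e.g.:
 *
 *   MEMORY { RAM (rwx) : ORIGIN = 0x20000000, LENGTH = 0x8000 }
 *
 * so the C code and the linker script can never drift apart. */
```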
Dedicated code generators vs. preprocessors
An extreme case of a preprocessor is a dedicated tool working for a specific data set. For instance, the macros in our example of status messages can take the form
3 "my fault"
All the smarts of converting this to the C source we want are in the tool itself; the data definition has no trace of what needs to be done with it.
This approach is (or may be) better than none at all but is best avoided if a suitable preprocessor is available. The first reason is that a dedicated code-generating tool (whether written in C++ or Perl or anything else) requires maintenance of its own, or else the data design becomes unjustifiably rigid.
Secondly, there can be (and probably is, right in your project) more than one data definition of this kind, each meant to produce an entirely different output, according to an entirely different data design. Each would therefore require an entirely different code generator; this is very difficult to justify unless all data designs are extremely stable.
Thirdly, it is highly desirable that our macros can be plugged into an otherwise normal source file. This has to do with aesthetics, which should not be underestimated: source code sprinkled with preprocessor statements still preserves the look and feel of the target programming language.
Even more importantly, it has to do with visibility (and linkage) of the generated output. Writing a code generator supporting this feature is no small feat.
Of course, a solution to all these problems is to split the code generator into two pieces: a conceptually simple yet flexible common language to describe how we process our definitions, and a common tool that recognizes and processes these description statements in a perhaps otherwise normal source file.
This (of course) means a normal preprocessor.
What to look for in a preprocessor
When choosing a preprocessor, you may want to consider the following criteria:
A basic criterion: is it possible to arrange a re-scan of (a compile-time loop over) a segment of the source code? Another basic criterion: does the language provide sufficient arithmetic capabilities?
Maintaining and managing optimized code across a family of projects requires serious attention to the data structures that are constant within a given project build. It is advantageous to use a preprocessor to pre-compute any derived data and to share data among different languages.
Ark Khasin, PhD, is with MacroExpressions, which specializes in the development of original software engineering tools, including Unimal, an advanced preprocessor.