m1: A Mini Macro Processor

Jon Bentley

July 02, 2007


The UNIX system provides several macro processors. The shell contains powerful mechanisms for text manipulation; the C language has a macro preprocessor; document preparation tools like Troff, Pic, and Eqn all have macros; and the m4 macro language is a general-purpose tool useful in many contexts. m1, a basic macro language, is at least three notches below m4 (it may well be six below, but m2 was too hard to type).

m1's implementation grew from a dozen-line Awk program that provides rudimentary services to a limited but useful two-page program. But why should programmers study a kind of program built in the 1950s? Here are a few reasons I find convincing:

  • Macro processors provide a fine playground for learning about programming techniques. (B.W. Kernighan and P.J. Plauger devote the final chapter of Software Tools in Pascal to the topic.)
  • The implementation described here illustrates a number of devices useful in building Awk programs. (This article assumes familiarity with Awk; for more information, see The Awk Programming Language.)
  • Investigating the design considerations of this simple macro processor can help you appreciate the macro languages you use. (Studies indicate that programmers spend 1.7 percent of their time cursing unexpected side effects of macros.)

If those reasons aren't good enough, here's the most convincing argument: programmers have studied macro processors since the beginning of time, so you have to, too. It's a rite of passage.

The Problem

Given that the UNIX system on which I live already has so many macro processors, why did I even consider building one more? Most macro languages assume that the input is divided into tokens by the rules of an underlying language. I recently faced a problem that didn't come in such a neatly wrapped package. I needed to make substitutions in the middle of strings, as in:

@define Condition under
You are clearly @Condition@worked.

The first line defines the string Condition, and in the second line that string (surrounded by the special @ characters) is replaced by the text under. Definitions must start on a new line with the string @define; the name is the next field separated by whitespace (blanks and tabs), and the replacement text is the rest of the line. Replacements are insensitive to context: the string @Condition@ is always replaced, even if it is inside quotes or not set apart by whitespace.
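The rules just described can be sketched in a few lines. The following is a minimal illustration in Python, not the Awk program this article goes on to develop. The function name expand and the pattern DEFINE_RE are hypothetical, and two behaviors are assumptions the text above does not settle: an @Name@ string with no matching definition is passed through unchanged, and replacement text is not rescanned for further macros.

```python
import re

# Assumed shape of a definition line: "@define" at the start of the line,
# whitespace, a name, optional whitespace, then the replacement text
# (the rest of the line).
DEFINE_RE = re.compile(r"^@define[ \t]+(\S+)[ \t]*(.*)$")


def expand(lines):
    """Return the input lines with @define lines consumed and @Name@ replaced."""
    macros = {}   # name -> replacement text
    output = []
    for line in lines:
        line = line.rstrip("\n")
        m = DEFINE_RE.match(line)
        if m:
            macros[m.group(1)] = m.group(2)
            continue  # a definition produces no output of its own
        if macros:
            # Match only names defined so far, so an undefined @Other@
            # passes through untouched (an assumption of this sketch).
            pattern = "|".join(re.escape("@" + name + "@") for name in macros)
            # A function replacement inserts the text literally, in one pass,
            # with no regard for quotes or surrounding whitespace.
            line = re.sub(pattern, lambda hit: macros[hit.group(0)[1:-1]], line)
        output.append(line)
    return output


if __name__ == "__main__":
    sample = [
        "@define Condition under",
        "You are clearly @Condition@worked.",
    ]
    print("\n".join(expand(sample)))  # You are clearly underworked.
```

Because the substitution ignores context, a line such as say "@Condition@" is rewritten inside the quotes as well, which is exactly the behavior the problem called for.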

A simple macro language like this was sufficient to solve my immediate problem. But once I had it, more applications of the macro processor started to wander across my terminal screen.


Jon Bentley is the author of Writing Efficient Programs, Programming Pearls, and More Programming Pearls: Confessions of a Coder, among other books and papers.
