Over the past few years, I've spent a large amount of my time consulting with and training software development teams that were in the midst of rearchitecture. These teams had already developed the firmware inside successful long-lived products or product families. But to keep moving forward, reduce bugs, and speed new feature development, they needed to take the best of their old code and plug it into a better firmware architecture.
In the process, I've collected substantial anecdotal evidence that leads me to conclude that few programmers, technical managers, or teams truly understand what good firmware architecture is, how to achieve it, or even how to recognize it when they see it. That includes the most experienced individual developers on a team. Yet, despite the fact that these teams work in a range of very different industries (including safety-critical medical devices), the rearchitecture process is remarkably similar from my point of view. And there are numerous ways that our clients' products and engineering teams would have benefited from getting their firmware architecture right from the beginning.
Although learning to create solid firmware architecture and simultaneously rearchitecting legacy software may take a team months of hard work, five key steps are easily identified. So whether you are designing firmware architecture from scratch for a new product or launching a rearchitecture effort of your own, you can use this step-by-step process to help your team get started on the right foot.
Step 1: Identify the requirements
Before we can begin to (re)architect an embedded system or its firmware, we must have clear requirements. Properly written requirements define the WHAT of a product. WHAT does the product do for the user, specifically? For example, if the product is a ventilator, the list of WHAT it does may include a statement such as:
“If power is lost during operation, the ventilator shall resume operation according to its last programmed settings within 250 ms of power up.”
Note that a properly written requirement is silent about HOW this particular part of the overall WHAT is to be achieved. The implementation could be purely electronics or a combination of electronics and firmware; the firmware, if present, might contain an RTOS or it might not. From the point of view of the requirement writer, then, there may as well be a gnome living inside the product that fulfills the requirement [1]. (So long as the gnome is trustworthy and immortal, of course!)
Each requirement statement must also be two other things: unambiguous and testable. An unambiguous statement requires no further explanation. It is as clear and as concise as possible. If the requirement includes a mathematical model of expected system behavior, it is helpful to include the equations [2].
Testability is key. If a requirement is written properly, a set of tests can be easily constructed to verify that the requirement is met. Because the requirement is silent about HOW, such tests are decoupled from the particulars of the implementation, and that decoupling is of critical importance. Many organizations perform extensive testing of the wrong stuff. Any coupling between the test and the implementation is problematic.
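As an illustration, the ventilator requirement quoted earlier could be checked along the following lines. This is only a sketch in C: the observation record, its field names, and the use of a single breath-rate value to stand in for the "last programmed settings" are all assumptions made for the example. The point is that the check sees only what can be observed from outside the product.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what a test harness observes from OUTSIDE the
 * product. Nothing here depends on whether an RTOS, an FPGA, or a gnome
 * does the work inside. */
typedef struct {
    uint32_t power_up_ms;   /* time at which power was restored */
    uint32_t resumed_ms;    /* time at which ventilation was seen to resume */
    uint16_t rate_before;   /* breaths/min programmed before power loss */
    uint16_t rate_after;    /* breaths/min observed after resuming */
} power_loss_observation_t;

#define RESUME_DEADLINE_MS 250u  /* taken from the requirement text */

/* Returns true if the observation satisfies the requirement: operation
 * resumed within the deadline and with the previously programmed setting. */
bool resume_requirement_met(const power_loss_observation_t *obs)
{
    assert(obs != NULL);
    /* Unsigned subtraction: a resume time "before" power-up yields a huge
     * value and therefore fails the check. */
    uint32_t elapsed_ms = obs->resumed_ms - obs->power_up_ms;
    return (elapsed_ms <= RESUME_DEADLINE_MS) &&
           (obs->rate_after == obs->rate_before);
}
```

The same check would remain valid if the implementation moved from firmware to dedicated electronics, which is what decoupling the test from the implementation buys.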
A proper set of requirements is a written list of statements, each of which contains the key phrase “the [product] shall …” and is silent about how it is implemented, unambiguous, and testable. This may seem like a subject unrelated to architecture, but too often it is poor requirements that constrain architecture. Thus good architecture depends in part on good requirements [3].
Step 2: Distinguish architecture from design
Over the years, I have found that many engineers (as well as their managers) struggle to separate the various elements or layers of firmware engineering. For example, Netrino is barraged with requests for “design reviews” that turn out to be “code reviews” because the customer is confused about the meaning of “design.” This even happens in organizations that follow a defined software development lifecycle. We need to clear this up.
The architecture of a system is the outermost layer of HOW. Architecture describes persistent features; the architecture is hard to change and must be got right through careful thinking about intended and permissible uses of the product. By analogy, an architect describes a new office building only very broadly. A scale model and drawings show the outer dimensions, foundation, and number of floors. The number of rooms on each floor and their specific uses are not part of the architecture [4].
Architecture is best documented via a collection of block diagrams, with directional arrows connecting subsystems. The system architecture diagram identifies data flows and shows partitioning at the hardware vs. firmware level. Drilling down, the firmware architecture diagram identifies subsystem-level blocks such as device drivers, RTOS, middleware, and major application components. These architectural diagrams should not have to change even as roadmap features are added to the product, at least for the next few years. Architectural diagrams should also pass the “six-pack test,” which says that even after drinking a six pack of beer, every member of the team should still be able to understand the architecture; it is devoid of confusing details and has as few named components as possible [5].
The design of a system is the middle layer of HOW. The architecture does not include function or variable names. A firmware design document identifies these fine-grained details, such as the names and responsibilities of tasks within the specific subsystems or device drivers, the brand of RTOS (if one is used), and the details of the interfaces between subsystems. The design document records the class, task, function/method, parameter, and variable names that must be agreed upon by all implementers. This is similar to how a design firm hired by the renter of a floor in the office building describes the interior and exterior of the new building in finer detail than the architect. Designers locate and name rooms and give them specific purposes (e.g., cube farm, corner office, or conference room).
An implementation is the lowest layer of HOW. There need be no document, other than the source code or schematics, to describe the implementation details. If the interfaces are defined sufficiently at the design level above, individual engineers are able to begin implementation of the various component parts in parallel. This is similar to the way that a carpenter, plumber, and electrician work in parallel in nearby space, applying their own judgment about the finer details of component placement, after the design has been approved by the lessee.
Of course, there is architecture and there is good architecture. Good architecture makes the most difficult parts of the project easy. These difficult parts vary in importance somewhat from industry to industry, but always center on three big challenges that must be traded off against each other: meeting real-time deadlines, testing, and diversity management. Addressing those issues comprises the final three steps.
Step 3: Manage time
Some of your product's requirements will mention explicit amounts of time. For example, consider the earlier ventilator requirement about doing something “within 250 ms of power up.” That is a timeliness requirement. “Within 250 ms of power up” is just one deadline for the ventilator implementation team to meet. (And something to be tested under a variety of scenarios.) The architecture should make it easy to meet this deadline, as well as to be certain it will always be met.
Most products feature a mix of non-real-time, soft-real-time, and hard-real-time requirements. Soft deadlines are usually the most challenging to define in an unambiguous manner, test, and implement. For example, in set-top box design it may be acceptable to drop a frame of video once in a while, but never more than two in a row, and never any audio, which arrives in the same digital input stream. The simplest way to handle soft deadlines is to treat them as hard deadlines that must always be met.
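One way to make a soft deadline like the set-top box example unambiguous and testable is to write it down as a check over a log of frame outcomes. The sketch below, in C, is an assumption about how that might look; the function names and the log format are hypothetical. Note that "once in a while" is left unquantified in the example, so only the two precise parts (no more than two dropped video frames in a row, no dropped audio) are checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSECUTIVE_DROPS 2u  /* "never more than two in a row" */

/* dropped[i] is true if video frame i was dropped. Returns true if the
 * log never contains more than MAX_CONSECUTIVE_DROPS drops in a row. */
bool video_drops_acceptable(const bool *dropped, size_t n_frames)
{
    size_t run = 0;  /* length of the current run of dropped frames */
    assert(dropped != NULL || n_frames == 0);
    for (size_t i = 0; i < n_frames; i++) {
        run = dropped[i] ? run + 1 : 0;
        if (run > MAX_CONSECUTIVE_DROPS) {
            return false;
        }
    }
    return true;
}

/* Audio may never be dropped, so this part is effectively a hard deadline. */
bool audio_drops_acceptable(size_t audio_frames_dropped)
{
    return audio_frames_dropped == 0;
}
```

A complete requirement would also have to put a number on "once in a while" (for instance, a maximum count per time window) before it could be tested.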
With deadlines identified, the first step in architecture is to push as many of the timeliness requirements as possible out of the software and onto the hardware. Figure 1 shows the preferred placement of real-time functionality. As indicated, an FPGA or a dedicated CPU is the ideal place to put real-time functionality (irrespective of the length of the deadline). Only when that is not possible should an interrupt service routine (ISR) be used instead. And only when an ISR won't work should a high-priority task be used.
Keeping the real-time functionality separate from the bulk of the software is valuable for two important reasons. First, it simplifies the design and implementation of the non-real-time software. With timeliness requirements architected out of the bulk of the software, code written by novice implementers can be used without affecting user safety [6].
The second advantage of keeping the real-time functionality together is that it simplifies the analysis involved in proving all deadlines are always met. If all of the real-time software is segregated into ISRs and high-priority tasks, the amount of work required to perform rate monotonic analysis (RMA) is significantly reduced. Additionally, once the RMA is completed, it need not be revised every time the non-real-time code is tweaked or added to.
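To show why a small, segregated set of real-time tasks keeps the analysis manageable, here is a minimal sketch in C of response-time analysis, one exact schedulability test commonly used for fixed-priority tasks with rate monotonic priorities. It is not taken from the article's own tooling. It assumes independent periodic tasks, deadlines equal to periods, no blocking on shared resources, and values small enough not to overflow 32 bits; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t period_us;  /* T: period, also used as the deadline */
    uint32_t wcet_us;    /* C: worst-case execution time */
} rt_task_t;

/* Tasks must be sorted by increasing period (rate monotonic order, with
 * index 0 the highest priority). Returns true if the worst-case response
 * time of every task is within its period. */
bool rm_schedulable(const rt_task_t *tasks, size_t n)
{
    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].period_us > 0u);
        uint32_t r = tasks[i].wcet_us;  /* first guess: no preemption */
        for (;;) {
            /* R = C_i + sum over higher-priority j of ceil(R / T_j) * C_j */
            uint32_t next = tasks[i].wcet_us;
            for (size_t j = 0; j < i; j++) {
                uint32_t releases =
                    (r + tasks[j].period_us - 1u) / tasks[j].period_us;
                next += releases * tasks[j].wcet_us;
            }
            if (next > tasks[i].period_us) {
                return false;  /* deadline missed in the worst case */
            }
            if (next == r) {
                break;         /* fixed point: worst-case response time */
            }
            r = next;
        }
    }
    return true;
}
```

Only the ISRs and high-priority tasks appear in the task array. Lower-priority, non-real-time code never preempts them under these assumptions, which is why changes to it do not force the analysis to be redone.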
Step 4: Design for test
Every embedded system needs to be tested. Generally, it is also valuable or mandatory that testing be performed at several levels. The most common levels of testing are:
- System tests verify that the product as a whole meets or exceeds the stated requirements. System tests are generally best developed outside of the engineering department, though they may fit into a test harness developed by engineers [7].
- Integration tests verify that a subset of the subsystems identified in the architecture diagrams interact as expected and produce reasonable outcomes. Integration tests are generally best developed by a testing group or person within software engineering.
- Unit tests verify that individual software components identified at the intermediate design level perform as their implementers expect. That is, they test at the level of the public API the component presents to other components. Unit tests are generally best developed by the same people who write the code under test [8].
Of the three, system tests are most easily developed, because they test the product at its exposed hardware interfaces to the world (e.g., does the dialysis machine perform as required). Of course, a test harness may need to be developed for engineering and/or factory acceptance tests. But this is generally still easier than integration and unit tests, which demand additional visibility inside the device as it operates.
To make the development, use, and maintenance of integration and unit tests easy, it is valuable to architect the firmware in a manner compatible with a software test framework. The single best way to do this is to architect the interactions between all software components at the levels you want to test so they are based on publish-subscribe event passing (a.k.a., message passing).
Interaction based on a publish-subscribe model allows a lightweight test framework like the one shown in Figure 2 to be inserted alongside the software component(s) under test. Any interface between the test framework and the outside world, such as a serial port, provides an easy way to inject or log events. A test engine on the other side of that communications interface can then be designed to accept test “scripts” as input, log subscribed event occurrences, and off-line check logged events against valid result sequences. Adding timestamps to the event logger and scripting language features like delay(time) and waitfor(event) significantly increases testing capability.
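The following C sketch suggests how such a test hook can sit alongside a component when all interaction goes through published events. Everything in it is an assumption for illustration: the tiny synchronous event bus, the event IDs, and the `door_monitor` component are hypothetical, and the logger stores events in memory rather than sending them out a serial port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t id;    /* which event occurred */
    uint32_t data;  /* optional payload */
} event_t;

typedef void (*subscriber_fn)(const event_t *ev);

#define MAX_SUBSCRIPTIONS 16u
static struct { uint16_t id; subscriber_fn fn; } subscriptions[MAX_SUBSCRIPTIONS];
static size_t n_subscriptions;

/* Register fn for events with this id. Returns 0, or -1 if the table is full. */
int bus_subscribe(uint16_t id, subscriber_fn fn)
{
    assert(fn != NULL);
    if (n_subscriptions >= MAX_SUBSCRIPTIONS) {
        return -1;
    }
    subscriptions[n_subscriptions].id = id;
    subscriptions[n_subscriptions].fn = fn;
    n_subscriptions++;
    return 0;
}

/* Deliver ev synchronously to every subscriber of ev->id. */
void bus_publish(const event_t *ev)
{
    assert(ev != NULL);
    for (size_t i = 0; i < n_subscriptions; i++) {
        if (subscriptions[i].id == ev->id) {
            subscriptions[i].fn(ev);
        }
    }
}

/* Test framework side: record every event it has subscribed to. */
#define LOG_CAPACITY 32u
static event_t event_log[LOG_CAPACITY];
static size_t event_log_len;

void test_logger(const event_t *ev)
{
    if (event_log_len < LOG_CAPACITY) {
        event_log[event_log_len++] = *ev;
    }
}

/* Hypothetical component under test: turns the heater off when the door opens. */
enum { EV_DOOR_OPENED = 1, EV_HEATER_OFF = 2 };

void door_monitor(const event_t *ev)
{
    event_t off = { EV_HEATER_OFF, ev->data };
    bus_publish(&off);
}
```

A test would subscribe `door_monitor` to `EV_DOOR_OPENED` and `test_logger` to `EV_HEATER_OFF`, inject an `EV_DOOR_OPENED` event with `bus_publish`, and then compare `event_log` against the expected sequence. The component under test needs no test-specific code. A production framework would more likely queue events between tasks and add timestamps to the log.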
It is unfortunate that the publish-subscribe component interaction model is at odds with proven methods of analyzing software schedulability (e.g., RMA). The sheer number of possible message arrival orders, queue depths, and other details makes the analysis portion of guaranteeing timeliness difficult and fragile against minor implementation changes. This is, in fact, why it is important to separate the code that must meet deadlines from the rest of the software. In this architecture, though, the real-time functionality remains difficult to test other than at the system level [9].
Step 5: Plan for change
The third key consideration during the firmware architecture phase of the project is the management of feature diversity and product customizations. Many companies use a single source code base to build firmware for a family of related products. For example, consider microwave ovens; though one high-end model may feature a dedicated “popcorn” button, another may lack this. The architecture of any new product's firmware will also soon be tested and stretched in the direction of foreseeable planned feature additions along the product road map.
To plan for change, you must first understand the types of changes that occur in your specific product. Then architect the firmware so that those sorts of changes are the easiest to make. If the software is architected well, feature diversity can be managed through a single software build with compile-time and/or run-time behavioral switches in the firmware. Similarly, new features can be added easily to a good architecture without breaking the existing product's functionality.
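As a small illustration of compile-time and run-time behavioral switches in a single code base, consider the microwave “popcorn” button. The C sketch below is an assumption about how this might look, not a prescribed mechanism: the macro name `FEATURE_POPCORN`, the configuration structure, and the cook times are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compile-time switch: a build for a model with no popcorn button could
 * pass -DFEATURE_POPCORN=0. Defaulting to 1 is an assumption made here. */
#ifndef FEATURE_POPCORN
#define FEATURE_POPCORN 1
#endif

/* Run-time switch: hypothetical configuration read at startup, for
 * example from non-volatile memory. */
typedef struct {
    bool popcorn_enabled;
} product_config_t;

typedef enum { BUTTON_START, BUTTON_POPCORN } button_t;

/* Returns the cook time in seconds for a button press, or 0 if the
 * button is not supported on this product. Times are placeholders. */
unsigned cook_seconds_for(button_t button, const product_config_t *cfg)
{
    assert(cfg != NULL);
    switch (button) {
    case BUTTON_START:
        return 30u;
    case BUTTON_POPCORN:
#if FEATURE_POPCORN
        if (cfg->popcorn_enabled) {
            return 150u;
        }
#endif
        return 0u;
    }
    return 0u;
}
```

The compile-time switch removes the feature from the binary altogether, while the run-time switch lets one binary serve several models. Which of the two fits depends on the kinds of change the product family actually sees.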
An architectural approach that handles product family diversity particularly well is one in which groups of related software components are collected into “packages.” Each such package is effectively an internal widget from which larger products can be built. The source code and unit tests for each particular package should be maintained by a team of “package developers” focused primarily on its stability and ease of use.
Teams of “product developers” combine stable releases of packages that contain the features they need, customize each as appropriate (e.g., via compile- or run-time mechanisms, or both) to their particular product, and add product-specific “glue.” Typically, all of the products in a related product family are built upon a common “Foundation” package (think API). For example, a Model X microwave might be built from Foundation + Package A + Package B; whereas Model Y might consist of Foundation + A' + B + C, where package A' is a compile-time variant of package A and package C contains optional high-level cooking features, such as “Popcorn.”
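The Model X and Model Y compositions can be pictured as simple package manifests. The C sketch below is purely illustrative; in practice this selection often lives in the build system rather than in code, and the structure, the `variant` field, and the helper function are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;     /* package name */
    const char *variant;  /* compile-time variant, "" if standard */
} package_t;

static const package_t FOUNDATION  = { "Foundation", "" };
static const package_t PKG_A       = { "A", "" };
static const package_t PKG_A_PRIME = { "A", "prime" };  /* A' */
static const package_t PKG_B       = { "B", "" };
static const package_t PKG_C       = { "C", "" };       /* e.g. "Popcorn" */

/* Each product is a list of stable package releases plus its own glue. */
static const package_t *const MODEL_X[] = { &FOUNDATION, &PKG_A, &PKG_B };
static const package_t *const MODEL_Y[] = { &FOUNDATION, &PKG_A_PRIME,
                                            &PKG_B, &PKG_C };

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Returns true if the product manifest contains a package with this name. */
bool product_has_package(const package_t *const *pkgs, size_t n,
                         const char *name)
{
    assert(pkgs != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(pkgs[i]->name, name) == 0) {
            return true;
        }
    }
    return false;
}
```

Here `product_has_package(MODEL_Y, COUNT_OF(MODEL_Y), "C")` is true while the same query on `MODEL_X` is false, mirroring the optional cooking features in the example.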
Using this approach in a large organization, a new product built from a selection of stable bug-free packages can be brought to market quickly, and all products share an easy upgrade path as their core packages are improved. The main challenge in planning for change of this sort is in striking the right balance between packages that are too small and packages that are too large. Like many of the details of firmware architecture, achieving that balance for a number of years is more of an art than a science.
I hope the five-step “architecture road map” presented here is useful to you. I plan to drill down into more of the details in articles and columns over the coming months. Meanwhile your constructive feedback is welcome via the discussion forum or e-mail.
Michael Barr is the author of three books and over fifty articles about embedded systems design, as well as a former editor-in-chief of this magazine. Michael is also a popular speaker at the Embedded Systems Conference, a former adjunct professor at the University of Maryland, and the president of Netrino. He has assisted in the design and implementation of products ranging from safety-critical medical devices to satellite TV receivers. You can reach him via e-mail or read more of what he has to say at his blog.
1. This sounds silly, but it's true. If an individual requirement doesn't pass the “gnome test” then it merits further rewriting.
2. However, we must remember that this is only a WHAT requirement. The HOW implementer may choose not to include that equation in the code and still meet the requirement. For example, a mathematical equation may be converted into a look-up table plus interpolations, which may be executed by either hardware or software (or by a gnome).
3. If you can't get proper requirements from outside engineering, you may have to expend effort inside engineering translating the customer or marketing requirements you do have into a proper set of requirements for internal use.
4. With the exception of anything involving plumbing, which is almost as hard to change as the supporting columns.
5. Non-drinkers may prefer the phrase “30,000-foot view,” which is the same standard.
6. To give an extreme example, marketing can have engineering add the game “Pong” in a low-priority task running on the ventilator without affecting patient safety.
7. The system tests should treat the firmware as a black box. In fact, they would ideally be structured at the “gnome level.”
8. Proponents of test-driven development advocate that tests at this level be written in advance of the functions or classes that they are intended to verify.
9. But isn't it always true that testing real-time behaviors intrusively is an oxymoron?