The basics of being Agile in a real-time embedded systems environment

Different people mean different things when they use the term agile. In computing, the term was first used to describe a lightweight approach to performing project development after the original term, Extreme Programming (XP), failed to inspire legions of managers entrusted to oversee development projects.

Basically, Agile software development refers to a loosely integrated set of principles and practices focused on getting the software development job done in an economical and efficient fashion.

This series begins by considering why we need agile approaches to software development and then discusses agile in the context of real-time and embedded systems. It then turns to the advantages of agile development processes as compared to more traditional approaches.

The Agile Manifesto
A good place to start to understand agile methods is with the agile manifesto. The manifesto is a public declaration of intent by the Agile Alliance, consisting of 17 signatories including Kent Beck, Martin Fowler, Ron Jeffries, Robert Martin, and others. Originally drafted in 2001, this manifesto is summed up in four key priorities:

* Individuals and interactions over processes and tools
* Working software over comprehensive documentation
* Customer collaboration over contract negotiation
* Responding to change over following a plan

To support these statements, they give a set of 12 principles. I'll state them here to set the context of the following discussion:

1) Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

2) Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.

3) Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

4) Business people and developers must work together daily throughout the project.

5) Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.

6) The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

7) Working software is the primary measure of progress.

8) Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.

9) Continuous attention to technical excellence and good design enhances agility.

10) Simplicity — the art of maximizing the amount of work not done — is essential.

11) The best architectures, requirements, and designs emerge from self-organizing teams.

12) At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Agile methods have their roots in the XP (Extreme Programming) movement based largely on the work of Kent Beck and Ward Cunningham. Both agile and XP have been mostly concerned with IT systems and are heavily code-based.

In this series of articles, I will focus on how to effectively apply the manifesto's statements and principles to real-time and embedded applications, and how to combine them with modeling to gain the synergistic benefits of model-driven development (MDD) approaches. (A good place for more information about agile modeling is Scott Ambler's agile modeling Web site.)

Why Agile?
But why the need for a concept such as “agile” to describe software development? Aren't current software development processes good enough?

No, not really.

A process, in this context, can be defined as “a planned set of work tasks performed by workers in specific roles resulting in changes of attributes, state, or other characteristics of one or more work products.” The underlying assumptions are the following:

1) The results of using the process are repeatable, resulting in a product with expected properties (e.g., functionality and quality).

2) When the process is executed, producing the goal state of the work products is highly predictable, in terms of both project properties (e.g., cost, effort, calendar time) and product properties (e.g., functionality, timeliness, and robustness).

3) People can be treated as anonymous, largely interchangeable resources.

4) The problems of software development are infinitely scalable; that is, doubling the resources will always result in halving the calendar time.

As it turns out, software is hard to develop. Most existing development processes are most certainly not repeatable or predictable in the sense above. There are many reasons proposed for why that is. For myself, I think software is fundamentally complex—that is, it embodies the “stuff” of complexity.

That's what software is best at — capturing how algorithms and state machines manipulate multitudes of data within vast ranges to achieve a set of computational results. It's “thought stuff,” and that's hard.

The best story I've heard about software predictability is from a blog on the SlickEdit Web site by Scott Westfall called “The Parable of the Cave.” Estimating software projects, the story suggests, is remarkably similar to estimating how long it will take to explore an unknown cave, yet managers often insist on being given highly precise estimates.

In addition, the scope of software is increasing rapidly. Compared to the scope of the software functionality in decades past, software these days does orders of magnitude more.

Back in the day, my first IBM PC had 64kB of memory and ran a basic disk operating system called DOS. DOS fit on a single 360kB floppy disk. (I know I'm dating myself, but my IBM PC was my fifth computer. I still remember fondly the days of my TRS-80 Model I computer with its 4kB of memory.)

Windows XP weighs in at well over 30 million lines of code; drives hundreds of different printers, disks, displays, and other peripherals; and needs a gigabyte of memory to run comfortably. These software-intensive systems deliver far more functionality than the electronic-only devices they replace.

Compare, for example, a traditional phone handset with a modern cell phone. Or compare a traditional electrocardiogram (ECG) that drove a paper recorder, like the one I used in medical school, with a modern ECG machine; the difference is remarkable.

The modern machine can do everything the old machine did, plus detect a wide range of arrhythmias, track patient data, produce reports, and measure noninvasive blood pressure, blood oxygen concentration, a variety of temperatures, and even cardiac output.

Last, software development is really invention, and invention is not a highly predictable thing. In electronic and mechanical engineering, a great deal of the work is conceptually simply putting pieces together to achieve a desired goal, but in software those pieces are most often invented (or reinvented) for every project.

This is not to oversimplify the problems of electronic or mechanical design but merely to point out that the underlying physics of those disciplines is far more mature and well understood than that of software.

But it doesn't really matter if you believe my explanations; the empirical results of decades of software development are available. Most products are late. Most products are delivered with numerous and often significant defects. Most products don't deliver all the planned functionality.

We have become used to rebooting our devices, but 30 years ago it would have been unthinkable that we would have to turn our phones off, remove the batteries, count to 30, reinsert the batteries, and reboot our phones. (Much as I love my BlackBerry, I was amazed that a customer service representative recommended removing the battery to reboot the device daily.) Unfortunately, that is the “state of the art” today.

To this end, many bright people have proposed processes as a means of combating the problem, reasoning that if people engineered software rather than hacked away at it, the results would be better.

And they have, to a large degree, been better. Nevertheless, these approaches have been based on the premise that software development can be treated the same as an industrial manufacturing process and achieve the same results.

Industrial automation problems are highly predictable, and so this approach makes a great deal of sense when the underlying mechanisms driving the process are very well understood and are inherently linear (i.e., a small change in input results in an equally small change in output).

It makes less sense when the underlying mechanisms are not fully understood or the process is highly nonlinear. Unfortunately, software development is neither fully understood nor even remotely linear.

It is like the difference between applying fuzzy logic and applying neural networks to nonlinear control systems. Fuzzy logic systems work by applying the concept of partial membership and using a centroid computation to determine outputs.

The partial membership of different sets (mapping to different equations) is defined by set membership rules, so fuzzy logic systems are best applied when the rules are known and understood, such as in speed control systems.
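To make the mechanism concrete, here is a minimal sketch (mine, not from the text) of centroid defuzzification: hypothetical rules have already produced partial memberships in a few output sets, and the crisp output is the membership-weighted average of each set's representative value.

```c
/* Minimal sketch of centroid defuzzification (illustrative only).
 * Hypothetical rule evaluation has produced a degree of membership in each
 * output set ("slow", "cruise", "fast"); the crisp output is the
 * membership-weighted average of the sets' representative values. */
#include <stdio.h>

double defuzzify_centroid(const double degree[], const double value[], int n)
{
    double weighted = 0.0, total = 0.0;
    for (int i = 0; i < n; ++i) {
        weighted += degree[i] * value[i];  /* weight each value by its membership */
        total    += degree[i];
    }
    return (total > 0.0) ? weighted / total : 0.0;
}

int main(void)
{
    double degree[] = { 0.2, 0.7, 0.1 };     /* memberships: slow, cruise, fast */
    double value[]  = { 20.0, 55.0, 90.0 };  /* representative speeds, km/h     */
    printf("crisp speed command: %.1f km/h\n", defuzzify_centroid(degree, value, 3));
    return 0;
}
```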

Neural networks, on the other hand, don't know or care about rules. They work by training clusters of simple but deeply interconnected processing units (neurons). The training involves applying known inputs (“exemplars”) and adjusting the weights of the connections until you get the expected outputs.

Once trained, the neural network can produce results from previously unseen data input sets and produce control outputs. The neural network learns the effects of the underlying mechanisms from actual data, but it doesn't in any significant way “understand” those mechanisms. Neural networks are best used when the underlying mechanisms are not well understood because they can learn the data transformations inherent in the mechanisms.
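As a contrasting toy sketch (again mine, purely illustrative), a single artificial neuron can learn a mapping such as logical AND from exemplars alone: its connection weights are repeatedly nudged toward the expected outputs, with no rules encoded anywhere.

```c
/* Toy illustration of learning from exemplars: a single neuron trained to
 * reproduce logical AND by adjusting its connection weights from known
 * input/output pairs. No rules are encoded; the mapping emerges from data. */
#include <stdio.h>

int main(void)
{
    double in[4][2]  = { {0,0}, {0,1}, {1,0}, {1,1} };
    double target[4] = {  0,     0,     0,     1   };
    double w[2] = { 0.0, 0.0 }, bias = 0.0, rate = 0.1;

    for (int epoch = 0; epoch < 100; ++epoch) {
        for (int i = 0; i < 4; ++i) {
            double sum = w[0] * in[i][0] + w[1] * in[i][1] + bias;
            double out = (sum > 0.0) ? 1.0 : 0.0;
            double err = target[i] - out;
            w[0] += rate * err * in[i][0];   /* adjust connection weights */
            w[1] += rate * err * in[i][1];
            bias += rate * err;
        }
    }
    for (int i = 0; i < 4; ++i) {
        double out = (w[0] * in[i][0] + w[1] * in[i][1] + bias > 0.0) ? 1.0 : 0.0;
        printf("%g AND %g -> %g\n", in[i][0], in[i][1], out);
    }
    return 0;
}
```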

Rigorously planned processes are akin to fuzzy logic—they make a priori assumptions about the underlying mechanisms. When they are right, a highly predictable scheme results.

However, if those a priori assumptions are either wrong or missing, then they yield less successful results. In this case, the approach must be tuned with empirical data. To this end, most traditional processes do “extra” work and produce “extra” products to help manage the process. These typically include:

* Schedules
* Management plans
* Metrics (e.g., source lines of code [SLOC] or defect density)
* Peer and management reviews and walk-throughs
* Progress reports

And so on.

The idea is that the execution of these tasks and the production of the work products correlate closely with project timeliness and product functionality and quality. However, many of the tasks and measures used don't correlate very well, even if they are easy to measure. Even when they do correlate well, they incur extra cost and time.

Agile methods are a reaction in the developer community to the high cost and effort of these industrial approaches to software development. The mechanisms by which we invent software are not so well understood as to be highly predictable. Further, small changes in requirements or architecture can result in huge differences in development approach and effort.

Because of this, empiricism, discipline, quality focus, and stakeholder focus must all be present in our development processes. To this end, agile methods are not about hacking code but instead are about focusing effort on the things that demonstrably add value and defocusing on efforts that do not.

Why Real-Time Embedded Systems need Agile
Of course, software development is hard. Embedded software development is harder. Real-time embedded software is even harder than that. This is not to minimize the difficulty in reliably developing application software, but there are a host of concerns with real-time and embedded systems that don't appear in the production of typical applications.

An embedded system is one that contains at least one CPU but does not provide general computing services to the end users. A cell phone is considered an embedded computing platform because it contains one or more CPUs but provides a dedicated set of services (although the distinction is blurred in many contemporary cell phones).

Our modern society is filled with embedded computing devices: clothes washers, air traffic control computers, laser printers, televisions, patient ventilators, cardiac pacemakers, missiles, global positioning systems (GPS), and even automobiles—the list is virtually endless.

The issues that appear in real-time embedded systems manifest themselves on four primary fronts. First, the optimization required to run effectively in highly resource-constrained environments makes embedded systems more challenging to create. It is true that embedded systems run the gamut from 8-bit processors in dishwashers and similar machinery up to collaborating sets of 64-bit computers.

Nevertheless, most (but not all) embedded systems are constrained in terms of processor speed, memory, and user interface (UI). This means that many of the standard approaches to application development are inadequate alone and must be optimized to fit into the computing environment and perform their tasks.

Thus embedded systems typically require far more optimization than standard desktop applications. I remember writing a real-time operating system (RTOS) for a cardiac pacemaker that had 32kB of static memory for what amounted to an embedded 6502 processor. (It even had a small file system to manage different pacing and monitoring applications.) Now that's an embedded system!

Along with the highly constrained environments, there is usually a need to write more device-driver-level software for embedded systems than for standard application development.

This is because these systems are more likely to have custom hardware for which drivers do not exist, but even when they do exist, they often do not meet the platform constraints. This means that not only must the primary functionality be developed, but the low-level device drivers must be written as well.

The real-time nature of many embedded systems means that predictability and schedulability affect the correctness of the application. In addition, many such systems have high reliability and safety requirements.

These characteristics require additional analyses, such as schedulability analysis (e.g., rate monotonic analysis, or RMA), reliability analysis (e.g., failure modes and effects analysis, or FMEA), and safety analysis (e.g., fault tree analysis, or FTA). In addition to “doing the math,” effort must be made to ensure that these additional requirements are met.
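As one concrete example of “doing the math,” a widely used sufficient test in RMA is the Liu and Layland utilization bound: a set of n independent periodic tasks is schedulable under rate monotonic priorities if the sum of Ci/Ti (worst-case execution time over period) does not exceed n(2^(1/n) - 1). A minimal sketch with a hypothetical task set:

```c
/* Rate monotonic utilization-bound check (Liu & Layland sufficient test):
 * schedulable if sum(Ci/Ti) <= n * (2^(1/n) - 1). Failing the bound is
 * inconclusive; an exact response-time analysis would then be needed. */
#include <math.h>
#include <stdio.h>

typedef struct {
    double wcet_ms;    /* Ci: worst-case execution time */
    double period_ms;  /* Ti: period                    */
} PeriodicTask;

int rma_bound_test(const PeriodicTask tasks[], int n)
{
    double u = 0.0;
    for (int i = 0; i < n; ++i)
        u += tasks[i].wcet_ms / tasks[i].period_ms;
    double bound = n * (pow(2.0, 1.0 / n) - 1.0);
    printf("U = %.3f, bound = %.3f\n", u, bound);
    return u <= bound;
}

int main(void)
{
    PeriodicTask set[] = { { 2.0, 10.0 }, { 4.0, 20.0 }, { 6.0, 50.0 } };  /* hypothetical */
    puts(rma_bound_test(set, 3) ? "schedulable" : "bound test inconclusive");
    return 0;
}
```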

Last, a big difference between embedded and traditional applications is the nature of the so-called target environment—that is, the computing platform on which the application will run. Most desktop applications are “hosted” (written) on the same standard desktop computer that serves as the target platform. This means that a rich set of testing and debugging tools is available for verifying and validating the application.

In contrast, most embedded systems are “cross-compiled” from a desktop host to an embedded target. The embedded target lacks the visibility and control of the program execution found on the host, and most of the desktop tools are useless for debugging or testing the application on its embedded target.

The debugging tools used in embedded systems development are almost always more primitive and less powerful than their desktop counterparts. Not only are the embedded applications more complex (due to the optimization), and not only do they have to drive low-level devices, and not only must they meet additional sets of quality-of-service (QoS) requirements, but the debugging tools are far less capable as well.

It should be noted that another difference exists between embedded and “IT” software development. IT systems are often maintained systems that constantly provide services, and software work, for the most part, consists of small incremental efforts to remove defects and add functionality.

Embedded systems differ in that they are released at an instant in time and provide functionality at that instant. Updating an embedded system is a larger effort, so such systems are often replaced rather than “maintained” in the IT sense. This means that IT software can be maintained in smaller incremental pieces than embedded software can, and “releases” have more significance in embedded software development.

A “real-time system” is one in which timeliness is important to correctness. Many developers incorrectly assume that “real-time” means “real fast.” It clearly does not. Real-time systems are “predictably fast enough” to perform their tasks.

If processing your eBay order takes an extra couple of seconds, the server application can still perform its job. Such systems are not usually considered real-time, although they may be optimized to handle thousands of transactions per second, because if the system slows down, it doesn't affect the system's correctness.

Real-time systems are different. If a cardiac pacemaker fails to induce current through the heart muscle at the right time, the patient's heart can go into fibrillation. If the missile guidance system fails to make timely corrections to its attitude, it can hit the wrong target. If the GPS satellite doesn't keep a highly precise measure of time, position calculations based on its signal will simply be wrong.

Real-time systems are categorized in many ways. The most common is the broad grouping into “hard” and “soft.” “Hard” real-time systems exhibit significant failure if even a single action fails to complete within its time frame. The measure of timeliness is called a deadline – the time after action initiation by which the action must be complete. Not all deadlines must be in the microsecond time frame to be real-time.

The F2T2EA (Find, Fix, Track, Target, Engage, Assess) Kill Chain is a fundamental aspect of almost all combat systems; the end-to-end deadline for this compound action might be on the order of 10 minutes, but pilots absolutely must achieve these deadlines for combat effectiveness.

The value of the completion of an action as a function of time is an important concept in real-time systems and is expressed as a “utility function,” as shown in Figure 1.1 below. This figure expresses the value of the completion of an action to the user of the system. In reality, utility functions are smooth curves but are most often modeled as discontinuous step functions because this eases their mathematical analysis.

Figure 1.1 Utility function

In the figure, the value of the completion of an action is high until an instant in time, known as the deadline; at this point, the value of the completion of the action is zero. The length of time from the current time to the deadline is a measure of the urgency of the action.

The height of the function is a measure of the criticality or importance of the completion of the action. Criticality and urgency are important orthogonal properties of actions in any real-time system. Some scheduling schemes optimize for urgency, others for importance, and still others support a fairness doctrine (all actions move forward at about the same rate).
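A hard deadline's step utility function is simple to model directly. The sketch below (values hypothetical, not from the text) captures the deadline, the criticality as the height of the function, and the urgency as the time remaining until the deadline:

```c
/* Step utility function for a hard deadline: completing the action is worth
 * its full criticality at or before the deadline and nothing afterward. */
#include <stdio.h>

typedef struct {
    double deadline;     /* time by which the action must be complete   */
    double criticality;  /* height of the utility function (importance) */
} ActionUtility;

static double utility(const ActionUtility *a, double completion_time)
{
    return (completion_time <= a->deadline) ? a->criticality : 0.0;
}

static double urgency(const ActionUtility *a, double now)
{
    return a->deadline - now;   /* less time remaining means more urgency */
}

int main(void)
{
    ActionUtility pace_pulse = { .deadline = 5.0, .criticality = 100.0 };
    printf("value if completed at t=4.0: %.1f\n", utility(&pace_pulse, 4.0));
    printf("value if completed at t=6.0: %.1f\n", utility(&pace_pulse, 6.0));
    printf("urgency at t=3.0: %.1f\n", urgency(&pace_pulse, 3.0));
    return 0;
}
```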

Actions are the primitive building blocks of concurrency units, such as tasks or threads. A concurrency unit is a sequence of actions in which the order is known; the concurrency unit may have branch points, but the sequence of actions within a set of branches is fully deterministic. This is not true for the actions between concurrency units. Between concurrency units, the sequence of actions is not known, or cared about, except at explicit synchronization points.

Figure 1.2 below illustrates this point. The flow in each of the three tasks (shown on a UML activity diagram) is fully specified. In Task 1, for example, the sequence is that Action A occurs first, followed by Action B and then either Action C or Action D.

Similarly, the sequence for the other two tasks is fully defined. What is not defined is the sequence between the tasks. Does Action C occur before or after Action W or Action Gamma? The answer is: you don't know, and you don't care.

However, we know that before Action F, Action X, and Action Zeta can occur, Action E, Action Z, and Action Gamma must all have occurred. This is what is meant by a task synchronization point.
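On a POSIX-like RTOS, such a synchronization point is commonly expressed with a barrier: each task executes its own internally ordered actions, and none proceeds past the rendezvous until all have arrived. A minimal sketch (mine, with the actions of Figure 1.2 reduced to placeholders):

```c
/* Three tasks whose internal action order is fixed but whose relative order
 * is unknown until they rendezvous at a synchronization point (a barrier). */
#include <pthread.h>
#include <stdio.h>

static pthread_barrier_t sync_point;

static void action(const char *name) { printf("Action %s\n", name); }

static void *task1(void *arg) {
    (void)arg;
    action("E");                        /* ...earlier Task 1 actions omitted */
    pthread_barrier_wait(&sync_point);  /* wait until Z and Gamma have run   */
    action("F");
    return NULL;
}

static void *task2(void *arg) {
    (void)arg;
    action("Z");
    pthread_barrier_wait(&sync_point);
    action("X");
    return NULL;
}

static void *task3(void *arg) {
    (void)arg;
    action("Gamma");
    pthread_barrier_wait(&sync_point);
    action("Zeta");
    return NULL;
}

int main(void)
{
    pthread_t t[3];
    pthread_barrier_init(&sync_point, NULL, 3);
    pthread_create(&t[0], NULL, task1, NULL);
    pthread_create(&t[1], NULL, task2, NULL);
    pthread_create(&t[2], NULL, task3, NULL);
    for (int i = 0; i < 3; ++i) pthread_join(t[i], NULL);
    pthread_barrier_destroy(&sync_point);
    return 0;
}
```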

Figure 1.2. Concurrency units

Because synchronization points, as well as resource sharing, are common in real-time systems, they require special attention that is not often needed in the development of IT systems.

Within a task, several different properties are important and must be modeled and understood for the task to operate correctly (Figure 1.3 below). Time-based tasks are invoked at a regular rate, characterized by the period.

Figure 1.3. Task time

The period is the time between invocations of the task. The variation around the period is called jitter. For event-based task initiation, the time between task invocations is called the interarrival time. Most schedulability analyses use the shortest such time, the minimum interarrival time.

The time from the initiation of the task to the point at which its set of actions must be complete is known as the deadline. When tasks share resources, it is possible that a needed resource isn't available. When a necessary resource is locked by a lower-priority task, the current task must block and allow the lower-priority task to complete its use of the resource before the original task can run.

The length of time the higher-priority task is prevented from running is known as the blocking time. The fact that a lower-priority task must run even though a higher-priority task is ready to run is known as priority inversion and is a property of all priority-scheduled systems that share resources among task threads. Priority inversion is unavoidable when tasks share resources, but when uncontrolled, it can lead to missed deadlines.
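As the next paragraph notes, real-time designs must bound this inversion. One common mechanism is a priority-inheritance (or priority-ceiling) protocol on the lock guarding the shared resource; a minimal sketch, assuming a POSIX-threads RTOS that supports the priority inheritance option:

```c
/* Creating a mutex with the priority-inheritance protocol: while a
 * low-priority task holds the lock, it temporarily inherits the priority of
 * any higher-priority task blocked on it, which bounds the inversion. */
#include <pthread.h>

pthread_mutex_t resource_lock;

int init_resource_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    int rc = pthread_mutex_init(&resource_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;   /* 0 on success */
}
```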

One of the things real-time systems must do is bound priority inversion (e.g., limit blocking to the depth of a single task) to ensure system timeliness. The period of time that a task requires to perform its actions, including any potential blocking time, is called the task execution time.

For analysis, it is common to use the longest such time period, the worst-case execution time, to ensure that the system can always meet its deadlines.

Finally, the time between the end of the execution and the deadline is known as the slack time. In real-time systems, it is important to capture, characterize, and manage all these task properties.
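These properties are often captured explicitly so that design assumptions can later be checked against measurements on the target. A hedged sketch of what such a record might look like (field names are illustrative, not from the text):

```c
/* Illustrative record of the task timing properties discussed above,
 * all expressed in the same time unit (e.g., microseconds). */
typedef struct {
    double period;               /* time between invocations (time-based tasks)        */
    double jitter;               /* variation around the period                        */
    double min_interarrival;     /* shortest gap between event-driven invocations      */
    double deadline;             /* completion constraint, relative to task initiation */
    double worst_case_blocking;  /* longest time blocked by lower-priority tasks       */
    double worst_case_execution; /* longest execution time, including blocking         */
} TaskTiming;

/* Slack: time between the worst-case completion and the deadline. */
double worst_case_slack(const TaskTiming *t)
{
    return t->deadline - t->worst_case_execution;
}
```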

Real-time systems are most often embedded systems as well and carry those burdens of development. In addition, real-time systems have timeliness and schedulability constraints.

Real-time systems must be timely—that is, they must meet their task completion time constraints. The entire set of tasks is said to be schedulable if all the tasks are timely.

Real-time systems are not necessarily (or even usually) deterministic, but they must be predictably bounded in time. Methods exist to mathematically analyze systems for schedulability, and there are tools to support that analysis.

Safety-critical and high-reliability systems are special cases of real-time and embedded systems. The term safety means “freedom from accidents or losses” and is usually concerned with safety in the absence of faults as well as in the presence of single-point faults. Reliability is usually a stochastic measure of the percentage of the time the system delivers services.

Safety-critical systems are real-time systems because safety analysis includes the property of fault tolerance time—the length of time a fault can be tolerated before it leads to an accident. They are almost always embedded systems as well and provide critical services such as life support, flight management for aircraft, medical monitoring, and so on.

Safety and reliability are assured through additional analyses, such as FTA, FMEA, and failure mode, effects, and criticality analysis (FMECA), and the results are often captured in a document called the hazard analysis.

This document combines fault likelihood, fault severity, risk (the product of the previous two), hazardous conditions, fault protection means, fault tolerance time, fault detection time, and fault protection action time. Safety-critical and high-reliability systems require additional analysis and documentation to achieve approval from regulatory agencies such as the FAA and FDA.

It is not at all uncommon for companies and projects to specify very heavyweight processes for the development of these kinds of systems—safety-critical, high-reliability, real-time, or embedded—as a way of injecting quality into those systems.

And it works, to a degree. However, it works at a very high cost. Agile methods provide an alternative perspective on the development of these kinds of systems that is lighter-weight but does not sacrifice quality.

Next in Part 2: Benefits of Agile Methods.

Used with the permission of the publisher, Addison-Wesley, an imprint of Pearson Higher Education, this series of three articles is based on material from “Real-Time Agility” by Bruce Powel Douglass.

Bruce Powel Douglass has worked as a software developer in real-time systems for over 25 years and is a well-known speaker, author, and consultant in the area of real-time embedded systems. He is on the Advisory Board of the Embedded Systems Conference, where he has taught courses in software estimation and scheduling, project management, object-oriented analysis and design, communications protocols, finite state machines, design patterns, and safety-critical systems design. He develops and teaches courses and consults in real-time object-oriented analysis and design and project management and has done so for many years. He has authored articles for many journals and periodicals, especially in the real-time domain.

He is the chief evangelist for Rational/IBM, a leading producer of tools for software and systems development. Bruce worked with various UML partners on the specification of the UML, both versions 1 and 2. He is a former co-chair of the Object Management Group's Real-Time Analysis and Design Working Group. He is the author of several other books on software, including Doing Hard Time: Developing Real-Time Systems with UML, Objects, Frameworks and Patterns (Addison-Wesley, 1999), Real-Time Design Patterns: Robust Scalable Architecture for Real-Time Systems (Addison-Wesley, 2002), Real-Time UML, 3rd Edition: Advances in the UML for Real-Time Systems (Addison-Wesley, 2004), Real-Time UML Workshop for Embedded Systems (Elsevier Press, 2006), and several others, including a short textbook on table tennis.
