In addition to all of the usual testing concerns, interactive graphical applications present some complicated issues. But testing around these complexities is possible, as shown here.
The automated testing of a traditional sequential program, such as a compiler, is well understood and practiced in many areas of the computing industry. By contrast, the regular automated testing of interactive programs, especially their presentational aspects, requires additional techniques and rigors to assure operational quality and "correct" behavior and output over the program's lifetime. This article introduces the concepts of automated regression testing and its advantages over traditional methods, illustrated with a specific development and testing regime.
We are all familiar with the mantra that testing is a good thing. But what does this entail when the software in question is a graphical interactive program running on a range of embedded platforms? How does testing fit in, not only with the quality assurance process, but also with the simple day-to-day engineering activities? How can frameworks be constructed to anticipate future testing needs as well as those known and understood when a testing framework is introduced?
This article uses a hypothetical embedded graphical application to explore many of the principles embodied within this testing regime. With this one example, it is possible to explore a range of different issues, some or all of which might be directly relevant to the reader. Both the philosophical abstractions of such testing and the practicalities of implementing this approach are covered.
What sort of software are we testing?
Software portability is a major consideration to many software engineering departments in these days of Internet appliances. Portability is often accomplished by employing a software philosophy and structure wherein the application executes through the assistance of a target-specific portability library. When addressing a new target platform, only the implementation of the portability layer needs specific attention: the application and user interface are compiled with the appropriate toolchain and do not require modification.
All the code is written in ANSI C for further portability. This means that varying only the implementation of the portability layer permits the same application and user interface code to execute on different platforms.
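To make the layering concrete, here is a minimal sketch of how application code might call through a portability layer. The article does not show the real interface, so every name here (`port_layer`, `put_pixel`, `mem_fb`, `app_draw_hline`) is a hypothetical chosen for illustration, and the in-memory framebuffer merely stands in for a workstation implementation.

```c
#include <assert.h>
#include <string.h>

#define FB_WIDTH  16
#define FB_HEIGHT 8

/* Hypothetical portability-layer interface: the application calls only
   through these members and never touches the platform directly. */
typedef struct port_layer {
    const char *platform_name;
    void *ctx;                              /* implementation-private state */
    int (*put_pixel)(void *ctx, int x, int y, unsigned char value);
} port_layer;

/* Assumed workstation implementation: an in-memory framebuffer. */
typedef struct mem_fb {
    unsigned char pixels[FB_HEIGHT][FB_WIDTH];
} mem_fb;

/* Stores one pixel; returns 0 on success, -1 if (x, y) is off screen. */
static int mem_put_pixel(void *ctx, int x, int y, unsigned char value)
{
    mem_fb *fb = (mem_fb *)ctx;
    if (x < 0 || x >= FB_WIDTH || y < 0 || y >= FB_HEIGHT)
        return -1;
    fb->pixels[y][x] = value;
    return 0;
}

/* Clears the framebuffer and binds it to a portability-layer instance. */
static port_layer make_workstation_layer(mem_fb *fb)
{
    port_layer layer;
    assert(fb != NULL);
    memset(fb, 0, sizeof *fb);
    layer.platform_name = "workstation";
    layer.ctx = fb;
    layer.put_pixel = mem_put_pixel;
    return layer;
}

/* Portable application code: draws a horizontal line from x0 to x1
   inclusive and returns the number of pixels actually drawn. */
static int app_draw_hline(const port_layer *port, int x0, int x1, int y,
                          unsigned char value)
{
    int x, drawn = 0;
    assert(port != NULL && port->put_pixel != NULL);
    for (x = x0; x <= x1; x++)
        if (port->put_pixel(port->ctx, x, y, value) == 0)
            drawn++;
    return drawn;
}
```

Under these assumptions, porting to a new target means supplying a different `put_pixel` and context, while `app_draw_hline` is recompiled unchanged.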
This has obvious benefits from a software development viewpoint, but it also gives rise to a powerful testing mechanism to separate the behavior of an application and user interface from a target platform. Significantly, it permits us to take advantage of the greater speed of desktop workstations, which are typically about an order of magnitude faster than the majority of embedded target platforms. By developing implementations of the portability layer for Unix and Win32, we are able to perform the development and testing of applications and their user interfaces on fast workstations, rather than on (potentially) slow target platforms. Workstation testing enables significantly more testing to be performed in a given time period than direct testing on the target embedded platform.
With suitably defined rigorous abstractions between a portable application and the target-specific implementation of the portability layer, it is possible to make powerful, reasoned statements about how much software needs testing on a given target platform. Further, if the main application itself fails on a particular target platform but not on a development workstation, the list of potential causes can be drastically reduced by simple logical reasoning. Typically, such failures are caused by an incorrect implementation of the portability layer, a toolchain fault, or a hardware fault. Such scope reduction is extremely valuable.
To help illustrate the range of ideas and practices we have explored, this article uses a hypothetical program called GVLize. This is a program for visualising graphs of connected nodes. It takes data sets describing a large number of nodes and their interconnections, and produces a graphical representation with which the user can interact. The complexity of the layout task, both in terms of the number of possible combinations and the ability to produce a visually pleasing layout, means that there is no simple test for "correctness." Further, the user can "nudge" the layout to perform small modifications to the chosen layout. GVLize is intended to run across a range of embedded platforms, with the main development performed on workstations. The majority of the code is implemented on top of a small portability library.
GVLize typically runs as a "full screen" application on the target embedded platforms. Data sets are obtained from both local capabilities and through one or more network connections.
GVLize thus illustrates a range of characteristics typical of many real-world applications, even if most real applications only possess a subset of these characteristics. This presents a number of additional testing challenges beyond those associated with standard workstation programs such as compilers:
- Primary output is graphical
- Interactive in nature (possesses a main event loop)
- "Correctness" is difficult to fully determine programmatically
- Cross-platform nature introduces a range of additional hazards
What is to be tested?
All aspects of behavior, including interactivity and presentation, are to be tested. These include:
- Machine checks (otherwise known as exceptions, traps, segmentation faults, or bus errors)
- Internal assertion checks
- Internal data structure consistency checks
- Completion of operations without looping indefinitely or taking an excessively long time
- Heap integrity
- Heap exhaustion behavior
- Error recovery behavior
- User input simulation
- Correct screen contents
Additional factors that must be taken into account due to the nature of both the applications and the target platforms upon which they execute are:
- An interactive program has no obvious termination point
- The complexities of the interaction of GVLize mean that correct behavior can require operator verification
- Network behavior can introduce varying behavior
- The potential "remoteness" caused by executing on a target platform must be addressed
- Long term heap fragmentation and leakage
Having identified a number of very specific issues relating to the testing to be performed, the next few sections build up the structure of the testing harness. This harness is designed to provide flexible and long-term testing. As such, its design is well suited to many other testing requirements.
Note that we assume you're using version control to track the specific source code that goes into specific builds and releases of your software. Without that, this sort of automated testing will only help you detect bugs, and fixing them will be considerably more difficult.
What's a test?
The fundamental purpose of testing is to determine whether something specific works or not. This gives rise to the first attribute of a test: its status. The range of different values that can be recorded by the status is deliberately limited to the absolute minimum. The four status values that must be distinguishable are shown in Table 1.
Table 1: Four test status values that must be distinguishable

| Test status | Ascribed meaning |
| --- | --- |
| Never | The test has never been performed |
| Passed | Testing did not find any reasons to fault the test |
| Failed | One or more reasons to fault the test were found |
| Broken | Testing could not be performed (for example, equipment absence) |
It is important to distinguish between not having obtained a test result and either the passed or failed status values. Consider the situation where the target platform necessary to perform a test is not powered up. It would be inaccurate to say that being unable to perform such a test equated to that test either failing or passing. Not having the necessary equipment to perform a test is simply not a predictor of whether a test will pass or fail. In a similar fashion, the fact that a test has never been attempted is an important piece of information to be able to distinguish.
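The four status values map naturally onto a C enumeration. The following is a minimal sketch; the identifiers (`test_status`, `test_status_name`, `test_status_has_verdict`) are hypothetical and not taken from the harness described here.

```c
#include <assert.h>

/* The four status values that must remain distinguishable. */
typedef enum test_status {
    TEST_NEVER = 0,   /* the test has never been performed */
    TEST_PASSED,      /* no reason to fault the test was found */
    TEST_FAILED,      /* one or more reasons to fault the test were found */
    TEST_BROKEN       /* testing could not be performed */
} test_status;

/* Returns a printable name for a status value. */
static const char *test_status_name(test_status s)
{
    static const char *const names[] = {
        "never", "passed", "failed", "broken"
    };
    assert((int)s >= (int)TEST_NEVER && (int)s <= (int)TEST_BROKEN);
    return names[s];
}

/* Returns nonzero only when the status is an actual verdict on the
   software; "never" and "broken" say nothing about pass or fail. */
static int test_status_has_verdict(test_status s)
{
    return s == TEST_PASSED || s == TEST_FAILED;
}
```

Keeping "never" and "broken" as first-class values, rather than folding them into "failed", means a report can separate missing results from genuine failures.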
The next characteristic of a test is identification. We use two different ways to identify a test: a unique integer test number (the test ID) and a unique textual name (the test name). Over time, new tests will arrive and old tests will be retired. Thus, test ID values should be allocated atomically and never reused, with the expectation that the sequence may contain holes. Test names are by convention constructed hierarchically, with each component narrowing the test to a progressively more specific aspect. Consider an example test name:
/gvlize/x86-linux-gcc/xgvlize/1shot/gis/cambs/dset1203
The standard forward-slash-separator approach is used. The leftmost component is the most significant. The meaning of this test name is deconstructed in Table 2.
Table 2: Deconstruction of a test name

Test name: /gvlize/x86-linux-gcc/xgvlize/1shot/gis/cambs/dset1203

| Name component | Component meaning |
| --- | --- |
| gvlize | The product being tested is GVLize |
| x86-linux-gcc | The executable is targeted at Linux running on an x86 processor and has been compiled with GCC |
| xgvlize | The particular variant of GVLize is the reference X Window System implementation |
| 1shot | The test category is the "1-shot" category |
| gis | The sub-category is GIS node data set |
| cambs | The sub-sub-category is data sets of Cambridge |
| dset1203 | The particular test data set |
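A hierarchical name of this kind is easy to take apart mechanically. The sketch below splits a test name into its slash-separated components; the function name, the fixed limits, and the decision to reject empty components are assumptions made for illustration, not part of the harness described here.

```c
#include <assert.h>
#include <string.h>

#define MAX_COMPONENTS 16
#define MAX_NAME_LEN   256

/* Holds a private copy of the name with each '/' replaced by '\0',
   plus pointers to the start of every component. */
typedef struct test_name_parts {
    char storage[MAX_NAME_LEN];
    const char *component[MAX_COMPONENTS];
    int count;
} test_name_parts;

/* Splits "/a/b/c" into {"a", "b", "c"}. Returns the number of components,
   or -1 if the name lacks a leading slash, is too long, has too many
   components, or contains an empty component. A single trailing slash
   is tolerated. */
static int split_test_name(const char *name, test_name_parts *out)
{
    char *p;
    assert(name != NULL && out != NULL);
    out->count = 0;
    if (name[0] != '/' || strlen(name) >= MAX_NAME_LEN)
        return -1;
    strcpy(out->storage, name);
    p = out->storage + 1;               /* skip the leading slash */
    while (*p != '\0') {
        char *slash = strchr(p, '/');
        if (slash != NULL)
            *slash = '\0';              /* terminate this component */
        if (*p == '\0' || out->count == MAX_COMPONENTS)
            return -1;
        out->component[out->count++] = p;
        if (slash == NULL)
            break;                      /* that was the last component */
        p = slash + 1;
    }
    return out->count > 0 ? out->count : -1;
}
```

Because the leftmost component is the most significant, a harness built this way could select whole families of tests by comparing only the leading components.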
The final characteristic of a test represents a practical compromise between maintaining a clean abstraction within the core of the test harness and the many assorted tests and their variants that it controls in a real-world environment. This is a list of test attributes, stored as a semicolon-separated list of strings. This provides a mechanism whereby the test harness records and preserves test-specific information without the harness itself examining or (necessarily) being directly influenced by such stored attributes. These attributes are used, for example, to indicate which tests have unusually long timeout values or are expected to generate error situations.
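Since the harness stores attributes without interpreting them, it is the test code that needs to look them up. A minimal sketch of such a lookup follows; the function name and the attribute strings used in the usage note are invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `wanted` appears as a complete entry in the
   semicolon-separated list `attrs`, and 0 otherwise. Partial matches
   do not count, and an empty `wanted` never matches. */
static int test_has_attribute(const char *attrs, const char *wanted)
{
    size_t wanted_len;
    const char *p;
    assert(attrs != NULL && wanted != NULL);
    wanted_len = strlen(wanted);
    if (wanted_len == 0)
        return 0;
    p = attrs;
    for (;;) {
        const char *end = strchr(p, ';');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);
        if (len == wanted_len && strncmp(p, wanted, len) == 0)
            return 1;
        if (end == NULL)
            return 0;                   /* reached the final entry */
        p = end + 1;
    }
}
```

For example, given the hypothetical list `"long-timeout;expect-error"`, a lookup of `"expect-error"` succeeds while a lookup of `"timeout"` does not, because only whole entries are compared.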
See Table 3 for a complete list of test characteristics.
Table 3: Test characteristics

| Test characteristic | Description |
| --- | --- |
| Status | One of passed, failed, broken, and never |
| ID | Unique integer identifier |
| Name | Unique textual identifier |
| Attributes | Semicolon-separated list of strings |
Adding a test database
Practical use of a testing mechanism within day-to-day development activities and the quality assurance process requires some history to be associated with each test. Thus, the details of all tests known to the test harness are recorded in a database. To permit historical observations to be made, a number of dates are recorded for each test, as shown in Table 4.
Table 4: Test database

| Database field | Information recorded |
| --- | --- |
| Created | Date test was first entered in the database |
| Last passed | Date the test last passed |
| Last failed | Date the test last failed |
| Last broken | Date the test was last found to be broken |
The date a test was last run is readily determined from the latest date recorded in the passed, failed, and broken fields.
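The derivation of the last-run date can be sketched as follows. The record layout, the field names, and the use of a zero `time_t` to mean "no date recorded" are all assumptions made for this example; the article does not specify how the database stores its dates.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical in-memory form of one database row.
   A date of (time_t)0 is assumed to mean "never recorded". */
typedef struct test_record {
    int id;
    const char *name;
    const char *attributes;
    time_t created;
    time_t last_passed;
    time_t last_failed;
    time_t last_broken;
} test_record;

/* Returns the most recent of the passed, failed, and broken dates,
   or (time_t)0 if the test has never been run. */
static time_t test_last_run(const test_record *rec)
{
    time_t latest;
    assert(rec != NULL);
    latest = rec->last_passed;
    if (rec->last_failed > latest)
        latest = rec->last_failed;
    if (rec->last_broken > latest)
        latest = rec->last_broken;
    return latest;
}
```

Deriving the value on demand avoids storing a fourth date that could drift out of step with the other three.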
A short aside: when catering for tests that take significant amounts of time to execute, one might create a separate last run field in which to record the time at which testing started. The passed, failed, and broken fields would then record the time when the new status was actually determined and testing finished. We have not yet found a significant requirement for this level of subtle timing.
How is a test conceptually implemented?
Having sketched out a testing framework (or test harness), let's look at how a number of tests themselves interact with the framework. A test presents itself to the test harness as a function pointer or function reference. This function is referred to as the implementor function. The basic contract between the test harness and the implementor function is as follows:
- A test registers itself with the test harness to become part of the testing system, supplying an implementor function, a test name, and an initial set of test attributes
- The test harness is responsible for permanently recording all relevant information
- The implementor function is responsible for all aspects of executing the actual test, ensuring a tidy state after this testing, and yielding a new status value
- The implementor function is also responsible for reporting the detailed results of a previous test
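The registration and execution parts of this contract can be sketched in a few lines of C. This is an illustration under stated assumptions rather than the harness itself: the type and function names are hypothetical, the "database" is a fixed-size array in memory, the two sample implementor functions are placeholders, and the reporting of detailed results is omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef enum test_status {
    TEST_NEVER, TEST_PASSED, TEST_FAILED, TEST_BROKEN
} test_status;

/* Implementor function: runs the test, leaves a tidy state,
   and yields the new status. */
typedef test_status (*test_implementor)(const char *attributes);

#define MAX_TESTS 32

typedef struct test_entry {
    int id;
    const char *name;
    const char *attributes;
    test_implementor run;
    test_status status;
} test_entry;

typedef struct test_harness {
    test_entry tests[MAX_TESTS];
    int count;
    int next_id;            /* only ever increases, so IDs are never reused */
} test_harness;

static void harness_init(test_harness *h)
{
    assert(h != NULL);
    h->count = 0;
    h->next_id = 1;
}

/* Registers a test and returns its newly allocated ID, or -1 if full. */
static int harness_register(test_harness *h, const char *name,
                            const char *attributes, test_implementor run)
{
    test_entry *e;
    assert(h != NULL && name != NULL && attributes != NULL && run != NULL);
    if (h->count == MAX_TESTS)
        return -1;
    e = &h->tests[h->count++];
    e->id = h->next_id++;
    e->name = name;
    e->attributes = attributes;
    e->run = run;
    e->status = TEST_NEVER;
    return e->id;
}

/* Runs the test with the given ID and records the status it yields.
   An unknown ID is reported as broken (a choice made for this sketch). */
static test_status harness_run(test_harness *h, int id)
{
    int i;
    assert(h != NULL);
    for (i = 0; i < h->count; i++) {
        if (h->tests[i].id == id) {
            h->tests[i].status = h->tests[i].run(h->tests[i].attributes);
            return h->tests[i].status;
        }
    }
    return TEST_BROKEN;
}

/* Placeholder implementor that always passes. */
static test_status sample_pass(const char *attributes)
{
    (void)attributes;
    return TEST_PASSED;
}

/* Placeholder implementor for a test whose target platform is absent. */
static test_status sample_no_target(const char *attributes)
{
    (void)attributes;
    return TEST_BROKEN;
}
```

A caller would initialize the harness, register each test with its name and attribute string, and then call `harness_run` with the returned ID. Note that the harness passes the attribute string through to the implementor function without examining it.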