Coming up with a Test Plan is the hardest job in a product company. Engineers like to develop cool products; marketing likes to tout the list of features; and QA/Test lingers in the background with little investment and a very tiny place in the schedule. Productivity is only as great as the quality of your product.
This article discusses our experience at Mirabilis Design in developing a comprehensive test and support plan. The complexity of the software required that we split the testing into sections: graphical user interface, feature library, documentation, web links, simulator or analytics, and database. This allowed some of the tests to be fully automated, while others require visual inspection.
The purpose of the test plan is to maximize the product quality, test the largest number of operating scenarios, ensure that models are upward compatible and identify incorrect operations.
In addition, the Test Plan is designed to address several unique features. In the case of our software, these were mixed-domain simulation and hierarchical considerations for memory, virtual connection, and virtual machine operations. In discussing the plan we developed, we will describe these aspects of software testing:
1. Background, Scope, Defects and Failures, Compatibility, Input Combinations and Preconditions, Static vs. Dynamic Testing, Software Verification and Validation, Software Testing Team, Software Quality Assurance (SQA)
2. Concept of Baseline Functionality
3. Baseline Libraries
4. New Libraries Based on Baseline Functionality
5. Regression Testing
6. Other Specific Block Issues to be Addressed
Mirabilis Design provides modeling and simulation solutions for exploring the performance and power consumption of applications running on complex embedded systems. Using the graphical environment, VisualSim, developers can create virtual environments to test their application performance and size the hardware platform based on metrics such as end-to-end response time, task deadline, scheduling schemes and reliability.
In our experience, as software evolves, the reemergence of faults is quite common. Sometimes it occurs because a fix gets lost through poor revision control practices (or simple human error in revision control). Often a fix for a problem will be “fragile”: the update fixes the problem in the narrow case where it was first observed, but not in the more general cases that may arise over the lifetime of the software.
Finally, it has often been the case that when some feature is redesigned, the same mistakes will be made in the redesign that were made in the original implementation of the feature.
Therefore, in most software development situations it is considered good practice that when a bug is located and fixed, a test that exposes the bug is recorded and regularly retested after subsequent changes to the program.
Although this may be done through manual testing procedures using programming techniques, it is often done using automated testing tools. Such a test suite contains software tools that allow the testing environment to execute all the regression test cases automatically; some projects even set up automated systems to automatically re-run all regression tests at specified intervals and report any failures (which could imply a regression or an out-of-date test). Common strategies are to run such a system after every successful compile (for small projects), every night, or once a week.
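As a minimal sketch of recording a test that exposes a fixed bug, the Python example below pins a fix in place. The function `utilization` and the bug it is assumed to have had (a division by zero when no simulation time has elapsed) are hypothetical and are used only for illustration; they are not taken from VisualSim.

```python
import unittest


def utilization(busy_time, elapsed_time):
    """Return the fraction of elapsed time a resource was busy.

    Hypothetical function: an earlier version is assumed to have
    divided by zero when elapsed_time was 0.
    """
    if elapsed_time == 0:
        return 0.0
    return busy_time / elapsed_time


class UtilizationRegressionTest(unittest.TestCase):
    def test_zero_elapsed_time(self):
        # Records the originally reported (hypothetical) failure.
        self.assertEqual(utilization(0.0, 0.0), 0.0)

    def test_general_case_still_works(self):
        # Guards against a "fragile" fix that breaks the normal path.
        self.assertEqual(utilization(1.0, 4.0), 0.25)
        self.assertEqual(utilization(4.0, 4.0), 1.0)
```

Saved in a file, tests like these can be run after every build or every night with `python -m unittest`, so the same fault cannot quietly return.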
|Figure 1 Flow Diagram of the Regression Testing (Source: base77.com)|
Regression testing (Figure 1, above) is an integral part of the extreme programming software development method. In this method, design documents are replaced by extensive, repeatable, and automated testing of the entire software package at every stage in the software development cycle.
Traditionally, in the corporate world, regression testing has been performed by a software quality assurance team after the development team has completed work. However, defects found at this stage are the most costly to fix. This problem is being addressed by the rise of developer testing.
Although developers have always written test cases as part of the development cycle, these test cases have generally been either functional tests or unit tests that verify only intended outcomes. Developer testing compels a developer to focus on unit testing and to include both positive and negative test cases.
Defining the scope of the testing
At Mirabilis Design, we create detailed flow diagrams and design documents before scheduling the development. During development, once the code has reached a critical stage, we run the tests every night after all development work has finished. Around-the-clock operation allows us to identify bugs immediately.
The primary purpose of testing is to detect software failures so that defects may be uncovered and corrected (Figure 2, below). This is a non-trivial pursuit. Testing cannot establish that a product functions properly under all conditions but can only establish that it does not function properly under specific conditions.
The scope of software testing often includes examining the code, executing that code in various environments and conditions, and examining its behavior: does it do what it is supposed to do, and does it do what it needs to do?
|Figure 2. Major Testing Categories (Source: protocoltesting.com)|
In the current culture of software development, a testing organization may be separate from the development team. There are various roles for testing team members. Information derived from software testing may be used to correct the process by which software is developed.
We have a round-table where each member shares their code with the team. The team then does a functional inspection and quizzes on the logic and code flow. In effect, multiple team members are aware of the detail of each code block.
For the graphical editors, we assemble models from the documented features so that every feature is exercised. We also attempt operations that are not documented but could cause incorrect operation. During this process, we try to attach an error message to any operation that might produce unexpected behavior.
Defects and Failures
Not all software defects are caused by coding errors. One common source of expensive defects is requirements gaps, e.g., unrecognized requirements that result in errors of omission by the program designer.
A common source of requirements gaps is non-functional requirements such as testability, scalability, maintainability, usability, performance, and security. Software faults occur through the following processes. A programmer makes an error (mistake), which results in a defect (fault, bug) in the software source code.
If this defect is executed, in certain situations the system will produce wrong results, causing a failure. Not all defects will necessarily result in failures. For example, defects in dead code will never result in failures. A defect can turn into a failure when the environment is changed.
Examples of these changes in environment include the software being run on a new hardware platform, alterations in source data or interacting with different software. A single defect may result in a wide range of failure symptoms.
Our product, VisualSim, is built entirely in Java, so testing on multiple platforms has not been a major focus. Nevertheless, we occasionally find issues that can be major showstoppers, such as illegal directory separators or unsupported image formats. Interfaces to VisualSim also need to be tested on different platforms. These include MatLab, C, C++, Verilog, SystemC and Satellite Toolkit.
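The directory separator problem is easy to reproduce in a small test. The sketch below is in Python rather than Java, and the model path is hypothetical; it only illustrates how a path saved with Windows separators is misread on a POSIX system and how a test can check a normalizing helper (`to_portable`, a name introduced here for illustration).

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_portable(path_text):
    """Rewrite a path that may use backslashes so it uses forward slashes."""
    # PureWindowsPath accepts both "\\" and "/" as separators.
    return PureWindowsPath(path_text).as_posix()


hard_coded = "models\\cpu\\cache.xml"  # hypothetical path saved on Windows

# On a POSIX system the backslash is an ordinary character, so the
# whole string is treated as one file name.
posix_parts = PurePosixPath(hard_coded).parts

# After normalization the path splits into the expected components.
portable_parts = PurePosixPath(to_portable(hard_coded)).parts
```

Because the `Pure*` path classes do not touch the file system, a check like this behaves the same on every platform on which the test suite runs.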
A frequent cause of software failure is incompatibility with another application, a new operating system or, increasingly, a web browser version. Lack of backward compatibility can occur, for example, because the programmers have only considered coding their programs for, or testing the software on, the latest version of a particular operating system.
The unintended consequence is that their latest work might not be fully compatible with earlier combinations of software and hardware, or with another important operating system.
In any case, these differences may result in unintended software failures that are witnessed by a significant population of computer users. Testing for such incompatibilities could be considered a “prevention oriented strategy” that fits well with the latest testing phase.
Java has ensured backward compatibility between versions of applications at the underlying level and between operating systems (OS). We compare the results of models built with previous-generation tools and determine the reason for any change in the statistics output. Java reduces the backward-compatibility testing but does not eliminate it.
Input Combinations and Preconditions
A fundamental problem with software testing is that testing under all combinations of inputs and preconditions (initial state) is not feasible, even with a simple product.
This means that the number of defects in a software product can be very large and defects that occur infrequently are difficult to find in testing. More significantly, non-functional dimensions of quality (how it is supposed to be versus what it is supposed to do) — for example, usability, scalability, performance, compatibility, and reliability—can be highly subjective; something that constitutes sufficient value to one person may be intolerable to another.
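A short calculation shows why exhaustive testing is out of reach. Every number below is an assumption chosen for illustration (the input dimensions, the count of candidate values for each, and a run time of one second per case); none of them comes from VisualSim.

```python
import math

# Hypothetical block: number of candidate values for each input dimension.
candidates_per_dimension = {
    "data_type": 4,
    "priority": 3,
    "queue_depth": 10,
    "time_delay": 10,
    "initial_state": 5,
}

# Every combination of values for one block.
cases_per_block = math.prod(candidates_per_dimension.values())

# A model that connects three such blocks multiplies the combinations.
blocks_in_model = 3
total_cases = cases_per_block ** blocks_in_model

seconds_per_case = 1  # assumed run time of one simulation
days_needed = total_cases * seconds_per_case / 86_400

print(cases_per_block)  # 6000
print(total_cases)      # 216000000000
print(days_needed)      # 2500000.0
```

Even this small hypothetical model would need about 2.5 million days of simulation time, which is why a test plan has to select representative and boundary cases rather than enumerate every combination.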
Static vs. Dynamic Testing
There are many approaches to software testing. Reviews, walkthroughs or inspections are considered as static testing, whereas actually executing programmed code with a given set of test cases is referred to as dynamic testing.
The former can be (and unfortunately in practice often is) omitted, whereas the latter takes place when programs begin to be used for the first time, which is normally considered the beginning of the testing stage.
This may actually begin before the program is 100% complete in order to test particular sections of code (modules or discrete functions). Typical techniques for this are using stubs / drivers, or execution from a debugger environment.
For example, spreadsheet programs are tested to a large extent “on the fly” during the build process. The results of the calculation or text manipulation are shown interactively immediately after each formula is entered.
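The stub and driver technique mentioned above can be sketched as follows. The names `MemoryStub`, `task_latency_ns` and `run_driver` are hypothetical: the stub stands in for a memory model that is assumed not to exist yet, and the driver feeds inputs to the unit under test and collects the result.

```python
class MemoryStub:
    """Stands in for a memory model that is not written yet (hypothetical)."""

    def __init__(self, fixed_latency_ns):
        self.fixed_latency_ns = fixed_latency_ns
        self.requests = []

    def read(self, address):
        self.requests.append(address)  # record calls for later inspection
        return self.fixed_latency_ns


def task_latency_ns(cycles, clock_mhz, addresses, memory):
    """Unit under test: compute time plus the sum of memory read latencies."""
    compute_ns = cycles * 1000.0 / clock_mhz
    memory_ns = sum(memory.read(address) for address in addresses)
    return compute_ns + memory_ns


def run_driver():
    """Driver: supplies inputs and a stub, then returns what was observed."""
    stub = MemoryStub(fixed_latency_ns=50.0)
    latency = task_latency_ns(
        cycles=200, clock_mhz=100.0, addresses=[0x10, 0x20], memory=stub
    )
    return latency, stub.requests


latency, requests = run_driver()
print(latency)   # 2100.0
print(requests)  # [16, 32]
```

With the stub returning a fixed latency, the expected result can be worked out by hand (200 cycles at 100 MHz is 2000 ns, plus two 50 ns reads), so the unit can be checked before the real memory model is complete.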
Mirabilis Design has two types of messages embedded in the software code: one for internal debugging and the other for user-level messages.
The user-level messages help users in using the software package; eliminating syntactical and incorrect operation errors; and providing guidance when something unexpected occurs. These messages become an important validation tool for the users to identify system bottlenecks.
Software Verification and Validation
Software testing is used in association with verification and validation:
Verification: Have we built the software right (i.e., does it match the specification?)? It is process based.
Validation: Have we built the right software (i.e., is this what the customer wants?)? It is product based.
The terms verification and validation are commonly used interchangeably in the industry; it is also common to see these two terms incorrectly defined.
According to the IEEE Standard Glossary of Software Engineering Terminology: Verification is the process of evaluating a system or component to determine whether the products of a given development phase satisfy the conditions imposed at the start of that phase.
Validation is the process of evaluating a system or component during or at the end of the development process to determine whether it satisfies specified requirements.
|Figure 3 Functional Testing (Source: Adventnet.com)|
Software Testing Team
Software testing (Figure 3, above) can be done by software testers. Until the 1980s the term “software tester” was used generally, but later it was also seen as a separate profession.
Regarding the periods and the different goals in software testing, different roles have been established: manager, test lead, test designer, tester, automation developer, and test administrator.
|Figure 4 Risk Assessment (Source: Rice Consulting)|
Though controversial, software testing may be viewed as an important part of the software quality assurance (SQA) process. In SQA, software process specialists and auditors take a broader view of software and its development. They examine and change the software engineering process itself to reduce the number of faults that end up in the delivered software: the so-called defect rate.
What constitutes an “acceptable defect rate” depends on the nature of the software. For example, an arcade video game designed to simulate flying an airplane would presumably have a much higher tolerance for defects than mission critical software used to control the functions of an airliner that really is flying!
Although there are close links with SQA, testing departments often exist independently, and there may be no SQA function in some companies.
Software testing is a task intended to detect defects in software by contrasting a computer program's expected results with its actual results for a given set of inputs. By contrast, QA (Quality Assurance) is the implementation of policies and procedures intended to prevent defects from occurring in the first place.
Baseline functionality is the core enabler of the software application. Typically these are core features that developers will use in creating the look-and-feel, functions, user interface, statistics and analytics. Modifying these baseline functions can have a catastrophic effect on the entire product line.
Once these have been developed, they must be left unmodified. When adding incremental features to this baseline, it is important to keep the changes self-contained. Tests must stress these new features as well as their relationship to the rest of the baseline.
Baseline functionality in VisualSim refers to RegEx operations, data tokens, virtual connections and simulator events. They must operate consistently for all existing models. An improvement to one module in a new release may cause an incompatibility with another block that has not been updated.
For example, if string token support is removed from the Processing block, the block becomes incompatible with the Traffic_Reader block, which puts a string on its output. A test suite of older models will then fail to run.
These models relied on the input to the Processing block being either a RecordToken or a string version of a RecordToken. Testing showed that the Traffic_Reader block needed to be upgraded to output RecordToken. The Traffic Reader is built on a base operation. So, other Reader blocks will also need to be updated.
Existing test cases must be extended to add these new scenarios. Next, the baseline functionality tests need to verify that the new functions work for the multiple-instance cases.
The baseline libraries form the Application-library. These include the Resource blocks, the Hardware blocks and the Application libraries. These use the information from the above testing. Examples of these library blocks include:
We shall look at two blocks to understand how we constructed the test cases: the Smart_Timed_Resource and Processing blocks.
For the Smart_Timed_Resource:
1. Start with a simple check of whether the transaction enters and exits the block.
2. Vary the time delay for each transaction.
3. Add the priority.
4. Provide an incorrect value for every field. For example, send a string in place of an integer, omit the field from the data structure, or exceed the queue number count.
5. Check the statistics output and the reset.
6. Overload the block and sample the reject output port.
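The six steps above can be sketched against a toy model. The `TimedResource` class below is a hypothetical stand-in written for this illustration; it is not the VisualSim Smart_Timed_Resource block, and its field names (`id`, `delay`, `priority`) are assumptions.

```python
class TimedResource:
    """Toy stand-in for a timed resource block (not the VisualSim block)."""

    def __init__(self, queue_limit):
        self.queue_limit = queue_limit
        self.queue = []      # entries: (negated priority, arrival order, txn)
        self.rejected = []   # models the reject output port
        self.completed = 0   # simple statistic

    def submit(self, txn):
        delay = txn.get("delay")
        priority = txn.get("priority", 0)
        # Refuse wrong types and missing fields.
        if type(delay) not in (int, float) or delay < 0:
            raise ValueError(f"invalid delay: {delay!r}")
        if type(priority) is not int:
            raise ValueError(f"invalid priority: {priority!r}")
        if len(self.queue) >= self.queue_limit:
            self.rejected.append(txn.get("id"))
            return False
        self.queue.append((-priority, len(self.queue), txn))
        return True

    def run(self):
        """Serve queued transactions, highest priority first.

        Returns a list of (finish_time, id) pairs.
        """
        finished, now = [], 0.0
        for _, _, txn in sorted(self.queue, key=lambda entry: entry[:2]):
            now += txn["delay"]
            finished.append((now, txn.get("id")))
        self.completed += len(finished)
        self.queue.clear()
        return finished

    def reset(self):
        self.completed = 0
        self.rejected.clear()


def run_checks():
    # 1. A transaction enters and exits the block.
    r = TimedResource(queue_limit=2)
    assert r.submit({"id": "a", "delay": 1.0})
    assert r.run() == [(1.0, "a")]

    # 2. Vary the time delay for each transaction.
    r.submit({"id": "a", "delay": 1.0})
    r.submit({"id": "b", "delay": 2.5})
    assert r.run() == [(1.0, "a"), (3.5, "b")]

    # 3. Add the priority: the higher priority is served first.
    r.submit({"id": "a", "delay": 1.0, "priority": 0})
    r.submit({"id": "b", "delay": 2.5, "priority": 5})
    assert r.run() == [(2.5, "b"), (3.5, "a")]

    # 4. Incorrect values: a wrong type and a missing field.
    for bad in ({"id": "c", "delay": "fast"}, {"id": "d"}):
        try:
            r.submit(bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"accepted invalid transaction {bad}")

    # 5. Statistics output and reset.
    assert r.completed == 5
    r.reset()
    assert r.completed == 0

    # 6. Overload the block and sample the reject port.
    results = [r.submit({"id": n, "delay": 1.0}) for n in ("x", "y", "z")]
    assert results == [True, True, False]
    assert r.rejected == ["z"]
    return True
```

Each numbered comment corresponds to one step in the list, so a failing assertion points directly at the behavior that regressed.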
For the Processing block, the type of testing is different from that for the Smart_Timed_Resource. This block has limited functionality of its own and depends on the RegEx, token operation and data-type casting functions. Here the procedure checks that:
1. All data types are accepted on the input and output ports
2. Data received can be a field, data structure or memory value
3. AND of the input ports
4. Expressions are executed in sequence and values from the previous are available for the next line
5. Check the Left-Hand-Side vs. Right-Hand-Side for casting and expected results.
In this case, the RegEx functions are tested independently.
Baseline Functionality and Baseline Libraries
There are three types of libraries here: new RegEx functions, basic modeling blocks and standards-based components.
RegEx functions are built-in operators that can accelerate the implementation of a certain operation like getting the battery charge level or the sum of the elements in an array.
New functions are constantly being added as new applications or mechanisms to speed up models are identified. Each new function requires testing for all possible data types; checking for copy vs. reference if using memory; and checking the handling of fields, parameters and memories.
Basic modeling blocks are normally associated with a generic functionality such as a flash memory or a distribution-based traffic or a new statistics plotter.
These require testing against theoretical values and over a wide variety of parameter values. They also have unit tests for each block parameter and input port value. Finally, all the possible cases are exercised.
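Testing a distribution-based traffic block against theory can be sketched as below. The generator is a hypothetical toy that draws exponential inter-arrival times, whose theoretical mean is 1/rate; the sample size, seed and 2% tolerance are assumptions chosen for illustration.

```python
import random
import statistics


def generate_interarrival_times(rate_per_second, count, seed):
    """Toy traffic source: exponential inter-arrival times (hypothetical)."""
    rng = random.Random(seed)  # a fixed seed keeps the run repeatable
    return [rng.expovariate(rate_per_second) for _ in range(count)]


def relative_error(measured, expected):
    return abs(measured - expected) / expected


def check_against_theory(rate_per_second, count=100_000, seed=1,
                         tolerance=0.02):
    """Return True if the sample mean is within tolerance of 1/rate."""
    samples = generate_interarrival_times(rate_per_second, count, seed)
    theoretical_mean = 1.0 / rate_per_second
    measured_mean = statistics.fmean(samples)
    return relative_error(measured_mean, theoretical_mean) <= tolerance


# Sweep a range of parameter values, as the text recommends.
results = {rate: check_against_theory(rate) for rate in (0.5, 2.0, 50.0)}
```

The tolerance has to be chosen with the sample size in mind: with 100,000 samples the standard error of the mean is roughly 0.3% of the mean, so a 2% band leaves a wide margin while still catching a wrong distribution parameter.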
Standards-based components are built to meet a particular standard, such as LPDDR or PCI Express. In this case the standard tests, such as those for the basic blocks, are insufficient.
Here we need to compare against the standard and also verify against specific vendor implementations. This is the hardest case, as much of the available data may cover only specific cases and leave a considerable amount of room for implementation. Also the components must be
Regression tests can be instrumented using scripting tools such as PHP and Tcl, or proprietary languages. In the case of VisualSim, we have introduced a non-graphical simulation environment. At the same time, we have developed a methodology and testing mechanism to verify that the output is consistent between versions and between regression runs.
VisualSim can run multiple models from the command line, and we use a text file comparison of the model outputs to semi-automate our regression testing.
This methodology has proven effective compared with manual testing. If a discrepancy is identified, the testing team can isolate the difference between the pre-production and production versions using a text file compare tool.
This speeds up the check, after a specific bug has been fixed, that prior regression models have not broken. The command-line regression suite of test models should be run every time we fix a bug to test for any discrepancies.
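A text file comparison of this kind can be sketched with Python's standard `difflib` module. The directory layout (one baseline directory and one candidate directory holding same-named `.txt` output files) is an assumption made for illustration, and the step that runs the models from the command line is not shown.

```python
import difflib
from pathlib import Path


def compare_outputs(baseline_path, candidate_path):
    """Return unified-diff lines; an empty list means the outputs match."""
    baseline = Path(baseline_path).read_text().splitlines()
    candidate = Path(candidate_path).read_text().splitlines()
    return list(difflib.unified_diff(
        baseline, candidate,
        fromfile=str(baseline_path), tofile=str(candidate_path),
        lineterm="",
    ))


def run_regression(baseline_dir, candidate_dir):
    """Compare every baseline *.txt file with its same-named candidate.

    Returns a dict that maps each failing file name to its diff lines.
    """
    failures = {}
    for baseline in sorted(Path(baseline_dir).glob("*.txt")):
        candidate = Path(candidate_dir) / baseline.name
        if not candidate.exists():
            failures[baseline.name] = ["missing candidate output"]
            continue
        diff = compare_outputs(baseline, candidate)
        if diff:
            failures[baseline.name] = diff
    return failures
```

An empty result means every model reproduced its baseline output. Any entry in the result names the model whose statistics changed and shows the differing lines, which is where the testing team would start isolating the cause.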
Other Specific Block Issues to be Addressed
Standard tests, regressions and visual inspection cover the known cases and possibly some unknown ones. There might be a small list of cases that depend on external interfaces.
An example is the link to the centralized license manager. How will this link operate in an unstable network? What happens if an input XML file is corrupted? Currently there are no automated methods for these cases, and they have to be tested manually for every condition.
Deepak Shankar is CEO at Mirabilis Design, a provider of systems engineering solutions for electronics and real-time applications. He has over 15 years of industry experience at MemCall, SpinCircuit, Cadence, and Flextronics. He has an MBA from UC Berkeley, an MS from Clemson University and BS from Coimbatore Institute of Technology (India), both in electronics and communications.
Darryl Koivisto is the CTO and has over 25 years experience as an Architect, Program Manager and computer modeling expert. Mr. Koivisto has honed his experience in quality by learning to follow rigid practice right from his first day at work. He keeps a large brown book where he meticulously notes down every technique adopted through the years. Prior to Mirabilis Design, he worked at Cadence Design System, Ford Aerospace, Signetics, and Amdahl. Mr. Koivisto has a DBA from Golden Gate University, MS from Santa Clara University and BS from California Polytechnic University, San Luis Obispo.