Seamless integration of multicore embedded systems



This paper presents a seamless and continuous integration approach that allows performance improvements to be introduced gradually while preserving an established functional baseline in an embedded system with demanding characteristics requirements. The following topics will be addressed: how performance improvements can be broken down into small steps with objective and measurable goals, and how to predict, verify, and measure them. An ad-hoc fault-localization strategy is also proposed to exploit the multicore DSP hardware and minimize human troubleshooting time. The purpose and benefits of this approach are to avoid a big bang, find critical faults very early in the project, and secure project lead time, quality, and budget.

This paper will describe the experience of introducing a new function in an existing telecom embedded software system that has reached its limits of processing capacity. This was a major technical challenge that had to be dealt with within stringent time and budget constraints; the three authors joined forces in the areas of embedded software design, systems, and project engineering, and defined and implemented the strategy that will be described in this paper.

This paper will guide you through the various steps of the experience:

  • Focus and define the problems that must be solved when integrating a large, real-time Embedded Software System
  • Determine the principles of the strategy that was implemented to reach the goal. The principles were simple and easy: give higher priority to stable system quality and performance than to new functionality or changes, develop the system in (very) small steps, and run continuous and seamless system integration with extensive regression testing before verification of new functions
  • Establish thorough quantitative management of the software system's performances. Processor load and memory usage can be a real headache: they are always difficult to plan and control throughout development activities and, especially, nobody wants surprises at the end of the project. In the case of this paper, the system performances that had to be reached were defined up front. This goal was then broken down into a stepwise development by determining, for each step, the expected performance improvement; at each development step the system performances were measured and compared with the planned expected value
  • Define and implement a model, an Integration Engine, that captures the strategy
  • Define a Fault Localization strategy that makes use of the multi-core DSP hardware in order to spend less time on fault localization by letting the system do it

The final chapter will present the results from the real project experience.

Embedded software is by definition difficult to test: when running on the target, the code is not always reachable; debugging tools have their limits; sometimes the software designer may have to code in assembly; hardware devices may have bugs – just to mention a few typical barriers to a fault-free embedded software application.

In recent years, multicore technology has contributed to the increasing complexity of embedded systems, while also offering a very interesting opportunity to find efficient solutions to problems requiring high performance.

Often the development of such systems requires the use of simulation tools. A “basic test” phase with a simulator is the first mandatory step in a comprehensive test strategy aiming at discovering and fixing all kinds of faults.

In the next phase, the “basic tested” software is loaded on the target hardware (a microprocessor, DSP, multicore ASIC, FPGA) where the interaction with the real hardware is tested: a second type of fault will be discovered here. These faults often require more time to be analyzed and fixed; typically this is run in the lab with all the needed instruments available for tracing and troubleshooting.

As pieces are put together and the system grows, more complex functionalities will be tested, and a third type of fault comes up, originating from the interaction of the many integrated software parts.

When the system is almost complete and under test for days, a fourth type of fault is likely to show: a crash or a stopping fault. Here, finding which part of the system is misbehaving is usually very hard; in fact, this kind of fault is not easy to reproduce and when you succeed in reproducing it, you realize you need more logs and tracings. Time passes by and the software… is still unstable.

What is the best approach to deal with such problems? Maybe nobody has “the” answer, but we successfully tried the approach we present in this paper. The application case illustrated at the end of the paper is a real project where the multicore nature of the platform has been exploited to address the fault localization with a minimum human troubleshooting effort.

Generally speaking, development of a complex software system usually deals with the same problem: the project finds too many severe faults and undesired characteristics too late, leading to off-control quality and expensive re-engineering.

Most developers experience that after the code is developed and the system is integrated, suddenly nothing seems to work anymore (even those parts that were working before). The time and effort needed to get the system quality and performance back to an acceptable rate is a real “nightmare”: extra time, task forces, customer pressure, and so forth.

What a developer or a project manager would normally go for, instead, is solving important issues day by day without pressure.

The solution can be to adopt an integration driven software engineering model that blends functional decomposition (into very small packages) with rigorous management of ESW functional and quantitative performance requirements and continuous system integration to secure a high degree of accuracy and efficiency.

The suggested approach is then to develop software with a tight control on system changes and pursuing an early feedback on quality by using continuous planning and small integration steps, in order to achieve predictability and efficiency.


The main principles in this software-engineering model are:

(1) Quality first.

(2) System development in small steps.

(3) Continuous system integration.

(4) Regression test before verification.

(5) Parallel test phases.

Quality first
To achieve the desired software quality, the correction of faults should have higher priority than the introduction of additional system changes, so that the backlog of unresolved troubles stays very small and the time needed for stabilizing the system before delivering it to the customer is also very short.

A good starting point is for sure quality of the design base, but this is definitely not enough.

Human nature prefers to minimize unnecessary work, and fault troubleshooting and fixing is indeed unnecessary work. So any fault in the system must be found early and corrected very quickly, and the project has to ensure that the same fault is not found and corrected several times.

However, the golden rule is to introduce as few faults as possible from the beginning by emphasizing software-quality practices (ideally you should stop making faults at all!).

System development in small steps
The solution is split into very small system changes (let's call them deltas).

A regression test is performed to ensure that the changes implemented in all the different system components work and do not harm legacy functions, thus avoiding a big-bang delivery at the end of the project.

The benefits are many:

• Get frequent feedback on product quality.

• Reduce complexity: by implementing a small chunk of code you reduce the number and the complexity of the faults potentially introduced in that step (less and simpler faults, much faster to solve and fix).

• Track progress: frequent deliveries provide objective evidence of progress in the project.

• Achieve efficiency: by doing things frequently, people can learn lessons and improve project performances.

The basic goal is then to set up a software factory that runs and verifies frequent deliveries of new system versions, to get quality feedback early with an efficient use of resources.

Continuous system integration
Delivered deltas are integrated into the latest version of the system, whether they contain testable content or incomplete functions. At different ESW/HW maturity stages, the number of components that can be integrated may vary, but the ambition is to have system-level integration as soon as possible.

The build procedure must be automated, and a smoke test is needed to check the basic functionality of the system, in order to reject failed components and not harm the integrated system.

A project anatomy (as sketched in Figure 2) is used to show all planned development steps and how they depend on each other, from test, integration and design perspective: it is a very important tool to find the most efficient way to implement a given set of system changes.

The dependencies define the order in which the different deltas must be integrated and hence the implementation order: changes without dependencies, for instance, can be developed in parallel with each other.
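As an illustration only (the delta names and dependencies below are hypothetical), the integration order implied by a project anatomy can be derived mechanically with a topological sort; deltas with no mutual dependencies may be developed in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical project anatomy: each delta maps to the deltas it depends on.
anatomy = {
    "delta_codec_init": [],
    "delta_filter_opt": ["delta_codec_init"],
    "delta_table_layout": [],
    "delta_channel_setup": ["delta_filter_opt", "delta_table_layout"],
}

# static_order() yields every delta after all of its dependencies.
order = list(TopologicalSorter(anatomy).static_order())
print(order)
```

Deltas appearing before any dependency link between them (here, `delta_codec_init` and `delta_table_layout`) can be implemented concurrently.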

Regression test before verification

Once the code implementing a delta is delivered, a regression test is run first to secure that previous baseline has not been destroyed (whatever worked before must keep on working), before verification of new functions starts.

A fully automated regression test suite is dynamically updated on system growth, driven by project anatomy. The same approach is performed at every level (from basic test to system test), but the highest applicable test phase depends on the ESW/HW maturity stage.

After delivery to integration and regression test, the design team continues implementing new changes, but if any fault is found on the delivered delta, it is corrected with higher priority.

Parallel test phases
The main enabler for a stable product quality is to have feedback from test as early as possible. Doing verification activities in parallel is a way to push for high quality, as well as a way to decrease the verification lead time.

One could argue that starting system test before function test is completed can be risky: the system stability might not be mature enough, for instance, to stand up to stress tests. That's not really the case: the assumption is that the two activities should find different faults.

And even if this were not true, it's good anyway to let testers “play” with the system, even only to get rid of as many bugs as possible: it will be much cheaper and faster to verify a less faulty system.

However, the whole model is really effective (and doesn't risk becoming a “nightmare” itself) only if it's run on each of the very small system changes, and quality gates are established before entering the different test phases in parallel.

Characterization of embedded software is one of the problems a designer has to face at least once in a lifetime. Typical critical factors are millions of instructions per second (MIPS) consumption and memory usage (program memory and data memory). A “heavy” algorithm can load the processor and introduce processing delay, with the risk that real-time constraints are not fulfilled.
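As a back-of-the-envelope illustration (all figures below are hypothetical, not taken from the project), the load contributed by an algorithm follows directly from its worst-case cycle count per frame, the frame period, and the core clock:

```python
# Hypothetical figures for a frame-based speech-processing workload.
cycles_per_frame = 1_500_000   # worst-case cycles consumed per frame
frame_period_s = 0.020         # one frame every 20 ms
clock_hz = 300_000_000         # 300 MHz DSP core

# Fraction of one core consumed by a single channel of this algorithm.
load = cycles_per_frame / (frame_period_s * clock_hz)
print(f"processor load: {load:.0%}")  # prints "processor load: 25%"
```

With such a figure in hand, the headroom left for additional channels per processor (or for new features) can be budgeted before any optimization work starts.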

To save money and reuse the available hardware for introducing new features, or simply to increase the number of channels per processor, the developer has to face code optimization, and sometimes that means writing or rewriting those parts of the software that consume the most time.

The performance gained optimizing the code is difficult to predict, especially if the volume of software to be optimized is large and the amount of assembly code to write is critical.

In this scenario, it's important to monitor the optimization activity and take the right corrective action at the right time in order to avoid any impacts on the project deadline.

Profiling your code is another feature available in almost every real-time debugger.

The small-steps approach described here is well suited for this kind of activity, where the metric can be measured at each step.


The problem of tracking progress
Estimating the final optimization grade reached at the end of the project can be done if you have access to historical data for assembly conversion and by studying algorithm-level optimization techniques. It's necessary to have access to a previous project's information with a similar software application. For instance, optimizing estimations for the AMR-WB speech codec algorithm could be based upon the result obtained from the same activity on the predecessor AMR-NB algorithm.


The final estimate has an unavoidable uncertainty depending on the complexity of the application. For this reason the estimate is given with a minimum and a maximum value.

But even if the estimation of the total load reduction can be considered reliable, the problem of verifying the performance of the optimization activity during the project is still an issue. After the first optimizations, the measurements can give us some figures. How can we use this information to judge if we are on the right track? There is no way to say that the activity is on track (concerning the nonfunctional requirement of target MIPS load) because there is no estimation of the ideal path between the start and the final target values.

A solution to the tracking problem exists. Of course, a linear law of load reduction is a wrong model and cannot be used, because the optimization degree depends on the kind of program flow in each function.

The correct model can be obtained by solving a linear programming problem. Let's assume:

R = mean reduction factor (the estimated load reduction per channel)

Y = clock cycles of the non-optimized channel code after N frames

y_i = clock cycles of the i-th function accumulated after N frames (descents excluded)

r_i = reduction factor of the i-th function (descents excluded)

So the following constraint must hold:

Σ (1 – r_i) y_i ≤ (1 – R) Y

or, equivalently (given that the y_i sum to Y):

Σ r_i y_i ≥ R Y

The linear programming problem is obtained by minimizing the effort expressed in terms of optimization grade under the constraints above and assuming that each function cannot be optimized beyond a reasonable limit.

min f(r) = Σ r_i

subject to:

Σ r_i y_i ≥ R Y

0 ≤ r_i ≤ limit   ∀i ∈ set of functions to be optimized
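Because this particular LP has a single coupling constraint, the simplex solution reduces to a greedy allocation: spend the allowed optimization limit on the heaviest functions first. The sketch below illustrates this under stated assumptions (the profile figures are hypothetical, and the profiled functions are assumed to account for the whole channel load, i.e. Σ y_i = Y):

```python
def plan_reduction(y, R, limit):
    """Choose per-function reduction factors r_i minimizing sum(r_i)
    subject to sum(r_i * y_i) >= R * Y and 0 <= r_i <= limit,
    assuming Y = sum(y).  With one coupling constraint the LP optimum
    is greedy: allocate the cap to the heaviest functions first."""
    Y = sum(y)
    need = R * Y                       # cycles that must be saved
    r = [0.0] * len(y)
    for i in sorted(range(len(y)), key=lambda i: y[i], reverse=True):
        if need <= 0:
            break
        r[i] = min(limit, need / y[i])
        need -= r[i] * y[i]
    if need > 1e-9:
        raise ValueError("target not reachable within the per-function limit")
    return r

# Hypothetical profile: cycles per function, 40% mean reduction, 60% cap.
y = [400_000, 250_000, 150_000, 100_000, 100_000]
r = plan_reduction(y, R=0.40, limit=0.60)
print([round(x, 2) for x in r])
```

For the general problem (for instance with per-function limits or extra constraints), a generic LP solver such as the Excel Solver add-in mentioned later remains the appropriate tool.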


Having the predicted curve of load reduction, it's now possible to monitor whether the activity is on track (in terms of non-functional requirement fulfillment), but this requires a certain amount of testing, at least in a simulation environment, in order to profile the optimized functions as soon as they are ready to be integrated.

Iterative approach

The predicted curve of load reduction is checked against measurements at each step of the iteration. Deviations are allowed, and the right actions are taken accordingly. As soon as the profiler returns the load reduction of a released function, the estimated curve for the remaining steps is modified to take the measurements into account. The model is then updated with the following constraints:

r_i = r_i*   ∀i ∈ set of already optimized functions

0 ≤ r_i ≤ limit   ∀i ∈ set of functions still to be optimized
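The re-planning step can be sketched in the same way (figures hypothetical; again the single coupling constraint makes a greedy allocation optimal): measured reduction factors are pinned, and the remaining saving is redistributed over the functions still to be optimized:

```python
def replan(y, R, limit, measured):
    """Re-solve the reduction plan after some functions are released.
    `measured` maps function index -> profiled reduction r_i*; these
    are pinned (r_i = r_i*), and the remaining need is redistributed
    greedily over the rest, heaviest functions first."""
    Y = sum(y)
    need = R * Y - sum(ri * y[i] for i, ri in measured.items())
    r = [0.0] * len(y)
    for i, ri in measured.items():
        r[i] = ri                      # pinned to the measured value
    free = [i for i in range(len(y)) if i not in measured]
    for i in sorted(free, key=lambda i: y[i], reverse=True):
        if need <= 0:
            break
        r[i] = min(limit, need / y[i])
        need -= r[i] * y[i]
    if need > 1e-9:
        raise ValueError("raise the limit or rework the released functions")
    return r

# Function 0 came back worse than planned: 0.50 measured instead of 0.60.
y = [400_000, 250_000, 150_000, 100_000, 100_000]
r = replan(y, R=0.40, limit=0.60, measured={0: 0.50})
```

If the redistribution fails, that is exactly the signal to take one of the corrective actions listed below.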


With this approach, feedback can be received and evaluated early. Possible corrective actions are:

• Put more effort in optimizing the remaining functions;

• Analyze the functions performing worse than predicted and rework them to be more aligned with the prediction.

Practical consideration
Solving the linear programming problem requires a simplex solver. You do not need any special tool for it; an LP solver is provided by the common Excel spreadsheet with the Solver add-in.

Sometimes designers prefer to follow their experience or their rules of thumb, which no doubt can be successful, but a more structured approach exploiting ready-to-use techniques, such as linear programming, leads to a more controlled development.

In this section we have seen the case of MIPS load optimization, but similar considerations are also valid for memory optimization and, this being a general approach, it could be applied to other kinds of problems.

As explained earlier, one of the key success factors of the method proposed in this paper is the continuous integration of the stepwise-delivered software to ensure an always-working system, while verification activities at different levels run in parallel to get early and efficient quality feedback from different perspectives: coding, interfaces, functionalities, robustness, and performances.

Abbreviations and definitions
The following chapters use some concepts which are defined below.

  • Basic Unit Test (BUT) BUT is a test phase to verify that the logic of the software under test is correct and that the implementation at unit level fulfils the requirements in a simulated environment. 100% of the new/modified code must be basic tested, and the legacy part, theoretically untouched by new design, is usually also tested to a certain extent.
  • Basic Joint Test (BJT) BJT is performed to test the ESW running on the real HW to verify interfaces between the two parts. For instance protocols' behaviour is tested in this phase using standard test suites for conformance testing to standards. A semi-target environment is used: this means that, for instance, if any application SW is included in the system, it is normally simulated at this stage. Inter-working between application and ESW is also tested in this phase, not only for positive cases but also with wrong or unexpected inputs.
  • Function Test (FT) FT aims at verifying the whole embedded system from a functional point of view. All new as well as legacy functions are verified against a certain specification in a complete real environment, integrating all SW and HW components.
  • System Test (ST) ST scope is to verify system aspects like capacity, robustness under load conditions, resilience and capability to recover from faulty situations (both HW and SW disturbances). As for FT, a complete target environment is used and realistic test scenarios are developed to ensure full compliance to the characteristics requirements.

The test strategy: Integration Engine
The purpose of the activity described in this paper is to find tricky and possibly blocking faults as early as possible in order to ward off the risk of a big bang with an unpredictable stabilization time and reduce troubleshooting time so that committed deliveries to customer are not jeopardized by late fault findings. Running early robustness and capacity tests, the operational level of the system is verified and key system performances (i.e. processor load and memory occupation) are ensured before functional verification starts.


The picture above sketches the test strategy used to implement the methodology described in ch. 3. The engines give the idea of the SW factory as we meant it: an effective development machine, continuously developing and integrating small chunks of code, aiming at finding faults as early as possible, while keeping system performances under control and using human and machine resources efficiently. A description of the different boxes follows:

  • Design engine
    The different deltas into which the ESW application is decomposed are coded as driven by the Project Anatomy. In this context the identified deltas are sets of C-coded functions (C-reference code) to be optimized in order to improve system performances. After the needed optimization activity is completed, the original C-coded functions in the application are replaced with the ones designed in the given development step, and a new load module is built to be basic unit tested (BUT). A list of selected inputs is used to verify bit exactness against some reference patterns. The delta developed in that step is tested both standalone and integrated with the previous deltas, and then released to a BJT team. Code profiling in a simulated environment is performed to measure key characteristics parameters.
  • BJT engine
    An automatic regression test suite is identified and executed with significant debugging capabilities. Only a selected subset of Test Cases (TCs) might be suitable to be run in a given iteration, depending on the delivered content as defined in the Project Anatomy. Measurements of key characteristics parameters are performed and reported to the design team to complement and confirm code profiling in a semi-target environment. Differently from what the picture above might convey, BJT is basically run in parallel with FT/ST. However, a very quick and smart smoke test is run to ensure basic functionality of the system before the code is handed over to the FT/ST engine: that's why a sequential order is shown in the figure.
  • FT/ST engine
    A defined and automated set of both functional and stability tests is performed in a real environment to verify full compliance to system requirements. FT and ST are run in parallel. Measurements of key characteristics parameters are performed and reported to designers to complement and confirm code profiling in the complete target environment.

Note that whenever a fault is found, in whatever test phase, it is immediately reported to the design team to feed the design engine again and be fixed very quickly, so as not to harm testing activities on the following deltas. The implemented troubleshooting strategy to speed up fault fixing is described in the next chapter.

The proposed fault-localization strategy aims at minimizing time-consuming troubleshooting activities in terms of man hours, using machine hours as far as possible instead.

The picture below sketches the architecture of the ESW system where the method proposed in this paper was applied.


It consists of several DSP cores sharing a common program memory and a common data memory (basically holding table structures), and communicating with external users via a microcontroller that includes O&M and HW driver functionalities.

Trying to exploit the multi-core architecture shown above, the following steps are suggested once a given load module is found faulty:

  1. The fault is found and reported to the design support
  2. Different application load modules are built by splitting the set of optimized functions under test and replacing the remaining ones with C-reference code
  3. Each load module is loaded on a different core and tested again
  4. Once one of these load modules is identified as faulty, the process is repeated from step 1 until a single coded function (or a very small bunch of functions) is pointed out
  5. The single function is analyzed to identify and fix the trouble by using the BUT and/or BJT environment
  6. The corrected load module is regression tested before being delivered again

Should troubleshooting prove too time consuming, the concerned load module shall be “quarantined” and a new set of functions shall be tested instead, in order not to be a showstopper for the test of the remaining system changes.

A simple and smart naming convention was used to label the different delivered application load modules, thus providing efficient configuration management of the different deltas. The picture below tries to summarize and better clarify the steps in the strategy by means of an example with three cores. The naming convention used is also shown.


In the given example, the first delta is delivered and loaded on all cores: let's assume it consists of nine optimized functions. A fault is found during test (step 1), and the set of optimized functions included in Delta#1 is split into three subsets to create three different versions, each containing only one optimized subset of three functions, while the other two subsets are replaced with the original C-reference functions (step 2). Each of these three versions is loaded on a different DSP core to be regression tested again (step 3). Let's now assume that a fault is found on the version loaded on Core#2 (step 4 – step 1), while the other two versions pass the regression test successfully. The subset of optimized functions contained in version 12001 is split into three again, to create three new versions, each containing only one optimized function, while the other two are replaced with the original C-reference functions (step 2). Each of these new three versions is loaded to be regression tested once again (step 3). Let's then assume that a fault is now found on Core#3, pointing precisely to a specific function being faulty, which can be much more easily troubleshot and fixed (step 5). A new version of Delta#1 is then built including the bug fix and finally regression tested (step 6). The successful load module is then used as a base for the upcoming delta. The example should help in appreciating the level of automation in the strategy: the procedure is repeated recursively, and only step 5 above is expected to demand the scarce and precious resource of key-competence man hours, while the rest requires only machine hours and/or low-profile activities.

The shown example can be easily generalized (as indicated in the step-wise strategy above) and applied to different scenarios with, for instance, more functions delivered per delta, more DSP cores, more cycles needed to get the bug found and fixed, etc.
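The generalized splitting procedure can be sketched as follows. This is an illustration only: the function names are placeholders, and `is_faulty(subset)` stands for a full regression run of a build in which only `subset` is optimized while the rest is reverted to C-reference code; in the real setup each candidate build runs on its own DSP core, so every level of the search executes in parallel:

```python
def localize(functions, is_faulty, cores=3):
    """Narrow a faulty delta down to a single optimized function by
    recursive splitting.  Each level partitions the current suspects
    into up to `cores` builds, regression tested concurrently (one
    build per core); the faulty build's subset becomes the new
    suspect list."""
    suspects = list(functions)
    while len(suspects) > 1:
        size = -(-len(suspects) // cores)     # ceil(len / cores)
        chunks = [suspects[i:i + size] for i in range(0, len(suspects), size)]
        # In the lab all chunks run concurrently, one per DSP core.
        suspects = next(c for c in chunks if is_faulty(c))
    return suspects[0]

# Toy example: a fault injected in "fn_7" out of nine optimized functions.
fns = [f"fn_{i}" for i in range(1, 10)]
culprit = localize(fns, is_faulty=lambda subset: "fn_7" in subset)
print(culprit)  # prints "fn_7"
```

Note the sketch assumes exactly one build per level fails; a fault caused by the interaction of functions in different subsets (no failing build) would need the quarantine fallback described above.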

The team working on the embedded system of this experience was composed of around eight engineers; they developed around 40,000 lines of assembly code in roughly four months. The experience of this case study accounts for a small part of the entire project lead time.

The target processor load goal at the end of the project was 90%. This required extreme capability to predict, monitor and control the software process. In addition to this, the software is part of a critical telecom system with extreme reliability and dependability requirements. The delta development cycles were defined to be of 3 weeks.


The results of this approach have been outstanding:

  • All deliveries on time
  • Tricky faults found very early and solved without pressure. For instance, it took us almost two months to solve a “nasty” assembly language TR!
  • Quality deliveries. No fault was found during customer acceptance test, nor in the critical first six months in operation

The table below shows a comparison between a project (project [a]) that used the approach described in this paper and a similar project (project [b]) that was run in a traditional way. Both used the same multicore DSP hardware platform; project [a] developed, amongst other functions, the GSM AMR-WB (Adaptive Multi-Rate WideBand) codec:


What conclusions can be drawn from this experience? The case study shows that the results of using this practice have been very good: resources were used in a very efficient way; the system was stabilized from the very beginning without ever letting it drift out of control; nasty faults could be solved without the hottest pressure deriving from a delivery to the customer; the software delivered to the customer was of the required quality level; and the project was completed on time and on budget.

If so, why are projects not always run in this way? The reason is probably that the approach in this paper requires an initial investment that companies may not be willing to undertake, even though our experience shows that in the end it will pay off.

You will need to invest considerable effort in preparing detailed an
