Dealing with misbehaving tools

Making software changes very late in a project is almost never a good thing. Although correcting an error might be crucial for the correct and safe use of the end product, a late change has a number of unwanted side effects:

The process view: Making changes to the code and rebuilding the application image will force a restart of one or more test and quality assurance (QA) activities in the project. Minimizing test and QA without compromising safety and integrity in the face of code changes can thus become critical for time to market.

The later in the process a problem is encountered, the further back you have to go to revisit certain activities. In the worst case, a full external revalidation or recertification assessment may be required.

The code view: Changing code to correct erratic behavior always carries the risk of introducing new unwanted behavior. This sometimes leads to the decision to leave the problem in the product as is and to clearly document its impact. High-integrity regulatory frameworks often make things even tougher by requiring an extensive impact analysis of the changes before they are made.

The goodwill view: Frequent or large code changes in the final stages of a project can make stakeholders nervous about the end product. If the product has already reached the market, the situation can be a real nightmare.

So, it's not always possible to avoid code changes, but methods and tools to avoid or minimize the impact of changes can be extremely helpful.

Three broad categories cause most of the late code changes:

The source-code bug: This kind of error is either due to mistakes or misunderstandings by the programmer in the implementation or an ambiguous or incomplete functional specification that leaves too much open to interpretation. Although this is a common occurrence, I won't be discussing it in this article.

The latent non-ANSI C/C++ source bug: The ANSI C standard has some dark corners where behavior is either implementation defined or undefined. If you've implemented parts of the source code to depend on how a particular compiler behaves for these corner cases of the standard, you can expect a problem in the future. However, as this kind of latent bug is mainly a process and knowledge issue, it will not be discussed further here.

This leaves us with the main focus for this article, the object-code-generation-tool bug. We will restrict the discussion to bugs in the build chain, in other words, the compiler, assembler, and linker.

Consider the following situation. A bug you have found is the result of the compiler making wrong assumptions about register allocation and stack allocation of variables that are local to a function. The bug is exposed when many variables compete for the available CPU registers and some variables have to be temporarily moved to the stack. You have found the bug in a large function with a lot of arithmetic computations, but that is no guarantee that the bug will only manifest itself in large functions with a lot of computations.

So we end up with the question of whether to persuade the compiler vendor to supply a fix or to apply the workaround(s) throughout the code base, with all the implications for the project outlined above.

For high-integrity projects, the build chain and the vendor should be subject to a lot of scrutiny before selection. The typical scenario is that once a particular compiler and version is selected, you stay with it throughout the project. Some high-integrity process frameworks even require tool selection to follow a formalized process in which the tools are prequalified or validated according to certain criteria.

We will now take a look at a special technique that can be used if you have a close relationship with your compiler vendor. Consider a compiler for a 32-bit architecture. Many 32-bit CPU cores incorporate some kind of instruction pipeline to increase performance by dividing complex instructions into one-cycle pieces, each executed in its own pipeline stage.

In this way, a throughput of one instruction per clock cycle can be achieved under ideal circumstances. It is, however, very easy to break this if subsequent instructions compete for the same resource. An example is when the first instruction writes to a particular register and the directly following instruction reads from the same register. On many pipelined architectures, this causes a so-called pipeline stall: one-cycle processing is interrupted while the second instruction waits for the first instruction to finish writing to the register.
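The effect can be illustrated with a toy model in C. This is only a sketch under simplifying assumptions: the Instr type and the CountStalls() function are hypothetical, each instruction has exactly one source and one destination register, and a stall is assumed to cost one cycle and to occur only between directly adjacent instructions.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction: one destination, one source register. */
typedef struct {
    int dst;  /* register written by the instruction */
    int src;  /* register read by the instruction    */
} Instr;

/* Toy model: an instruction stalls once if it reads the register that the
   instruction directly before it writes. Returns the number of stalls. */
size_t CountStalls(const Instr *code, size_t n)
{
    size_t stalls = 0;
    for (size_t i = 1; i < n; ++i) {
        if (code[i].src == code[i - 1].dst) {
            ++stalls;
        }
    }
    return stalls;
}
```

In this model, the sequence "write r1, read r1, unrelated instruction" has one stall, while moving the unrelated instruction between the write and the read removes it.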

A good compiler for such a CPU architecture will try to rearrange, or schedule, instructions so as to maximize the distance between instructions that use the same CPU resource in a pipeline-blocking way.

To do such rearranging, the compiler must build up one or more dependency graphs for the block of instructions it's about to schedule, to determine whether it is safe to move an instruction backward or forward in the instruction stream. The compiler uses a set of functions to determine whether two instructions are independent, which means that they do not use resources in a conflicting way and thus implies that their order can be exchanged.
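For register operands, such an independence test usually comes down to checking the three classic hazards between a pair of instructions. The following is a minimal sketch, assuming a hypothetical RegInstr type with one source and one destination register; a real compiler tracks far more resources (flags, memory, multiple operands).

```c
#include <stdbool.h>

/* Hypothetical instruction model: one destination, one source register. */
typedef struct {
    int dst;  /* register written */
    int src;  /* register read    */
} RegInstr;

/* Two instructions can be exchanged only if none of the three hazards
   exists between them. 'first' precedes 'second' in program order. */
bool RegsIndependent(const RegInstr *first, const RegInstr *second)
{
    bool read_after_write  = (second->src == first->dst);
    bool write_after_read  = (second->dst == first->src);
    bool write_after_write = (second->dst == first->dst);
    return !(read_after_write || write_after_read || write_after_write);
}
```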

Let's take a look at a function that the compiler might use to determine if two MOV instructions are independent, shown in Listing 1.


This function looks innocent enough. It basically shifts the question of independence to a helper function that determines if the source and destination of the MOV instructions are used independently.
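As a rough illustration of the shape of such a function (a hypothetical sketch, not the article's Listing 1), consider the code below. Only the name AreIndependent() comes from the article; the Operand and MovInstr types and the helpers SameLocation() and OperandsIndependent() are assumptions made for the example.

```c
#include <stdbool.h>

typedef enum { OP_REG, OP_MEM } OperandKind;

/* Hypothetical operand: a register or an abstract memory location. */
typedef struct {
    OperandKind kind;
    int  id;           /* register number or memory-location id */
    bool is_volatile;  /* set when the access is to a volatile object */
} Operand;

typedef struct {
    Operand dst;
    Operand src;
} MovInstr;

bool SameLocation(const Operand *a, const Operand *b)
{
    return a->kind == b->kind && a->id == b->id;
}

/* Helper: are source and destination of the two MOVs used independently? */
bool OperandsIndependent(const MovInstr *first, const MovInstr *second)
{
    return !SameLocation(&second->src, &first->dst)   /* read after write  */
        && !SameLocation(&second->dst, &first->src)   /* write after read  */
        && !SameLocation(&second->dst, &first->dst);  /* write after write */
}

/* The question of independence is simply handed to the helper.
   Note that is_volatile is never consulted anywhere in this sketch. */
bool AreIndependent(const MovInstr *first, const MovInstr *second)
{
    return OperandsIndependent(first, second);
}
```

With this sketch, two stores to different memory locations are reported as independent even when both destinations are marked volatile.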

Independence only means that a swap is allowed, not that it is required: it's perfectly OK for the compiler to leave two independent instructions in their original order. Moving two instructions to avoid one stall might, for example, just create a new pipeline stall somewhere else, so the scheduler has to solve the whole puzzle with maximum performance as the goal.

Let's return to the function in Listing 1 and the compiler that uses this function. When a customer compiles a certain program with this compiler, it works flawlessly, except that he notices that two memory writes are performed in the opposite order from the one specified in the program. Usually, this kind of reordering is exactly what the scheduler depends on to perform its magic, but in this case it's not OK because the user has declared both variables affected by the MOV instructions as volatile, which implies that the order of the writes must not change. If, for example, the memory writes are intended to initialize some external hardware, this can be extremely important.
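On the customer's side, the affected code could be as simple as the following sketch. The function name InitDevice() and the two register pointers are hypothetical; what matters is that both writes go through volatile-qualified objects, so a conforming compiler must perform them in program order.

```c
#include <stdint.h>

/* Hypothetical device setup: the control register must be written
   before the data register, otherwise the device ignores the data. */
void InitDevice(volatile uint32_t *control, volatile uint32_t *data)
{
    *control = 0x1u;   /* first write: enable the device  */
    *data    = 0xABu;  /* second write: send the first value */
}
```

If the scheduler exchanges the two stores, the program still looks correct at the source level, but the hardware sees the data before it has been enabled.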

The AreIndependent() function ignores the volatile attribute of both instructions and thus reports that it's OK to rearrange these instructions.

As noted, the scheduler can of course choose to leave two independent instructions in place. For this customer, it's easy to see that he has at least one location that is affected by this bug, but does he have more affected locations? Finding that out effectively amounts to going through the complete code base looking for accesses to volatile variables and examining the generated code. So we're back to the central theme of this article: how can the customer's change management be simplified?

Here is one possible remedy to the situation: a special version of the original compiler can try to identify all code in the user's code base that is actually affected by the bug.

Here is how the compiler can be turned into a bug detector. The function in Listing 1 is used in another function (Listing 2) that changes the order of two instructions when that function has decided that it is beneficial to do so.
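A caller of this kind might look like the sketch below (hypothetical, not the article's Listing 2). The names AreIndependent() and ChangeMOVOrder() come from the article; the flattened MovInstr type, the source_line field, and the SwapIsBeneficial() placeholder are assumptions, and the profitability test is stubbed out because a real scheduler would consult its pipeline model.

```c
#include <stdbool.h>

/* Hypothetical MOV model using abstract location ids. */
typedef struct {
    int  dst_id;       /* location written */
    int  src_id;       /* location read    */
    bool is_volatile;  /* access to a volatile object */
    int  source_line;  /* where the access appears in the user's code */
} MovInstr;

/* Stand-in for the buggy check: is_volatile is ignored. */
bool AreIndependent(const MovInstr *a, const MovInstr *b)
{
    return b->src_id != a->dst_id
        && b->dst_id != a->src_id
        && b->dst_id != a->dst_id;
}

/* Placeholder profitability test (assumption: always worthwhile). */
bool SwapIsBeneficial(const MovInstr *a, const MovInstr *b)
{
    (void)a;
    (void)b;
    return true;
}

/* Exchange two adjacent MOVs if allowed and worthwhile.
   Returns true if the instructions were swapped. */
bool ChangeMOVOrder(MovInstr *first, MovInstr *second)
{
    if (!AreIndependent(first, second) || !SwapIsBeneficial(first, second)) {
        return false;
    }
    MovInstr tmp = *first;
    *first  = *second;
    *second = tmp;
    return true;
}
```

This is the point where the wrong answer from AreIndependent() turns into wrong code: two volatile stores to different locations are swapped without complaint.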


This function can be changed to detect the bug case, in other words, the case when the ChangeMOVOrder() function uses the wrong information to make a decision. The added code in red in Listing 3 looks for the offending situation, and when such a situation arises, it reports the affected source locations. Note how the code in red would also cure the bug because it classifies all MOV instructions with the volatile attribute as dependent. But it is crucial to understand that we could not have placed the detector code in the buggy function! If we had done so, it would report every occurrence of possibly erroneous MOV instructions, not just the ones that the scheduler actually reorders.
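The idea can be sketched as follows (a hypothetical illustration, not the article's Listing 3). The detector sits in ChangeMOVOrder(), after the decision to swap has been made, so it fires only for pairs that would really have been reordered. The MovInstr type, SwapIsBeneficial(), the g_detected counter, and the message format are assumptions, as is the choice to treat a pair as affected only when both accesses are volatile.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int  dst_id;       /* location written */
    int  src_id;       /* location read    */
    bool is_volatile;  /* access to a volatile object */
    int  source_line;  /* where the access appears in the user's code */
} MovInstr;

/* Unchanged buggy check: is_volatile is still ignored here. */
bool AreIndependent(const MovInstr *a, const MovInstr *b)
{
    return b->src_id != a->dst_id
        && b->dst_id != a->src_id
        && b->dst_id != a->dst_id;
}

/* Placeholder profitability test (assumption: always worthwhile). */
bool SwapIsBeneficial(const MovInstr *a, const MovInstr *b)
{
    (void)a;
    (void)b;
    return true;
}

int g_detected = 0;  /* number of affected locations reported so far */

bool ChangeMOVOrder(MovInstr *first, MovInstr *second)
{
    if (!AreIndependent(first, second) || !SwapIsBeneficial(first, second)) {
        return false;
    }

    /* Detector: the scheduler was about to reorder two volatile accesses. */
    if (first->is_volatile && second->is_volatile) {
        fprintf(stderr,
                "affected by scheduler bug: volatile accesses at lines %d and %d\n",
                first->source_line, second->source_line);
        ++g_detected;
        return false;  /* refusing the swap also cures the bug */
    }

    MovInstr tmp = *first;
    *first  = *second;
    *second = tmp;
    return true;
}
```

Running the customer's code base through a compiler built this way would yield a list of source lines to review, while non-volatile MOV pairs are scheduled exactly as before.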


Even this simplified example shows one of the pitfalls in creating a production-quality bug detector. It can be simple to isolate the root cause of the bug but complicated to determine when this bug will actually result in the generation of wrong code. We could, for example, have a number of different functions of varying complexity that depend on the AreIndependent() function.

But if it's practically possible to create the bug detector, it can now be used to pinpoint the exact locations of any other code that is affected by the original bug. In this way, we can avoid going through all object code by hand to look for possible occurrences of the problem.

Anders Holmberg is software tools product manager at IAR Systems.
