Pointers and addresses - a programming minefield

April 26, 2016

Colin Walls-April 26, 2016

Pointers are a very powerful feature of the C language. In a programming language, power is dangerous, as programmer error can have dire consequences. As a result, many developers favor languages, like Java, which do not expose pointers directly and are, hence, “safer”. However, the power of pointers is valued by embedded developers, who accept that they need to understand their nuances. This article looks at the potential problems with pointers and proposes some guidelines for avoiding them.

What is a pointer?
To an assembly language programmer, memory is a sequence of locations (bytes or words), each one of which has an address. There is not really the concept of a variable. Manipulating addresses is an everyday occurrence. It is the responsibility of the programmer to keep track of what type of data is stored in each memory location. The data might be a number or some text (which is just a sequence of numbers, of course) or it might be an address of another location or possibly an address of an address and so forth. There are also some high-level languages - untyped languages - that operate in the same way; Forth and BCPL are examples that come to mind.

The majority of high level languages support data typing to a lesser or greater extent. This means, in effect, that the programmer specifies that a variable contains a specific type of data and the language only allows appropriate operations on that variable. A pointer to a variable incorporates its address, but also embodies “knowledge” of the type of the variable.

Pointers and integers
For most, but not all, modern CPUs, an address is the same bit size as a word of memory; for example, most 32-bit CPUs have a 32-bit address space as well as favoring operations on 32-bit data. In this context, most, but again not quite all, CPUs allow addresses to be stored in memory locations and registers and be operated on like any other data.

It is generally possible to store the value of a pointer (i.e. an address) in an "ordinary" variable - like an unsigned integer. An example of where this might be done in an embedded application is in device driver code. Here is an example:

unsigned normal;
unsigned *pointer;
pointer = &normal;
normal = (unsigned)pointer;

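This would result in the variable normal containing its own address. On most CPUs this code will work as specified, provided that an unsigned is at least as wide as an address; where it is narrower, as on many 64-bit targets, the value is truncated. Whether it is a good idea is another matter.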
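Where an address really must be held in an integer, the optional C99 type uintptr_t from <stdint.h> is one way to make the intent explicit and to avoid depending on the width of unsigned. The following is only a sketch of that idea; the function name address_round_trips is hypothetical and not part of the original example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores the address held in target in an integer
   wide enough for it, converts it back, and reports whether the
   round trip yields a pointer equal to the original. */
int address_round_trips(unsigned *target)
{
    uintptr_t as_integer;
    unsigned *back;

    assert(target != NULL);

    as_integer = (uintptr_t)(void *)target;   /* pointer -> integer */
    back = (unsigned *)(void *)as_integer;    /* integer -> pointer */

    return back == target;
}
```

Calling address_round_trips(&normal) should return 1 on any implementation that provides uintptr_t. The casts remain visible in the source, which helps with the commenting requirement discussed below.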

In broad terms, most of the time, code should:

  1. perform the required function
  2. be readable/maintainable
  3. be readily portable to a different CPU

#3 may be considered less important in certain instances – like device drivers in embedded systems.

Whether the code above satisfies #2 is questionable. If the programmer really needs to take control of typing, the code should be very carefully commented to make this clear.

Pointer arithmetic
Because pointers “know” about the data type to which they appertain, operations on pointers can appear confusing to the inexperienced, even though they are entirely logical.

Consider this code:

int x;
int *ptr;
ptr = &x;
ptr++;
...

If the variable x is located at address 0x80000000 and we are using a 32-bit processor (i.e. 4-byte integers), what value will ptr contain at the end of this code? The answer is 0x80000004. This makes sense, as the pointer is being advanced through memory by “1 unit”, which, in this case is an integer, which is 4 bytes.

What if you write the incrementing code like this?

ptr = &x;
ptr += 1;
...

or in this rather un-C-like fashion:

ptr = &x;
ptr = ptr + 1;
...

The answer is that the result is identical, even if intuitively you might expect the answer to be 0x80000001.
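The scaling can be checked on any target with a small experiment. This is a minimal sketch; the helper names step_in_bytes and all_forms_agree are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns how many bytes separate ptr and ptr + 1. */
size_t step_in_bytes(int *ptr)
{
    int *next;

    assert(ptr != NULL);
    next = ptr + 1;   /* advance by one int, not by one byte */

    /* Viewing both addresses as char pointers measures the gap in bytes. */
    return (size_t)((char *)next - (char *)ptr);
}

/* Hypothetical helper: returns 1 if the three spellings of the
   increment all produce the same pointer value. */
int all_forms_agree(int *start)
{
    int *a = start;
    int *b = start;
    int *c = start;

    a++;
    b += 1;
    c = c + 1;

    return a == b && b == c;
}
```

On a CPU with 4-byte integers, step_in_bytes(&x) returns 4; in general it returns sizeof(int), whatever that is for the target.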

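But what if you really did want to increment this pointer by just one byte? In standard C a cast does not yield an lvalue, so ++ cannot be applied directly to the result of a cast; the converted value has to be adjusted and assigned back. One way to do it might be:

ptr = (int *)((unsigned)ptr + 1);

or, perhaps slightly better might be:

ptr = (int *)((char *)ptr + 1);

In either case, there should be very explicit commenting to explain this bizarre programming, not least because the resulting pointer is generally no longer aligned for an int, so the conversion is not portable and dereferencing the pointer may fault on some CPUs.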
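If the real aim is to step through memory one byte at a time, one alternative is to keep a separate byte-sized pointer instead of adjusting the int pointer itself. The sketch below assumes that reading individual bytes is what is wanted; byte_at is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte stored at the given offset
   from the start of an object. The caller must ensure that offset
   is smaller than the size of the object. */
unsigned char byte_at(const void *object, size_t offset)
{
    const unsigned char *bytes;

    assert(object != NULL);
    bytes = object;              /* byte-sized view of the same memory */

    /* Arithmetic on an unsigned char pointer moves in 1-byte steps. */
    return *(bytes + offset);
}
```

Because the arithmetic is done on an unsigned char pointer, no misaligned int pointer is ever created, and the intent is visible from the types alone.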

Pointers and arrays
Arrays in C are quite straightforward – they are just a named series of variables in contiguous memory locations. We can declare a 5-element array and set the third element (the indices start at 0) to a value thus:

unsigned array[5];
array[2] = 99;

The relationship between arrays and pointers is interesting. As an array is a sequence of memory locations, it would seem logical to use a pointer to access an element of an array. So, we can write:

unsigned *pointer;
pointer = &array[0];
*(pointer+2) = 99;

The pointer is set to point to the first element of the array and then pointer arithmetic is used to index to the third element. A neater way to write this code would be to exploit the fact that the array name by itself is, in most expressions, converted to a (constant) pointer to the first element of the array. So we could write:

pointer = array;

which would yield the same result.

The pointer arithmetic syntax is rather messy, which is why the C language has the [ ] operators. These brackets are just a neater way to do the pointer manipulation. So, you could write:

pointer[2] = 99;

Although this is valid, it is strongly recommended that the bracket notation be used only with array names, not pointers. This is simply to preserve clarity.
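The equivalence of the three notations can be demonstrated with a short sketch. The function name write_three_ways is hypothetical; it increments the same element once through each notation.

```c
#include <assert.h>

/* Hypothetical helper: updates the third element of a local array
   through each notation in turn and returns the final value. */
unsigned write_three_ways(void)
{
    unsigned array[5] = {0};
    unsigned *pointer = array;   /* same as &array[0] */

    array[2] += 1;               /* bracket notation on the array name */
    *(pointer + 2) += 1;         /* explicit pointer arithmetic */
    pointer[2] += 1;             /* bracket notation on the pointer */

    /* All three expressions designate the same element. */
    assert(&array[2] == pointer + 2);
    assert(&pointer[2] == pointer + 2);

    return array[2];
}
```

The function returns 3, since each of the three statements increments the same location.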

Given this information, there are two ways to dynamically initialize our array:

unsigned array[5];
int i;

for (i=0; i<5; i++) { array[i] = 99; }

or

unsigned array[5];
unsigned *pointer;
int i;

pointer = array;
for (i=0; i<5; i++) { *(pointer++) = 99; }

Which is better? The former is certainly clear and easy to understand, but there are circumstances – usually with less trivial code – where the latter might make sense.
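For comparison, the two styles can be packaged as functions that take the array and its length. This is only a sketch; the names fill_indexed and fill_pointer, and the choice of size_t for the count, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index-based initialization. */
void fill_indexed(unsigned array[], size_t count, unsigned value)
{
    size_t i;

    assert(array != NULL);
    for (i = 0; i < count; i++) {
        array[i] = value;
    }
}

/* Hypothetical helper: pointer-based initialization. */
void fill_pointer(unsigned *first, size_t count, unsigned value)
{
    unsigned *pointer;
    unsigned *end;

    assert(first != NULL);
    pointer = first;
    end = first + count;         /* one past the last element */

    while (pointer < end) {
        *pointer++ = value;      /* store, then advance one element */
    }
}
```

Both functions leave every element set to value. The pointer version needs no index variable, which is the kind of saving that can matter when a loop walks several buffers at once.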

Conclusions
Pointers are a powerful and, hence, potentially dangerous feature of the C language. Many developers are unclear about their use. It is, therefore, good practice to avoid their use, except where their power is really necessary. Using a pointer just to show off your programming skills is very unwise. However, pointers do behave logically and, used well, result in efficient code, which may be rendered quite readable and maintainable with care. Under very specific circumstances, C programmers can override the type-based functionality and work in terms of CPU memory addresses. This practice should be exercised with extreme caution.

Colin Walls has over thirty years' experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist with Mentor Embedded [the Mentor Graphics Embedded Software Division], and is based in the UK. His regular blog is located at: http://blogs.mentor.com/colinwalls. He may be reached by email at colin_walls@mentor.com

 
