Saving space with Pointer-less C

Mengjin Su

September 07, 2006

This unusual and creative approach to standard C programming can save space and time in your design without losing C's efficiency. A clever piece for all embedded systems programmers.

In C and C++ programming, pointers are used very frequently. Indeed, pointers and their applications are one of the most important features in C/C++ because they bring great efficiency, flexibility, and utility to the language. In some situations, it's absolutely necessary to use pointers to implement the functions required. A subroutine, for example, can easily return multiple sets of data if the caller (main routine) uses pointers as parameters.
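
To make the multiple-results case concrete, here is a minimal sketch in standard C. The function name min_max and its parameters are invented for this illustration; they do not come from the article's listings. The caller passes the addresses of two variables, and the function fills in both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reports both the smallest and the largest element
   of values[0..count-1] through the two pointer parameters. */
void min_max(const int *values, size_t count, int *min_out, int *max_out)
{
    assert(values != NULL && count > 0);
    *min_out = values[0];
    *max_out = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] < *min_out) *min_out = values[i];
        if (values[i] > *max_out) *max_out = values[i];
    }
}
```

A caller would declare int lo, hi; and call min_max(data, 4, &lo, &hi); both results come back through the pointer parameters.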

On the down side, programmers sometimes complain that pointers are confusing and ambiguous, especially in embedded systems built around small MCUs with separate ROM and RAM spaces. Here "ambiguity" means that a memory access through a pointer is unclear, or that the compiler and the programmer may interpret it differently.

This article proposes a new grammar that replaces pointer operations in C. The new grammar eliminates the declaration of pointers of every kind, as well as the operations on them. Instead, it uses normal integer variables as the "holders" that denote the memory location to be accessed. In theory, this modification won't sacrifice the efficiency that C has. It makes indirect memory access explicit and eases the work of designing a compiler.

The pointer and its applications in C/C++
The code example in Listing 1 shows how efficient pointers can be. It demonstrates how the library function strcpy() can be written. This function copies an entire null-terminated character string from its source location to the destination. In strcpy(), des and src are pointers that contain the starting addresses of the two memory locations. The function fetches each character from the source and stores it at the destination, one character at a time, and the ++ operator advances both pointers after each copy. This example illustrates how efficient and simple an operation using pointers in C can be.
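
For readers without Listing 1 at hand, a function of this kind is commonly written along the following lines. This is a sketch, not the article's listing: the name my_strcpy is an assumption chosen to avoid clashing with the standard library, while des and src follow the names used in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a string copy; my_strcpy is an assumed name. */
char *my_strcpy(char *des, const char *src)
{
    assert(des != NULL && src != NULL);
    char *start = des;
    /* Copy one character, then advance both pointers. The loop ends
       once the terminating '\0' has been copied. */
    while ((*des++ = *src++) != '\0') {
        /* empty body: all the work happens in the loop condition */
    }
    return start;
}
```

The destination buffer must be large enough for the source string and its terminator; the sketch does not check this.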

Pointers can also implement some complicated data structures (such as data links or chains) and manipulate them as well. Listing 2 shows a good example.

I can't imagine how I'd implement the functions in Listing 2 without using pointers, at least not in such a small amount of code. But pointers have a down side.
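
As a rough idea of the kind of code involved (a sketch only, not the article's Listing 2), a singly linked list in standard C might look like this. The names node, push_front, and list_length are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Each node stores a value and a pointer to the next node in the chain. */
struct node {
    int value;
    struct node *next;
};

/* Link n in front of the current head and return the new head. */
struct node *push_front(struct node *head, struct node *n)
{
    assert(n != NULL);
    n->next = head;
    return n;
}

/* Walk the chain by following next pointers until the end (NULL). */
size_t list_length(const struct node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next) {
        count++;
    }
    return count;
}
```

Starting from an empty list (a NULL head), each call to push_front returns the new head, and list_length follows the next pointers to count the nodes.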

Confusing pointers
Programmers sometimes complain about pointers in C/C++ code because they can be confusing and hard to understand in some situations. I believe this is true, especially for inexperienced C programmers. For instance, I presented the code shown in Listing 3 to a number of interviewees to test their C/C++ programming proficiency. The question is simple: I asked the interviewees to indicate what n's value was after each statement. To my surprise, their answers were far from satisfying, even though I considered this question to be an entry-level problem.

Harvard architecture
Things can get complicated in embedded systems because many microprocessors and microcontrollers today use the so-called Harvard architecture. That means the code memory (from which instructions are fetched) and the data memory (which holds variables and volatile information) are separate, have different address spaces, and are accessed with different instructions. For example, the code in Listing 1 may not work if the source character string is located in the code memory (read-only) space.
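
The effect can be modeled in portable C, under loud assumptions: the two arrays below merely simulate a code memory and a data memory, and the names code_mem, data_mem, read_code, read_data, write_data, and copy_code_to_data are invented for this sketch. The point is that the same numeric address selects a different byte in each space, so a copy routine has to know which space each address belongs to.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated memories (assumption: sizes and contents are illustrative). */
static const uint8_t code_mem[16] = { 'R', 'O', 'M', 0 };
static uint8_t data_mem[16] = { 'r', 'a', 'm', 0 };

/* Stand-ins for the distinct access instructions of each address space. */
uint8_t read_code(uint16_t addr)
{
    assert(addr < sizeof code_mem);
    return code_mem[addr];
}

uint8_t read_data(uint16_t addr)
{
    assert(addr < sizeof data_mem);
    return data_mem[addr];
}

void write_data(uint16_t addr, uint8_t value)
{
    assert(addr < sizeof data_mem);
    data_mem[addr] = value;
}

/* Copy a NUL-terminated string from code memory into data memory.
   The source must be read with read_code, not read_data. */
void copy_code_to_data(uint16_t des, uint16_t src)
{
    uint8_t c;
    do {
        c = read_code(src++);
        write_data(des++, c);
    } while (c != 0);
}
```

Address 0 yields 'R' through read_code but 'r' through read_data, which is why a single undifferentiated pointer value is not enough on such a machine.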

Furthermore, multiple indexed addressing using pointers will make things even more complicated, especially when ROM and RAM accessing are mixed. (To see an example, go to www.embedded.com/code/2006code.htm.) In such situations, indirect accessing using pointers seems ambiguous.

Language without pointers
We face a dilemma: we'd like to keep pointers in the language for all the reasons of efficiency I've mentioned, but we'd also like to avoid the hassle and confusion of the grammar and rules that relate to pointers. In other words, we'd like to have a programming language with the same capabilities as C but with less confusing grammar and coding.

In Pointer-less C (PLC), all the grammar (and its usage) that relates to pointer notation is eliminated. Only the basic data types are kept. As a first experiment, targeting Microchip PIC18Fxxxx devices, only char, int, and long are recognized (the float type can be added later without breaking the PLC grammar). So the grammar of PLC is greatly simplified and easy to work with. After all, a pointer in C is a variable that holds the address of some data, and an address value is basically an integer. An address is usually 16 to 32 bits long, depending on the CPU architecture, so PLC holds any address value in an integer. For example, a PIC18F-series device has only 4KB of RAM space, so a 16-bit integer is enough to hold any RAM address. If we place all of our constant data in the first 64KB of the PIC's ROM space, 16 bits is also enough to hold the address of any data stored in the ROM; otherwise a 24- or 32-bit integer is needed.
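
The arithmetic behind these widths can be checked with a small helper. The function address_bits is a hypothetical name used only for this sketch; it counts the bits needed to address every byte of a memory of a given size.

```c
#include <assert.h>

/* Smallest number of bits that can address every byte of a memory of
   the given size (in bytes, at least 1). Hypothetical helper. */
unsigned address_bits(unsigned long size_in_bytes)
{
    assert(size_in_bytes >= 1);
    unsigned bits = 0;
    unsigned long highest = size_in_bytes - 1; /* highest valid address */
    while (highest != 0) {
        bits++;
        highest >>= 1;
    }
    return bits;
}
```

address_bits(4096) gives 12, so a 4KB RAM fits easily in a 16-bit integer; address_bits(65536) gives 16, and any ROM address beyond the first 64KB needs 17 bits or more, hence a 24- or 32-bit integer.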

In PLC, we define or declare integers to hold addresses for any data, such as:

unsigned int  ram_addr1, ram_addr2; // hold 16-bit addresses of RAM
unsigned long rom_addr1, rom_addr2; // hold 32-bit addresses of ROM
We use the type specifier unsigned because address values don't need a sign.

PLC also introduces an explicit notation, called indexing, for accessing data by address, as shown in Table 1.

The following are examples of the indexing in PLC:

char a;
int  b;
long c;

. . .
a = (1000, char);
b = (a + 1000, int);
c = (0xE000, rom long);
a = ((b, int), rom char);
(1000, char) = 10;
(2000, unsigned int) = 0xABCD;
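
For comparison, the effect of a few of these statements can be approximated in standard C. The sketch below is an emulation under stated assumptions, not PLC itself: ram is a simulated 4KB data memory, the helper names load_char, store_char, load_u16, and store_u16 are invented, and uint16_t stands in for the 16-bit unsigned int of the PIC18 target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t ram[4096]; /* simulated data memory (assumption) */

/* Rough equivalent of reading "(addr, char)". */
char load_char(uint16_t addr)
{
    assert(addr < sizeof ram);
    return (char)ram[addr];
}

/* Rough equivalent of "(addr, char) = value;". */
void store_char(uint16_t addr, char value)
{
    assert(addr < sizeof ram);
    ram[addr] = (uint8_t)value;
}

/* Rough equivalent of reading "(addr, unsigned int)" with a 16-bit int.
   memcpy avoids any alignment assumption about addr. */
uint16_t load_u16(uint16_t addr)
{
    uint16_t value;
    assert((size_t)addr + sizeof value <= sizeof ram);
    memcpy(&value, &ram[addr], sizeof value);
    return value;
}

/* Rough equivalent of "(addr, unsigned int) = value;". */
void store_u16(uint16_t addr, uint16_t value)
{
    assert((size_t)addr + sizeof value <= sizeof ram);
    memcpy(&ram[addr], &value, sizeof value);
}
```

With these helpers, store_char(1000, 10) plays the role of (1000, char) = 10, and store_u16(2000, 0xABCD) plays the role of (2000, unsigned int) = 0xABCD. In each case the type is named at the point of access, as in the PLC notation.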

The string copy routine in Listing 1 can be written in PLC as shown in Listing 4.

Note that all pre-increment/decrement or post-increment/decrement operations are based on integer arithmetic rules. That means increments and decrements only change the value by 1.

Now we can see that PLC only eliminates those grammars or rules that relate to pointers. It actually works the same way as in standard C when doing direct memory access. So it will produce the same, or very similar, code efficiency compared with standard C.

Access struct and union member using indexing
Similarly, we can extend the syntax a little bit to access the members in struct and union data wrappers:

struct my_data 
{
    int  a, b, c;
    char A, B, C;
};

struct my_data data;

...

int addr = &data;

(addr, struct my_data).a = 1000;
(addr, struct my_data).A = 'x';
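
In standard C, roughly the same effect can be had by converting the integer back to a struct pointer. This is a sketch under assumptions: uintptr_t from <stdint.h> stands in for PLC's plain int address, and set_members is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

struct my_data {
    int  a, b, c;
    char A, B, C;
};

/* Store into two members through an integer address, roughly what
   "(addr, struct my_data).a = 1000;" expresses in PLC. */
void set_members(uintptr_t addr)
{
    assert(addr != 0);
    struct my_data *p = (struct my_data *)(void *)addr;
    p->a = 1000;
    p->A = 'x';
}
```

A caller would obtain the address with (uintptr_t)(void *)&data and pass it to set_members. The struct type has to be restated wherever the address is used, since the integer itself carries no type.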

Function pointers in C/C++
Function pointers, and calling a function through its pointer, are tricky topics in C/C++. Frankly, they have always confused me. I always needed the textbook close at hand when I was trying to use the feature. The grammar rules are even more confusing and hard to digest. Can we make it simpler in PLC? The following grammar rule is used in PLC now:

(return_type : function_address_expression : parameter_list)

Here is an example:

int a;

a = (int : 0x2000 : int 10, char 20);

It calls the function located at address 0x2000 with two parameters (of type int and char, respectively). The return type is int. The function_address_expression here can be any integer-type expression.
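
For comparison, standard C can express a similar call by converting an integer to a function pointer. The sketch below assumes a platform where function addresses fit in uintptr_t (the conversion is implementation-defined in ISO C). The names add_args and call_at are hypothetical, and a fixed address such as 0x2000 is replaced by the address of an ordinary function so that the example can run on a desktop machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target function: takes (int, char), returns int. */
int add_args(int x, char y)
{
    return x + y;
}

/* Rough standard C counterpart of "(int : addr : int x, char y)":
   convert the integer to a function pointer of the stated type, then call. */
int call_at(uintptr_t addr, int x, char y)
{
    assert(addr != 0);
    int (*fn)(int, char) = (int (*)(int, char))addr;
    return fn(x, y);
}
```

Here call_at((uintptr_t)&add_args, 10, 20) calls add_args through its integer address. As in the PLC form, the return type and parameter types must be stated correctly at the call site, because the integer carries no signature.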

Going pointerless
Designing and developing a computer language, as well as its compiler, is not an easy thing. Even with the proper tools (such as the GNU C/C++ tool set), it requires tremendous time and effort to accomplish the job. As new microprocessors and microcontrollers show up on the market, creating new compilers for them becomes a problem. It makes sense to make new compiler development easier. Programmers generally like a computer language that's easy to pick up and learn, with good performance, good readability, high code density, fast execution, and so on. In the world of embedded systems programming, C is considered a good choice. But as I mentioned earlier, C has some "ambiguous" definitions when pointers are involved. The idea of Pointer-less C is to address both issues: easier compiler development and less ambiguous code.

In theory, PLC should have all the features of C except one. In C, pre- and post-incrementing or decrementing a variable adds or subtracts one "unit". For pointers, this unit varies with the pointed-to type (1 for 8-bit quantities, 2 for 16-bit quantities, and so forth). That is, the operation is adjusted automatically according to the pointer's type, which offers some small convenience and improves efficiency in programming. Maybe we could add some similar grammar to make PLC better in the future.
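
The difference in step size can be seen in a short standard C sketch. The names next_by_pointer and next_by_integer are hypothetical, and the integer version assumes a flat address space in which adding sizeof(int) to an element's address yields the address of the next element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer arithmetic: the compiler scales the step by sizeof(int). */
int *next_by_pointer(int *p)
{
    assert(p != NULL);
    return p + 1;
}

/* Integer address arithmetic: nothing is scaled automatically, so the
   caller must supply the element size explicitly. */
uintptr_t next_by_integer(uintptr_t addr, size_t elem_size)
{
    return addr + elem_size;
}
```

With an int array, next_by_pointer(&arr[0]) lands on arr[1] by itself, while the integer version only does so when called with sizeof(int). Adding 1 to the integer would instead land in the middle of the first element.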

Mengjin Su works for Avago Technologies. You can reach him at mengjins@hotmail.com.

Additional resources:
The grammar used in a PLC compiler (PLCC18) for Microchip's PIC18F-series MCUs is available online at www.embedded.com/code/2006code.htm. A free version of PLCC18 that runs in a DOS terminal environment is available at www.geocities.com/mengjinsu.

Reader Response


Mr. Su's article is fundamentally flawed in a number of important areas.

First, the C syntax for pointers isn't ambiguous as Mr. Su suggests. The language rules are clear, and anyone who takes the time to understand them will find the pointer syntax easy to understand and use. (Granted, sometimes operator precedence creates hard-to-interpret code, but those problems are usually easily corrected with parentheses. Mr. Su does not suggest that operator precedence is behind his complaints with C pointers).

Second, Mr. Su's reasoning behind how to turn pointers into integers ("Language without pointers") is inadequate and *highly* platform-specific. Many of the 8051 compilers I have worked with over the years add additional bits to pointer objects to track whether the referenced object is in ROM or RAM, which memory bank it is in, etc. In such systems, the compiler generates code that silently consults those extra bits at runtime to figure out which instructions to use to dereference the pointer. Unless Mr. Su takes those additional bits into account (which require space beyond the "16 bits" needed), his technique will irreparably destroy the pointer object.

Finally, Mr. Su's PLC syntax is, with all due respect, abhorrent. It practically encourages the developer to type "char" once, and "int" later for the same pointer-like object. Or at least it helps the developer miss one or two such conversions when a data type changes.

I think it's great that Mr. Su has taken the time to consider what he perceives to be a problem with C, and to also propose a solution. Such critical thinking is what has brought us C, C++, and Java, and will bring countless more improvements in languages and techniques as time marches on. But PLC doesn't get my vote. It solves a problem that isn't there, and does so in a dangerous way.

Better to get a solid grounding in C pointers, switch to a pointer-less language like assembly code or Java, or get a better C compiler. Please don't spend any more time on PLC.

- Bill Gatliff
Freelance Embedded Developer/Consultant
Peoria, IL


Mengjin Su responds:
Regarding point [1], it is true that C syntax or grammars for pointers are not ambiguous since the C compiler is based on the syntax and works perfectly on all desktop machines. A non-ambiguous syntax doesn't guarantee that a program based on the syntax is non-ambiguous, especially for the embedded applications. "Subject - verb - object" is the rule or syntax for English. But we can generate the following two sentences based on the rule:

(1) I eat an apple.
(2) An apple eats me.

[2] Indeed, my article on Pointer-less C focuses on embedded applications because a lot of embedded CPUs or MCUs use the Harvard architecture, which has separate ROM and RAM spaces. In such situations, the "normal" pointer operation might not work. As you described, many C51 C compilers add extra keywords or extend the syntax to access different memory spaces, which makes the compiler more complicated. I have reason to believe that Pointer-less C can be ported to the C51 environment and will work very well.

[3] Again, my article covered two aspects: (1) eliminating the ambiguity of using pointers, and (2) making the compiler easier to build.


Did you guys actually read the cover article on Pointerless C before printing it? A whole article about a subject that can be handled with a few C macros? I'm not even going to get into the fact that the different sizes of chars, ints and longs are not addressed in the notation.....

Apologies if I've missed the big picture on this, but I don't get it.


#include <stdio.h>
#include <stdint.h>
// access the object of type b at address a
#define P(a,b) (*((b *)(a)))
// allow for different sized objects
#define P1(a,i,b) (*(((b *)(a)) + (i)))
int buffer[10];
int main(void)
{
    int i;
    int *pi;
    uintptr_t addressi; // integer type wide enough to hold an address

    pi = buffer;
    addressi = (uintptr_t)pi;

    // %p expects void *, so convert each argument explicitly
    printf("%p %p %p\n", (void *)pi, (void *)buffer, (void *)addressi);

    P(pi,int) = 1234;
    i = P(pi,int);
    printf("%d %d %d\n",i,P(pi,int),buffer[0]);

    i = P(addressi,int);
    printf("%d %d %d\n",i,P(addressi,int),buffer[0]);

    P1(pi,1,int) = 5678;
    i = P1(pi,1,int);
    printf("%d %d %d\n",i,P1(addressi,1,int),buffer[1]);
    return 0;
}

--Charles Nowell
Longwood Florida


I read the latest print issue, and particularly looked forward to the cover feature "Saving space with Pointer-less C". Unfortunately, when I finished I didn't find benefits or even a clear approach in the proposed, albeit interesting, method of Mengjin Su.

There is clearly a problem with using pointers in programming, as anyone will admit when attempting linked-list and binary sort exercises in their introductory programming courses. The root cause of the problem is not the very powerful and useful concept of pointers to data, because Mengjin Su describes simply a different implementation of them. The root cause is the usage, the so-called "pilot-errors". But you can choose a simpler airplane.

There are very few instances where direct usage of pointer operations is more understandable than operations with arrays and structures, which fundamentally require pointer technology but are more intuitive than "**Dptr++". As with many programming concepts, especially in embedded systems, just because you can implement something very complex in a single statement of "C" or any other language doesn't make it good practice.

Processes, tools and training are the best vehicles for saving programming projects and engineers' sanity. We are approaching a time with embedded systems and controllers where hand-optimizing an implementation will be eclipsed by tools and methods which build highly reliable systems quickly and easily out of trusted components.

When a particular approach or tool used to solve a problem inherently leads to errors, it is time to "re-engineer" your processes and use different and better tools and approaches.

"Pointer-less C" is very possible in all compilers today, using arrays, unions, and structures; a new compiler is not required.

--Jon Pearson
Product Manager
Cypress Semiconductor Corporation
Lynnwood, WA


Dan Saks chimes in: When I saw the article "Saving space with Pointer-Less C," I was immediately wary of where it was going. Love it or hate it, C's treatment of pointers and arrays is truly unique; it is arguably what makes C . . . C. C's pointer notation also provides the conceptual model for iterators in the C++ Standard Template Library. The generalization of pointers into iterators allows an extraordinarily flexible and efficient programming style.

Reading the article only confirmed my suspicions. The article asserts that programmers can be confused by the meaning of pointers in systems that support ROM as well as RAM, but it doesn't explain why programmers should care whether a particular pointer points to RAM or ROM. As Bill Gatliff notes, when the distinction matters, compilers can (and should) take care of it. The article's proposed solution fails to consider any existing compiler-specific extensions that might address the problem, and completely ignores any concerns for "const" or "volatile" semantics.

As Charles Nowell observed, the proposed syntax is nothing but a combination of casting and dereferencing. In effect, PLC treats integers as untyped pointers that must be cast to the appropriate pointer type at every use. Programming with untyped pointers has repeatedly proven itself to be highly error-prone. To add to the excitement, PLC eliminates scaling on pointer arithmetic (it always adds one no matter what the integer "points" to).

I expected the article to conclude by stating at least some observed benefit of using PLC instead of C. For example, the article explained that interviewees struggled with the test question in Listing 3. Where are the comparable PLC test and results? How about data showing an increase in programmer productivity, or a decrease in software defects, or any other benefit from using PLC instead of C? They're not there. And, despite the title of the article, there's no explanation of how PLC saves space compared to C.

PLC's notation is an ill-conceived, platform-specific solution to a vaguely specified problem. The dialect's disregard for type checking represents a giant step backward. In my opinion, Embedded Systems Design should not have published this article as is.

--Dan Saks
Saks & Associates
Springfield, OH


Editor's note: Our editorial staff is responsible for the final article title and description (deck), not the author.


First, I agree with other respondents that macros or other non-pointer features of C should have at least been discussed in the article as an alternate solution to using a totally new language.

Second, I find it confusing that Mr. Su chose PLC for the name of his new language, as PL/C was also the name of one of the spinoffs of the PL/1 language. (It wasn't the embedded spinoff, though; that was Intel's PL/M.)

--Stefan Daystrom
Los Angeles, CA

