Shrink 8051 code with better data choices
Some people think that programs for microcontrollers like the 8051 that have limited program memory must be written in assembly language rather than a high-level language like C. The executable code of a C program is indeed usually larger than the code in assembly language, but it is possible to reduce this difference by constructing the C program properly. Two effective methods are choosing the right data type and appropriately positioning that data in the memory space.
Microcontrollers based on the 8051 architecture are still popular and are offered in many models. Recently, for instance, Silicon Labs released the energy-friendly, 8051-based EFM8 microcontroller for IoT devices. Other 8051-type microcontrollers are available from manufacturers such as Atmel, FTDI, and Maxim.
The internal Flash program memory in such devices is typically not externally extensible and is often smaller than the address space. For example, EFM8 microcontrollers contain from 2 to 64 kB of program memory, depending on the model. A common opinion about programs for microcontrollers with small memory is that they should not be written in a high-level language because the executable code will rapidly fill up the limited memory.
The C language is the most popular high-level programming language for microcontrollers, but the 8051 microcontroller is typically regarded as unfriendly to C compilers. One reason is the 8051's addressing modes. Program data for the 8051 microcontroller can be in different memory blocks, whose address ranges overlap. Various subsets of assembly instructions and addressing modes are available to perform operations on data, but indirect addressing modes such as indexed and base-with-displacement, which in some cases would facilitate access to elements of arrays and structures, are missing. Because C library functions that take pointer arguments must process data from any memory block, and different memory blocks can require different instruction sequences, the absence of such modes means the library functions' executable code on the 8051 is relatively large.
But while a C compiler can optimize the executable code it generates for speed or size, a programmer can also affect code size by applying, among other things, proper constructs of the C language. The following are a few tips for reducing the size of executable code of C programs. The examples used the Keil v6.20 compiler set to optimize for minimal code size. When a range of code sizes is given, unless otherwise stated, the extreme values were achieved with data placed in a register (or registers) or in external RAM.
Use the correct memory space
Typical 8051 microcontrollers have up to 256 bytes of internal RAM, and generally it is possible to attach up to 64 kB of external RAM (in some cases even more) to them. There is also internal Flash program memory as mentioned above. Data may be located in either memory type. Fixed constant data such as conversion tables, for example, can be stored in and used from Flash memory without needing to be copied to RAM.
Compiler creators have divided the 8051's entire memory space into segments, sometimes overlapping, defined by which addressing modes can be used to operate on data placed there. Keil C, for instance, uses the segments listed below (segments in extended memory are omitted):
- data – bottom half of internal RAM, 00h-7Fh, direct and register indirect addressing
- idata – entire internal RAM, 00h-FFh, register indirect addressing
- bdata – fragment of the bottom half of internal RAM, 20h-2Fh, bit addressable, direct addressing
- pdata – 256-byte page of external RAM, xx00h-xxFFh, where the page number xx = 00h-FFh is provided by Port 2, register indirect addressing
- xdata – external RAM, 0000h-FFFFh, register indirect addressing
- code – program memory, 0000h-FFFFh, base-indexed addressing.
Locations 00h through 1Fh of internal RAM are reserved for four register banks, each bank containing 8 registers: R0-R7. The register addressing mode is used to refer to data in those registers.
Register indirect addressing of data in data, idata, and pdata segments uses the address contained in an R0 or R1 register. For the xdata segment the data address is contained in the data pointer (DPTR) register.
In the base-indexed addressing mode the data's address is the sum of the accumulator's contents and those of the DPTR register or of the program counter.
A segment's name is also a memory type specifier. By using the segment names in the declaration of a variable the programmer may override the default memory type the memory model imposes. There are small, compact, and large memory models, with all variables residing in data, pdata, and xdata segments, respectively.
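As an illustration of overriding the default memory type, the following minimal sketch places a frequently used counter in the data segment and a bulk buffer in the xdata segment. The variable names, the function, and the MEM_* macros are hypothetical and not taken from the article's examples. The macros rely on the `__C51__` symbol that Keil C51 predefines, so on any other compiler the specifiers expand to nothing and the code can still be built and tried on a host.

```c
/* Keil C51 predefines __C51__. On other compilers the memory type
   specifiers expand to nothing so the sketch still compiles. */
#ifdef __C51__
#define MEM_DATA  data
#define MEM_IDATA idata
#define MEM_XDATA xdata
#else
#define MEM_DATA
#define MEM_IDATA
#define MEM_XDATA
#endif

/* Hypothetical variables, named for illustration only. */
unsigned char MEM_DATA  tick_count;     /* hot counter: internal RAM, direct addressing */
unsigned char MEM_IDATA scratch[16];    /* internal RAM, register indirect via R0/R1   */
unsigned char MEM_XDATA rx_buffer[64];  /* bulk buffer: external RAM, accessed via DPTR */

/* Stores one byte in the buffer and counts it.
   The index is masked so it always stays within the 64-byte buffer. */
void store_byte(unsigned char b)
{
    rx_buffer[tick_count & 0x3F] = b;
    tick_count++;
}
```

With declarations like these, the memory model chosen for the project only decides where unqualified variables go; the explicitly qualified ones stay in the segment named in their declaration.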
Table 1. Assembly code for the C statement var++, where var is an unsigned char variable
As shown in Table 1, the smallest code size comes from performing operations on data in registers. The assembly instructions that use the register and register indirect addressing modes are mostly 1 byte long. However, the programmer cannot declare register variables (the Keil compiler ignores the C language keyword register). What the compiler can place in registers are a function's automatic local variables and its parameters. By default, up to three function parameters are passed via working registers, and they can be treated as initialized local variables.
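A minimal sketch of treating a parameter as an initialized local variable follows. The function name is hypothetical. The parameter is modified in place instead of being copied to a separate working variable, which gives the compiler the chance to keep the whole computation in registers; whether it actually does so depends on the compiler and its settings.

```c
/* Counts the 1 bits in value.
   The parameter itself is shifted and consumed, so no extra
   local copy of the argument is declared. */
unsigned char count_ones(unsigned char value)
{
    unsigned char ones = 0;

    while (value != 0) {
        ones += value & 1;   /* add the lowest bit */
        value >>= 1;         /* reuse the parameter as the working variable */
    }
    return ones;
}
```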
The table also suggests that intensively used global variables and a function's static local variables should be stored in internal RAM, provided that their size allows it. Bear in mind, however, that the stack must also fit in this memory.
Constants kept in RAM still have their values stored in program memory, and the C startup code copies them to RAM, so they occupy space in both memories. It is therefore worthwhile to keep arrays of constants in program memory only, which is usually larger than data memory.
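A constant array can be kept in program memory by adding the code memory type specifier to its declaration, as in the sketch below. The table contents (seven-segment patterns), the function name, and the MEM_CODE macro are assumptions made for illustration. As before, the macro depends on the `__C51__` symbol predefined by Keil C51 and expands to nothing elsewhere.

```c
/* Keil C51 predefines __C51__. On other compilers the specifier
   expands to nothing so the sketch still compiles. */
#ifdef __C51__
#define MEM_CODE code
#else
#define MEM_CODE
#endif

/* Hypothetical lookup table: seven-segment patterns for digits 0-9
   (segment a = bit 0, active high). Read directly from program memory. */
static const unsigned char MEM_CODE seg_table[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66,
    0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* Returns the segment pattern for a digit, or 0 (all segments off)
   when the argument is not a decimal digit. */
unsigned char digit_to_segments(unsigned char digit)
{
    return (digit < 10) ? seg_table[digit] : 0x00;
}
```

Because the table is read where it is stored, no RAM is spent on it and the startup code has nothing to copy.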
Continue reading on Embedded's sister site, EDN: "Data choices help compress 8051 code."