Rolling your own

Real-time operating systems and kernels provide all the functions you need for a multitasking embedded system . . . and then some. It may make more sense than you think to embed do-it-yourself operating system functions into your application.

The following article first appeared in the February 1989 issue of Embedded Systems Programming magazine.

If you read the multitude of articles on the subject, you might think the only way to design a PROM-based system is to use a commercial embedded operating system or real-time kernel like PDOS, VRTX, or OS/9. But there is an alternative, one that's quietly being used by a significant number of designers, and that is simply to write fully stand-alone software. Let's compare the two approaches based on two criteria: ease of use and versatility.

In comparing ease of use, we're faced with two options; unfortunately, each requires us to learn an interface. If we choose to write stand-alone software, we must learn the workings of various hardware components, among them serial, timer, and interrupt chips. These complicated chips often require extensive study before they can be accessed in a stand-alone environment. Buying a real-time kernel also challenges the designer to learn a fairly complicated interface and, in many cases, still requires the applications programmer to learn the details of the hardware components being accessed.

Besides the hardware issues, we must consider some purely software structures. The real-time kernel provides multitasking, a facility that would have to be created from scratch in a stand-alone environment. Here the real-time kernel seems to have the edge in ease of use, though there are (as we shall see) competitive alternatives in this area as well.

In a comparison of versatility, obviously nothing can beat the stand-alone environment. Real-time kernels are versatile enough for a wide range of applications, so the question is whether the versatility provided by the real-time kernel is sufficiently close to the absolute versatility of the stand-alone environment for your particular application. The commercially available real-time kernels are the result of many hours of development; an attempt to duplicate the functionality of one of these little gems would require a larger programming effort than would be justified by any single project. But programmers who write stand-alone software don't duplicate that functionality, because they simply don't need all of it.

Power-up initialization
A real-time kernel or operating system normally takes care of power-up initialization. For the 68000 family, this involves setting up RAM vectors, a stack pointer, and the interrupt level. For the 8086 family, segment registers must also be initialized. A sample initialization routine is provided with most compilers designed for stand-alone use, so this feature isn't hard to provide. Beyond processor initialization, major hardware components such as serial ports, timers, and interrupt chips must be initialized.

The first and most basic function of a real-time kernel is to provide access to the hardware. A typical CPU board in a VME-bus or STD-bus system has a serial port that may be connected to an operator's console. If the application software is being written in C, you may want to have the getchar() and putchar() functions defined for you. This is all you need for your first “Hello, world” program.

While it's true that using the real-time kernel for this access may be easier than writing your own driver, it's also true that the initialization, input, and output drivers for a serial chip often take fewer than 50 lines of assembly code. In many cases, that code is supplied by the board manufacturer. If your application calls for an interrupt-driven serial input and output facility, the simple interface offered by the real-time kernel will rarely suffice.

In a project I was involved in recently, we connected a number of single-board computers by their serial ports to a multidrop serial bus. A proprietary addressing protocol was used to target listeners on the bus. By writing our own driver, we were able to implement the addressing state machine within the serial input interrupt service routine. Characters following the addressing sequence were buffered by the addressed board and discarded by all the other boards. Doing all this below the level of getchar() meant that the application code could be written independently of multidrop communication. It would have been difficult to implement this interrupt-driven selective response using the typical canned serial input driver.

Another piece of hardware that's commonly accessed is the timer. Most real-time applications require some form of timekeeping. The real-time kernels provide this function to the application in a variety of ways; one is to provide a function that returns the number of timer ticks since initialization. The application can then use this return value to establish delays or time-outs.

To use a timer in the stand-alone environment, you need to initialize the timer and possibly the interrupt system. Timer chips like the Motorola 6840 or Intel 8254 aren't difficult to access; even the slightly more versatile (and more complicated) AMD 9513 isn't beyond the reach of a knowledgeable programmer.

In most of the project's applications, we initialized the timer chip to cause regular, periodic interrupts. We simply incremented a global integer variable in the timer interrupt service routine and returned. The application need only read the global timer variable to establish delays and time-outs. Of course, time comparisons must be done with care, since the tick counter rolls over when it overflows.

Another way real-time kernels provide timer information is through the use of multitasking suspend or sleep functions. The application requests that its task be suspended until a certain event reactivates it or a time-out occurs. For a single-task application, such a wait is easily built in; for multitasking applications, the process is more involved (we'll get to that later).

Some real-time kernels also provide assistance in interrupt handling. For instance, they may save registers on the stack before calling your application interrupt routine so that you can do a normal subroutine return. I'd rather do my own pushes and pops; sometimes that extra call used in passing the interrupt to the application causes a critical time delay. In any case, a real-time kernel doesn't relieve the applications programmer of the responsibility for understanding interrupt operations.

Closed-loop control
Before we examine multitasking, let's look at an application function that's often thought of in terms of multitasking: closed-loop control. This function is instructive because there are alternative implementations that don't involve multitasking. Closed-loop control involves regular, periodic sampling of a controlled variable (such as a voltage from a transducer) and a corrective action (such as the velocity of a motor).

There are several reasons for not placing the entire sampling and feedback function in the interrupt service routine. For one, the feedback calculation may take longer than you can afford in an interrupt routine. Also, the use of an A/D converter in an interrupt service routine will unnecessarily complicate the shared use of that resource from other contexts. In general, the less you do in an interrupt service routine, the easier the program is to debug.

An alternative that has worked well for us many times is to call a subroutine to process closed-loop control whenever possible. Most of the time, this subroutine just checks the timer variable maintained by the timer interrupt service routine and returns. When the timer variable indicates that one sample period of time has elapsed, however, the subroutine performs a complete iteration of input sampling and output correction.

All that remains is to place calls to this subroutine at all the lowest-level waiting routines in the concurrent application. This usually comes to about four or five calls. While it's true that the exact running time (sampling aperture) of each sampling and feedback operation is imprecise by the amount of time between calls to the subroutine, that usually has little effect on the performance of the closed loop.

If your application requires full use of most of the features of a multitasking real-time kernel, then the stand-alone approach is certainly not cost-effective. Such features as priority scheduling, time-slicing, and task signaling are well provided for in a real-time kernel. But are all these features necessary for your application?

What follows is an implementation of stand-alone multitasking that we've found adequate in many of our applications (Listing 1). We implemented it in C for PROM-based 8085, 6809, and 68000 systems. The assembly language support functions never come to more than 1 1/2 pages of source code, so we call it our 1 1/2-page multitasker.

Listing 1: An implementation of stand-alone multitasking.

Multtask.asm: multitasking support for Lattice C.

        title   multtask.asm: Multitasking Support
        name    mtask

X       equ     4

DGROUP  GROUP   DATA
DATA    SEGMENT WORD PUBLIC 'DATA'
        ASSUME  DS:DGROUP
        public  task_number
task_stack_size equ     2048
tasks           equ     5               ; number of tasks
system_stack    dw      ?
task_number     dw      ?
task_mode       db      0               ; says if task or system
task_control_table      dw      tasks dup (?)   ; table of stack pointers
task_stack_pool db      (task_stack_size*tasks) dup (?)
DATA    ENDS

PGROUP  GROUP   PROG
PROG    SEGMENT BYTE PUBLIC 'PROG'
        ASSUME  CS:PGROUP
;........................................................
; init_task( task#, taskaddr );  /* called from "system" mode */
; continue_task( task# );        /* called from "system" mode */
; suspend();                     /* called from "task" mode */
;
; The task_mode flag to keep track of which mode is in effect
; allows the use of subroutines that are called by both system
; and task code. If those subroutines call suspend(), then in
; the case of a system mode caller suspend() must return
; control immediately.
;........................................................
        public  init_task,continue_task,suspend
init_task proc  near
        push    bp
        mov     bp,sp
        mov     system_stack,sp
        mov     ax,[bp+X]               ; task #
        mov     task_number,ax          ; (application code may need to know this)
        mov     bx,task_stack_size
        mul     bx
        add     ax,offset dgroup:task_stack_pool+task_stack_size
        mov     sp,ax                   ; Stack belongs to the task now.
        mov     task_mode,1
        mov     ax,offset pgroup:callsus
; In case the task returns, the next push prepares the return
; address of a hung task that repeatedly gives up control by
; calling suspend().
        push    ax
        push    [bp+2+X]                ; starting task address
; By falling through to suspend, we act as if the task has called
; suspend.
suspend:
        test    task_mode,1             ; In case "system" code calls suspend...
        jnz     sustsk
        ret                             ; "system" gets control back again.
sustsk:
        push    bp                      ; on "task" stack
; Note: Here is where we would save any register variables on
; the stack.
        mov     bx,task_number
        sal     bx,1                    ; task_number * 2
; Save current task's SP in the task control table.
        mov     task_control_table[bx],sp
        mov     sp,system_stack
        mov     task_mode,0             ; back in "system" mode now
        pop     bp
        ret
continue_task:
        push    bp
        mov     bp,sp
        mov     system_stack,sp
        mov     bx,[bp+X]
        mov     task_number,bx
        sal     bx,1
        mov     sp,task_control_table[bx]       ; Restore task's SP.
        mov     task_mode,1             ; We're in "task" mode now.
; Note: Here is where we would restore any register variables
; from the stack.
        pop     bp
        ret                             ; Rejoin task.
;--------------------------------------------------------
callsus:
        call    suspend
        jmp     callsus                 ; in case a task returns all the way
init_task endp
PROG    ENDS
        END

We start with the distinction between system and task coding. Both are considered application programming. When we come to the point in the system application programming where we wish to initiate multitasking, the application program calls a setup routine, giving the starting function pointer for each task. This routine allocates separate stack areas (separate from system stack) for each of a fixed number of tasks. It then sets up a current stack pointer and execution pointer for each task. After the initialization of, say, four tasks, the system can cause concurrent execution of the tasks with code that looks like this:

while( 1 )
    {
    continue_task( 0 );
    continue_task( 1 );
    continue_task( 2 );
    continue_task( 3 );
    }
The continue_task() function transfers control to the numbered task and returns when the task calls a suspend() function. The continue_task() and suspend() functions also save and restore any register variables on the task stack. Since control is transferred cooperatively, we don't need to save the state of a floating-point processor in the context switch. All floating-point operations are begun and completed within one call of continue_task(). The while(1) condition can, of course, be expanded to check for various stopping conditions, including global status flags set by the tasks. This kind of multitasking isn't forced timeslicing, so it requires that the tasks call suspend() frequently. Since each task has its own stack, the call to suspend() may occur at any level of subroutine nesting.

In our applications, the four tasks are usually the same reentrant subroutine in a multi-station industrial test stand. At the lowest level, the reentrant subroutine uses the global variable tsknum, which is set to the task number by continue_task(). This variable determines which station hardware to access.

That may sound like a lot of functionality to pack into 1 1/2 pages of assembly code, but what we're providing is actually much simpler than a general real-time multitasking kernel. For example, this task scheduler provides no means for an external event to activate a task. Every task is expected to have control for a short time in every loop; it's up to the task to check for activation conditions. Neither does the scheduler provide signaling between tasks. The tasks can, however, use common variables to implement this type of signaling.

Since all tasks are linked in as reentrant functions in one program, special care must be taken when referencing static variables (both global and nonglobal). In general, the tasks' use of static variables should be limited to deliberate intertask communication. Variables allocated off the stack are automatically private to the task.

Special shared-resource handling isn't provided by this primitive scheduler. In our test-stand applications, we used simple semaphores to manage the allocation of shared hardware to individual tasks. Again, because all the tasks are linked concurrently, addressability of common variables used to implement semaphores is automatic. If this kind of implementation is sufficient for your application, then the ease of using this task scheduler will compare very favorably to that of the generalized task control blocks used in a commercial real-time kernel.

Development environment
Perhaps the most forbidding aspect of writing stand-alone code is the absence of a friendly development environment. Many programmers have grown accustomed to the debugging facilities in most real-time kernels. Again, there are several alternatives. A good in-circuit emulator will provide hardware breakpoints in ROM, instruction tracing, even high-level language debugging. Once your product is working, these gadgets can be removed.

If you can't justify using an ICE, there are still alternatives. The debug ROMs provided by board manufacturers are very useful in verifying hardware; they usually come up running with no application software at all. The operator's console may then be used for primitive hardware verification. Once the hardware has been verified, the commercial debug ROM may be removed and the burn-and-learn process can begin, starting with the ever-popular “Hello, world” program. This is a big step, and one that's hard to subdivide. But if you pass it, you can quickly build into your application the primitive peek-and-poke functions just in case you start doubting the hardware again.

What does the future of stand-alone programming look like? It appears that the trend is toward a more packaged approach. While that may be desirable for the majority of applications, the diverse group of microprocessor users may find it more practical to write stand-alone code. Rather than labeling such efforts as old-fashioned, we might do better to support both approaches.

This question is particularly relevant to board manufacturers and writers of ROMable compilers. Some boards now come with a manual describing in great detail how to use the board with packaged software, yet provide only a page or two of the information needed by programmers who write their own drivers. I personally find sample assembly language drivers and technical function descriptions much more useful than packaged software that holds me at arm's length from the full hardware potential of the board. And ROMable compiler writers can help by supplying a library of tested kernels (in source form, of course) for some of the popular boards. With adequate support, writing stand-alone code can be painless and practical.

Robert Scott is chief engineer at Real-Time Specialties, a custom software supplier for real-time applications. He has a master's degree in mathematics. Robert also develops and sells a line of professional piano tuning software at
