New Twists on the Z80

Jean J. Labrosse

Despite being a quarter-century old, the 8-bit Z80 and its derivatives continue to be popular with embedded system designers. With two new derivatives emerging within the last year, it's a good time to take a fresh look at the architecture. This article considers the architectural choices from the perspective of an RTOS implementer or user.

Introduced by ZiLOG in the mid '70s, the Z80 was an impressive processor for its time. The original Z80 (a 2MHz part) was a souped-up version of the then-popular Intel 8080 processor. While compatible with the 8080, the Z80 offered many new features: new instructions, new addressing modes, two index registers, faster execution, high-performance peripheral chips, and a clever interrupt scheme. Soon, Z80s were available in higher speed grades (4MHz and 6MHz). For many years, the Z80 was the king of 8-bit processors.

In 1987, ZiLOG introduced the Z280, an enhanced Z80, but it turned out to be a big failure. A couple of years earlier, around 1985, Hitachi had introduced the 64180 microprocessor, an improved Z80 that ZiLOG second-sourced under the Z180 name. This 64-pin DIP processor executed most instructions in fewer clock cycles than the Z80 and contained a number of on-chip peripherals: a pair of UARTs, a clocked serial I/O port, an interrupt controller, two DMA channels, two 16-bit timers, and a memory management unit that extends the '180-class processor's address space from 64KB to 1,024KB. Although called an MMU, this device only implements memory banking (described later).

In the mid '90s, ZiLOG announced the Z380 but, like the Z280, it never really went anywhere. Today, a number of companies are selling Z80-derived processors, including AB-Semicon, Kawasaki, LSI, NEC, Toshiba, and VAutomation.

Towards the end of last year, ZiLOG announced a new generation of Z80 derivatives. Around that same time, a new company called Rabbit Semiconductor introduced its own new generation of Z80 derivatives. ZiLOG's offering is called the eZ80 and is expected to hit the street in the second half of 2000. Rabbit Semiconductor's new processor is called the Rabbit 2000; it started shipping in November 1999. Rabbit Semiconductor was founded by its parent company Z World Inc. Z World has been around since 1981 and has been using the Z80 and Z180/64180 extensively in its board-level controllers.

Annual volumes of Z80 flavors are reported to be around 200 million units. I asked both ZiLOG and Rabbit why they would introduce new products based on a 25-year-old architecture. I got the same response from both: a lot of engineers (about 100,000) have used these chips at some point in their careers and are already familiar with the architecture. The Z80 architecture is simple to learn and still quite powerful. And, like any popular platform, there is a lot of valuable legacy software floating around for it. Finally, it's quite difficult and expensive to introduce a completely new architecture. Both ZiLOG and Rabbit claim that their new processors rival 16-bit processors in performance.

The original
The register model of the Z80 is shown in Figure 1. The Z80 can only address 64KB of memory but has a separate 64KB I/O address space. The Z80 has two sets of 8-bit registers, but only one is visible at any given time. Most 8-bit operations are performed on the A register. The F register contains the six CPU flags (Sign, Zero, Half carry, Parity/oVerflow, Negative, and Carry), which are automatically updated based on the result of the current operation. Registers B, C, D, E, H, and L are general-purpose 8-bit registers that can be used in 8-bit operations. They can also be combined in pairs to form the 16-bit registers BC, DE, and HL, which can be used in 16-bit arithmetic operations or as pointers.


The I register holds the upper eight bits of the interrupt vector table address. When an interrupt is generated, the CPU combines the I register with an 8-bit value provided by the interrupting device to form a 16-bit interrupt vector address. Determining the source of an interrupt and vectoring to the appropriate interrupt service routine (ISR) is thus handled in hardware. The R register is incremented on each instruction and provides a refresh address for dynamic memory during a refresh cycle buried in each instruction cycle. However, it is unlikely that anyone would use dynamic memory nowadays; static memory is more appropriate for a Z80 design. The IX and IY registers are 16-bit pointers. Finally, SP is the stack pointer and PC is the program counter. On average, a Z80 executes an instruction every 10 to 12 clock cycles. Newer Z80 derivatives cut this to two to five cycles per instruction, depending on the architecture.
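The vector formation just described can be sketched in C. This is only an illustration of the arithmetic behind the Z80's vectored interrupt mode; the function name is made up, not vendor code:

```c
#include <assert.h>
#include <stdint.h>

/* In the Z80's vectored interrupt mode, the I register supplies the
   upper byte and the interrupting peripheral supplies the lower byte
   of the address of the interrupt vector table entry. */
uint16_t z80_im2_vector(uint8_t i_reg, uint8_t device_byte)
{
    return (uint16_t)((i_reg << 8) | device_byte);
}
```

With I set to 0x80 and a peripheral supplying 0x10, the CPU fetches the ISR address from location 0x8010, with no software polling required to identify the source.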

Listing 1: Task code
void TaskCode (void)
{
    while (1) {
        Do something useful;
        Wait for event or time to expire;
    }
}

Real-time kernels
A real-time kernel is software that manages the use of a microprocessor or microcontroller to ensure that all time-critical events are processed as efficiently as possible. A real-time kernel can help simplify your design because it allows your project to be divided into multiple independent elements called tasks. A task is a program that competes for CPU time and is generally written as an infinite loop as shown in Listing 1.

Listing 2: Letting the kernel manage your tasks
void main (void)
{
    .
    .
    OSTaskCreate(Task1Code, TopOfStack1, Priority1);
    OSTaskCreate(Task2Code, TopOfStack2, Priority2);
    .
    .
}

With most real-time kernels, each task is given a priority based on its importance, along with its own stack space. You let the kernel manage each task by calling a function the kernel provides, as shown in Listing 2.

When you design a product using a real-time kernel, you split the work to be done into tasks, each responsible for a portion of the problem solution. A real-time kernel also provides valuable services to your application such as time delays, system time, message passing, synchronization, mutual exclusion, and more.

Most commercial real-time kernels are preemptive. A preemptive kernel ensures that the highest-priority task that is ready to run is given control of the CPU. When an ISR makes a higher priority task ready to run, the higher priority task will be given control of the CPU as soon as all nested interrupts complete. The execution profile of a system designed using a preemptive kernel is illustrated in Figure 2. As shown, a low priority task is executing (1). An asynchronous event interrupts the microprocessor and the processor vectors to an ISR (2). The microprocessor services the event (3) and calls a service, provided by the kernel, that causes a higher priority task to become ready for execution. Most likely, the higher priority task was waiting for the ISR to occur. Upon completion of the ISR, the kernel is invoked once more. This time, the kernel notices that a higher priority task is now ready to run and thus, the processor cannot return to the interrupted task. Instead, the kernel will resume the high priority task (4). The higher priority task executes (5) until it again needs to be signaled by another occurrence of the ISR or another task. A kernel service is again invoked to have the high priority task wait for an event. If the event did not occur, the kernel resumes execution of the lower priority task (6).


A kernel thus ensures that time-critical tasks are performed first. Furthermore, execution of time-critical tasks is deterministic and almost insensitive to code changes. In fact, you can often add and change low-priority tasks without affecting the responsiveness of your system to high-priority tasks.

In order to manage tasks, the kernel needs to maintain internal data structures. One such data structure is called the task control block (TCB). Each task is assigned a TCB by the kernel. A TCB contains the “state” of a task (ready or waiting), the current location of the task's top-of-stack (TOS), priority, and other kernel-specific elements.
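A minimal TCB along these lines might look like the following C sketch. The field names are illustrative, not µC/OS-II's actual definitions; a real kernel carries additional kernel-specific elements:

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TASK_READY, TASK_WAITING } TaskState;

/* Hypothetical task control block: just enough state for a kernel
   to suspend a task and later resume it. */
typedef struct tcb {
    TaskState   state;  /* ready or waiting                       */
    uint16_t    tos;    /* saved top-of-stack (SP) of the task    */
    uint8_t     prio;   /* task priority                          */
    struct tcb *next;   /* link for the ready list or a wait list */
} TCB;
```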

Because each task is assigned a priority, the kernel must also keep track of tasks in order of priority. The kernel maintains what's called a Ready List. The ready list tells the kernel which tasks are able to execute. The kernel always chooses the highest priority task that is ready to run. Only one task can execute on the processor at any given time, and thus, the other tasks that are ready will have to wait until the current (running) task is no longer able to run. Figure 3 shows how TCBs can be organized in the ready list. The singly linked list is only shown to demonstrate the concept. Kernel designers have devised more efficient ways to maintain a ready list.


The kernel also maintains a number of Wait Lists. A task can wait on kernel objects such as a semaphore, mailbox, message queue, or pipe. Each of these objects contains state information. For example, a semaphore contains a value indicating whether the semaphore is available or not, as well as a list of tasks waiting on the semaphore, provided the semaphore is currently owned by a task. Figure 4 shows a list of tasks waiting on a semaphore. Again, the singly linked list is shown mainly to illustrate the concept.
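A semaphore of the kind described, with its value and a priority-ordered wait list, might be sketched as follows. The types and names are illustrative only, not from any particular kernel:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wait-list node: one entry per blocked task. */
typedef struct wait_node {
    uint8_t           prio;  /* priority of the waiting task (0 = highest) */
    struct wait_node *next;
} WaitNode;

typedef struct {
    int16_t   count;      /* semaphore value: > 0 means available */
    WaitNode *wait_list;  /* tasks blocked on this semaphore      */
} Semaphore;

/* Insert a task into the wait list, keeping the highest-priority
   (lowest-numbered) task first so the kernel can wake it directly. */
void sem_wait_insert(Semaphore *s, WaitNode *n)
{
    WaitNode **p = &s->wait_list;
    while (*p != NULL && (*p)->prio <= n->prio)
        p = &(*p)->next;
    n->next = *p;
    *p = n;
}
```

Keeping the list sorted at insertion time means the kernel's time-critical wake-up path is a constant-time operation, at the cost of a linear scan when a task blocks.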


Figure 5 shows the execution profile of a service, OSTimeDly(), provided by a kernel that I developed a few years ago, called µC/OS-II. In the example, Task 1 is executing and calls OSTimeDly() to suspend itself for one system tick. A system tick is generally created by a timer chip that interrupts the CPU at a fixed interval (10ms to 100ms, depending on the application). A task calling OSTimeDly() is placed on a list of tasks waiting for time to expire. OSTimeDly() calls OSSched(), another kernel function, which finds the next most important task that is ready to run. Once found, OSSched() calls OSCtxSw(), which performs a context switch to that task. What's important to note here is that these functions take time to execute, so the processor should execute them as quickly as possible.

The kernel performs a context switch when it determines that the current task is no longer the most important task to execute. A processor's context generally consists of the volatile state of the processor. For the Z80, the context consists of the following registers: AF, BC, DE, HL, AF', BC', DE', HL', IX, IY, SP, and PC. In other words, to stop execution of one task and resume execution of another, you save the context of the first task onto its stack and restore the context of the task you wish to execute from its stack.

A context switch for the Z80 is shown in Figure 6. It is assumed that the kernel maintains pointers to the TCB of the running task (task to suspend) and the new task (task to resume). A context switch starts by saving the processor's context onto the running task's stack (1). The processor's SP register (new TOS) is then saved in the TCB of the task to suspend (2). The task's TOS is retrieved from the new task's TCB and placed in the processor's SP register (3). Finally, the processor registers are restored by popping them from the stack (4). At this point, the new task code continues execution as if it had never been suspended.

Listing 3: Context switch pseudocode for a Z80
Z80_Context_Switch:
; Save the current task's context
    PUSH main register set
    PUSH secondary register set
    Save SP into the TCB of the task to suspend
;
; Restore the new task's context
    Load SP from the TCB of the task to resume
    POP secondary register set
    POP main register set
    RET

Listing 3 shows the pseudo-code for the above operation. Note that it's assumed that the PC register is already on the TOS of the task to suspend. A context switch takes about 426 clock cycles on a Z80 (17µs at 25MHz).

Kernel requirements
Kernels can be designed to work with just about any processor. In fact, a kernel requires very few features from a processor. However, to run a preemptive kernel, a processor must be:

  • Able to disable and enable interrupts
  • Able to access a large stack area
  • Able to load and store the processor's stack pointer
  • Able to save and restore the processor's context onto/from the stack
  • Able to easily manipulate pointers

The Z80 and its derivatives support all of these requirements. The kernel disables interrupts to protect critical sections of code, so the processor must have instructions to disable and enable interrupts. The Z80 provides two: DI and EI, respectively. What if the programmer had interrupts disabled before calling a function provided by the kernel? Unfortunately, the Z80 doesn't provide an easy way to preserve the state of the interrupt disable flag and restore it upon completion of the critical section. Because of this, interrupts on the Z80 are always enabled after leaving a critical section. Fortunately, most of the time, this is the way you would want it. The Rabbit 2000 (described later) is the only Z80 derivative that can preserve the state of the interrupt disable flag.
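On the Z80 itself, the critical-section pattern therefore looks like this, with interrupts unconditionally re-enabled on exit regardless of their state on entry:

```
; Enter critical section:
    DI
;
; Critical section
;
; Exit critical section:
    EI
```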

The Z80 provides a number of registers that can be used to indirectly address memory, making it a natural for pointer manipulation.

A kernel needs RAM to store its data structures and to maintain a stack for each task. On a Z80, a kernel such as µC/OS-II requires as little as 1KB of RAM for internal structures (task stacks excluded). Where code space (ROM) is concerned, a minimal configuration of µC/OS-II requires about 3KB. Other operating systems may require more RAM and/or ROM.

Hitachi 64180
As previously mentioned, the Hitachi 64180 processor was introduced in the mid '80s. Hitachi has stopped promoting the 64180 in the U.S. and seems to be concentrating on the H8 and SH processor families. ZiLOG has been pushing many Z180 derivative products, all based on Hitachi's original design, having different complements of on-chip peripherals.

Apart from its on-chip peripherals, a '180-class processor adds value by integrating an on-chip MMU. Again, although called an MMU, the '180-class processors only implement memory banking. Figure 7 shows how the MMU translates the 64KB logical address space into a 1,024KB physical address space. A '180-class processor can only address 64KB of memory at any given time. The 64KB logical address space is split into three areas: Common Area 0, Bank Area, and Common Area 1. The size of each can be adjusted with a granularity of 4KB. The MMU is configured through three 8-bit I/O ports, which Hitachi calls registers: CBAR, BBR, and CBR.


Common Area 0 is designed for ISRs and other code that cannot be banked, such as kernel services. This region always starts at address 0x0000 in the logical address space and always translates to physical address 0x00000.

The starting address, and thus the size, of the other areas is established by the common/bank area register, or CBAR. The upper nibble (four bits) of the CBAR sets the starting address in the logical address space of Common Area 1. In other words, if the upper nibble of the CBAR is set to 0x8, then Common Area 1 starts at 0x8000. The lower nibble of the CBAR sets the starting address of the Bank Area. For example, if the lower nibble of the CBAR is set to 0x4, then the Bank Area starts at 0x4000. To achieve the layout of the logical address space shown in Figure 7, the CBAR would be initialized to 0x84 in the startup code and would not be changed at run time.
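The nibble packing just described can be expressed in C. This is a sketch of the arithmetic only; make_cbar is a made-up name, not a vendor routine:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a CBAR value: the upper nibble selects the 4KB boundary where
   Common Area 1 starts, the lower nibble selects where the Bank Area
   starts. Both addresses must lie on 4KB boundaries. */
uint8_t make_cbar(uint16_t common1_start, uint16_t bank_start)
{
    return (uint8_t)(((common1_start >> 12) << 4) | (bank_start >> 12));
}
```

For the layout in the text (Common Area 1 at 0x8000, Bank Area at 0x4000), this yields 0x84.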

Common Area 1 is generally used to hold data (RAM). The common base register, or CBR, establishes the starting address of Common Area 1 in the physical address space. The mapping of logical address to physical address for Common Area 1 is given by:

PhysicalAddress = (CBR << 12) + LogicalAddress;

Your application can change the value of the CBR at run time and thus point to a different block of physical memory. However, the CBR is generally set once at startup. Finally, we get to the bank base register, or BBR. Like the CBR, this register gives your application a “window” into physical memory (which could hold code or data). The BBR can be changed at run time, allowing you to view different chunks of code or data. The mapping of logical address to physical address for the Bank Area is given by:

PhysicalAddress = (BBR << 12) + LogicalAddress;

As shown in Figure 7, logical address 0x4000 would map to physical address 0x0C000 if the BBR contains 0x08. Note that the BBR changes in steps determined by the bank size: if you set up the Bank Area to be 16KB, useful values of the BBR step by 0x04, since each increment of the BBR represents 4KB.
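The translation can be checked with a small C function mirroring the formula above (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Bank Area translation on a '180-class part: shift the 8-bit BBR up
   by 12 bits (one 4KB page) and add the 16-bit logical address. */
uint32_t bank_phys(uint8_t bbr, uint16_t logical)
{
    return ((uint32_t)bbr << 12) + logical;
}
```

With the BBR at 0x08, logical address 0x4000 maps to 0x0C000; bumping the BBR to 0x0C slides the window up to physical 0x10000.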


Listing 4: Context switch pseudocode for a 180-class processor, in banking mode
H64180_Context_Switch:
; Save the current task's context
    PUSH main register set
    PUSH secondary register set
    Get BBR and PUSH
    Save SP into the TCB of the task to suspend
;
; Restore the new task's context
    Load SP from the TCB of the task to resume
    POP and set BBR
    POP secondary register set
    POP main register set
    RET

The C compiler I've used sets the BBR to the proper value if you compile your code with the banking option. From the kernel's perspective, it must save the BBR in addition to the other Z80 registers during a context switch. This means the context switch code for a '180-class processor looks as shown in Listing 4. A context switch for µC/OS-II on the '180-class processors takes about 400 clock cycles (16µs at 25MHz).

Rabbit 2000
Last fall, Rabbit Semiconductor introduced the Rabbit 2000 processor. From its name, it's not obvious that this is a Z80/180 derivative processor, but it is. Like the 180-class processors, the Rabbit can address up to 1MB of code and data. Rabbit Semiconductor added new instructions that substantially improve 16-bit operations, allow alternate registers to be referenced explicitly, allow an offset to be added to the SP and HL registers (making stack manipulation much more efficient), allow the interrupt disable flag to be saved and restored to/from the stack, and more. In short, it corrects a number of deficiencies in the Z80/180.

The Rabbit executes code mostly at two clock cycles per byte, with very few extra cycles needed. The exception is the 16-bit multiply instruction, which requires 12 cycles. This is much better than the original Z80, which requires a minimum of three or four cycles per byte, with many instructions requiring multiple extra cycles. Rabbit Semiconductor advertises the Rabbit 2000 as four times faster than a Z80 executing compiled C code on systems with equal memory access times. This is due to a combination of improvements:

  • Every instruction executes in fewer clock cycles
  • New instructions improve the quality of compiler generated code
  • Memory interface improvements permit higher clock speeds for the same memory access time

Finally, the Rabbit 2000 contains a number of on-chip I/O devices: 40 parallel I/O lines, four asynchronous serial ports, numerous timers, a watchdog timer, and interfaces for memory chips. The register model for the Rabbit is shown in Figure 8 and is nearly identical to the Z80's, except for the addition of the following registers (shown in bold in Figure 8): XPC, IP, STACKSEG, DATASEG, and SEGSIZE.


The EIR is the same as the I register on the Z80 and thus contains the upper byte of an interrupt vector table. The IIR replaces the R register on the Z80. However, the IIR is used to point to an interrupt vector table specific to internally generated interrupts. The IP register is the interrupt priority register. It contains four 2-bit fields that hold a history of the processor's interrupt priority. In fact, the Rabbit supports four levels of processor priority. The Z80 and 180-class processors have only two: maskable and non-maskable. What I find particularly useful is that you can save and restore the current interrupt priority level on the stack.

Listing 5: Handling of a critical section on the Rabbit
; Enter critical section:
    PUSH IP
    IPSET 1
;
; Critical section
;
; Exit critical section:
    POP IP

This makes it a natural for critical sections, as shown in Listing 5.

Memory management on the Rabbit is similar to the '180-class processors, as shown in Figure 9, but it's more flexible. There are four registers in the Rabbit MMU: SEGSIZE, DATASEG, STACKSEG, and XPC. In Figure 9, I assumed a product might have 128KB of ROM and 64KB of RAM. The SEGSIZE register establishes the boundaries of the Root Segment, Data Segment, and Stack Segment. I decided to have 24KB of non-bankable code and a data segment of 32KB. The Rabbit makes it easy to have multiple stacks, since the stack segment can be placed on any 4KB page boundary beyond physical address 0x0D000. With the memory configuration shown, we could have 64 stacks of 512 bytes each, with each 4KB stack segment holding the stacks of eight tasks. Of course, we could split up the stack space differently because some tasks may require more or less RAM than others.
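The stack-carving arithmetic above can be sketched as follows, assuming 512-byte stacks and an empty descending stack whose initial SP sits just above its slot (Z80-family PUSH decrements SP before writing). The names and sizes are illustrative, matching the configuration assumed in the text:

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WINDOW  0xD000u  /* logical start of the 4KB stack segment */
#define STACK_SIZE    512u     /* bytes per task stack (assumed)         */
#define TASKS_PER_SEG 8u       /* 4KB window / 512-byte stacks           */

/* Which 4KB physical page holds a given task's stack, expressed as an
   offset to add to the base STACKSEG value chosen from the memory map. */
uint8_t task_stackseg(unsigned task)
{
    return (uint8_t)(task / TASKS_PER_SEG);
}

/* Initial top-of-stack for a task, as a logical address inside the
   stack window (the first PUSH writes just below this address). */
uint16_t task_tos(unsigned task)
{
    return (uint16_t)(STACK_WINDOW + ((task % TASKS_PER_SEG) + 1) * STACK_SIZE);
}
```

The kernel would load STACKSEG with the page for the incoming task and SP with its saved top-of-stack during each context switch.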


The XPC is similar to the BBR register of the '180-class processors but sets the bank size to 8KB. The 8KB area acts as a page rather than as a bank. This scheme is similar to 80×86 real mode, except that the page is 8KB instead of 64KB and the amount the page can slide is 4KB instead of the 80×86's 16 bytes. There are no limits imposed by the 8KB page size because the compiler generates code to change the XPC register when execution passes the 4KB mark (that is, halfway through the page). This is done with a “long jump” instruction that changes both the XPC register and the PC. The net effect is that there are no gaps in the utilization of memory, as there are in banked schemes when functions do not quite fill up a bank. The entire process is handled by the compiler and is quite transparent to the programmer. The only caveat is that no single statement can generate more than 4KB of code, an unlikely possibility.

With the configuration shown in Figure 9, I could have 24KB of root code and 104KB of extended code, for a total of 128KB of code. The root code is shown in the diagram as taking 24KB; in fact, the root code region can be made smaller or larger. Root code enjoys a slight advantage over extended code in that subroutine linkage is a little faster, a factor that matters most for very short subroutines. In a typical Rabbit application, most of the code resides in extended code space.

The mapping of logical address to physical address for the stack segment is:

PhysicalAddress = (STACKSEG << 12) + LogicalAddress;

This equation is valid only when the logical address is between 0xD000 and 0xDFFF. The STACKSEG register is most likely maintained by the kernel rather than the compiler. The mapping of logical address to physical address for the XPC segment is:

PhysicalAddress = (XPC << 12) + LogicalAddress;

Listing 6: Context switch on the Rabbit 2000
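The body of Listing 6 is missing from this copy of the article. Following the pattern of Listings 3 and 4, a Rabbit 2000 context switch would plausibly also save and restore the XPC and IP registers, roughly:

```
Rabbit_Context_Switch:
; Save the current task's context
    PUSH main register set
    PUSH alternate register set
    PUSH XPC
    PUSH IP
    Save SP into the TCB of the task to suspend
;
; Restore the new task's context
    Load SP from the TCB of the task to resume
    POP IP
    POP XPC
    POP alternate register set
    POP main register set
    RET
```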
