Tips for doing effective hardware/firmware codesign: Part 1
Editor’s Note: In an excerpt from his book Hardware/Firmware Interface Design, author Gary Stringham provides examples of best practices developers can use to increase their design’s performance.
Part 1: event notification and performance.
In an embedded system, hardware and firmware each have their respective jobs to do but must work together as a system. Coordination must occur between hardware and firmware, especially to keep both working optimally.
However, if the system is not balanced, performance could be impacted if firmware is waiting for hardware to finish something or if hardware is waiting for firmware to say what to do next.
Firmware has to wait for hardware to complete a task. In the meantime, firmware often busies itself with other tasks. But if hardware does not generate any kind of task completion signal, firmware is often left guessing and accommodating a worst-case scenario. On the other hand, if firmware is too busy to respond to interrupts from hardware, hardware is left idling and could miss external events it needs to handle.
Timing is not the only concern; efficiency across the hardware/firmware interface is a factor, too. Inefficient designs are prone to defects that need to be detected and resolved. In this chapter I will discuss several design aspects that will help hardware and firmware work more efficiently to increase performance and reduce unnecessary waiting.
When some event occurs in the block that firmware needs to respond to, firmware needs to be notified. Events in the block result from block tasks that were launched. Launches can be divided into two categories: external launches and firmware launches. Events from external launches come as a result of something from outside the block launching a block task, such as an incoming data packet or a signal from another block.
They come asynchronously; neither the block nor its device driver can anticipate when they will come. The block is set up to watch for them and generate an interrupt for the device driver to handle. Events from firmware launches come as a result of firmware having launched some task in the block, such as processing some data or sending out a packet.
Often the hardware engineers designing the block have a good idea as to how long the task will take. However, that information may not be very useful for firmware. For example, if it is known that a block task will take 2000 clock cycles, how can firmware use that information? Firmware engineers do not know what that means for a number of reasons:
- The CPU and the chip are not necessarily running at the same frequency.
- Firmware engineers typically do not know how many CPU clock cycles a section of code takes (unless they are writing in assembly).
- A CPU with multiple cores or one that reorders instructions makes the cycle count difficult to calculate.
- When idle, a CPU could shut itself down, which stops its own clocks.
- Other system interrupts will stop the current firmware thread from executing temporarily, and the thread will not know that it occurred.
- The CPU may be busy working on some other firmware task.
- I/O reads and writes from the CPU to the chip typically take a few extra clock cycles.
- The CPU can buffer up I/O reads and writes, so others may be in front.
- Different CPUs have different read/write characteristics.
So how should firmware know when the task is done? Firmware can only know how much time passes by reading some counter in the CPU or chip. But firmware is not so concerned with how much time has passed; it is more concerned about when some event has occurred.
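Reading a counter to measure elapsed time can be sketched as follows. This is a minimal sketch that assumes a free-running 32-bit up-counter; the function name is hypothetical, and how the counter is actually read depends on the CPU or chip.

```c
#include <stdint.h>

/* Elapsed counts between two reads of a free-running 32-bit up-counter
 * (an assumed counter type, not one named in the text).
 * Unsigned subtraction wraps modulo 2^32, so the result is still correct
 * if the counter rolled over once between the two reads. */
uint32_t elapsed_counts(uint32_t start, uint32_t now)
{
    return now - start;
}
```

Even with this in hand, firmware only learns that time has passed, not that the block has finished its task.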
Different methods have been used, each with its own advantages and disadvantages. Events from external launches always generate an interrupt. But if firmware has just launched a task that takes very little time, generating an interrupt right back to firmware might not be optimal. Firmware finds out that it can proceed with the next step using one of these four methods:
- No indication
- Timed delay
- Status bit
- Interrupt
No Indication. The block does not have a way to notify firmware that the task is complete. This is often used when the task completion is immediate; or, in other words, synchronous with firmware’s access of the block, such as changing configuration settings or an instantaneous abort. This is okay because firmware can safely and immediately write something else to the block.
However, when designs evolve and tasks that used to be instantaneous are delayed or take time to complete, there is the risk that firmware could access the block again before it is ready.
It is poor design to give firmware no indication that a task has completed, or that an event has occurred, when that completion or event is not synchronous with firmware’s access to the block.
Firmware is left guessing when it can take the next step and is prone to guess wrong. In the best case, firmware will know immediately that it guessed wrong and can wait a little longer. But in the worst case, firmware will not know immediately that it guessed wrong, and its premature access will have caused problems elsewhere in the system, resulting in a very difficult defect to diagnose.
Make sure that firmware will be able to know about every event that will occur asynchronously to its access to the block.
Best Practice Tip: Always provide an indicator to firmware of any event or condition that firmware needs to know about.
Timed Delay. A timed delay is when firmware needs to wait for a specific amount of time before it can take the next step.
The method that firmware uses to wait must be portable across generations, types, and speeds of chips and CPUs. Specifying delays in units of clock cycles is difficult for firmware to implement and not very portable. As mentioned above, telling firmware to wait 2000 clock cycles is of little practical use.
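To see why a cycle count alone is not portable, consider converting it to time. The sketch below is illustrative only: the function name is hypothetical, and the clock frequencies in the usage note are assumed values, not figures from any particular chip.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a given clock frequency.
 * Rounds up so the result is never shorter than the real duration.
 * Note: cycles * 1e9 must fit in 64 bits, which holds for any
 * cycle count below roughly 1.8e10. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_hz)
{
    assert(clock_hz > 0);
    return (cycles * 1000000000ULL + clock_hz - 1) / clock_hz;
}
```

Under these assumptions, the same 2000 cycles works out to 20 µs on a block clocked at 100 MHz (`cycles_to_ns(2000, 100000000ULL)`) but only 5 µs at 400 MHz, so the wait firmware needs changes whenever the clock does.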
Specifying delays in units of seconds is portable, and firmware generally knows how to handle that on any given platform. Delays measured in seconds are generally implemented in one of three ways: OS timer, CPU busy loop, or hardware timer.
OS Timer. Most OS platforms provide some type of timer delay facility that will invoke a task after a specified number of ticks. Most systems have a tick size of 1 ms, 10 ms, 100 ms, or other value within that range. The OS timer works well for long delays: seconds, minutes, hours, and so on.
Short delays that require just a few ticks are prone to problems. If the OS tick is 10 ms, then asking for a 25-ms delay requires analysis. If 25 ms is a minimum, then 3 ticks are needed, which will generate a 30-ms delay. If 25 ms is a maximum, then 2 ticks are needed, which will generate a delay of 20 ms.
Using 3 ticks to get 30 ms when a 25-ms delay is wanted incurs a penalty of an extra 5 ms, or 20% more delay than requested. However, this assumes that when a 25-ms delay is requested, it is launched at the beginning of the next 10-ms tick window.
But firmware does not operate that way. The task that wants a 25-ms delay could launch that delay anywhere within the 10-ms window. Figure 7.1 illustrates how a timer delay for 3 ticks can result in a delay anywhere from 20 to 30 ms; therefore, a minimum 25-ms delay cannot be guaranteed with a 3-tick delay. So 4 ticks must be requested, which will result in a delay greater than 30 ms and less than or equal to 40 ms.
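The tick arithmetic above can be sketched in a few lines. This is a minimal sketch using the model described in the text, where a request for n ticks produces a delay greater than (n - 1) ticks and at most n ticks; the function names are hypothetical and do not belong to any particular OS.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks to request so that at least min_ms elapses, no matter where in
 * the tick window the request is made. Because the first tick may arrive
 * almost immediately, one tick is added on top of the rounded-up count.
 * Assumes min_ms + tick_ms does not overflow 32 bits. */
uint32_t ticks_for_min_delay(uint32_t min_ms, uint32_t tick_ms)
{
    assert(tick_ms > 0);
    uint32_t whole_ticks = (min_ms + tick_ms - 1) / tick_ms; /* round up */
    return whole_ticks + 1;
}

/* Exclusive lower bound, in ms, of the delay produced by n ticks. */
uint32_t delay_lower_bound_ms(uint32_t n_ticks, uint32_t tick_ms)
{
    assert(n_ticks > 0);
    return (n_ticks - 1) * tick_ms;
}

/* Inclusive upper bound, in ms, of the delay produced by n ticks. */
uint32_t delay_upper_bound_ms(uint32_t n_ticks, uint32_t tick_ms)
{
    return n_ticks * tick_ms;
}
```

With a 10-ms tick, `ticks_for_min_delay(25, 10)` gives 4 ticks, whose bounds are 30 ms and 40 ms, matching the analysis above. A 3-tick request has bounds of 20 ms and 30 ms, which is why it cannot guarantee 25 ms.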
The OS timer cannot handle delays less than 1 tick. To induce delays of a shorter amount of time, the CPU busy loop or a hardware timer must be employed.