Tips for doing effective hardware/firmware codesign: Part 1 - Embedded.com


Editor’s Note: In an excerpt from his book Hardware/Firmware Interface Design, author Gary Stringham provides examples of best practices developers can use to increase their design’s performance.

Part 1: event notification and performance.

In an embedded system, hardware and firmware each have their respective jobs to do but must work together as a system. Coordination must occur between hardware and firmware, especially to keep both working optimally.

However, if the system is not balanced, performance could be impacted if firmware is waiting for hardware to finish something or if hardware is waiting for firmware to say what to do next.

Firmware has to wait for hardware to complete a task. In the meantime, firmware often busies itself with other tasks. But if hardware does not generate any kind of task completion signal, firmware is often left guessing and accommodating a worst-case scenario. On the other hand, if firmware is too busy to respond to interrupts from hardware, hardware is left idling and could miss external events it needs to handle.

Timing is not the only aspect, but efficiency across the hardware/firmware interface is a factor, too. Inefficient designs are prone to defects that need to be detected and resolved. In this chapter I will discuss several design aspects that will help hardware and firmware work more efficiently to increase performance and reduce unnecessary waiting.

Event Notification
When some event occurs in the block that firmware needs to respond to, firmware needs to be notified. Events in the block result from block tasks that were launched. Launches can be divided into two categories: external launches and firmware launches. Events from external launches come as a result of something from outside the block launching a block task, such as an incoming data packet or a signal from another block.

They come asynchronously; neither the block nor its device driver can anticipate when they will come. The block is set up to watch for them and generate an interrupt for the device driver to handle. Events from firmware launches come as a result of firmware having launched some task in the block, such as processing some data or sending out a packet.

Often the hardware engineers designing the block have a good idea as to how long the task will take. However, that information may not be very useful for firmware. For example, if it is known that a block task will take 2000 clock cycles, how can firmware use that information? Firmware engineers do not know what that means for a number of reasons:

  • The CPU and the chip are not necessarily running at the same frequency.
  • Firmware engineers typically do not know how many CPU clock cycles a section of code takes (unless they are writing in assembly).
  • A CPU with multiple cores, or one that reorders instructions, makes the cycle count difficult to calculate.
  • When idle, a CPU could shut itself down, which stops its own clocks.
  • Other system interrupts will stop the current firmware thread from executing temporarily, and the thread will not know that it occurred.
  • The CPU may be busy working on some other firmware task.
  • I/O reads and writes from the CPU to the chip typically take a few extra clock cycles.
  • The CPU can buffer up I/O reads and writes, so others may be in front.
  • Different CPUs have different read/write characteristics.

So how should firmware know when the task is done? Firmware can only know how much time passes by reading some counter in the CPU or chip. But firmware is not so concerned with how much time has passed; it is more concerned about when some event has occurred.

Different methods have been used, each with their own advantages and disadvantages. Events from external launches always generate an interrupt. But if firmware just launched a task with a very short time, generating an interrupt right back to firmware might not be optimal. Firmware finds out that it can proceed with the next step using one of these four methods:

  • No indication
  • Timed delay
  • Status bit
  • Interrupt

No Indication. The block does not have a way to notify firmware that the task is complete. This is often used when the task completion is immediate; or, in other words, synchronous with firmware’s access of the block, such as changing configuration settings or an instantaneous abort. This is okay because firmware can safely and immediately write something else to the block.

However, when designs evolve and tasks that used to be instantaneous are delayed or take time to complete, there is the risk that firmware could access the block again before it is ready.

It is poor design to give firmware no indication that a task has completed, or that an event has occurred, when that completion or event is not synchronous to firmware’s access to the block.

Firmware is left guessing when it can take the next step and is prone to guess wrong. In the best case, firmware will know immediately that it guessed wrong and can wait a little longer. But in the worst case, firmware will not know immediately that it guessed wrong, and its premature access will have caused problems elsewhere in the system, resulting in a very difficult defect to diagnose.

Make sure that firmware will be able to know about every event that will occur asynchronously to its access to the block.

Best Practice Tip: Always provide an indicator to firmware of any event or condition that firmware needs to know about.

Timed Delay. A timed delay is when firmware needs to wait for a specific amount of time before it can take the next step.

The method that firmware uses to wait must be portable across generations, types, and speed of chips and CPUs. Specifying delays in units of clock cycles is difficult for firmware and not very portable. As mentioned above, telling firmware to wait 2000 clock cycles is difficult.

Specifying delays in units of seconds is portable and firmware generally knows how to handle that on any given platform. Delays measured in seconds are generally implemented using one of three implementations: OS timer, CPU busy loop, or hardware timer.

OS Timer. Most OS platforms provide some type of timer delay facility that will invoke a task after a specified number of ticks. Most systems have a tick size of 1 ms, 10 ms, 100 ms, or other value within that range. The OS timer works well for long delays: seconds, minutes, hours, and so on.

Short delays that require just a few ticks are prone to problems. If the OS tick were 10 ms, then asking for a 25-ms delay requires analysis. If that is a minimum of 25 ms, then 3 ticks are needed and it will generate a 30-ms delay. If 25 ms is a maximum, then 2 ticks are needed, which will generate a delay of 20 ms.

Using 3 ticks to get 30 ms when a 25-ms delay is wanted incurs a penalty of an extra 5 ms, or a delay time of an extra 20%. However, this assumes that when a 25-ms delay is requested, it is launched at the beginning of the next 10-ms tick window.

But firmware does not operate that way. The task that wants a 25-ms delay could launch that delay anywhere within the 10-ms window. Figure 1 illustrates how a timer delay for 3 ticks can result in a delay anywhere from 20 to 30 ms; therefore, a minimum 25-ms delay cannot be guaranteed with a 3-tick delay. So 4 ticks must be requested, which will result in a delay greater than 30 ms and less than or equal to 40 ms.

Figure 1: OS time-delay request can occur anywhere within the 10-ms tick window.
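The tick arithmetic above reduces to a small helper. The following is a minimal sketch, not taken from the book: the function name and the millisecond units are assumptions for illustration. It relies on the observation that a request for N ticks yields a delay greater than (N - 1) ticks and at most N ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of OS ticks to request so that AT LEAST
 * min_delay_ms elapses, given that the request may be made anywhere
 * inside the current tick window (so the first tick may contribute
 * almost no delay at all). */
uint32_t ticks_for_min_delay(uint32_t min_delay_ms, uint32_t tick_ms)
{
    assert(tick_ms > 0);

    /* Round the delay up to a whole number of ticks ... */
    uint32_t whole_ticks = (min_delay_ms + tick_ms - 1u) / tick_ms;

    /* ... then add one tick to cover the partially elapsed window. */
    return whole_ticks + 1u;
}
```

With a 10-ms tick, `ticks_for_min_delay(25, 10)` returns 4, which corresponds to the delay of more than 30 ms and at most 40 ms described above.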

The OS timer cannot handle delays less than 1 tick. To induce delays of a shorter amount of time, the CPU busy loop or a hardware timer must be employed.


CPU Busy Loop. A CPU busy loop involves spinning in a loop, reading a hardware or CPU counter incrementing at a known speed until the desired time has elapsed. But spinning in a busy loop ties up the CPU too long, preventing lower-priority tasks from executing. The spinning task is doing nothing useful besides waiting.

Though the spinning loop is undesirable, it may be necessary if a short delay is needed and no hardware timer support is available. If a hardware timer is available, the CPU busy loop should not be used.
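For cases where a busy loop cannot be avoided, the sketch below shows one way to spin on a free-running counter. It is an illustration under stated assumptions: `counter_read_fn`, `busy_wait_counts`, and `us_to_counts` are hypothetical names, the counter is assumed to be 32 bits wide and to increment at a known rate, and `fake_counter_read` is only a stand-in so the code can be tried on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: returns a free-running 32-bit counter value. */
typedef uint32_t (*counter_read_fn)(void);

/* Spin until at least `counts` counter increments have elapsed. */
void busy_wait_counts(counter_read_fn read_counter, uint32_t counts)
{
    assert(read_counter != NULL);

    uint32_t start = read_counter();

    /* Unsigned subtraction stays correct across one counter wraparound. */
    while ((uint32_t)(read_counter() - start) < counts) {
        /* Nothing useful happens here: the CPU is tied up waiting. */
    }
}

/* Convert microseconds to counts for a counter running at counter_hz,
 * rounding up so the resulting delay is never shorter than requested. */
uint32_t us_to_counts(uint32_t delay_us, uint32_t counter_hz)
{
    return (uint32_t)(((uint64_t)delay_us * counter_hz + 999999u) / 1000000u);
}

/* Stand-in counter for host experiments: advances by one per read. */
uint32_t fake_count;
uint32_t fake_counter_read(void)
{
    return fake_count++;
}
```

Expressing the delay in microseconds and converting it with the counter's frequency keeps the call site portable; only the counter hook and its rate change from one platform to the next.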

Tale from the Trenches. A team member needed a delay of at least 1 ms for an I2C device driver. The OS timer tick was at 10 ms, and there was no hardware timer available. However, she did not want to use a CPU busy loop. Instead, she used the OS timer with 1 tick, inducing a delay up to 10 ms, which was 9 ms longer than needed. But it avoided the CPU busy loop. However, 1 tick would only guarantee a delay anywhere between 0 and 10 ms. Depending on when the timer was installed, there was a 10% chance that the delay would be less than 1 ms. So she had to set the timer to 2 ticks to guarantee a minimum of 1 ms in all cases.

Hardware Timer. A high-resolution hardware timer, such as a general-purpose timer, will generate an interrupt to firmware after the programmed delay. This provides better precision than the OS timer and does not require tying up the CPU with a busy loop.

As a rule of thumb, the hardware timer should have a resolution of at least 1 μs and should have enough bits to count up to 10 times the OS tick. If the OS tick were 10 ms, then the hardware timer should be able to count up to 100 ms. This allows some overlap between the hardware timer and OS timer. This permits firmware to use the precision of the hardware timer for delays less than 10 times the OS tick.
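The rule of thumb translates into a counter width. The sketch below works through that arithmetic; the function names and the microsecond units are assumptions made for illustration, not part of the original text.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the timer must reach to span `multiple` OS ticks at the
 * given resolution (both times expressed in microseconds). */
uint64_t timer_span_counts(uint32_t os_tick_us, uint32_t resolution_us,
                           uint32_t multiple)
{
    assert(resolution_us > 0);
    return ((uint64_t)os_tick_us * multiple) / resolution_us;
}

/* Number of counter bits needed to hold the value max_count. */
unsigned bits_needed(uint64_t max_count)
{
    unsigned bits = 0;
    while (max_count > 0) {
        bits++;
        max_count >>= 1;
    }
    return bits;
}
```

For a 10-ms OS tick and a 1-μs resolution, `timer_span_counts(10000, 1, 10)` gives 100,000 counts, and `bits_needed(100000)` gives 17, so a 16-bit timer would fall short of the guideline at that resolution.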

In the above example of OS timers, to guarantee a minimum of 25 ms, firmware must install an OS timer request for 4 ticks, which would yield a delay from 30 to 40 ms. But with a hardware timer with a 1 μs resolution, the delay would be at most 25.002 ms.

Best Practice Tip: Provide support in the chip, such as a general-purpose timer, that will generate an interrupt after short delays (less than 100 ms).

Status Bit. The OS timer facility is best suited for long delays. For shorter delays, rather than using a hardware timer, a better option is to have the block indicate when firmware can take the next step.

This allows the block to generate the indication sooner or later, depending on the task at hand, and it takes the burden off of firmware to launch, manage, and respond to other timers. The block can indicate task completion in one of two ways: status bit and interrupt. This section talks about status bits.

Tale from the Trenches. On the Unity mono video block, firmware was required to wait for a minimum (but short) amount of time after a reset before writing to any of the registers. There was no status or interrupt bit to indicate that the minimum time had been met. I ran experiments and discovered that going six times through a busy loop was enough to induce a long enough delay before writing to the registers.

About 3 years later, the device driver was ported to a different CPU that ran through the loop faster. This caused the device driver to write to those registers too soon. But there were no indications of errors, except that the printer was behaving incorrectly. I was not working on that project at the time; two other engineers spent months trying to solve it and could not, so they asked for my help. After 2 weeks of looking at the problem, I remembered that obscure little timing constraint. The solution was simple; I bumped up the loop count to 30. Had there been a status or interrupt bit to indicate ready, months of engineering resources would have been saved.

A status bit is simply a bit in some register that indicates that the event has occurred. It could be an active bit that is cleared when the task is done. Or it could be a ready bit that is set when the block is ready for the next task. Firmware has to read the bit to see what state it is in. If firmware has to wait for the bit to change, it spins in a polling loop, reading the bit over and over until it changes. Listing 1 shows a typical polling loop.

Obviously, it is not good design to have an infinite loop in firmware. A max counter should be employed. Having firmware poll for the bit to change is best suited to short delays.

Listing 1: A polling loop used to watch and wait for a bit to get set.
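A polling loop of this kind, bounded by the max counter recommended above, might look like the following sketch. It is an illustration rather than the book's listing: the register pointer, the `STATUS_READY_BIT` position, and the function name are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_READY_BIT (1u << 0)   /* hypothetical bit position */

/* Poll the status register until the ready bit is set, giving up after
 * max_polls reads. Returns true if the bit was seen set, false on timeout. */
bool wait_for_ready(volatile const uint32_t *status_reg, uint32_t max_polls)
{
    assert(status_reg != NULL);

    for (uint32_t i = 0; i < max_polls; i++) {
        /* volatile forces a fresh read of the register on every pass. */
        if (*status_reg & STATUS_READY_BIT) {
            return true;
        }
    }
    return false;   /* bit never changed; the caller decides how to recover */
}
```

Returning a timeout result instead of spinning forever lets the device driver report a fault when the block never signals ready.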

If the bit will change within a few passes of the loop, it is more efficient for the device driver to poll than to have the firmware system handle the interrupt and its overhead, as shown here:

  • The device driver launches a task in the block in the chip. The device driver then becomes blocked waiting for the interrupt.
  • Since that device driver becomes blocked, it is swapped out and another firmware process is swapped in.
  • An interrupt occurs and current process is swapped out.
  • The main firmware interrupt handler wakes up, reads the interrupt register, decodes where the interrupt came from, then calls the device driver’s interrupt handler.
  • The device driver’s interrupt handler wakes up, reads the block’s interrupt register, sets a flag for itself, and exits.
  • The device driver now becomes unblocked and can move on.

If the delay is long, then firmware should not tie up the CPU by polling but should incur the overhead and use interrupts.
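The handler chain in the steps above can be sketched as follows. This is a host-side simulation under stated assumptions: the registers are plain variables rather than memory-mapped hardware, the bit positions and all names are hypothetical, and the acknowledge step (clearing the pending bits) is an assumption about how such hardware typically behaves rather than something the text specifies.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_SRC_BLOCK  (1u << 3)   /* hypothetical top-level source bit */
#define BLOCK_INT_DONE (1u << 0)   /* hypothetical block interrupt bit  */

/* Simulated registers; on real hardware these would be memory-mapped. */
volatile uint32_t top_irq_reg;
volatile uint32_t block_int_reg;

/* Flag the device driver waits on while it is blocked. */
volatile bool block_done_flag;

/* Device driver's handler: read the block's interrupt register,
 * set a flag for the driver, and exit. */
void block_isr(void)
{
    uint32_t pending = block_int_reg;
    if (pending & BLOCK_INT_DONE) {
        block_int_reg = pending & ~BLOCK_INT_DONE;  /* assumed acknowledge */
        block_done_flag = true;
    }
}

/* Main handler: read the interrupt register, decode where the
 * interrupt came from, then call the device driver's handler. */
void main_irq_handler(void)
{
    uint32_t sources = top_irq_reg;
    if (sources & IRQ_SRC_BLOCK) {
        block_isr();
        top_irq_reg = sources & ~IRQ_SRC_BLOCK;     /* assumed acknowledge */
    }
}
```

Even in this stripped-down form, a single completion costs two register reads, a decode, and two handler invocations, on top of the context switches the operating system performs around them.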

Best Practice Tip: Use a status bit to indicate completion of tasks guaranteed to complete within an efficient polling period.

Part 2: Performance, Power-On and communications

This article is an excerpt from Hardware/Firmware Interface Design by Gary Stringham, copyright 2010, used by permission from Newnes, an imprint of Elsevier Publishing.

Gary Stringham is the founder and president of Gary Stringham & Associates, LLC. He has engineering experience in R&D and manufacturing with a proven track record of cost-savings and innovation in the design, implementation, and testing of firmware, hardware, and software solutions. He also has extensive expertise in diagnosing and resolving a broad range of engineering problems. Gary worked for Hewlett-Packard Company for over 21 years, working in Fort Collins, Colorado; Exeter, New Hampshire; Böblingen, Germany; and Boise, Idaho. He can be contacted by writing to .
