There is nothing left to be invented in embedded control, Part 1

I am obviously paraphrasing this famous (although disputed) quote[1] from Lord Kelvin, who was addressing the British Association for the Advancement of Science in the year 1900.

One hundred and fourteen years later, it might seem like this quote could be applied to the world of embedded control. According to some, it would appear that all the progress to be made is limited to driving the increased use of a few more bits in the microcontroller (MCU) Arithmetic Logic Unit, possibly 32 of them, by cost reducing it all to a shiny quarter.

But this would be just sad, and, thankfully, it is far from the truth. There is plenty of innovation happening in embedded control, and it is occurring right now, in front of our eyes—in the 8-bit world. While low-end 32-bit MCUs are being squeezed and left with only the most basic peripherals, 8-bit MCUs are being loaded with new and ever more useful peripherals.

Choosing Your Weapons
It is interesting how the two events are actually very tightly connected. It is the focus of the new, low-end 32-bit product generations on trying to replace 8-bit MCUs in their sweet spot—small, low-power and low-cost embedded applications—that is forcing those 8-bit MCUs to re-invent themselves.

The two contenders are clearly very different in their abilities and, therefore, in their approach to the practical problems posed by embedded control design. Like all armed conflicts, victory is heavily dependent on choosing the right weapons.

The primary strength of a 32-bit processor is certainly its processing speed, expressed in MIPS (millions of instructions per second). Memory size is also a factor, as it is necessary to use a very small CMOS process geometry in order to squeeze a 32-bit core into a small package, and with that comes more Flash (code) memory at a higher density.

In contrast, 8-bit MCU cores are so small that relatively large CMOS process geometries can be used effectively to operate at higher voltages, drive higher current loads (up to 100 mA on selected pins) and provide large margins of noise immunity, hence robustness. But, when it comes to processing speed, there is a clear disadvantage that 8-bit MCUs need to neutralize. To do this, they need to change the fundamental rules of the game by creating an arsenal of what we'll call Core Independent Peripherals.

Core Independent Peripherals
The idea is quite simple: If a task requires a lot of CPU cycles or large amounts of code and RAM memory, then design a dedicated (small) piece of on-chip hardware (a peripheral) to solve it in the most efficient possible manner. This new peripheral, once set up properly, will operate independently and, therefore, relieve the MCU core from the heavy lifting of the task at hand. Integrating these Core Independent Peripherals (CIPs) enables designers to use a smaller MCU core that is operating at a lower clock speed and idling (or even entering standby mode), for maximum cost, power and complexity reductions.

Let’s start with a practical example, to clarify this concept.

[Note 1]: “There is nothing new to be discovered in physics. All that is left is more and more refined measurements.” – Lord Kelvin, address to the British Association for the Advancement of Science – 1900 (disputed)

Measuring an Input Square Wave
In this example, we are actively developing an embedded-control subsystem that is supposed to receive commands from a higher-tier controller in the form of a PWM signal (see Figure 1). The protocol is designed for maximum simplicity and low cost, so let’s assume the period of our PWM input is constant within a given tolerance (between Tmin and Tmax) and that only the duty cycle (W) is used to carry all the information required (let’s assume we require R = 4 bits of resolution in the duty-cycle (DC) measurement). For the purposes of our conversation, we’ll focus only on how to optimize the input signal capture and decoding. The output of this subsystem could truly be anything (e.g., speed control, position or temperature).

Figure 1. A simple PWM square-wave input

Let’s first approach the problem with the traditional tools and peripherals found in any traditional, general-purpose MCU. While there are many possible approaches, we’ll limit our analysis to the following three basic cases:

  1. Polling loop: This is the most CPU-intensive option possible. 100% of its CPU cycles are spent in loops (counting), to measure the respective lengths of the Ton and Toff periods in the incoming waveform. There is a very tight connection between the CPU core (and I/O speed) and the achievable actual duty-cycle measurement resolution. To make things worse, the entire application is “blocked” during the measurement stage of this approach, and any other activity is suspended. Additionally, only alternate periods can be measured, as after each measurement we must allow the MCU to drive the resulting values to the output.

    We can, in fact, prove that Fcy (the MCU’s clock frequency) is now required to be:

    Fcy ≥ 2 * N * 2^R / Tmin

    Where N is the number of CPU cycles required to perform a single polling loop.

    For Tmin = 100 µs (10 kHz), N = 4 and R = 4, we obtain Fcy ≥ 1.28 MHz.

  2. Interrupt driven: This method assumes that two types of resources are available to the MCU: a timer with sufficient resolution (likely 16-bit) and an input change notification mechanism capable of generating interrupts. This approach might appear to reduce the number of CPU cycles, and it might allow the application to maintain other background activities. However, in practice, this method imposes strict requirements on both the speed of the MCU and its interrupt-response mechanism. After all, the accuracy of the measurement is directly impacted by the latency of the interrupt response, and with it the minimum and maximum duty-cycle values.

    Assuming DCmin = 1 LSB, or Tmin / 2^R, it can be proven that the MCU clock is now required to be:

    Fcy ≥ M * 2^R / Tmin

    Where M is the interrupt latency of the CPU (in clock cycles), including context saving and the instructions required to capture the timer value and return to the application.

    For M = 40 (quite an optimistic assumption), Tmin = 100 µs (10 kHz) and R = 4, we obtain Fcy ≥ 24 MHz.

    In other words, while the application is now capable of supporting background activities (no blocking loops), the MCU clock speed has to be increased by more than an order of magnitude (from roughly 1.2 MHz to 24 MHz). There is also increased software complexity, as the interrupt mechanism now requires a proper state machine to be set up in order to perform the correct capture of the rising and falling edges and to keep track of each period completion.

  3. Capture module: This method assumes that, in addition to the (16-bit) timer, a corresponding capture mechanism is available so that, at each alternating rising and falling edge of the input signal, the current value of the timer is captured in a corresponding register and an interrupt is generated to alert the MCU. While this peripheral feature is becoming more common among even the most inexpensive MCUs, it must be noted that the positive impact on the application’s performance is still quite marginal. In fact, the only practical benefit is a potential reduction in the measurement error, when the interrupt latency cannot be guaranteed to be fixed but can depend on the CPU workload at the time of the event.

As long as the minimum duty-cycle value is set to 1 LSB, the same formula applies to the calculation of the minimum MCU clock speed allowed, or 24 MHz. Similarly, the software complexity of the solution is only marginally reduced, if at all. In fact, the state machine required to track the duty cycle of each period is almost identical, and so are the (16-bit) arithmetic operations required to obtain the period and duty cycle.
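The state machine shared by the interrupt-driven and capture-module methods can be sketched in C as follows. This is only a minimal, host-testable illustration and not code taken from AN1473: the names pwm_decoder_t and pwm_on_edge are hypothetical, and we assume the timestamp comes from a free-running 16-bit timer that is read (or captured by hardware) at each input edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state for one PWM input. */
typedef enum { WAIT_RISE, WAIT_FALL, WAIT_NEXT_RISE } pwm_state_t;

typedef struct {
    pwm_state_t state;
    uint16_t t_rise;   /* timer value at the last rising edge              */
    uint16_t ton;      /* high time of the latest period, in timer ticks   */
    uint16_t period;   /* length of the latest complete period, in ticks   */
    bool ready;        /* set when ton/period hold a new measurement       */
} pwm_decoder_t;

/* Call from the edge (or capture) interrupt.
 * stamp: 16-bit timer value at the edge; level: pin state after the edge.
 * There is no double buffering here: the application must read ton and
 * period before the next falling edge overwrites ton. */
static void pwm_on_edge(pwm_decoder_t *d, uint16_t stamp, bool level)
{
    assert(d != NULL);
    switch (d->state) {
    case WAIT_RISE:                 /* synchronize on the first rising edge */
        if (level) {
            d->t_rise = stamp;
            d->state = WAIT_FALL;
        }
        break;
    case WAIT_FALL:                 /* falling edge closes the Ton interval */
        if (!level) {
            d->ton = (uint16_t)(stamp - d->t_rise);  /* modulo 2^16, wrap-safe */
            d->state = WAIT_NEXT_RISE;
        }
        break;
    case WAIT_NEXT_RISE:            /* next rising edge closes the period */
        if (level) {
            d->period = (uint16_t)(stamp - d->t_rise);
            d->t_rise = stamp;      /* this edge also starts the next period */
            d->ready = true;
            d->state = WAIT_FALL;
        }
        break;
    }
}
```

Once ready is set, the application would compute the duty cycle as ton * 2^R / period and clear the flag. Every edge still costs one interrupt, which is why the latency M dominates the clock requirement in both methods.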

Seven additional techniques, using a variety of software and traditional peripherals, are illustrated in Application Note AN1473, Various Solutions for Calculating a Pulse and Duty Cycle. Their performance and code-size impacts are summarized in Table 1.

Table 1. Associated Code Numbers

Granted that each technique’s complexity cannot simply be gauged by counting the lines of code (which varies from 50 to ~600), doing so can still give us a sense of the complexity introduced in the application. Also note how some of the listed techniques explicitly require the use of assembly language to provide the performance and resolution levels indicated in Table 1.

Introducing the Signal Measurement Timer
Our first example of a Core Independent Peripheral is called the SMT, or Signal Measurement Timer. The SMT is composed of three elements: a resettable counter, a pair of double-buffered capture registers, and a configurable hardware state machine (see Figure 2).

Figure 2. Simplified Block Diagram of the Signal Measurement Timer

While there are many possible configurations and uses of this new peripheral module, the one we will use to solve our example application is perhaps the most basic and very closely resembles the capture module method we just reviewed.

The resettable counter is connected to a reference clock source whose frequency, Fr, we will estimate shortly. Most importantly, it is now completely independent from the MCU clock.

The two capture registers are automatically controlled by the SMT state machine, which performs, independently from MCU intervention, the following simple sequence:

  1. On the input rising edge, the counter is reset and the count is restarted
  2. On the following input-signal falling edge, the first capture is performed (Ton)
  3. On the subsequent rising edge, the second capture is performed (T), and an interrupt/flag is set to alert the MCU

The sequence continues from 1, with a new counter restart and so on…
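To make the sequence concrete, here is a small software model of it in C. It is a simplified sketch under our own assumptions, not a register-level description of the peripheral: the type smt_model_t and the function smt_tick are hypothetical names, and the input is assumed to be sampled once per reference-clock tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the SMT sequence (hypothetical names throughout). */
typedef struct {
    uint32_t counter;     /* resettable counter, kept to 24 bits            */
    uint32_t cap_ton;     /* working capture of the high time (Ton)         */
    uint32_t buf_ton;     /* buffered Ton, stable until the next period end */
    uint32_t buf_period;  /* buffered period (T)                            */
    bool prev_level;      /* input level seen at the previous tick          */
    bool armed;           /* true once a first rising edge has been seen    */
    bool flag;            /* measurement-complete flag / interrupt request  */
} smt_model_t;

/* Advance the model by one reference-clock tick with the sampled input. */
static void smt_tick(smt_model_t *s, bool level)
{
    assert(s != NULL);
    bool rising  = level && !s->prev_level;
    bool falling = !level && s->prev_level;
    s->prev_level = level;

    if (rising) {
        if (s->armed) {              /* step 3: second capture, publish both */
            s->buf_ton = s->cap_ton;
            s->buf_period = s->counter;
            s->flag = true;
        }
        s->counter = 0;              /* step 1: reset and restart the count  */
        s->armed = true;
    } else if (falling && s->armed) {
        s->cap_ton = s->counter;     /* step 2: first capture (Ton)          */
    }

    if (s->armed) {
        s->counter = (s->counter + 1u) & 0xFFFFFFu;  /* 24-bit wrap */
    }
}
```

Feeding the model 8 high ticks followed by 24 low ticks and then the next rising edge leaves buf_ton at 8 and buf_period at 32. Because the working capture and the buffered values are kept separate, the buffered pair stays valid while the following period is being measured.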

As you can see, an entire period is elapsed before the first interrupt/flag is generated/set. Also, since the capture registers are double buffered, the MCU has the entire next period available to retrieve the two values and compute whatever output value/action is required. This provides a much more relaxed condition for determining the MCU clock, which can now be proven to be:

Fcy ≥ P / Tmin

Where P is the number of clock cycles required to retrieve the captured values and compute any output action/value. Assuming P = 20, we obtain Fcy ≥ 200 kHz, or a reduction factor of approximately 100 over the previous traditional methods.

Not only is this a much lower frequency than in any of the above examples, but we have also proven that the CPU frequency is now independent from the resolution of the measurement required (R). In practice, the current incarnations of the SMT peripheral use a 24-bit counter, and the capture registers are of the same size. This provides a great dynamic range for the input signal, and allows the optimal choice of the reference clock source, based on the resolution (R) required but independent from the CPU clock. As in previous estimates:

Fr ≥ 2^R / Tmin, or Fr ≥ 160 kHz with R = 4.
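The polling and SMT bounds are easy to check numerically. The short C sketch below simply evaluates those formulas; the function names are ours (hypothetical), and the example values are the ones used in the text (Tmin = 100 µs, N = 4, R = 4, P = 20).

```c
#include <assert.h>

/* Number of measurement steps, 2^R, for a resolution of r_bits bits. */
static double steps(unsigned r_bits)
{
    assert(r_bits < 31);
    return (double)(1UL << r_bits);
}

/* Polling loop: Fcy >= 2 * N * 2^R / Tmin */
static double fcy_polling_hz(unsigned n_cycles, unsigned r_bits, double t_min_s)
{
    assert(t_min_s > 0.0);
    return 2.0 * n_cycles * steps(r_bits) / t_min_s;
}

/* SMT, CPU clock: Fcy >= P / Tmin (no dependence on R) */
static double fcy_smt_hz(unsigned p_cycles, double t_min_s)
{
    assert(t_min_s > 0.0);
    return p_cycles / t_min_s;
}

/* SMT, reference clock: Fr >= 2^R / Tmin */
static double fr_smt_hz(unsigned r_bits, double t_min_s)
{
    assert(t_min_s > 0.0);
    return steps(r_bits) / t_min_s;
}
```

With the example values, fcy_polling_hz(4, 4, 100e-6) evaluates to about 1.28 MHz, fcy_smt_hz(20, 100e-6) to about 200 kHz, and fr_smt_hz(4, 100e-6) to about 160 kHz. Raising R changes only the reference clock requirement, not the CPU clock.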

In fact, the CPU can now be put in idle (if the reference clock and MCU clock are shared) or standby during each period, and can be awakened only upon completion of each measurement period. Further, a so-called averaging mode can be used to instruct the SMT state machine to keep accumulating period and duty-cycle values over a given window, producing a single interrupt at the very end. This can be used to further reduce the power consumption of the application, when the output does not require an update at each period. Here, an averaging filter can be beneficial to the application’s robustness.

    /** Example 1: read PWM input using SMT1 and output a copy on PWM1 */
    #include "mcc.h"

    int main(void)
    {
        uint16_t duty;

        // initialize I/O and peripherals (code generated by MCC)
        SYSTEM_Initialize();

        // main loop
        while (1)
        {
            MainTasks();    // the application's own (fictitious) work

            if (!SMT1_IsSignalAcquisitionInProgress())  // check if new measurement ready
            {
                duty = SMT1_GetPulseWidth();            // get the input signal duty cycle
                PWM1_LoadDutyValue(duty);               // copy value to PWM output
            }
        }
    }

Listing 1. Example application using the SMT to reproduce a PWM input signal

The example code in Listing 1 features a (fictitious) application whose main activity is represented by the function MainTasks(). In the background, an input PWM signal is measured by the SMT module to extract its duty-cycle value and then reproduced on an output PWM peripheral (possibly operating at a different frequency). For convenience, we used Microchip’s MPLAB Code Configurator (MCC), a plug-in for Microchip’s MPLAB X IDE that generates C code that is then inserted into a project, to generate all the required peripheral initializations (see the SYSTEM_Initialize() call).

Thanks to the use of the SMT module, the entire application can ultimately be reduced to a simple test of the SMT acquisition-complete flag. Thanks to the double buffering of the measurement values, we have an entire input-signal period available to fetch the measured duty-cycle value and transfer it (possibly scaled) to the output PWM module. Note that, should the main task require a longer time to execute, we can choose to use interrupts (upon SMT measurement completion) to ensure that no cycle is missed. Once more, thanks to the SMT buffers, this interrupt can be of low priority and won’t require any particular consideration for interrupt latency and/or processor clock speed.
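When the output PWM runs at a different frequency or resolution than the input, the captured values need to be scaled before they are loaded. A minimal sketch of such a scaling step is shown below; scale_duty is a hypothetical helper of our own (it is not generated by MCC), and it assumes the application has both the captured high time and the captured period available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert SMT captures (ton, period, in reference-clock
 * counts) into a duty value for an output PWM whose 100% value is
 * pwm_full_scale (for example, 1023 for a 10-bit PWM). */
static uint16_t scale_duty(uint32_t ton, uint32_t period, uint16_t pwm_full_scale)
{
    assert(period > 0);
    if (ton > period) {
        ton = period;               /* clamp inconsistent captures to 100% */
    }
    /* A 24-bit capture times a 16-bit scale does not fit in 32 bits,
     * so the intermediate product is computed in 64 bits.
     * Adding period / 2 rounds to the nearest output step. */
    uint64_t scaled = (uint64_t)ton * pwm_full_scale + period / 2;
    return (uint16_t)(scaled / period);
}
```

For example, a capture of 8 counts high out of a 32-count period maps to 256 on a 10-bit output. On a small 8-bit core, 64-bit arithmetic is relatively costly, so a real design might instead pre-shift the captures to a narrower width. Either way, the computation has a whole input period in which to complete.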

In retrospect, it is interesting to notice how PWM modules are such a common and available peripheral used to produce square-wave outputs. Yet, for traditional general-purpose MCUs that lack an SMT, receiving square-wave inputs can be the cause of so much complexity.

About Vertigo and LEGO Blocks
Some will say that this is nothing new; that this is what every peripheral on every MCU was meant to do in the first place. The SMT might seem like just another specialized module. For years, the MCU industry has focused on a small set of peripherals to cover the most fundamental needs of each embedded-control design (communication peripherals such as UART, SPI and I2C, timing peripherals such as Capture and Compare, motor control peripherals, analog interfaces) and yet resisted going much beyond such a basic set for a reason: the fear of specialization!

This is a very real slippery slope, as a peripheral can quickly become too specific to an application (or small set of applications), making the processor in which it is integrated too expensive and/or too tied to a “vertical” application segment/market (hence the reference to vertigo).

True Core Independent Peripherals avoid this danger by staying small, and by their ability to seamlessly connect with each other, like LEGO blocks, in order to provide a practically unlimited number of possible solutions without any single block adding appreciable cost.

The Configurable Logic Cell
To bring the promise of this concept to fruition, Core Independent Peripherals need some form of glue logic inside the device for direct connection among modules, and the additional flexibility to transport the correct signals in and out of the device efficiently. These are provided by two mechanisms called, respectively, the Configurable Logic Cell and Peripheral Pin Select.

The Configurable Logic Cell, or CLC, is often referred to as the first among the CIPs as, by itself, it can provide some of the key benefits of core independence touted above.

Figure 3. A Configurable Logic Cell macro block, as seen in the CLC Designer tool

One of the most entertaining ways to describe the CLC is to compare it to other (much larger) programmable logic devices, such as the FPGA with its “sea of gates.” The absurdity of the comparison is obvious when you realize that a typical CLC module contains four macro blocks. That is why many jokingly refer to the CLC as a “puddle of gates.”

Yet, just like an FPGA, each CLC macro block has an input multiplexer capable of selecting its input signals from a multitude of internal “sources” (more on this shortly). The CLC can also be configured to perform one of a small number of combinational and sequential logic functions, including: AND-OR, OR-XOR, AND, D-type flip-flop, S-R latch, J-K flip-flop and D-type latch. The output of each macro block can be used internally or routed directly to output pins. Any output state change can also be selected to set flags and generate corresponding interrupts.

The CLC’s Power
There are three important aspects of the CLC that can help explain its true power:

  1. The operation of each CLC macro block is completely asynchronous with the MCU’s operation (independence). The speed of the CLC logic is likewise not limited by the MCU’s own maximum clock speed, but rather by the output (pin) drivers available on chip (16 to 32 MHz, typically).
  2. The configuration of each block is controlled via special function registers (RAM) and, therefore, can be changed dynamically by the application as needed.
  3. The size and power consumption of a typical CLC block are minuscule. The power consumption is actually so small as to be impossible to detect with normal bench equipment (nanoAmperes) although, being composed of elemental logic gates in a CMOS process, it can be expected to grow linearly with the frequency of the signals applied to its inputs.

Thanks to these characteristics, we can see how the CLC alone can provide relief to the typical embedded MCU’s CPU. Once more, an example (albeit much shorter this time) will help clarify the concept…

Let’s assume we are developing a very low power, battery-operated application where a small number of inputs (A, B and C) must be monitored for a specific alarm condition to occur. Let’s further specify that this condition is recognizable by its sequence: (A and B) then C. When this sequence occurs, the MCU is activated and some specific (intelligent) action must be performed, although we will not speculate any further about its complexity.

Key to the long battery life of the application is to maintain the MCU in the lowest power-consumption mode possible (standby), where only a few nanoamperes will be consumed.

A traditional approach would require the MCU to wake up on each input status change, which would tie the power consumption of the application directly to the most active of the input signals. If, for example, B is pulsed every 100 µs, on average, but A is pulsed only once a second, our application will end up performing a wake-up 10,000 times a second. And, for 9,999 of those, it will only return to sleep after verifying that A is false.

Figure 4. Using the CLC for smart wake up

By configuring a pair of CLC macro blocks as an S-R latch followed by an AND gate (see Figure 4), we can produce a single wake-up event. This happens only when the specific sequence is realized, which renders the application’s power consumption independent of any individual input frequency. Depending on the nature of the inputs, as in the example above, this can mean a power-consumption reduction and corresponding battery-life extension of several orders of magnitude.
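The behavior of this two-block arrangement can be modeled in a few lines of C. The sketch below is only a functional illustration under our own assumptions: the exact wiring of Figure 4 is not reproduced here, so we assume the latch is set by (A AND B) and cleared by firmware once the alarm has been handled. The names wake_logic_t, wake_logic_step and wake_logic_reset are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Functional model of two CLC macro blocks: an S-R latch set by (A AND B),
 * followed by an AND gate combining the latch output with C. */
typedef struct {
    bool q;   /* S-R latch output: remembers that (A and B) has occurred */
} wake_logic_t;

/* Evaluate the logic for one set of input levels.
 * Returns true when the wake-up output is asserted. */
static bool wake_logic_step(wake_logic_t *w, bool a, bool b, bool c)
{
    assert(w != NULL);
    if (a && b) {
        w->q = true;          /* set input of the latch */
    }
    return w->q && c;         /* AND gate: latched (A and B), then C */
}

/* Assumed to be performed by firmware after the wake-up has been served. */
static void wake_logic_reset(wake_logic_t *w)
{
    assert(w != NULL);
    w->q = false;
}
```

In this model, pulses on B alone, or on C before (A and B) has been seen, never assert the wake-up output. The MCU is woken once per completed sequence instead of once per input change.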

Beyond the Basics
As mentioned before, the true power of the core independent peripherals is fully realized when we start clicking those blocks together, composing new, perhaps never before imagined, functional blocks that relieve the CPU from potentially large computational loads and real-time constraints.

In the next installment of this three-part series, we will explore a few more examples of core independent peripherals, and we will see how easily we can combine them to achieve further design simplifications and power-consumption reductions.


  • PIC10F32X – a 6-pin MCU family featuring the CLC
  • PIC16F150X – a small pin count MCU family featuring several core independent peripherals, including: CLC, NCO, CWG
  • PIC16F161X – a small pin count MCU family featuring several core independent peripherals, including the SMT and 100 mA output drivers
  • AN1473 – Application note: Various Solutions for Calculating a Pulse and Duty Cycle
  • AN1450 – Application note: Delay Block De-bouncer
  • AN1451 – Application note: Glitch Free Design
  • DS41631 – Tips and Tricks with the Configurable Logic Cell
  • DS41632 – Tips and Tricks with the new (Core Independent) Peripherals
  • ISBN: 97813129077759 – Di Jasio – This is (not) Rocket Science

Lucio Di Jasio is the EMEA Business Development Manager for Microchip Technology Inc. He has held various technical and marketing roles within the Company’s 8-, 16- and 32-bit divisions for the past 20 years. As an opinionated and prolific technical author, Lucio has published numerous articles and several books on programming for embedded-control applications. Following his passion for flying, he has achieved both FAA and EASA private pilot license certifications. You can read more about Lucio’s latest books and projects on his blog.

