Increase embedded processor efficiency through the use of distributed CPU blocks


This “Product How-To” article focuses on how to use a certain product in an embedded system and is written by a company representative.

In the past few years, multiprocessing systems have become more mainstream. In fact, most modern personal computer CPUs now feature symmetric multiprocessing (SMP) systems, where multiple instances of the same processor share the processing burden of the applications running on the PC.

While SMPs are quite common today, we typically have not seen a shift towards multiprocessing in embedded computing. However, a new type of embedded design technique gives engineers the freedom to intelligently distribute processing functions across a digital subsystem.

This article looks at an example of the distributed processing technique using Cypress Semiconductor's PSoC 3 and PSoC 5 architectures, which consist of a main CPU (in this case an 8051 or an ARM Cortex-M3), a DMA engine, and an array of Universal Digital Blocks (UDBs).

The UDBs effectively serve as an array of mini-processors. By distributing processing functions across such a subsystem, the engineer can increase the efficiency of the overall system by offloading the less computationally complex processing functions from the main CPU.

There are multiple benefits to breaking up processing functions across multiple functional blocks, the largest of which is a reduction in active power consumption. When the CPU is relieved of MIPS-hungry but computationally simple functions, such as servicing interrupts, the application can run at a lower clock frequency, because the CPU no longer has to burn instruction cycles on these less complex functions in addition to all of the other functions in the application.

This reduces the power consumption of the overall application in two ways. The first benefit is obvious: active power consumption decreases roughly linearly as the CPU clock speed is reduced.

The second benefit, while perhaps more subtle, is equally important: the CPU has roughly 10X more logic gates than a UDB. By offloading processing functions from the main CPU to the mini-processors, the number of logic gates that toggle to complete a processing function is reduced, which further improves active power consumption.
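Both effects can be read off the standard first-order model of CMOS dynamic power. This is a textbook approximation rather than a Cypress specification: the symbols α (switching activity), C (switched capacitance, which grows with the number of gates that toggle), V_DD (supply voltage), and f (clock frequency) are assumptions introduced here for illustration.

```latex
\[
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
\qquad\Longrightarrow\qquad
\frac{P_{\text{dyn}}(f_2)}{P_{\text{dyn}}(f_1)} = \frac{f_2}{f_1}
\quad \text{for fixed } \alpha C \text{ and } V_{DD}
\]
```

As an illustrative case, dropping the CPU clock from 48 MHz to 24 MHz at the same supply voltage would halve this term. Moving a function to a block with fewer toggling gates lowers the effective αC for that function, which is the second benefit described above.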

In addition to significantly reducing active power consumption in an application, another benefit of distributed processing is that the CPU is freed from the burden of the more mundane processing. It can then focus its MIPS on functions that take better advantage of its features, such as computationally intensive functions that rely on multiply and divide instructions.

To understand how it is possible to break up the processing functions across the architecture, we will take a look at a common embedded application as an example: Brushless DC Motor control. But first let's take a look under the hood and examine the PSoC 3 and PSoC 5 digital subsystem to understand its capabilities.

Under the hood of PSoC 3/5 devices
The PSoC 3 and PSoC 5 devices share a common platform architecture, which means that fundamentally the hardware is the same across the two families. The PSoC 3 and PSoC 5 platform architecture consists of four main functional blocks:

CPU subsystem. This block contains the main CPU (either 8051 or Cortex-M3) and all of the supporting IP, including the interrupt controller, debug hardware, and DMA controller. Other system functions, such as clocking, power management, and system memory, are also located in the CPU subsystem. The CPU combined with the DMA engine provides two of the key components necessary for implementing distributed processing functions.

Digital subsystem. Another key element in the PSoC 3 and PSoC 5 family architecture, the digital subsystem enables the implementation of distributed processing systems. It consists primarily of an array of flexibly programmable Universal Digital Blocks.

As seen in Figure 1 below, the UDB hardware contains a datapath element consisting of an 8-bit microcomputer capable of standard processing functions such as shifts, adds, and compares.

Figure 1. Each PSoC UDB contains a datapath element, which is essentially an 8-bit mini-processor capable of standard processing functions such as shifts, adds, and compares.

The datapath elements are also coupled with a PLD fabric, which can be used to implement custom logic functions or even lookup tables for the datapath element to use as a reference.

These UDBs are used to implement many standard peripheral functions such as PWMs, Timers, and SPI, but they can also be used to implement custom peripheral functions. This flexibility is the fundamental reason why these PSoCs can implement distributed processing functions.

The UDB array features up to 24 of these UDBs as well as a flexible routing matrix, which enables the user to connect multiple UDBs together (Figure 2 below) to create larger and more complex processing functions.

Figure 2. A PSoC's UDB array can feature up to 24 UDBs as well as a flexible routing matrix, which enables the user to connect multiple UDBs.

Analog subsystem. The PSoC 3 and PSoC 5 families also feature a high-performance, programmable analog subsystem that contains all the components needed to create a full analog signal chain, including an analog-to-digital conversion stage of up to 20 bits, a digital filter block for signal conditioning, and a digital-to-analog conversion stage.

In the context of this discussion on distributed processing, the analog subsystem can perform processing on analog inputs before sending them to the digital subsystem or CPU for further data processing.

Programmable routing interconnect. Seen on the far right of the block diagram, the programmable routing and interconnect subsystem contains a flexible routing matrix that is connected to the I/Os as well as the digital, analog, and CPU subsystems. This functional block lets users define where the on-chip signals are routed, which makes it possible to create processing systems that span multiple subsystems.

Examples of PSoC UDB distributed processing
After this brief look at the underlying architecture, we can now look at how to use such distributed processing elements to increase our system efficiency.

One common embedded control function that illustrates the benefits of such distributed processing is sensored brushless DC (BLDC) motor control.

In the traditional method of controlling a sensored BLDC motor, the motor rotates and causes the logic levels of three Hall-effect sensors to change state. In a typical MCU-based sensored BLDC control system, the processor receives an I/O interrupt each time the state changes.

The CPU then determines which motor coils to connect to the PWM output and drive, and adjusts the outputs accordingly. This creates a large interrupt burden on the CPU, effectively taking away CPU MIPS to service the interrupts rather than perform the other processing functions that may need attention in the application.

In addition, the faster the motor runs, the more often the CPU is interrupted. Adding more motors to the application further complicates the problem, as there is no way to synchronize the two (or more) motors reliably to ensure that the Hall sensors do not trigger simultaneous, yet independent, interrupts with the same priority.
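To get a feel for how the interrupt load scales with speed, the rate can be estimated with a few lines of C. This is a minimal sketch that assumes conventional six-step commutation, where three Hall sensors produce six state changes per electrical revolution, and a motor whose electrical speed is the mechanical speed multiplied by the number of pole pairs. The function name and the motor parameters are hypothetical and do not come from any Cypress library.

```c
#include <assert.h>
#include <stdint.h>

/* Hall state changes per electrical revolution with three sensors
 * and six-step commutation (assumption stated in the text above). */
#define HALL_EDGES_PER_ELEC_REV 6u

/* Estimate the number of Hall-triggered interrupts per second for one motor.
 *   rpm        mechanical speed in revolutions per minute
 *   pole_pairs number of rotor pole pairs (hypothetical motor parameter) */
uint32_t hall_interrupts_per_second(uint32_t rpm, uint32_t pole_pairs)
{
    assert(pole_pairs > 0u);

    /* electrical rev/min = mechanical rev/min * pole pairs;
     * use a 64-bit intermediate so the product cannot overflow. */
    uint64_t edges_per_minute =
        (uint64_t)rpm * pole_pairs * HALL_EDGES_PER_ELEC_REV;

    return (uint32_t)(edges_per_minute / 60u);
}
```

For a hypothetical motor with 4 pole pairs running at 6000 RPM, this works out to 2400 interrupts per second, and each additional motor adds its own independent stream of interrupts.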

There must be another way, right? Well, indeed there is. The PSoC 3 and PSoC 5 family architecture provides a great example of how one can distribute processing across an array of mini-processors to offload these interrupt-intensive operations. By simply implementing a hardware lookup table in the PLD fabric of the UDBs, the designer removes the need to interrupt the CPU.

Instead of sending the interrupts to the CPU's interrupt controller, the Hall-effect sensor inputs are fed directly into the hardware lookup table, which then determines which outputs receive the PWM signals. In this implementation, the CPU is interrupted only when the speed of the motor changes.

With the lookup table implemented in the UDB architecture and the UDB datapath element comparing the data, the CPU no longer needs to be involved in the processing of the interrupt.
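The mapping that such a lookup table performs can be modeled in plain C. The sketch below is a software model only, not the UDB or PLD configuration itself, and the Hall-state-to-switch assignments are one common six-step sequence chosen for illustration. The real table depends on the motor's sensor placement and winding order, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per bridge switch: high (H) and low (L) side of phases A, B, C.
 * Bit assignments are hypothetical. */
#define SW_AH   0x01u
#define SW_AL   0x02u
#define SW_BH   0x04u
#define SW_BL   0x08u
#define SW_CH   0x10u
#define SW_CL   0x20u
#define SW_NONE 0x00u

/* Index = 3-bit Hall state (H3 H2 H1). States 000 and 111 never occur on a
 * healthy motor, so they switch everything off. */
static const uint8_t commutation_lut[] = {
    [0] = SW_NONE,         /* 000: invalid        */
    [1] = SW_CH | SW_BL,   /* 001: C high, B low  */
    [2] = SW_BH | SW_AL,   /* 010: B high, A low  */
    [3] = SW_CH | SW_AL,   /* 011: C high, A low  */
    [4] = SW_AH | SW_CL,   /* 100: A high, C low  */
    [5] = SW_AH | SW_BL,   /* 101: A high, B low  */
    [6] = SW_BH | SW_CL,   /* 110: B high, C low  */
    [7] = SW_NONE          /* 111: invalid        */
};

static_assert(sizeof commutation_lut / sizeof commutation_lut[0] == 8,
              "one entry per 3-bit Hall state");

/* Return the set of switches selected for a given Hall state. */
uint8_t commutation_switches(uint8_t hall_state)
{
    return commutation_lut[hall_state & 0x07u];
}

/* Gate the selected high-side switch with the PWM level while the selected
 * low-side switch stays on, mirroring "which output receives the PWM". */
uint8_t gate_outputs(uint8_t hall_state, int pwm_level)
{
    const uint8_t high_side = SW_AH | SW_BH | SW_CH;
    uint8_t sw = commutation_switches(hall_state);

    if (!pwm_level) {
        sw &= (uint8_t)~high_side;  /* PWM low: high-side switch off */
    }
    return sw;
}
```

In hardware, the same truth table would sit in the PLD fabric and react to Hall transitions combinationally, so no instruction cycles are spent on it. The C version is useful mainly for checking the table contents before committing them to logic.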

Another application of distributed processing is using the DMA for data-transfer-intensive applications, such as an I2S-to-USB (recording) or USB-to-I2S (playout) application.

When the I2S block is implemented in the UDB architecture and the DMA transfers the data between the two functional blocks, with an SRAM block in the middle of the transaction, only a very small percentage of the CPU cycles is used to control the data flow. Using the DMA in this manner is extremely beneficial in an application where one communication protocol is based on a burst-intensive scheme, such as USB, and the other runs at a steady data rate, such as I2S.
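The role of the SRAM block in the middle can be illustrated with a simple ring buffer that absorbs bursts on one side and drains at a steady rate on the other. This is a minimal software model of the data flow for the playout direction, assuming 16-bit audio samples and a hypothetical 256-sample buffer. On the device the copies would be carried out by DMA transfers rather than by these functions, and none of the names below are PSoC APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u  /* hypothetical depth; power of two so indices wrap with a mask */

static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0u,
              "RING_SIZE must be a power of two");

typedef struct {
    int16_t  samples[RING_SIZE];
    uint32_t head;  /* running count of samples written (bursty USB side) */
    uint32_t tail;  /* running count of samples read (steady I2S side)    */
} ring_t;

/* Number of samples currently buffered. Unsigned subtraction stays correct
 * when the counters wrap around. */
size_t ring_count(const ring_t *r)
{
    return (size_t)(r->head - r->tail);
}

/* Store a burst of samples; returns how many actually fit. */
size_t ring_write(ring_t *r, const int16_t *src, size_t n)
{
    size_t space = RING_SIZE - ring_count(r);

    if (n > space) {
        n = space;  /* drop what does not fit rather than overwrite */
    }
    for (size_t i = 0; i < n; i++) {
        r->samples[(r->head + i) & (RING_SIZE - 1u)] = src[i];
    }
    r->head += (uint32_t)n;
    return n;
}

/* Fetch one sample at the steady output rate; returns 0 on underrun. */
int ring_read(ring_t *r, int16_t *out)
{
    if (ring_count(r) == 0u) {
        return 0;
    }
    *out = r->samples[r->tail & (RING_SIZE - 1u)];
    r->tail++;
    return 1;
}
```

The buffer depth is the main design choice: it has to cover the longest gap between bursts at the steady output rate, otherwise the reader underruns.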

With the availability of the new PSoC 3 and PSoC 5 platform, engineers now have another tool in their system design tool belt. By looking at an embedded application as a combination of processing functions that can be divided up and distributed across a collection of processing subsystems, engineers can now optimize their embedded system efficiency and lower the system power consumption.

Loren Hobbs is PSoC product marketing manager at Cypress Semiconductor.
