The peripheral hub: backbone of efficient system-on-chip design


Over the past decade, electronic devices have grown smaller and lighter while packing in more functionality. For example, the first Apple iPod, unveiled in 2001, weighed 6.5 oz and had a capacity of 1,000 songs. In contrast, the latest iPod shuffle weighs just 1.1 oz and allows users to hold 4,000 songs in their pockets. This growth is not without its cost. Embedded systems developers face unprecedented challenges in their efforts to rapidly deliver competitive products. Many are turning to system-on-chip (SoC) technology to improve the time-to-market, cost, performance, design reuse, and operating longevity of their products.

With ARM cores growing in popularity, a single Cortex core no longer offers sufficient competitive advantage in itself. The primary factors influencing a designer’s SoC choice are the features and interconnect system of peripherals. Thus, vendors need to consider the design of peripherals and their interconnecting system in order to differentiate their products.

An SoC is often composed of multiple subsystems, including digital, analog, CPU, and programmable routing and interconnect subsystems. To keep these subsystems working together coherently and thereby maximize design efficiency, vendors have placed significant emphasis on their peripheral hub designs to increase bandwidth, reduce latency, and offload processor overhead.

For example, Atmel’s bus matrix implements a multi-layer advanced high-performance bus (AHB) that enables parallel access paths between multiple AHB masters and slaves. Texas Instruments (TI) uses an AHB and an advanced peripheral bus (APB), along with a micro-DMA (uDMA) controller, to connect system peripherals, serial peripherals, and analog peripherals.

Cypress utilizes a peripheral hub (PHUB) responsible for data transfer between the CPU and peripherals, as well as transfers between peripherals (Figure 1). Most of the resulting architectures also make use of various forms of a direct memory access (DMA) controller to allow more efficient use of the processor and available bus bandwidth. We will discuss peripheral hub implementations in SoC designs and explore key performance factors.

Figure 1: PSoC 5LP peripheral hub system architecture

Peripheral access flexibility
Peripheral hubs are used to connect the CPU to memory and peripherals. In order to increase bandwidth, different vendors choose different connection methods. Atmel’s SAM3N series, for example, uses a multi-layer AHB that enables parallel access paths between multiple masters and slaves. Atmel's Bus Matrix can support up to 3 masters and 4 slaves, and each master and slave is assigned dedicated accesses. For example, Slave 0 can access internal SRAM, Slave 1 is dedicated to internal ROM, etc.

Cypress’s PSoC 5LP series makes use of spoke arbitration in place of dedicated access (Table 1). Spoke arbitration offloads demand from the CPU and DMAC, and each of the 8 spokes is connected to one or more peripheral blocks. For example, Spoke 0 connects to SRAM, while Spoke 1 handles the I/O interface and the port interrupt control unit.

Table 1: Spoke configuration in PSoC 5LP

These different implementations help designers maximize peripheral flexibility and scalability. First, peripherals can have a data width wider than their spoke. A Delta-Sigma ADC, for example, can deliver up to 20-bit data even though it sits on a 16-bit spoke, because it uses an internal FIFO. One peripheral can also extend across multiple spokes, and peripherals of different data widths can be connected to a single spoke.
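As a rough illustration of how a peripheral wider than its spoke could be serviced, the sketch below splits a 20-bit sample into 16-bit beats and reassembles it on the other side. The function names and the low-beat-first ordering are assumptions made for this example. The actual FIFO mechanism in the ADC is not described here.

```python
def split_into_beats(value: int, data_bits: int, spoke_bits: int) -> list[int]:
    """Split a data_bits-wide value into spoke_bits-wide beats, low beat first.

    Hypothetical helper: the beat ordering is an assumption for illustration.
    """
    if value < 0 or value >= (1 << data_bits):
        raise ValueError("value does not fit in data_bits")
    n_beats = -(-data_bits // spoke_bits)  # ceiling division
    mask = (1 << spoke_bits) - 1
    return [(value >> (i * spoke_bits)) & mask for i in range(n_beats)]


def join_beats(beats: list[int], spoke_bits: int) -> int:
    """Reassemble the original value from beats produced by split_into_beats."""
    value = 0
    for i, beat in enumerate(beats):
        value |= beat << (i * spoke_bits)
    return value


sample = 0xABCDE  # an arbitrary 20-bit sample
beats = split_into_beats(sample, data_bits=20, spoke_bits=16)
print([hex(b) for b in beats])
print(hex(join_beats(beats, spoke_bits=16)))
```

A 20-bit value needs two beats on a 16-bit spoke, which is why some buffering between the peripheral and the spoke is useful.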

Second, the CPU and DMAC can access different spokes simultaneously. Both accesses are also independent, which translates to a multiprocessing environment and reduced access latency. For designers utilizing on-chip programming resources to create their own virtual chips and programmable signal chain, a Spoke + internal FIFO approach maximizes the possibilities.
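The simultaneous-access idea can be sketched with a toy per-cycle arbiter. This is only a model under stated assumptions: the function name is hypothetical, and the rule that the CPU wins when both masters target the same spoke is an assumption chosen for the example, not something stated in this article.

```python
def arbitrate(cpu_spoke, dmac_spoke):
    """Return the masters granted access in one cycle.

    Each argument is the spoke number a master requests, or None for no request.
    Assumption: on a same-spoke conflict the CPU is granted and the DMAC waits.
    """
    requests = {"CPU": cpu_spoke, "DMAC": dmac_spoke}
    active = {m: s for m, s in requests.items() if s is not None}
    if len(active) == 2 and cpu_spoke == dmac_spoke:
        return ["CPU"]
    return sorted(active)


print(arbitrate(0, 1))     # different spokes: both proceed in the same cycle
print(arbitrate(0, 0))     # same spoke: only one master proceeds
print(arbitrate(None, 3))  # DMAC alone on the bus
```

Requests to different spokes never block each other in this model, which is where the reduction in access latency comes from.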

Efficient processor usage
DMA has been widely adopted to maximize efficient processor usage. It is often presented as a DMA controller (DMAC), micro-DMA (uDMA), or peripheral DMA controller (PDMAC), all of which serve one common purpose: to offload processor overhead by reducing the need for CPU intervention.

TI’s uDMA, for example, enables more efficient use of the processor and available bus bandwidth by supporting memory-to-memory, memory-to-peripheral, and peripheral-to-memory transfers with multiple modes. Its usage is always subordinate to the processor core, so it never delays a bus transaction initiated by the processor. Since the uDMA uses only otherwise-idle bus cycles, the data transfer bandwidth is essentially free, with no impact on the performance of the rest of the system.
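The "otherwise-idle bus cycles" behavior can be pictured with a small timeline model. This is a simplified sketch, not TI's implementation: it assumes one DMA word moves per free bus cycle, and the function and parameter names are made up for the example.

```python
def schedule_bus(cpu_busy: list[bool], dma_words: int) -> list[str]:
    """Assign each bus cycle to 'CPU', 'DMA', or 'idle'.

    The CPU always gets the cycles it asks for; DMA only fills the gaps.
    Assumption: one DMA word is transferred per free cycle.
    """
    timeline = []
    remaining = dma_words
    for busy in cpu_busy:
        if busy:
            timeline.append("CPU")
        elif remaining > 0:
            timeline.append("DMA")
            remaining -= 1
        else:
            timeline.append("idle")
    return timeline


cpu_busy = [True, False, True, True, False, False]
print(schedule_bus(cpu_busy, dma_words=2))
```

The CPU's cycles are identical with or without the DMA request, which is the sense in which the transfer bandwidth is "free".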

Cypress’s DMA controller (DMAC) can also transfer data among SRAM, Flash, EEPROM, and other system blocks. To minimize CPU interruptions by the DMA controller, every DMA channel goes through the following phases to perform data transfers:

Arbitration phase: The DMAC selects which DMA channel to process based on priority.
Fetch phase: The DMAC fetches the transaction descriptor (TD) and DMA channel details.
Source engine phase: The source engine selects the spoke to which the source peripheral is connected. When the spoke is available for data transfer, the data transfer from the source begins.
Destination engine phase: The destination engine selects the spoke to which the destination peripheral is connected. When the spoke is available, the data collected in the source engine phase is transferred to the destination peripheral.
Write-back phase: The TD and DMA channel configurations are updated after the data transfer is completed.
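The five phases above can be sketched as a simple sequential model. Everything named here is hypothetical (the classes, the `spoke_of` map, the `PERIPH_A` peripheral and its spoke number), and the model assumes every spoke is immediately available, which real hardware does not guarantee.

```python
from dataclasses import dataclass


@dataclass
class TransactionDescriptor:
    src: str
    dst: str
    length: int
    done: bool = False


@dataclass
class Channel:
    number: int
    priority: int  # 0 is the highest priority
    td: TransactionDescriptor


def run_one_transfer(channels, spoke_of, memory):
    """Walk one DMA transfer through the five phases and return (channel, log)."""
    log = []
    # Arbitration: pick the pending channel with the highest priority.
    ch = min(channels, key=lambda c: (c.priority, c.number))
    log.append("arbitration")
    # Fetch: read the channel's transaction descriptor.
    td = ch.td
    log.append("fetch")
    # Source engine: select the source spoke and collect the data.
    data = memory[td.src][:td.length]
    log.append(f"source engine (spoke {spoke_of[td.src]})")
    # Destination engine: select the destination spoke and deliver the data.
    memory[td.dst] = data
    log.append(f"destination engine (spoke {spoke_of[td.dst]})")
    # Write-back: update the descriptor state after the transfer.
    td.done = True
    log.append("write-back")
    return ch.number, log


spoke_of = {"SRAM": 0, "PERIPH_A": 2}  # PERIPH_A and its spoke are made up
memory = {"SRAM": bytes([1, 2, 3, 4]), "PERIPH_A": b""}
channels = [
    Channel(5, 2, TransactionDescriptor("SRAM", "PERIPH_A", 2)),
    Channel(3, 1, TransactionDescriptor("SRAM", "PERIPH_A", 3)),
]
winner, log = run_one_transfer(channels, spoke_of, memory)
print(winner, log)
```

Note that the CPU does not appear anywhere in the loop: once the descriptors are set up, the whole sequence runs without processor intervention.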

In intra-spoke DMA transfers, because the source and destination reside on the same spoke, the 16-byte internal FIFO of the PHUB can be used as an intermediate buffer.
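A minimal sketch of buffering an intra-spoke transfer through a 16-byte FIFO follows. It assumes the source side fills the FIFO completely before the destination side drains it, so a long transfer proceeds in bursts of at most 16 bytes. That burst policy and the function name are assumptions for illustration.

```python
FIFO_BYTES = 16  # PHUB internal FIFO size given in the text


def intra_spoke_copy(src: bytes) -> tuple[bytes, int]:
    """Copy src through a 16-byte FIFO and return (copied data, burst count)."""
    dst = bytearray()
    bursts = 0
    for offset in range(0, len(src), FIFO_BYTES):
        fifo = src[offset:offset + FIFO_BYTES]  # source side fills the FIFO
        dst += fifo                              # destination side drains it
        bursts += 1
    return bytes(dst), bursts


data = bytes(range(40))
copied, bursts = intra_spoke_copy(data)
print(copied == data, bursts)
```

With 40 bytes to move, the model needs three bursts: two full 16-byte bursts and one 8-byte remainder.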

There are 24 DMA channels in a PSoC 5LP (Figure 2), so assigning priority to channels is critical for optimized performance as well. Each channel can take a priority from 0 to 7, with 0 being the highest priority. The DMAC supports two methods to handle priority: Simple Priority and the Grant Allocation Fairness algorithm:
Simple Priority: With this method, high-priority channels can interrupt low-priority channels.
Grant Allocation Fairness algorithm: With this method, priorities 0 and 1 are the highest; no other priority can interrupt channels with priority 0 or 1. A DMA channel of priority 0 or priority 1 can occupy 100% of the bus. The remaining priorities share the bus based on the number of channels requesting it at that time. Because priority 0 is higher than priority 1, priority 0 can interrupt priority 1.
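The two schemes can be contrasted with a rough sketch. The function names are hypothetical, ties are broken by channel number purely to keep the example short, and the equal split among priorities 2 to 7 is an assumption: the text says only that those priorities share the bus according to how many channels are requesting it.

```python
def simple_priority_winner(pending: dict[int, int]) -> int:
    """pending maps channel number -> priority (0 is highest)."""
    return min(pending, key=lambda ch: (pending[ch], ch))


def fairness_shares(pending: dict[int, int]) -> dict[int, float]:
    """Return each pending channel's fraction of the bus.

    Priority 0 or 1 takes the whole bus, with priority 0 beating priority 1.
    Assumption: otherwise the pending channels split the bus equally.
    """
    if not pending:
        return {}
    top = [ch for ch, prio in pending.items() if prio in (0, 1)]
    if top:
        winner = min(top, key=lambda ch: (pending[ch], ch))
        return {ch: (1.0 if ch == winner else 0.0) for ch in pending}
    share = 1.0 / len(pending)
    return {ch: share for ch in pending}


print(simple_priority_winner({2: 1, 9: 0, 4: 3}))
print(fairness_shares({2: 1, 9: 0, 4: 3}))
print(fairness_shares({4: 3, 7: 5}))
```

Under Simple Priority the priority-5 channel in the last case would wait until the priority-3 channel finished. In the fairness model both make progress.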

The DMAC uses a round-robin method to handle DMA channels with the same priority. In this case, the DMA channel least recently executed takes priority. When the round-robin algorithm is enabled, the execution order of same-priority DMA channels depends on the last time each channel was executed; if that time is the same for two channels, the DMA channel with the lower number takes higher priority.
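The selection rule for same-priority channels can be sketched as follows. The helper names and the integer time stamps are assumptions for illustration. The rule itself (least recently executed first, lower channel number on a tie) is the one described above.

```python
def next_channel(last_run: dict[int, int]) -> int:
    """last_run maps channel number -> time it last ran (smaller = longer ago).

    Picks the least recently executed channel; ties go to the lower number.
    """
    return min(last_run, key=lambda ch: (last_run[ch], ch))


def run_round_robin(channels: list[int], slots: int) -> list[int]:
    """Grant `slots` consecutive bus slots among same-priority channels."""
    last_run = {ch: 0 for ch in channels}  # none has run yet, so all tie
    order = []
    for t in range(1, slots + 1):
        ch = next_channel(last_run)
        last_run[ch] = t
        order.append(ch)
    return order


print(run_round_robin([5, 2, 9], slots=5))
```

Starting from a tie, the channels are served in ascending channel number and then cycle in that order.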

Figure 2: PSoC 5LP system architecture

Besides peripheral accessibility and efficient processor usage, vendors are also looking at how DMA can help with external peripheral integration. TI’s external peripheral interface (EPI), for example, is supported by uDMA and provides an interface to a variety of common external peripherals.

The PSoC architecture, meanwhile, integrates the maximum number of peripherals into a single chip. The good news is that the technology of peripheral integration has evolved significantly in the past decade of SoC designs, which translates to more options for embedded system designers.

Meng He graduated from Marquette University with a Master of Science degree in Electrical Engineering and has been working at Cypress Semiconductor as a product manager since 2007. You can contact Meng at .
