High-performance embedded computing -- Multiprocessor and multicore architectures

João Cardoso, José Gabriel Coutinho, and Pedro Diniz

December 19, 2017

Editor's Note: Interest in embedded systems for the Internet of Things often focuses on physical size and power consumption. Yet, the need for tiny systems by no means precludes expectations for greater functionality and higher performance. At the same time, developers need to respond to growing interest in more powerful edge systems able to mind large stables of connected systems while running resource-intensive algorithms for sensor fusion, feedback control, and even machine learning. In this environment and embedded design in general, it's important for developers to understand the nature of embedded systems architectures and methods for extracting their full performance potential. In their book, Embedded Computing for High Performance, the authors offer a detailed look at the hardware and software used to meet growing performance requirements.

Adapted from Embedded Computing for High Performance, by João Cardoso, José Gabriel Coutinho, Pedro Diniz.


Modern microprocessors are based on multicore architectures consisting of a number of processing cores. Typically, each core has its own instruction and data memories (L1 caches), and all cores share a second-level (L2) on-chip cache. Fig. 2.4 presents a block diagram of a typical multicore (in this case, quad-core) CPU computing system in which all cores share an L2 cache. The CPU is also connected to an external memory and includes link controllers to access external system components. There are, however, multicore architectures in which each L2 cache is shared by only a subset of the cores (e.g., by two cores in a quad-core CPU, or by four cores in an octa-core CPU). This is common in computing systems with additional memory levels. The external memories are often grouped in multiple levels and use different storage technologies. Typically, the first level is organized using SRAM devices, whereas the second level uses DDR SDRAM devices.

FIG. 2.4 Block diagram of a typical multicore architecture (quad-core CPU).
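
The two L2 arrangements described above (one L2 cache shared by all cores, or one L2 cache per subset of cores) can be illustrated with a small sketch. This is only a toy model, not taken from the book: the function names `l2_sharing_groups` and `shares_l2` are hypothetical, and the sketch assumes that cores are numbered from 0 and that consecutive core IDs sit behind the same L2 cache, which real chips need not do.

```python
from typing import List


def l2_sharing_groups(num_cores: int, cores_per_l2: int) -> List[List[int]]:
    """Partition core IDs into groups, one group per L2 cache.

    Assumption (illustrative only): cores are numbered 0..num_cores-1 and
    consecutive IDs share an L2 cache.
    """
    if num_cores <= 0 or cores_per_l2 <= 0:
        raise ValueError("core counts must be positive")
    if num_cores % cores_per_l2 != 0:
        raise ValueError("num_cores must be a multiple of cores_per_l2")
    return [
        list(range(start, start + cores_per_l2))
        for start in range(0, num_cores, cores_per_l2)
    ]


def shares_l2(core_a: int, core_b: int, cores_per_l2: int) -> bool:
    """Return True if both cores sit behind the same L2 cache."""
    return core_a // cores_per_l2 == core_b // cores_per_l2


# Quad-core with a single L2 shared by all cores (the layout of Fig. 2.4).
print(l2_sharing_groups(4, 4))  # [[0, 1, 2, 3]]

# Quad-core in which each L2 is shared by two cores.
print(l2_sharing_groups(4, 2))  # [[0, 1], [2, 3]]

# Octa-core in which each L2 is shared by four cores.
print(l2_sharing_groups(8, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

A grouping of this kind is the sort of information a scheduler or a programmer might consult when placing cooperating threads, since threads on cores in the same group can exchange data through the shared L2 cache rather than through a more distant memory level.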


Several platforms provide FPGA-based hardware extensions to commodity CPUs. Examples include the Intel QuickAssist QPI-FPGA [10], IBM Netezza [11], CAPI [12], and Xilinx Zynq [13]. Other platforms, such as RIFFA [14], focus on vendor-independent support by providing an integration framework to interface FPGA-based accelerators with the CPU system bus using PCI Express (PCIe) links.

Other system components, such as GPIO, UART, USB interfaces, PCIe, a network coprocessor, and a power manager, are connected via a fast link and may be memory mapped. In other architectures, however, the CPU connects to these subsystems (including memory) exclusively through fast links and/or switch fabrics (e.g., a partial crossbar), which provide point-to-point communication channels between the architecture components.
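
To make the idea of memory-mapped components concrete, the following sketch models an address decoder in software: ordinary loads and stores are routed to a device according to the address range they fall in. Everything here is hypothetical and chosen only for illustration, including the class names (`MemoryMappedBus`, `ToyUart`, `ToyGpio`), the base addresses, and the register offsets; none of it corresponds to a specific chip. On real hardware the same accesses would typically be pointer reads and writes in C rather than method calls.

```python
class MemoryMappedBus:
    """Toy address decoder: routes loads/stores to the device mapped at an address."""

    def __init__(self):
        self._regions = []  # list of (base, size, device) tuples

    def attach(self, base, size, device):
        # Reject overlapping address ranges so that decoding stays unambiguous.
        for other_base, other_size, _ in self._regions:
            if base < other_base + other_size and other_base < base + size:
                raise ValueError("address ranges overlap")
        self._regions.append((base, size, device))

    def _decode(self, address):
        for base, size, device in self._regions:
            if base <= address < base + size:
                return device, address - base
        raise ValueError(f"no device mapped at 0x{address:08X}")

    def store(self, address, value):
        device, offset = self._decode(address)
        device.write(offset, value & 0xFFFFFFFF)  # model 32-bit registers

    def load(self, address):
        device, offset = self._decode(address)
        return device.read(offset)


class ToyUart:
    TX_DATA = 0x0  # write: transmit one byte (hypothetical offset)
    STATUS = 0x4   # read: bit 0 set when ready to transmit (hypothetical)

    def __init__(self):
        self.sent = bytearray()

    def write(self, offset, value):
        if offset == self.TX_DATA:
            self.sent.append(value & 0xFF)

    def read(self, offset):
        return 0x1 if offset == self.STATUS else 0  # toy device is always ready


class ToyGpio:
    OUT = 0x0  # output latch (hypothetical offset)

    def __init__(self):
        self.out = 0

    def write(self, offset, value):
        if offset == self.OUT:
            self.out = value

    def read(self, offset):
        return self.out if offset == self.OUT else 0


UART_BASE = 0x4000_0000  # hypothetical base addresses
GPIO_BASE = 0x4000_1000

bus = MemoryMappedBus()
uart = ToyUart()
gpio = ToyGpio()
bus.attach(UART_BASE, 0x1000, uart)
bus.attach(GPIO_BASE, 0x1000, gpio)

# The "CPU" drives the peripherals with plain loads and stores.
for byte in b"hi":
    while not bus.load(UART_BASE + ToyUart.STATUS) & 0x1:
        pass  # poll until the UART reports ready
    bus.store(UART_BASE + ToyUart.TX_DATA, byte)
bus.store(GPIO_BASE + ToyGpio.OUT, 0b101)
```

The point of the sketch is that the processor needs no special I/O instructions in this scheme: a peripheral register is addressed the same way as a memory location, and only the address range determines which component responds.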
