Not very long ago, embedded systems consisted of logic spread across multiple chips: the CPU subsystem, the memory, the analog components, and so on each sat on its own IC. The advantage was that each chip could be designed independently at the process technology node best suited to it (90nm, 130nm, etc.). However, the interconnections between the chips consumed a significant amount of power, inter-chip communication suffered from high latency, and the extra connections raised the risk of failure.
Then came the era of the SoC (System on Chip), in which the various components (digital logic, analog logic, and the memory subsystem) were placed on the same silicon chip. This eliminated the power and latency penalties of inter-chip communication seen in the previous approach. The disadvantage was that, because the components were integrated on the same silicon, they all had to be built at the same technology node (65nm, 40nm, 28nm, and so on).
Some logic, particularly the main processor, gains a great deal from being designed at the latest technology node, while other components, such as memory, may gain comparatively little. With traditional System-on-Chip methods, however, the chip designer must choose the same process technology for all of the logic. Besides forcing a less silicon-efficient design, this also hinders reuse of blocks such as the CPU subsystem as independently verified logic in the next system design.
The best-of-both-worlds approach the electronics industry has come up with to resolve this dilemma is the System in Package (SiP) in a 2D package. Here multiple chips (DIEs) are placed on a common substrate, so there can be a CPU subsystem on one DIE, a memory subsystem on another, and analog logic on a third (Figure 1).
Each DIE can be designed at its appropriate process technology node and later reused in subsequent designs as a Known Good DIE (KGD), that is, a DIE already tested at wafer level. Because KGDs can be combined with newer dies, this reduces the time to market for the complete system. The substrate carries a low-power, low-latency, high-speed communication link between the DIEs.
The DIEs are mounted on the SiP substrate with flip-chip bumps. The copper tracks on the substrate, however, are many times wider than those on the DIEs. This creates a placement problem: routing congestion on the substrate limits the number of dies it can accommodate, which ultimately impacts performance and power consumption.
The solution to this problem is 2.5D packaging, in which a silicon interposer is added between the DIEs and the substrate. The interposer provides a low-power, low-latency, high-speed communication link between the chips. The DIEs are mounted on the interposer using microbumps, which are many times smaller than the flip-chip bumps connecting the interposer to the substrate.
The copper tracks on the upper layer of the interposer have nearly the same dimensions as those on the DIEs. The interposer has TSVs (Through Silicon Vias, which are copper connections) linking the tracks on its upper layer to those on its lower layer, allowing thousands of connections between the dies. The interposer thus overcomes the routing congestion problems faced in 2D designs.
A natural successor is 3D packaging, in which multiple DIEs are stacked one on top of the other, with TSVs running through the lower dies to carry inter-die communication and deliver power to all the dies. Further discussion of 3D technology is beyond the scope of this paper.
System design with 2.5D
Shown in Figure 3 is a typical 2.5D System in Package (SiP) architecture, in which DIE1 and DIE2 each consist of two CPU cores with an L2 cache, interfaced through two AXI master interfaces. Each DIE also has some slow peripherals (UART, GPIO, I2S, I2C, SPI, BootROM) attached through an APB interface.
The DIE-to-DIE (D2D) interface runs through the interposer. It consists of two AXI-to-AXI bridges (each with a master and a slave port), 64 GPIO lines, interrupt lines, and DMA channels. Of the 64 GPIO lines, 32 are configured as inputs on DIE1 and outputs on DIE2, and the remaining 32 are configured as outputs on DIE1 and inputs on DIE2.
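To make the GPIO split concrete, here is a minimal C sketch of how firmware on each DIE might derive its direction mask for the 64 D2D GPIO lines. It assumes, purely for illustration, that lines 0 to 31 form the group driven by DIE2 (inputs on DIE1) and lines 32 to 63 the group driven by DIE1; the actual line numbering, and the names `enum die_id`, `d2d_gpio_dir_mask`, and `d2d_gpio_is_output`, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifiers for the two dies on the interposer. */
enum die_id { DIE1 = 0, DIE2 = 1 };

#define D2D_GPIO_LINES 64

/*
 * Assumed split (illustration only, not taken from the real design):
 *   lines  0..31: driven by DIE2, read by DIE1
 *   lines 32..63: driven by DIE1, read by DIE2
 * A set bit in the returned mask means "this die drives the line (output)".
 */
static uint64_t d2d_gpio_dir_mask(enum die_id die)
{
    const uint64_t low32  = 0x00000000FFFFFFFFULL; /* lines 0..31  */
    const uint64_t high32 = 0xFFFFFFFF00000000ULL; /* lines 32..63 */
    return (die == DIE1) ? high32 : low32;
}

/* True if `die` drives D2D GPIO `line`; the line number must be < 64. */
static bool d2d_gpio_is_output(enum die_id die, unsigned line)
{
    assert(line < D2D_GPIO_LINES);
    return (d2d_gpio_dir_mask(die) >> line) & 1u;
}
```

The property worth checking is that the two masks are disjoint and together cover all 64 lines, so every line has exactly one driver across the interposer.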
The interposer also distributes the system reset to each DIE so that both DIEs reset together. The D2D interface determines which IPs of the neighboring DIE are visible to the host DIE. In this case, because each DIE has two master/slave interfaces, it can communicate with some of the IPs of the neighboring DIE, as discussed later.
DIE1 and DIE2 are virtually identical, except that DIE1 has a DDR controller and PHY interface connected to a DRAM chip, while DIE2 does not. Further, DIE1 has a codec connected through I2S and I2C, which DIE2 lacks, while DIE2 has an LCD panel connected to its GPIO pins. Each DIE has its own memory map, in which the D2D slave interfaces occupy a section.
The system is designed so that part of the DDR (the first 512MB) is shared between the two DIEs, which means DIE2 can access the first 512MB of DDR through its D2D slave interfaces. The interaction between DIE1 and DIE2 via the DIE-to-DIE interface is illustrated in Figure 4. Note that this is only a partial view of our memory map, intended to illustrate how DIE2 accesses DDR via DIE-to-DIE communication.
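The shared-DDR arrangement amounts to a fixed address translation in the D2D bridge: an access that DIE2 issues inside its D2D slave window lands at the corresponding offset in DIE1's DDR. The C sketch below models that translation. The base addresses `DIE2_D2D_WINDOW_BASE` and `DIE1_DDR_BASE`, and the function name `d2d_translate`, are invented for illustration, since the article shows only part of the memory map; only the 512MB window size comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical addresses; the real values come from each die's memory map. */
#define DIE2_D2D_WINDOW_BASE 0x40000000u /* DIE2's view of the shared DDR   */
#define DIE1_DDR_BASE        0x80000000u /* start of DDR as seen by DIE1    */
#define SHARED_DDR_SIZE      0x20000000u /* first 512MB of DDR is shared    */

/*
 * Translate a DIE2 address in the D2D slave window into the DIE1 DDR
 * address the bridge would forward it to. Returns false (leaving
 * *die1_addr untouched) if the address lies outside the window, so DIE2
 * can never reach DDR beyond the shared first 512MB this way.
 */
static bool d2d_translate(uint32_t die2_addr, uint32_t *die1_addr)
{
    assert(die1_addr != NULL);
    if (die2_addr < DIE2_D2D_WINDOW_BASE ||
        die2_addr - DIE2_D2D_WINDOW_BASE >= SHARED_DDR_SIZE)
        return false;
    *die1_addr = DIE1_DDR_BASE + (die2_addr - DIE2_D2D_WINDOW_BASE);
    return true;
}
```

For example, under these assumed bases a DIE2 access to `0x40001000` reaches DIE1 DDR address `0x80001000`, while `0x60000000` falls just past the window and is rejected.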
Shown in Figure 4 is the DIE-to-DIE communication, in which DIE2 communicates with the DDR of DIE1 through its D2D slave interface via the D2D Bridge. However, it is equally possible for DIE1 to communicate with any functional component of DIE2 through its D2D slave interface. For the sake of keeping the design simple, this has been avoided.
Software demonstration of DIE-to-DIE interaction
Shown in Figure 5 is a simple implementation that illustrates DIE-to-DIE interaction. DIE2 has a GPIO interface to an LCD panel, on which it displays images obtained from DIE1. DIE1 has a codec connected through I2S and I2C, and plays audio data obtained from DIE2. The DIE-to-DIE GPIO lines serve as the signaling mechanism between DIE1 and DIE2.
Execution typically starts from the Boot ROM (0xFFFF0000): both DIEs fetch their first instruction from it, and DIE1 and DIE2 initialize their CPU cores and caches. DIE2 is then made to wait on a DIE-to-DIE GPIO while DIE1 initializes the DDR controller and PHY and copies the images (to be displayed by DIE2) from its NOR flash (which has a parallel interface via a Static Memory Controller) to a pre-decided location in the shared section of DDR (the first 512MB).
This wait prevents DIE2 from accessing DDR (via the D2D interface) until DIE1 has initialized it. On receiving the GPIO signal, DIE2 copies the audio data (for DIE1) from its own NOR (interfaced through a Static Memory Controller) to a pre-decided location in the shared section of DDR (the first 512MB) via its DIE-to-DIE (D2D) slave interfaces.
Meanwhile, DIE1 waits for a D2D GPIO signal. After copying the audio data, DIE2 sends that D2D GPIO signal to DIE1. DIE2 then initializes the LCD controller and, via its D2D slave interfaces, starts rendering to the LCD (over GPIO) the images that DIE1 copied into DDR.
As shown in Figure 6, on receiving the D2D GPIO signal, DIE1 initializes the I2S interface and the codec. It then starts streaming the audio data from DDR (where DIE2 copied it) to the codec over I2S, with I2C used to configure the codec. Here PNOR1/2 denotes NOR flash with a static memory controller interface. The DDR is partitioned for use by DIE1 and DIE2 as shown in Figure 7.
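The boot-time handshake described above can be modeled as two small state machines that communicate only through the two D2D GPIO signals and the shared DDR region. The C sketch below is a host-side simulation, not firmware: a deterministic round-robin loop stands in for the two DIEs running concurrently, hardware initialization (DDR controller and PHY, LCD, I2S, codec) is reduced to a flag or omitted, and all names, offsets, and buffer sizes (`gpio_d1_to_d2`, `IMG_OFF`, `CHUNK`, `run_boot_handshake`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Tiny stand-ins; real buffers live in NOR, DDR, the LCD and the codec. */
#define CHUNK   16
#define IMG_OFF 0     /* hypothetical pre-decided offset for DIE1's images */
#define AUD_OFF CHUNK /* hypothetical pre-decided offset for DIE2's audio  */

static uint8_t shared_ddr[2 * CHUNK];      /* models the shared first 512MB */
static const char pnor1_images[CHUNK] = "IMAGEDATA-DIE1!";
static const char pnor2_audio[CHUNK]  = "AUDIODATA-DIE2!";
static char lcd_fb[CHUNK];                 /* what DIE2 renders to the LCD  */
static char codec_buf[CHUNK];              /* what DIE1 sends to the codec  */

static bool ddr_ready;     /* DIE1 has initialized DDR controller + PHY */
static bool gpio_d1_to_d2; /* D2D GPIO driven by DIE1, read by DIE2 */
static bool gpio_d2_to_d1; /* D2D GPIO driven by DIE2, read by DIE1 */

enum state { BOOT, WAIT_GPIO, RUN, DONE };

/* One scheduling step of DIE1; returns its next state. */
static enum state die1_step(enum state s)
{
    switch (s) {
    case BOOT: /* cores and caches assumed initialized from Boot ROM */
        ddr_ready = true;                                  /* DDR + PHY up */
        memcpy(shared_ddr + IMG_OFF, pnor1_images, CHUNK); /* NOR -> DDR   */
        gpio_d1_to_d2 = true;                              /* release DIE2 */
        return WAIT_GPIO;
    case WAIT_GPIO:
        return gpio_d2_to_d1 ? RUN : WAIT_GPIO; /* wait for DIE2's audio */
    case RUN: /* I2S and codec init omitted; "play" the audio */
        memcpy(codec_buf, shared_ddr + AUD_OFF, CHUNK);
        return DONE;
    default:
        return DONE;
    }
}

/* One scheduling step of DIE2; returns its next state. */
static enum state die2_step(enum state s)
{
    switch (s) {
    case BOOT:
        return WAIT_GPIO;
    case WAIT_GPIO:
        return gpio_d1_to_d2 ? RUN : WAIT_GPIO; /* wait for DIE1 */
    case RUN:
        assert(ddr_ready); /* D2D access to DDR only after DIE1 init */
        memcpy(shared_ddr + AUD_OFF, pnor2_audio, CHUNK);  /* NOR -> DDR    */
        gpio_d2_to_d1 = true;                              /* release DIE1  */
        memcpy(lcd_fb, shared_ddr + IMG_OFF, CHUNK);       /* render images */
        return DONE;
    default:
        return DONE;
    }
}

/* Round-robin both dies (DIE2 first, to exercise its wait); returns rounds. */
static int run_boot_handshake(void)
{
    enum state s1 = BOOT, s2 = BOOT;
    int rounds = 0;
    while ((s1 != DONE || s2 != DONE) && rounds < 100) { /* deadlock guard */
        s2 = die2_step(s2);
        s1 = die1_step(s1);
        rounds++;
    }
    return rounds;
}
```

Calling `run_boot_handshake()` completes in four rounds, after which `lcd_fb` holds DIE1's images and `codec_buf` holds DIE2's audio. The `assert(ddr_ready)` in DIE2's `RUN` state captures the key ordering constraint: DIE2 must not touch DDR through the D2D interface before DIE1 has brought it up. On real hardware, each wait would poll (or take an interrupt on) a GPIO input instead of reading a shared variable.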
On all the criteria important to a system designer (design simplicity, reusability of hardware IP, power consumption, and time to market), 2.5D technology is a step ahead of the single-DIE System on Chip. Refer to Figure 8 for a comparison of 2.5D and single-DIE SoC.
In real-world scenarios, 2.5D is best used with a more heterogeneous system architecture: the CPU subsystem and digital logic (such as USB, PCIe, and SATA) on one DIE, the memory subsystem (SRAM, DDR3 controller) on another, analog components (DDR PHY, USB PHY) on a third, and thermal sensors and power regulators on a fourth. Such complex designs would derive even greater benefits from the 2.5D approach.
Ayan Kumar Halder is a System Software Engineer at Open-Silicon Research Pvt Ltd. His work involves designing and porting bare-metal software, kernels, bootloaders, and Linux drivers on ARM-based custom SoCs. He has contributed to porting Open Virtualization, an open-source kernel-based virtual machine SDK, to Versatile Express. He received his Master's in Computer Science from the University of Pune, India.