CMP EMBEDDED.COM


Providing memory system and compiler support for MPSoc designs: Part 1
Memory Architectures



Embedded.com

DRAM
DRAMs have been used in processor-based environments for quite some time, but their use in embedded systems, both from a hardware synthesis viewpoint and from an embedded compiler viewpoint, has been investigated only relatively recently.

DRAMs offer better memory performance through the use of specialized access modes that exploit the internal structure and steering/buffering/banking of data within these memories. Explicit modeling of these specialized access modes allows the incorporation of such high-performance access modes into synthesis and compilation frameworks.

New synthesis and compilation techniques have been developed that employ detailed knowledge of the DRAM access modes and exploit advance knowledge of an embedded system's application to improve system performance and power.

A typical DRAM memory address is internally split into a row address consisting of the most significant bits and a column address consisting of the least significant bits. The row address selects a page from the core storage, and the column address selects an offset within the page to arrive at the desired word.

When an address is presented to the memory during a READ operation, the entire page addressed by the row address is read into the page buffer, in anticipation of spatial locality. If future accesses are to the same page, then there is no need to access the main storage area since it can just be read off the page buffer, which acts like a cache. Thus, subsequent accesses to the same page are very fast.
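The row/column split and the page-buffer behavior described above can be illustrated with a small trace-driven model. This is a minimal sketch under stated assumptions: `COL_BITS` (the page size) is a hypothetical parameter, and the model counts only page-buffer hits and misses, not real DRAM timing.

```python
# Minimal open-page DRAM model. COL_BITS is a hypothetical parameter
# (2**10 = 1024 words per page), not taken from any specific device.
COL_BITS = 10


def split_address(addr, col_bits=COL_BITS):
    """Split a word address into (row, column): row = high bits, column = low bits."""
    row = addr >> col_bits
    col = addr & ((1 << col_bits) - 1)
    return row, col


def count_page_hits(trace, col_bits=COL_BITS):
    """Replay a READ trace against one page buffer and return (hits, misses)."""
    open_row = None  # no page has been loaded into the page buffer yet
    hits = misses = 0
    for addr in trace:
        row, _ = split_address(addr, col_bits)
        if row == open_row:
            hits += 1  # served directly from the page buffer
        else:
            misses += 1  # the whole page must first be read from core storage
            open_row = row
    return hits, misses


sequential = list(range(8))                  # eight words in row 0
strided = [i << COL_BITS for i in range(8)]  # one word in each of eight rows
print(count_page_hits(sequential))  # (7, 1)
print(count_page_hits(strided))     # (0, 8)
```

The contrast between the two traces is the spatial-locality effect the page buffer is designed to exploit: only the first access to a page pays for the core-storage read.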

A scheme has been described for modeling the various memory access modes and using them to perform useful optimizations in the context of behavioral synthesis. The main observation is that the memory access patterns of the input behavior can often exploit the page mode (or another specialized access mode) of the DRAM.

The key idea is to represent these specialized access modes using graph primitives that model individual DRAM access operations such as row decode, column decode, and precharge; each DRAM family's specialized access modes are then represented as a composition of these graph primitives that fits the desired access-mode protocol.
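As a rough illustration of composing primitives, the sketch below builds two composite graphs, a normal single-word read and a page-mode read of several words, and computes their length with an as-soon-as-possible schedule. The primitive latencies in `LATENCY` and all function names are hypothetical; a real model would take its timing from the DRAM datasheet and would schedule these nodes together with the rest of the behavior.

```python
# Hypothetical per-primitive latencies in cycles (not from any datasheet).
LATENCY = {"row_decode": 3, "column_decode": 2, "precharge": 3}


def normal_read():
    """Single-word read as a graph: row decode -> column decode -> precharge.

    Each node is (name, primitive, list of predecessor names).
    """
    return [("rd", "row_decode", []),
            ("cd", "column_decode", ["rd"]),
            ("pc", "precharge", ["cd"])]


def page_mode_read(n_words):
    """Page-mode read of n_words in one row: one row decode, n column decodes, one precharge."""
    nodes = [("rd", "row_decode", [])]
    prev = "rd"
    for i in range(n_words):
        name = f"cd{i}"
        nodes.append((name, "column_decode", [prev]))
        prev = name
    nodes.append(("pc", "precharge", [prev]))
    return nodes


def schedule_length(nodes):
    """ASAP schedule of a DAG listed in topological order; returns the finish time."""
    finish = {}
    for name, prim, preds in nodes:
        start = max((finish[p] for p in preds), default=0)
        finish[name] = start + LATENCY[prim]
    return max(finish.values())


print(4 * schedule_length(normal_read()))   # 32: four separate reads
print(schedule_length(page_mode_read(4)))   # 14: one page-mode read of four words
```

Under these assumed latencies, the page-mode composition amortizes a single row decode and precharge over all four column accesses.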

These composite graphs can then be scheduled together with the rest of the application behavior, both in the context of synthesis and for code compilation. Some additional DRAM-specific optimizations include:

1. Read-modify-write (R-M-W) optimization that takes advantage of the R-M-W mode in modern DRAMs, which provides support for a more efficient realization of the common case in which a specific address is read, the data are involved in some computation, and then the output is written back to the same location.

2. Hoisting, whereby the row-decode node is scheduled ahead of a conditional node if the first memory access in both branches is on the same page.

3. Loop unrolling to support page-mode accesses, so that successive accesses to the same page can be grouped together.
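The hoisting condition in item 2 reduces to a page comparison. The sketch below assumes the address of the first memory access in each branch is known at compile time and that the page is given by the high-order address bits; `COL_BITS` and the function names are hypothetical.

```python
COL_BITS = 10  # hypothetical page size: 2**10 words per page


def row_of(addr, col_bits=COL_BITS):
    """Row (page) address = high-order bits of the word address."""
    return addr >> col_bits


def hoistable_row(then_first, else_first, col_bits=COL_BITS):
    """Return the row whose row decode may be hoisted above the conditional, or None.

    then_first / else_first are the addresses of the first memory access in
    each branch, assumed to be known at compile time.
    """
    row_then = row_of(then_first, col_bits)
    row_else = row_of(else_first, col_bits)
    return row_then if row_then == row_else else None


print(hoistable_row(0x400, 0x7FF))  # 1: both branches start in page 1
print(hoistable_row(0x3FF, 0x400))  # None: pages 0 and 1 differ
```

When a row is returned, the row-decode node can be scheduled before the branch resolves, overlapping it with the condition evaluation.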

Synchronous DRAM
As DRAM architectures evolve, new challenges are presented to the automatic synthesis of embedded systems based on these memories. Synchronous DRAM represents an architectural advance that presents another optimization opportunity: multiple memory banks.

The core memory storage is divided into multiple banks, each with its own independent page buffer, so that two separate memory pages can be simultaneously active in the multiple page buffers.
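The benefit of independent per-bank page buffers can be seen by extending a single-buffer model to several banks. This is a sketch under assumptions: the `[ row | bank | column ]` address layout, `COL_BITS`, and `BANK_BITS` are all hypothetical, and real SDRAM controllers may map addresses differently.

```python
COL_BITS = 8   # hypothetical: 256 words per page
BANK_BITS = 1  # hypothetical: 2 banks


def decode(addr):
    """Assumed address layout, high to low bits: [ row | bank | column ]."""
    col = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row = addr >> (COL_BITS + BANK_BITS)
    return bank, row, col


def count_hits(trace):
    """Each bank has its own page buffer (open row); return (hits, misses)."""
    open_rows = [None] * (1 << BANK_BITS)
    hits = misses = 0
    for addr in trace:
        bank, row, _ = decode(addr)
        if open_rows[bank] == row:
            hits += 1
        else:
            misses += 1
            open_rows[bank] = row
    return hits, misses


def interleave(base_a, base_b, n):
    """Alternate n accesses to each of two pages: a, b, a+1, b+1, ..."""
    return [base + i for i in range(n) for base in (base_a, base_b)]


# Page A = bank 0, row 0 (address 0); page B = bank 1, row 0 (address 256);
# page C = bank 0, row 1 (address 512).
print(count_hits(interleave(0, 256, 4)))  # (6, 2): A and B stay open in separate banks
print(count_hits(interleave(0, 512, 4)))  # (0, 8): A and C compete for bank 0's buffer
```

Alternating between two pages is cheap only when the pages live in different banks, which is why bank assignment matters for SDRAM-based designs.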

Modeling the access modes of synchronous DRAMs raises a number of problems; the modes that must be modeled include:

  • burst mode read/write: fast successive accesses to data in the same page
  • interleaved row read/write modes: alternating burst accesses between banks
  • interleaved column access: alternating burst accesses between two chosen rows in different banks

Memory bank assignment can be performed by creating an interference graph between arrays and partitioning it into subgraphs so that data in each part are assigned to a different memory bank. The bank assignment algorithm is related to techniques that address memory assignment for DSP processors such as the Motorola 56000, which has a dual-bank internal memory/register file.
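One simple way to partition an interference graph among banks is a greedy heuristic, sketched below. This is an illustrative substitute, not the published algorithm: each array (highest degree first) goes into the bank holding the fewest arrays it interferes with. The function names and the notion of "interfering" pairs (for example, operands of the same operation) are assumptions.

```python
from collections import defaultdict


def assign_banks(arrays, interferes, num_banks=2):
    """Greedy bank assignment over an interference graph.

    arrays: list of array names.
    interferes: (a, b) pairs that are accessed together.
    Each array is placed in the bank with the fewest already-assigned
    interfering neighbors; higher-degree arrays are placed first.
    """
    adj = defaultdict(set)
    for a, b in interferes:
        adj[a].add(b)
        adj[b].add(a)
    bank = {}
    for arr in sorted(arrays, key=lambda x: -len(adj[x])):
        cost = [sum(1 for n in adj[arr] if bank.get(n) == k) for k in range(num_banks)]
        bank[arr] = cost.index(min(cost))
    return bank


def conflicts(bank, interferes):
    """Number of interfering pairs that ended up in the same bank."""
    return sum(1 for a, b in interferes if bank[a] == bank[b])


pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
banks = assign_banks(["A", "B", "C", "D"], pairs)
print(banks)                    # {'A': 0, 'B': 1, 'C': 0, 'D': 1}
print(conflicts(banks, pairs))  # 0
```

A greedy pass is fast but not optimal in general; graphs that are not bipartite will leave some conflicts with two banks.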

In that setting, the bank assignment problem targets scalar variables and is solved in conjunction with register allocation by building a constraint graph that models the possible data transfers between registers and memories, followed by a simulated annealing step. Note that such techniques can be particularly suitable for MPSoCs in which a single reduced instruction set computing (RISC)-like core manages multiple DSP-like slave processors.
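The simulated annealing step can be illustrated with a much simpler cost model than the constraint graph described above. In this hedged sketch, the hypothetical cost is just the number of variable pairs that one instruction would like to access in parallel but that share a bank; the cooling schedule, step count, and function name are all assumptions.

```python
import math
import random


def anneal_banks(variables, parallel_pairs, num_banks=2, steps=2000, t0=2.0, seed=0):
    """Simulated-annealing bank assignment for scalar variables (simplified sketch).

    Cost = number of pairs wanted in parallel that share a bank (their
    accesses would have to be serialized). Returns (best assignment, best cost).
    """
    rng = random.Random(seed)
    bank = {v: rng.randrange(num_banks) for v in variables}

    def cost(b):
        return sum(1 for x, y in parallel_pairs if b[x] == b[y])

    current = cost(bank)
    best, best_cost = dict(bank), current
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling schedule
        v = rng.choice(variables)
        old = bank[v]
        bank[v] = rng.randrange(num_banks)      # propose moving one variable
        delta = cost(bank) - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current += delta                    # accept the move
            if current < best_cost:
                best, best_cost = dict(bank), current
        else:
            bank[v] = old                       # reject the move
    return best, best_cost


variables = ["a", "b", "c", "d"]
pairs = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
best, best_cost = anneal_banks(variables, pairs)
print(best, best_cost)
```

Accepting occasional uphill moves lets the search escape plateaus that a purely greedy assignment can get stuck on.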

Chang and Lin approach the SDRAM bank assignment problem by first constructing an array distance table. This table stores the distance in the DFG (dataflow graph) between each pair of arrays in the specification. A short distance indicates a strong correlation; for instance, the two arrays might be inputs of the same operation and hence would benefit from being assigned to separate banks. The bank assignment is finally performed by considering array pairs in increasing order of their array distance.
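A simplified two-bank version of this idea is sketched below. The assumptions are loud ones: distance is taken as the shortest path in the DFG with edge direction ignored, and the assignment rule (separate each pair if at least one array is still unplaced) is a hypothetical simplification, not necessarily Chang and Lin's exact procedure.

```python
from collections import defaultdict, deque


def distance_table(edges, arrays):
    """Shortest-path distance (DFG edges, direction ignored) between each pair of arrays."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def bfs(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    table = {}
    for a in arrays:
        d = bfs(a)
        for b in arrays:
            if a < b and b in d:
                table[(a, b)] = d[b]
    return table


def assign_two_banks(arrays, table):
    """Visit pairs in increasing distance; put each pair in different banks when still possible."""
    bank = {}
    for (a, b), _ in sorted(table.items(), key=lambda kv: kv[1]):
        if a not in bank and b not in bank:
            bank[a], bank[b] = 0, 1
        elif a in bank and b not in bank:
            bank[b] = 1 - bank[a]
        elif b in bank and a not in bank:
            bank[a] = 1 - bank[b]
        # both already placed: closer pairs were given priority
    for a in arrays:
        bank.setdefault(a, 0)  # arrays unrelated to any other array
    return bank


# DFG: mul1 = A * B; add1 = mul1 + C
edges = [("A", "mul1"), ("B", "mul1"), ("mul1", "add1"), ("C", "add1")]
table = distance_table(edges, ["A", "B", "C"])
print(table)                                     # {('A', 'B'): 2, ('A', 'C'): 3, ('B', 'C'): 3}
print(assign_two_banks(["A", "B", "C"], table))  # {'A': 0, 'B': 1, 'C': 1}
```

Arrays A and B, the two operands of `mul1`, have the shortest distance and so are separated first.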

Whereas the previous discussion has focused primarily on hardware synthesis, similar ideas have been employed in compilers to aggressively exploit the memory access protocols. In the traditional approach to compiler/architecture co-design, the memory subsystem was separated from the microarchitecture; the compiler typically dealt with memory operations using the abstractions of memory loads and stores, with the architecture (e.g., the memory controller) providing the interface to the (typically not yet known) family of DRAMs and other memory devices that would deliver the desired data.

However, in an embedded system, the system architect has advance knowledge of the specific memories (e.g., DRAMs) used; thus we can employ memory-aware compilation techniques that exploit the specific access modes in the DRAM protocol to perform better code scheduling. Similarly, the code scheduler can employ global scheduling techniques that use knowledge of the memory access protocols to hide potential memory latencies, in effect improving the ability of the memory controller to boost system performance.
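A very small example of protocol-aware scheduling is reordering memory operations so that accesses to the same DRAM page are issued back to back. The sketch assumes the loads are mutually independent (so any order is legal) and that the page is given by the high-order address bits; `COL_BITS` and the function names are hypothetical, and real schedulers must also respect data dependences and other resource constraints.

```python
COL_BITS = 10  # hypothetical page size: 2**10 words per page


def page_of(addr):
    """Page (row) address = high-order bits of the word address."""
    return addr >> COL_BITS


def page_misses(trace):
    """Row activations needed when a single page buffer is kept open."""
    misses, open_page = 0, None
    for addr in trace:
        if page_of(addr) != open_page:
            misses += 1
            open_page = page_of(addr)
    return misses


def group_by_page(independent_loads):
    """Reorder loads (assumed independent) so same-page loads are adjacent.

    Pages keep the order of their first appearance; loads within a page keep
    their original relative order.
    """
    order, groups = [], {}
    for addr in independent_loads:
        p = page_of(addr)
        if p not in groups:
            groups[p] = []
            order.append(p)
        groups[p].append(addr)
    return [addr for p in order for addr in groups[p]]


loads = [0, 1024, 1, 1025, 2, 1026]
grouped = group_by_page(loads)
print(page_misses(loads))    # 6
print(grouped)               # [0, 1, 2, 1024, 1025, 1026]
print(page_misses(grouped))  # 2
```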

Special Purpose Memories
In addition to general memories such as caches, and memories specific to embedded systems such as scratch-pad memories (SPMs), there exist various other types of custom memories that implement specific access protocols. These include memories implementing the last-in, first-out (LIFO) protocol, memories implementing the queue or first-in, first-out (FIFO) protocol, and content-addressable memories (CAMs). Typically, CAMs are used in search applications, LIFOs in microcontrollers, and FIFOs in network chips.
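To make the CAM access protocol concrete, the sketch below models a simple binary CAM in software: it is read by content rather than by address, and a search returns every matching address. The class and method names are hypothetical; real CAM hardware compares the key against all entries in parallel, whereas this model loops over them.

```python
class BinaryCAM:
    """Software model of a content-addressable memory (lookup by content, not address)."""

    def __init__(self, size):
        self.entries = [None] * size  # None marks an invalid (empty) entry

    def write(self, address, word):
        """Store a word at a given address, as in a conventional memory."""
        self.entries[address] = word

    def search(self, key):
        """Return every address whose stored word equals key (the match lines)."""
        return [addr for addr, word in enumerate(self.entries) if word == key]


cam = BinaryCAM(4)
cam.write(0, 0xAB)
cam.write(2, 0xCD)
cam.write(3, 0xAB)
print(cam.search(0xAB))  # [0, 3]
print(cam.search(0xEE))  # []
```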

Next in Part 2: Customization of memory architectures

This series of articles is based on copyrighted material submitted by Nikil Dutt and Mahmut Kandemir to "Multiprocessor Systems-on-Chips," edited by Wayne Wolf and Ahmed Amine Jerraya. It is used with the permission of the publisher, Morgan Kaufmann, an imprint of Elsevier. The book can be purchased online.

Mahmut Kandemir is an assistant professor in the Computer Science and Engineering Department at Pennsylvania State University. Nikil Dutt is a professor of computer science for Embedded Computer Systems at the University of California, Irvine.
