CMP EMBEDDED.COM


Providing memory system and compiler support for MPSoC designs: Part 2
Customization of memory architectures



Embedded.com

Split spatial and temporal caches
Various specialized memory structures proposed over the years could be candidates for MPSoC-based embedded systems. One such concept is split spatial/temporal caches.

Variables in real-life applications present a wide variety of access patterns and locality types. For instance, scalars such as indexes usually present high temporal and moderate spatial locality. Vectors with a small stride present high spatial locality, whereas vectors with a large stride present low spatial locality and may or may not have temporal locality.

Several approaches have proposed splitting a cache into a spatial cache and a temporal cache that store data structures with high spatial and high temporal locality, respectively. These approaches rely on a dynamic prediction mechanism, based on a history buffer, to route the data to either the spatial or the temporal cache.

In an embedded system context, the approach of Grun et al. uses a similar split-cache architecture but allocates the variables statically to the different local memory modules, avoiding the power and area overhead of the dynamic prediction mechanism.
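To make the idea of static allocation concrete, here is a minimal sketch in Python. It is not Grun et al.'s algorithm: the `VarProfile` fields, the thresholds, and the third "uncached" destination are all hypothetical choices made for illustration. It only shows how a compile-time profile of stride and reuse could decide, once and for all, which local memory module serves a variable.

```python
from dataclasses import dataclass

@dataclass
class VarProfile:
    name: str
    stride_bytes: int   # distance between consecutive accesses (0 for a scalar)
    reuse_count: int    # average number of times each element is re-accessed

def assign_module(v, small_stride=8, min_reuse=2):
    """Toy static routing rule; thresholds are illustrative assumptions."""
    if v.reuse_count >= min_reuse:
        return "temporal"            # heavily reused data, e.g. loop indexes
    if 0 < v.stride_bytes <= small_stride:
        return "spatial"             # small-stride vectors benefit from long lines
    return "uncached"                # hypothetical fallback: neither locality type

profiles = [
    VarProfile("i", stride_bytes=0, reuse_count=100),    # scalar index
    VarProfile("a", stride_bytes=4, reuse_count=1),      # small-stride vector
    VarProfile("b", stride_bytes=256, reuse_count=1),    # large-stride vector
]
allocation = {v.name: assign_module(v) for v in profiles}
```

Because the decision is taken before the program runs, no history buffer or prediction logic is needed at run time, which is the source of the power and area savings mentioned above.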

By targeting the specific locality type of each variable, this approach makes better use of the main memory bandwidth, because useless fetches caused by a locality mismatch are avoided. For instance, if a variable with low spatial locality is serviced by a cache with a large line size, a large number of the values read from the main memory will never be used.
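The cost of a locality mismatch can be estimated with simple arithmetic. The sketch below is a simplified model, assuming a single strided walk over an array, no reuse of fetched lines, and elements that never straddle a line boundary. The function name and parameters are illustrative and do not come from the original work.

```python
def useful_fraction(stride_bytes, elem_bytes, line_bytes):
    """Fraction of each fetched cache line that a strided walk actually uses.

    Simplifying assumptions: no temporal reuse, stride >= element size,
    and elements do not straddle line boundaries.
    """
    if stride_bytes < elem_bytes or elem_bytes <= 0 or line_bytes < elem_bytes:
        raise ValueError("expected 0 < elem_bytes <= stride_bytes and elem_bytes <= line_bytes")
    if stride_bytes >= line_bytes:
        # Each fetched line contributes exactly one element.
        return elem_bytes / line_bytes
    # About line_bytes / stride_bytes elements are touched per line.
    return elem_bytes / stride_bytes

unit_stride = useful_fraction(stride_bytes=4, elem_bytes=4, line_bytes=32)    # 1.0
big_stride = useful_fraction(stride_bytes=64, elem_bytes=4, line_bytes=32)    # 0.125
short_line = useful_fraction(stride_bytes=64, elem_bytes=4, line_bytes=8)     # 0.5
```

Under these assumptions, a 64-byte stride over 4-byte elements uses only one eighth of every 32-byte line, so seven eighths of the bandwidth spent on that variable is wasted. Serving the same variable from a module with 8-byte lines raises the useful fraction to one half.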

The approach described by Grun et al. shows that the memory bandwidth and memory power consumption could be reduced significantly. Note that, in an MPSoC-based architecture, each processor may demand a customized cache (or SPM) for the best behavior.

Reconfigurability and Challenges
In MPSoC-based embedded systems, modifying a given code to improve data locality is one way of enhancing performance. An alternative approach is to reconfigure the cache (or SPM) architecture dynamically according to the application at hand.

That is, it might be useful to have a morphable (reconfigurable) memory/cache system that adapts itself to the application's requirements (from both performance and energy/power consumption angles) dynamically. In fact, an optimizing compiler can analyze a given application, divide its code into regions, and, for each region, select an optimum cache configuration for each processor. However, there are several key issues that need to be addressed in translating the promise of reconfigurable cache architectures into practice:

  • Architectural and circuit mechanisms for efficient and fast reconfiguration are essential. This issue has been the focus of several recent efforts that provide architectural mechanisms to support dynamic reconfiguration of the cache and memory parameters. It is important to note that the support for reconfiguration can itself increase the access time and energy cost of a reconfigurable cache relative to a non-reconfigurable cache with identical parameters, such as cache size (capacity) and associativity. This cost depends on the flexibility supported by the reconfigurable cache and the implementation mechanism.
  • Control mechanisms for deciding when to reconfigure these caches are required. The reconfiguration can be performed at various levels of code granularity such as entire application, a single subroutine, a nested loop, or some specific segment of the application. The chosen level of granularity will depend on the overhead associated with reconfiguration and the potential benefits due to better customization with more frequent cache reconfiguration.
  • Mechanisms to determine the optimal configuration of the cache are required. The desired cache configuration is a function of the application behavior and can be determined either statically using compile-time estimates or dynamically using run-time behavior. For example, the cache miss rates can be used as a metric to resize the caches dynamically.
  • Techniques for minimizing the overhead of data invalidation across different reconfiguration phases are essential. For example, if the associativity of a cache is changed, the function for mapping a memory location onto the cache changes. Consequently, data that could otherwise have been reused across the reconfiguration may need to be invalidated after it.
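The invalidation issue in the last point can be illustrated with a small model. The sketch below assumes conventional modulo set indexing and a fixed capacity, so that changing the associativity changes the number of sets. The function names are hypothetical, and real reconfigurable caches may use other schemes that preserve more data.

```python
def set_index(addr, cache_bytes, line_bytes, ways):
    """Set index under plain modulo indexing (an assumed, textbook mapping)."""
    num_sets = cache_bytes // (line_bytes * ways)
    return (addr // line_bytes) % num_sets

def must_invalidate(addrs, cache_bytes, line_bytes, old_ways, new_ways):
    """Addresses whose set index differs between the two configurations."""
    return [a for a in addrs
            if set_index(a, cache_bytes, line_bytes, old_ways)
            != set_index(a, cache_bytes, line_bytes, new_ways)]

# 1 KB cache, 32-byte lines: 32 sets when direct-mapped, 16 sets when 2-way.
cached = [0x000, 0x020, 0x200, 0x220]
stale = must_invalidate(cached, cache_bytes=1024, line_bytes=32,
                        old_ways=1, new_ways=2)
```

In this model the lines at 0x200 and 0x220 sit in sets 16 and 17 of the direct-mapped configuration but map to sets 0 and 1 after the switch to two ways, so a lookup in the new configuration would not find them where they were stored. The lines at 0x000 and 0x020 keep their set index and could, in principle, survive the reconfiguration.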

Kadayif et al. focus on a morphable cache architecture and array-dominated embedded codes (which are suitable for an MPSoC-based environment) and show the potential benefits that can be obtained from such a system. They conduct a limit study of the potential benefits (from energy and performance perspectives) of going from one configuration to another.

The granularity that they focus on is a nested loop, which is the natural computation/access pattern boundary for array-dominated applications from the scientific and image/video processing domains. Using a set of array-dominated codes, they investigate what the best cache configuration is for each nested loop under different objective functions (optimization criteria) such as cache energy, memory energy, cache misses, performance (execution time), overall energy, and energy-delay (energy-execution time) product.

In addition to morphing conventional cache parameters, they also consider reconfigurability of energy-aware features found in some cache architectures such as block buffering. Their results indicate that there are potential performance and energy benefits in adopting a morphable cache subsystem.

The results also show that, depending on the optimization objective targeted, one may select an entirely different cache configuration. For example, minimizing cache energy calls for a different cache configuration for each nest than minimizing the overall memory system energy under a performance constraint.
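A small sketch shows how the choice of objective can change the selected configuration for a single loop nest. All numbers below are invented for illustration and are not results from Kadayif et al. The configuration names, the `Estimate` record, and the `best_config` helper are likewise hypothetical.

```python
from collections import namedtuple

# Hypothetical per-nest estimates (arbitrary units), not measured data.
Estimate = namedtuple("Estimate", "cache_energy memory_energy cycles")

nest_estimates = {
    "4KB-1way":  Estimate(cache_energy=1.0, memory_energy=9.0, cycles=140),
    "8KB-2way":  Estimate(cache_energy=2.5, memory_energy=4.0, cycles=110),
    "16KB-4way": Estimate(cache_energy=5.0, memory_energy=2.5, cycles=90),
}

def best_config(estimates, objective, max_cycles=None):
    """Pick the configuration minimizing `objective`, optionally under a cycle budget."""
    feasible = {name: e for name, e in estimates.items()
                if max_cycles is None or e.cycles <= max_cycles}
    if not feasible:
        raise ValueError("no configuration meets the performance constraint")
    return min(feasible, key=lambda name: objective(feasible[name]))

by_cache_energy = best_config(nest_estimates, lambda e: e.cache_energy)
by_total_energy = best_config(nest_estimates,
                              lambda e: e.cache_energy + e.memory_energy,
                              max_cycles=120)
by_energy_delay = best_config(nest_estimates,
                              lambda e: (e.cache_energy + e.memory_energy) * e.cycles)
```

With these made-up estimates, the smallest cache wins on cache energy alone, the mid-sized cache wins on overall memory system energy under the cycle budget, and the largest cache wins on energy-delay product. A compiler driving a morphable cache would repeat this selection for every nest, using its own estimates.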

To read Part 1, go to Types of Memory Architectures.
Next in Part 3: Compiler Support

This series of articles is based on copyrighted material submitted by Nikil Dutt and Mahmut Kandemir to "Multiprocessor Systems-On-Chips," edited by Wayne Wolf and Ahmed Amine Jerraya. It is used with the permission of the publisher, Morgan Kaufmann, an imprint of Elsevier. The book can be purchased on-line.

Mahmut Kandemir is an assistant professor in the Computer Science and Engineering Department at Pennsylvania State University. Nikil Dutt is a professor of computer science for Embedded Computer Systems at the University of California, Irvine.
