Split spatial and temporal caches
Various specialized memory structures proposed over the years could be
candidates for MPSoC-based embedded systems. One such concept is split
spatial/temporal caches.
Variables in real-life applications present a wide variety of access
patterns and locality types. For instance, scalars, such as indexes,
usually present high temporal and moderate spatial locality, whereas
vectors with small stride present high spatial locality, and vectors
with large stride present low spatial locality and may or may not have
temporal locality.
Several approaches have proposed
splitting a cache into a spatial cache and a temporal cache that store
data structures with high spatial and high temporal locality,
respectively. These approaches rely on a dynamic prediction mechanism
to route the data to either the spatial or the temporal caches, based
on a history buffer.
In an embedded system context, the approach of Grun
et al. uses a similar split-cache architecture but allocates the
variables statically to the different local memory modules, avoiding
the power and area overhead of the dynamic prediction mechanism.
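A static allocation of this kind can be pictured with a small sketch. The code below is not the algorithm of Grun et al.; it is a minimal illustration that assumes each variable has been profiled for its typical stride and reuse count. The names VarProfile and assign_module, the thresholds, and the "bypass" option for variables with neither kind of locality are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VarProfile:
    name: str
    stride_words: int  # typical distance between consecutive accesses (0 for a scalar)
    reuse_count: int   # average number of times each element is accessed again


def assign_module(var, line_words=8, reuse_threshold=2):
    """Pick a local memory module for one variable (illustrative heuristic only)."""
    # A nonzero stride smaller than the line size means neighbouring words
    # in a fetched line are likely to be used: spatial locality.
    if 0 < var.stride_words < line_words:
        return "spatial"
    # Otherwise, frequent re-access of the same element means temporal locality.
    if var.reuse_count >= reuse_threshold:
        return "temporal"
    # Hypothetical fallback for variables with neither kind of locality.
    return "bypass"
```

Under these assumptions, a loop index (stride 0, heavily reused) goes to the temporal cache, a unit-stride vector goes to the spatial cache, and a large-stride vector that is touched only once is kept out of both.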
By targeting the specific locality type of each variable, this approach
makes better use of the main memory bandwidth, because the useless
fetches caused by locality mismatch are avoided. For
instance, if a variable with low spatial locality is serviced by a
cache with a large line size, a large number of the values read from
the main memory will never be used.
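The cost of such a mismatch can be estimated with a few lines of arithmetic. The sketch below is a simplified model, not a cache simulator: it assumes a single strided pass over a vector that starts on a line boundary, with every touched line fetched exactly once and no reuse. The function name and parameters are chosen for illustration.

```python
def useful_fetch_fraction(num_accesses, stride_words, line_words):
    """Fraction of the words fetched from main memory that are actually used.

    Simplifying assumptions: one strided pass, the vector starts on a line
    boundary, and each touched cache line is fetched exactly once.
    """
    touched_words = {i * stride_words for i in range(num_accesses)}
    touched_lines = {word // line_words for word in touched_words}
    fetched_words = len(touched_lines) * line_words
    return len(touched_words) / fetched_words
```

With 8-word lines, a unit-stride walk uses every fetched word, while a stride of 16 words uses only one word in each line that is brought in, so seven-eighths of the transferred data is wasted.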
The approach described by Grun et al. shows that the memory bandwidth
and memory power consumption can be reduced significantly. Note that,
in an MPSoC-based architecture, each processor may demand a customized
cache (or SPM) for the best behavior.
Reconfigurability and Challenges
In MPSoC-based embedded systems, modifying a given code to improve data
locality is one way of enhancing performance. An alternative approach
is to reconfigure the cache (or SPM) architecture dynamically according
to the application at hand.
That is, it might be useful to have a morphable (reconfigurable)
memory/cache system that adapts itself to the application's
requirements (from both performance and energy/power consumption
angles) dynamically. In fact, an optimizing compiler can analyze a
given application, divide its code into regions, and, for each region,
select an optimum cache configuration for each processor. However,
there are several key issues that need to be addressed in translating
the promise of reconfigurable cache architectures into practice:
- Architectural and circuit mechanisms for efficient and fast
reconfiguration are essential. This issue has been the focus of several
recent efforts that provide architectural mechanisms to support dynamic
reconfiguration of the cache and memory parameters. It is important to
note that the support for reconfiguration can also increase the access
and energy costs of a reconfigurable cache relative to those of a
non-reconfigurable cache with identical cache parameters such as cache
size (capacity) and associativity. This cost depends on the flexibility
supported by the reconfigurable cache and the implementation mechanism.
- Control mechanisms for deciding when to reconfigure these caches are
required. The reconfiguration can be performed at various levels of code
granularity, such as an entire application, a single subroutine, a nested
loop, or some specific segment of the application. The chosen level of
granularity will depend on the overhead associated with reconfiguration
and the potential benefits due to better customization with more
frequent cache reconfiguration.
- Mechanisms to determine the optimal configuration of the cache are
required.
The desired cache configuration is a function of the application
behavior and can be determined either statically using compile-time
estimates or dynamically using run-time behavior. For example, the
cache miss rates can be used as a metric to resize the caches
dynamically.
- Techniques for minimizing the overhead of data invalidation across
different reconfiguration phases are essential. For example, if the
associativity of a cache is changed, the function for mapping a
memory location onto the cache changes. Consequently, data that could
otherwise have been reused across the reconfiguration needs to be
invalidated after reconfiguration.
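The invalidation issue in the last item can be made concrete with a small model. The sketch below assumes a conventional set-associative mapping in which the set index is the block address modulo the number of sets, and a fixed total capacity, so that changing the associativity changes the number of sets. The function names are illustrative, and a real reconfigurable cache may place lines differently.

```python
def set_index(address, cache_bytes, line_bytes, ways):
    """Set index under a simple 'block address modulo number of sets' mapping."""
    num_sets = cache_bytes // (line_bytes * ways)
    return (address // line_bytes) % num_sets


def remapped_lines(resident_addresses, cache_bytes, line_bytes, old_ways, new_ways):
    """Resident addresses whose set index changes when associativity changes.

    In this simplified model, these are the lines that can no longer be
    found where they were placed and would have to be invalidated.
    """
    return [
        addr
        for addr in resident_addresses
        if set_index(addr, cache_bytes, line_bytes, old_ways)
        != set_index(addr, cache_bytes, line_bytes, new_ways)
    ]
```

For a 1-KB cache with 32-byte lines, going from direct-mapped (32 sets) to 2-way (16 sets) moves the line at address 512 from set 16 to set 0, whereas the lines at addresses 0 and 32 keep their set index.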
Kadayif et al. focus on a morphable cache architecture and
array-dominated embedded codes (which are suitable for an MPSoC-based
environment) and show the potential benefits that can be obtained from
such a system. They conduct a limit study for potential benefits (from
energy and performance perspectives) for going from one configuration
to another.
The granularity that they focus on is a nested loop, which is the
natural computation/access pattern boundary for array-dominated
applications from the scientific and image/video processing
domains. Using a set of array-dominated codes, they investigate what the
best cache configuration is for each nested loop under different
objective functions (optimization criteria) such as cache energy,
memory energy, cache misses, performance (execution time), overall
energy, and energy-delay (energy-execution time) product.
In addition to morphing conventional cache parameters, they also
consider reconfigurability of energy-aware features found in some cache
architectures such as block buffering.
Their results indicate that there are potential performance and energy
benefits in adopting a morphable cache subsystem.
The results also show that, depending on the optimization objective
targeted, one may select an entirely different cache configuration. For
example, minimizing cache memory energy requires, for each nest, a cache
configuration that is different from the one selected when the objective
is to minimize the overall memory system energy under a performance
constraint.
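The dependence on the objective function can be illustrated with a short sketch of per-nest configuration selection. The numbers below are made up for illustration and are not results from Kadayif et al.; the Estimate fields, the configuration labels, and the best_config helper are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    config: str           # hypothetical configuration label
    cache_energy: float   # arbitrary units
    memory_energy: float  # off-chip memory energy, arbitrary units
    cycles: int           # estimated execution time of the nest


def best_config(estimates, objective, max_cycles=None):
    """Return the configuration that minimizes `objective`, optionally under
    a performance constraint given as a cycle budget."""
    feasible = [e for e in estimates if max_cycles is None or e.cycles <= max_cycles]
    if not feasible:
        raise ValueError("no configuration meets the performance constraint")
    return min(feasible, key=objective).config


# Made-up estimates for one loop nest under three candidate configurations.
nest = [
    Estimate("1KB/1-way", cache_energy=10.0, memory_energy=90.0, cycles=1500),
    Estimate("4KB/2-way", cache_energy=25.0, memory_energy=40.0, cycles=1100),
    Estimate("16KB/4-way", cache_energy=60.0, memory_energy=15.0, cycles=900),
]

cache_only = best_config(nest, lambda e: e.cache_energy)
overall = best_config(nest, lambda e: e.cache_energy + e.memory_energy, max_cycles=1200)
edp = best_config(nest, lambda e: (e.cache_energy + e.memory_energy) * e.cycles)
```

With these illustrative numbers, the three objectives pick three different configurations for the same nest: the smallest cache minimizes cache energy, the mid-sized cache minimizes overall energy within the cycle budget, and the largest cache minimizes the energy-delay product.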
To read Part 1, go to Types of Memory Architectures.
Next in Part 3: Compiler Support
This series of articles is based on copyrighted material submitted by
Nikil Dutt and Mahmut Kandemir to "Multiprocessor Systems-on-Chips,"
edited by Wayne Wolf and Ahmed Amine Jerraya. It is used with the
permission of the publisher, Morgan Kaufmann, an imprint of Elsevier.
The book can be purchased on-line.
Mahmut Kandemir is an assistant professor in the Computer Science and
Engineering Department at Pennsylvania State University. Nikil Dutt is a professor of
computer science for Embedded Computer Systems at the University of
California, Irvine.