Optimizing Memory Space Utilization
Another important issue in an MPSoC-based environment is the management
of limited memory space. It should be observed that optimizing for data
locality does not mean that the optimized application will execute
using fewer data items. To reduce data space requirements of a given
application, one may need to consider lifetimes of program data
structures/variables. In particular, if the lifetimes of two different
data items do not overlap, they can potentially share the same memory
location, thereby reducing overall memory space demand.
For array-based applications, in a given loop nest, one can define
the lifetime of an array element
as the difference (in loop iterations)
between the time it is first assigned (written) and the time it is last
used (read). For a given array index a (which might be
multidimensional), the start of its lifetime is referred to as S(a),
whereas the end of its lifetime is denoted using E(a) - both in terms
of loop iterations. Using these definitions, the lifetime vector for
this array element can be given as s = E(a) - S(a), where "-" denotes
vector subtraction.
Note that the lifetime of a is expressed as a vector, as in general
there might be multiple loops in the next, and expressing lifetime as a
vector allows the compiler to measure the impact of loop
transformations on it. As an example, if an array element (that is
accessed in a nest with two loops) is produced in iteration (1 3)T and
consumed (i.e., last-read) in iteration (4 5)T, its lifetime
vector is s = (4 5)T - (1 3)T = (3 2)T.
It should be noted that before S(a) and after E(a), the memory
location
allocated to this array element could be used for storing another array
element (which belongs to the same array or to a different array).
Obviously, the shorter the difference between E(a) and S(a) the better,
as it leaves more room for other array elements.
However, it is to be noted that reducing lifetimes of array elements
does not guarantee memory location reuse. This is because in order to
have memory location reuse, the lifetimes should be shortened such that
the different elements can share the same memory location. As a result,
it is important for the compiler to make sure that this happens (when
lifetimes are reduced). That is, the compiler can check whether the
lifetime reduction will really be beneficial; if not, it may choose not
to apply it.
Along these lines, Lefebvre and
Feautrier propose a method for
automatic storage management for loop-based parallel programs with
affine references. They show that parallelization by total data
expansion can be replaced by partial data expansion without distorting
the degree of parallelism.
Grun et al. have
presented memory size
estimation techniques for multimedia applications. Strout et al.
introduce the universal occupancy vector that allows
schedule-independent storage space optimization. Their objective is to
reduce data space requirements of a given code without preventing
subsequent locality optimizations from being performed. Wilde and
Rajopadhye have investigated memory reuse by performing
static
lifetime
computations of variables using a polyhedral model. Lim et al.
have defined an optimization method called the array contraction.
The work done by Song et al. and Catthoor and his colleagues shows
how loop fusion can be used for minimizing data
space
requirements. Along similar lines, Fraboulet
et al. has also presented
an algorithm to reduce the use of temporary arrays by loop fusion. They
demonstrate that although the complexity of their algorithm is not
polynomial, it is highly efficient in practice. Thies et al. has
described a unified general framework for scheduling and storage
optimization.
One of the problems that they address, namely, "choosing
a store for a given schedule" tries to assign storage
locations to array elements with storage size reduction in mind. It
should also be observed that reducing memory space requirements alone
might help improve data locality (cache behavior). This is because in
array-intensive applications a significant portion of data cache misses
(called capacity misses) occurs because large datasets are accessed in
a short period of time (e.g., within a loop). By reducing the number of
array elements, one can also improve cache behavior.
We also need to point out an important tradeoff that needs to be
made for an MPSoC-based execution environment. In general, more memory
space means more parallelism, as discussed in studies such as those of
Tu and Padua and Barthou et al.
This is
because a large
percentage of data dependences in a given application are
anti-dependences and output-dependences, which occur mainly due to
reuse of the same memory location for different data items.
By
allocating a separate memory location for each data item, one can
eliminate these dependences. However, as we have discussed above,
reusing the same memory location for different data items is an
important optimization as far as memory space utilization is concerned.
Therefore, an optimizing compiler should be able to resolve this
tradeoff considering the performance and memory space constraints at
the same time.