Using Java to deal with multicore programming complexity: Part 1 - Embedded.com

Editor’s Note: Developers of embedded software are being forced to figure out ways to exploit the complexity of multicore platforms with languages and library software not designed for multiprocessor environments. In this first part in a series of three articles, Kelvin Nilsen surveys some of the issues that must be addressed when writing software for multiprocessors. In Part 2 he explains how Java, originally conceived with parallel programming in mind, can be combined with existing C and C++ software to preserve legacy software during the transition to multiprocessor execution. Part 3 deals with issues relating to combining C/C++ and Java for real-time multicore performance.

As semiconductor manufacturers continue to shrink silicon circuit sizes, computer engineers are able to pack more sophisticated logic and larger caches onto each chip. Over the years, these transistors have been used to increase typical data path sizes from 4 bits to 64 bits; to add math coprocessors and specialized math instructions; and to implement multiple instruction dispatch, dynamic instruction scheduling, deep pipelines, superscalar operation, speculative execution, and branch prediction. They have also been used to expand the sizes of memory caches. Most of these hardware optimizations have served to automatically identify and exploit the opportunities for parallel execution inherent in a single application’s instruction stream.

As more transistors are placed on a single chip, the connecting wires between adjacent transistors also shrink. This means less time is required to propagate a signal from one transistor to the next, enabling processors to run at higher clock frequencies, meaning less time is required to execute each instruction.

Prior to the turn of the century, nearly all high-volume microprocessors were single-core solutions. Starting in about 1995, single-core microprocessors began to show a scalability weakness. These processors were falling below the Moore’s law prediction (Figure 1) for ever-expanding transistor densities, sometimes significantly below it. This was due in large part to questions of balance. It is relatively easy, for example, to use the available transistor budget to expand on-chip memory cache. But expanding the on-chip cache is only worthwhile if the performance gains are proportional to the increased chip costs, and chip designers have determined through simulation studies that further expansion is not cost effective without a balanced increase in other components.


Figure 1. Microprocessor complexity as represented by transistor count. The Moore's law prediction is shown as a black line, single-core processors in blue, multicore in red.

The typical balanced increase in other components would normally include expanding the word size and ratcheting up the clock frequency. Larger caches, wider word sizes, and faster clock rates all increase power consumption and heat dissipation requirements. Power and heat dissipation are limiting further growth along the traditional scalability paths. Faster CPU clock rates also increase the speed differential between on-chip activities and off-chip transactions with memory and I/O devices, which tends to decrease overall efficiency.

With uniprocessor platforms, each doubling of performance has required a doubling of power consumption. Switching to multicore architectures changes this dynamic. The dual-core AMD Opteron 275, for example, runs 1.8 times faster on certain benchmarks than its single-core equivalent, the Opteron 248, but the dual-core processor uses only 7% more power. Thus, the performance-per-watt rating of the dual-core architecture is roughly 68% better than that of the single-core architecture.

The motivations for the move towards multicore architectures are easy to understand and industry trends cannot be denied. For the foreseeable future, new processor designs will almost always feature multicore capabilities. And single-core architectures are being phased into obsolescence. Like it or not, software engineers are going to have to befriend multiprocessing platforms.

Multicore hardware demands on software
Developing software for deployment on multiprocessor platforms is very different than traditional development for uniprocessors. Kunle Olukotun, Director of the Pervasive Parallelism Lab at Stanford University, recently explained “Anything performance-critical will have to be rewritten. Eventually, you either have to rewrite that code or it will become obsolete” [7]. Alex Bachmutsky, a system architect and author of “System Design for Telecommunications Gateways”, adds “This is one of the biggest problems telecom companies face today. Their apps were not written for multiple cores and threads, and the apps are huge; they have millions and tens of millions of lines of code” [7].

The challenges are severalfold. Among them, software engineers targeting a multiprocessor must:

  1. Assure that the application is represented as a collection of independent threads or tasks, with minimal information sharing and coordination requirements between these tasks;
  2. Design and implement efficient and reliable task coordination mechanisms;
  3. Address load distribution and load balancing issues;
  4. Analyze inter-task contention and schedulability in order to demonstrate compliance with real-time constraints; and
  5. Structure their software solutions so that they are easily maintained and ported between distinct multiprocessor implementations.

One of the challenges of porting software to a multiprocessor platform is that the maximum speedup available through running an application in parallel on multiple processors is limited by Amdahl’s law [9, 10]. That is, for any given application, certain parts of the application are inherently sequential and cannot run in parallel on multiple processors. Let Q represent the percentage of the application that is inherently sequential. Let N represent the number of processors comprising the targeted multiprocessor. Amdahl’s law states that the maximum speedup S available on this multiprocessor is limited by the following mathematical expression:

  S = 1 / (Q + (1 - Q) / N)

Suppose, for example, that Q = 5% and N = 4. Applying Amdahl’s law, we find that the maximum speedup S is 3.5. Increasing N to 8 changes the limit to 5.9. Though Amdahl’s law may limit overall speedup, it does not necessarily limit processing capacity. Consider, for example, a software-implemented telephone switch that involves a significant amount of sequential work to set up each call, and then a comparable sequential amount of work to tear down the call. Running this telephone switch software on a multiprocessor will not decrease the time required to set up or tear down each call, but it will very likely increase the number of calls that can be initiated and terminated every second.
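The arithmetic above is easy to reproduce; here is a brief Java sketch (the Amdahl class and maxSpeedup method are ours, for illustration, not from the original article):

```java
public class Amdahl {
    // Maximum speedup for sequential fraction q on n processors:
    // S = 1 / (q + (1 - q) / n)
    static double maxSpeedup(double q, int n) {
        return 1.0 / (q + (1.0 - q) / n);
    }

    public static void main(String[] args) {
        // Q = 5%, N = 4 and N = 8, as in the example above
        System.out.printf("N=4: %.1f%n", maxSpeedup(0.05, 4));  // prints N=4: 3.5
        System.out.printf("N=8: %.1f%n", maxSpeedup(0.05, 8));  // prints N=8: 5.9
    }
}
```

Note that as N grows without bound, the speedup approaches 1/Q (here, 20), which is why reducing the sequential fraction matters more than adding processors.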

In the quest to achieve the theoretical speedup limits of Amdahl’s law, most legacy software must be restructured to expose the potential for parallel execution paths. Tools that automatically detect the parallelism inherent in numeric matrix algorithms have had limited success generating parallel code for those specialized algorithms [8]. However, the generality required by typical information processing applications, with their complex linked data structures and data structure traversals, makes automatic parallelization of such code very difficult.

Thus, most experts believe that manual translation of sequential algorithms is necessary in order to effectively exploit multiprocessor parallelization of software systems. When manually translating sequential programs into parallel execution paths, software engineers need to target a programming language that allows them to express parallel constructs in concise notations that are robust, easily maintained, and scalable across a broad spectrum of multiprocessor hardware configurations. Since the Java language was invented after the importance of parallel execution on multiprocessors was well understood, it has features designed to facilitate multi-threaded programming.

“The ubiquitous C language is the worst [tool] because it is inherently sequential and obscures any parallelism inherent in an algorithm,” said Jeff Bier, President of DSP consulting firm Berkeley Design Technology [7]. Both C and C++ suffer because the design of these languages never considered the special needs of multiprocessing. Various real-time operating system (RTOS) vendors have found ways to extend C into the domains of concurrent and fully parallel execution. However, this approach is hindered because the extensions are bolted on to the language in the form of libraries, without any standards-based support from the language syntax or the compilers.

Over the years, C and C++ language compilers and the language definitions themselves have evolved to extract high performance from modern uniprocessors. Thus, the language allows caching values in registers and speculative and out-of-order execution. One important consideration, from the perspective of writing code for execution on multiprocessors, is that there is no clean way to disable these optimizations in selected contexts that might involve coordination with other processors. Many of the mechanisms that might be used to implement reliable information sharing between processors, using a particular C compiler with particular hardware and a particular RTOS, are highly non-portable. Great difficulty is encountered when developers attempt to move the code to a new RTOS, CPU, or C compiler.


A motivating example
The Java language was designed by Sun Microsystems during the early 1990s and released to the general public in 1995. At the time, Sun Microsystems had already made the shift to multiprocessor architectures. It would appear that Sun Microsystems felt it was important that Java improve the ease with which multiprocessor software could be developed and maintained, as various multiprocessing features were designed into the Java language. These features integrate at all levels of the Java implementation, manifesting themselves within the programming language syntax and source compilers, in special byte codes and constraints on implementation of the Java virtual machine, and throughout the standard libraries. Unlike C and C++, Java provides a strong foundation for building efficient, portable, and scalable applications and software components that fully exploit whatever multiprocessing capabilities are provided by the underlying hardware.

The concurrency features of Java are defined to provide compatible behavior on both uniprocessors and multiprocessors. On uniprocessors, concurrency is implemented by time slicing and priority preemption. On multiprocessors, multiple concurrent threads execute in parallel on different processor cores. Below, we describe the behavior of a simple concurrent Java program.

The thread type. A Java application consists, in general, of one or more independent threads of execution. Each thread represents an independent sequence of instructions. All threads within a particular Java application can see the same memory. In the vernacular of POSIX [12], all of the threads running within a single Java virtual machine reside within the same process.

Defining a new thread in Java is straightforward. Since Java is an object-oriented language, the developer simply introduces a new class extending the java.lang.Thread class, overriding the run() method with the body of code to be executed within this thread. See the excerpt below for an example.

  public class MyThread extends java.lang.Thread {
    final private String my_id;

    // constructor
    public MyThread(String name) {
      my_id = name;
    }

    public void run() {
      int count = 0;
      while (true) {
        System.out.println("MyThread " + my_id + " has iterated " + count + " times");
        count++;
      }
    }
  }

To spawn this thread’s execution, the program simply instantiates a MyThread object and invokes its start() method. This is demonstrated in the following code sequence.

  class MyMain {
    public static void main(String[] thread_names) {
      for (int i = 0; i < thread_names.length; i++) {
        MyThread a_thread = new MyThread(thread_names[i]);
        a_thread.start();
      }
    }
  }

For an invocation of this main program with command-line arguments bob fred sally, the first several lines of output were:

  MyThread bob has iterated 0 times
  MyThread bob has iterated 1 times
  MyThread bob has iterated 2 times
  MyThread bob has iterated 3 times
  MyThread bob has iterated 4 times
  MyThread fred has iterated 0 times
  MyThread sally has iterated 0 times
  MyThread bob has iterated 5 times
  MyThread fred has iterated 1 times
  MyThread sally has iterated 1 times
  MyThread bob has iterated 6 times
  …


Synchronized statements.
Whenever multiple threads share access to common data, it is occasionally necessary to enforce mutual exclusion for certain activities, ensuring that only one thread at a time is involved in those activities. Consider, for example, a situation where one thread updates a record representing the name and phone number of the person on call to respond to any medical emergencies that might arise. Many other threads might consult this record in particular situations. An (incorrect) implementation of this record data abstraction is shown below:

  public class OnCallRecord {
    private String name, phone;

    public void overwrite(String new_name, String new_phone) {
      name = new_name;
      // placing a print statement here increases the probability that we'll experience a context switch after incompletely updating the record
      System.out.println("overwriting OnCallRecord with: " + new_name + ", phone: " + new_phone);
      phone = new_phone;
    }

    public OnCallRecord clone() {
      OnCallRecord copy = new OnCallRecord();
      copy.name = this.name;
      copy.phone = this.phone;
      return copy;
    }

    public String name() {
      return name;
    }

    public String phone() {
      return phone;
    }
  }

The problem with the above code is that the overwrite() method may be preempted before the entire record has been updated. If another thread performs a clone() during this preemption, the other thread will see inconsistent data, perhaps obtaining one person’s name paired with a different person’s phone number. An excerpt of an execution trace for an actual run involving one updating thread and one inquiring thread shows that this possibility is real. In this run, the initial value of the OnCallRecord had name George, with phone 111-2222.

  [1] overwriting OnCallRecord with: Mary, phone: 222-3333
  [2] Responsible party is: Mary, phone: 111-2222
  [3] overwriting OnCallRecord with: Mark, phone: 444-5555
  [4] Responsible party is: Mark, phone: 222-3333
  [5] overwriting OnCallRecord with: Sally, phone: 666-7777
  [6] Responsible party is: Sally, phone: 444-5555
  [7] Responsible party is: Sally, phone: 444-5555
  [8] Responsible party is: Sally, phone: 444-5555
  [9] Responsible party is: Sally, phone: 444-5555
  [10] Responsible party is: Sally, phone: 444-5555
  [11] overwriting OnCallRecord with: Sue, phone: 888-9999
  [12] Responsible party is: Sue, phone: 666-7777

Note that, for this execution trace, the phone number is consistently wrong: each report pairs the new name with the previous person’s phone number. The fix for this problem is to add the synchronized keyword to each of the methods in the definition of OnCallRecord.

  public class OnCallRecord {
    private String name, phone;

    public synchronized void overwrite(String new_name, String new_phone) {
      name = new_name;
      System.out.println("overwriting OnCallRecord with: " + new_name + ", phone: " + new_phone);
      phone = new_phone;
    }

    public synchronized OnCallRecord clone() {
      OnCallRecord copy = new OnCallRecord();
      copy.name = this.name;
      copy.phone = this.phone;
      return copy;
    }

    public synchronized String name() {
      return name;
    }

    public synchronized String phone() {
      return phone;
    }
  }

With this change, a similar execution trace shows much better behavior:

  [1] overwriting OnCallRecord with: Bob, phone: 123-4567
  [2] Responsible party is: Bob, phone: 123-4567
  [3] overwriting OnCallRecord with: Nancy, phone: 321-9876
  [4] Responsible party is: Nancy, phone: 321-9876
  [5] overwriting OnCallRecord with: George, phone: 111-2222
  [6] Responsible party is: George, phone: 111-2222
  [7] Responsible party is: George, phone: 111-2222
  [8] Responsible party is: George, phone: 111-2222
  [9] Responsible party is: George, phone: 111-2222
  [10] Responsible party is: George, phone: 111-2222
  [11] overwriting OnCallRecord with: Mary, phone: 222-3333
  [12] Responsible party is: Mary, phone: 222-3333
  [13] overwriting OnCallRecord with: Mark, phone: 444-5555
  [14] Responsible party is: Mark, phone: 444-5555
  [15] overwriting OnCallRecord with: Sally, phone: 666-7777
  [16] Responsible party is: Sally, phone: 666-7777



One effect of adding the synchronized qualifier to a method is to cause the Java compiler to insert code into the method’s prologue to acquire a mutual exclusion lock associated specifically with the object to which the method is attached. Code to release the mutual exclusion lock is inserted into the method’s epilogue. Some comparisons with C (and C++) are appropriate:

  1. Note that the implementor of the OnCallRecord has declared the name and phone fields to be private. This encapsulates access, assuring that the only way to modify or view the values of these fields is by means of the public methods associated with this data type. Thus, the implementor of OnCallRecord can assure that access to its internal data is properly synchronized. The Java compiler enforces that there is no way for other code outside the OnCallRecord abstraction to violate these constraints.
  2. As a high-level programming abstraction, much of the low-level detail associated with managing synchronization locks is hidden from developers. The Java virtual machine automatically decides when to allocate and deallocate lock data structures and has full control over when and how to coordinate with the underlying operating system.
  3. Since the synchronized keyword applies to entire methods and statements, it is block oriented. The Java compiler enforces that the lock is acquired upon entry to the block and is released upon exit from the block. Unlike C, there is no way for a programmer to forget to release the lock. The Java platform assures that the lock will be released even if this block of code is exited abnormally, such as when an exception is thrown or a break statement leaves an inner-nested loop.
  4. Since it is known by the Java run-time environment that this lock is associated with mutual exclusion and nested locks are released in LIFO order, the Java run-time is able to reliably implement priority inheritance and even priority ceiling emulation for the synchronization locks. Priority inversion avoidance is not required by the Java language specification, but it is implemented by several real-time virtual machines.
  5. Because the synchronized keyword is built into the language itself, the compiler and run-time environment are able to cooperate in providing efficient implementations. The typical implementation of Java lock and unlock services is much more efficient than the RTOS system calls that are required to implement these services in typical C code.
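The third point, that a synchronized block releases its lock even on abnormal exit, can be illustrated with a short sketch (the class and method names here are ours, for illustration): even when the block is exited by a thrown exception, the lock is released, so a later attempt to acquire it does not deadlock.

```java
public class LockRelease {
    static final Object lock = new Object();

    static void mayFail(boolean fail) {
        synchronized (lock) {
            if (fail) {
                // leaving the synchronized block via an exception
                // still releases the lock in the block's epilogue
                throw new IllegalStateException("simulated failure");
            }
        }
    }

    public static void main(String[] args) {
        try {
            mayFail(true);
        } catch (IllegalStateException e) {
            // expected; the lock was released despite the abnormal exit
        }
        // because the lock is free again, this does not deadlock
        synchronized (lock) {
            System.out.println("lock reacquired after exception");
        }
    }
}
```

In C, achieving the same guarantee requires the programmer to remember an explicit unlock on every exit path.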

Wait and Notify
Within a synchronized method, Java programs have access to implicit condition variables associated with the object’s synchronization lock. The typical concurrent Java programmer acquires an object’s mutual exclusion lock by entering a synchronized method, checks for a certain condition and, if that condition is not satisfied, blocks itself by invoking the wait() service. When a thread executes wait(), it is placed in a blocked state on a special queue associated with the enclosing object. While on this queue, the thread temporarily releases its mutual exclusion lock.

Other threads operating on the same object will presumably make the condition true. Before making the condition true, the other thread also acquires the mutual exclusion lock. After making the condition true, but while still holding the mutual exclusion lock, the other thread performs a notify() (or notifyAll()) operation. This has the effect of unblocking one (or all) of the threads on the object’s condition queue. Each unblocked thread automatically reacquires the mutual exclusion lock before it executes within the synchronized block that was running when it invoked wait(). The mutual exclusion lock will not be available until the notifying thread leaves its synchronized block.

The following abstract data type demonstrates the use of wait() and notify() services.

  public class FIFO {
    final private int buffer[]; // this FIFO buffers integers
    private int first, count;

    public FIFO(int size) {
      buffer = new int[size];
      first = 0; count = 0;
    }

    public synchronized void put(int value) throws InterruptedException {
      while (count >= buffer.length) {
        wait();
      }
      buffer[(first + count++) % buffer.length] = value;
      if (count == 1) {
        notify(); // wake a waiting reader, if there is one
      }
    }

    public synchronized int get() throws InterruptedException {
      int result;
      while (count <= 0) {
        wait();
      }
      result = buffer[first++];
      if (first >= buffer.length) {
        first = 0;
      }
      if (count-- == buffer.length) {
        notify(); // wake a waiting writer, if there is one
      }
      return result;
    }
  }

In this sample code, a single wait queue represents two possible conditions. If the buffer is currently full, any threads on the wait queue must be waiting to insert new items. If the buffer is currently empty, any threads on the wait queue must be waiting to extract an item.

The notify() operations in put() and get() have no effect if no thread is currently waiting on the corresponding wait queue. The Java Language Specification does not specify which of multiple threads on the wait queue will be notified if multiple threads are on the queue. Certain real-time virtual machines sort the wait queue first by active thread priority and second by time of arrival (FIFO), guaranteeing to always notify the highest-priority thread that has been waiting the longest.

Providing the wait() and notify() services as built-in capabilities of the Java language encourages a style of programming that is easily understood and easily maintained. The common usage patterns are recognized as idioms by experienced Java developers. Idiomatic usage is desirable because it means the software engineer does not have to analyze all of the subtle thread interactions associated with each body of code. The lack of idiomatic usage for condition notification in C and C++ is one of the disadvantages of those languages for development of concurrent software.

Volatile Variables
Like C and C++, the Java language allows programmers to declare that certain variables are volatile. One effect of declaring a variable to be volatile is that any fetch or store of this variable is forced to communicate with the memory subsystem. A variable not declared to be volatile might be cached in a machine register and updated many times by one thread while other threads consulting the variable never see a change to its value.

The program below represents an alternative solution to the on-call physician problem. With this solution, the OnCallRecord is replaced each time a new physician is placed on call.

  public class ImmutableOnCallRecord {
    private final String name, phone;
    public ImmutableOnCallRecord(String name, String phone) {
      this.name = name;
      this.phone = phone;
    }

    public String name() {
      return name;
    }

    public String phone() {
      return phone;
    }
  }

  public class ImmutableOnCallMain {
    static volatile ImmutableOnCallRecord shared_record;

    public static void main(String[] args) {
      OnCallScheduler ocs = new OnCallScheduler();
      ocs.start();

      while (true) {
        ImmutableOnCallRecord responsible = shared_record;
        if (responsible != null) {
          System.out.println("Responsible party is: " + responsible.name() + ", phone: " + responsible.phone());
        }
      }
    }
  }

  public class OnCallScheduler extends java.lang.Thread {
    public void run() {
      while (true) {
        ImmutableOnCallMain.shared_record = new ImmutableOnCallRecord("George", "111-2222");
        ImmutableOnCallMain.shared_record = new ImmutableOnCallRecord("Mary", "222-3333");
        ImmutableOnCallMain.shared_record = new ImmutableOnCallRecord("Mark", "444-5555");
        ImmutableOnCallMain.shared_record = new ImmutableOnCallRecord("Sally", "666-7777");
        ImmutableOnCallMain.shared_record = new ImmutableOnCallRecord("Sue", "888-9999");
        ImmutableOnCallMain.shared_record = new ImmutableOnCallRecord("Bob", "123-4567");
        ImmutableOnCallMain.shared_record = new ImmutableOnCallRecord("Nancy", "321-9876");
      }
    }
  }

Note that the shared_record variable defined within the ImmutableOnCallMain class is declared volatile. This means that each time the variable is overwritten within the run() method of OnCallScheduler, the change is immediately visible to other threads. Further, it means that each time the variable’s value is fetched within the ImmutableOnCallMain.main() method, the value is fetched from shared memory rather than from a locally cached register value. If this variable had not been declared volatile, a fully compliant Java implementation would be free to behave as if each running thread had its own private copy of this variable.

It is important to allow the Java run-time to cache shared variables, to reorder instructions, and to perform certain speculative operations because these behaviors are critical to achieving high performance within each thread. However, it is also important to allow programmers to selectively disable these optimizations where doing so is necessary in order to achieve reliable sharing of information between independent threads.

One aspect of Java that is different from, and more powerful than, C and C++ is its treatment of volatile variables. Not only does the volatile keyword force all uses of the variable to access shared memory, but it also implies certain important semantics with regard to other, non-volatile variables. In particular:

1. Since overwriting the value of a volatile variable most likely triggers communication of certain information held as part of this thread’s local state to certain other threads, the operation is accompanied by a cache, memory, and instruction-ordering fence.

  • Any actions that precede the volatile write within source code may not be reordered to actually occur after the volatile write operation.
  • Any variables represented by machine registers that may hold values that were modified since they were last fetched from memory must be copied back to memory before the volatile write operation is performed.
  • If the underlying hardware allows the same shared variable to have different values in each processor’s cache of that memory location, any modified values held in this processor’s cache must be copied back to memory before the volatile write operation is performed. Architectures that allow relaxed consistency generally provide explicit instructions to force cache flushes.

2. Since reading from a volatile variable typically represents the receipt of information communicated from another thread, this operation is also accompanied by a cache, memory, and instruction-ordering fence.

  • If the underlying hardware allows the same shared variable to have different values in each processor’s cache of that memory location, update this processor’s cache by copying any variables that have not been locally modified from shared memory immediately following the read of the volatile variable. Architectures that allow relaxed consistency generally provide explicit instructions to force cache updates.
  • Any potentially shared variables that are cached in machine registers are updated from shared memory immediately following the read of the volatile variable.
  • Any actions that follow the volatile read within source code may not be reordered to actually occur before the volatile read operation.
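
Taken together, the two fences above enable the classic safe-publication idiom, sketched below with hypothetical names. The payload fields are not themselves volatile, yet a reader that observes the volatile flag as true is guaranteed to see the payload fully initialized, because the ordinary writes may not be reordered past the volatile write, and the ordinary reads may not be reordered before the volatile read.

```java
public class SafePublication {
    static class Config {            // plain, non-volatile payload
        int timeoutMs;
        String host;
    }

    private static Config config;               // ordinary field
    private static volatile boolean published;  // the "release" flag

    static void publish() {
        Config c = new Config();
        c.timeoutMs = 500;
        c.host = "example.com";
        config = c;          // ordinary writes...
        published = true;    // ...made visible by this volatile write
    }

    static Config awaitConfig() {
        while (!published) { }   // volatile read: the "acquire" side
        return config;           // guaranteed to be fully initialized
    }

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(SafePublication::publish);
        writer.start();
        Config c = awaitConfig();
        System.out.println(c.host + " " + c.timeoutMs);
        writer.join();
    }
}
```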

Compared with C and C++, these refined semantics for dealing with volatile variables are critical for reliable sharing of information between distinct threads. It is quite difficult to precisely implement these semantics using existing C compilers and RTOS services. It is even more difficult to implement them in a portable and maintainable way that would not inhibit the reuse of code on a different CPU, with a different C compiler, and/or a different RTOS.

Concurrency libraries
With the Java 5.0 release in 2004, the Java standard libraries gained a new package named java.util.concurrent, which provides support for more powerful concurrency capabilities. Included in this package and its two subpackages, java.util.concurrent.atomic and java.util.concurrent.locks, are libraries that implement thread-safe collections, semaphores, prioritized blocking queues, atomically modifiable data abstractions, and a reentrant lock data type. The reentrant lock data type offers services not available with synchronized statements, such as an API to conditionally acquire a lock only if the lock is not currently held by some other thread.
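
The conditional-acquire API mentioned above is ReentrantLock.tryLock(), which returns immediately rather than blocking — something a synchronized statement cannot express. A brief sketch (the method name attemptWork is hypothetical):

```java
import java.util.concurrent.locks.ReentrantLock;

public class TryLockDemo {
    private static final ReentrantLock lock = new ReentrantLock();

    static String attemptWork() {
        if (lock.tryLock()) {        // acquire only if currently free
            try {
                return "did work";
            } finally {
                lock.unlock();       // always release in a finally block
            }
        }
        return "lock busy; did something else";
    }

    public static void main(String[] args) {
        System.out.println(attemptWork());  // the lock is free here
    }
}
```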

The Java memory model
One of the more noteworthy differences between concurrent programming support in Java and in C or C++ is that Java’s support is integrated within the language and the compiler, whereas C and C++ support is restricted to libraries, many of which are proprietary to each RTOS vendor. By integrating concurrency support within the compiler, special code generation practices enable greater efficiency without sacrificing reliable and portable sharing of information between threads. (With the release of the C++11 standard in August 2011, there is now standardized support for multi-threading, including a standardized definition of the C++ multi-threading memory model. This provides the new C++ definition with some of the concurrency benefits of Java. Of course, the millions of lines of legacy C++ code written prior to the release of this new standard are not consistent with the new multi-threading model.)

As with reading and writing of volatile variables, entry into and exit from a synchronized block have certain side effects relating to the visibility of variables that may be locally cached. In particular:

1. Exiting a synchronized block causes locally cached variables to be published for view by other threads.

  • Any actions that are contained within the synchronized block or precede the synchronized block within source code may not be reordered to actually occur after control exits the synchronized block.
  • Any variables represented by machine registers that may hold values that were modified since they were last fetched from memory must be copied back to memory before control leaves the synchronized block.
  • If the underlying hardware allows the same shared variable to have different values in each processor’s cache of that memory location, any modified values held in this processor’s cache must be copied back to memory before control leaves the synchronized block. Architectures that allow relaxed consistency generally provide explicit instructions to force cache flushes.

2. Entry into a synchronized block causes this thread to see all data globally published by other threads.

  • If the underlying hardware allows the same shared variable to have different values in each processor’s cache of that memory location, update this processor’s cache by copying any variables that have not been locally modified from shared memory immediately upon entry into the synchronized block. Architectures that allow relaxed consistency generally provide explicit instructions to force cache updates.
  • Any potentially shared variables that are cached in machine registers are updated from shared memory immediately upon entry into the synchronized block.
  • Any actions that are contained within the synchronized block or that follow the synchronized block within source code may not be reordered to actually occur before control enters the synchronized block.
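
The publish-on-exit and refresh-on-entry effects described above mean that guarding both the write and the read of a field with the same monitor makes an update safely visible, even though the field itself is not volatile. A minimal sketch (class and field names are hypothetical):

```java
public class SyncVisibility {
    private final Object monitor = new Object();
    private int value;                 // not volatile; guarded by monitor

    void setValue(int v) {
        synchronized (monitor) {       // exit publishes cached writes
            value = v;
        }
    }

    int getValue() {
        synchronized (monitor) {       // entry refreshes cached reads
            return value;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncVisibility s = new SyncVisibility();
        Thread writer = new Thread(() -> s.setValue(42));
        writer.start();
        writer.join();                 // join() also orders the write first
        System.out.println(s.getValue());
    }
}
```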

The Java Memory Model describes a happens-before relationship between certain events which may occur in different threads. Data races may exist in Java programs that are not carefully structured to assure that a write operation happens before all of the read operations that are intended to see the value written. A correctly synchronized Java program avoids race conditions by assuring that activities occurring in different threads are properly sequenced by happens-before relationships. As examples of the happens-before relationships, the Java Memory Model specifies the following orderings:

  1. An unlock of a monitor lock (exiting a synchronized block) happens before every subsequent locking of the same monitor lock.
  2. Overwriting a volatile variable happens before every subsequent fetch of the same field’s value.
  3. Invocation of Thread.start() happens before every action performed within the started thread.
  4. All actions performed within a thread happen before any other thread can detect termination of the thread. Other threads detect this thread’s termination by invoking Thread.join() or Thread.isAlive().
  5. A thread’s request to interrupt another thread happens before the other thread detects it has been interrupted.
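
Rules 3 and 4 above can be demonstrated with a short sketch (the class and field names are hypothetical): a value written before Thread.start() is visible inside the started thread, and everything the thread writes is visible after Thread.join() returns, with no volatile or synchronization needed on the fields themselves.

```java
public class StartJoinOrdering {
    static int input;      // written before start(): visible inside the thread
    static int output;     // written in the thread: visible after join()

    public static int compute() {
        input = 21;
        Thread t = new Thread(() -> output = input * 2);
        t.start();         // rule 3: start() happens before t's actions
        try {
            t.join();      // rule 4: t's actions happen before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(compute());   // prints 42
    }
}
```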

Conclusion
Java makes it easy to write concurrent programs, which are critical to making effective use of new multicore platforms. The simple Java program presented in this article demonstrates this. This brief introduction to concurrent programming in Java only scratches the surface of what is possible with the language.

To learn more about what the synchronized keyword means in Java, and for discussion of a number of other issues critical to the use of Java in concurrent systems, read the supplemental articles in this series:

Part 2: Migrating legacy C/C++ code to Java
Part 3: Using Java with C and C++ for real-time multicore designs

Dr. Kelvin Nilsen is Chief Technology Officer over Java at Atego Systems, a mission- and safety-critical solutions provider, where he oversees the design and implementation of the PERC virtual machine and other Atego embedded/real-time oriented products. Prior to joining Atego, Dr. Nilsen served on the faculty of Iowa State University, where he performed seminal research on real-time Java that led to the PERC family of virtual machine products. Dr. Nilsen participates in the Java Community Process as a member of the JSR 282 and JSR 302 expert groups. He holds a B.S. in Physics and M.S. and Ph.D. degrees in Computer Science.
