Using Java to deal with multicore programming complexity: Part 2 - Migrating legacy C/C++ code to Java

Kelvin Nilsen, Atego

June 27, 2012


Implementing parallel activities

Java’s built-in support for concurrent programming makes implementation of parallel code straightforward. However, several issues require attention at architecture and design time to ensure that the modernized application operates robustly and scales well.

Thread pools
Many real-time workloads are much more dynamic than the rigidly scheduled software PLC application described above. Events are signaled asynchronously, and a prescribed response must be provided for each event within a specific deadline associated with that event.

Because starting and stopping threads are relatively costly operations, it is preferable to start all threads during application boot time. Each thread iterates, waiting for an appropriate event to occur, providing a response to that event, and repeating the sequence.

In some cases, certain threads may be dedicated as servers capable of responding to a variety of different events, depending on the application’s most urgent requirements. A typical server thread implementation might be represented by the following run() method:

  public void run() {
    WorkAssignment a_task;

    while (true) {
      synchronized (this) {
        while (my_work_assignment == null) {
          try {
            wait();
          } catch (InterruptedException ie) {
            // Tolerate a spurious interrupt: loop around and re-test the condition.
          }
        }
        a_task = my_work_assignment;
        my_work_assignment = null;
        // Since the server thread is running, the only possible waiters are other
        // threads wanting to assign work to me.
        notify();
      }
      // Do the work after releasing the lock, allowing other threads to offer a new
      // work assignment while I’m completing the previous assignment.
      a_task.doWork();
    }
  }

This code assumes existence of a WorkAssignment interface with a doWork() method. Elsewhere within this server thread’s class definition, a different method would enable other threads to make work assignments to this server thread. That code might resemble the following:

  public synchronized void assignWork(WorkAssignment wa) {
    while (my_work_assignment != null) {
      try {
        wait();
      } catch (InterruptedException ie) {
        // Tolerate a spurious interrupt: loop around and re-test the condition.
      }
    }
    my_work_assignment = wa;
    // I have just assigned work. The condition queue may hold other threads wanting
    // to assign work in addition to the server thread waiting to receive work. I’ll
    // wake them all just to be sure that the server gets a notification. This is
    // reasonably efficient under the assumption that the system is balanced so that
    // the server is generally keeping up with all requests; there would not normally
    // be other threads waiting to assign work that would be unproductively notified.
    // Note that any thread that is unproductively notified will simply place itself
    // back on the wait queue.
    notifyAll();
  }
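
For concreteness, the following sketch shows how these fragments might fit together. The WorkAssignment interface and its doWork() method come from the text above; the enclosing class name ServerThread and the boot-time wiring are illustrative assumptions, not the original application’s code:

  interface WorkAssignment {
    void doWork();
  }

  class ServerThread extends Thread {
    private WorkAssignment my_work_assignment = null;

    // run() and assignWork() as defined above ...
  }

  // Boot-time wiring (inside some startup method): start the server once,
  // then hand it work as events arrive.
  ServerThread server = new ServerThread();
  server.start();
  server.assignWork(new WorkAssignment() {
    public void doWork() { /* respond to the event */ }
  });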

To manage response-time constraints effectively, threads with shorter deadlines should be assigned higher priorities. This is known as rate-monotonic assignment of priorities (strictly speaking, deadline-monotonic assignment; the two coincide when each thread’s deadline equals its period). Analysis of schedulability to satisfy real-time constraints is discussed below.

Note that if a server thread’s priority is assigned in rate-monotonic order, then it is necessary for all of the work tasks assigned to a given thread to have the same relative deadline. Note also that exploiting the full capacity of an N-processor computer may require N running instances of each server thread.
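
A sketch of that sizing rule follows, reusing the hypothetical ServerThread class from above; the priority parameter is an illustrative placeholder:

  // Start one server instance per processor so an N-processor machine can run
  // N work assignments in parallel. All instances serve tasks with the same
  // relative deadline, so all share one priority (shorter deadline => higher value).
  static ServerThread[] startServers(int priority) {
    int n = Runtime.getRuntime().availableProcessors();
    ServerThread[] servers = new ServerThread[n];
    for (int i = 0; i < n; i++) {
      servers[i] = new ServerThread();
      servers[i].setPriority(priority);  // e.g., Thread.NORM_PRIORITY + 2
      servers[i].start();
    }
    return servers;
  }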

Deadlock prevention
Deadlock occurs if one thread is waiting for another thread to accomplish a critical task while that other thread is unable to accomplish the critical task because it is waiting (possibly indirectly) for the first thread to perform some other critical activity. It is the software equivalent of two overly polite individuals stepping up to a door, with the first person saying “You first,” and the second responding, “No, after you.” People are able to break this deadlock only by observing other visual cues.

Unfortunately, software tends to focus on one objective at a time, and is totally oblivious to the bigger picture. Thus, two software threads encountering a similar situation might both wait forever for the conflict to be resolved. A symptom of this behavior is that the computer system becomes catatonic, and eventually some human operator decides to reboot it.

One risk when moving code from a uniprocessor to a multiprocessor environment is that deadlocks never seen in the uniprocessor code begin to manifest in the multiprocessor version. This is because certain coordination algorithms that work fine on a single processor do not work reliably in a multiprocessor context.

An example of one such algorithm is a non-blocking priority ceiling emulation (PCE) algorithm for mutual exclusion. This protocol specifies that when a thread enters a particular critical section of code, it immediately raises its priority to the ceiling priority associated with that code. By design, any thread requiring mutually exclusive access to this same critical region of code must have a priority no greater than the ceiling priority. On a uniprocessor system, no semaphores or queues are required to implement the PCE protocol as long as threads holding a priority ceiling emulation “lock” do not suspend execution while holding the lock. This is because a thread holding the lock is known to run at a higher priority than all other threads that may attempt to acquire the lock.
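
The following is an illustrative-only sketch of uniprocessor PCE. Note that setPriority() is merely a scheduling hint on most desktop JVMs; the strict priority-preemptive scheduling that PCE relies on is typically available only on real-time VMs and operating systems. The ceiling value and method name here are hypothetical:

  // Hypothetical ceiling priority for this critical section: no thread that may
  // contend for the resource runs above this level.
  static final int CEILING = Thread.MAX_PRIORITY - 1;

  void pceCriticalSection() {
    Thread self = Thread.currentThread();
    int saved = self.getPriority();
    self.setPriority(CEILING);   // no contending thread can preempt us now
    try {
      // ... access the shared resource; we must not block while "holding"
      // the ceiling, or the protocol's guarantee is lost ...
    } finally {
      self.setPriority(saved);   // lowering priority releases the emulated lock
    }
  }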

The PCE mechanism also works on a multiprocessor, but a lock and queue may be required in the implementation because there is no guarantee that other threads are not running in parallel at the same or even lower priorities than this thread. Even though PCE generalizes to a multiprocessor environment, there is a risk that using this protocol on a multiprocessor will result in deadlock even though the same code works reliably on a uniprocessor without deadlock.

Deadlock may occur if one thread uses PCE to lock resource A and a different thread uses PCE to lock resource B. If the first thread subsequently attempts to lock resource B using PCE while still holding its PCE lock on resource A and the second thread attempts to lock resource A using PCE while still holding its PCE lock on resource B, deadlock results. It would not be possible for this sequence of events to occur on a uniprocessor system so the uniprocessor implementation is deadlock free.

A common algorithm to avoid deadlock is to require that all threads acquire nested locks in the same order. A PCE practice that assures absence of deadlock even on multiprocessors is to require that inner-nested PCE locks have a higher ceiling than outer-nested PCE locks. Adopting this practice for all PCE code will assure that the code is deadlock-free both on uniprocessor and multiprocessor platforms.
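
A brief sketch of the ordering discipline using ordinary synchronized blocks (the lock objects and method are hypothetical):

  // Hypothetical guards for resources A and B.
  static final Object lockA = new Object();
  static final Object lockB = new Object();

  // Safe: every thread nests the locks in the same order (A before B), so the
  // circular wait described above cannot arise. A thread that instead nested
  // B before A could end up holding one lock while waiting for the other.
  void updateBoth() {
    synchronized (lockA) {
      synchronized (lockB) {
        // ... operate on both resources ...
      }
    }
  }

Under the PCE discipline, the same rule is expressed in terms of ceilings: the inner lock’s ceiling must be higher than the outer lock’s ceiling.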

Data structure considerations
Often, mutual exclusion locks associated with shared data structures surface as the bottleneck that prevents effective parallel execution of multiple threads executing on different processors. If only one processor at a time needs access to a complex data structure, then a single lock at the outermost interfaces to this data structure may be an effective way to ensure the data structure’s integrity. However, if many concurrent threads seek to read and even modify the data structure in parallel, alternative locking mechanisms are necessary. Consider the following common approaches:

1. Reorganize the data structures so that physical proximity correlates with temporal proximity for any given thread. In other words, if a thread’s visit to a particular part of the data structure predicts with high probability that another part of the data structure will also be visited in the very near future, then it is desirable that these two parts of the data structure be near to each other within the in-memory data structure.

2. Restructure the locking protocols associated with the data structure.
  • Rather than a single global lock on the entire data structure, introduce multiple locks, each guarding a particular portion of the data structure. Data that has close physical proximity would be guarded by the same synchronization lock. The objective is to allow different processors to manipulate different parts of the data structure in parallel.
  • To lock each portion of the data structure, consider a locking mechanism that allows many concurrent readers but only one writer at a time (a sketch follows this list).
3. Consider replicating certain data structures so that particular threads can access local copies of certain data without any synchronization at all. This might require that algorithms be adjusted to tolerate the possibility that localized data structure copies may hold stale information.
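
As one illustration of the second approach, the following sketch stripes a table across several readers-writer locks using java.util.concurrent.locks.ReentrantReadWriteLock. All names and sizes here are illustrative assumptions:

  import java.util.concurrent.locks.ReentrantReadWriteLock;

  // A table divided into independently locked stripes. Contiguous entries share a
  // stripe, so physically adjacent data is guarded by the same lock, while threads
  // working on different regions proceed in parallel.
  class StripedTable {
    private static final int STRIPES = 16;
    private final ReentrantReadWriteLock[] locks = new ReentrantReadWriteLock[STRIPES];
    private final long[] values = new long[1024];

    StripedTable() {
      for (int i = 0; i < STRIPES; i++) {
        locks[i] = new ReentrantReadWriteLock();
      }
    }

    private ReentrantReadWriteLock lockFor(int index) {
      // 1024 entries / 16 stripes: each block of 64 contiguous entries shares a lock.
      return locks[index / (values.length / STRIPES)];
    }

    long read(int index) {
      ReentrantReadWriteLock l = lockFor(index);
      l.readLock().lock();    // many readers of this stripe may proceed in parallel
      try {
        return values[index];
      } finally {
        l.readLock().unlock();
      }
    }

    void write(int index, long v) {
      ReentrantReadWriteLock l = lockFor(index);
      l.writeLock().lock();   // a writer gets exclusive access to this stripe only
      try {
        values[index] = v;
      } finally {
        l.writeLock().unlock();
      }
    }
  }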

Part 1: How Java eases multicore hardware demands on software

Part 3: Using Java with C and C++ for real-time multicore designs



Dr. Kelvin Nilsen is Chief Technology Officer over Java at Atego Systems, a mission- and safety-critical solutions provider, where he oversees the design and implementation of the PERC virtual machine and other Atego embedded/real-time oriented products. Prior to joining Atego, Dr. Nilsen served on the faculty of Iowa State University, where he performed seminal research on real-time Java that led to the PERC family of virtual machine products. Dr. Nilsen participates in the Java Community Process as a member of the JSR 282 and JSR 302 expert groups. He holds a B.S. in Physics and M.S. and Ph.D. degrees in Computer Science.


