Using Java to deal with multicore programming complexity: Part 3 - Using Java with C and C++ for real-time multicore designs

Figure 5: Allocatable Memory vs. Time
Figure 5 shows a simulated air traffic control workload with real-time garbage collection running under the direction of a real-time pacing agent. This particular application is running in a fairly predictable steady state, as characterized by the following observations. First, the yellow chart represents the amount of memory available for allocation, and its downward slope is roughly constant whenever garbage collection is idle. This means the application’s allocation rate is approximately constant. Second, the heights of the yellow chart’s peaks are roughly identical. This means the amount of live memory retained by the application is roughly constant. In other words, the application is allocating new objects at approximately the same rate it is discarding old objects. Finally, the percentage of CPU time required for application processing is well behaved, ranging from about 20% to 50%.
Note that garbage collection is idle most of the time. As memory becomes scarcer, garbage collection begins to run. When garbage collection runs, it interferes with some, but not all, of the real-time application processing. For this reason, you’ll note a slight dip in application processing each time the garbage collector runs; these runs appear as the occasional red upward spikes. You’ll also note a tendency for application processing to briefly increase following each burst of garbage collection. This is because the preempted application needs to perform a small amount of extra work to catch up with real-time scheduling constraints following each preemption. If properly configured, the pacing agent will avoid delaying the application threads by any more than the allowed scheduling jitter.
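The idea of starting collection only as memory becomes scarce can be illustrated with a minimal sketch. This is not the pacing agent described in the article; it is a hypothetical heuristic in which the class name, method name, parameters, and safety factor are all assumptions made for illustration. It starts a collection cycle once the free memory would no longer outlast the estimated duration of that cycle at the observed allocation rate.

```java
// Hypothetical pacing heuristic (illustrative only, not a real VM's pacing agent).
final class PacingSketch {

    /**
     * Decide whether a garbage collection cycle should start now.
     *
     * @param freeBytes        memory currently available for allocation
     * @param allocBytesPerSec observed (roughly constant) allocation rate
     * @param gcSecondsNeeded  estimated wall-clock time to finish one cycle
     * @param safetyFactor     assumed margin, 1.0 or greater
     */
    static boolean shouldStartCollection(long freeBytes,
                                         double allocBytesPerSec,
                                         double gcSecondsNeeded,
                                         double safetyFactor) {
        // Memory the application is expected to consume while the cycle runs.
        double bytesConsumedDuringGc = allocBytesPerSec * gcSecondsNeeded;
        // Start early enough that the cycle completes before memory runs out.
        return freeBytes <= bytesConsumedDuringGc * safetyFactor;
    }

    public static void main(String[] args) {
        double mib = 1024.0 * 1024.0;
        // Assumed figures: 10 MiB/s allocation, 2 s per cycle, 50% margin.
        System.out.println(shouldStartCollection(64L << 20, 10 * mib, 2.0, 1.5));
        System.out.println(shouldStartCollection(24L << 20, 10 * mib, 2.0, 1.5));
    }
}
```

With these assumed figures the threshold is 30 MiB, so the collector stays idle at 64 MiB free and starts at 24 MiB free. A steady allocation rate, as in Figure 5, is what makes such an estimate usable.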
With multiprocessor real-time garbage collection, certain additional issues arise. Consider the following:
1. An N-processor system allows N times as much garbage to be created in a given unit of time. Thus, garbage collection for an N-processor system must perform N times as much work as the uniprocessor version.
- An N-way multiprocessor can accomplish N times as much garbage collection work as an equivalent uniprocessor in the same amount of time only if the garbage collection algorithms scale well to multiprocessor architectures. When migrating from a uniprocessor to a multiprocessor Java virtual machine, it is important to understand the performance characteristics of the multiprocessor garbage collector, as its behavior may not scale linearly with the number of processors.
- System architects might be tempted to dedicate one processor to garbage collection and the other N-1 processors to application processing. This may work for small numbers of processors but does not scale to large numbers of processors. Suppose, for example, that your uniprocessor application spends 15% of CPU time in garbage collection. If the garbage collection workload grows in proportion to the number of processors, it continues to need roughly 15% of total CPU time, while a single dedicated processor supplies only 1/N of the total. On a 4-way multiprocessor, 25% of total CPU time would be available to garbage collection, which should be sufficient. However, on an 8-way multiprocessor, only 12.5% of total CPU time would be available to garbage collection, which is inadequate. Another disadvantage of this approach is that it does not make effective use of the Nth processor during times when garbage collection is idle.
2. Incremental parallel but nonconcurrent real-time garbage collection offers the same benefits on a multiprocessor system as on a uniprocessor. During certain time spans, all N processors are dedicated to application code. During other time spans, all N processors are dedicated to small increments of garbage collection.
3. Fully parallel and concurrent garbage collection offers greater scheduling flexibility than parallel but nonconcurrent real-time garbage collection. In particular, certain processors can be running increments of garbage collection in parallel with execution of application code on other processors. This form of real-time garbage collection is more complex and may impose a larger overhead on the execution of the application threads because those threads must coordinate with finer granularity with garbage collection activities.
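The arithmetic behind the dedicated-processor example under item 1 can be checked with a short sketch. The class and method names are hypothetical. The code assumes, as the example does, that garbage collection needs a fixed fraction of total CPU capacity (15%) as processors are added, while one dedicated processor supplies 1/N of that capacity.

```java
// Illustrative check of the "one dedicated GC processor" arithmetic.
final class DedicatedGcBudget {

    /** Fraction of total CPU capacity supplied by one of n processors. */
    static double dedicatedShare(int processors) {
        return 1.0 / processors;
    }

    /** True if a single dedicated processor covers the required GC fraction. */
    static boolean dedicatedProcessorSuffices(int processors, double requiredGcFraction) {
        return dedicatedShare(processors) >= requiredGcFraction;
    }

    public static void main(String[] args) {
        double required = 0.15; // uniprocessor GC share, taken from the example
        for (int n : new int[] {2, 4, 8, 16}) {
            System.out.println(n + "-way: share=" + (100.0 * dedicatedShare(n))
                    + "% sufficient=" + dedicatedProcessorSuffices(n, required));
        }
    }
}
```

Under this assumption the dedicated processor stops being sufficient once 1/N falls below 0.15, that is, from seven processors upward. This matches the 4-way and 8-way figures in the example.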
Example: Why Calix migrated from C to multiprocessor Java
Calix is a supplier of simplified services platforms designed to facilitate all aspects of voice, data, and video service delivery to business and residential subscribers for local exchange carriers of all sizes. The company was founded in 1999. By 2007, they had become the number one supplier of DSL broadband loop carriers, accounting for over 41% of DSL/voice ports shipped on broadband loop carriers in North America.
In 2001, Calix faced difficult challenges. Their management plane software, which had been implemented in C, lacked key features required by the market, and the product had a number of significant but elusive bugs. One reason the C management plane had been so difficult to manage was that the problem domain needed multiple threads, but the C language did not provide support for threading. Calix engineers had emulated multiple threads by using cooperative coroutines, implemented in C. This code was difficult to understand and maintain. Engineering management struggled to deal with these challenges. Eventually, they chose to rewrite the entire management plane as concurrent Java code.
The Java implementation of the Calix C7 multiservice access platform is designed to support asymmetric multiprocessing with one copy of the JVM running multiple concurrent threads on each processor. The benefit of multiprocessing in this application is to allow scalability to large numbers of I/O channels. The uniprocessor configuration supports a combined total of 480 plain old telephone system (POTS) and digital subscriber line (DSL) ports. Up to five Calix C7 processors can be combined to support a total of 2,400 ports. Each interconnection of Calix C7 processors serves the needs of one local exchange.
Basic management activities such as provisioning, fault and performance monitoring, and security enforcement require a unified framework capable of providing seamless communication between the management software and the services on the Calix C7 platform. By moving the application framework of its management technology to an object-oriented platform, Calix sped application development and improved software quality. By using Java as the implementation language, Calix simplified and centralized error handling, avoided many memory management issues, and took advantage of a standardized remote debugging interface.
At the start of the migration effort, Calix engineers were not familiar with Java. They learned Java as part of this engineering activity. Specific metrics provided by Calix regarding the conversion are:
- Two-fold improvement in software developer productivity compared to the previous implementation in C, even counting the time required to learn Java
- Five-fold reduction in code size compared to the previous implementation in C
- Fewer bugs in the Java code than in the original C code
- More flexibility and generality in the new system, as the Java implementation of their management plane was more capable of adding key capabilities required by evolving market requirements
- Portability of Java yielded several distinct benefits:
  - Developers could test and debug most of their code on larger, faster desktop workstations (running Windows, Solaris, or Linux) even though final deployment was on a single-board computer running a real-time operating system
  - Breadth of off-the-shelf library code that was never before tested on their specialized hardware worked out of the box with no porting effort
  - Simplified coordination with outside consultants, who guided development but did not have access to any of their specialized hardware
- The symbolic cross debugger was essential, with the user interface running on a desktop workstation and the debugger agent running on their single-board computer
- The remote performance profiler was essential during system tuning and optimization; profiler feedback was especially valuable as it helped educate Calix engineers new to Java development regarding the costs of particular coding practices
- Calix summarized the importance of access to these development tools by stating that the effort would have failed without tools to match the capabilities they had used during C development
Conclusion
Multiprocessor architectures offer superior performance and more efficient and scalable use of electric power. The shift by chip vendors to abandon uniprocessor designs in favor of multicore chips is forcing changes to the ways that embedded software systems are architected and designed.
In light of the challenges associated with implementing and maintaining multiprocessor software, the C and C++ languages are showing their age. Since the Java language was designed to simplify engineering of multiprocessor software, it is often preferred for the major software rewrites that are required to take full advantage of modern multiprocessing capabilities. The two-fold improvement in developer productivity and the five- to ten-fold improvement in ease of software reuse and integration are other reasons to prefer Java for major software renovation activities.
Software engineers who are responsible for modernization of existing uniprocessor software legacies need to become familiar with various multiprocessing issues in order to effectively manage the modernization efforts. These issues include topics in information sharing and synchronization, parallel algorithms and data structures, resource partitioning and load balancing, real-time schedulability analysis, and choice of programming language.
Faced with the need to migrate existing legacy software to multiprocessor platforms, in-house engineering teams may benefit from involvement of external experts who can provide essential training and consulting relating to multiprocessor software.
Part 1: How Java eases multicore hardware demands on software
Part 2: Migrating legacy C/C++ code to Java
Dr. Kelvin Nilsen is Chief Technology Officer over Java at Atego Systems, a mission- and safety-critical solutions provider, where he oversees the design and implementation of the PERC virtual machine and other Atego embedded/real-time oriented products. Prior to joining Atego, Dr. Nilsen served on the faculty of Iowa State University where he performed seminal research on real-time Java that led to the Perc family of virtual machine products. Dr. Nilsen participates in the Java Community Process as a member of the JSR 282 and JSR 302 expert groups. He holds a B.S. in Physics and M.S. and Ph.D. degrees in Computer Science.

