Getting a Handle on Java Performance - Embedded.com

Java's strong appeal for embedded applications is sometimes offset by concerns about its speed and its memory requirements. However, there are techniques that you can use to boost Java performance and reduce memory needs, and of course the Java virtual machine you choose affects Java performance, too. You can make better-informed decisions about using Java by understanding the factors that affect its performance and selecting meaningful benchmarks for embedded applications.

Techniques for improving application execution and choosing the right Java virtual machine (JVM) address only a few aspects of system architecture that affect overall Java performance. When selecting an embedded Java platform, you must take into account a host of other factors, beyond the scope of this article, that have an impact on performance. Among them are hardware processor selection, Java compatibility and supported APIs, application reliability and scalability, the choice of a real-time operating system (RTOS) with associated native libraries and drivers, the availability of Java development tool kits and middleware, graphics support, and the ability to put the application code into ROM.

Once you've selected a hardware and software development platform, there are a variety of factors to consider that will help you choose the best-performing Java virtual machine (JVM) for your application.

Java Is Big and Slow: Myth or Reality?

Although the average Java bytecode application executes about ten times more slowly than the same program written in C or C++, how well an application is written in Java can have a tremendous impact on performance, as a study by Lutz Prechelt, “Comparing Java vs. C/C++ Efficiency Differences to Interpersonal Differences” (Communications of the ACM, October 1999), has shown. In the study, 38 programmers were asked to write the same application program in either C/C++ or Java. Applying statistical analysis to the performance data for the programs revealed that actual performance differences depended more on the way the programs were written than on the language used. Indeed, the study showed that a well-written Java program could equal or exceed the efficiency of an average-quality C/C++ program.

Various approaches are available for boosting bytecode execution speed. They include using a just-in-time (JIT) compiler, an ahead-of-time compiler, or a dynamic adaptive compiler; putting the Java application code into ROM (“ROMizing” it); rewriting the JVM's bytecode interpretation loop in assembly language; and using a Java hardware accelerator.

Consider Compilers

JIT compilers, which compile bytecode on the fly during execution, generally aren't suitable for embedded applications. They produce excellent performance improvements in desktop Java applications but typically require 16 to 32 MB of RAM in addition to the application's requirements. That large memory requirement places JIT compilers out of reach for many categories of embedded applications.

Figure 1:  You can implement graphics above the hardware level with Java's heavyweight graphical tool kit (left) or the lightweight version (right). The lightweight version runs faster and has a smaller memory footprint, but writing an implementation is harder and slower.

Ahead-of-time compilers rival JIT compilers in increasing Java execution speed. Unlike JIT compilers, they're used before the application is loaded onto the target device, as their name indicates. That eliminates the need for extra RAM, but it creates the need for more ROM or flash memory (that is, static storage memory), because compiled machine code requires four to five times the memory of Java bytecode. Compiling ahead of time also tends to undermine one of the great benefits of the Java platform: a measure of dynamic extensibility can be lost, since it may not be possible to download new versions of compiled classes. Additionally, any dynamically loaded code, like an applet, won't benefit from ahead-of-time compilation and will execute more slowly than resident compiled code.

Profiling Java code, although somewhat complex, can help minimize code expansion when you're using an ahead-of-time compiler. A good goal is to compile only that 20 percent of the Java classes in which the application spends 80 percent or more of its time.
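As a rough illustration of that 80/20 goal, the sketch below picks the hottest classes from a profile until they cover a target share of execution time. The class name HotClassSelector, the per-class timings, and the profile format are all hypothetical; a real profiler reports its data differently, and the code is written against current Java SE rather than an embedded class library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the class names and timings below are illustrative,
// not the output of any particular profiler.
class HotClassSelector {

    /**
     * Returns class names, hottest first, adding classes until their combined
     * time reaches the given share (for example 0.80) of the total profiled time.
     */
    static List<String> select(Map<String, Long> millisPerClass, double targetShare) {
        long total = 0;
        for (long ms : millisPerClass.values()) {
            total += ms;
        }

        // Sort the profile entries by time spent, largest first.
        List<Map.Entry<String, Long>> entries = new ArrayList<>(millisPerClass.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        List<String> hot = new ArrayList<>();
        long covered = 0;
        for (Map.Entry<String, Long> e : entries) {
            if (total == 0 || covered >= targetShare * total) {
                break; // target share already covered
            }
            hot.add(e.getKey());
            covered += e.getValue();
        }
        return hot;
    }

    public static void main(String[] args) {
        Map<String, Long> profile = new LinkedHashMap<>();
        profile.put("Parser", 620L);
        profile.put("Codec", 210L);
        profile.put("Logger", 90L);
        profile.put("Config", 50L);
        profile.put("Ui", 30L);
        // Parser and Codec together account for 83 percent of the time.
        System.out.println(select(profile, 0.80)); // prints [Parser, Codec]
    }
}
```

In this made-up profile, two of the five classes would be handed to the ahead-of-time compiler and the rest left as bytecode.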

Dynamic adaptive compilers offer a good compromise between JIT and ahead-of-time compilers (Table 1). They're similar to JIT compilers in that they translate bytecode into machine code on the fly. Dynamic adaptive compilers, however, perform statistical analysis on the application code to determine where the code merits compiling and where it's better to let the JVM interpret the bytecode. The memory used by this type of compiler is user-configurable, so you can evaluate the trade-off between memory and speed and decide how much memory to allocate to the compiler.
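To make the idea concrete, here is a toy model of one policy such a compiler might use: count invocations per method and compile a method once it crosses a threshold, as long as a configured budget isn't exhausted. The class AdaptiveCompilerModel, the invocation-count heuristic, and the budget expressed as a number of methods are assumptions for illustration; actual dynamic adaptive compilers apply their own statistics inside the JVM.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: a real JVM makes this decision inside its interpreter,
// and its heuristics are not necessarily simple invocation counts.
class AdaptiveCompilerModel {
    private final int threshold;          // calls before a method counts as hot (assumed policy)
    private final int maxCompiledMethods; // stand-in for the user-configured memory budget
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    AdaptiveCompilerModel(int threshold, int maxCompiledMethods) {
        this.threshold = threshold;
        this.maxCompiledMethods = maxCompiledMethods;
    }

    /** Records one call and returns how that call would be executed. */
    String invoke(String method) {
        if (compiled.contains(method)) {
            return "native";
        }
        int n = counts.merge(method, 1, Integer::sum);
        if (n >= threshold && compiled.size() < maxCompiledMethods) {
            compiled.add(method); // compile now; later calls run as native code
        }
        return "interpreted";
    }

    boolean isCompiled(String method) {
        return compiled.contains(method);
    }

    public static void main(String[] args) {
        AdaptiveCompilerModel jvm = new AdaptiveCompilerModel(3, 1);
        for (int i = 1; i <= 4; i++) {
            System.out.println("call " + i + " of decode(): " + jvm.invoke("decode"));
        }
    }
}
```

Raising the budget lets more methods run as native code at the cost of RAM, which is the memory-versus-speed trade-off described above.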

Table 1:  Comparing bytecode compilation techniques

Placing the bytecode into ROM can contribute to faster application performance. It doesn't make the code run faster. It does, however, translate the code into a format that the JVM can execute directly from ROM, causing the code to load faster by eliminating class loading and code verification, tasks normally performed by the JVM.

Another way to speed up bytecode execution without using ahead-of-time or dynamic compilation techniques is to rewrite the bytecode interpreter in the JVM. Because the interpreter is a large C program, you can make it run faster by rewriting it in assembly language.

Java hardware accelerators, or Java chips, are the ultimate option for speeding up code execution. They're emerging in two fundamental configurations. Chips of the first type, such as Chicory Systems' HotShot and Nazomi Communications' JSTAR, operate as Java coprocessors in conjunction with a general-purpose microprocessor, in much the same way that graphics accelerators are used. Java chips in the other category, like Patriot Scientific's PSC1000 and aJile's aJ-100, replace the general-purpose CPU.

Clearly, the latter are limited to applications that can be written entirely in Java. As for the first type, adding components of course raises costs, so this type offers a viable option only when the cost is acceptable. Indeed, the price of Java chips has been high because of relatively low production volumes. A high-volume solution, however, may be forthcoming in the form of the ARM940 general processor with an integrated Java accelerator, called Jazelle.

Memory Requirements

The Prechelt study determined that the average memory requirement of a program written in Java is two to three times greater than for one written in C/C++. Even the compact nature of bytecode, usually about 50 percent smaller than compiled C/C++ machine code, can't offset that overhead. Recognizing that trying to drop Java in its original, desktop-oriented form into embedded systems won't work, Sun Microsystems, Java's originator, took the language through several evolutionary steps in an effort to tailor it to the embedded environment. Today, the Java 2 Platform, Micro Edition (J2ME), represents the latest, most evolved, and slimmest version of Java for the embedded space.

You can trim J2ME by eliminating classes and code components that aren't needed for your application. The JVM, native libraries, core classes, and application bytecode go into ROM. JVMs for embedded applications generally run under 500 kB, whereas class libraries for J2ME typically don't exceed 1.5 MB. Java components that affect RAM requirements include the JVM (for bytecode execution), the potential dynamic compiler, the Java heap, and the number of threads (the latter two obviously depend on the application). Executing as much of the application as possible using an interpreter, while maintaining acceptable execution performance, helps contain the memory footprint.
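A back-of-the-envelope ROM estimate can be sketched from these figures together with the four-to-five-times expansion of ahead-of-time compiled code mentioned earlier. The class RomBudget, its parameters, and the 200 kB application size are hypothetical; the 500 kB and 1.5 MB values are the upper bounds quoted above, not measurements of any particular JVM.

```java
// Hypothetical estimator: sizes are in kB and are rough planning figures only.
class RomBudget {

    /**
     * Estimates ROM use when a fraction of the application bytecode is
     * compiled ahead of time and so grows by the given expansion factor.
     */
    static long estimateKb(long jvmKb, long classLibKb, long appBytecodeKb,
                           double compiledFraction, double expansionFactor) {
        double compiledKb = appBytecodeKb * compiledFraction * expansionFactor;
        double interpretedKb = appBytecodeKb * (1.0 - compiledFraction);
        return jvmKb + classLibKb + Math.round(compiledKb + interpretedKb);
    }

    public static void main(String[] args) {
        // 500 kB JVM, 1.5 MB class libraries, 200 kB of application bytecode (assumed).
        System.out.println(estimateKb(500, 1500, 200, 0.0, 4.0)); // all interpreted: 2200
        System.out.println(estimateKb(500, 1500, 200, 0.2, 4.0)); // 20% compiled at 4x: 2320
        System.out.println(estimateKb(500, 1500, 200, 0.2, 5.0)); // 20% compiled at 5x: 2360
    }
}
```

Under these assumed numbers, compiling a fifth of the application adds roughly 120 to 160 kB of ROM, small next to the JVM and class libraries.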

Selecting a highly scalable operating system and C run-time package allows you to tune these software components for optimal memory efficiency. Scaling the Java environment can be complex, however. Usually, a two-stage process is involved. First, you can use the command line verbose option, java -v, to see the classes an application uses and then manually extract the needed libraries and classes. If this process doesn't save sufficient space, you can use filtering tools, like JavaFilter from Sun's EmbeddedJava platform.
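The first stage can be partly automated. The sketch below collects the class names from captured verbose output, assuming each relevant line has the form "[Loaded <class> from <source>]". That line format is an assumption: it varies between JVMs and versions, so check what your JVM actually prints. LoadedClassLister and the sample lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: assumes verbose lines look like
// "[Loaded java.lang.String from rt.jar]"; other formats are ignored.
class LoadedClassLister {

    /** Returns the sorted, de-duplicated class names found in the verbose output. */
    static List<String> parse(List<String> verboseLines) {
        final String prefix = "[Loaded ";
        TreeSet<String> names = new TreeSet<>();
        for (String line : verboseLines) {
            if (!line.startsWith(prefix)) {
                continue; // not a class-loading line
            }
            String rest = line.substring(prefix.length());
            int end = rest.indexOf(' ');
            if (end < 0) {
                end = rest.indexOf(']');
            }
            if (end > 0) {
                names.add(rest.substring(0, end));
            }
        }
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "[Opened rt.jar]",
                "[Loaded java.lang.String from rt.jar]",
                "[Loaded app.Main from file:/app/]",
                "[Loaded java.lang.String from rt.jar]");
        System.out.println(parse(sample)); // prints [app.Main, java.lang.String]
    }
}
```

The resulting list is a starting point for deciding which libraries and classes to keep; classes loaded only on rarely exercised code paths won't appear unless the test run reaches them.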

If you're using Java, you should expect to increase memory and CPU resources compared with using C/C++ (Table 2).

Table 2:  Typical requirements for Java systems

Choosing the Right Java Platform

Of course, your choice of JVM is one key to optimizing Java performance for your application. Obviously, you need a JVM designed for embedded applications.

Embedded JVMs are highly user-configurable to match different embedded system requirements, but which embedded JVM should you use? Java benchmarks are meant to help you evaluate JVMs and Java performance, but you need to be careful about which ones you use and about the conclusions you draw from them. A good benchmark score for a particular JVM doesn't necessarily mean that using it will make your application go faster.

Consequently, before evaluating a JVM, you have to evaluate any benchmark to determine how meaningful it may be to your application, taking into account the whole Java environment that's associated with it. Some benchmarks are very application-specific (a chat server benchmark like VolanoMark, for instance) and may not apply to the kind of Java applications you're developing. Additionally, because JVM vendors commonly optimize their products to achieve good benchmark scores, the scores can be misleading about how much a given JVM will improve the performance of your particular application. Conversely, if your application has specific problems in certain areas, an environment that's optimized to improve general processing won't solve those specific processing problems.

Measuring Application Performance

When considering a benchmark to determine the overall performance of a Java application, bear in mind that bytecode execution, native code execution, and graphics each play a role. Their impact varies depending on the nature of the specific application: what the application does, how much of it is bytecode versus native code, and how much use it makes of graphics. How well a JVM will perform for a given application depends on how the unique mix of these three functional areas maps onto its capabilities. Given these variables, the best way to benchmark a JVM is against your own application. Since that's not possible before the application has been written, you must find those benchmarks that are most relevant to the application you intend to write.

Sorting through Java benchmarks to find the ones that are relevant for embedded applications can be confusing. SpecJVM98, for example, provides a relatively complete set of benchmarks that test diverse aspects of the JVM. That sounds good, but SpecJVM98 runs in a client/server environment and requires a minimum of 48 MB of RAM on the client side for the JVM. That excludes it from any relevance to most embedded applications. In addition, it can't be used with precompiled classes.

Other benchmarks have different pitfalls. VolanoMark, for example, is a chat server implementation and is therefore relevant only for benchmarking applications with the same set of requirements as chat servers. The JMark benchmark assumes that the application includes the applet viewer and a full implementation of Java's Abstract Windowing Toolkit (AWT). This benchmark can be irrelevant for the many embedded applications that have no graphics or have limited graphics that don't require full AWT support, such as devices running a PersonalJava minimal-AWT implementation.

Embedded CaffeineMark (ECM), the embedded version of the CaffeineMark benchmark from Pendragon Software (it has no graphics tests), is easy to run on any embedded JVM, since it requires support for basic Java core classes only, and it doesn't require a large amount of memory. More importantly, there's a high correlation between good scores on this benchmark and improved bytecode performance in embedded applications.

To get the most meaningful results from ECM, you must use exactly the same hardware when testing different JVMs. You must also pay attention to implementation differences among the JVMs you're testing. If, for example, you're comparing a JVM with a JIT compiler against a JVM without one, it's important to run the JVM that has the JIT with the java -nojit option on the command line to ensure an apples-to-apples comparison.

ECM will typically make any JVM using compilation look good, no matter the type of compilation, because it includes a very small set of classes and always repeats the same small set of instructions. Dynamic compilers just cache the complete translation of the Java code in RAM and execute subsequent iterations of the tests in native code. Ahead-of-time compilers can easily optimize the loops and algorithms used in ECM, too.

Although the industry abounds with other Java benchmarks, like Java Grande, SciMark, jBYTEmark, Dhrystone benchmark in Java, and UCSD Benchmarks for Java, there is no “ultimate” benchmark that can give you certainty about Java and JVM performance in embedded applications. The best strategy is to identify a suite of benchmarks that seem most relevant to your application and use the combined results of those benchmarks to help predict Java performance in a particular system environment.

Furthermore, the existing benchmarks may not measure other aspects of your application code. Tuning Java applications to meet performance goals may require addressing many program functions besides bytecode execution. Some of those functions occur within the JVM: thread management, synchronization, method-to-method calls, class resolution, object allocation and heap management (including garbage collection), calls to native methods, bytecode verification, and exception handling. Because few if any benchmarks address such functions, it falls to you to conduct an in-depth study of a JVM's internals to understand how its design may affect crucial aspects of your application. Writing special programs that exercise critical aspects of a JVM can help you evaluate it for the application. If, for example, your application uses a heavy mix of Java and C code, you can benefit by writing a program that tests native method call performance. Other functions, including native code execution and such factors as network latency, may occur outside the JVM.
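A special-purpose test program of this kind can be quite small. The sketch below times three of the in-JVM functions listed above: plain method calls, synchronized calls, and object allocation. JvmMicroProbe and its iteration count are hypothetical, and the code uses current Java SE features (lambdas and System.nanoTime) that an older embedded class library may lack. Treat the numbers as rough indications only, since a compiling JVM may optimize away part of such simple loops.

```java
// Hypothetical probe: compares the relative cost of a few JVM-internal
// operations. Absolute timings are only meaningful on the same hardware.
class JvmMicroProbe {
    private int counter;

    void plainIncrement() {
        counter++;
    }

    synchronized void syncIncrement() {
        counter++; // same work, but acquires the object's monitor on each call
    }

    int count() {
        return counter;
    }

    /** Runs the task the given number of times and returns elapsed nanoseconds. */
    static long time(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        JvmMicroProbe probe = new JvmMicroProbe();
        Object[] sink = new Object[1]; // keeps allocated objects reachable briefly

        // One untimed pass so class loading doesn't distort the first measurement.
        time(probe::plainIncrement, n);
        time(probe::syncIncrement, n);

        System.out.println("plain calls:        " + time(probe::plainIncrement, n) + " ns");
        System.out.println("synchronized calls: " + time(probe::syncIncrement, n) + " ns");
        System.out.println("64-byte allocation: " + time(() -> sink[0] = new byte[64], n) + " ns");
    }
}
```

Running the same probe on each candidate JVM, on identical hardware, gives a first impression of how synchronization and allocation costs compare. A native method call test would follow the same pattern but needs a small C library, which is omitted here.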

Graphics Performance

What if your application includes graphics? To start, there are two major factors that affect graphics performance in Java applications: Does the application's graphics display driver use graphics coprocessor hardware acceleration? Is the application configured with a lightweight (faster) or a heavyweight (slower) implementation of the Abstract Windowing Toolkit? (See Figure 1.) In addition, like any other high-level Java service, graphics performance is affected by the way that the graphics services integrate with lower-level native libraries.

Wind River's Personal JWorks includes a good benchmark for evaluating graphics performance in embedded systems. The benchmark targets the PersonalJava AWT with a set of 39 tests of images, buttons, scrolling, text, and basic 2-D graphics.

Real-World Performance

Finally, you need to consider the performance of your CPU. To help you identify CPU-bound performance, you should supplement simple benchmarks by running real-world applications that exercise large amounts of different, complex Java code. Such test code must meet a number of requirements: It should contain a large number of classes that reflect an estimate of the real application (20-plus is a good ballpark). It must also be large (thousands of lines, at least) and have no file system access and no graphics. Some existing programs meet all those criteria. The GNU regular expression package, regexp, for example, comprises about 3000 lines of code and more than 21 classes, providing a large number of expressions to parse and match. Another program, the Bean Shell interpreter running a simple prime number sieve, has 70 classes and several thousand lines of code. JavaCodeCompact, Sun's PersonalJava ROMizing tool, also would make a good test program.

The result of running these programs as test cases illustrates the wide variance in the meaning of benchmark scores. For example, a JVM using a JIT compiler may run Embedded CaffeineMark up to 30 times faster than when the nojit option is turned on (thus running in pure interpretation mode), but the same JVM runs the Bean Shell and regexp tests only about one and a half times faster when using the JIT compiler. (The apparently impressive thirtyfold speedup on a simple benchmark like Embedded CaffeineMark is achieved through caching techniques that the compiler uses on the small amount of code and classes in ECM.) The difference in results clearly demonstrates that high benchmark scores may not translate into a commensurate level of performance improvement in real-world applications.

Actually, SpecJVM98 and JMark yield results that most closely approximate those for real-world applications. They do suffer, though, from the limitations discussed above. In particular, the usefulness of the former in the embedded space depends greatly on your ability to overcome the problems associated with your test infrastructure requirements.

About the Author

Vincent Perrier is Product Marketing Manager, Java Platforms, for the Wind River Platforms Business Unit in Alameda, CA.
