It's been reported that Java is 10 to 20 times slower than C, on average. Accurate or
not, those figures are attempting to describe "interpreted Java." In other words, if you employ a Java virtual machine in your system to interpret Java bytecode on the fly (instruction by instruction), you will pay a serious run-time performance penalty. Such a large performance penalty, combined with the ROM and RAM requirements of the embeddable JVMs available today, is likely an insurmountable obstacle to using Java in the majority of embedded systems.
However, I have recently tried using Java in
another way. Using a traditional embedded software development model and a Java cross compiler (sometimes called an ahead-of-time, or AOT, compiler), I have been developing part of my embedded software in Java. Following this scenario, my embedded system has no JVM at all, and the execution speed of programs written in Java compares favorably with those written in C++.
Development tools
If you're contemplating bringing Java into a
development shop dominated by Solaris, you should know about one of Java's supreme ironies: most of its developers are Windows-hosted. This means that virtually all of the major IDE vendors produce a Windows version of their latest release first, with the Solaris port trailing behind by six months or more. The same principle applies to vendors of other Java products, like performance analysis, testing, and CASE tools.
In general, Java development tools targeting the embedded market are not as mature as
those targeting desktop developers. Most embedded tool vendors are building on top of C-based legacy technology, and they often have less experience with Java than comparable vendors of development tools for the desktop. I've found it useful to compare embedded Java tools side-by-side with their desktop counterparts. By knowing what's available in the top-of-the-line tools, you can more wisely choose the features that really matter to you and make sure your embedded tools have them.
Because Java
is designed for portability, it's often possible to develop, test, and debug significant portions of an embedded application in a feature-rich desktop environment. I do this by putting "interface objects" at the boundaries of my Java code. These interface objects are responsible for services like timing and low-level I/O. On startup, the system installs only the set of interfaces that is appropriate to the environment in which it finds itself, rather like plug-and-play device drivers. Our current system
uses four such configurations: one each for development, simulation, emulation target hardware, and "real" target hardware. The configuration can also be fixed at compile time by using compiler conditionals or simple changes to the source.
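The interface-object idea can be sketched roughly as follows. This is a minimal illustration, not the code from our system: the names Clock, HostClock, SimulatedClock, and Platform are hypothetical, and only two of the four configurations are shown.

```java
// Hypothetical boundary interface: the only view of time the application sees.
interface Clock {
    long nowMillis();
}

// Development configuration, backed by the host's own clock.
class HostClock implements Clock {
    public long nowMillis() {
        return System.currentTimeMillis();
    }
}

// Simulation configuration: time advances only when the test says so.
class SimulatedClock implements Clock {
    private long now = 0;

    public long nowMillis() {
        return now;
    }

    void advance(long millis) {
        now += millis;
    }
}

// Chooses the interface object once, at startup.
class Platform {
    static Clock installClock(String environment) {
        if ("simulation".equals(environment)) {
            return new SimulatedClock();
        }
        return new HostClock(); // assumed default: development host
    }
}
```

Application code holds only a Clock reference, so the same classes can run unchanged on the desktop and on the target.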
Java libraries
In a conventional (JVM-based) Java runtime environment, classes in the base libraries are written in Java, just like all the other classes. The developer has access to the base class
source code, and can modify or enhance it as needed.
Statically linked Java doesn't necessarily work that way. The base class libraries that come with an AOT compiler might be written in a lower-level, higher-performance language. They might not be the least bit object oriented on the inside. But the base classes only need to look object oriented from the outside. What goes on behind those method entry points is really none of the Java programmer's business anyway, so long as the classes work as
advertised.
Some Java libraries like java.io and java.net make extensive use of platform services through native methods (calls to functions written in C or assembly). It's possible to port a base class to a new platform by simply implementing each of its native methods on the target platform, and using an AOT compiler on the Java code. But this may not be the best approach, for at least two reasons:
- Base libraries made from compiled Java code probably won't be as fast as they could
be. These libraries may have been written with portability as the primary requirement
- Vendors that use Sun's Java code to make their base libraries will have to pay license fees to Sun. It probably makes more business sense for them to develop their own ("clean-room") libraries
On the other hand, libraries built from Java code will probably be the most flexible. What's going to happen to those optimized clean-room libraries when Sun comes out with the next version of the JDK and it has changes in the base classes? Is the compiler vendor going to update that hand-hewn code to reflect the changes? How soon? Will a project get stuck using an older version of Java (including older language constructs) just because the new Java libraries haven't been ported yet? For projects that only want to use the core features of Java, it probably isn't as important to keep up with the latest JDK version. So this may be less of an issue for developers using AOT compilers.
The structure of base libraries is
also a factor when eliminating dead code in order to minimize footprint for application deployment. Base classes built from compiled Java can be recompiled to exclude classes and methods that are not used by a given application. Clean-room libraries should also provide some means of excluding code that an application doesn't need.
Build process
In my organization we use RCS for revision control of our C code. The source files to
build any given C-based system all reside in a single directory, and our staff have developed a rich set of scripts and utilities for managing the source within this environment.
The package-structured nature of Java code is fundamentally incompatible with this system. To permit reuse of packages as components, Java source code needs to be organized into a directory tree whose structure parallels the package hierarchy of the code. The make process must be able to select components from the source tree
that are to be included in a given build. For my group this means that the staff have had to develop new scripts and processes in order to incorporate Java into our traditional build processes.
The AOT compiler we are currently using comes with a "driver" program that can essentially manage the entire build process, at least for wholly Java applications. At least one of the classes in the source code tree needs to have a static main() method, just as if we were building a standalone Java application targeting a JRE.
The compiler's "driver" program compiles the root source file first. Any references to symbols outside a class's own package must be resolved at compile time through import statements in the code. The driver works its way through the transitive closure of method calls from main() in the root class, compiling each module and statically linking them all together into a single object module.
This one-step compile-and-link procedure is a simple way to get a
wholly Java application up and running, but it doesn't provide the flexibility needed to build real-world embedded systems. Specifying different compilation options for different files is likely to require the use of a traditional makefile, especially for systems that also include code written in other languages.
Note that static linking precludes some of the really fancy things that can otherwise be done with Java's reflective capabilities. For example, it is not possible to construct the name of a class at runtime and then "load" the class. Depending on the way a given compiler handles class metadata, certain portions of the java.lang.reflect API may not be available.
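One way to live without runtime class loading is to name every class explicitly in code. The sketch below assumes hypothetical Handler classes, and it shows the reflective form only as a comment for comparison.

```java
// Hypothetical handler types, named only for illustration.
interface Handler {
    String name();
}

class SerialHandler implements Handler {
    public String name() { return "serial"; }
}

class EthernetHandler implements Handler {
    public String name() { return "ethernet"; }
}

class HandlerFactory {
    // JVM style, which may not work when statically linked:
    //   Handler h = (Handler) Class.forName(prefix + "Handler").newInstance();

    // Static-link friendly: each class is referenced by name in the source,
    // so a build that follows references from main() can find it.
    static Handler create(String kind) {
        if ("serial".equals(kind)) {
            return new SerialHandler();
        }
        if ("ethernet".equals(kind)) {
            return new EthernetHandler();
        }
        throw new IllegalArgumentException("unknown handler: " + kind);
    }
}
```

The cost is that adding a new handler means editing the factory, which is exactly the flexibility that dynamic loading would otherwise provide.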
RTOS integration
The next step is to link the object module generated by the Java compiler into an image that can be run on a given target platform. This is where, for example, the creation of a threaded object:
class AuditDaemon implements Runnable
{
    public void run()
    {
        // System audits go here
    }
}
public AuditDaemon auditor = new AuditDaemon();
is mapped to the creation and scheduling of an OS task. To improve portability, some Java compilers target a standard threading API like POSIX pthread.
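For reference, this is roughly how a Runnable like the one above gets handed to a thread in standard Java. The class names and the counter are hypothetical, added only so the effect is observable; whether start() maps to an RTOS task or a pthread depends on the toolset.

```java
// Same shape as the audit daemon, with a counter so the result is visible.
class CountingAuditDaemon implements Runnable {
    int auditsRun = 0;

    public void run() {
        auditsRun++; // system audits would go here
    }
}

class AuditLauncher {
    static int runOnce() {
        CountingAuditDaemon daemon = new CountingAuditDaemon();
        Thread auditor = new Thread(daemon, "audit");
        auditor.start(); // the runtime creates and schedules the underlying task here
        try {
            auditor.join(); // wait for run() to return
        } catch (InterruptedException e) {
            throw new IllegalStateException("interrupted while waiting for audit");
        }
        return daemon.auditsRun;
    }
}
```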
The most common way of accessing low-level resources from Java is the use of native methods. These look like any other method to their clients in the Java world, but they are implemented in a language other than Java, usually C or assembler. Inside a native method, your code is not subject to any of Java's security restrictions or runtime error checking, so you have unrestricted access to platform resources.
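On the Java side, a native method is just a declaration with no body. The sketch below uses a hypothetical StatusPort class; the native implementation is not shown, and how it gets bound to the declaration depends on the compiler or JVM in use.

```java
class StatusPort {
    // Declared in Java, implemented elsewhere (C or assembler).
    // The name and signature are hypothetical.
    static native int readStatus();

    // Ordinary Java helper that interprets the raw value, so clients
    // never need to know the status came from a native call.
    static boolean isReady(int status) {
        return (status & 0x01) != 0;
    }
}
```

A client would write StatusPort.isReady(StatusPort.readStatus()) and see nothing unusual about the call.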
A Java toolset for embedded development might also come with libraries that give the developer access to machine resources in a convenient way. For example, at least one comes with a set of classes that provide simple, object-oriented access to
regions of target memory.
Garbage collection
Most desktop Java systems use a blocking garbage collector (GC), which means that when the garbage collector runs, all Java threads stop until it's finished compacting the heap. For applications that can tolerate this behavior, a blocking collector probably offers the best average performance.[1] However, for many embedded applications that have even soft real-time requirements, blocking garbage collection is just not acceptable.
One alternative is to not use a garbage collector at all. Some specialized Java libraries support an operation similar to free(), which allows the programmer to deallocate an object's memory in the conventional C/C++ programming style. Even if your tools don't support this, it may be possible to preallocate all of the memory that will be used by an application, making the garbage collector unnecessary. This could mean designing the system to use only permanent objects, or reusing unneeded objects by returning them to pools for reallocation. Systems that don't use a garbage collector can eliminate the collector itself, thereby reducing the memory footprint.
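A preallocated pool might look something like this. It is a minimal sketch with hypothetical Message and MessagePool classes; all allocation happens in the constructor, and an exhausted pool returns null rather than allocating.

```java
// Hypothetical message type with mutable fields, so instances can be reused.
class Message {
    int id;
    int length;

    void clear() {
        id = 0;
        length = 0;
    }
}

// Fixed-size pool: every Message is created up front.
class MessagePool {
    private final Message[] free;
    private int count;

    MessagePool(int capacity) {
        free = new Message[capacity];
        for (int i = 0; i < capacity; i++) {
            free[i] = new Message();
        }
        count = capacity;
    }

    // Returns null when the pool is empty instead of allocating.
    synchronized Message acquire() {
        return count == 0 ? null : free[--count];
    }

    // Resets the object and makes it available again.
    synchronized void release(Message m) {
        if (count < free.length) {
            m.clear();
            free[count++] = m;
        }
    }
}
```

The pool size becomes a design parameter that has to cover the worst case, which is the price of never generating garbage.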
Is it possible to have both the convenience and safety of a garbage collector and some measure of runtime determinism? Yes, certain things can be done, but remember that determinism will cost you something.
Maybe the garbage-collected
portion of an application's memory is small enough that the GC can compact the entire heap in less than the desired response time. A blocking GC might be just fine for an application like this. Some garbage collectors allow certain objects to be designated as permanent, meaning that they don't have to be scanned by the collector when it runs. If an application's memory can be partitioned such that only a small portion of it needs to be scanned, the worst-case execution time of the GC can be reduced. It's also
possible to invoke Java's GC explicitly, using System.gc(). If an application knows that it definitely won't need to respond to an event for a given period of time, perhaps it can seize that opportunity to compact the heap.
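An explicit collection during a known quiet period could be wrapped like this. The IdleCollector class and its threshold parameter are hypothetical, and System.gc() is only a request, so what actually happens depends on the runtime.

```java
class IdleCollector {
    // Requests a collection only if the caller expects to stay idle for at
    // least minIdleMillis. Returns true if the request was made.
    static boolean collectIfIdle(long expectedIdleMillis, long minIdleMillis) {
        if (expectedIdleMillis < minIdleMillis) {
            return false; // not enough quiet time; leave the heap alone
        }
        System.gc(); // a request; the runtime decides how to honor it
        return true;
    }
}
```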
Some Java tool vendors offer an incremental garbage collector as an option. With this feature it's possible to bound response latencies by telling the collector to never run uninterrupted for longer than the application's desired response time. Many incremental collectors are
also pre-emptible, meaning that they can be interrupted at will.
That may be well and good, but what happens if a system gets so busy processing interrupts or other events that the GC never gets to run at all? Eventually the application will be unable to allocate more memory, and it will have to let the collector run at least until it can fulfill the current allocation request. To keep this from happening one can increase the priority of the thread or task running the GC, but this is done at the
expense of application code. Also, because an incremental collector has to maintain extra state information and do extra work, it will have lower average throughput than a blocking collector. Depending on its implementation, an incremental collector might add a small amount of memory overhead to every object in the garbage-collected heap.
NewMonics' PERC system adds another tool for making Java code deterministic. PERC is based on a runtime executive whose job is to schedule Java tasks. Each task is
required to supply information about the resources that it will need in the worst case, including memory and execution time. Before scheduling a new task, the PERC executive examines its current workload and decides whether the proposed task can be accommodated while still meeting all previously promised task completion times. If the new task cannot be accommodated, the executive refuses to accept it for scheduling. Of course the executive adds overhead at runtime, but it does permit the development of
reusable software components with specified real-time characteristics, which is quite an intriguing notion.
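The admission idea can be illustrated in general terms. The sketch below is not PERC's API. It is a hypothetical executive that uses a plain utilization check (total promised CPU fraction no greater than 1) plus a memory budget, where a real executive would use an analysis matched to its own scheduler.

```java
// Hypothetical admission-control executive; names and policy are assumptions.
class AdmissionExecutive {
    private final long memoryBudgetBytes;
    private long memoryCommitted = 0;
    private double cpuCommitted = 0.0; // fraction of the CPU already promised

    AdmissionExecutive(long memoryBudgetBytes) {
        this.memoryBudgetBytes = memoryBudgetBytes;
    }

    // A task declares its worst case: worstCaseMillis of execution in every
    // periodMillis, and worstCaseBytes of memory.
    boolean admit(long worstCaseMillis, long periodMillis, long worstCaseBytes) {
        double cpu = (double) worstCaseMillis / periodMillis;
        if (cpuCommitted + cpu > 1.0) {
            return false; // would break earlier timing promises
        }
        if (memoryCommitted + worstCaseBytes > memoryBudgetBytes) {
            return false; // would exceed the memory budget
        }
        cpuCommitted += cpu;
        memoryCommitted += worstCaseBytes;
        return true;
    }
}
```

A refused task changes nothing, so the promises made to tasks already admitted stay intact.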
Although garbage collection takes most of the heat for Java's lack of runtime determinism, it isn't the only factor limiting Java's suitability for real-time programming. The Java language spec defines the synchronized modifier in such a way that it is not generally possible to predict how long a blocked thread will wait to obtain a lock. An expert group has been chartered under the Java Community Process to define a specification for real-time Java.[2]
Debugging
Getting the most out of object-oriented development means a highly interactive, fast-paced cycle of design-code-compile-link-test. Such a process demands a remote source-level debugger that's well integrated with the Java compiler. Many of the available Java debuggers for embedded development started out as C tools and have
only recently been extended to handle Java as well. That's fine, but it's also one of those opportunities to carefully compare a remote target debugger with the one in a state-of-the-art desktop Java IDE. Make sure your host-based target debugger has the features that are important to you.
Here's an example. While evaluating one debugger we found that it would always display the structure of objects on the execution stack according to their type as declared in a given stack frame. That may be exactly what you need for looking at C structs, but it's not sufficient for inspecting Java objects whose actual type may be something completely different from the type declared in the current frame. This was one case where side-by-side tool comparison was useful. Had I not been accustomed to the debugger in my luxurious desktop IDE, I might not have noticed this missing feature until it was too late.
Most embedded systems that use Java will probably include some C or C++ code as well. Such
mixed-language systems require a debugger that works smoothly with all the languages they employ (usually C, assembler, and Java). In this area, their C legacy may actually give embedded tool vendors a leg up on the desktop world.
High-performance Java
I always try to follow Kent Beck's advice about the optimization process:
- Make it work
- Make it right
- Make it fast
Once you have a properly
factored system that meets the bulk of its requirements, you are ready to accurately analyze performance and make improvements that won't wreck your design.
Before torturing your beautiful OO designs in an effort to wring every last instruction cycle out of your Java code, make sure you understand why you're using Java in the first place.
Object-oriented techniques offer programmers some powerful tools for managing complexity in problem domains that are inherently complex. Elegant and
well-abstracted designs are particularly important for systems that need to be quickly grasped by many people, that are constantly being revised and extended, or that need to be reused in other contexts (other devices in the same product family, for example). But this isn't really the case for a lot of embedded code, especially code that's "close to the metal." So think twice before deciding to give up speed and space to gain flexibility that you don't really need.
The embedded systems I've worked on
tend to be organized as shown in Figure 1.
Figure 1: Organization requirements for the various subsystems.
Requirements for the various subsystems look something like Table 1.
The upshot of this model is that it generally becomes more tempting to use OO techniques as you get higher up in the architectural layers. I think the case for OO is strongest in modeling application logic, for the
following reasons:
- The user requirements for application logic tend to be the most complex. Anything that improves communication between marketers and developers, such as OO modeling, can pay off big
- Application logic is frequently at the heart of the value added by an embedded device vendor. As such, it is likely to be valuable intellectual property that should be made as reusable as possible
- The very act of building a cross-product enterprise data model, particularly one that is accessible to marketers, can suggest new directions for product evolution
Memory management.
Just because you have a GC doesn't mean you should completely forget about memory management. Thanks to that garbage collector, you no longer have to track references to a data record across multiple threads. You never again need to worry about crashing the whole system just by freeing an object too early. For all the things it does for you, doesn't the GC deserve just a little bit of
consideration in return?
Try to be aware of the kind of garbage being generated by your code, including Java code that you use but that you didn't write. Some very well designed OO code creates and immediately discards a lot of small, transient objects.
Know the characteristics of your GC and make sure it matches the needs of the application:
- Blocking collectors run when available memory dips below a configured threshold. When a blocking collector is running, no other Java
thread can use the heap until it is finished
- Incremental collectors spread their activity over time so they never get very far behind
- A preemptible collector doesn't need to run to completion each time it is started
- Defragmenting collectors rearrange the heap to maximize the size of free blocks
- Generational collectors partition their heap space into differently managed areas according to object lifecycle information gathered at runtime. Many generational GCs allow the
user to configure initial sizes and management policies for their partitions
Exceptions.
Java's simple but powerful exception-handling capabilities allow the developer to more precisely specify the external behavior of software building blocks. In my first experience with compiling and embedding Java I got carried away with this idea, throwing exceptions to indicate conditions such as "I found the database location but it's empty. Shall I put a new record there?" The performance
looked great running on my 333MHz PC system during development, but on 25MHz target hardware it was a dog. The profiler showed that my too-slow code was spending about 40% of its time on exception handling. In this case I was able to speed the code way up by using special "undefined" placeholder objects instead of throwing exceptions. Nowadays, I limit my use of exceptions to occurrences that truly are uncommon.
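The placeholder approach can be sketched as follows. Record and RecordStore are hypothetical stand-ins for the database code, which is not shown here; the point is that an empty location is reported by value rather than by throwing.

```java
import java.util.Hashtable;

class Record {
    // Shared "undefined" placeholder, returned instead of throwing.
    static final Record UNDEFINED = new Record("<undefined>");

    final String value;

    Record(String value) {
        this.value = value;
    }

    boolean isDefined() {
        return this != UNDEFINED;
    }
}

class RecordStore {
    private final Hashtable<String, Record> slots = new Hashtable<String, Record>();

    void put(String location, Record r) {
        slots.put(location, r);
    }

    // An empty location is an expected outcome, so no exception is involved.
    Record lookup(String location) {
        Record r = slots.get(location);
        return r == null ? Record.UNDEFINED : r;
    }
}
```

The caller checks isDefined() and decides whether to store a new record, with no exception machinery on the common path.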
Strings.
Java's innocent-looking String concatenation operator (the plus sign) allows programmers to write code that looks reasonably efficient but is not. This operator allocates a new String whose length is the sum of the lengths of the two operands, and copies the contents of the two operands into the result. The operands then become garbage unless other objects still refer to them. All this is not so bad until we write an expression like:
string1 + string2 + string3 + string4 +
An optimizing compiler can do a fine job with this if all the operands are available at compile time. If not, however, this expression is going to be evaluated at runtime. It will generate n-1 throwaway Strings (where n is the number of concatenations) and the GC will be working overtime to keep up.
A more subtle form of the same problem can occur when concatenating Strings as part of a recursive algorithm. The program may only do one concatenation per recursion, but if the recursion gets deep the garbage can really pile up.
The solution in either case is to use the StringBuffer class for multiple concatenations. Create a single new StringBuffer(initialSize) whose initial size is large enough to contain the final result, and append(eachOperand) to the buffer. When finished appending, the buffer's toString() method can efficiently copy its contents into a new String.
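Put together, the StringBuffer approach looks something like this. The Joiner class is a hypothetical helper; it sizes the buffer from the operands so the buffer never has to grow.

```java
class Joiner {
    // Concatenates all parts using one buffer and one final String.
    static String join(String[] parts) {
        int total = 0;
        for (int i = 0; i < parts.length; i++) {
            total += parts[i].length();
        }
        StringBuffer buffer = new StringBuffer(total); // sized for the final result
        for (int i = 0; i < parts.length; i++) {
            buffer.append(parts[i]);
        }
        return buffer.toString();
    }
}
```

For example, Joiner.join(new String[] {"a", "bc", "def"}) yields "abcdef" without creating any intermediate Strings.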
Device control.
High-performance embedded Java code doesn't just have to run fast; it also needs to provide interfaces to code in other languages. If the other code is responsible for device control or synchronous protocols, it probably has stricter timing requirements than the Java code, and its performance had better not be limited by the speed of Java.
This problem came to light in one system where we built a configuration management system in Java, with a device control layer below using the managed database to determine its reactions to real-time stimuli. The shared database is a containment tree of addressable
managed objects that can be navigated and queried for the values of their parameters. In our first implementation, the objects stored their parameters, which are typically simple data types, in Java data structures like Vectors and Hashtables. Client code written in C would gain access to the parameter values by invoking methods on the managed objects such as parameterNamed(aString) or parameterNumber(anInt). While this is a fine example of encapsulation and data hiding, it forces timing-sensitive C code to wait for Java methods that look up their result using elegant high-level data structures. Not such a great idea.
The solution in this case was to use a different representation for the parameter data, in which parameter values were stored directly in named fields of the managed objects. Any encoding and decoding necessary for interfacing to the higher-level configuration management code is now done in simple accessor (get and set) methods for the fields of interest. A utility such as
javah can be used to generate a C header file showing the layout of the managed object as a struct; C code with access to this struct definition can then read the parameter fields directly in their native format.
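On the Java side, the revised representation might look something like this. ManagedPort and its fields are hypothetical, and the C side, along with the actual object layout, depends on the toolset and is not shown.

```java
// Hypothetical managed object: parameters live in plain named fields.
class ManagedPort {
    int linkSpeedKbps;
    boolean enabled;

    // Accessors do the encoding and decoding that the management layer needs.
    String getLinkSpeed() {
        return String.valueOf(linkSpeedKbps);
    }

    void setLinkSpeed(String text) {
        linkSpeedKbps = Integer.parseInt(text.trim());
    }
}
```

The management code still goes through the accessors, while timing-sensitive code reads the field in its native format with no lookup.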
As embedded devices become more complex and more networked, object-oriented development and Java will continue to become more appealing for the developers who must use and maintain them. However, efficient use of machine resources will continue to be a critical success factor for most
embedded software projects. Ahead-of-time compilation is one of several techniques that will permit some of us to have the best of both worlds.
Greg Wickham is a software engineer at Calix Networks. He has been developing both object-oriented and embedded applications for the past 10 years. Greg holds a BS in engineering physics from the University of Colorado at Boulder. He can be reached at
greg.wickham@calix-networks.com.
References
1. As measured by the fraction of total CPU time spent allocating and deallocating memory.
2. www.rtj.org