How quickly an embedded design responds to a particular function key being pushed on an information appliance or to an event triggered by the appliance doesn't just depend on how well the device was designed. Nowadays it depends on things that used to be far outside the problem domain of traditional unconnected, or locally connected, embedded systems.
In connected embedded systems, response time to an event depends on how fast the routers and the servers on the intranet, the virtual private network, or the World Wide Web can respond. This is especially true as companies such as IBM, Microsoft, and Sun move to services-based models as a way to counteract the erosion of platform loyalty and take advantage of the Internet. They want a model in which software is maintained on a server and services are sold on a transaction-by-transaction basis. The designer of a net-centric embedded device now has to deal with a problem domain as large and as complex as the servers and routers that link the device to the controlling system.
In any design that depends on access to remotely located resources, as a number of services-oriented models for future Web activity propose, it will be necessary to determine more than just the parameters of the problem. You'll also have to look for sources of delay and sort out which ones require changes in the way the various elements interact, which are related to the nature of the Internet medium, and which depend on the architecture of the various nodes.
The architecture of the various computing devices, and specifically of the microprocessors at the heart of every server, will affect the performance parameters important to an embedded designer.
Embedded designers are going to have to look closely at the architecture of the processors in that server-mediated chain of causality imposed on all connected net-centric computer systems. The demands that are placed on Web servers are much different now from even a year ago, and they will differ even more in a few years. Processor architectures will have to reflect that change.
The server loads in the context of both present and future Web uses are amazingly diverse. They encompass on-line transaction processing; Web serving to a variety of applications on PCs, handheld information appliances, wired and wireless telephones, and, increasingly, a variety of embedded net-centric control applications; a much richer object-oriented programming environment; collaborative groupware; and a variety of distributed computing methodologies and languages, including Java and Jini, CORBA, and C++, as well as lesser-known languages such as Smalltalk, Limbo, and Microsoft's two new ingredients in the dialect stew: C# and C--.
Unlike older languages, object-oriented languages such as C++ and Java make extensive use of virtual function pointers, which lead to indirect branches that are hard to predict and result in miserable branch-misprediction rates. They also rely more heavily on dynamic memory allocation, so more of their memory comes from the heap. Heap memory is more scattered than stack memory, causing higher cache-miss rates. And Java's highly touted “garbage collection,” which is problematic in most deterministic embedded applications, also causes problems in server-side Java implementations. On servers in particular, garbage collection has access patterns that lead to high cache-miss rates because it references many objects and uses each only a few times.
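To make the heap-versus-stack point concrete, here is a minimal Python sketch of a toy cache model. Everything in it is an assumption chosen for illustration: the 64-byte line size, the 256-line fully associative LRU cache, the 16-byte objects, and the 64-Mbyte region over which the "heap" objects are scattered. It does not model any real processor; it only suggests why packed objects share cache lines while scattered ones do not.

```python
import random
from collections import OrderedDict

LINE_SIZE = 64      # bytes per cache line (assumed)
NUM_LINES = 256     # lines in the toy cache (assumed)
OBJECT_SIZE = 16    # bytes per object (assumed)
NUM_OBJECTS = 4096  # objects touched once each


def miss_rate(addresses):
    """Fraction of accesses that miss in a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > NUM_LINES:
                cache.popitem(last=False)  # evict least recently used line
    return misses / len(addresses)


# Stack-like layout: objects packed back to back.
contiguous = [i * OBJECT_SIZE for i in range(NUM_OBJECTS)]

# Heap-like layout: objects scattered across a 64-Mbyte region (assumed).
rng = random.Random(0)
scattered = [rng.randrange(0, 64 * 1024 * 1024, OBJECT_SIZE)
             for _ in range(NUM_OBJECTS)]

contiguous_rate = miss_rate(contiguous)
scattered_rate = miss_rate(scattered)
print(f"contiguous: {contiguous_rate:.2f}  scattered: {scattered_rate:.2f}")
```

With these assumed numbers, the packed layout misses on one access in four, because four 16-byte objects share each 64-byte line, while the scattered layout misses on nearly every access.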
What is common to all of these applications is that on servers they make the instruction and data working sets very large. Another common characteristic is that the workloads are inherently multi-user and multitasking. The large working sets and the high frequency of task switches push the cache-miss rates even higher. In addition, such applications also seem to have data that is often read-write shared.
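The interaction between working-set size and task switching can be sketched with a similar toy model in Python. The numbers are assumptions picked to make the effect visible: a 256-line LRU cache and two hypothetical tasks that each loop over 200 lines, so that either task fits in the cache alone but the two together do not.

```python
from collections import OrderedDict

CACHE_LINES = 256   # toy cache capacity in lines (assumed)
WORKING_SET = 200   # lines each task touches per pass (assumed)
PASSES = 10         # passes each task makes over its working set


def count_misses(line_trace):
    """Count misses for a trace of cache-line numbers in an LRU cache."""
    cache = OrderedDict()
    misses = 0
    for line in line_trace:
        if line in cache:
            cache.move_to_end(line)
        else:
            misses += 1
            cache[line] = True
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)
    return misses


task_a = list(range(0, WORKING_SET))
task_b = list(range(1000, 1000 + WORKING_SET))

# No task switching: A runs all of its passes, then B runs all of its passes.
run_to_completion = task_a * PASSES + task_b * PASSES

# A task switch after every pass: A, B, A, B, ...
interleaved = (task_a + task_b) * PASSES

total = len(interleaved)
rate_sequential = count_misses(run_to_completion) / total
rate_switching = count_misses(interleaved) / total
print(f"run to completion: {rate_sequential:.2f}  switching: {rate_switching:.2f}")
```

In this deliberately worst-case setup, running each task to completion misses only on the first pass (a 10 percent miss rate), while switching after every pass evicts each task's lines before it returns and every access misses.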
What should an embedded developer be looking for in server architectures? One thing to watch for is the same thing that matters in choosing a processor for deterministic applications: the ability to hold the state of several tasks or threads. Just as in embedded applications, in a server the support for multiple threads that provide additional levels of instruction-level parallelism is important in allowing the CPU to use all of the resources at hand.
Of course, the multi-user, multitasking nature of most commercial servers should provide an abundance of natural thread-level parallelism. But engineers have told me that, in the context of the new server application environment, what processors typically offer is the wrong kind of multithreading: fine-grained rather than coarse-grained. In fine-grained multithreading, which is also common in many embedded processors, a different thread is executed every cycle. In coarse-grained multithreading, a single foreground thread executes until some long-latency event, such as a cache miss, triggers a switch to a background thread. This seems to provide an extra degree of flexibility in the new server environment because it lets a single thread consume all execution cycles, as in a traditional non-threaded server, for as long as that thread has no events that trigger a thread switch.
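The difference between the two policies can be sketched as a small cycle-by-cycle simulation in Python. This is a simplified model under stated assumptions, not a description of any real processor: each thread is a string of one-cycle operations, `c` for compute and `m` for an operation that misses the cache; a miss stalls its thread for a hypothetical `MISS_LATENCY` of 20 cycles; and the coarse-grained policy simply stays on the running thread until it stalls or finishes.

```python
MISS_LATENCY = 20  # cycles a thread is stalled after a miss (assumed)


def simulate(threads, policy):
    """Return the cycle at which each thread issues its last operation.

    threads: list of non-empty strings; 'c' = compute, 'm' = cache miss.
    policy:  'fine'   -> rotate to the next ready thread every cycle
             'coarse' -> stay on the current thread until it stalls or ends
    """
    n = len(threads)
    pc = [0] * n          # next operation index for each thread
    ready_at = [0] * n    # first cycle at which each thread may issue again
    finish = [None] * n
    current = 0
    cycle = 0
    while None in finish:
        # Pick the first ready, unfinished thread, starting from `current`.
        chosen = None
        for k in range(n):
            t = (current + k) % n
            if finish[t] is None and ready_at[t] <= cycle:
                chosen = t
                break
        cycle += 1
        if chosen is None:
            continue      # every remaining thread is stalled: idle cycle
        op = threads[chosen][pc[chosen]]
        pc[chosen] += 1
        if op == "m":
            ready_at[chosen] = cycle + MISS_LATENCY
        if pc[chosen] == len(threads[chosen]):
            finish[chosen] = cycle
        current = (chosen + 1) % n if policy == "fine" else chosen
    return finish


# Four miss-free threads of 100 operations each; thread 0 is the foreground.
workload = ["c" * 100] * 4
fine_finish = simulate(workload, "fine")
coarse_finish = simulate(workload, "coarse")
print("fine:  ", fine_finish)
print("coarse:", coarse_finish)
```

In this model the miss-free foreground thread finishes at cycle 100 under the coarse-grained policy and at cycle 397 under the fine-grained one, while all the work completes at cycle 400 either way. That is the flexibility described above: the same total throughput, but a single thread can run at full speed when nothing forces a switch.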
There seem to be a lot of other server-based issues that should be of concern to an embedded developer of a connected application and that will have significant impact on how those applications will be designed in the future. One really interesting issue is the widespread use of distributed-object-based computing based on CORBA and Java. In the present generation of 32-bit servers, when load is increased the 32-bit address handles now available may not be enough to deal with the overwhelming number of objects that will have to be retrieved off the Web.
Some engineers feel that simply shifting to server architectures based on 64-bit processors will go a long way toward solving this problem. With the luxury of 64-bit architectures and memory addressability, objects such as Java applets and Web pages could be retrieved across the Web from any node and forwarded to the requestor without any of the back-and-forth communications traffic that is currently accepted as necessary overhead.
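A rough back-of-the-envelope calculation in Python shows the scale of the difference. The 4-Kbyte average object size is purely an assumption for illustration, and the calculation ignores everything else that competes for address space.

```python
SPACE_32 = 2 ** 32        # bytes addressable with a 32-bit handle
SPACE_64 = 2 ** 64        # bytes addressable with a 64-bit handle
AVG_OBJECT_BYTES = 4096   # assumed average object size (hypothetical)

# How many such objects could each be given a distinct address range?
objects_32 = SPACE_32 // AVG_OBJECT_BYTES
objects_64 = SPACE_64 // AVG_OBJECT_BYTES

print(f"32-bit: {objects_32:,} objects")
print(f"64-bit: {objects_64:,} objects")
```

Under that assumption, a 32-bit space has room for about a million such objects (1,048,576), while a 64-bit space has room for 2^52 of them, roughly 4.5 quadrillion.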
It also appears that 64-bit processors will deal with some of the issues I raised earlier in this column, because the additional memory addressability will make it possible to create arrays of processes and threads that are multiple copies of the originals without any context-switching, creation, or destruction overhead. Because each of these array elements can be located at a unique address in a 64-bit address space, they will not need multithreading.
Is it all that simple? Or are there complexities I have not heard about? I have a lot of questions about the issues I have raised here and I depend on you to give me the answers. So, give me a call or e-mail me. I'm interested in hearing from you.
Bernard Cole is the managing editor for embedded design and net-centric computing at EE Times. He welcomes contact. You can reach him at 520-525-9087.