The changing role of software as hardware


If you look at software's historical role, you'll see that it has had little to contribute to the functionality of a device. More often than not, software has served more or less as the “case” or enclosure over a hardware foundation that implements all of the actual features. Software was much like the enclosure around a TV set that covers all the oddly shaped chunks of electronics doing all the work. Software typically handled surface issues, such as providing the human-machine interface (HMI) that enables a person to access hardware features, or acting as the system host, or controlling the hardware components handling all the real work.

Take personal media players. From a technological standpoint, products like the iPod could have been developed a decade ago, but they would have been the size of a suitcase. Understandably, more people are interested in purchasing one now that they can fit it in a shirt pocket and run for hours on a pair of AA batteries. Achieving a practical level of miniaturization and power consumption is clearly due to silicon advances and hardware design. The current iPod's software could have “encased” the suitcase iPod but wouldn't have made the product practical.

The historical partitioning between hardware and software reflects the value that each side contributes to the finished system product, as shown in Table 1. Over the 30-year evolution of semiconductor technology, hardware has almost exclusively provided the benefits of integration, defined system economics, determined product practicality, and–often but not always–enabled the system to be more significantly differentiated.

Table 1: Traditional hardware/software values

But that historical partitioning of software and hardware is changing. Increasingly the heavy lifting that was previously the exclusive domain of hardware design will be enabled through software as hardware. This change will be an evolution rather than revolution. And it's already under way.

The power of integration
In its most basic terms, the argument for implementing features in hardware over software is predicated on the assumption that a hardware approach (such as an ASIC) provides both the highest performance and the lowest cost for volume applications. The same argument can be applied to FPGAs, in that programmable logic provides a high level of performance at a cost-effective price for applications that have somewhat lower volumes that don't justify an ASIC.

In the typical ASIC or FPGA design model, hardware integration is the act of bringing together discrete IP blocks in a single device. For example, to create a video system, a designer would bring together several IP blocks such as a video encoder, video decoder, color space converter, and so on. If the need for a new function arises, such as digital rights management (DRM) mechanisms, the ASIC designer can introduce a discrete encryption block for that purpose.

ASIC and FPGA designs are thus the sum of the IP blocks from which they're built. Chasing after Moore's Law has led designers to think in terms of gates and to treat IP blocks like stones in a wall with mortar in between. From a time-to-market and IP reuse standpoint, the use of discrete IP blocks in this fashion is very efficient. A designer places IP blocks and stitches them together with buses and interconnect, such as the hypothetical example shown in Figure 1. This approach is efficient in managing both development cost and manufacturing cost. Development cost and efficiency derive from using preverified IP and eliminating flexibility that must be both anticipated and verified. Also, the primary manufacturing cost advantage of an ASIC comes from the fact that its features are finely tuned to the requirements of a particular application. As the requirements of that application change, it's often the case that the IP blocks require adjustment to accommodate the changes. Given the high cost of reengineering an ASIC, small variations in the feature set are often cost prohibitive to implement. Of course, if the IP is implemented in software the game is changed, but only if software can do the heavy lifting.

Figure 1: In traditional ASIC design, designers place discrete IP “bricks” and “mortar” them together with interconnect

Managing the balance between an ASIC's performance, cost, and power has become more challenging as standards proliferate. In order to support a handful of standards an ASIC may require a handful of IP blocks, one for each standard. When one standard is being processed, all the other dedicated resources for other standards sit idle but still add to the cost and size of the device, reducing the cost-effectiveness of using an ASIC.

If, instead of statically implementing hardware IP, software does the heavy lifting, a single programmable/configurable hardware resource (in other words, a processor) can implement all of the standards. Certainly, the processor may have a larger silicon area than the hardware resources needed by an ASIC to implement any one format. However, the sum of the hardware IP required to implement several standards will narrow the gap with the processor solution and may even exceed it.
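The crossover argued above can be made concrete with a small model. The sketch below is illustrative only: the gate counts and standard names are hypothetical placeholders, not measurements, and it ignores the memory a processor needs to hold its software.

```python
# Hypothetical silicon areas in thousands of gates (kgates); illustrative only.
PROCESSOR_KGATES = 400  # one programmable processor shared by every standard

IP_BLOCK_KGATES = {     # one dedicated hardware block per supported standard
    "standard_a": 150,
    "standard_b": 120,
    "standard_c": 180,
    "standard_d": 140,
}


def dedicated_area(standards):
    """Area of a fixed-function design: the sum of one IP block per standard."""
    return sum(IP_BLOCK_KGATES[name] for name in standards)


def crossover(standards):
    """Return how many standards it takes before the dedicated blocks
    together exceed the processor's area, or None if they never do."""
    total = 0
    for count, name in enumerate(standards, start=1):
        total += IP_BLOCK_KGATES[name]
        if total > PROCESSOR_KGATES:
            return count
    return None


supported = list(IP_BLOCK_KGATES)
for n in range(1, len(supported) + 1):
    print(f"{n} standard(s): dedicated = {dedicated_area(supported[:n])} kgates, "
          f"processor = {PROCESSOR_KGATES} kgates")
print("crossover at", crossover(supported), "standards")
```

With these made-up numbers, a single dedicated block is far smaller than the processor, but the third standard pushes the fixed-function total past it. Where the real crossover falls depends entirely on the actual blocks and processor involved.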

The drive to software
If it were possible to integrate, differentiate, and implement low-cost practical products with their features implemented in software, the benefits would be numerous. For example, a single “platform” differentiated in software could enable a range of products from low end to high end, differing only in their software. The same range spanned by a “hard” ASIC might require multiple devices. The temporal efficiencies of processors allow the same hardware resources to be applied to more than one function. And clearly, field-upgradeable functionality is a unique software value, whether it's to address bugs or to evolve product features in a fast-changing market.

The uncertainty of evolving media standards has increased the risk of implementing features in a fixed ASIC to the point where the majority of personal media players use mostly software-programmable processors; software is doing the heavy lifting. Even if standards settle, once a function moves into software, it never goes back to hardware. That's because the inherent flexibility and resource reusability of software means that once a function can be executed in software, it will remain in the software domain.

The efficiency of reusing resources, as well as the flexibility of implementation that software enables, offers sustainable efficiencies not possible through hardware. Software is easier and faster to design, test, verify, scale, modify, and upgrade than hardware. This holds true even for commodity functions. Consider the 56 kbps modem, which is almost entirely implemented in software. The only hardware used is the interface components. The signal processing is usually performed on the x86 processor. But the first modems were dedicated fixed hardware. They were followed by DSP-based products that could implement multiple standards. In the latest generation of modems, the modem workload is easily handled by the multi-GHz x86 processor. Modem hardware is now modem software.

One can even go as far as to say that the only reason not to implement a function in software is that it cannot be accomplished efficiently enough, such as with leading-edge functions that stress the current limits of silicon technology. In other words, if the integration, differentiation, cost, or practicality goals can't be met any other way, the solution is a hardware design. This argument applies to both ASICs and FPGAs.

Another factor that has driven innovation and product differentiation into the software domain is that the cost of developing hardware continues to rise exponentially, both for ASIC and FPGA designs. In fact, the cost of producing an FPGA design is at a level comparable to the cost of creating an ASIC only a few years ago.

Functionality in software
The development landscape is clearly changing. Programmable processors as the “enclosure” over the hardware are becoming a faded legacy as the partition between software and hardware design crumbles.

The cost-effectiveness of ASICs is based upon the assumption that no other implementation can perform the same function in less silicon area. However, when features are integrated in hardware, it is a linear process. Since IP blocks are independent of one another, the number of gates required is the sum of the IP blocks in use. Hardware IP is additive: the more blocks used, the more the final chip will cost as shown in Figure 2a.

Figure 2: ASIC functionality (2a) is additive: the number of gates required is the sum of the IP blocks. Software functionality (2b) is quantized: the same hardware resources can be used for multiple functions by time-slicing

When features are integrated in software, the same hardware resources can be used for multiple functions by time-slicing a programmable processor to perform various functions as shown in Figure 2b. Thus, the sum of hardware IP blocks can be more expensive (in terms of silicon, board area, power consumption, and especially verification) than the sum of functionally equivalent software blocks. But even if the gap is only narrowed, the inherent flexibility, time-to-market advantages, and lower risk increasingly tilt developers toward software on processors.

With software-configurable architectures, a whole new level of efficiency can be implemented. While ASIC implementations often offer high performance, the discrete IP block methodology for designing ASICs leads to a variety of operational inefficiencies. Several IP blocks may have the potential to share rather than replicate a significant number of gates or they may have generalized interfaces that are functionally redundant. The partitioning of IP blocks is, to some degree, arbitrary. For example, it may be efficient to merge a codec block with an encryption block. Data that is already available in registers can be passed directly from codec to encryption without passing through temporary memory.
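The codec-plus-encryption merge described above is, in software terms, a loop fusion. The following sketch only illustrates the idea: the delta decoder and the XOR "cipher" are hypothetical stand-ins chosen for brevity (the XOR step is not secure encryption), and all names are invented for the example.

```python
KEY = 0x5A  # toy XOR key; a stand-in for a real cipher, not secure


def decode_sample(prev, delta):
    """Hypothetical codec step: delta decoding, wrapped to 8 bits."""
    return (prev + delta) & 0xFF


def encrypt_sample(value):
    """Hypothetical encryption step: XOR with a fixed key (illustration only)."""
    return value ^ KEY


def two_blocks(deltas):
    """Discrete-block style: the codec writes a temporary buffer,
    which the encryption block then reads back in a second pass."""
    temp = []
    prev = 0
    for delta in deltas:
        prev = decode_sample(prev, delta)
        temp.append(prev)
    return [encrypt_sample(value) for value in temp]


def fused(deltas):
    """Merged style: each decoded value is encrypted immediately,
    while it is still held in a local variable, with no temporary buffer."""
    out = []
    prev = 0
    for delta in deltas:
        prev = decode_sample(prev, delta)
        out.append(encrypt_sample(prev))
    return out


deltas = [3, 4, 250, 10]
assert two_blocks(deltas) == fused(deltas)  # same result, one pass fewer
print(fused(deltas))
```

Both versions produce the same output; the fused version simply never materializes the intermediate buffer. In software this restructuring is a small edit, whereas crossing the boundary between two hardware IP blocks would mean redesigning both.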

In order to do this, however, developers need to be able to puncture the abstraction envelope of individual IP blocks, merging their features to increase parallelism, eliminate inter-block data passing, and capitalize upon other potential efficiencies. To achieve this with an ASIC or FPGA requires significant redesign of the IP blocks as well as a separate ASIC for each redesign. Traditionally, such efficiencies have been left on the table, so to speak, because the cost of capturing them exceeds the achievable gains. From a software perspective, such inefficiencies can be discovered and eliminated by the inherent flexibility of software and optimizing compilers.

Implementing functionality in software also introduces the concept of headroom. Processing resources are introduced to a design in discrete chunks. Developers can easily scale designs by introducing more processing resources, either through a faster processor or more processors. By underutilizing available resources, developers can leave room for the introduction of new features at a later date. This is not possible when features are locked in hardware. Additionally, because working at the algorithmic or functional level is easier for designers than at the hardware level, the development cost and time-to-market for software compared to hardware will be less as well.
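As a rough illustration of headroom, the sketch below budgets processor time for a set of functions and checks whether a later feature still fits. Every number and workload name is hypothetical, and the model is idealized: it assumes loads add linearly and that work can be divided freely across identical processors.

```python
PROCESSOR_MHZ = 600  # hypothetical capacity of one processor

# Hypothetical worst-case loads, expressed in MHz of processor time.
workloads = {"audio_decode": 120, "video_decode": 300, "user_interface": 60}


def headroom(capacity_mhz, loads):
    """Unused processor capacity after all current functions are scheduled."""
    return capacity_mhz - sum(loads.values())


def fits(capacity_mhz, loads, new_load_mhz):
    """True if a new feature can be added without exceeding capacity."""
    return new_load_mhz <= headroom(capacity_mhz, loads)


def processors_needed(per_processor_mhz, loads):
    """Identical processors required to carry the total load (ceiling division)."""
    total = sum(loads.values())
    return -(-total // per_processor_mhz)


print("headroom:", headroom(PROCESSOR_MHZ, workloads), "MHz")
print("100 MHz feature fits:", fits(PROCESSOR_MHZ, workloads, 100))
print("150 MHz feature fits:", fits(PROCESSOR_MHZ, workloads, 150))

# Scaling in a discrete chunk: add the larger feature and count processors.
expanded = dict(workloads, new_feature=150)
print("processors needed:", processors_needed(PROCESSOR_MHZ, expanded))
```

In this toy budget, the smaller feature is absorbed by the existing headroom, while the larger one forces the design to grow by one more processor (or a faster one).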

Perhaps one of the most important swing factors in favor of software is the generic nature of a programmable/configurable processor and how this lowers the overall risk of a design. Such a processor becomes specific by virtue of the software written for it. Software is easily managed and modified and can even be updated when it is in the field, enabling developers to upload bug fixes or incremental improvements, or to introduce additional features such as supporting a new multimedia standard in ways simply not possible with an ASIC.

Another key factor is managing the optimal balance of performance, size, power, and cost of an IP block. Tuning an IP block to match the specific requirements of a particular application is a manual and time-consuming process, whether it's implemented in hardware or software. As a product line expands and changes, keeping IP blocks tuned for each instantiation of the product takes more and more work. This process is much easier to achieve in software, especially as development tools continue to abstract functional details further and further away from the actual implementation. Through software, developers can incorporate layers of abstraction in ways that they can't with hardware for reasons of cost, efficiency, and reusability.

Software as hardware
The aim of “software as hardware” is to enable software to do the heavy lifting of integration, differentiation, and achieving cost and time-to-market efficiency. While we've established the potential benefits of expanding software's role, what evidence can we point to that there's a trend toward a new role? Actually, the signs are everywhere.

The classical–say 1980s–model of an embedded system has exactly one processor. Everything else is defined in hardware blocks integrated onto a single chip or across a board. The evolution of software as hardware began with the introduction of DSP processors. In addition to the main processor, one or more digital signal processors replaced hard-logic functions in signal-processing applications. Before DSPs, signal processing was done in fixed hardware.

In the early 1990s, the explosion of ASIC design and the emergence of synthesizable processor cores from ARM and MIPS continued the trend. Initially ASICs were one “soft” processor core surrounded by logic and memory. Today most ASIC designs contain one or more processors in addition to the “main” processor. What would in the past have been X logic IP blocks is now Y processors plus software. The benefits are clear and this is now standard practice.

By the late 1990s the configurable processor emerged. Tensilica and ARC pioneered the idea of allowing developers to tailor processors to better perform their intended function. The level of configurability was initially fairly limited. For example, users could select cache size or choose whether or not to include a multiply-accumulate unit. Later, fully configurable instruction sets emerged at both ARC and Tensilica, later to be joined by MIPS. These extended instructions have proven to dramatically shift performance of these processor cores upward. By tailoring the instruction set of the processor to the application(s) it will be running, the processor often becomes an order of magnitude more powerful. In this way, more functions that would have previously required logic design or hardware IP could instead become software running on a configurable processor. These performance gains, though, come at the cost of adding fixed gates to the processor. Thus, these processors are only configurable at design time. The processor can only be optimized once.

Whereas these configurable processors are almost always found in ASICs, processors and configurable processors now abound in FPGAs as well. Xilinx's MicroBlaze soft processor or Altera's Nios core enable either fixed or partially configurable processor cores to run on FPGAs–obviously in lieu of logic IP. Thus, even with a rapidly expanding gate density, developers are choosing to use software wherever they can. Since the programmable logic can be redesigned in system, these processors can be optimized more than once–although it's not practical to dynamically configure FPGAs.

Another approach in the off-the-shelf processor category is one that can be configured through software. Its instruction set is continually configured for the application. Stretch's software-configurable processor implements custom instructions dynamically in an “instruction set extension fabric” comprised of programmable logic. The aim is to replace C/C++ functions with custom instructions implemented in hardware so that dozens or hundreds of “regular” instructions collapse to a single-cycle “software as hardware” instruction.
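A back-of-the-envelope model shows what collapsing a kernel into one instruction does to total run time. The sketch below is not a description of any vendor's toolchain; the call counts and instruction counts are hypothetical, and it assumes one cycle per instruction with no memory stalls.

```python
def cycles(kernel_calls, instrs_per_kernel, other_instrs, collapsed):
    """Total cycles for a program, assuming one cycle per instruction.

    If `collapsed` is True, each kernel call costs a single cycle
    (one custom instruction) instead of `instrs_per_kernel` cycles.
    """
    per_call = 1 if collapsed else instrs_per_kernel
    return kernel_calls * per_call + other_instrs


# Hypothetical workload: a 50-instruction kernel called 10,000 times,
# plus 100,000 instructions of ordinary code that is left untouched.
baseline = cycles(10_000, 50, 100_000, collapsed=False)
extended = cycles(10_000, 50, 100_000, collapsed=True)

print("baseline cycles:", baseline)
print("with custom instruction:", extended)
print(f"overall speedup: {baseline / extended:.2f}x")
```

Under these assumptions the kernel itself runs 50 times faster, but the program as a whole speeds up by a smaller factor because the untouched code still takes the same time. The overall benefit therefore depends on how much of the run time the accelerated kernels account for.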

The effect of software-configurable processors is to shift the partition between hardware and software, increasing the number and type of functions that can be more efficiently implemented in software than in hardware. By endowing software with the ability to affect the underlying processor architecture, a system can adapt its structure to more optimally match the problem at hand rather than simply processing a static instruction stream.

For example, creating a wireless WiMAX system using a traditional hardware/software split, as shown in Figure 3a, requires three discrete components, one for each of three distinct subsystems: a processor to handle MAC calculations and network-stack management; an ASIC, FPGA, or special fabric to perform OFDM processing; and a security processor with AES/3DES capabilities to provide security features. A configurable processor could implement these WiMAX subsystems within a single chip, with all the features defined through software, by configuring the same pool of hardware resources dynamically to implement the three subsystems as shown in Figure 3b.

Figure 3: 3a: A WiMAX base station in a traditional hardware/software architecture requires a separate component for each WiMAX function. 3b: In a software-configurable architecture (Stretch S5500), all WiMAX subsystems are implemented in software. Programmable logic in the processor datapath reduces each kernel from hundreds of instructions to a single instruction.

One design, one language
As the shift toward software continues, the abstraction level moves up. Hardware features are then described in the same language and development environment as software. The end result is that software engineers will be able to design hardware without a detailed understanding of the architecture's underlying hardware design. The compiler is responsible for allocating hardware resources, not a person. At this point, software truly becomes hardware.

With the traditional partitioning there are two design teams working independently on the same design: the hardware team and the software team. Coordinating the two is a challenge. One must not only create the product specification but partition the solution with some software/hardware boundary. The crispness of that specification, fixing it early, and resisting changes are all critical to meeting the timeline. But the boundary is actually artificial. It's due to the inability of software to do hardware's job.

When the majority of features are implemented in software, there is no disconnect between design teams. Hardware design becomes more a matter of providing general, configurable hardware resources rather than a fixed, specialized feature set. Effectively, the hardware team provides as many processors as are needed to absorb the requirements of the software. At this point, the shift of functionality from hardware to software is complete.

Because software and hardware features are described at the same time using the same language, the development environment is able to closely match the use of hardware resources with the software implementation. Performance is not compromised; it is optimized. This reduces significantly the class of problems that must be directly implemented in hardware because they can't be implemented in software. As a consequence, developers will be able to develop leading-edge functionality immediately in a software-configurable architecture. Additionally, automating the translation of software to hardware enables developers to evaluate more variations and silicon implementations. Iteration time is critical to productivity. A developer can recompile in minutes. Hardware iteration timescales span hours to days.

The development costs for a 65nm ASIC are expected to be between $50M and $100M, so it's crucial that platform technologies be available that enable integration, differentiation, and economic viability without taping out a new chip for each application. The FPGA suppliers clearly believe this will be their destiny. But the signs are clear that FPGAs programmed in RTL will share the stage with platform devices both configured and programmed in a high-level software language, probably C/C++.

From this perspective, the area/cost advantage of an ASIC will erode as features shift to the software domain. “Software as hardware” challenges the fundamental idea that an ASIC is the most efficient way, in terms of processing and cost, to implement a given function. Not only is the cost advantage eroding but the development cost and risk of spinning an ASIC are rising exponentially. The physical cost of configurable silicon doesn't need to match that of an ASIC for it to make economic sense when one considers the combined cost advantages of time-to-market, design flexibility, ease of maintenance, shrinking design cycles, and the significant reduction in design complexity that results from collapsing a dual hardware/software development process into a single software development process.

Considering all of these factors, it becomes clear that software is quickly closing the gap and will soon overtake hardware design. This won't eliminate hardware design as a discipline nor does it represent a reduction in the number of hardware-design jobs. Circuit design wasn't killed by logic synthesis. It's alive and well in libraries, I/Os, memory, and mixed-signal design. Logic synthesis simply enables the majority of developers to move up to a higher level of abstraction. Hardware design will thus join the list of specialties, like semiconductor process design, circuit design, and EDA, that make up the critical infrastructure of the semiconductor industry but are practiced by a declining percentage of engineers.

Gary Banta is CEO of Stretch, Inc. Prior to co-founding Stretch, Gary was vice president of marketing at Silicon Spice/Broadcom, MoSys, Inc. and Plus Logic. Gary also ran Analog and Telecom Marketing at National Semiconductor. Gary has a BSEE from University of Wisconsin.
