Hardware/software verification enters the atomic age: Part 1

While the 1940s and 1950s saw the dawn of the nuclear age for power and weaponry, half a century later we're just now entering an atomic age of a different sort. Having little to do with radioactivity, this atomic age is about software and hardware design, specifically design using atomic transactions, by far the most powerful technique for managing one of our trickiest problems: concurrency.

Even if you haven't heard of atomic transactions before, they're likely to have an impact on your future work, whether you design embedded software, architect systems, perform verification, or design hardware. Atomic transactions are not a new technology, but the growing complexity and concurrency of systems are driving their emergence in all of these areas.

So, just what are atomic transactions? And how are they already manifesting themselves in embedded systems software, processor, and hardware design? After introducing atomic transactions and the ways in which they're revealing themselves, this article will take a special look at their implications for verification.

Atomic Transactions 101
If you are familiar with database transactions, you've probably already been exposed to atomic transactions. When processing multiple simultaneous transactions, databases use atomic transactions to maintain consistency. Imagine a couple who share a bank account and simultaneously make two withdrawals of $100 from separate ATMs across town (Figure 1). Several steps are involved in each withdrawal: A) checking the balance; B) providing the money and calculating the new balance; and C) updating the balance with the new amount.

Now imagine that each step were independent (1A, 1B, and 1C for one of the couple and 2A, 2B and 2C for the other) and performed in the following order: 1A, 2A, 1B, 2B, 1C, 2C. What would happen? Both would receive $100, but the bank account would only be debited a total of $100, instead of $200.

Databases use atomic transactions to prevent this inconsistency from occurring and ensure that steps A, B, and C happen together and indivisibly for each withdrawal transaction. Atomic transactions ensure that the two withdrawals are performed in one of two orders: 1A, 1B, 1C, 2A, 2B, 2C or 2A, 2B, 2C, 1A, 1B, 1C.

Atomic transactions have the same properties that we saw in the bank ATM example. These transactions are atomic, which means that they are indivisible and all-or-nothing. Atomicity ensures that multiple, related operations occur indivisibly, as though they happen in isolation, without concern for other system activities.

And atomicity ensures that multiple, related operations are all-or-nothing: either all of the operations in a transaction complete or none of them do. These properties ensure a consistent state relative to all other transactions in the system.
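
To make this concrete, here is a minimal sketch of the withdrawal as an atomic transaction, written in Haskell using GHC's software transactional memory (a mechanism this article returns to later). The Account type, balances, and amounts are illustrative, not drawn from any real ATM system:

    import Control.Concurrent.STM

    -- The shared balance lives in a transactional variable.
    type Account = TVar Int

    -- Steps A, B, and C from Figure 1 as one indivisible transaction:
    -- either the whole withdrawal commits or none of it does.
    withdraw :: Account -> Int -> STM ()
    withdraw acct amount = do
      balance <- readTVar acct            -- A) check the balance
      check (balance >= amount)           -- block and retry if funds are short
      writeTVar acct (balance - amount)   -- B/C) compute and write the new balance

    main :: IO ()
    main = do
      acct <- newTVarIO 200
      -- Even when run from separate threads, conflicting transactions
      -- serialize, so the account is debited the full $200.
      atomically (withdraw acct 100)
      atomically (withdraw acct 100)
      readTVarIO acct >>= print           -- prints 0

The interleaved order 1A, 2A, 1B, 2B, 1C, 2C simply cannot occur: the runtime guarantees one of the two serial orders described above.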

Many high-level specification languages for concurrent systems express concurrent behavior as a collection of rewrite rules, where each rule has a guard (a Boolean predicate on the current state) and an action or set of actions that transform the state of the system.

The active rules, where the guards are true, can be applied in parallel, but each rule operates as an atomic transaction: each rule observes and ensures a consistent state relative to all other rules in the system.

This atomicity model is popular because it enables concurrent behavioral descriptions at the highest level of abstraction and simplifies reasoning about correctness.
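
For a feel for this model, here is one hedged way to encode guarded rules in Haskell; the Rule type and run scheduler are illustrative, not any particular language's actual machinery:

    -- A rule pairs a guard (a Boolean predicate on the current state)
    -- with an action that atomically transforms the state.
    data Rule s = Rule
      { name   :: String
      , guard  :: s -> Bool
      , action :: s -> s
      }

    -- Apply enabled rules one at a time until none is enabled; because
    -- each application is all-or-nothing, the state is consistent
    -- between any two rule firings.
    run :: [Rule s] -> s -> s
    run rules s =
      case [r | r <- rules, guard r s] of
        []      -> s                      -- no rule enabled: done
        (r : _) -> run rules (action r s)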

TRS expresses atomicity
One example of a high-level specification language using this computational model is the term rewriting system (TRS). TRSs offer a convenient way to describe parallel and asynchronous systems and prove an implementation's correctness with respect to a specification. TRS descriptions, augmented with proper information about the system building blocks, also hold the promise of high-level synthesis. High-level architectural descriptions that are both automatically synthesizable and verifiable would permit architectural exploration at a fraction of the time and cost required by current commercial tools.

A TRS is defined as a tuple (S, R, S0), where S is a set of terms, R is a set of rewriting rules, and S0 is a set of initial terms. The state of a system is represented as a TRS term, while the state transitions are represented as TRS rules. The general structure of a rewriting rule is:

    s1 if p(s1) → s2

where s1 and s2 are terms and p is a predicate. We can use a rule to rewrite a term if the rule's left-hand-side pattern matches the term or one of its subterms and the corresponding predicate is true. The new term is generated in accordance with the rule's right-hand side.

If several rules apply, then any one of them can be applied. If no rule applies, the term cannot be rewritten any further. In practice, we often use abstract data types such as arrays and FIFO queues to make the descriptions more readable.
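
For instance, a rule over a pair of FIFOs (modeled here as plain lists, reusing the hypothetical Rule encoding above) might dequeue, transform, and enqueue as one indivisible step, with the guard expressing the queue's readiness:

    -- Enabled only while the input queue is non-empty; the dequeue,
    -- compute, and enqueue happen as a single atomic action.
    process :: Rule ([Int], [Int])
    process = Rule "process"
      (\(inq, _)    -> not (null inq))
      (\(inq, outq) -> (tail inq, outq ++ [2 * head inq]))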

With proper abstractions, we can create TRS descriptions in a highly modular fashion. And using a compiler for hardware synthesis from TRSs, such descriptions can be translated into a standard hardware description language like Verilog. By restricting the generated Verilog to be structural, commercial tools can be used to go all the way down to gates and layout.

The terms' grammar, when augmented with details such as instruction formats and sizes of various register files, buffers, memories, and so on, precisely specifies the state elements.

Each rule is then compiled such that the state is read at the beginning of the clock cycle and updated at the end of the clock cycle. This single-cycle implementation methodology automatically enforces the atomicity constraint of each rule.

All the enabled rules fire in parallel unless some modify the same state element. In case of such a conflict, one of the conflicting rules is selected to fire on the basis of some policy.
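
Here is a rough software model of one such clock cycle, under the same illustrative assumptions as the earlier sketches (this is not any compiler's actual scheduler, and the names are hypothetical): every rule reads the same cycle-start state, and overlapping write sets are resolved by a fixed priority:

    import qualified Data.Map as Map

    -- State elements (registers) by name.
    type State = Map.Map String Int

    -- A hardware rule reads the cycle-start state and, when enabled,
    -- proposes new values for the elements it writes.
    data HwRule = HwRule
      { enabled :: State -> Bool
      , fire    :: State -> Map.Map String Int
      }

    -- One clock cycle: all enabled rules fire against the same starting
    -- state s0; if two would write the same element, list order serves
    -- as the selection policy and the later rule is dropped this cycle.
    cycleStep :: [HwRule] -> State -> State
    cycleStep rules s0 = Map.union (Map.unions (pick [] updates)) s0
      where
        updates = [fire r s0 | r <- rules, enabled r s0]
        pick _ [] = []
        pick written (u : us)
          | any (`Map.member` u) written = pick written us    -- conflict
          | otherwise = u : pick (Map.keys u ++ written) us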

An example of the power that TRS atomicity brings to hardware synthesis is illustrated in Figure 2, in which Euclid's algorithm for computing the greatest common divisor (GCD) of two numbers is expressed in TRS notation. From this, the Term Rewriting Architectural Compiler generates a Verilog description of the circuit shown in Figure 3.
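
In the same spirit, here is a hedged Haskell rendering of Euclid's algorithm as two guarded rules, reusing the illustrative Rule/run encoding above; it mirrors the style of Figure 2 rather than reproducing its exact notation:

    -- Rewrite (a, b) until no guard holds; the pair's first element is
    -- then the GCD.
    gcdRules :: [Rule (Int, Int)]
    gcdRules =
      [ Rule "flip" (\(a, b) -> a < b  && b /= 0) (\(a, b) -> (b, a))
      , Rule "sub"  (\(a, b) -> a >= b && b /= 0) (\(a, b) -> (a - b, b))
      ]

    -- run gcdRules (15, 6) steps through (9,6), (3,6), (6,3), (3,3),
    -- (0,3), (3,0) and terminates at (3, 0): the GCD is 3.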

Atomicity and generating hardware/software
How do atomic transactions such as those employed in TRSs contrast with other approaches used in software and hardware, such as C/SystemC, Verilog, and VHDL? The comparison highlights just how low-level, manually intensive, and fragile those approaches are.

In all of these other approaches, coordinating and managing the access to shared resources, which includes both arbitration and control, must be done explicitly and from scratch by the engineer in every instance. This makes these approaches:

Tedious, due to the ad hoc, low-level nature of managing concurrency.

Error prone, subject to race conditions, interface protocol errors, mistimed data sampling (in hardware), and deadlocks, among others.

Brittle to changes, requiring the control to be redesigned each time there is a small change in the specification or implementation.

Another issue with these approaches is that shared-resource management is non-modular in nature, extending across module boundaries throughout system designs. Even if you ensure that individual modules operate properly from an atomic-operation standpoint, you cannot use them as black boxes when composing larger systems from them. Atomicity management requires control logic, and control logic is inherently non-modular. As you compose larger and larger systems, threads, events, always blocks, and processes do not scale.

Atomic transactions, in contrast, dramatically raise the level of abstraction by automatically managing the complex aspects of concurrency: shared-resource arbitration and control. And atomic transactions can be easily composed at the system level, allowing modules to be integrated as black boxes, without having to consider the impact of inter-module interactions on the modules' internal resources. With atomic transactions, module-level testing can be leveraged at the system level, without having to reverify that the interface protocols at every interface point are properly handled and remain within required boundaries.

Implications for embedded
As a much higher level of abstraction for concurrency, atomic transactions are becoming available in areas that previously have not benefited from their power. While widely available as a database tool, how are atomic transactions likely to surface for those writing embedded systems software, working with processors, and designing hardware?

Programming for multicore and multithreaded processors requires conscious, low-level synchronization of access to shared memory resources. To make software development for these architectures more efficient, this burden needs to be removed: the work of coordinating access to shared resources needs to shift from the engineer to the programming languages, operating systems, compilers, and processors.

Expect to see atomic transactions as an additional concurrency tool in software languages, supplementing the roles currently played by lower-level mechanisms like semaphores, events, and locks. According to “M'soft: Parallel programming model 10 years off” (Rick Merritt, EE Times, 7/23/07, available at www.embedded.com/201200461), Microsoft is planning to build atomic transactions into C# and is already building software transactions into Haskell. Burton Smith of Microsoft was quoted in the article: “I think we ultimately will see atomic transactions in most, if not all, languages. That's a bit of a guess, but I think it's a good bet.”

Also, expect to hear about “transactional memory.” It's already getting a lot of attention in computer-architecture circles as a key mechanism for ensuring atomicity in multicore, multithreaded architectures. A transactional memory mechanism enables a series of shared-memory reads and/or writes to occur atomically. If all the memory accesses cannot complete, the transaction may be fully aborted and retried later.

Transactional memory allows a software programmer to think about each transaction as a single-threaded operation, leaving the details of shared resource management to the underlying software and hardware. While transactional memory solutions have been implemented in software, the performance overhead for maintaining transaction logs argues for hardware support.
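
To see what that single-threaded feel looks like in code, here is a hedged extension of the earlier Haskell withdraw sketch; transfer composes two operations into one transaction that commits or aborts as a unit:

    deposit :: Account -> Int -> STM ()
    deposit acct amount = do
      balance <- readTVar acct
      writeTVar acct (balance + amount)

    -- The debit and credit commit together or not at all; no other
    -- thread ever observes the money "in flight" between accounts.
    transfer :: Account -> Account -> Int -> STM ()
    transfer from to amount = do
      withdraw from amount
      deposit to amount

    -- usage: atomically (transfer savings checking 100)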

For this reason, a lot of research is focusing on the right kinds of hardware support to build into processors. Hardware mechanisms in commercial products shouldn't require a long wait. According to Sun, support for transactional memory is “imminent.” (At ISSCC recently, Sun outlined details of its Rock processor with transactional memory.) For those working directly with processors, writing software and debugging applications, expect to encounter transactional memory and associated language, compiler, run-time, and (possibly) operating-system support.

Implications for verification
There are several ways in which atomic transactions are likely to affect verification: first, as a hardware and/or software component, such as transactional memory, that must be verified; second, as an additional tool in the verification toolbox; and, finally, as a device-under-test (DUT) that has been designed with atomic transactions.

Verifying atomic-transaction mechanisms: As transactional memory and other mechanisms become more prevalent, more people will be involved in verifying multicore, multiprocessor, and multithreaded designs that use atomic transactions.

Doing so will require a deep understanding of the atomic transaction mechanisms and their potential failure modes. Test scenarios for these types of designs are complex and numerous, as they involve system-level hardware and software with many subtle corner cases. Fortunately, while these mechanisms may be more complex to verify, they dramatically reduce the complexity for software teams that are end-users of the devices in which these mechanisms live.

Using atomic transactions as a new verification tool: Although verifying atomic-transaction mechanisms is akin to handling yet another complex piece of IP, it's more interesting to explore what atomic transactions can deliver as a verification tool.

Managing complex concurrency is an issue for hardware verification engineers as well, and it will soon be a much broader one for those verifying software. Atomic transactions can simplify the complexity of concurrency and consequently accelerate verification efforts and reduce bugs in testbenches and models developed by verification teams.

When validating concurrent designs, test-case stimulus generation can become complex, involving contortions and a great deal of intricate control to induce the desired conditions. Atomic transactions simplify the specification of the conditions under which a test case should be performed, and they simplify the generation of the proper stimulus. Atomic transactions allow each test case to be written succinctly and separately.
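
As one hypothetical illustration, a test case can be phrased as a guarded rule (reusing the illustrative Rule encoding from earlier): the guard states the scenario's preconditions directly, and the stimulus injection is one indivisible step. The testbench state and field names below are invented for the sketch:

    -- A made-up testbench state.
    data TbState = TbState { dutBusy :: Bool, credits :: Int, sent :: [Int] }

    -- Fire the back-pressure scenario only when the DUT is busy and
    -- flow-control credits are exhausted; no hand-written control
    -- sequencing is needed to reach that condition.
    backPressureCase :: Rule TbState
    backPressureCase = Rule "back-pressure"
      (\tb -> dutBusy tb && credits tb == 0)
      (\tb -> tb { sent = 0xFF : sent tb })  -- 0xFF is an arbitrary payload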

Other verification activities that can be challenging, especially when concurrency comes into play or cycle accuracy is required, are creating golden reference models, system models, and verification models. Designing these types of models typically takes much longer than desired and often involves more debugging than is acceptable.

With their much simpler concurrency model, atomic transactions accelerate the development of these types of models and dramatically reduce the bugs. And, best of all, these models and testbenches can be synthesized into efficient RTL for use in emulators and FPGAs. Imagine being able to develop a golden reference model quickly and run it in hardware, against a synthesizable testbench, orders of magnitude faster than simulation.

DUT designed with atomic transactions: When verification teams are the downstream beneficiaries of designs built with atomic transactions rather than traditional, lower-level concurrency mechanisms, they will adapt their methods and see improvements in the verification process, thanks to the design's stronger interface semantics and simpler concurrency model.

I'll explain the methods and improvements in part two of this article, posted online at Embedded.com.

George Harper is vice president of marketing at Bluespec and has more than 15 years of experience in the semiconductor industry. He holds BS and MS degrees in electrical engineering from Stanford University and an MBA from Harvard University. He can be reached at gharper@bluespec.com.
