Achieving first day multicore SoC software success


The past few years have seen a dramatic shift in how companies design and market their system-on-chip (SoC) offerings.  Designs that used to contain large amounts of homegrown or internal intellectual property (IP) are becoming increasingly reliant upon pre-built blocks from third-party suppliers.  It’s not uncommon for a new SoC design to contain 80-90% of its content from outside suppliers.

This shift is forcing companies to rethink how their products are differentiated from competitive offerings.  When the vast majority of the design is based upon components that can be used by anyone, how do you make your product stand out?

In response to this trend, companies are seeking differentiation in a few key areas, including architectural configuration, software features and time to market. 

Architectural configuration is the process of assembling SoC IP from disparate sources so that the configuration options and overall layout advance the designer's goals, which can range from faster throughput to lower power consumption to overall cost optimization. 

Software features are probably the number one area of differentiation as the capabilities of the underlying hardware can only be exploited with properly written software at every level of abstraction. 

Time to market is self-explanatory: the faster a product comes to market, the easier time it will typically have gaining critical market share.

As a result of this shift, the design process from architectural analysis through software debug and hardware verification needs to take account of this new reality.  Since there is a great deal that can be written about each of the three topics mentioned above, the remainder of this article will focus more on the second and third.  It will show how design teams are achieving differentiation using software features and accelerating time to market.

Software-Driven Differentiation

The challenge is to ensure that when the SoC prototype returns from the fab it will achieve first day software success and run immediately.  Otherwise, one of two problems is created.  

First, there may be errors in the silicon not detected previously, which require a respin of the chip, creating additional delay and driving up cost.  Second, an extended software development and integration period is required, once again increasing cost and delaying time to market.

In order to develop and test the software in parallel with the SoC development, a model of the actual chip must be used because, by definition, the chip is not available yet.  If the RTL code of the chip is complete, then there are various approaches to model the chip using field programmable gate arrays (FPGAs) or hardware emulation. 

A better approach that can be used much earlier is to create a software model of the chip.  This model is known as a virtual platform.  Virtual platforms have existed for years but have not been as widely deployed as their benefits would seem to indicate due to the difficulties of creating models. 

An integrated model creation ecosystem avoids these pitfalls and makes virtual platforms a clear choice for modern electronic systems where the software and hardware can only be validated together.

An integrated model creation ecosystem has a few vital components:  a process for obtaining the accurate models required by the architecture and firmware teams; a method for tying in the high speed models in the system to enable the software team; an approach to tie these models together into a single platform to avoid redundant development efforts; and, finally, some capability –– preferably web-based –– to manage all of these internal and external models at various levels of abstraction. 

Virtual Platforms

A virtual platform (Figure 1, below) is a software model of the hardware of a system with enough fidelity that it can run the software load of the system including, in particular, bringing up the operating system(s) being used.


Figure 1. Virtual platforms allow developers to operate in a unified design environment that allows seamless movement between the various stages of development: architectural analysis, firmware development and software development.

In a modern system-on-chip or electronic system, there really isn’t any way of verifying the hardware other than running the full software load.  Although each of the individual system components has, hopefully, been individually verified, the only true way to test the system is to execute all of the components together with the actual system software.  This is especially true for low-level software, such as device drivers and power management.  Software and hardware must be co-verified.

Historically, this has been a major problem.  A virtual platform that is fast enough for software developers to do their job is not accurate enough to be useful for debugging the hardware, and a cycle-accurate model that has enough fidelity to be useful to hardware developers is too slow for software developers: they need to be able to boot the operating system in seconds, not hours or days.

Another recurring problem with virtual platform modeling has been the cost and delay of creating models.  Well-funded teams sometimes build two complete sets of models. 

One set of models would be accurate enough to be useful to hardware designers while still being fast enough to be able to run at least some of the software.  A second set of models would be fast enough to be used by the software team but of no use to hardware developers since the signals they need to do their job are simply not modeled.

Counting the RTL code as a third, there are now three complete and unrelated sets of models hitting different speed-accuracy tradeoffs.  Since it is time-consuming, tedious and sometimes impossible to validate all of these models against each other, this seemingly vital task is often ignored.  As a result, software developed on many virtual platforms is never validated on real hardware until the final silicon arrives in the lab, greatly endangering the possibility of first day software success.

Performance and accuracy

Let’s look at performance and accuracy, two interrelated issues.

There is an unavoidable tradeoff between performance and accuracy; design teams give up one to get the other.  But models fast enough for application software development need to be many orders of magnitude faster than RTL and there is no way to get that sort of speedup automatically.  

Just as a designer cannot get from a Spice model to an RTL model by simply removing unnecessary detail, he or she can’t get from an RTL model to a virtual platform behavioral model fast enough to execute application software by simply removing unnecessary detail.

Trying to create a model with both speed and accuracy seems to be the worst of both worlds.  The model either has insufficient accuracy to be used for verifying the interaction of low-level software with the chip (in order to get higher performance) or else, if it has that accuracy, it will be too slow for software developers.

A better approach is to accept this and create both a high-speed model for software development and a 100% accurate model for hardware and firmware debug.

The 100% accurate model can be created automatically from the RTL code.  An integrated ecosystem takes RTL models that are accurate by definition and delivers speedups by optimizing away low-level timing details to produce a cycle-accurate model. This guarantees the fidelity of the model to the actual chip.

Since these models are created directly from the RTL code, they avoid the problems that inevitably arise when the behavior of the accurate model differs from the behavior of the RTL code.  While they are accurate enough for hardware development, they are not fast enough for either software development or for booting up the system to get to a point at which it makes sense to examine the hardware in detail.  For that, high-speed models are still required.

High-speed models need to be created by hand.  Often, the performance gain comes from changing the modeling approach.  Curiously, one result of this is that hardware designers often make poor modelers because they try to model the hardware the way it actually works.

As an example, consider a counter that counts down and interrupts when it gets to zero.  The actual hardware will have a decrementer containing a register that gets clocked on each clock cycle.  However, if the counter is modeled this way, the virtual platform will consume most of its compute resources clocking this register and others like it.

The correct way to model the counter is to work out when, in the future, the device will interrupt, schedule that event with the underlying time management of the virtual platform, and then ignore the counter until then.  In the meantime, if the software accesses the register to read its value, the model calculates the value that should be there based on how many clock cycles have passed.  

This is a simple model, but the trick to creating all such models is the same: keep the model as inactive as possible, waking the code only when something absolutely essential happens.
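This lazy-evaluation trick can be sketched in a few lines of Python.  The class names and event-scheduler interface here are purely illustrative, not any vendor's actual modeling API; the point is that simulated time jumps from event to event, and the counter register is reconstructed only when read:

```python
import heapq
import itertools

class VirtualClock:
    """Minimal event-driven time manager: time jumps straight to the
    next scheduled event instead of ticking every cycle."""
    def __init__(self):
        self.now = 0                   # current cycle count
        self._events = []              # (cycle, seq, callback) min-heap
        self._seq = itertools.count()  # tiebreaker for same-cycle events

    def schedule(self, delay, callback):
        heapq.heappush(self._events,
                       (self.now + delay, next(self._seq), callback))

    def run(self):
        while self._events:
            cycle, _, callback = heapq.heappop(self._events)
            self.now = cycle           # jump, don't tick
            callback()

class CountdownTimer:
    """Lazy counter model: the interrupt is scheduled up front and the
    register value is computed only when the software reads it."""
    def __init__(self, clock, start, on_interrupt):
        self._clock = clock
        self._start_cycle = clock.now
        self._start_value = start
        clock.schedule(start, on_interrupt)  # fires when count hits zero

    def read(self):
        # Reconstruct the register from elapsed cycles instead of
        # decrementing it on every clock edge.
        elapsed = self._clock.now - self._start_cycle
        return max(self._start_value - elapsed, 0)
```

A timer loaded with 1000 and read at cycle 400 reports 600, yet the model did no work at all during those 400 cycles.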

High-speed peripheral models, as already explained, are built by hand.  But as more and more of an SoC or electronic system consists of IP blocks, then more and more of the high-speed models already exist.  Companies such as ARM and MIPS create high-speed models for their processors and standard peripherals, and using the integrated ecosystem they also create cycle-accurate models, pre-qualified to work correctly in a specific virtual platform.  A web-based model portal makes these models easily accessible for quick creation of virtual platforms.

High-speed models give software developers what they need, but there is still one missing capability required to make the portfolio of models useful to hardware developers:  the ability to switch between high-speed models and cycle-accurate models.  Accuracy when you need it, performance when you don’t.

A technology known as Swap'n'Play performs this switch.  It gives designers a way to use high-speed models to boot the operating system and run whatever software is necessary to get the system into a state where the hardware designer needs to delve into the details.  At this point the virtual platform is checkpointed: the state of the high-speed model is extracted and injected into the cycle-accurate model.  The virtual platform then continues in cycle-accurate mode, and all the signals of interest can be examined.
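The mechanics of such a switch can be illustrated with a simplified sketch.  All names here are hypothetical, and a real implementation must also map caches, TLBs and device state; the essential idea is that only the architecturally visible state transfers, while microarchitectural structures restart cold:

```python
class FastCoreModel:
    """Instruction-accurate model: tracks only architectural state."""
    def __init__(self):
        self.pc = 0
        self.regs = [0] * 16
        self.memory = {}

    def checkpoint(self):
        # Extract the architecturally visible state.
        return {"pc": self.pc,
                "regs": list(self.regs),
                "memory": dict(self.memory)}

class AccurateCoreModel:
    """Cycle-accurate model: carries extra microarchitectural state
    (here just a pipeline) that restarts empty after the swap."""
    def __init__(self):
        self.pc = 0
        self.regs = [0] * 16
        self.memory = {}
        self.pipeline = []

    def restore(self, state):
        # Inject the architectural state; microarchitectural
        # structures warm up again as simulation resumes.
        self.pc = state["pc"]
        self.regs = list(state["regs"])
        self.memory = dict(state["memory"])
        self.pipeline = []

def swap(fast, accurate):
    """Checkpoint the fast model and continue cycle-accurately."""
    accurate.restore(fast.checkpoint())
    return accurate
```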

The Use of Virtual Platforms

Virtual platforms, as a model of the system, can be used in many different ways as the design progresses from architecture down to the detailed hardware and software verification.

Early in the design, they can be used for analysis of architectural tradeoffs and IP selection.  Precisely what type of architectural exploration can be done early depends on the details of what parts of the hardware and software design are being reused from previous versions.  Potentially, cache architecture, throughput, latency, arbitration, hardware/software partitioning and other factors that have an impact on performance and power can be analyzed since the platform can easily be reconfigured and re-run. 

Most of this can only be done at the cycle level.  Performance during boot sequences is usually not of interest, so normally the optimal approach is to boot the system fast and then leverage Swap'n'Play, using cycle-accurate models to make accurate measurements.

Low-level firmware, such as device drivers and power management, needs to interface to the hardware.  This code typically reads and writes device registers, either directly or indirectly.  Some driver development can be done with high-speed models, but to check operation completely, especially for more complex devices with complicated timing, it must be done with cycle-accurate models.

Again, the best approach is to bring up the system using high-speed models with a breakpoint set where the device driver is first initialized, and then switch to cycle-accurate models to run the actual driver code.  Using this approach, most device drivers can be created and debugged in parallel.  A virtual platform should be fully integrated with software debuggers and other visualization tools, such as ARM's RealView.

The further software gets from the hardware, the less direct interaction there is between the two.  However, for all except the most device-independent software, there is still a need to run the software in an environment that provides inputs and makes output visible.  A virtual platform, with as much as possible using high-speed models, provides this substrate.  A big advantage of a virtual platform over real hardware is that when the system hits a breakpoint, the whole system freezes.

For example, when a chip for a router hits a breakpoint in the real world, the packets keep on coming. With a virtual platform the system freezes, can be analyzed and then continued from the breakpoint.

Testing software is another area where a virtual platform offers advantages.  With real hardware, once it is available, it is relatively straightforward to test "normal" operation of the SoC.  However, it can be difficult to test error conditions with real hardware, often requiring expensive custom test equipment.  An example would be generating noisy transmissions to a cell phone.  With a virtual platform, peripheral models can generate as much faulty data as needed, or data whose timing stresses the design.  Even once the SoC is available, the virtual platform approach to software testing can be much more exhaustive.
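A fault-injecting peripheral model can be very small.  The sketch below (a hypothetical UART-style receiver, not a model from any actual library) flips bits in incoming data at a configurable rate; because the error source is a seeded pseudo-random generator, a failing test run can be reproduced exactly:

```python
import random

class FaultyUartModel:
    """Illustrative peripheral model that injects bit errors into
    received data, a condition hard to arrange with real hardware."""
    def __init__(self, error_rate=0.0, seed=0):
        self.error_rate = error_rate
        self.rng = random.Random(seed)  # seeded: runs are repeatable

    def receive(self, data: bytes) -> bytes:
        corrupted = bytearray(data)
        for i in range(len(corrupted)):
            if self.rng.random() < self.error_rate:
                corrupted[i] ^= 1 << self.rng.randrange(8)  # flip one bit
        return bytes(corrupted)
```

Sweeping `error_rate` from 0 upward lets driver error paths be exercised systematically rather than waiting for faults to occur on a bench.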

Developing software alongside the SoC, and using one to validate the other, results in schedule savings compared to the old approach of developing hardware first, hoping for the best, and then developing software using the hardware as the execution engine.  Even if the design is lucky enough not to have hardware errors that are only uncovered during software development, serializing hardware and software development results in a much longer schedule.


Multicore designs, using symmetric multiprocessing (SMP), present a whole new set of design problems that are ideally suited to the virtual platform approach.  A multicore system is especially hard to debug since with real hardware it is almost impossible to repeat a run; multicore designs are typically not deterministic due to subtle timing variations.  This means that a problem that shows up on a run will not necessarily show up if it is run again.  

A virtual platform can be completely deterministic.  If an error shows up then re-running will produce that error again so that it can be investigated.
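One source of that determinism is that the simulator interleaves the cores in a fixed, repeatable pattern rather than at the mercy of physical timing.  A toy sketch (hypothetical names, with each "instruction" reduced to a counter increment) shows the idea: two runs always produce the identical execution trace:

```python
class Core:
    """Trivial stand-in for a processor model."""
    def __init__(self, cid):
        self.cid = cid
        self.count = 0

    def step(self):
        self.count += 1
        return (self.cid, self.count)   # (core id, instruction number)

class DeterministicScheduler:
    """Fixed round-robin interleaving: every run of the virtual
    platform replays exactly the same ordering of core steps."""
    def __init__(self, cores, quantum=2):
        self.cores = cores
        self.quantum = quantum          # steps per core per turn

    def run(self, steps_per_core):
        trace = []
        remaining = {c.cid: steps_per_core for c in self.cores}
        while any(remaining.values()):
            for core in self.cores:
                for _ in range(min(self.quantum, remaining[core.cid])):
                    trace.append(core.step())
                    remaining[core.cid] -= 1
        return trace
```

Because the trace is identical on every run, an error observed once can be driven to the same point again and inspected.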

In multicore designs, even the traditional workhorse of debugging, the breakpoint, is problematic.  Some multicore processors have the concept of a "limited skid breakpoint," whereby when one processor hits a breakpoint the others stop not long afterwards.  In others, a breakpoint on one processor does not halt the other processors at all.  However, with virtual platforms, the entire system freezes completely when a breakpoint is hit, making it easy to investigate what is going on and then step through or continue the run.

An integrated ecosystem makes it easy to create or download models, the building blocks of virtual platforms.  It allows design teams to combine models into a platform that can initially be run fast and then switched to full accuracy to make precise measurements or to investigate low-level issues.  It delivers all the advantages of virtual platforms that have, until now, been elusive.

Bill Neifert is chief technology officer and cofounder of Carbon Design Systems, responsible for development of the company's integrated virtual platform ecosystem consisting of SoC Designer, Model Studio, Swap'n'Play and IP Exchange.  He has 13 years of electronics engineering experience, with more than 10 years in EDA.  Neifert has a BS and MS in computer engineering from Boston University.  He can be contacted at
