The future of asynchronous logic design is looking a little bit brighter. As the semiconductor industry struggles with mounting problems trying to achieve significant yields, higher performance and lower power without significant increases in fabrication costs, developers are turning to asynchronous alternatives to solve these problems.
Asynchronous, or clockless, logic as a mainstream circuit logic alternative could be critical in many embedded designs in consumer electronics and mobile devices.
Several things have occurred to make this alternative more viable. First, companies active in developing asynchronous logic are shifting from selling a particular IP approach to becoming fabless IC companies, using their logic expertise to address segments of the market synchronous logic is having a hard time satisfying.
Second, the numerous variations, and names, of asynchronous logic are settling out to three or four, each optimized for specific segments of the market. A third trend is the increasing use, even amongst the largest semiconductor companies, of asynchronous techniques to achieve the performance, power, and cost objectives the market demands.
And finally, efforts at universities and within the same asynchronous companies are increasingly focused on developing EDA tools and design flows that can be integrated into the custom and semi-custom methods now used by the industry for synchronous design.
Traditional synchronous design limits
Traditionally, most circuit designs in the mainstream are built with synchronous logic: small blocks of combinatorial logic separated by synchronously clocked registers. The biggest advantage of this approach is that synchronous logic makes it easy to determine the maximum operating frequency of a design by finding and calculating the longest delay path between registers in a circuit.
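As a rough illustration of that calculation, the sketch below finds the longest delay path through a small block of combinational logic and converts it into a maximum clock frequency. The gate names, delay values, and register overheads are invented for the example and do not describe any real design or tool.

```python
# Minimal sketch: longest register-to-register path and the clock rate it allows.
# All names and numbers below are hypothetical, chosen only for illustration.

# Propagation delay of each gate in picoseconds.
GATE_DELAY_PS = {"a": 300, "b": 500, "c": 400, "d": 600}

# For each gate, the gates that drive its inputs (an empty list means it is
# driven directly by the launching register).
FANIN = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

CLK_TO_Q_PS = 100  # assumed delay from clock edge to register output
SETUP_PS = 100     # assumed setup time of the capturing register


def arrival_time_ps(gate, cache=None):
    """Latest time at which the output of `gate` settles, in picoseconds."""
    if cache is None:
        cache = {}
    if gate not in cache:
        latest_input = max(
            (arrival_time_ps(src, cache) for src in FANIN[gate]), default=0
        )
        cache[gate] = latest_input + GATE_DELAY_PS[gate]
    return cache[gate]


def critical_path_ps():
    """Longest combinational delay over all gates in the block."""
    return max(arrival_time_ps(g) for g in FANIN)


def max_frequency_mhz():
    """Clock frequency permitted by the critical path plus register overhead."""
    period_ps = CLK_TO_Q_PS + critical_path_ps() + SETUP_PS
    return 1e6 / period_ps  # 1 MHz corresponds to a period of 1e6 ps


if __name__ == "__main__":
    print(critical_path_ps(), "ps critical path")
    print(max_frequency_mhz(), "MHz maximum clock")
```

In this toy block the slowest path runs through gates a, b and d, so that single path sets the clock period for the whole block, however fast the other paths are.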
But as devices move into the 90 nanometer range and below, it is becoming extraordinarily difficult to find and predict the critical path delays and to achieve the all-important timing closure. And as process technology works down to 45 nm and below, things only get worse, with shot noise, charge sharing, thermal effects, supply voltage noise and process variations all making calculations of delay more uncertain and difficult.
Because synchronous logic designs are always on, balancing power and performance becomes critical as integration levels increase. 'Simple' power consumption and power dissipation issues are not the only problem. SoC designs with many millions of transistors require large clock current surges, which tax a circuit's power distribution nets as well as the thermal stability of the circuit. There is also the growing inability to control noise and metal integrity. And in some of today's system-on-chip designs with millions of gates, the job of maintaining the global clocking across the area of a chip is becoming problematic.
While synchronous logic designers have been extraordinarily successful at squeezing every bit of performance out of their designs and at finding work-arounds to the myriad of design problems facing them in the nanometer range, it is becoming more expensive and it is taking longer to develop designs.
The advantages of clockless logic
At 90nm and below, asynchronous logic may be able to take advantage of the increasing process volatility. “Unlike synchronous designs where developers have to assume worst case values, asynchronous logic works with the average values and the average process,” said Peter Beerel, Associate Professor, Electrical Engineering-Systems Department at the University of Southern California, who heads the asynchronous logic research efforts there. “This is an enormous advantage in the face of the many process variations that must be dealt with in the current generation of 90nm and below designs. At 90nm, that can mean as much as a twofold improvement in performance.
“As every move down in geometries occurs, the problems get greater. The industry has been able to solve those problems, but it has not been cheap. Each year, as synchronous designs become more difficult to do and it becomes more expensive, asynchronous becomes more and more attractive. And each year the tools and methodologies the asynchronous community is developing get better.”
Unlike the familiar design that synchronous methodologies use, asynchronous circuits (also called self-timed, locally clocked, clockless and a number of other names descriptive of the different approaches) remove the need for a global synchronizing clock. Instead, the process of computation is controlled through local clocks and local handshaking and handoff between adjacent units. What this means for high performance and power-efficient design is that such local control permits resources to be used only when they are necessary.
Although asynchronous designs usually require more transitions on a computational path than synchronously designed CPUs, the transitions only occur in areas involved in the current computational task. Moreover, most forms of asynchronous logic are, to one degree or another, delay-insensitive because of their clockless derivation, making designs in the 90 nanometer and below regime easier, at least theoretically, because they depend on average clock skews, not worst case.
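The difference between budgeting for the worst case and running at the average can be shown with a toy calculation. The per-operation delays below are invented, and the self-timed model is idealized: it ignores the handshake overhead that a real asynchronous stage would add.

```python
# Minimal sketch: worst-case (clocked) versus average-case (self-timed) timing.
# The delay values are hypothetical and exist only to illustrate the idea.

# Time each of six operations actually needs in one logic stage, in picoseconds.
OPERATION_DELAYS_PS = [400, 450, 500, 420, 900, 430]


def clocked_time_ps(delays):
    """A clocked stage must give every operation the worst-case delay."""
    return max(delays) * len(delays)


def self_timed_time_ps(delays):
    """An idealized self-timed stage moves on as soon as each operation is done.

    Handshake overhead is deliberately left out of this model.
    """
    return sum(delays)


if __name__ == "__main__":
    print("clocked:   ", clocked_time_ps(OPERATION_DELAYS_PS), "ps")
    print("self-timed:", self_timed_time_ps(OPERATION_DELAYS_PS), "ps")
```

Here one slow operation sets the clock period for all six, while the self-timed stage pays for it only once. How much of that gap survives in silicon depends on the handshake overhead and on how widely the real delays are spread.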
But, given the diversity of approaches, the difficulty in implementing the designs because of the lack of appropriate and familiar design tools and flows, and the perceived lack of sufficient performance improvements to justify the additional time and cost it takes, asynchronous logic has not had a warm welcome within most mainstream electronics companies.
From providing IP to targeting niches
Rather than trying to fight the institutional momentum behind synchronous logic, a number of asynchronous logic-based companies, such as Achronix, Fulcrum, Handshake Solutions, Silistix, and Theseus, have stopped trying to sell their particular approaches as IP to an unconvinced design community. Instead they are using the special advantages their particular asynchronous design methods give them to target specific market niches where this methodology will give them an edge.
Rajit Manohar, founder and chief technology officer of Achronix, said the company has already demonstrated a 650-plus megahertz FPGA built with relatively conservative 180 nanometer design rules, even looser than those of conventional, slower synchronous FPGAs.
Later this year the company will introduce an Ultra line of FPGAs fabricated with 90 nm CMOS that will operate at clock frequencies in the 700 MHz to 1.2 GHz range, based on a synchronous interface, an asynchronous core and a set of software tools to convert synchronous design flows to asynchronous logic. It is targeting many applications where ASICs have predominated and that were previously inaccessible to FPGA manufacturers.
Fulcrum is targeting network processing applications with its asynchronous logic technology and EDA tools, initially in a crossbar switch that it incorporated into its PivotPoint SPI-4 switch chip, now used by about 15 vendors, said Mike Zeile, vice president of marketing. The same crossbar core is also used in its most recent family of 10-Gbit Ethernet switches, called FocalPoint, supporting up to 24 ports.
The asynchronous paths allow the devices to offer full 10-Gbit speed and 200-nanosecond total latency through the chip in a conservative, low-leakage, 130-nm process. The registers and Ethernet ports are traditional, synchronous logic, while the sections that determine the performance and power consumption (the crossbar and its SRAM) are asynchronous.
Handshake Solutions has focused on the low power advantages of clockless logic and has been successful in the smart card market, where it has sold millions of units of 8-bit asynchronous MCUs. The company is taking advantage of the fact that its parent company Philips is well positioned in automotive electronics, where it has worked with ARM Ltd. on the introduction of a low-power, 32-bit ARM designed for that market.
Figure: Clockless pipeline. Using Handshake's clockless methodology, the pipeline within the just-introduced ARM996HS core mirrors the normal ARMv5TE pipeline, with the exception that instead of a global synchronous clock, dedicated control logic ensures that each stage in the pipeline is enabled only when required. The pipeline handshakes with the system controller to fetch instructions and to load and store data.
Silistix has targeted the on-chip interconnect segment of the market, where existing globally synchronously-clocked shared-bus topologies are proving inadequate. According to David Fritz, vice president of marketing, it has developed a globally asynchronous, locally synchronous interconnect fabric it calls Chain, supported by EDA tools and libraries called Chainworks to allow designers to generate asynchronous and delay-insensitive links between traditional synchronous logic blocks in existing designs. It supports multiple local bus protocols, including AHB, APB, and AXI as used by ARM Holdings plc, enabling existing IP blocks to be used without modification.
Theseus Logic uses a low-power optimized asynchronous logic methodology as part of its new product and service strategy aimed at developing low cost, highly integrated mixed signal system-on-chip devices for wireless sensor nodes.
Companies and engineers who want to look under the hood to see how these performance and power consumption advances are achieved can examine a handful of approaches that research and experience have shown to have advantages in particular segments of the market. Achronix and Fulcrum, for example, base their approaches on the quasi-delay insensitive (QDI) logic pioneered at the California Institute of Technology, and in the former case, refined at Cornell University. QDI circuits do not use any assumption about, or knowledge of, delays in operators and wires, and are the most conservative asynchronous circuits in terms of the use of delays. But they are also the most robust to variations in physical parameters because the circuit's dependence on delays is minimal.
Handshake Solutions uses a combination of regular synchronous and asynchronous logic in a clockless variation it calls handshake logic, which it implements in several variations, as two-phase or four-phase protocols, and as single-rail or double-rail data encoding. In most recent designs it has opted for a four-phase single rail implementation because it does not require dedicated 'asynchronous' standard cells, allowing the use of standard EDA tools and reuse of standard data path blocks.
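For readers unfamiliar with the term, a four-phase (return-to-zero) handshake over single-rail data can be sketched as the sequence of request and acknowledge transitions below. This is a generic textbook-style model written for this article, not Handshake Solutions' implementation; the function name and trace labels are invented.

```python
# Minimal sketch of a generic four-phase request/acknowledge handshake with
# single-rail (bundled) data. Illustrative only; not any vendor's circuit.


def four_phase_transfer(values):
    """Send each value across one handshake channel.

    Returns (trace, received): the ordered list of signal transitions and
    the values latched by the receiver.
    """
    req = 0
    ack = 0
    trace = []
    received = []
    for value in values:
        data = value            # sender drives the data wires first
        req = 1                 # phase 1: sender raises request
        trace.append("req+")
        received.append(data)   # receiver latches data while req is high
        ack = 1                 # phase 2: receiver raises acknowledge
        trace.append("ack+")
        req = 0                 # phase 3: sender withdraws request
        trace.append("req-")
        ack = 0                 # phase 4: receiver withdraws acknowledge
        trace.append("ack-")
    assert req == 0 and ack == 0  # channel is idle again after every transfer
    return trace, received


if __name__ == "__main__":
    trace, received = four_phase_transfer([7, 42])
    print(trace)
    print(received)
```

Each transfer costs four signal transitions, and both wires return to zero before the next one starts. A two-phase protocol would instead treat every transition, rising or falling, as an event.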
Silistix's asynchronous approach is based on research at the University of Manchester involving the use of self-timed packet-based networks to solve timing closure problems in complex system-on-chip designs. Theseus also uses a delay-insensitive clockless logic, but bases it on a class of NULL Convention Logic (NCL) circuit techniques developed by the company's founder, Karl Fant, which integrates data transformation and control into a single expression, thus yielding an inherently clockless, delay-insensitive circuit.
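One way to picture how data and control can share a single expression is the threshold gate with hysteresis that is commonly used to describe NCL: the output rises only when enough inputs are asserted, and falls only when all inputs have returned to NULL (zero). The sketch below is a behavioral model under that common description; the class name is invented and it is not taken from Theseus's cell library.

```python
# Minimal behavioral sketch of an m-of-n threshold gate with hysteresis, as
# commonly described for NULL Convention Logic. Illustrative assumption only.


class ThresholdGate:
    def __init__(self, threshold, n_inputs):
        self.threshold = threshold
        self.n_inputs = n_inputs
        self.out = 0  # start in the NULL (all-zero) state

    def update(self, inputs):
        """Apply a new set of input values and return the gate output."""
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        asserted = sum(inputs)
        if asserted >= self.threshold:
            self.out = 1   # enough inputs carry data: output asserts
        elif asserted == 0:
            self.out = 0   # every input is back at NULL: output resets
        # Otherwise hold the previous output (hysteresis).
        return self.out


if __name__ == "__main__":
    gate = ThresholdGate(threshold=2, n_inputs=2)
    for pattern in ([1, 0], [1, 1], [1, 0], [0, 0]):
        print(pattern, "->", gate.update(pattern))
```

Because the output changes only on a complete set of data or a complete return to NULL, a transition at the output also signals that the inputs have all arrived, which is why no separate clock is needed to say when the result is valid.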
Synchronous edges closer to asynchronous
On their side of the logic divide, traditional semiconductor companies have not been blind to the perceived advantages of asynchronous logic and have maintained on-going research efforts, and in some cases they have come up with variations that are essentially clocked synchronous circuits, but that owe a lot to asynchronous methodologies, including mesochronous, plesio-synchronous and adiabatic logic.
Mesochronous involves logic designs in which various portions of an SoC design are not synched independently of the logic signals, but where the clock signals accompany the data. In plesio-synchronous logic, multiple clock domains distributed throughout a chip share the same clock, but with timing from a separate parallel clock signal distribution system. In adiabatic logic, instead of supplying a constant voltage to a chip and then clocking signals through, circuits have a periodic, sinusoidal power system that activates logic gates as needed.
Also included in such alternatives are techniques such as Intel's self-resetting logic, which shares many characteristics with asynchronous logic, including the ability to use the same custom and semi-custom EDA tools and design flows.
Building asynchronous logic with synchronous tools
USC's Beerel said that clockless logic companies and researchers are moving toward similar EDA tools and design flows, reflecting the same trends that are occurring in synchronous logic design. Currently, there are essentially two main approaches in mainstream circuit design: full custom design flows, which may use EDA tools but in which there is a high degree of so-called “handcrafting” (that is, optimizing the design for performance), and semi-custom, fully automated methods.
With the latter, using tools provided by firms such as Artisan, Cadence, Magma, Mentor Graphics, Synopsys and Virage, the strict CAD tool flow imposes limitations on what a circuit designer can do in terms of optimizing performance, power, or some other important parameter. To develop a repeatable and predictable design flow and a consistent tool set and interface, restrictions are placed on the kinds of circuits that can be used and how they can be used.
As a result, where an Intel can build a full custom circuit with a 2-3 GHz clock rate, users of the more mainstream, fully automated, semicustom approach must settle for 500 MHz in a typical design. “But in exchange for that the developer gets a mature, well-oiled set of CAD tools that allows the creation of chips with much less effort and expense than the alternative,” said Beerel, “with a much faster turnaround and with greater assurance of a working design at the end. The tools are also well supported and upgraded each year with incremental improvements in terms of power and performance.”
A similar dichotomy is emerging in the asynchronous community, but in reverse. Because they are focused on competing with synchronous, many of the leading clockless-based logic companies such as Achronix, Fulcrum and Silistix have developed custom design flows based on a combination of standard EDA tools and methodologies, but in many cases they depend primarily on careful handcrafting of the final design to optimize their circuits for the targeted markets.
Theseus, which pioneered the use of standard EDA tools in the design of clockless circuits, and Handshake out of Philips have what may be the most automated clockless circuit design flows, with many traditional EDA tools. The only question in the case of Handshake is its use of a C-like hardware description language for asynchronous logic called HASTE, rather than the industry standard Verilog or VHDL. Despite its simplicity compared to the Verilog and VHDL used with most EDA tools, HASTE is a non-standard approach with a learning curve that designers must climb before they can use it effectively, said Beerel.
“While it is a serious hurdle to Handshake's acceptance, it can be overcome,” he said. “And they have been in this for quite a while and have created a niche for themselves in the low power segment of the market, in smart cards and low power 8-bit microcontrollers. They also have some penetration of the automotive market. The big question is whether they can make HASTE as mainstream as VHDL or Verilog.”
The search for the holy grail
The holy grail of the clockless design community, whether at the companies or at universities such as Caltech, Cambridge University, Columbia University, Leeds University, USC, and University of Utah, is to create an EDA tool flow that is as close to the traditional one for synchronous designs as possible. Same languages, simulation, and modeling tools, with asynchronous rules inserted when they are the only thing that can accomplish a specific circuit level task.
“If it is necessary to develop the tools and building blocks for every aspect of a circuit design for asynchronous logic from scratch,” said Beerel, “there would not be any possibility of asynchronous logic moving into the mainstream of integrated circuit design. What we are working on here at USC is identifying those points in the design flow that require tools that are unique to clockless/asynchronous logic and everywhere else use traditional tools.”
Beerel and his coworkers have taken a two-phase asynchronous handshake approach, developed by clockless pioneer Ivan Sutherland at Sun Microsystems for use in control circuits, and extended it with a proposed single-track full-buffer circuit family for use in both the control and data path.
“Using our single-track family, we have developed a prototype library of cells in 0.25 micron technology and were able to build a 600,000 transistor chip that achieved between 1.2 to 1.4 GHz measured performance,” he said. “The key point is that this chip was designed using a push-button fully automated place-and-route design flow.” They are now working with Fulcrum on extending these fully automated techniques to commercial devices, under a joint research grant to USC.
“I think we are getting very close to a crossover point this year where the industry will see applications and tool flows where the only practical solution will be asynchronous logic,” said Beerel. “We may be on the eve of a revolution. That may be my enthusiasm talking, but what I see in the lab and on the production line tell me that there is a tipping point in the near future.”