Stockholm, Sweden – Intent on enhancing wireless multimedia and modem processing, StarCore LLC has equipped the latest version of its digital signal processor architecture with 25 new instructions, a partially interlocking pipeline, memory protection and an extended interrupt scheme.
The combined improvements pave the way for more compact code, more efficient video codec processing and broader peripheral support in system-on-chip designs, according to StarCore.
Announced at the recent Embedded Processor Forum 2004 in San Jose, Calif., StarCore's V4 architecture is accompanied by the SC2000 core, which has a six-stage pipeline and dynamic branch prediction. While wireless is the headliner, the new architecture and core also target converged consumer devices, wireline communications, voice processing and general-purpose DSP applications.
With plans afoot to announce its V5 architecture later this year and the corresponding SC3000 core in early 2005, StarCore will roll out three architecture families in less than 2-1/2 years, said Alex Bedarida, vice president of marketing at the Austin, Texas, company.
StarCore was spun out 18 months ago from what had been a collaboration between Agere and Motorola. The architecture developed there was turned into a synthesizable core that was announced as a product in October. The core came in two versions: the SC1200, with two multiply-accumulate (MAC) units, and the four-MAC SC1400. "Since then, things have been going at a rapid pace," said Bedarida.
The architecture on which the SC1000 family is based is the V2. The V1 and V3 are versions used only by Freescale (formerly Motorola Semiconductor) that were either completed or under development before StarCore was formed. But all of the versions, including the V4, are backward binary-compatible, said Bedarida.
The architecture's signal-processing and control-code-processing capabilities let it perform the functions of both a DSP and a microcontroller, Bedarida said. Other features include a variable-length execution set (VLES) of 16 to 128 bits that can issue up to six instructions per cycle; a very long instruction word architecture with a five-stage pipeline; and compiler "friendliness."
“Being compiler-friendly is important since we're not just talking about single loops [as in a DSP-like processing function]; we're running protocol stacks and other functions that have hundreds of thousands of lines of code for handsets or infrastructure,” Bedarida said. “You can go from pure control-type code to heavy-duty DSP code with code density comparable to that of an ARM.”
With the V4, the goal was to further improve code density and make the architecture easier to compile for, while also lowering power consumption and enhancing video processing. The company added 25 new instructions, some of which accelerate video codec processing performed in software.
For example, said Bedarida, the ability to execute up to four sum-of-absolute-differences (SAD4) instructions per cycle can accelerate motion estimation by up to 13x.
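To illustrate what a SAD instruction computes, the Python sketch below performs full-search block-matching motion estimation, the kind of loop such instructions are meant to speed up. The function sad4 is a hypothetical software stand-in for the hardware instruction, summing absolute differences over four pixel pairs; the block size, search radius, full-search strategy and all function names are illustrative assumptions, not StarCore's implementation, and nothing here models the quoted 13x speedup.

```python
def sad4(a, b):
    """Hypothetical stand-in for one SAD4 step: sum of |a[i] - b[i]| over 4 pixels."""
    assert len(a) == len(b) == 4
    return sum(abs(x - y) for x, y in zip(a, b))


def block_sad(cur, ref):
    """SAD between two equal-size blocks (width a multiple of 4), built from SAD4 steps."""
    total = 0
    for row_c, row_r in zip(cur, ref):
        for i in range(0, len(row_c), 4):
            total += sad4(row_c[i:i + 4], row_r[i:i + 4])
    return total


def full_search(cur, ref_frame, top, left, radius):
    """Try every offset within +/-radius of (top, left); return (dy, dx, sad) of the best match."""
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference frame.
            if y < 0 or x < 0 or y + h > len(ref_frame) or x + w > len(ref_frame[0]):
                continue
            cand = [row[x:x + w] for row in ref_frame[y:y + h]]
            s = block_sad(cur, cand)
            if best is None or s < best[2]:
                best = (dy, dx, s)
    return best


if __name__ == "__main__":
    # Synthetic 8x8 reference frame with distinct pixel values.
    ref = [[r * 16 + c for c in range(8)] for r in range(8)]
    # Current 4x4 block: copied from one row below the search origin (2, 2).
    cur = [row[2:6] for row in ref[3:7]]
    print(full_search(cur, ref, 2, 2, 2))  # prints (1, 0, 0)
```

Because each SAD4 step is independent of the others within a block, hardware that issues several of them per cycle can cover a candidate block in far fewer cycles than a scalar loop; that parallelism is what the sketch makes visible, not its speed.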
The V4 also introduces partial pipeline interlocking, which Bedarida said reduces coding restrictions and makes compilation and assembly-level programming more efficient. The addition of a privilege mode and memory protection improves operating system support.
The SC2000 core, derived from the V4 architecture, extends the pipeline from five to six stages by splitting the "dispatch and decode" stage into two. Bedarida said the resulting relaxed timing requirements allow higher-frequency implementations. A dynamic branch-prediction mechanism reduces latency in nonloop change-of-flow situations, while stack pointer precalculation cuts the cycle count of instructions that use indexed access to the stack.
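The article does not describe how the SC2000's branch predictor works. As a generic illustration of dynamic branch prediction only, the Python sketch below models a classic 2-bit saturating-counter predictor indexed by branch address. The table size, the initial "weakly not taken" state, the branch address 0x40 and the class and function names are all assumptions; StarCore's actual mechanism may differ entirely.

```python
class TwoBitPredictor:
    """Generic 2-bit saturating-counter predictor (illustrative, not StarCore's design)."""

    def __init__(self, entries=16):
        self.entries = entries
        # Counter states: 0, 1 = predict not taken; 2, 3 = predict taken.
        # Every entry starts at 1 ("weakly not taken"), an assumed initial state.
        self.table = [1] * entries

    def _index(self, pc):
        # Index the table by the low-order bits of the branch address.
        return pc % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, trace):
    """Run (pc, taken) outcomes through the predictor and count wrong guesses."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A conditional branch in control code (hypothetical address 0x40)
    # that is taken nine times, then not taken once, repeated three times.
    trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 3
    print(count_mispredictions(TwoBitPredictor(), trace))  # prints 4
```

The two-bit hysteresis means a single not-taken outcome does not flip a strongly-taken prediction, so after warm-up the predictor misses only once per ten executions of this branch. Each correct guess avoids the pipeline refill a change of flow would otherwise cost.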
SC1000 code requires up to 17 percent fewer cycles on the SC2000, said Bedarida. The new core is available now.
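As a rough worked example, assuming both cores run at the same clock frequency (an assumption; the SC2000's six-stage pipeline is meant to permit higher clocks), cutting the cycle count by 17 percent corresponds to a speedup of about 1.2x:

$$\text{speedup} = \frac{C_{\text{SC1000}}}{C_{\text{SC2000}}} = \frac{1}{1 - 0.17} \approx 1.20$$

Since the 17 percent figure is an upper bound, this is a best case; any clock-frequency gain from the deeper pipeline would multiply with it.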
(This story has also been posted at EETimes.com)