Houston, Texas – After three or so months of not-so-subtle promotion of its new DaVinci digital video system-on-chip platform, Texas Instruments today introduced the first two devices in its new TMS320DM644x SoC RISC/DSP family.
The opening hand in TI's bid to capture a big piece of a market estimated to grow to almost $500 million by 2008 consists of the TMS320DM6443 and the TMS320DM6446, with several variants still to be announced.
Offering up to 1080i MPEG-2 decode and up to 720p MPEG-4 simple profile encode, the DM644x devices are based on an SoC platform consisting of a TMS320C64x+ DSP core, an ARM926 processor, dedicated video accelerators, networking peripherals and external memory/storage interfaces all specifically tuned for video performance (see Figure 1, below).
Figure 1 (Source: TI)
The TMS320DM6443, tuned for video decode applications, provides all of the processing components required to decode digital video, including both analog and digital video output with integrated resizer and on-screen display (OSD) engines. The TMS320DM6446, tuned for both video encode and decode, adds video encoding capabilities through a dedicated video processing front end capable of capturing various digital video formats.
“If we do this right, we think the DaVinci platform will capture significant pieces of the business in the various segments of the digital video market,” said Greg Mar, DSP SoC platform manager at TI, “with specific variant subsets and supersets of the DaVinci architecture targeting home security, A/V, video phones, IPTV, DVD recorders, digital TV, personal video recorders, video conferencing, portable video games, DVD players and digital cameras.”
Supporting the sophisticated hardware is a software development framework built around application programming interfaces (APIs), frameworks and development tools, all optimized for digital video systems. Mar believes it will require much less technical expertise to design and field products quickly.
DaVinci’s technology underpinnings
Underlying the new DaVinci platform are key technology advances in at least four critical areas, some deployed first on this SoC architecture and some taken out for test runs on earlier products over the past year or so. On the hardware side these include (1) a new switched fabric serial bus for moving data to and from the processors and the peripheral DMA, and between off-chip peripherals; (2) a cross-point switched central resource (SCR) for moving data among the ARM, DSP and video processing cores; and (3) a redesign of the enhanced direct memory access (EDMA) block used in the company's earlier DSP and OMAP platforms to handle data to and from the media peripheral blocks.
On the software side, to help developers write code for this multicore architecture, TI has extended the interprocessor communications (IPC) mechanism used on its earlier multicore DSP and OMAP devices and has built a multimedia code abstraction layer atop it that frees developers from needing detailed knowledge of the underlying hardware.
Leaving the shared AMBA bus behind
While not the first company to do so, TI has left behind many of the original ARM bus architecture elements in its DaVinci platform. For multimedia-heavy data flows between the ARM, DSP and video processing subsystems, the shared AMBA Advanced High-performance Bus (AHB) has been replaced by the SCR, a 400 MHz, four-channel, 64-bit-wide cross-point switch capable of an aggregate data rate of 4.8 Gbytes/sec.
“In this configuration the SCR allows the DSP core to do a video algorithm dump to the DDR RAM and concurrently do an audio transfer from the DSP to the RISC or some external resource, without delays,” said J.B. Fowler, DaVinci video system application engineer at TI.
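The concurrency Fowler describes follows from the cross-point topology: masters contend only when they address the same slave port, whereas a shared bus serializes every transfer. The C sketch below is a hypothetical software model of that idea, not TI's SCR logic; the port names (`M_DSP`, `S_DDR` and so on), the fixed-priority tie-break and the `crossbar_grant()` function are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical master and slave ports; the real SCR port map is not given in the article. */
enum master { M_ARM, M_DSP, M_VPSS, M_EDMA, NUM_MASTERS };
enum slave  { S_DDR, S_ARM_RAM, S_PERIPH, S_DSP_L2, NUM_SLAVES };

/* One pending request per master; target < 0 means the master is idle. */
typedef struct { int target; } request_t;

/* Grant, in one arbitration cycle, every request whose slave port is still free.
 * Ties on the same slave go to the lower-numbered master (fixed priority, for
 * simplicity). Returns the number of transfers that proceed concurrently. */
static size_t crossbar_grant(const request_t req[NUM_MASTERS], bool granted[NUM_MASTERS])
{
    bool slave_busy[NUM_SLAVES] = { false };
    size_t n = 0;

    for (int m = 0; m < NUM_MASTERS; m++) {
        granted[m] = false;
        int s = req[m].target;
        if (s < 0)
            continue;                 /* master idle this cycle */
        assert(s < NUM_SLAVES);
        if (!slave_busy[s]) {         /* no conflict: path through the switch is free */
            slave_busy[s] = true;
            granted[m] = true;
            n++;
        }
    }
    return n;
}
```

For example, a DSP write to DDR and an EDMA audio transfer into ARM memory target different slave ports, so both are granted in the same cycle; only two masters aiming at DDR at once would have to wait for each other.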
Because many of DaVinci's peripheral functions are media-oriented, the ARM core's internal shared peripheral configuration bus, the Advanced Peripheral Bus (APB), has been extended with the VYLNQ switched fabric serial bus. Originally developed by TI's broadband group as a board-level North/South bridge processor bus, the switched fabric structure allows external peripheral functions to be added with minimal latency from bus contention. “It allows a developer to program the peripheral without knowledge of whether it is on chip or off,” he said.
With four transmit links and four receive links, VYLNQ has a theoretical upper bandwidth of about 75 Mbytes/second on a 75 MHz clock. The company's designers, said Fowler, have also added mechanisms that prioritize traffic using a round-robin queuing structure under some degree of programmer control, further reducing contention among the peripherals for access to the internal data bus structure.
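A round-robin arbiter with programmable per-port weights is one common way to get prioritization with "some degree of programmer control." The sketch below assumes a weighted round-robin scheme in which each port may keep the grant for up to `weight` consecutive cycles before the pointer moves on. The article does not describe how VYLNQ's queuing actually works, so `rr_arbiter_t`, `rr_grant()` and the weight semantics are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 4   /* hypothetical: one arbiter slot per attached peripheral */

typedef struct {
    unsigned weight[NUM_PORTS];  /* programmer-set: max consecutive grants per turn */
    unsigned credit;             /* grants left for the port holding the pointer */
    unsigned current;            /* port currently holding the round-robin pointer */
} rr_arbiter_t;

static void rr_init(rr_arbiter_t *a, const unsigned weight[NUM_PORTS])
{
    for (unsigned i = 0; i < NUM_PORTS; i++) {
        assert(weight[i] > 0);   /* a zero weight would starve the port */
        a->weight[i] = weight[i];
    }
    a->current = 0;
    a->credit = a->weight[0];
}

/* Return the port granted this cycle, or -1 if no port is requesting.
 * The current port keeps the grant while it has credit and is still requesting;
 * otherwise the pointer advances round-robin to the next requesting port. */
static int rr_grant(rr_arbiter_t *a, const bool request[NUM_PORTS])
{
    if (a->credit > 0 && request[a->current]) {
        a->credit--;
        return (int)a->current;
    }
    for (unsigned step = 1; step <= NUM_PORTS; step++) {
        unsigned p = (a->current + step) % NUM_PORTS;
        if (request[p]) {
            a->current = p;
            a->credit = a->weight[p] - 1;   /* this grant consumes one credit */
            return (int)p;
        }
    }
    return -1;
}
```

With weights {2, 1, 1, 1} and every port requesting, the grant sequence is 0, 0, 1, 2, 3, 0, 0: port 0 gets twice the share of the others, yet no port is starved.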
Further enhancements to EDMA
Given the bandwidth of the SCR and the clock rate of the SoC core elements, he said, most applications TI expects DaVinci to serve will have more than enough performance headroom.
But to further offload the DaVinci DSP and RISC cores, TI engineers have improved the Enhanced DMA architecture brought over from previous DSP and OMAP devices, said Fowler. “In EDMA 3.0, we have added a much greater degree of programmability,” he said. “Also, in this version we can do a number of operations particularly important to the market segments we are targeting, such as 3-D transfers, data transposition and sorting.
“We could do such operations in earlier versions, but they required a greater degree of processor participation and programmer direction. On this platform, in a video application where you want to rotate the image in real time, this can be done without any additional resources from either the DSP or RISC cores.”
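A 3-D DMA transfer can be described as a count of contiguous bytes, a count of such arrays with a stride between them, and a count of frames with a second stride. Choosing the destination strides lets the same copy loop transpose or rotate an image. The C code below is a software model of that idea, assuming a simplified descriptor (`xfer3d_t`) with signed byte strides; it is not the EDMA 3.0 parameter layout, and on DaVinci the loop would be carried out by the DMA hardware rather than by a CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified 3-D transfer descriptor: copy c_count frames of
 * b_count arrays of a_bytes contiguous bytes. Strides are signed byte offsets,
 * so a negative stride walks backward through memory. */
typedef struct {
    const uint8_t *src;
    uint8_t *dst;
    size_t a_bytes;                        /* bytes per contiguous array (e.g. one pixel) */
    size_t b_count;                        /* arrays per frame */
    size_t c_count;                        /* frames per transfer */
    ptrdiff_t src_b_stride, dst_b_stride;  /* offset between consecutive arrays */
    ptrdiff_t src_c_stride, dst_c_stride;  /* offset between consecutive frames */
} xfer3d_t;

/* Software stand-in for what a DMA engine would execute on its own. */
static void xfer3d_run(const xfer3d_t *t)
{
    for (size_t c = 0; c < t->c_count; c++) {
        for (size_t b = 0; b < t->b_count; b++) {
            const uint8_t *s = t->src + (ptrdiff_t)c * t->src_c_stride
                                      + (ptrdiff_t)b * t->src_b_stride;
            uint8_t *d = t->dst + (ptrdiff_t)c * t->dst_c_stride
                                + (ptrdiff_t)b * t->dst_b_stride;
            memcpy(d, s, t->a_bytes);
        }
    }
}

/* Rotate a width x height image 90 degrees clockwise into a height x width image.
 * Source pixel (x, y) lands in destination row x, column (height - 1 - y). */
static void rotate90_cw(const uint8_t *src, uint8_t *dst,
                        size_t width, size_t height, size_t px_bytes)
{
    assert(width > 0 && height > 0 && px_bytes > 0);
    xfer3d_t t = {
        .src = src,
        .dst = dst + (height - 1) * px_bytes,             /* top-right destination pixel */
        .a_bytes = px_bytes,
        .b_count = width,                                  /* walk along one source row... */
        .c_count = height,                                 /* ...for every source row */
        .src_b_stride = (ptrdiff_t)px_bytes,
        .dst_b_stride = (ptrdiff_t)(height * px_bytes),    /* down one destination row */
        .src_c_stride = (ptrdiff_t)(width * px_bytes),
        .dst_c_stride = -(ptrdiff_t)px_bytes,              /* one column left per source row */
    };
    xfer3d_run(&t);
}
```

For a 3 x 2 image with 1-byte pixels {1, 2, 3 / 4, 5, 6}, `rotate90_cw()` produces the 2 x 3 image {4, 1 / 5, 2 / 6, 3}. Only the base address and the sign and size of the strides differ from a plain copy, which is why such a rotation lends itself to being offloaded to a DMA engine.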
Going its own way on Inter-Processor Communications
Rather than go with one of the several IPC mechanisms available commercially from companies such as Enea, Polycore or QNX, or build on the preliminary work toward a hoped-for standard based on TIPC, the company has extended its own proprietary IPC, DSP/BIOS Link/Bridge, for this platform, said Mar.
The approach is similar to the IPC mechanisms that some RTOSes incorporate and that DSP vendors such as Analog Devices provide to support multicore DSP and DSP/RISC designs. TI would have preferred to use an industry-standard mechanism, he said, but developed its own because “there was not anything around when we needed it.”
Despite efforts within Eclipse.org related to TIPC and the new Multicore Forum, he does not foresee TI replacing its proprietary mechanism until something adequate to the needs of new multicore designs emerges. “We hate to reinvent the wheel when there is a standard spec around that could do the job,” he said. “But nothing suitable is available yet.”
The IPC, said Fowler, serves two purposes: it coordinates the programming tasks involved in coding a multicore design, and it is the framework on which the company's software designers built the multimedia code abstraction layer.
“The BiosLink allows the programmer to work with a common set of function calls whether or not he needs to code the RISC engine or the DSP,” he said. “Originally designed to provide a common programming interface in a multichip design that included a TI DSP chip and a companion RISC processor, the DSP BiosLink/Bridge has been modified and extended to work more effectively in a multicore design.”
Providing a transparent link between the DSP and the RISC core, it allows a developer to program either core through the programming model of just one of them. “The benefit of that is that a developer can build an application on the ARM core using standard tools and OSes, and program either control or DSP operations without knowledge of the DSP even being there,” said Fowler. “It allows a developer to use a standard RTOS, write code and make function calls regardless of whether they are executed on the RISC or the DSP. The basic mechanism for communications is a simple message queue structure, nothing fancy. But it does the job.”
Rather than adopt a more complex IPC structure, TI developers stuck with this simple mechanism because it did not add to the overall latency of the system, which is critical in many multimedia-rich applications. “We could have gone with something more sophisticated, but those IPCs would have taken us into unknown territory,” he said. “We had used this mechanism on previous DSP offerings and it was well understood and fast, written in the architecture's assembly language rather than in C. Optimizing the assembly code gave us the ability to jump in and out of routines very quickly.”
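A "simple message queue structure, nothing fancy" can be as small as a single-producer, single-consumer ring buffer in shared memory. The sketch below assumes one such queue per direction (for example, ARM to DSP) and uses C11 atomics for ordering; the names `msgq_t`, `msgq_put()` and `msgq_get()` are hypothetical and do not reproduce the DSP/BIOS Link API. A real ARM/DSP implementation would also need cache maintenance on the shared buffer and an interprocessor interrupt to wake the reader, both omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MSGQ_SLOTS 8u
/* Free-running 32-bit indices wrap correctly only if the slot count divides 2^32. */
static_assert((MSGQ_SLOTS & (MSGQ_SLOTS - 1u)) == 0, "MSGQ_SLOTS must be a power of two");

/* Hypothetical message layout: a command ID plus a small argument. */
typedef struct {
    uint32_t cmd;
    uint32_t arg;
} msg_t;

/* Single-producer, single-consumer ring: each index is written by exactly one
 * side, so neither put nor get needs a lock. */
typedef struct {
    msg_t slot[MSGQ_SLOTS];
    _Atomic uint32_t head;   /* next slot to write; advanced only by the producer */
    _Atomic uint32_t tail;   /* next slot to read; advanced only by the consumer */
} msgq_t;

static void msgq_init(msgq_t *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: returns false if the queue is full. */
static bool msgq_put(msgq_t *q, msg_t m)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == MSGQ_SLOTS)
        return false;
    q->slot[head % MSGQ_SLOTS] = m;
    /* Release: the slot contents become visible before the new head does. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the queue is empty. */
static bool msgq_get(msgq_t *q, msg_t *out)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->slot[tail % MSGQ_SLOTS];
    /* Release: the slot is fully read before the producer may reuse it. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

Because the producer writes only `head` and the consumer writes only `tail`, the put and get paths stay short and lock-free, which is in the spirit of the low-latency rationale Fowler gives.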
As the foundation of the codec abstraction layer API framework, this mechanism also masks the complex hardware and software details of implementing codecs, enabling developers to interchange multimedia codecs without modifying application code.
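Interchangeable codecs usually come down to a fixed interface that every codec implements. The C sketch below models that with a table of function pointers; `codec_ops_t`, `app_process_frame()` and the two stand-in codecs are hypothetical and are not TI's codec API. In a DaVinci-style system the `process` call could be forwarded to the DSP over the IPC, but here it simply runs locally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codec interface: every codec exposes the same entry point, so
 * application code depends only on this table, never on a specific codec. */
typedef struct {
    const char *name;
    /* Process in_len bytes from in into out; return bytes written, or -1 on error. */
    int (*process)(const unsigned char *in, size_t in_len,
                   unsigned char *out, size_t out_cap);
} codec_ops_t;

/* Two stand-in "codecs" used only to show the swap; neither is a real video codec. */
static int passthrough_process(const unsigned char *in, size_t in_len,
                               unsigned char *out, size_t out_cap)
{
    if (in_len > out_cap)
        return -1;
    memcpy(out, in, in_len);
    return (int)in_len;
}

static int invert_process(const unsigned char *in, size_t in_len,
                          unsigned char *out, size_t out_cap)
{
    if (in_len > out_cap)
        return -1;
    for (size_t i = 0; i < in_len; i++)
        out[i] = (unsigned char)(255 - in[i]);
    return (int)in_len;
}

static const codec_ops_t passthrough_codec = { "passthrough", passthrough_process };
static const codec_ops_t invert_codec      = { "invert",      invert_process };

/* Application code: identical no matter which codec is plugged in. */
static int app_process_frame(const codec_ops_t *codec,
                             const unsigned char *frame, size_t len,
                             unsigned char *out, size_t cap)
{
    assert(codec != NULL && codec->process != NULL);
    return codec->process(frame, len, out, cap);
}
```

Switching from `passthrough_codec` to `invert_codec` changes only the pointer passed to `app_process_frame()`; the application logic itself is untouched.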
“Our aim is to bring as much hardware capability into the devices as possible and at the same time create an environment in which adding digital video to an application becomes as simple as writing to an API,” said Mar. “As important as hardware capabilities are, they do you no good if it takes forever to get a design to market. We think we have come up with the answer.”