Designing a low-cost, low-power multicore ARM-based AV player
Low power system design has become a mandatory requirement not only for hand-held mobile devices but also for automotive infotainment systems. Furthermore, automotive systems need to be able to carry the user experience in the home and office over to the car. This increasing demand for high computation, high quality and low power solutions has forced embedded computing to turn to multicore systems. Consequently, embedded solutions based on multicore platforms have become common for many applications such as gaming, video, and image-processing in areas such as mobile, automotive, medical and industrial applications.
The challenge is to pair a multicore system with available media frameworks that offer scalability, open-source access and portability, in order to meet the requirements of low-power, high-performance embedded applications. With the advent of powerful software- and hardware-programmable system-on-chip (SoC) FPGA devices, embedded system designers can tailor a solution's form, function and performance exactly to the customer's requirements. Such solutions are efficient and cost-effective while addressing end-customer requirements. The Xilinx Zynq SoC family is one such family of devices.
This article describes the use of a low-cost, low-density Zynq FPGA in creating a computational platform for implementing infotainment systems for passenger vehicles, such as cars, buses, trains, airplanes and ships. Other applications for this kind of platform include digital signage and information displays in private and public venues such as hotels, hospitals, gas pumps, or kiosks as well as digital picture frames for consumer markets.
In-vehicle infotainment systems face a dual requirement: matching the home or office user experience while meeting the automotive industry’s energy-efficiency requirements. In our case, the specific requirement was to build a very low-cost, low-power AV player for 720p video at 30 frames/sec, with audio-video sync, that could interface with the customer’s hardware block implemented in programmable fabric. This objective was to be achieved by developing tightly coupled multicore software with real-time hardware acceleration, with an eye on a much lower BOM, lower NRE costs, lower design risk and, most importantly, faster time-to-market.
We broke down the requirements for the Atria Logic AL-AVPLR-IPC AV player into a file reader, a de-multiplexer, an H.264 Baseline Profile HD decoder with color space converter, and an AAC-LC stereo decoder. Also included is an AV player application with a built-in GUI for basic player operations such as Play, Pause, Stop and a Fast-Forward trick mode. OS support is via Ubuntu LTS, which supports multicore operation with efficient core utilization. The AV player application is fully Linux GTK based, while the decoder itself is implemented entirely in the open-source Linux GStreamer multimedia framework. A block diagram of the player is shown in Figure 1.
Figure 1: Atria Logic AV Player block diagram (Source: Atria Logic Inc)
We zeroed in on the Z-7010, a member of the Xilinx Zynq-7000 device family, as the most suitable option for meeting all of these requirements. This ARM-powered programmable SoC combines strong CPU performance with good thermal performance. At the same time, the device provides enough programmable fabric to hold the accelerator engines that some requirements call for, ensuring sufficient performance even on this low-power platform. One more reason for choosing this device is the readily available, small-form-factor Zybo development board, which expedited our development efforts.
In addition to programmable logic for custom implementation of glue logic, RAM and DSP functions, the Zynq architecture includes a dual core ARM Cortex-A9 CPU with Neon DSP engines, a complete array of serial I/O, USB and PCIe interfaces, encryption/decryption engine, GbE and memory controllers. Low power, integrated CAN bus interfaces and extended temperature variants make the Zynq family of fully HW and SW programmable SoCs an excellent fit for low power, low cost automotive infotainment applications.
Video decoding is handled up to HD resolutions of 1280x720 at 30 frames/sec, and stereo audio decoding at 48 kHz. Audio and video are kept in sync so that lip-sync issues are avoided.
The challenge was to leave as much programmable logic as possible available for implementing other functionality, while keeping power dissipation low. This meant that the implementation needed to take full advantage of the available ARM cores and Neon DSPs, while optimizing the firmware to run as efficiently as possible. This was achieved by taking full advantage of the multi-threading and symmetric multi-processing (SMP) capabilities in Ubuntu.
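The timing behind these figures can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the player's actual sync mechanism: it assumes the common approach of treating the audio clock as the reference, and the names `video_pts`, `audio_pts`, `sync_action` and the half-frame `SYNC_TOLERANCE` are hypothetical. The 1024 samples per AAC-LC frame is a property of the codec; the frame rate and sample rate come from the text above.

```python
from fractions import Fraction

VIDEO_FPS = 30             # from the article: 720p at 30 frames/sec
AUDIO_RATE_HZ = 48_000     # from the article: 48 kHz stereo audio
AAC_FRAME_SAMPLES = 1024   # samples per channel in one AAC-LC frame

# Hypothetical tolerance: allow video to drift half a frame period from audio.
SYNC_TOLERANCE = Fraction(1, VIDEO_FPS) / 2


def video_pts(frame_index):
    """Presentation time in seconds of a video frame at a constant frame rate."""
    return Fraction(frame_index, VIDEO_FPS)


def audio_pts(aac_frame_index):
    """Presentation time in seconds of the first sample of an AAC-LC frame."""
    return Fraction(aac_frame_index * AAC_FRAME_SAMPLES, AUDIO_RATE_HZ)


def sync_action(video_time, audio_clock):
    """Decide what to do with a decoded video frame, using audio as reference."""
    drift = video_time - audio_clock
    if drift > SYNC_TOLERANCE:
        return "wait"    # video is early: hold the frame
    if drift < -SYNC_TOLERANCE:
        return "drop"    # video is late: skip the frame to catch up
    return "render"


if __name__ == "__main__":
    print(float(video_pts(1)) * 1000, "ms per video frame")
    print(float(audio_pts(1)) * 1000, "ms per AAC-LC frame")
    print(sync_action(video_pts(31), audio_pts(375)))
```

Because a video frame lasts about 33.3 ms and an AAC-LC frame about 21.3 ms, the two streams never advance in lockstep, which is why presentation timestamps rather than frame counts have to drive the comparison.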
Multi-Threaded Dual Core Player
The player runs audio and video as pipelines on separate threads for parallel execution on the two ARM Cortex-A9 cores. The load of the AAC-LC audio decoder is much smaller than that of the H.264 video decoder. Therefore, the video decoder is divided into two threads to take full advantage of both cores, which run at 667 MHz each. In Figure 2, Video Thread2 spawns a new thread, Thread4, and these two video threads are executed on two different cores.
Figure 2: Multi-threaded video and audio decode execution (Source: Atria Logic Inc)
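In GStreamer, a `queue` element starts a new streaming thread for everything downstream of it, so placing one at the head of each branch is the usual way to get audio and video running on separate threads. The sketch below only builds a pipeline description string of the kind accepted by `gst-launch-1.0`. It rests on several assumptions that are not from the article: an MP4 container (hence `qtdemux`), stock decoders `avdec_h264` and `avdec_aac` standing in for the Atria Logic decoders, automatic sinks, and a file path without spaces. The function name `build_pipeline_description` is hypothetical.

```python
def build_pipeline_description(path):
    """Return a gst-launch-1.0 style description with one queue per branch.

    Each queue marks a thread boundary: the demuxer runs on one streaming
    thread, and each branch downstream of a queue runs on its own thread.
    """
    # Assumed stand-ins for the player's H.264 decoder and color space converter.
    video_branch = ["queue", "h264parse", "avdec_h264",
                    "videoconvert", "autovideosink"]
    # Assumed stand-ins for the player's AAC-LC decoder and audio output.
    audio_branch = ["queue", "aacparse", "avdec_aac",
                    "audioconvert", "audioresample", "autoaudiosink"]
    return " ".join([
        f"filesrc location={path} ! qtdemux name=demux",
        "demux.video_0 ! " + " ! ".join(video_branch),
        "demux.audio_0 ! " + " ! ".join(audio_branch),
    ])


if __name__ == "__main__":
    print(build_pipeline_description("clip.mp4"))
```

This sketch does not show how the video decoder itself is split across two threads; that depends on the decoder implementation and is not something a pipeline description alone can express.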
The AV player GUI application program starts on a single thread, Thread1. Compressed audio data and video data are queued separately in Thread2 and Thread3, while staying within the stack overflow limits. De-multiplexing the audio and video data into these two threads decouples the processing of the sink data into separate audio and video processing threads.
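The decoupling described here follows the familiar producer-consumer pattern. The following is a minimal sketch of that pattern, not the player's code: a demux thread routes packets into two bounded queues, and one consumer thread per stream drains its own queue. The queue depth, the packet format and all function names are hypothetical, and `str.upper` merely stands in for a decoder. Python threads are used only to show the structure; they do not give the true parallelism that native threads on two cores provide.

```python
import queue
import threading

QUEUE_DEPTH = 4   # hypothetical bound on buffered packets per stream
_END = object()   # sentinel marking end of stream


def demux(packets, video_q, audio_q):
    """Route each (kind, payload) packet to its queue; put() blocks when full."""
    for kind, payload in packets:
        target = video_q if kind == "video" else audio_q
        target.put(payload)
    video_q.put(_END)
    audio_q.put(_END)


def consume(q, decode, out):
    """Drain one queue, decoding each packet until the sentinel arrives."""
    while True:
        item = q.get()
        if item is _END:
            break
        out.append(decode(item))


def run_player(packets):
    """Run one demux thread and two consumer threads; return decoded streams."""
    video_q = queue.Queue(maxsize=QUEUE_DEPTH)
    audio_q = queue.Queue(maxsize=QUEUE_DEPTH)
    video_out, audio_out = [], []
    threads = [
        threading.Thread(target=demux, args=(packets, video_q, audio_q)),
        threading.Thread(target=consume, args=(video_q, str.upper, video_out)),
        threading.Thread(target=consume, args=(audio_q, str.upper, audio_out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return video_out, audio_out


if __name__ == "__main__":
    stream = [("video", "v0"), ("audio", "a0"), ("video", "v1"), ("audio", "a1")]
    print(run_player(stream))
```

Bounding each queue keeps memory use predictable: when a consumer falls behind, the demux thread blocks instead of buffering without limit, and a slow video decoder cannot starve the audio path of packets already queued for it.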