The key to realizing full multicore design functionality

In today’s increasingly complex and interconnected world, system-on-a-chip (SoC) performance requirements are influenced by existing, evolving and emerging applications.

Continued evolution of the functionality required to meet performance and cost targets makes this a great time for designers to undertake a deep exploration of the architectural underpinnings of the multicore solutions they are considering. Ideally, a multicore SoC architecture includes the following characteristics:

1)  Supports a mix of execution engine (core) styles, including digital signal processing (DSP), vector signal processing (VSP) and reduced instruction set computing (RISC);

2)  Provides full multicore entitlement, using all the capabilities of the device for the intended application to enable industry-leading performance;

3) Powers a family of devices to enable reuse;

4) Incorporates a software ecosystem that eases programming burdens and shortens development time.

This article reviews the architectural elements a SoC needs in order to deliver these ideal characteristics in devices targeted at advanced communications infrastructure applications such as media servers and wireless baseband infrastructure.

Multicore, multilayer SoC architecture
The basic approach of a SoC is to integrate more and more functionality into a single device until it performs nearly all, or all, of the functions a targeted application requires. The SoC is embodied in the silicon device, and the overall solution often incorporates substantial software.

Many SoC designs pair DSP cores with RISC cores to target specific application processing needs such as processing voice and transcoding in media gateways or radio channel and transport network processing in wireless infrastructure.

Traditionally, performance improvements have come through process node migration and increased clock frequencies. In today’s small-geometry process nodes, however, both come at an increasing cost in system power, so the trade-off analysis is more complex.

An alternate approach has emerged as the preferred choice for embedded applications: multiple processing cores provide the desired performance lift at lower clock rates and lower power consumption while still allowing all system parameters to be met. In addition, application-specific accelerators and coprocessors are incorporated to further increase capacity and reduce system power.

In this scenario, it is important to provide parallel access to processing resources so that the full entitlement of the device can be realized. The SoC architecture must therefore provide enough interconnect capacity within the chip infrastructure to yield full multicore entitlement.

The most straightforward approach is a large crosspoint matrix, but it carries power and cost penalties because at any point in time a large portion of the matrix is powered but not in use. A more sophisticated network-on-chip approach provides local capacity for closely associated processing elements and a common backbone to interconnect these localized functions.

Advancing Moore’s Law
The move to more advanced process nodes has been a key driver in keeping up with Moore’s Law. The migration to the 40 nanometer (nm) process provided an impressive performance boost and the move to 28 nm will do the same, but today’s applications require more.

Today, the largest benefit derived from a new process node is the possibility of integrating more of an application’s functions into a single device; hence it is a key enabler for SoCs. The first and most obvious tactic to leverage this integration potential for boosting performance is to add programmable cores.

Multicore devices are characterized as either homogeneous, meaning all the processing cores are the same, or heterogeneous, where there is a mix of core types. In practice, nearly all applications require a mix of processing capabilities, including signal processing and control code, and DSP cores and ARM RISC cores are ideal for this processing mix.

At Texas Instruments, the latest DSP cores support both fixed- and floating-point operations and perform VSP at high clock rates, simplifying algorithm development and deployment. Various ARM cores are available, allowing the SoC provider to optimize the RISC core selection to match processing requirements, power dissipation and process node.

From an architectural perspective, it is important to support homogeneous core implementations. Homogeneous devices (all ARM or all DSP) can be created from a heterogeneous architecture, but the reverse is rarely true without severe performance degradation. Figure 1 below illustrates the KeyStone architecture that TI engineers developed, an example of a heterogeneous multicore architecture.

Figure 1. TI’s KeyStone multicore architecture

Multicore Navigator 
The architecture is made up of functional elements packaged in a way to facilitate application flexibility and scalability.

A flexible architecture is designed with the ability to easily add or remove elements as an application dictates. Applications such as wireless base stations and radar array processing have very similar processing and I/O requirements but quite different acceleration and coprocessing requirements.

Layer 1 PHY accelerators are mandatory for wireless base stations but are not needed for radar array processing. While it is unlikely that the same organization develops both radar and base station products, both benefit from the cost savings and volume economies that accrue to the SoC developer.

When a range of products must be covered, scalability in the SoC architecture, that is, the ability to add or remove processing elements, is very important to address varying requirements. Today, wireless base stations range from small cell femto products to large-scale multicell macro base stations. Similarly, a radar manufacturer may need both small and large-scale devices.

Simplifying the software ecosystem
Multicore SoC developers often deliver and pre-integrate much of the non-differentiated software: the fundamental software that is functionally the same regardless of the end equipment manufacturer. This ranges from device drivers and real-time operating system (RTOS) ports to key standardized algorithms for target applications.

When properly implemented, this software provides full silicon entitlement to application developers and is production ready. Beyond this, multicore SoC vendors create a development ecosystem to help with application development, testing and board design.

Figure 2. An example of a comprehensive multicore toolkit
For developers, multicore development becomes challenging when application code needs to scale from small to large devices. In that case, both hardware and software need to scale across a range of devices where core counts and hardware accelerators may vary widely from one device in a family to another.

Fortunately, given the complexity of the software and the varying processing elements in a multicore SoC, hardware-facilitated software development has become a reality. Innovative hardware designed to simplify multicore software development is now embedded in the latest generation of multicore devices, enabling software to scale automatically across a variety of devices derived from a common architecture.

Utilizing hardware-assisted software is possible if software is written as small tasks rather than monolithic functions and the hardware is designed to manage tasks autonomously. One innovative approach to this challenge uses large-scale hardware queues with functional descriptors that identify the processing resources needed generically (for example, DSP or FFT functionality) rather than explicitly (for example, DSP core 2).

The task and data are queued, and the hardware autonomously manages the processing from there. Thus, moving from a two-core to an eight-core SoC requires no software changes; the queuing and descriptor system automatically manages the transition.

Flexible and broad approach to software
Existing and emerging applications will require different uses for the processing elements and cores. Some applications may use each core independently, and others may want to use one processing element as a master while other processing elements are designated slaves. In a third variant, all processing elements could be peers, in which tasks are dynamically allocated. Some applications might want to use the device as a high-performance compute (HPC) engine enabled through such standards as OpenCL and OpenMP.

To address such diverse application requirements, designers will require a development toolkit that encompasses and simplifies the enabler software, development tools and an operating system for a variety of applications. The toolkit should be developed in lockstep with silicon advances to ensure optimized access to the processing cores, accelerators and the multilayer connectivity planes utilized by application developers.

The more progressive SoC developers have embraced Eclipse-based tools that enable their customers’ individual developers to tailor their development environment to their personal preferences. Eclipse-based tools provide the best characteristics of an open development platform with the optimizations that are best enabled by the silicon developer.

The state of multicore processors is evolving at a rapid pace. Leading multicore vendors addressing the infrastructure markets are now delivering second and third generation products that incorporate lessons learned from the pioneering products. In addition, families of products addressing a range of capacities and specialized applications, rather than point solutions, are now available.

These newer designs are based on a common architecture yielding development cost and development time savings for equipment manufacturers. Multicore architectures have taken their rightful place as the lead differentiator when evaluating competing offerings. The power of multicore has been unleashed on developers enabling them to develop new and exciting products for today, tomorrow and for years to come.

Tom Flanagan is the director of technical strategy for the multicore and wireless base station infrastructure business unit at Texas Instruments. His 28 years of industry experience helps TI determine how it can continue innovating and delivering new DSP-based products and technologies to the market. Prior to his current role at TI, Flanagan served as the director of broadband strategy for TI’s Broadband Communications group, where he identified market trends and provided the vision and strategic direction for TI’s broadband portfolio, including cable, DSL and WLAN products.

Sanjay Bhal is a strategic marketing manager for the multicore and media infrastructure business unit at Texas Instruments. In this role, he works on multicore software product management and marketing efforts. Bhal has more than 11 years of experience in the embedded processing industry.

John Warner is the director for the multicore and media infrastructure business unit at Texas Instruments. In this role, he manages the associated product management, marketing and business development efforts. Warner has more than 20 years of experience in the telecommunications industry and helps set the strategic direction of the networking infrastructure group.
