Specialized compilers address ADAS needs
All major OEMs and software suppliers of the automotive industry are committed to advanced driver assistance systems (ADAS). A close look, though, raises questions on the demands ADAS applications place on compilers and toolsets. There are differences between traditional automotive applications and ADAS, and current compliers need some adaptations to better address ADAS needs.
ADAS applications as a challenge
To better support the task of driving autonomously, vehicles need to be much more aware of their surroundings. Several new sensors (Radar, Lidar, cameras, etc.) can be used to detect road markings, other vehicles, obstacles, and other relevant environmental data with high resolution (Fig. 1). In the past, it was common practice for automotive systems to process only individual measurements from specific actuators (steering angle, pedal positions, various engine sensors, etc.) in real time.
Figure 1: Sensor-based monitoring areas of different ADAS applications
As is common with physical measurements, however, the environmental data acquired for ADAS applications are subject to noise (Fig. 2) and measurement errors. Therefore, the data require electronic post-processing by hardware and software before they can be used for their ultimate purpose, i.e., to automatically offload decisions from the driver. This post-processing is not always dealing with individual measurements like before, however. Quite frequently, data from different sources get consolidated (sensor fusion) for reduced error susceptibility.
In order for the ADAS to automatically make decisions on the driver's behalf, it must process a tremendous amount of data in real time. Further, that data is complex. Traditionally, just isolated sensor data involving only some integer or fixed-point numbers with 1 - 5 kbps data rates needed to be processed. Today, data are often provided as floating-point numbers (floats/doubles) at high rates. Camera images, for instance, provide approximately 340 kbps and radar data, around 1.5 Mbps.
Figure 2: The noisy signal of an environment sensor and its filtered result
Obviously, ADAS applications require a lot more processing power than traditional automotive applications. But currently it is very hard to predict which high-performance hardware architectures will prevail for these kinds of applications. So, because ADAS applications must be produced in a reusable and cost-efficient manner, it is clear that compiler support will be required for all architectures. This requirement mandates the use of abstract, portable design methodologies (e.g. C++11/14), model-based design and additional technologies including parallel programming (e.g. OpenCL, Pthreads). Furthermore, highly optimized, certified libraries will be required to implement standard operations efficiently, safely, and with maximum hardware independence. Because ADAS applications intervene with the driving process, these applications and the hardware used to execute them must also adhere to relevant safety standards (ASIL-B or higher; ISO 26262).
Finding a suitable hardware architecture
For companies developing ADAS applications, the fact that no specific hardware architecture has prevailed until now creates a risk. In general, major hardware accelerators -- including the Nvidia GPU derivatives (Drive PX) -- provide adequate computational power in the Teraflops range for the data-parallel parts of ADAS applications. However, apart from lacking sufficient safety features, these devices are rather cost-intensive regarding their power consumption and purchasing price. On the other hand, typical architectures for safety-critical applications up to ASIL-D (incl. AURIX or RH850) have not yet utilized some hardware based opportunities to achieve higher data rates because these will be hard to certify according to ASIL-D.
OEMs or large suppliers of ADAS systems therefore are in danger of selecting an architecture that may fail in the market because it is too large, too expensive, or cannot meet the safety requirements. On the other hand, there is a risk in selecting an architecture that fully supports safety-critical applications that it is too small for the more demanding computations. During the development process, it might turn out that the envisioned application cannot be implemented for efficiency reasons.
Thus, the requirements of ADAS projects are quite complex. On the one hand, it is mandatory that developers create very efficient, target specific code, meet all safety goals, and minimize the risks outlined above. On the other hand, portable and high-level design methods are necessary to enable cost-effective application development. These high level design requirements mandate modifications of the embedded compilers that were originally designed for traditional embedded applications.
Efficiency of code structure
One necessary, new compiler feature is the need to support the typical code structures of ADAS applications in order to create highly-efficient code for this kind of application. The data structures used and the operations they are subjected to in an ADAS application differ fundamentally from those found in classical applications, and the code used for sensor fusion and analyzing sensor data is commonly generated using model-based tools like BASELABS. For instance, the speed-relevant part of ADAS code often involves arrays (vectors and matrices) of floats and doubles, which are subjected to linear algebraic operations such as matrix multiplication, inversion, SVD (singular value decomposition), and the like. Using these operations, the system combines the sensor data arrays to compute an abstract representation of the environment which is then used as the basis for decision making (e.g., for detecting objects in an image or for tracking and assigning the spatial position of an object).
Highly optimized libraries
This heavy use of arrays and floating-point data means that entirely new optimizations are required for compilers in order to provide efficient results. For instance, the most commonly used linear algebraic functions are typically provided by libraries that are highly optimized for the specific target architecture. All computations not included in the libraries, therefore, must be well optimized by the compiler in order to prevent these computations from becoming the bottleneck.
Many of the performance-critical computations in ADAS applications are based on a set of standard linear algebraic operations. The overhead resulting from porting such ADAS applications to different target architectures can be reduced dramatically if a standard interface is used for these standard operations. Libraries supporting a standard interface and which are highly optimized for the specific target architecture are available for the most relevant hardware platforms, including LAPACK from Tasking for embedded, cuBLAS from Nvidia, and Intel's Integrated Performance Primitives.
Quite often, these libraries are as much as an order of magnitude faster than open-source offerings or in-house implementations. Consequently, a standard interface-based application can immediately achieve excellent efficiency even on new target platforms without the need for designers themselves to optimize and test the underlying, performance-critical computations in the target specific library. Note, however, that not all libraries are adequately certified for safety-critical applications or suitable for embedded systems.
An additional new capability required from embedded compilers is the support of current languages like C++11/C++14. The goal in ADAS design is to improve the code’s reusability and to achieve more with fewer lines of code, without giving up the efficiency provided by closeness to the hardware. C++ classes and inheritance are time-tested methods to write such code on a higher, more abstract level.
C++11 and later variants support these methods but offer significant advantages over the older C99 language standard. Furthermore, C++11 (and C11) finally provide the opportunity to write portable, parallel programs. The computational overhead, considering all response-time requirements, of many ADAS applications often exceeds the capabilities of sequential processing implemented on a single core. Parallel and multicore processing, therefore, is a common ADAS system requirement.
Older standards like C99 do not acknowledge parallelism, so programmers using those languages must have excellent hardware and compiler knowledge to correctly write a parallel program. Programmers must, for instance, exclude specific data ranges and code sections from parallel accesses in order to ensure that no data updates are lost or incorrectly read during access. Programmers must also insert barriers (mutexes) into the code to keep critical sections from being executed by more than one core at a time.
However, the barrier insertion technique only works if the compiler is aware of these barriers. Without such awareness, the compiler may move code sections out of the protected parts during compiler or hardware optimizations. Before the advent of C11/C++11, there was no uniform way to notify the compiler of such barriers. So, programmers had to disable important optimizations altogether, resulting in significant efficiency degradations, or they misused attributes like ‘volatile’ to restrict compiler optimizations. It has now become generally accepted that using the ‘volatile’ attribute is not sufficient for writing correct and portable parallel code.
Mutexes and the like are now part of the C++ standard, however, so a C++11 compiler is aware of all barriers. The compiler can therefore prevent the application of optimizations when necessary without incurring any unnecessary speed penalties. Instead of using optimizations, programmers use the ‘atomic’ attribute introduced by C11/C++11. With ‘atomic’, the compiler generates code that addresses the hardware so as to expose the expected behavior and generates a minimum performance overhead. Programmers thus can focus their efforts on their main task, i.e. the code’s functionality, instead of trying to generate specific code patterns via unsuitable means like ‘volatile’ and optimization inhibition.
Unfortunately, it is generally not possible to detect all the program sections and data that have incorrect protection from parallel access. Sometimes, programs with incorrect protection will not generate any compiler errors and thus appear to operate correctly. Yet these programs can spontaneously produce false results as a result of subtle timing issues. These errors generally appear only after very long testing times. Further, they are difficult to reproduce because they depend on relative execution times and time-related disturbances within the system.
Thus, it is not quite trivial to write correct parallel code even when using C11/C++11. Self-written parallel code also bears the risk of being correct but only marginally faster (or even slower) than the easier-to-maintain functionally equivalent sequential code. Fortunately, libraries like EMB² and LAPACK can be used with relatively little risk, as they were written by experts in this field. As an additional advantage, these libraries ensure a relatively large speed increase due to their parallelism and optimization.
Hardware accelerator support
New compiler capabilities are also required to address increasingly heterogeneous hardware architectures featuring widely differing cores (ARM, AURIX, RH850, Nvidia, etc.) and supplemental accelerators (e.g., FFT in AURIX, SIMD in ARM and Nvidia). Several approaches are possible. One such approach is support for intrinsics, the most straightforward method to support hardware accelerators. These constructs can be used to address special hardware instructions from C/C++.
Continue reading on Embedded's sister site, EDN: "Optimizing compilers for ADAS applications."