Programming languages for multicore systems

Editor’s Note: In this article on programming languages for multicore software development, excerpted from Real world multicore embedded systems, Gitu Jain of Synopsys compares and contrasts the use of C, assembly, C++, Java, Python and Ada.

When writing a software application for an embedded system, the choice of programming language must produce not only an application that executes correctly, but one that does so under the resource and timing constraints imposed by the device on which it runs. This device can be limited in terms of memory, battery power, data transfer bandwidth, or input/output capabilities such as a keyboard or display screen. It can have safety or portability considerations. The application may be limited by the development environment as well.

An embedded system may come with an operating system as elaborate as that of a regular desktop, or it may not have an operating system at all. If the device has no operating system, code must be written to deal with all the low-level details of the device, usually in assembly language. Most embedded systems today come with an operating system that has limited or reduced functionality compared to a desktop or laptop. If an operating system and a suitable development environment are available on the embedded system, you can use a mid- or high-level language such as C or C++. For developing web applications, Java or Python is suitable. For real-time safety-critical applications such as air traffic control systems, consider Ada. A hybrid of two or more languages can also be used: a high-level language for most of the complex code and assembly language for timing-critical portions and for instructions not supported by the high-level language.

In this article, we will look at the most popular programming languages used for development of multicore embedded systems today. The languages will be presented in order of popularity. The features of the languages that support embedded programming, as well as those that support multi-processing or multi-threading, will be illustrated with suitable example code throughout the sections.

C-language development
The two most common programming languages used in embedded systems are C and assembly language. Assembly language lets programmers squeeze out the maximum performance from their applications in terms of speed and memory. But recent advancements in compiler technology for languages such as C have enabled compilers to generate code that is comparable to hand-written assembly language code. There is a distinct advantage in using a mid-level programming language such as C in terms of ease of development and maintenance, shorter debug cycles, testability, and portability.

C has emerged as the language of choice for many embedded system programmers because of its ability to access, modify, or update the hardware directly through language features such as pointers (Figure 1) and bit manipulation (Figure 2).

Figure 1: Pointers

Figure 2: Bit manipulation

C has the ability to declare members of a data structure at the bit level, as shown in Figure 3. Members of a data structure declared in this manner, such as flagA and flagB, can be used and addressed in exactly the same manner as other members of the data structure, such as varA and varB.

Figure 3: Bit fields

The C language provides a limited set of language features. An embedded device may need certain operations that the language does not support directly (for example, bit-wise rotation, for which C has no operator), which you may need to program in assembly. You can do that in the form of in-line assembly embedded in the C program (see Figure 8 in the Assembly section).

Dynamic memory allocation is a feature of C and other high-level programming languages in which a program determines at run-time how much memory it needs to store a variable and obtains that memory through a library call such as malloc(). The memory allocated is placed in the heap. Many embedded systems have limited heap or no heap at all, in which case it may be necessary to avoid the dynamic memory management feature of C and allocate memory statically in your program. For example, you can change dynamic allocation to static allocation for a linked list as shown in Figure 4. The program prints ‘0 1 2’.

Figure 4: Replace dynamic memory allocation by static allocation

Another programming trick for embedded systems is to replace recursion by iteration, as recursion can be inefficient in terms of space and time: every recursive call adds the cost of a function call and a new stack frame. See Figure 5 to see how recursion can be replaced with iteration in some cases.

Figure 5: Recursion vs. iteration

Another rule of thumb when programming for embedded systems is to declare a variable or function argument as const if it is not going to be modified. Constant data can then be stored in ROM (read-only memory) rather than RAM (random-access memory), ROM being cheaper and more plentiful in embedded systems; see Figure 6.

Figure 6: Use of const

Certain compiler optimizations of memory reads and writes, such as keeping a value in a register or removing apparently redundant accesses, do not work if the embedded device needs to communicate with I/O peripherals. Use the volatile keyword in such cases. For example, in Figure 7, suppose the location pointed to by cvar is not used anywhere other than the two lines where it is set to 1 and 2 in the function ControlFunc(). The lines *cvar = 1; and *cvar = 2; may be optimized away by the “smart” compiler because it thinks those values are never used and can be removed.

Figure 7: Use of volatile

But suppose this location was the control line to an external I/O device, where writing 1 to the memory location pointed to by cvar told the device to start some operation and writing 2 told it to stop. Optimizing away these two lines would cause the operation to be completely lost. This can be prevented by using the volatile keyword as shown in Figure 7. Note, however, that the volatile keyword may also turn off other compiler optimizations such as SIMD, loop unrolling, parallelizing and pipelining. Use it with care for embedded multicore processors.

Multi-threading support in C. C11 is the C language standard published in 2011, and it adds multi-threading support directly to the language in the form of a library, <threads.h>. It defines macros, declares types, enumeration constants, and functions that support multiple threads of execution. For more information, refer to the published standard [1].

If you are using an older version of C, you can add multi-threading to your programs by using a standard threading library such as POSIX Threads or Win32 Threads or, from C++ code, Boost Threads (for portability).

Assembly language
Assembly language enables programmers to optimize their applications in terms of speed and memory requirements, which is desirable in embedded systems. Most microcontrollers used to be programmed in assembly language, and it is still used by embedded system programmers today as a standalone language or in conjunction with C. Even though assembly is not portable (it is very architecture specific) and is hard to code, test, debug and maintain, there are several cases in which knowledge and use of assembly comes in handy:

  • In some low-cost embedded systems, such as small microcontrollers, no C compiler is available and assembly is the only programming interface available to you.
  • Situations where the high-level language compiler does not provide support for all the hardware features. You will need to write the functionality in assembly, either in-line or as separate functions. For example, in Figure 8, you can do bit-wise rotation using assembly, an operation for which C has no operator.

Figure 8: Rotating bits example in assembly

  • Situations where you need to squeeze every last bit of real-time performance out of the device, for example, when writing digital signal processing (DSP) applications.
  • For writing device drivers where the timing needs to be strictly controlled.
  • Programmers frequently use the assembly output of the C compiler to debug, hand-edit, and hand-optimize the assembly code to maximize the performance in ways the compiler cannot. This lets them go down to architectural details of their code and count instructions to obtain accurate timing characteristics of their application and look at the execution of their code at the machine instruction level.

When none of the above holds true, you should use a higher-level language such as C, as the benefits of using it far outweigh any performance penalties you may incur, which may not be as heavy as you think given the advances in compiler technologies today.

Writing assembly code for embedded systems usually consists of either inline assembly, where short assembly routines are embedded directly in C or C++ code and compiled using a common C compiler, or linked assembly, where all the assembly routines are isolated in a “.asm” file, assembled using a separate assembler, and linked with other object files, possibly from C/C++ code.

This ability to mix C and assembly can help you write programs where you can do most of your programming in C, and write only the performance or memory-critical portions in assembly if needed.

Multi-threading and assembly. If you need to create threads from assembly code, you should ideally call the C/C++ thread routines supported on your platform to do so. If you are not using a language such as C/C++, then you can create threads using system calls. As far as assembly is concerned, there is no difference between one thread and multiple threads; each has its own register set. The operating system takes care of the scheduling of threads.

C++ for embedded multicore systems
C++ is a high-level object-oriented language that offers embedded system programmers certain features not found in the C programming language. However, caution must be exercised when using these features as they sometimes come at a price. C++ programs can run more slowly and produce larger executable code than C. Embedded system programmers usually use C++ in a reduced form since it has several features that are bulky, inefficient, or inappropriate for use in the resource-constrained environments normally found on such devices.

Class definition. The definition of a class includes a list of public, private, and protected data members and functions. The C++ compiler uses the public and private keywords to determine which method calls and data accesses are allowed and disallowed at compile time, so there is no run-time penalty. Good object-oriented use of classes can lead to modular, well-designed applications.

Constructors and destructors. A constructor is called every time an object of a class type is created. A destructor is called every time the object goes out of scope. There is a slight penalty associated with the calls, but it eliminates an entire class of C programming errors having to do with uninitialized data structures and memory leaks. This feature also hides awkward initialization sequences that are associated with complex classes. In Figure 9, String is a class declaration with two constructors and a destructor.

Figure 9: String class

Function and operator overloading. With function overloading, functions having the same names but different parameters are each assigned unique names during the compilation process. The compiler alters the function name each time it appears in your program, and the linker matches them up appropriately. This does not affect performance (Figure 10).

Figure 10: Function overloading

Similarly, with operator overloading, whenever the compiler sees an operator declaration such as the ‘+’ operator in the example in Figure 11, it simply replaces it with the appropriate function call.

Figure 11: Operator overloading

Virtual functions (Polymorphism). C++ would not be a true object-oriented language without virtual functions and polymorphism. Polymorphism is the characteristic of being able to assign a different meaning or usage to something in different contexts; specifically, it allows an entity such as a variable, a function, or an object to have more than one form. In C++, virtual functions are declared in base classes with a common interface, and are overridden in derived classes with different implementations.

Objects with virtual functions have an additional pointer called a vptr (virtual table pointer). The vptr points to a virtual table of pointers called a vtbl. Each pointer points to a virtual function in the class. When a virtual function is invoked on an object, the actual function called at run-time is determined by following the object’s vptr to its vtbl and looking up the appropriate function pointer.

For example, in Figure 12, the call to poly[0]->area() will access the vtbl of the Rectangle class, as poly[0] points to an object of type Rectangle, and call the appropriate virtual function area() defined in that class. Virtual functions have a reasonable cost/benefit ratio for large classes since they need only one additional memory lookup before being called.

Figure 12: Polymorphism and virtual functions

In addition to these, the techniques previously outlined for C apply equally well to C++. These include pointers (Figure 1) and bit manipulation (Figure 2) to access hardware directly; in-line assembly for hardware features not supported by the language (Figure 8); static instead of dynamic memory allocation (Figure 4); replacing recursion with iteration (Figure 5); and const (Figure 6) or volatile (Figure 7) declarations where appropriate.

Features of C++ that do not work well
The features of C++ that are too expensive for embedded systems are generally the ones that cause a dramatic increase in run-time or have a negative impact on code size.

Templates. In C++, templates allow a single class to handle several different data types. For example, you can use function templates to apply the same algorithm to different data types. Another good use is for container classes such as vectors, lists, or queues (Figure 13). Templates are implemented at compile time by generating a separate copy of the code for each data type substituted for the template type. It is not hard to see how this can lead to code size explosion, which could overflow the memory of a smaller embedded system.

You can use templates if your compiler does a good job of compiling them efficiently, as they reduce the effort of coding, testing, and debugging for different data types to a single set of code.

Figure 13: Templates

Exceptions. In C++, exception handling separates the error handling code from the code written to handle the tasks of the program. Doing so makes reading and writing the code easier. Exception handling propagates the exceptions up the call stack. An exception is thrown at the place where some error or abnormal condition is detected. This will cause the normal program flow to be aborted. In handled exceptions, execution of the program will resume at a designated block of code, called a catch block (Figure 14). Exception handling in C++ is not efficient in terms of both space and run-time, so it’s best to avoid using this feature in embedded systems.

Figure 14: Exceptions

Run-time type identification. Run-time type identification (RTTI) lets you find the exact type of an object when you have only a pointer or reference to the base type. The dynamic_cast<> operation and typeid operator in C++ are part of RTTI. With dynamic_cast<> the program converts the base class pointer to a derived class pointer and allows the derived class members to be called. The typeid operator is used to determine the class of an object at run-time. There is a space and run-time cost for using this feature, and it is recommended that, for embedded systems, you disable RTTI for the sake of more efficient code.

Also be aware that, although C++ supports bit fields in data structures just as C does (Figure 3), the layout and alignment of class members are decided by the compiler, so memory alignment can be an issue and can result in an increase in the footprint of your data. Locality of reference in the data cache will also be affected, as some padding memory in cache is never referenced after being fetched.

Multithreading support in C++11. The new C++11 language standard [1] was published in 2011, and has added multi-threading support directly into the language in the form of a <thread> library. It provides functions for managing threads, mutual exclusion (mutex) management, generic locking algorithms, call once functions, and condition variables. For more information, you can refer to the published standard at [2].

If you are using an older version of C++, you can add multi-threading to your programs by using a standard threading library such as POSIX Threads, Win32 Threads, or Boost Threads (for portability).

Java
Java is a high-level language with which you can write programs that can execute on a variety of platforms. Java was designed to produce code that is simpler to write and easier to maintain than C or C++.

Java applications are typically compiled into bytecode (files with a .class extension) that can run on any computer architecture with the help of a Java interpreter and run-time environment called a Java Virtual Machine (JVM). Bytecode can also be converted directly into machine language instructions by a just-in-time compiler (JIT).

Java is one of the most popular programming languages in use today, and has become popular in high-end embedded systems such as smartphones, PDAs, and gaming consoles. It is well suited for web applications.

Java is an object-oriented language with features very similar to C++, such as class structure, polymorphism, and inheritance. There are some differences and improvements as outlined below.

  • Java does not support C/C++ style pointers; it uses references instead. This prevents errors caused by using pointers to trick the compiler into storing any type of data at an address.
  • It provides a technique known as “native methods”, where C/C++ or assembly code can be called directly from Java to manipulate the hardware registers and memory directly using pointers. This can sometimes be useful in embedded systems.
  • It has automatic garbage collection that simplifies dynamic memory management and helps avoid memory leaks.
  • Multiple class inheritance in C++ is replaced with interfaces in Java.
  • It has support for automatic bounds-checking that prevents writing or reading past the end of an array.
  • It has a fixed size for primitive data types; for example, an int in Java is always 32 bits, unlike C or C++, where its size depends on the compiler and target (commonly 16 or 32 bits).
  • All test conditions must return either true or false. For example, the error if (x = 3), which should correctly be if (x == 3), will be detected at compile-time, while it would be allowed by a C or C++ compiler as unorthodox but acceptable.
  • It has built-in support for strings and string manipulation that allows statements like “Hello” + “World”.
  • It has built-in multi-threading support that makes applications portable by providing consistent thread and synchronization APIs across all operating systems.
  • It has many useful standardized libraries for graphics, networking, math, containers, internationalization, and other specific domains.

Multi-threading support in Java. Java defines two ways in which you can create a new thread of execution: you can provide a Runnable object (Figure 15), or you can extend the Thread class (Figure 16).

Figure 15: Implement Runnable interface.

Figure 16: Extend class Thread

Java provides thread communication and synchronization mechanisms through the use of monitors. Java associates a monitor with each object. The monitor enforces mutually exclusive access to synchronized methods defined in the object, such as shown in Figure 17. When a thread calls a synchronized method, the JVM checks the monitor for that object, and, if the monitor is free, ownership is assigned to the calling thread, which proceeds with execution. If not, the calling thread has to wait until the monitor is freed by the thread currently owning it. Note that Java monitors are not like traditional critical sections, as they are associated with objects and not blocks of code. For example, two threads may execute the method Increment() concurrently if they are invoking it on different objects.

Figure 17: Synchronized method in Java

Some of the classes and containers in the java.util.concurrent package provide atomic methods that do not rely on synchronization and are still thread-safe. The following methods can be used for inter-thread communication.

  • wait(): this method tells the calling thread to give up the monitor and go to sleep until some other thread enters the same monitor and calls notify().
  • notify(): this method wakes up one thread that called wait() on the same object; if several threads are waiting, the choice of thread is not specified.
  • notifyAll(): this method wakes up all the threads that called wait() on the same object. The awakened threads then compete for the monitor, and the scheduler decides which one runs first.

See the producer-consumer example in Figure 18. The consumer will keep getting values in itemCount as long as the producer keeps putting them there. Notice the use of the synchronized keyword to control access to the shared Buffer object and the use of wait() and notify() for inter-thread communication.

Figure 18: Inter-thread communication using producer-consumer example

In Java, a semaphore is created using the Semaphore class, which is part of the java.util.concurrent package (Figure 19).

Figure 19: Use of semaphore in Java

Running Java programs requires a JVM, which can take up significant resources and slow down performance. Because of this “managed run-time”, Java is considered a costly language and is not suitable for most embedded devices which have limited resources such as memory and battery power. It is also not suitable for real-time and safety-critical applications.

For this reason, Java is only popular in high-end embedded devices such as mobile phones, where portability and the ability to browse the internet are needed. There have been attempts to tailor Java for the embedded developers’ community, known as Embedded Java, in which Java can be run in three ways:

  • Run using a JVM with a JIT compiler: JIT compilers use too much memory.
  • Run using a special-purpose JVM and core libraries: a stripped-down version.
  • Run compiled Java (instead of interpreted): best run-time behavior but not portable.

Many vendors offer run-time interpreters, environments and compilers for Embedded Java with stripped-down versions of the core Java libraries and run-time environment.

Python
Python is a general-purpose, highly flexible, high-level programming language that is gaining popularity due to its ease of use and ability to create custom code quickly. It supports multiple programming paradigms, from object-oriented styles to use as a scripting language for web applications. Python is much slower than programming languages such as C, so it should not be used for timing-critical applications. Python also requires a lot of memory or disk space to run, so it cannot be your language of choice for smaller embedded systems.

Python can be used to create applications for high-end embedded systems such as smartphones, which host many web applications. Python is used by embedded system developers to create custom prototype code quickly, which is one of the strong points of the language.

Python is very easy to learn for people of various programming backgrounds, such as Java, C or Perl. Python leads to the creation of highly readable, compact, and well-structured code. Python code, compiled and running on PCs or emulators, can be used to test applications.

Multithreading support in Python. The threading library provides high-level threading interfaces on top of the lower-level _thread module, which is based upon POSIX Threads. This module defines the following functions and objects, modeled after Java’s Thread class and Runnable interface:

  • Thread: this class represents an activity that is run in a separate thread of control. There are two ways to specify the activity: by passing a callable object to the constructor (the counterpart of Java’s Runnable), or by overriding the run() method in a subclass of the Thread class.
  • Lock: a primitive lock is the lowest synchronization primitive available in Python. It has two basic methods, acquire() and release().
  • RLock: a reentrant lock is a synchronization primitive that may be acquired multiple times by the same thread. A thread calls acquire() to lock and release() to unlock. The acquire()/release() call pairs may be nested.
  • Condition: a condition variable has acquire() and release() methods that call the corresponding methods of the associated lock. It also provides wait(), notify(), and notify_all() methods (notifyAll() is the older spelling). The wait() method releases the lock, and then blocks until it is awakened by a notify() or notify_all() call for the same condition variable in another thread. Once awakened, it re-acquires the lock and returns.
  • Semaphore: a semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some other thread calls release().
  • Event: this is one of the simplest mechanisms for communication between threads: one thread signals an event and another thread waits for it.

Ada
Ada is a high-level, object-oriented programming language that has built-in support for parallelism. In the late 1980s, the US Department of Defense (DoD) mandated the use of Ada for all its software projects. Today, Ada is still used for the majority of DoD’s projects, although the mandate has been lifted and many more languages are in use, such as C, Fortran, and C++. Ada is also used in many embedded and real-time safety-critical commercial systems such as air traffic control, railway transport, and banking systems.

Ada’s features include exception handling, concurrency (tasks), modularization (packages), hierarchical namespaces, object-oriented programming, and generic templates. Unlike other programming languages such as C++ or Java, dynamic allocation for data such as arrays and records is not performed unless explicitly requested by the programmer; this is a useful characteristic for embedded systems.

Ada provides a large number of compile-time and run-time checks that help produce high-quality, maintainable software applications. For example, you can specify a range of values for a scalar variable in Ada, where an attempt to assign an out-of-range value will be detected.

Ada compilers also help detect potential deadlocks, a software bug characteristic of multi-threaded applications. Because of all these run-time checks the performance of Ada applications can suffer, making this a poor choice for performance-critical applications (for which assembly or C might be better suited). The performance can, however, be improved by turning off some of these checks.

Concurrency support in Ada. The unit of concurrency in Ada is the task, and tasks generally interact with each other through encapsulated data (protected objects) or via direct communication (rendezvous). A good comparison of the real-time features of Ada and Java is presented in the paper in [3]. Section 7 compares the mutual exclusion mechanisms of Ada and Java and section 8 compares the task/thread synchronization and communication controls provided by the two languages. Ada has predictable and portable thread scheduling that works well for real-time applications.

  • Task is the basic unit of concurrency in Ada; it is equivalent to a thread. It has a declaration and a body. Tasks begin as soon as they are created. They can pass messages and can share memory. Two or more tasks communicate by sending messages using entry and accept methods.
  • Rendezvous represents synchronization via message passing where sending and receiving tasks have to wait.
  • Protected Objects encapsulate data and operations. The operations then have automatic mutual exclusion. Guards can be placed on these operations for conditional synchronization.

Summary
There are a wide variety of programming languages available to developers today. Not all of them are suitable for developing applications for embedded systems and, in particular, for multicore embedded systems. There may be resource, run-time, and safety constraints that dictate the choice of language. In addition, you have to take into account the ease of development and the familiarity of the programmer with the language of choice. If you are developing for a multicore embedded system, you have to choose a language that supports efficient development for these systems, especially in terms of memory synchronization and debugging issues.

In this article we presented the top choices for programming languages for multicore embedded systems with suitable examples illustrating how you can apply them to your needs. Some languages were designed with inherent concurrency built into the language, such as Ada and Java. Others have had concurrency support added late, such as C and C++, even though these languages have been used to do multi-threaded programming for several decades using thread libraries.

Some offer function-level locking, such as C or C++, while others offer object-level locking, as in Java. Some offer reliable scheduling of threads, such as Ada. In the end, all the mid- to high-level languages presented here provide support for developing applications on multicore embedded systems.

You will need to consider all these factors carefully when choosing a programming language. In addition, the language features that affect performance, memory requirements, safety, the ability to access and modify hardware directly, ease of use, and popularity will also be critical when making a decision.

[1] International Standard ISO/IEC 9899:201x, Programming languages – C, 2011.
[2] C++11 Thread support library, 2011.
[3] B.M. Brosgol, A comparison of the concurrency and real-time features of Ada 95 and Java. Ada UK ’98 Conference, October 1998.

Dr. Gitu Jain is a Software Engineer at Synopsys and also teaches at the UC Santa Cruz Extension in Silicon Valley. She has 20 years of experience in software R&D at semiconductor companies, with expertise in parallel computing and EDA algorithm design. She has a Ph.D. in Electrical and Computer Engineering from the University of Iowa.

Used with permission from Newnes, an imprint of Elsevier, Copyright 2013, this article was excerpted from Real world multicore embedded systems, by Bryon Moyer.
