Semiconductor Insights conducts technical investigations of alleged patent infringement, and through this work, we have gained insight into how the inter-processor communication (IPC) mechanism works on a multicore Qualcomm chipset, like the MSM 7200. This particular chipset features two ARM processors (ARM11 and ARM9), two proprietary DSP processors and a host of cellular and support hardware.
The ARM11 processor is tasked with running the PDA functionality of the handset, communicating through IPC to other cores that support communication and multimedia functionality.
In this article, we'll examine the IPC mechanism used by Google's Android software to communicate between the main ARM11 processor and the other processor cores on the MSM 7200. We'll also examine the closed-source Windows Mobile driver for a commercial cell phone which happens to use the same MSM 7200 chipset.
Android is based on the Linux kernel (the so-called "Titanux" distribution) and provides support for communications between the user-level application programs, running under Linux, on the ARM11 and the other processors.
First off, the IPC mechanism discussed here is at the lowest level--all other inter-CPU IPC mechanisms use it as the base. For example, a TCP/IP connection through the ARM11 processor to another processor ends up going through this IPC mechanism. Diagnostic messages are another example of messages that rely on this low-level IPC.
The IPC mechanism is implemented with two sides--a "client side," which faces the kernel and provides a callback-based style of interface, and a "CPU side," which provides the interface to the other CPUs. The CPU side is implemented as a shared-memory interface, with interrupts and a "doorbell" mechanism. At the highest level, to send messages from the ARM11 to another CPU, the message content is placed in a buffer in shared memory and a hardware port is tickled to indicate to the other CPU that data is available. In the reverse direction, the data is placed into shared memory by the other CPU and a hardware interrupt is triggered on the ARM11. This hardware interrupt causes the ARM11 to examine the shared memory's buffer, retrieve the message and route it to the client.
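The send direction described above can be sketched in a few lines of C. This is a minimal illustration and not the Android driver code: `shared`, `doorbell_reg` and `send_to_remote()` are hypothetical names, and ordinary variables stand in for the shared memory region and the hardware port.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for a buffer in shared memory. */
struct shared_area {
    unsigned len;
    unsigned char buf[256];
};

static struct shared_area shared;        /* stands in for shared memory   */
static volatile unsigned doorbell_reg;   /* stands in for the hardware port */

/* Copy the message into shared memory, then ring the doorbell. */
static int send_to_remote(const void *msg, unsigned len)
{
    assert(msg != NULL);
    if (len > sizeof shared.buf)
        return -1;                       /* message does not fit */
    memcpy(shared.buf, msg, len);
    shared.len = len;
    /* Make the data visible before the other CPU is notified. */
    atomic_thread_fence(memory_order_release);
    doorbell_reg = 1;                    /* "tickle" the port */
    return 0;
}
```

The ordering is the point of the sketch: the data is written first and the doorbell is rung last, so the other CPU never sees the notification before the message.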
The shared memory layout is as follows in Figure 1:
Figure 1: MSM7200 shared memory layout
The shared data structure consists of four headers: an interprocessor communications control area, 32 unsigned words of version information, information about the heap and 128 table-of-contents (TOC) entries. The four headers are followed by 64 instances of an 8,212-byte data structure, which consists of 20 bytes of header and 8,192 bytes of buffer.
The TOC entries contain information about each of the 8,212-byte data structures in shared memory (we'll refer to these data structures as "channels"). Each TOC entry describes one structure: it holds (1) an indication of whether the structure is allocated, (2) the structure's offset within the shared memory and (3) the structure's size.
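To make the role of a TOC entry concrete, here is a minimal sketch of how such an entry might be used to locate a structure in shared memory. The field names, field widths and the `toc_lookup()` helper are assumptions made for illustration; only the three pieces of information (allocated, offset, size) and the count of 128 entries come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define TOC_ENTRIES 128

/* Hypothetical layout of one table-of-contents entry. */
struct toc_entry {
    unsigned allocated;   /* nonzero if the item exists          */
    unsigned offset;      /* byte offset from the start of memory */
    unsigned size;        /* size of the item in bytes            */
};

/* Return a pointer to item `id`, or NULL if it is missing or malformed. */
static void *toc_lookup(unsigned char *base, size_t base_size,
                        const struct toc_entry *toc,
                        unsigned id, unsigned expected_size)
{
    assert(base != NULL && toc != NULL);
    if (id >= TOC_ENTRIES)
        return NULL;
    const struct toc_entry *e = &toc[id];
    if (!e->allocated || e->size != expected_size)
        return NULL;
    /* Reject entries that point outside the shared region. */
    if (e->offset > base_size || e->size > base_size - e->offset)
        return NULL;
    return base + e->offset;
}
```

Checking the size and bounds before handing out a pointer is a defensive choice for the sketch, since the table is written by another processor.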
Each of the 64 struct smd_half_channel array elements contains a header and an 8,192-byte data buffer:
```c
struct smd_half_channel
{
    unsigned state;
    unsigned char fDSR;
    unsigned char fCTS;
    unsigned char fCD;
    unsigned char fRI;
    unsigned char fHEAD;
    unsigned char fTAIL;
    unsigned char fSTATE;
    unsigned char fUNUSED;
    unsigned tail;
    unsigned head;
    unsigned char data [SMD_BUF_SIZE];
};
```
It's interesting to see that old, RS-232 hardware signal names are used: DSR (Data Set Ready), CTS (Clear To Send), CD (Carrier Detect) and RI (Ring Indicate).
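The 20-byte header and 8,212-byte total quoted earlier can be checked directly against this structure. The sketch below assumes that `SMD_BUF_SIZE` is 8192 (to match the 8,192-byte buffer) and that `unsigned` is 32 bits wide, as it is on the ARM cores; `SMD_CHANNELS` and `smd_channel_area_size()` are names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SMD_BUF_SIZE 8192   /* assumed from the 8,192-byte buffer */
#define SMD_CHANNELS 64     /* number of channel structures       */

struct smd_half_channel {
    unsigned state;
    unsigned char fDSR, fCTS, fCD, fRI;
    unsigned char fHEAD, fTAIL, fSTATE, fUNUSED;
    unsigned tail;
    unsigned head;
    unsigned char data[SMD_BUF_SIZE];
};

/* Compile-time checks; they hold where unsigned is 4 bytes. */
_Static_assert(offsetof(struct smd_half_channel, data) == 20,
               "header should be 20 bytes");
_Static_assert(sizeof(struct smd_half_channel) == 8212,
               "structure should be 8,212 bytes");

/* Total bytes occupied by all 64 channel structures. */
static size_t smd_channel_area_size(void)
{
    return (size_t)SMD_CHANNELS * sizeof(struct smd_half_channel);
}
```

The header breaks down as 4 bytes of state, 8 single-byte flags and two 4-byte indices, so the 64 structures together occupy 64 × 8,212 = 525,568 bytes.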
The head and tail members are indices into the data buffer. They allow the fixed-size buffer to be used as a ring buffer that holds a variable amount of data.
As mentioned previously, the IPC mechanism operates on an interrupt basis. When one of the other processors wants to send data to the ARM11, it places the data into one of the 64 channels (the struct smd_half_channel); modifies the fTAIL, fHEAD and/or fSTATE flags; and raises an interrupt line to the ARM11.
In response to the interrupt, the ARM11 interrupt handler walks through the list of opened channels (smd_ch_list in Figure 2 below) and checks the status flags of each one. If there has been a change, the callback function associated with the channel is called. The callback function is bound by the caller when the channel is opened and can be any function.
Figure 2: Interrupt handling flow chart
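The flow in Figure 2 can be approximated by the following sketch. It is a simplified model under stated assumptions: the event bits (`EV_DATA`, `EV_SPACE`, `EV_STATE`), the `open_channel` bookkeeping structure, the rule that the handler clears each flag it finds set, and the function names are all hypothetical. Only `smd_ch_list` and the flag names come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define EV_DATA  0x1u   /* hypothetical: fHEAD was set  */
#define EV_SPACE 0x2u   /* hypothetical: fTAIL was set  */
#define EV_STATE 0x4u   /* hypothetical: fSTATE was set */

/* Only the flag bytes of the shared header matter here. */
struct half_flags {
    unsigned char fHEAD, fTAIL, fSTATE;
};

struct open_channel {
    struct half_flags *remote;                    /* written by the other CPU */
    void (*notify)(void *priv, unsigned event);   /* bound at open time       */
    void *priv;
    struct open_channel *next;
};

static struct open_channel *smd_ch_list;          /* list of opened channels */

/* Walk the open channels and call back those whose flags changed. */
static int handle_ipc_interrupt(void)
{
    int callbacks = 0;
    for (struct open_channel *ch = smd_ch_list; ch != NULL; ch = ch->next) {
        unsigned event = 0;
        assert(ch->remote != NULL);
        if (ch->remote->fHEAD)  { ch->remote->fHEAD = 0;  event |= EV_DATA;  }
        if (ch->remote->fTAIL)  { ch->remote->fTAIL = 0;  event |= EV_SPACE; }
        if (ch->remote->fSTATE) { ch->remote->fSTATE = 0; event |= EV_STATE; }
        if (event != 0 && ch->notify != NULL) {
            ch->notify(ch->priv, event);
            callbacks++;
        }
    }
    return callbacks;
}

/* Example callback: records which events were seen. */
static void record_events(void *priv, unsigned event)
{
    *(unsigned *)priv |= event;
}
```

Since a single interrupt line serves every channel, the handler has no way to know which channel changed and must inspect each open one.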
This general-purpose mechanism allows for synchronous and asynchronous message passing to take place between the processors.
For asynchronous message passing, the user calls smd_open() to allocate a channel and register a callback function that does something simple, such as setting a flag. When the interrupt is handled, the interrupt service routine calls the callback asynchronously and the flag gets set. Some time later, the user's foreground processing can query the flag and perform the appropriate actions. It's up to the user of the channel to arrange the details between the callback routine and the foreground processing.
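The flag-based arrangement might look like the sketch below. The real smd_open() signature is not shown in this article, so `demo_open()` and `demo_isr()` are hypothetical stand-ins for channel setup and the interrupt service routine; `on_data()` and `poll_and_clear()` are likewise illustrative names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*notify_fn)(void *priv, unsigned event);

/* Hypothetical channel handle: just remembers the callback. */
struct demo_channel {
    notify_fn notify;
    void *priv;
};

/* Stand-in for smd_open(): bind a callback to the channel. */
static int demo_open(struct demo_channel *ch, void *priv, notify_fn notify)
{
    assert(ch != NULL);
    ch->notify = notify;
    ch->priv = priv;
    return 0;
}

/* Stand-in for the interrupt service routine. */
static void demo_isr(struct demo_channel *ch)
{
    if (ch->notify != NULL)
        ch->notify(ch->priv, 0);
}

/* The callback does one simple thing: it sets a flag. */
static void on_data(void *priv, unsigned event)
{
    (void)event;
    atomic_store((atomic_int *)priv, 1);
}

/* Foreground code polls the flag whenever it is convenient. */
static int poll_and_clear(atomic_int *flag)
{
    return atomic_exchange(flag, 0);
}
```

Using an atomic exchange to read and clear the flag in one step avoids losing a notification that arrives between the read and the clear.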
For synchronous message passing, the situation is almost identical. Instead of modifying the value of a flag and having the user function periodically check the flag, the callback routine posts to a semaphore, and the user function blocks, waiting on that semaphore.
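The synchronous variant can be sketched with a POSIX semaphore. This is a user-space analogy under stated assumptions, not the kernel code: a kernel driver would use the kernel's own primitives, and `sync_ctx`, `on_data_sync()` and `wait_for_event()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* State shared between the callback and the blocked caller. */
struct sync_ctx {
    sem_t ready;
    unsigned last_event;
};

static int sync_ctx_init(struct sync_ctx *c)
{
    assert(c != NULL);
    c->last_event = 0;
    return sem_init(&c->ready, 0, 0);   /* starts with nothing posted */
}

/* Callback: record the event and post to the semaphore. */
static void on_data_sync(void *priv, unsigned event)
{
    struct sync_ctx *c = priv;
    c->last_event = event;
    sem_post(&c->ready);
}

/* Caller: block until the callback has posted. */
static int wait_for_event(struct sync_ctx *c, unsigned *event)
{
    while (sem_wait(&c->ready) != 0) {
        if (errno != EINTR)
            return -1;                  /* real error, give up   */
    }                                   /* EINTR: retry the wait */
    *event = c->last_event;
    return 0;
}
```

The only difference from the asynchronous case is what the callback does: it posts instead of setting a flag, and the caller sleeps instead of polling. On older toolchains the example may need to be linked with `-pthread`.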
At a slightly higher level, a few more APIs are built on top of this general-purpose mechanism, for example, traditional remote procedure call (RPC) functionality. RPC allows software on one processor to call software on the same or another processor. Simply put, the caller's RPC protocol stack is responsible for marshalling the call parameters into a message and somehow waking up the other processor (the "callee"). The callee processor wakes up, unmarshalls the data and sets up the call to the actual target subroutine. When the called subroutine returns, the return parameters are marshalled into a message, and the callee processor sends the message to the calling processor. The calling processor unmarshalls the data and returns it to the caller, as if it were a local (albeit longer executing) subroutine call.
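The marshalling round trip can be illustrated with a toy example. The message format here (a procedure number followed by two 32-bit arguments, all big-endian) is invented for the sketch and is not the wire format used on the MSM 7200. `PROC_ADD`, `rpc_add()` and the other names are hypothetical, and a direct function call stands in for the transport between processors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PROC_ADD = 1 };   /* hypothetical procedure number */

static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

static uint32_t get_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Caller side: marshal the call parameters into a message. */
static size_t marshal_call(uint8_t *msg, uint32_t proc, uint32_t a, uint32_t b)
{
    put_u32(msg, proc);
    put_u32(msg + 4, a);
    put_u32(msg + 8, b);
    return 12;
}

/* The actual target subroutine on the callee. */
static uint32_t target_add(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Callee side: unmarshal, call the target, marshal the return value. */
static size_t handle_call(const uint8_t *msg, size_t len, uint8_t *reply)
{
    if (len != 12 || get_u32(msg) != PROC_ADD)
        return 0;                        /* malformed or unknown call */
    put_u32(reply, target_add(get_u32(msg + 4), get_u32(msg + 8)));
    return 4;
}

/* Caller stub: looks like a local subroutine call. */
static uint32_t rpc_add(uint32_t a, uint32_t b)
{
    uint8_t msg[12], reply[4];
    size_t n = marshal_call(msg, PROC_ADD, a, b);
    size_t r = handle_call(msg, n, reply);   /* stands in for the transport */
    assert(r == 4);
    return get_u32(reply);
}
```

From the caller's point of view, `rpc_add(2, 40)` behaves like an ordinary function; everything between the marshal and unmarshal steps is hidden in the stub.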
Since RPC messages can be of arbitrary length, software exists to perform packet assembly/disassembly on the messages travelling between the processors. This packet assembly/disassembly is performed by a foreground thread that waits on a semaphore to tell it that data has arrived.
This is part of the RPC interface functionality:
Figure 3: RPC packet assembly and dequeue
The diagram above (Figure 3) illustrates packet assembly. The function do_read_data() is responsible for assembly of the packet. It spends most of its time waiting on a semaphore (via the rr_read() subroutine). As data comes in, the semaphore gets posted and rr_read() unblocks, returning the packet data. When the packet is assembled, the type of packet is checked, and either a control message is processed, or data is delivered via another queue within RPC.
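The assembly step can be sketched as follows. The fragment header used here (top bit marks the last fragment, low 16 bits carry the payload length) and the 4,096-byte packet limit are assumptions made for the example, and `assemble_fragment()` is a hypothetical helper rather than a function from the RPC code. Blocking on the semaphore is left out so that the sketch shows only the reassembly logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fragment header layout. */
#define FRAG_LAST(h) ((h) & 0x80000000u)   /* set on the final fragment */
#define FRAG_LEN(h)  ((h) & 0xFFFFu)       /* payload length in bytes   */
#define MAX_PACKET   4096                  /* assumed upper bound       */

struct assembler {
    uint8_t buf[MAX_PACKET];
    size_t used;
};

/*
 * Append one fragment to the packet being assembled.
 * Returns 1 when the packet is complete, 0 when more fragments
 * are needed, and -1 (discarding the partial packet) on overflow.
 */
static int assemble_fragment(struct assembler *a, uint32_t hdr,
                             const void *payload)
{
    size_t len = FRAG_LEN(hdr);
    assert(a != NULL && payload != NULL);
    if (len > MAX_PACKET - a->used) {
        a->used = 0;
        return -1;
    }
    memcpy(a->buf + a->used, payload, len);
    a->used += len;
    return FRAG_LAST(hdr) ? 1 : 0;
}
```

Once the function returns 1, the caller would inspect the assembled packet's type, dispatch it as a control message or data, and reset `used` to zero for the next packet.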
Now let's look at a handset that is based on the same Qualcomm MSM 7200 chipset but runs the Windows Mobile operating system. Because the chipset is the same, we were able to leverage our understanding of the IPC mechanism used on Android to find similar software under Windows Mobile.
We knew where the shared memory region was on the chipset, which allowed us to search for code that used that region. We found code that implemented an IPC mechanism similar to the one on Android. In fact, it was even more interesting, because there appeared to be a REX (a proprietary Qualcomm OS) emulator running under Windows Mobile! So it appears that the commercial version of the software started out under REX and was later ported to Windows Mobile.
The following screenshot (Figure 4) shows a disassembly of the initialization function that maps virtual addresses to the physical addresses of two of the MSM 7200 memory regions (MSM_CSR_PHYS and MSM_VIC_PHYS).
Figure 4: Function to establish virtual addresses
Reverse engineering software is a bit like solving a puzzle, where each iteration of the solution introduces new constraints and imposes others. Having knowledge of the open source Android software allowed Semiconductor Insights to understand the Windows Mobile closed proprietary platform with greater ease.
Robert Krten is senior software analyst at Semiconductor Insights, a division of TechInsights. He is responsible for reverse engineering software and mapping the derived architecture against claim elements of customers' patents.