Why MIPS is just a number - Embedded.com

Why MIPS is just a number

Millions of instructions per second doesn't always represent the true computational capability of a device. Here’s what you can do about it.

It is common to express microcontroller (MCU) computational capability in terms of MIPS (millions of instructions per second). However, no two MCU or system-on-chip (SoC) architectures are the same, nor is the amount of integration they offer to accelerate various applications. As a result, a firmware application may take fewer CPU cycles if the right hardware features are used. When migrating to a different architecture, developers who rely solely on MIPS to predict the computational capability an application needs can be grossly misled. This article analyzes various architectural features of MCU/SoC devices in the context of some typical computational problems, with the goal of exploring why MIPS doesn't represent the true computational capability of a device and what to do about it. Specifically, it focuses on MCU/SoC devices running at under 100 MHz, as there aren't many benchmarking standards that compare the system-level capability of these devices.

Characteristics of sub-100-MHz architectures

Sub-100-MHz MCUs typically use an 8-, 16-, or 32-bit architecture with a data bus that is 8, 16, or 32 bits wide. There are also other major differentiators, such as Harvard vs. Von Neumann and RISC vs. CISC, each of which creates interesting differences. For most MCUs, different instructions require a different number of machine cycles to execute. Also, the oscillator frequency often differs from the machine-cycle rate; for example, on a classic 8051, 12 oscillator cycles equal one machine cycle, while on many PIC devices, four oscillator cycles equal one machine cycle.

Let's look at an example to make this concrete. Assume a device has an oscillator frequency of 20 MHz and that two oscillator cycles equal one machine cycle. Also assume that instructions take one to six machine cycles to execute. What is the MIPS rating of such a device? Dividing the oscillator frequency by two gives us 10 million machine cycles per second. How to convert available machine cycles to MIPS, however, depends on your point of view. If you're a marketing person, you will want to look at the best case and assume every instruction takes a single cycle, making this a 10-MIPS device. If you want to know the theoretical minimum capacity, you will assume every instruction takes six cycles, giving a 1.67-MIPS (10 / 6) device. These are the peak and lowest MIPS. For a typical application, the actual MIPS will be somewhere in between, depending on the instruction mix of the application. This calculation also assumes that instructions on different architectures have similar computational capability, which is hardly ever true.
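The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample instruction mix are assumptions made for this example, not figures from any particular device.

```python
def mips_range(osc_hz, osc_per_machine_cycle, min_cycles, max_cycles):
    """Return (peak, worst-case) MIPS for a device."""
    machine_cycles_per_s = osc_hz / osc_per_machine_cycle
    peak = machine_cycles_per_s / min_cycles / 1e6
    worst = machine_cycles_per_s / max_cycles / 1e6
    return peak, worst


def mix_mips(osc_hz, osc_per_machine_cycle, mix):
    """MIPS for an instruction mix given as {cycles_per_instruction: fraction}."""
    avg_cycles = sum(cycles * fraction for cycles, fraction in mix.items())
    return osc_hz / osc_per_machine_cycle / avg_cycles / 1e6


if __name__ == "__main__":
    # 20-MHz oscillator, 2 oscillator cycles per machine cycle, 1 to 6 cycles per instruction.
    peak, worst = mips_range(20e6, 2, 1, 6)
    print(f"peak {peak:.2f} MIPS, worst case {worst:.2f} MIPS")
    # Hypothetical mix: 50% 1-cycle, 30% 2-cycle, 20% 6-cycle instructions.
    print(f"mix  {mix_mips(20e6, 2, {1: 0.5, 2: 0.3, 6: 0.2}):.2f} MIPS")
```

With the hypothetical mix shown, the same 20-MHz device lands at roughly 4.3 MIPS, well below the 10-MIPS headline figure.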

Here we have assumed that the number of machine cycles is the only thing that determines how many instructions a device can execute. Next, let us consider how flash can affect processing capacity. Usually flash cannot supply data at a rate higher than 20 MHz. Therefore, if the CPU is running faster than 20 MHz and executing instructions out of flash, the flash data rate becomes the primary limiting factor. The usual solution is to make the flash bus wider than the data bus and add an instruction buffer, so that the next instructions are fetched while the current instruction is being executed. This approach works well with linear code. Unfortunately, real system code is hardly ever linear, and every time the code branches, the instruction buffer needs to be refilled. Another way to improve performance is to add cache memory. In short, if one MCU/SoC manages flash efficiently while another doesn't, we'll get significantly different performance figures for the same machine cycle and instruction mix.
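To get a feel for how much the flash interface can matter, here is a deliberately crude model in Python. Everything in it is an assumption made for illustration: one instruction per CPU cycle, a prefetch buffer that fully hides flash latency on linear code when the fetch is wide enough, and a full flash wait on every taken branch. Real devices behave in more complicated ways.

```python
import math


def effective_mips(cpu_hz, flash_hz, instr_per_fetch, branch_fraction):
    """Toy estimate of MIPS when executing from flash.

    instr_per_fetch: instructions delivered by one (wide) flash read.
    branch_fraction: fraction of instructions that are taken branches.
    """
    # CPU cycles stalled on a flash read that could not be prefetched.
    wait_cycles = max(0, math.ceil(cpu_hz / flash_hz) - 1)
    # Linear code: the rate is capped by what the flash can deliver.
    linear_rate = min(cpu_hz, flash_hz * instr_per_fetch)
    cycles_per_instr = cpu_hz / linear_rate + branch_fraction * wait_cycles
    return cpu_hz / cycles_per_instr / 1e6


if __name__ == "__main__":
    # Hypothetical 80-MHz CPU with 20-MHz flash.
    for width, branches in [(4, 0.0), (4, 0.2), (1, 0.2)]:
        mips = effective_mips(80e6, 20e6, width, branches)
        print(f"fetch width {width}, {branches:.0%} branches: {mips:.1f} MIPS")
```

Under these assumptions the same 80-MHz core delivers anywhere from under 20 MIPS to 80 MIPS depending only on fetch width and how often the code branches.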

Factors such as these are reasonably well-known, and developers commonly consider them when comparing the performance of different devices. Now let's talk about factors that aren't so obvious.

DMA's impact on MIPS
Certain MCU/SoC devices offer DMA (direct memory access) capabilities, which can improve performance by offloading memory accesses from the CPU. How do we judge the impact of DMA on MIPS? Let's examine a typical use case: the Serial Peripheral Interface (SPI) serial communication protocol in master mode. SPI is a good example because it's typically the highest-throughput intraboard communication peripheral on an MCU/SoC and is used with memory, Ethernet, wireless transceiver chips, and so on.
Let's assume the following:

•   SPI speed: 8 Mbps
•   Packet size: 128 bytes
•   Data throughput requirement: 160 µs per packet

With an SPI speed of 8 Mbps, it takes 1 µs to transfer 1 byte. Therefore it will take 128 µs to transfer 128 bytes. Our budget is 160 µs per packet. This leaves 32 µs (160 - 128) available for SPI management. This 32-µs budget needs to be divided evenly across the 128 bytes, as the system needs to load a new data byte every 1 µs. Dividing 32 µs by 128 gives us 250 ns for SPI management per data byte transferred.
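The timing budget above can be reproduced with a short Python sketch. The function name and return layout are assumptions chosen for this illustration.

```python
def spi_budget(bit_rate_bps, packet_bytes, packet_budget_s):
    """Return (byte_time, transfer_time, slack, per_byte_budget), all in seconds."""
    byte_time = 8 / bit_rate_bps                  # time on the wire for one byte
    transfer_time = packet_bytes * byte_time      # time on the wire for one packet
    slack = packet_budget_s - transfer_time       # what is left for SPI management
    per_byte_budget = slack / packet_bytes        # management time per byte
    return byte_time, transfer_time, slack, per_byte_budget


if __name__ == "__main__":
    byte_t, xfer_t, slack, per_byte = spi_budget(8e6, 128, 160e-6)
    print(f"byte time      {byte_t * 1e6:.1f} us")
    print(f"transfer time  {xfer_t * 1e6:.1f} us")
    print(f"slack          {slack * 1e6:.1f} us")
    print(f"per-byte slack {per_byte * 1e9:.0f} ns")
```

Changing the packet budget or the SPI clock in the call shows how quickly the per-byte management window shrinks as throughput requirements tighten.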

For the examples in Figures 1 and 2, DMA reduced the MCU/SoC speed requirement by 160 MHz. However, it reduced the CPU processing power requirement by 200 MHz. If we assume one instruction per cycle, then for this application the DMA was equivalent to a 200-MIPS processor.



The effective MIPS due to DMA depends heavily on throughput requirements. If we take the other extreme of this application and assume there are no timing restrictions per packet, DMA reduces the CPU cycle count by 50 cycles per byte; for 128 bytes, this reduces the cycle count by 6,400. If the MCU needs to run at 16 MHz to support 8-MHz SPI operation and if 128-byte packets are transferred only once per second, an MCU/SoC without DMA needs to execute 16,006,400 instructions per second, compared with 16,000,000 instructions per second for an MCU with DMA. Therefore, for this particular use case, DMA's impact is negligible.
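The two extremes can be compared numerically. The sketch below is an interpretation rather than a reproduction of the article's figures: it assumes that the 50 cycles saved per byte must fit inside the 250-ns per-byte window in the time-critical case, which happens to match the 200-MHz figure quoted earlier. Function names are hypothetical.

```python
def equivalent_cpu_hz(cycles_saved_per_byte, window_per_byte_s):
    """Clock rate needed to run the saved cycles inside the per-byte window."""
    return cycles_saved_per_byte / window_per_byte_s


def extra_cycles_per_second(cycles_saved_per_byte, packet_bytes, packets_per_s):
    """Extra CPU cycles per second spent on byte handling when there is no DMA."""
    return cycles_saved_per_byte * packet_bytes * packets_per_s


if __name__ == "__main__":
    # Time-critical case: 50 cycles must fit in 250 ns.
    tight = equivalent_cpu_hz(50, 250e-9)
    print(f"time-critical case: {tight / 1e6:.0f} MHz of CPU offloaded")

    # Relaxed case: one 128-byte packet per second on a 16-MHz MCU.
    extra = extra_cycles_per_second(50, 128, 1)
    print(f"relaxed case: {16_000_000 + extra:,} vs 16,000,000 cycles per second")
```

The same DMA engine is worth about 200 MHz in one case and a few thousand cycles per second in the other, which is the point of the comparison.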

Coprocessor impact on MIPS
It's not uncommon for an MCU/SoC to have a coprocessor. Coprocessors allow parallel processing of certain compute-intensive tasks to offload the CPU and increase the effective MIPS of the processor.

Consider an application where incoming audio data is sampled by an ADC at 44.1 ksps. Say we want to reject a line frequency of 50 or 60 Hz. For this purpose, we'll use a digital band-stop filter. At a sampling rate of 44.1 ksps there are about 22.7 µs between samples, and the FIR (finite impulse response) filter will have 128 taps. For simplicity, we won't address the output stage of the filter.
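To show what the CPU has to do for every sample when no coprocessor is available, here is a minimal direct-form FIR sketch in Python. It is not the implementation from the figures: the class name is hypothetical, and the coefficients in the usage example are placeholders rather than a designed band-stop response. The point is the inner loop, which performs one multiply-accumulate (MAC) per tap for every incoming sample.

```python
from collections import deque


class FirFilter:
    """Direct-form FIR filter that also counts multiply-accumulate operations."""

    def __init__(self, taps):
        self.taps = list(taps)
        # Most recent sample first; old samples fall off the right-hand end.
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))
        self.mac_count = 0

    def process(self, sample):
        self.history.appendleft(sample)
        acc = 0.0
        for coeff, x in zip(self.taps, self.history):
            acc += coeff * x          # one MAC per tap
            self.mac_count += 1
        return acc


if __name__ == "__main__":
    fir = FirFilter([1.0 / 128] * 128)    # placeholder coefficients (moving average)
    for n in range(10):
        fir.process(float(n))
    print(fir.mac_count, "MACs for 10 samples")
```

With 128 taps, every sample costs 128 MACs plus loop and memory overhead, and this work repeats 44,100 times per second.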

For the example in Figures 3 and 4, a coprocessor reduced the CPU speed requirement by 44.1 MIPS. Note that the example used a simple FIR filter. If a more complex filter is required, the MIPS requirement could be substantially higher (in the hundreds of MIPS).
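As a sanity check on the 44.1-MIPS figure, the short Python sketch below works backward to the per-sample and per-tap instruction budget it implies. The per-tap number is derived arithmetic only; the actual instruction breakdown is in the figures and is not reproduced here. Function names are assumptions.

```python
def macs_per_second(sample_rate_hz, taps):
    """Multiply-accumulate operations per second for a direct-form FIR filter."""
    return sample_rate_hz * taps


def implied_instructions(mips, sample_rate_hz, taps):
    """Return (instructions per sample, instructions per tap) implied by a MIPS figure."""
    per_sample = mips * 1_000_000 / sample_rate_hz
    return per_sample, per_sample / taps


if __name__ == "__main__":
    print(f"{macs_per_second(44_100, 128):,} MACs per second")
    per_sample, per_tap = implied_instructions(44.1, 44_100, 128)
    print(f"{per_sample:.0f} instructions per sample, {per_tap:.2f} per tap")
```

A 44.1-MIPS load at 44.1 ksps works out to about 1,000 instructions per sample, or a little under eight instructions per tap, which is plausible for a load, multiply, accumulate, and pointer update on a CPU without a hardware MAC unit.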


Programmable digital logic has an impact on MIPS, too
Some MCU/SoC devices have programmable digital logic in the form of CPLD or FPGA logic. This allows developers to move functions traditionally implemented in software on the CPU into the hardware domain. Let's examine what impact programmable digital logic may have on MIPS.

Consider an application for a three-phase brushless DC (BLDC) motor that rotates at 50,000 RPM. These motors require pulse sequencing to rotate them. For simplicity, also assume that Hall sensors are used to detect the position of the motor's rotor. Three such Hall sensors are required. At every 60 degrees of electrical rotation, the output of one of the Hall sensors changes, so there are six changes per electrical cycle. If the motor has two rotor pole pairs, two electrical cycles make one mechanical rotation. This means that for one full rotation, there will be 12 changes in Hall sensor output. Each change in Hall sensor output results in changes to six pulse width modulator (PWM) outputs. Three PWMs, each with complementary outputs, are used to create these six PWM outputs.
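The rate at which the CPU would have to respond to Hall sensor changes follows directly from these numbers. The Python sketch below shows the calculation; the function name is an assumption made for this illustration.

```python
def hall_transitions(rpm, pole_pairs, transitions_per_electrical_cycle=6):
    """Return (transitions per revolution, transitions per second, seconds between transitions)."""
    per_rev = pole_pairs * transitions_per_electrical_cycle
    per_second = rpm * per_rev / 60
    return per_rev, per_second, 1 / per_second


if __name__ == "__main__":
    per_rev, per_s, interval = hall_transitions(50_000, 2)
    print(f"{per_rev} transitions per revolution")
    print(f"{per_s:.0f} transitions per second")
    print(f"{interval * 1e6:.0f} us between transitions")
```

At 50,000 RPM with two pole pairs, a new commutation step arrives every 100 µs, so a firmware implementation must service an interrupt and update six PWM outputs 10,000 times per second.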

Figure 5 shows the Hall sensor input to PWM output relationship. A positive PWM value indicates that the high side of the PWM is active while a negative value indicates that the low side of the PWM is active.
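Because the mapping from Hall inputs to PWM outputs is a fixed function of three input bits, it can be expressed as a small lookup table, which is exactly the kind of logic that fits in programmable hardware. The table below is a hypothetical six-step sequence written in Python for illustration; it is not copied from Figure 5, and the correct mapping for a real motor depends on its Hall sensor placement and winding order.

```python
# Hypothetical six-step table: Hall state (H3 H2 H1 as a 3-bit number)
# -> drive for phases (A, B, C). +1 = high side on, -1 = low side on, 0 = off.
COMMUTATION_TABLE = {
    0b101: (+1, -1, 0),
    0b100: (+1, 0, -1),
    0b110: (0, +1, -1),
    0b010: (-1, +1, 0),
    0b011: (-1, 0, +1),
    0b001: (0, -1, +1),
}
ALL_OFF = (0, 0, 0)


def pwm_outputs(hall_state):
    """Return the phase drive for a Hall state; invalid states turn everything off."""
    return COMMUTATION_TABLE.get(hall_state & 0b111, ALL_OFF)


if __name__ == "__main__":
    for state in (0b101, 0b100, 0b110, 0b010, 0b011, 0b001):
        print(f"{state:03b} -> {pwm_outputs(state)}")
```

In firmware, this lookup runs inside an interrupt handler on every Hall transition. In programmable logic, it is a purely combinational function of three inputs and needs no CPU cycles at all.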



Now let's analyze how the BLDC commutation is typically implemented and how it could be simplified if a device has programmable logic (CPLD or FPGA) capabilities.

In Figures 6 and 7, programmable digital logic reduced the CPU speed requirement by two MIPS. If the motor were rotating at a higher speed, the impact on MIPS would be greater, and vice versa. This example assumes simple open-loop control written in optimized assembly. Real-world applications are typically more complex and usually use C code for easier maintenance and reuse. If generic C code is used, the MIPS requirement may triple. Almost all motor control applications also need multiple control loops, such as PID control, which increase the computational requirements further. However, if control is implemented in hardware, CPU loading stays at zero. Therefore, the MIPS requirement for a complete motor control application can be in the 5 to 10 MIPS range, while with a hardware-based approach it can be kept at 0. The programmable-logic-based implementation is highly reusable and doesn't introduce any integration issues.
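The figures above can be tied together with some back-of-the-envelope arithmetic. The Python sketch below derives the per-commutation instruction budget implied by the two-MIPS figure and applies the threefold C-code factor mentioned in the text. The function names are assumptions, and the per-commutation number is an inference rather than a value taken from Figures 6 and 7.

```python
def commutations_per_second(rpm, pole_pairs, steps_per_electrical_cycle=6):
    """Hall-sensor-driven commutation events per second."""
    return rpm * pole_pairs * steps_per_electrical_cycle / 60


def instructions_per_commutation(mips, events_per_second):
    """Instruction budget per commutation event implied by a MIPS figure."""
    return mips * 1e6 / events_per_second


if __name__ == "__main__":
    events = commutations_per_second(50_000, 2)
    print(f"{instructions_per_commutation(2, events):.0f} instructions per commutation")
    # The article's estimate that generic C code may triple the requirement:
    print(f"{2 * 3} MIPS with generic C code")
```

Two MIPS spread over 10,000 commutations per second is about 200 instructions per event, and tripling it for generic C code gives 6 MIPS, which sits inside the 5 to 10 MIPS range quoted for a complete application.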



The amount of programmable digital logic needed to control one motor is very low. Therefore, it's possible to control multiple motors simultaneously when the commutation logic is in hardware. If the same is done using a traditional approach, the MIPS requirement will more than multiply, due to the added complication that two interrupts cannot be processed at the same time. Also, to keep interrupt response time reasonable, the CPU will need to run much faster than the minimum speed requirement, increasing power consumption as well. With programmable logic, a single MCU/SoC device can easily implement four BLDC motor controllers. To do the same with MCU firmware, the requirement could be above 100 MIPS.

As this article has shown, MIPS does not represent an MCU/SoC device's true capacity to solve system-level problems. If a device has all of the capabilities mentioned, what will be the effective MIPS of the device: 200 MIPS, 500 MIPS, or 1,000 MIPS? In all cases, MIPS just becomes a number that means very little.

So how do developers identify the best device for their application? Unfortunately, the answer is not simple:

  • Identify areas in your application with critical timing or CPU performance requirements.
  • Check whether the MCU/SoC vendors provide an application note or example project similar to your application. If so, this may provide some guidance on how far you can optimize the application for a given MCU/SoC. If not, try to identify potential ways to implement the application on the given architecture and which hardware features you can take advantage of.
  • Make a rough estimate of the MIPS requirement, as shown in the examples. Your calculations don't need to be exact. Rather, you're trying to identify a potentially large gap. For all of the examples shown, the difference in performance was large enough that precise calculations were unnecessary.
  • If the performance gap is small, say on the order of 10 to 20%, but the task is a major component of the application, the only option is to create a specific implementation using the vendor's development kit and measure the actual performance gap.

If you're going to buy a large quantity of devices, these requirements can be part of your RFQ (request for quote). This allows vendors to provide information about device performance specific to your particular application.

Gaurang Kavaiya received his BSEE degree from Gujarat University in India.  He has 15 years of embedded design experience and is currently working as PSoC applications director at Cypress.
