Achieving maximum motor efficiency using dual core ARM SoC FPGAs

Editor's note: Michael Parker of Altera Corp. describes how to use an ARM/FPGA-based SoC for real-time machine control algorithm operations. He also describes how the SoC can function as a network processor to link up to the real-time network protocols used in many industrial automation applications.

With the integration of dual ARM A9 CPU cores, a complete set of ARM peripherals, the ability to implement either fixed- or floating-point signal processing in hardware, and unmatched I/O flexibility, the latest FPGA system-on-chip devices can perform tasks that used to require a complete circuit card containing dozens of chips. A perfect example is next-generation motor control.

To achieve maximum motor efficiency, designers use very fast control loops, faster than processor-only solutions can implement. The inner control loops implement what is known as field-oriented control (FOC) and require transforms that are best performed in floating point. Altera's low-cost SoC device family, based on the popular Cyclone architecture, is ideal for this.

One of the ARM A9 cores can implement sophisticated outer-loop PID motor control algorithms and fine-tune the inner-loop operation. The other can act as a network processor, connecting to the proprietary real-time networking protocols commonly used in industrial applications. Both processors can be used to implement safety-critical functions. If the designer needs a different partitioning of tasks between ARM-based software and FPGA-based hardware, it is a simple matter to download new firmware images.

Motor Commutation Background

Electric motors rely on basic electromagnetic principles. The forces of magnetic attraction and repulsion generate torque, and that torque produces the rotary motion of the motor.

Magnetic fields are generated in both the rotor and the stator, using either permanent magnets or electromagnets, and these fields must be properly aligned as the rotor turns. As shown in Figure 1, the motor rests in a stable position when opposite poles of the rotor and stator are as close as possible and like poles are as far apart as possible. However, when the rotor and stator fields are displaced at right angles to each other, the motor tries to rotate toward a more stable position. This generates a torque proportional to the strength of the magnetic fields, which in turn is proportional to the current flowing in the electromagnets.

Figure 1: Torque Generation in a Motor

The magnetic fields must change continuously to maintain an alignment that produces torque throughout the full 360 degrees of rotor rotation. This is called commutation. The most basic method is the one used in the DC (direct current) brush motor, shown in Figure 2. This motor design has permanent magnets in the stator and one or more electromagnet pairs in the rotor.

Figure 2: Brush DC Motor

As the motor rotates, the electromagnets are energized through the "brush" rotary contacts. Each rotor electromagnet is energized for a portion of the rotor's rotation, aligned with the stator magnets to produce useful torque across most of the rotor's angular path. Note that each electromagnet is energized in both polarities, because the DC current flows in opposite directions through the coiled rotor wire at rotation angles 180 degrees apart. The DC brush motor uses very simple and inexpensive mechanical commutation. Its disadvantages are sparking at the brushes, brush wear that makes periodic brush replacement necessary, and torque ripple, as shown in Figure 3.

Figure 3: Brush DC Motor Torque Ripple

Torque ripple arises because optimal geometric alignment of the magnetic fields cannot be maintained throughout the rotor's rotation.

Mechanical commutation causes brush sparking and wear. To eliminate them, the motor design can be turned inside out, with the permanent magnets on the rotor and the electromagnets in the stator. Commutation is then performed electronically instead of mechanically, with the same effect. This design is known as a brushless DC motor, shown in Figure 4.

Figure 4: Brushless DC Motor

The commutation is performed electronically, using the H bridge circuit shown in Figure 5. The transistors are enabled in diagonal pairs: upper left with lower right, and upper right with lower left. Enabling each pair in turn drives current in either direction through the motor winding. Disabling all transistors produces no motor current.

Figure 5: H Bridge Drive Circuit
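The diagonal pairing can be expressed as a small lookup. The lookup also guards against enabling both transistors on the same side of the bridge (shoot-through), which would short the supply. This is a minimal sketch, and the four-bit gate encoding and all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical gate bit assignment for the four H-bridge transistors. */
enum {
    GATE_UL = 1u << 0,  /* upper left  */
    GATE_UR = 1u << 1,  /* upper right */
    GATE_LL = 1u << 2,  /* lower left  */
    GATE_LR = 1u << 3   /* lower right */
};

typedef enum { DRIVE_OFF, DRIVE_FORWARD, DRIVE_REVERSE } drive_state;

/* Return the gate-enable mask for the requested winding drive state. */
uint8_t hbridge_gates(drive_state s) {
    uint8_t mask;
    switch (s) {
    case DRIVE_FORWARD: mask = GATE_UL | GATE_LR; break; /* current one way       */
    case DRIVE_REVERSE: mask = GATE_UR | GATE_LL; break; /* current the other way */
    default:            mask = 0;                 break; /* all off: no current   */
    }
    /* Never enable both transistors in the same leg (shoot-through). */
    assert(!((mask & GATE_UL) && (mask & GATE_LL)));
    assert(!((mask & GATE_UR) && (mask & GATE_LR)));
    return mask;
}
```

A real gate driver would also insert dead time between switching one pair off and the other on. That detail is omitted here.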

Normally, rotation sensing is used to detect the rotor angle and control the timing of the electronic commutation. Alternatively, the motor can simply be commutated at some reasonable rate. If the torque is known to be sufficient for the motor load, the rotor will then move synchronously with the commutation rate.

Electronic commutation alone does not eliminate torque ripple and its associated inefficiencies. A more sophisticated approach is to commutate with non-binary currents, such as sinusoidal current waveforms. These can be generated by high-frequency electronics using PWM (pulse width modulation), as shown in Figure 6.

Figure 6: PWM Commutation

This leads to the concept of field-oriented control (FOC), used in modern motor control applications. In FOC, the magnetic fields within the motor are precisely and optimally controlled using digital methods.

Motor Control Background
With DC motors, torque output is approximately proportional to current. The current is therefore used to control motor torque, speed, and position as the application requires. The motor control circuit must react quickly relative to the rate of mechanical change in the system that uses the motor, such as an arm on an industrial robot or CNC (computerized numerical control) machining equipment.

A common structure in control theory is the PID (proportional-integral-derivative) controller. This controller compares the desired value of the control variable, or setpoint, with the actual measured value, and the difference between the two is the error signal. The controller then outputs a response that reduces the error, driving the measured value closer to the desired value, as shown in Figure 7. It is a recursive system with a feedback loop. The response time and latency of the feedback loop determine how quickly the controller can react to changes in the process or system. In a motor control system, the setpoint is frequently the speed or position of the system.

Figure 7: PID Controller

The error signal is used in a negative feedback loop to create an output that moves the system or process variable closer to the desired value. The simplest approach is an output proportional to the error, so that a large error produces a large output driving the system. A further enhancement is an integral term, which eliminates long-term lag. Without it, the output approaches but never quite reaches the desired value, because the proportional output becomes very small as the error shrinks. A derivative term can speed up the controller's response, because the controller then reacts to the rate at which the error signal is changing. If the error is growing, the controller reacts more quickly. The derivative term can cause instability or oscillation if not used carefully. The gain of each of the three terms determines the response time and stability of the system.
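The three terms described above map directly onto a discrete-time update that runs once per sample period `dt`. This is a minimal, textbook-style sketch, not the article's implementation. It omits the output limits and integrator anti-windup that practical controllers usually add.

```c
#include <assert.h>

/* Discrete PID controller state. */
typedef struct {
    double kp, ki, kd;   /* proportional, integral and derivative gains */
    double integral;     /* accumulated error * dt */
    double prev_error;   /* previous error, used by the derivative term */
    int    primed;       /* 0 until the first update has run */
} pid_ctrl;

void pid_init(pid_ctrl *c, double kp, double ki, double kd) {
    c->kp = kp; c->ki = ki; c->kd = kd;
    c->integral = 0.0;
    c->prev_error = 0.0;
    c->primed = 0;
}

/* One control step: returns the output that drives the process toward the setpoint. */
double pid_update(pid_ctrl *c, double setpoint, double measured, double dt) {
    assert(dt > 0.0);
    double error = setpoint - measured;          /* error signal */
    c->integral += error * dt;                    /* integral: removes long-term offset */
    double deriv = c->primed ? (error - c->prev_error) / dt : 0.0; /* rate of change */
    c->prev_error = error;
    c->primed = 1;
    return c->kp * error + c->ki * c->integral + c->kd * deriv;
}
```

For example, with gains (2.0, 1.0, 0.5) and `dt = 0.1`, the first call with an error of 6 returns 12.6. A second call with the error falling to 4 returns -1.0. The large negative derivative contribution shows how the D term can dominate if its gain is not chosen carefully.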

Figure 8: PID Controller Response

In a motor controller, the response depends on many electrical and mechanical factors. An optimal controller responds as rapidly as the system allows without creating overshoot, oscillation, or instability. The controller will also mask the inefficiencies of non-ideal commutation by driving the motor harder to achieve the desired response.

Field Oriented Control (FOC)

In most industrial applications, the lifetime cost of the energy a motor uses far exceeds its original capital cost. The objective of FOC is to keep the magnetic fields precisely oriented at all times to produce maximum torque. This eliminates torque ripple, increases motor efficiency, and lowers the cost of ownership of the system.

In a DC brush motor, the control loops can simply drive the motor current, and the brushes perform the mechanical commutation, albeit suboptimally. With FOC, sinusoidal commutation is performed electronically and optimally. Integrated control loops maximize the magnetic field components that produce useful torque and minimize those that do not, such as components that merely exert force on the motor bearings.

The system in Figure 9 uses PI controllers rather than PID (the derivative term is not used). The FOC function uses the Park and Clarke transforms, together with PI control loops for torque (the magnetic field direction that produces useful torque) and flux (the field direction that produces no torque). Notice that the setpoint, or desired flux setting, is zero for the motor controller shown in Figure 9 (in applications such as AC induction motors, it would not be zero).

Figure 9: FOC based Motor Control System

Motors normally use three independent phases. These phases can be replicated around the stator, which sets the number of poles or windings. In a balanced three-phase system with no neutral connection, Kirchhoff's current law requires the phase currents to sum to zero. A three-phase current vector (a, b, and c components) can therefore be expressed as two orthogonal components (α and β), using the Clarke transform shown in Figure 10.

Figure 10: Clarke Transform

Clarke transform equations:
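A commonly used amplitude-invariant form, assuming balanced phase currents (ia + ib + ic = 0), is given here for reference. The scaling used in the original figure may differ.

```latex
\begin{aligned}
i_\alpha &= i_a \\
i_\beta  &= \frac{1}{\sqrt{3}}\left(i_a + 2\,i_b\right)
\end{aligned}
```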

Reverse Clarke transform equations:
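Under the same amplitude-invariant assumption, the inverse recovers all three phase currents from α and β:

```latex
\begin{aligned}
i_a &= i_\alpha \\
i_b &= -\tfrac{1}{2}\,i_\alpha + \tfrac{\sqrt{3}}{2}\,i_\beta \\
i_c &= -\tfrac{1}{2}\,i_\alpha - \tfrac{\sqrt{3}}{2}\,i_\beta
\end{aligned}
```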

The Clarke transform produces α and β components in the stationary frame of the stator, so they still vary sinusoidally as the rotor turns. The Park transform maps these current vectors onto the rotating reference frame of the spinning rotor, turning the α and β components into q (quadrature) and d (direct) components. This transform requires the rotor angle θd as an input, often determined by a quadrature encoder attached to the rotor shaft. The Clarke and Park transforms must therefore be calculated continuously as the motor rotates.

Figure 11: Park Transform

Park Transform equations:
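A common form of the Park transform uses the rotor angle θd defined above. It assumes that the d axis is aligned with the rotor flux, and that sign convention is an assumption that may differ from the original figure:

```latex
\begin{aligned}
i_d &= i_\alpha\cos\theta_d + i_\beta\sin\theta_d \\
i_q &= -i_\alpha\sin\theta_d + i_\beta\cos\theta_d
\end{aligned}
```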

Reverse Park Transform equations:
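The matching inverse, under the same convention, rotates the d and q components back into the stationary frame:

```latex
\begin{aligned}
i_\alpha &= i_d\cos\theta_d - i_q\sin\theta_d \\
i_\beta  &= i_d\sin\theta_d + i_q\cos\theta_d
\end{aligned}
```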

The benefit is that the outer control loops for speed and position no longer need to deal with the much higher-frequency current waveforms, because commutation is decoupled from motor control. In fact, in a steady-state system (constant speed and load), the d and q components become simple DC values.

Interrupt, Sampling, and Latency Requirements
Although motor control applications are mechanical, the drive circuits may need to be updated, and the current, position, and speed sensors read, at quite high rates. A reasonable scenario might be a motor operating at a maximum of 12,000 RPM, or 200 revolutions per second. A rule of thumb is that a minimum of 80 samples per cycle is needed to generate a well-shaped sinusoidal current waveform. On that basis, the required sample rate is:

sample rate = motor speed (rev/s) × 80 × number of motor pole pairs

The number of motor pole pairs is the number of electromagnetic windings in the stator. For a motor at 12,000 RPM with eight windings, or pole pairs, this works out to a sampling rate of 128,000 samples per second (200 × 80 × 8). That leaves a processing latency budget of about 7.8 microseconds per sample.
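The arithmetic above can be written out directly. This small sketch uses hypothetical helper names and reproduces the 128,000 samples per second and roughly 7.8 microsecond figures:

```c
#include <assert.h>

/* Rule-of-thumb commutation sample rate, in samples per second:
 * (RPM / 60) * samples_per_cycle * pole_pairs. */
double commutation_sample_rate(double rpm, int samples_per_cycle, int pole_pairs) {
    assert(rpm > 0.0 && samples_per_cycle > 0 && pole_pairs > 0);
    return (rpm / 60.0) * samples_per_cycle * pole_pairs;
}

/* Time available per sample (the processing latency budget), in microseconds. */
double sample_period_us(double rate_hz) {
    assert(rate_hz > 0.0);
    return 1e6 / rate_hz;
}
```

For 12,000 RPM, 80 samples per cycle and eight pole pairs, `commutation_sample_rate` returns 128000.0 and `sample_period_us` of that rate returns 7.8125. A 200 kHz update rate gives exactly 5 microseconds.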

Each application is different, but often the system must acknowledge and process every interrupt and update the motor drivers within 5 microseconds or less. In that case, the system samples the motor feedback currents, position, and speed, then uses FOC to calculate updated motor current driver values. This happens 200,000 times per second, each with a 5 microsecond processing latency. For a processor-based system, this implies an interrupt rate of 200 kHz. That rate can be a significant challenge for general-purpose processors with caches, operating systems, and non-vectored interrupt controllers.

Heterogeneous Processing System
Motor control systems are typically implemented using a combination of hardware and software. The hardware typically comprises the motor drive circuits, power supplies, sensor interfaces, and analog-to-digital converters with their associated interfaces. A fast real-time processor or DSP (digital signal processor) can implement the controllers and FOC. Even so, total control system processing times, or latencies, tend to be tens of microseconds, often 20 to 50 microseconds. A second processor is often used to connect the system to the outside world through networking or communication interfaces.

Using an SoC FPGA allows a more optimal and flexible implementation. The SoC contains two 800 MHz ARM Cortex-A9 microprocessor systems. Typically, one is used for networking and the other as a system controller. Using separate processors can simplify software design and lets each processor run the operating system best suited to its tasks.

The ARM A9 is a general-purpose, high-performance processor, but it is not optimized for demanding real-time applications with guaranteed response times. Its real-time limitations can be mitigated by taking advantage of the SoC FPGA's integrated programmable logic. The programmable logic can implement PWM drive circuits, versatile interfaces to any ADCs and DACs, position and speed sensor interfaces, safety cut-off circuits, proprietary network or MAC hardware interfaces, and more. It can also implement the FOC and control loops with lower latency and much faster response than processor-only systems can achieve. Typical system response time is under 2 microseconds, about an order of magnitude faster than most processor-only systems can support.

Some systems do not require such low absolute latency but have significant signal processing requirements (e.g., vibration suppression). In these systems, the additional computational bandwidth of the FPGA can support more sophisticated motor control algorithms within the required response time. This improves efficiency and extends motor lifetime, further reducing the total cost of ownership over the life of the system.

Hardware Implementation of FOC
A design tool such as DSPBuilder from Altera can take a Simulink representation of a design and implement it directly in FPGA logic. The same testbench used for the MATLAB/Simulink simulation can also be used to verify the FPGA logic. Like the simulation, DSPBuilder generates floating-point logic, which provides greater dynamic range and numerical stability than fixed-point implementations. Table 1 shows the resources needed to implement the FOC in the smallest SoC FPGA.

Table 1: Hardware resources to implement in SoC FPGA (5CSEA2)

Altera's DSPBuilder and Qsys tools memory-map all interfaces to the FOC blocks into the ARM processor system. Software can therefore control the desired torque output and the gains of the PI loops that control torque and flux, and it can also perform monitoring and other functions.
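This is a minimal sketch of how ARM software might program such memory-mapped FOC registers. The register layout, names, and bit assignments are hypothetical, not the map that DSPBuilder or Qsys actually generates. On real hardware, the base address would come from the Qsys address map, and under Linux the region would typically be mapped with `mmap()` on `/dev/mem` or accessed through a driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block for one FOC axis (layout is illustrative only). */
typedef struct {
    volatile float    torque_setpoint;   /* desired torque (q-axis current)      */
    volatile float    flux_setpoint;     /* desired flux (d-axis current)        */
    volatile float    torque_kp, torque_ki;
    volatile float    flux_kp, flux_ki;
    volatile uint32_t control;           /* bit 0: loop enable (hypothetical)    */
    volatile uint32_t status;            /* monitoring word (hypothetical)       */
} foc_regs;

#define FOC_CTRL_ENABLE 0x1u

/* Program PI gains and setpoints, then enable the hardware loop.
 * The same gains are used for the torque and flux loops, for brevity. */
void foc_configure(foc_regs *r, float kp, float ki, float torque) {
    assert(r != 0);
    r->control &= ~FOC_CTRL_ENABLE;      /* stop the loop while retuning */
    r->torque_kp = kp;  r->torque_ki = ki;
    r->flux_kp   = kp;  r->flux_ki   = ki;
    r->flux_setpoint   = 0.0f;           /* zero flux setpoint, as in Figure 9 */
    r->torque_setpoint = torque;
    r->control |= FOC_CTRL_ENABLE;
}
```

Because `foc_configure` only takes a pointer, it can be pointed at an ordinary struct in RAM instead of the hardware address, which makes the register logic easy to test off target.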

Another option with the heterogeneous SoC FPGA, not explored here, is to use a Nios II processor to handle the high interrupt rates. The Nios II is a soft processor, built from logic cells and available in any Altera FPGA. It is designed specifically to serve as a real-time embedded microprocessor and can easily accommodate 200 kHz interrupt rates. In this case, the FOC processing would likely need to be implemented in fixed point.

Hardware Implementation of Control Loops
The position and speed control loops may also be implemented in logic, like the PI control used in the FOC function. The gains of the different PID circuits can be updated through registers memory-mapped to the ARM processor. If needed, this provides extremely low latency and fast response at the same rate as the commutation function. A total control and FOC response under 5 microseconds is easily achievable, which can provide more stability for very fast-reacting systems. As with FOC, these control loops can be implemented in hardware from Simulink models, using DSPBuilder to generate floating-point hardware in the FPGA logic.

Another consideration is that additional motor control axes can usually be added to the FOC hardware with only a small increase in logic size and processing latency. A four-axis motor controller may use 20% more logic than a single-axis implementation, with minimal increase in latency. In a software-based implementation, latency scales roughly linearly with the number of axes.

Software Implementation of Control Loops
Motor control loops are most commonly implemented in software, in C. Because FOC isolates the control loops from motor commutation, the interrupt rates and latencies of motor control are set by the response of the system. That response is normally much slower than the PWM commutation circuits. Simulation and analysis are required to determine the minimum update rates for the expected performance and stability; however, an update rate of 10 kHz is normally adequate. In this case, the torque setting to the FOC is updated at this lower rate (every 100 microseconds), based on feedback and calculation of the current position and speed.
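The following sketches such an outer loop. It assumes the hardware FOC tracks its torque (q-axis current) command much faster than the 100 microsecond outer-loop period, so the inner loop is treated as ideal. The motor model and all parameter values are hypothetical, chosen only to show the structure of a 10 kHz speed loop that feeds a torque setpoint to the FOC.

```c
#include <assert.h>

/* Hypothetical motor parameters for the model J*dw/dt = KT*iq - B*w. */
#define J_INERTIA 1e-4   /* kg*m^2 */
#define KT        0.1    /* N*m per A */
#define B_FRICT   1e-4   /* N*m*s per rad */
#define I_MAX     10.0   /* A, limit on the current (torque) command */
#define DT_OUTER  1e-4   /* s, 10 kHz outer-loop period */

typedef struct { double kp, ki, integral; } speed_pi;

/* One outer-loop step: speed error -> q-axis current command for the FOC. */
double speed_pi_step(speed_pi *c, double w_ref, double w_meas) {
    double e  = w_ref - w_meas;
    double iq = c->kp * e + c->ki * (c->integral + e * DT_OUTER);
    if (iq > I_MAX)       iq = I_MAX;    /* saturated: clamp and freeze integrator */
    else if (iq < -I_MAX) iq = -I_MAX;
    else                  c->integral += e * DT_OUTER;
    return iq;
}

/* Run `steps` outer-loop periods from standstill; returns final speed (rad/s). */
double simulate_speed_loop(double w_ref, int steps) {
    assert(steps > 0);
    speed_pi c = { 0.05, 0.5, 0.0 };
    double w = 0.0;
    for (int k = 0; k < steps; k++) {
        double iq = speed_pi_step(&c, w_ref, w);             /* sent to the FOC block  */
        w += DT_OUTER * (KT * iq - B_FRICT * w) / J_INERTIA;  /* ideal inner-loop model */
    }
    return w;
}
```

In a real system, the step would run from a 10 kHz timer interrupt rather than a `for` loop. The returned `iq` would be written to the FOC block's torque setpoint each period.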

The software implementation allows more sophisticated and dynamic control algorithms and provides an environment that is more familiar to many control engineers. For Simulink-based control algorithms, MathWorks offers tools for automated code generation.

This SoC approach uses the FPGA hardware for the FOC "inner loop" and the ARM A9 for the motor control "outer loop." In the FPGA hardware, the inner loop can guarantee latency under 2 microseconds. The analysis below shows that the ARM achieves very reasonable processing latency under most conditions. It is therefore ideal for outer-loop processing, where interrupt latency requirements are much more relaxed and the consequences of a missed interrupt are not catastrophic.

Analysis of ARM A9 Interrupt Latency
The following is based on an analysis by the Altera SoC engineering team of the real-time performance of the ARM processors in low-cost SoC FPGAs, specifically their interrupt latency. Interrupt response time can vary with system configuration, operating system, cache configuration, and processing load.

The following scenarios were considered:

  • Stage 1: A highly constrained system on a single core, used to establish a benchmark for the lowest achievable latency. To measure maximum achievable performance, this stage uses no OS, and the tests are implemented with bare-metal drivers.
  • Stage 2: Similar to Stage 1, but the tests run under an RTOS (µC/OS-II), and the measurements are repeated over a number of iterations (1024) to build a statistical model of interrupt latency.
  • Stage 3: Similar to Stage 2, with multiple background tasks running on the RTOS alongside the interrupt latency measurements. One background task writes data to the UART port in an infinite loop. Another performs memcpy in external memory in an infinite loop, generating AXI reads and writes that may take a long time to complete.
  • Stage 4: Emulation of a larger system in which not all code fits in the L1 cache. In this case the critical code is placed in the L2 cache. The cache would otherwise be large enough to hold all of the background processing code, so an L1 cache flush is triggered after each interrupt. The objective is to measure the interrupt latency achievable with an RTOS (real-time operating system) on the SoC FPGA in a real-world environment, where cache flushes play a significant role in system performance.

The L1 and L2 caches have the greatest influence on interrupt latency. In a Stage 4 configuration, with the instruction code locked in L2, latency is significantly lower than for the same system running from external memory. With L1 enabled, it is lower still. However, latency is much worse in Stages 3 and 4, because the background tasks evict the ISR code from the cache.

Table 2: Measured Interrupt Response Latency (microseconds)

The interrupt response latency must be added to the FOC processing time. A basic FOC algorithm was benchmarked, including trigonometric functions, Clarke/Park transforms, a PID controller, and inverse Park/Clarke transforms.

Table 3: Measured FOC Latency (microseconds)

Thanks to the performance of the ARM A9, a software-based FOC controller was shown to meet the 5 microsecond requirement. That said, there is little margin, and interrupt response times are often the least deterministic part of the system. The results shown are the longest latencies seen under these test conditions, but much longer times may occasionally occur because of the statistical nature of the processing load. However, when interrupts occur at rates on the order of 10 to 20 kHz (periods of 50 to 100 microseconds), the ARM Cortex-A9 appears able to meet real-time requirements with very large margins. At those rates, the probability of failing to service an interrupt in time is very low. By isolating the ARM interrupts from the much higher-speed FOC commutation requirements, real-time performance is virtually assured under a wide variety of operating conditions and processing loads.

Using SoC FPGAs for real-time applications such as motor control provides integration benefits and also the ability to scale performance as needed. With the logic resources and advanced floating-point synthesis tools of Altera FPGAs, the integrated 800 MHz ARM A9 processors can meet the requirements of demanding real-time embedded applications.

This approach implements high-rate, deterministic functions (the "inner loop") in hardware, while lower-rate, more dynamic and complex processing (the "outer loop") runs in software, giving the system designer the best of both worlds. The outer loop is much less sensitive to latency and interrupt uncertainty. This approach applies to a wide range of real-time embedded control problems, including motor controllers.

The SoC provides best-in-class latency where required, using FPGA logic, and best-in-class flexibility and throughput with its 800 MHz ARM A9 processors. Both the hardware and software portions of the implementation are also field-updatable through firmware downloads, maximizing product life and adaptability after deployment.

Michael Parker is principal architect for DSP product planning at Altera Corp., including the Variable Precision FPGA silicon architecture for DSP applications, DSP tool development, floating point tools and DSP and Video intellectual property (IP). He joined Altera in January 2007 and has more than 20 years of DSP wireless engineering design experience with companies such as Alvarion, Soma Networks, TCSI, Stanford Telecom and numerous startup companies. He holds an MSEE from Santa Clara University, and BSEE from Rensselaer Polytechnic Institute.
