IoT enables continuous monitoring of environments and machines using tiny sensors. Advances in sensor technologies, microcontrollers, and communication protocols have made mass production of IoT platforms, with many connectivity options, possible at affordable prices. Due to the low cost of IoT hardware, sensors are being deployed on a large scale in public places, in residences, and on machines.
These sensors monitor the physical properties associated with their deployment environments, 24/7, and generate a huge amount of data. For example, accelerometers and gyroscopes deployed on rotating machinery constantly record the vibration patterns and angular velocity of the rotor attached to the shaft. Air quality sensors continuously monitor gaseous pollutants in the air, indoors or outdoors. Microphones in a baby monitor are always listening. Sensors inside smart watches constantly measure vital health parameters. Likewise, various other sensors such as magnetometers and pressure, temperature, humidity, and ambient light sensors measure physical conditions wherever they are deployed.
Machine learning (ML) algorithms enable the discovery of interesting patterns in this data that are beyond the reach of manual analysis and inspection. The convergence of IoT devices and ML algorithms enables a wide range of smart applications and enhanced user experiences, made possible by low-power, low-latency, lightweight machine learning inference, i.e., tinyML. Many industry verticals are being revolutionized by this convergence, as shown in Figure 1, including but not limited to wearable technologies, smart homes, smart factories (Industry 4.0), automotive, machine vision, and other smart consumer electronic devices.
tinyML with Automated Machine Learning
ML algorithms deployed on tiny microcontrollers (MCUs) in IoT devices are of particular interest due to multiple advantages:
- Data privacy and security: ML inference happens on the local embedded microcontrollers, instead of having to transmit data streams to the Cloud for processing. The data remains on-device and on-premise, where it is private and secure.
- Power savings: tinyML algorithms consume much less power because little or no data has to be transmitted.
- Low latency and high availability: Since inference is performed locally, latency is on the order of milliseconds and not subject to network latency and availability.
Figure 1: tinyML Adds Advanced Functionalities to Traditional IoT Devices (Source: Qeexo)
Automated machine learning using sensor data involves the steps shown in Figure 2. Configuration of sensors and collection of quality data for the target ML application are completed prior to these steps. An automated machine learning platform such as Qeexo AutoML manages the entire workflow for building lightweight, high-performance machine learning models for Arm Cortex-M0-to-M4 class MCUs and other constrained environments.
Figure 2: Qeexo AutoML Workflow (Source: Qeexo)
tinyML with Arm Cortex-M0+ Architecture
The proliferation of IoT technologies and the large-scale deployment requirements of sensors are further pushing the boundaries of microcontroller architectures and machine learning compute. For example, Arm Cortex-M0+ MCUs running at 48 MHz are widely used on sensor boards designed for IoT applications due to their low power consumption profile: a Cortex-M0+ draws only 7 mA per I/O pin, compared to the 15 mA drawn by a Cortex-M4 running at 64 MHz.
The low power consumption of Cortex-M0+ MCUs comes at the cost of a reduced memory and compute profile. M0+ MCUs can only perform 32-bit fixed-point mathematical operations, have no saturation arithmetic support, and lack DSP capabilities. Based on this MCU, the Arduino Nano 33 IoT, one of the popular IoT platforms, comes with only 256 KB of flash and 32 KB of SRAM. In contrast, the Arduino Nano 33 BLE Sense, a popular sensor module built on the Cortex-M4 architecture, can perform 32-bit floating-point operations, has DSP and saturation arithmetic support, and offers four times the flash and eight times the SRAM.
Deploying machine learning algorithms on the M0+ is considerably more challenging than deploying on an M4, for three main reasons:
- Fixed-point compute: Typical machine learning with sensor data involves digital signal processing, feature extraction, and running inference. Extraction of statistical and frequency-based (e.g., FFT) features from sensor signals is crucial for developing high-performing machine learning models. Sensor data streams representing real-world physical phenomena are non-stationary in nature; generally speaking, the better the information extracted from these non-stationary signals, the better the opportunities for developing high-performance ML models. Performing the required mathematical operations in fixed-point representation while maintaining commercial-grade precision and performance is challenging: a fully fixed-point machine learning pipeline begins with the sensor data representation and runs all the way through model inference, generating classification/regression outputs entirely in fixed point.
- Low memory capacity: 256 KB of flash and 32 KB of SRAM put hard restrictions on the size of machine learning models and on the runtime memory these models can use during execution. Real-world machine learning problems often have complicated decision/classification boundaries that must be represented by models with a large number of parameters. For tree-based ensemble models, solving such problems may require deep trees and a large number of boosters, increasing both model size and runtime memory. Reducing model size often comes at the cost of model performance – generally not the most desirable trade-off.
- Low CPU speed: Low latency has always been a key metric when selecting a model for commercial deployment. The 16 MHz of clock speed sacrificed on a 48 MHz M0+ compared to a 64 MHz M4 makes a big difference when latency is measured in milliseconds.
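To make the fixed-point constraint concrete, here is a minimal sketch, not Qeexo's actual implementation, of the kind of Q15 arithmetic an M0+ feature pipeline has to rely on: values are scaled 16-bit integers, and multiplication needs explicit rounding and saturation because the M0+ offers neither floating point nor saturating instructions.

```python
# Q15 fixed-point helpers: real numbers in [-1, 1) are stored as
# integers scaled by 2^15, the format an M0+ can process natively.
Q = 15
SCALE = 1 << Q

def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to Q15, saturating at the range limits."""
    v = int(round(x * SCALE))
    return max(-SCALE, min(SCALE - 1, v))

def q15_mul(a: int, b: int) -> int:
    """Q15 multiply with rounding and software saturation."""
    prod = (a * b + (1 << (Q - 1))) >> Q
    return max(-SCALE, min(SCALE - 1, prod))

def q15_mean_abs(samples: list[int]) -> int:
    """Mean absolute value, a simple statistical feature, in Q15."""
    return sum(abs(s) for s in samples) // len(samples)

# Example: mean absolute amplitude of a tiny accelerometer window.
window = [to_q15(x) for x in (0.5, -0.25, 0.125, -0.5)]
feat = q15_mean_abs(window)
print(feat / SCALE)  # 0.34375
```

A production pipeline would carry such integer representations through every stage, including FFT-based features, rather than converting back to floats as the final line does for display.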
AutoML M0+ Framework
Developed to address these challenges, Qeexo AutoML provides a fixed-point machine learning pipeline that is highly optimized for the Arm Cortex-M0+ architecture. The pipeline handles sensor data in fixed point and performs fixed-point feature computation and fixed-point inference for tree-based ensemble algorithms such as Gradient Boosting Machine (GBM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Qeexo AutoML encodes the ensemble model parameters in very efficient data structures and combines them with interpretation logic, resulting in extremely fast inference on the M0+ target. Figure 3 shows the fixed-point machine learning pipeline developed by Qeexo for the Arm Cortex-M0+ embedded target.
Figure 3: Qeexo AutoML M0+ Inference Pipeline (Source: Qeexo)
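To illustrate the general idea of encoding trees in compact data structures interpreted at runtime, the following sketch uses a hypothetical flat node layout with integer-only comparisons; Qeexo's actual encodings are proprietary and more efficient.

```python
# Hypothetical flat encoding of a single decision tree: each node is
# (feature_index, threshold_q15, left_child, right_child); leaves use
# feature_index == -1 and carry the leaf score in the threshold slot.
NODES = [
    ( 0,   8192, 1, 2),  # root: is feature 0 <= 0.25 (in Q15)?
    (-1, -16384, 0, 0),  # leaf: score -0.5
    (-1,  16384, 0, 0),  # leaf: score +0.5
]

def tree_predict(nodes, features_q15):
    """Walk the flat node array; every comparison stays in integer math."""
    i = 0
    while nodes[i][0] >= 0:
        feat, thresh, left, right = nodes[i]
        i = left if features_q15[feat] <= thresh else right
    return nodes[i][1]  # leaf score in Q15

def ensemble_predict(trees, features_q15):
    """Sum the booster outputs; the sign gives the binary class."""
    return sum(tree_predict(t, features_q15) for t in trees)

score = ensemble_predict([NODES], [4096])  # feature 0 = 0.125
print("class:", 1 if score > 0 else 0)    # class: 0
```

Because the node array is contiguous and branch-free apart from the tree walk itself, an equivalent C implementation maps naturally onto the M0+'s limited instruction set.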
Qeexo AutoML performs patent-pending model compression and quantization to further reduce the memory footprint of the developed ensemble models without compromising classification performance. Figure 4 describes the Qeexo AutoML training process for the Cortex M0+ embedded target.
Figure 4: Qeexo AutoML M0+ Training Pipeline (Source: Qeexo)
Intelligent pruning allows compression of models without loss in performance. In simple terms, Qeexo AutoML first builds a full-sized ensemble model as recommended by a hyperparameter optimizer and then intelligently selects only the most powerful boosters.
This approach of growing a bigger model and then intelligently pruning it for the target deployment is much more effective than building a smaller model in the first place. Starting from a bigger model provides the opportunity to select high-performance boosters (or trees), which ultimately results in better model performance.
As shown in Figure 5, the compressed ensemble model is about 1/10th the size of the full model while achieving higher cross-validation performance. (The x-axis represents the number of trees, or boosters, in the ensemble model; the y-axis represents cross-validation performance.) Note that the Qeexo AutoML intelligent pruning method selects only the 20 most powerful boosters, resulting in a 90% reduction in model size.
Figure 5: Qeexo AutoML Intelligent Model Pruning (Source: Qeexo)
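The prune-after-training idea can be sketched in a few lines. This is a toy stand-in: the real pipeline ranks boosters by their cross-validation contribution, whereas the scores below are fabricated for illustration.

```python
# Toy illustration of prune-after-training: rank each booster by a
# held-out validation score, then keep only the k strongest.
def prune_boosters(boosters, scores, k):
    """Return the k boosters with the highest validation scores."""
    ranked = sorted(zip(scores, range(len(boosters))), reverse=True)
    keep = sorted(idx for _, idx in ranked[:k])  # preserve original order
    return [boosters[i] for i in keep]

# Stand-in model: 200 boosters with made-up validation scores.
full_model = [f"booster_{i}" for i in range(200)]
scores = [(i * 37) % 100 for i in range(200)]
compressed = prune_boosters(full_model, scores, 20)
print(len(compressed))  # 20 boosters kept, a 90% reduction in count
```

Growing 200 boosters and keeping the best 20 gives the selection step far more candidates to choose from than training a 20-booster model directly, which is the intuition behind the approach described above.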
Ensemble Model Quantization
Qeexo AutoML performs post-training quantization of ensemble algorithms. Post-training quantization is a commoditized feature for neural-network-based models and is supported out of the box in frameworks such as TensorFlow Lite. Quantization of ensemble models, however, is Qeexo’s patent-pending technique, which can reduce model size even further while improving MCU-level latency with little to no degradation in model performance. The Qeexo AutoML M0+ pipeline generates fixed-point ensemble models represented in 32-bit precision; additional options for 16-bit and 8-bit quantization can further shrink models to ½ and ¼ of that size, respectively, with a 2x to 3x speed-up.
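As a rough illustration of how 8-bit quantization of 32-bit fixed-point parameters can work in general (a generic sketch, not Qeexo's patent-pending method), a Q31 value can be re-encoded as Q7 by rounding away 24 fractional bits and saturating to the int8 range:

```python
# Generic 32-bit -> 8-bit fixed-point quantization: keep the sign and
# the 7 most significant fractional bits, rounding and saturating.
INT8_MIN, INT8_MAX = -128, 127

def q31_to_q7(v: int) -> int:
    """Round away 24 fractional bits, then saturate to the int8 range."""
    q = (v + (1 << 23)) >> 24
    return max(INT8_MIN, min(INT8_MAX, q))

# Tree thresholds stored at 32-bit precision, re-encoded at 8 bits.
q31 = [0x4000_0000, -0x2000_0000, 0x0100_0000]   # 0.5, -0.25, ~0.0078
q7 = [q31_to_q7(v) for v in q31]
print(q7)  # [64, -32, 1]
```

Each quantized parameter now occupies one byte instead of four, and int8 comparisons are at least as fast as int32 ones on the MCU, which is consistent with the size and latency gains described above.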
Example Use-Cases of tinyML
What are some tinyML applications or use-cases? The possibilities are nearly limitless; here we highlight a few:
- We want to make a smart, AI-enabled wall that users can tap to control lighting (turn it ON/OFF and change its intensity). We define the hand gestures associated with ON/OFF and intensity control, then collect and label the gesture data using an accelerometer and gyroscope module attached to the back of the wall. With this labeled data, Qeexo AutoML can build a model to detect “Knock” and “Wipe” gestures on the wall to control lighting. In the video below, you can see a prototype smart wall developed with Qeexo AutoML within minutes.
- Using machine learning and IoT, we want to ensure that shipments are handled with extreme care as per the shipping guidelines. In the video below, you can see how an AI-enabled shipping box is able to detect how the shipment has been handled from source to destination.
- The convergence of AI with IoT can also enable smart kitchen countertops. The video below shows models built with Qeexo AutoML detecting various kitchen appliances.
- Machine monitoring is one of the most promising use-cases of tinyML. The video below shows multiple machine fault patterns being detected.
- Anomaly detection is another scenario that benefits greatly from machine learning. Often, it is difficult to collect data for various faults in an industrial setting, while it is relatively easy to monitor the healthy operating state of the machine. Just by observing the healthy operating state, Qeexo AutoML algorithms can develop AI systems for anomaly detection as shown in part 1 (below), part 2, part 3 and part 4.
- Activity recognition using sensors embedded in wearables is another use-case benefiting our daily lives. The video below demonstrates building an activity recognition solution with Qeexo AutoML within minutes.
Dr. Rajen Bhatt is the Director of Engineering at Qeexo, leading a team of machine learning engineers to develop revolutionary ML platforms and products. His broad areas of expertise include machine learning, computational intelligence, computer vision, and product engineering. Dr. Bhatt has authored more than 35 peer-reviewed papers and a book on pattern classification algorithms, and is the inventor or co-inventor of 20 granted patents in India, the USA, and South Korea. He is an alumnus of the Indian Institute of Technology Delhi, a Senior Member of IEEE, and a certified Product Engineering Leader, and worked for Samsung and Bosch research centers in India, the USA, and South Korea prior to joining Qeexo.
Tina Shyuan is Director of Product Marketing at Qeexo, where she helps businesses apply Qeexo AutoML to build innovative solutions using sensor data. She has a passion for building and launching cutting-edge machine learning technologies and has launched many successful ML products. Tina is an advocate for running machine learning at the Edge and actively contributes to the tinyML community. She holds an MBA from Columbia University and a BS degree in EECS from UC Berkeley.