One of the defining features of Industry 4.0 is distributed sensing. This latest iteration of industrial automation sees a dramatic increase in the sensor nodes used to monitor equipment and processes, all linked up to gateway devices in a complex industrial internet of things (IIoT).
These sensors are helping to improve the performance and reliability of industrial processes and ensure higher availability. Increased efficiency is a key motivator of this move to Industry 4.0, with reliable 24/7 operation and plants around the world all monitored from a single location. But this also requires more focus on the security of the network to ensure safe operation.
Machine learning algorithms in the gateways that collect data, or even in the sensor nodes themselves, are monitoring the performance of industrial equipment. The data from the sensors can be analyzed to identify problems with the hardware before the problems become critical.
However, what has been lacking from this IIoT rollout is the ability to do the same with software. These sensors and gateways are increasingly sophisticated processing units running complex code. While the majority of this code may be mature and well tested, implementing Industry 4.0 systems requires new software. Adding many thousands of sensors also dramatically increases the scale of the software-monitoring challenge. With an exponential increase in data, corner-case runtime errors that were not previously visible can occur. The cause can be as simple as a glitching cable producing intermittent production problems, or as serious as malware mounting a distributed denial of service (DDoS) attack.
Thoroughly testing the interactions of thousands of sensor nodes across every combination of latency, data rate, machine learning algorithm and data collection architecture is simply not possible. While digital modelling can help address this complexity, the real world can still produce unexpected results.
IIoT systems can be immensely complex. The sensors monitor not only the machines in operation but also the components and materials going into them, as well as the liquid or gas chemicals used in the process. While traditional SCADA (supervisory control and data acquisition) systems manage these parameters in programmable logic controllers (PLCs) with fixed high and low thresholds, the advantage of Industry 4.0 is optimizing these thresholds with regular changes, making the process more efficient with less waste. These PLCs are now adding machine learning algorithms, making key decisions closer to the equipment for lower latency and more efficient operation.
Enter the device feedback loop
However, this is creating an increasingly complex network of devices across the factory floor. And sooner or later, any of the distributed device nodes are likely to have unexpected issues. As the number of nodes increases, so does the likelihood of problematic behavior.
Knowing how the software performs across many kinds of sensor nodes, as well as monitoring the software in each gateway and data collection system, is a huge challenge, and it is driving the need for a DevOps-style device feedback loop.
The idea of the device feedback loop is to provide continuous device monitoring and automatic warnings. This gives developers awareness of potential problems before they become an issue on the production line. Ideally, the warning comes early enough that a fix can be produced and distributed via an over-the-air update before the line is impacted. Such a feedback loop helps embedded developers improve software quality and product performance, supporting the reliability and scalability of industrial control systems.
Calling in the software agents
One approach to implementing a successful device feedback loop is to use a small, independent block of software, called an agent, in each node. This agent is linked to the communications network and observes the operation of the device’s hardware and software: hardware exceptions, watchdog timer data, error codes from the real-time operating system (RTOS), middleware or peripheral driver APIs, and data captured in the event of a restart. If there is a failure, the agent can provide detailed information on the state of the registers at the point of detection, with stack dumps and memory dumps that make finding and fixing the problem much faster. The agent can even trace the execution of the software to show what happened just before an issue was detected, providing context for reproducing it.
But this is not just about analyzing software failures; it is also about pre-emptive warnings. Monitoring metrics such as stack usage, heap usage, CPU load, storage, and application-specific metrics enables proactive warnings, such as a “stack usage exceeds 95%” alert that flags to the developer that a problem may be imminent.
The agent can also be used to improve the quality of the control algorithms and feed this back into the testing of the system. Alerts can be generated on any software event of interest, giving visibility into how the algorithms work in practice. This visibility is simply not possible any other way and can help refine the performance of the system. These lessons can be applied to the rollout of new plants and the upgrading of brownfield sites.
Monitoring also critical for increased cyber security
Another key issue for industrial control is security. Most industrial systems have historically been isolated in closed loops. Industry 4.0 opens them up to the wider Internet and to higher risks of compromise. An independent agent with a separate communications channel can monitor the performance of each node and gateway, looking out for and flagging unexpected operations. This can be used to boost the security of the whole system.
The recent Pipedream malware deliberately targets industrial automation and SCADA systems. It does not exploit a vulnerability but abuses the inherent functions of programmable logic controllers (PLCs). As a result, U.S. authorities are recommending the use of monitoring software to look for unexpected movement of data.
This monitoring needs to be highly automated to address industrial control applications with thousands of sensor nodes and gateways. The agent has to be independent and robust, surviving software failures in the node and providing a separate channel back to a centralized system where developers have full control, even if this means anonymizing the diagnostic data to keep it private.
The agent also has to be scalable, with a footprint small enough to fit into the microcontrollers used on sensor nodes as well as the Linux engines in the gateways. But the data also has to be usable. With many thousands of nodes across an Industry 4.0 implementation, automated alerts are essential to highlight problems, with direct links from the relevant data to visual traces in the code that can quickly and effectively show what happened. This gives the developer the fastest route to a solution, with an over-the-air update path to make the changes, ideally before a problem actually happens and certainly before it is noticed.
Johan Kraft is CEO and founder of Percepio AB. Prior to founding Percepio in 2009, he worked in embedded software development at ABB Robotics. He is the original developer of Percepio Tracealyzer, a tool for visual trace diagnostics, which combines with Percepio DevAlert, a cloud service for monitoring deployed IoT devices, to create a DevOps-style feedback loop from deployed devices back to the software developers, allowing for rapid continuous product improvement. He holds a PhD in computer science.