Enabling a secure edge in cloud-based AI with Google Anthos bare metal

Deploying Google Anthos on bare metal makes it possible to run any containerized service at the mission critical edge without compromising security or performance. This article looks at how services such as Google Cloud's Visual Inspection AI can be implemented to provide a validated solution for secure, video-based quality inspection.

The deployment of artificial intelligence (AI) in industrial settings, often in association with machine vision, is growing rapidly. In this context, it is increasingly difficult to separate the functions that take place in the cloud from those carried out at the edge: the two worlds are becoming interdependent, and that transformation creates significant security challenges. This article describes how developers might address these security risks.

The current state of industrial automation

To understand both the benefits and challenges of deploying real-time image capture in an industrial context, consider VHIT, a Bosch spinout that creates lubrication and vacuum pumps for electric, petrol and diesel vehicles. VHIT ships about six million pumps annually, which historically required workers to perform visual inspections of the filters. The introduction of a camera system improved the scrap rate, but the company was still experiencing a false-positive rate (i.e. filters that were deemed faulty but were perfectly good) of 2.3%, or 138,000 units. The vision system was then upgraded to include AI: an updateable database of images of good and bad parts was used to create an inference model to make decisions in the factory. This approach has dropped the false-positive rate to 0.2%.

Figure 1: The program captures data from cameras and harnesses machine learning algorithms to identify quality issues.

The program captures data from cameras on manufacturing plant floors and in logistics warehouses and harnesses machine learning algorithms to identify quality issues and feed information into the manufacturing execution system (MES), in order to generate an optimal decision in real time (figure 1). When securely connected to the cloud, the system benefits from continued access to advanced artificial intelligence algorithms and data analytics packages. The operative word in that sentence is 'securely'. Since these systems are critical to the manufacturing process, they need to be protected both against hacking and against the malfunction of other programs running on the same hardware.

How can this be achieved?

The current security context

Virtualization technology, whereby multiple operating systems run on shared hardware, is extremely well understood, if somewhat inefficient in its use of resources. For decades, virtual machines (VMs) were the standard way to host and manage infrastructure. More recently, industry has shifted towards containers, managed with systems such as Docker and Kubernetes.

The original virtualization architecture was based on running a number of VMs. Every VM has to run its own instance of an operating system, which duplicates effort and wastes resources. Such an infrastructure is also hard to manage, since each virtual machine is effectively an independent server.

Containers try to achieve the same concept as virtual machines but eliminate duplication of effort between machines. Instead of loading an entire operating system for an app, Docker lets containers use the kernel of the host OS while allowing them to sideload app-specific libraries and programs. By adjusting the container and its image, it is possible to fine-tune the specific libraries and configuration your app will use. This results in performance gains without the overhead of running an entire OS.
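
To make this concrete, here is a minimal sketch (in Python, using the Docker SDK) that builds a hypothetical inspection-service image. The Dockerfile layers only the app-specific libraries on a slim base image, while the kernel is supplied by the host at run time; the image name, base image and library choices are illustrative assumptions rather than details of any production system.

```python
# Minimal sketch: build a container image for a hypothetical inspection service
# with the Docker SDK for Python ("pip install docker"); requires a local Docker daemon.
import io

import docker

DOCKERFILE = b"""
# Shared base image; the kernel itself is provided by the host OS at run time.
FROM python:3.11-slim
# Only the app-specific libraries the inspection service needs -- no full guest OS.
RUN pip install --no-cache-dir numpy opencv-python-headless
CMD ["python", "-c", "print('inspection service placeholder')"]
"""

client = docker.from_env()
image, build_logs = client.images.build(
    fileobj=io.BytesIO(DOCKERFILE),   # Dockerfile supplied in-memory
    tag="inspection-service:0.1",
    rm=True,                          # remove intermediate containers after the build
)
print("Built image:", image.tags)
```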

A modern application consists of many containers. Operating them in production is the job of Kubernetes. Since containers are easy to replicate, applications can auto-scale: they expand or contract processing capacity to match demand.
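
As a minimal illustration of that elasticity, the sketch below uses the official Kubernetes Python client to change the replica count of a hypothetical 'inspection-service' Deployment; the deployment name and namespace are assumptions made for the example.

```python
# Minimal sketch: scale a hypothetical "inspection-service" Deployment up or down
# with the official Kubernetes Python client ("pip install kubernetes").
from kubernetes import client, config

config.load_kube_config()            # use the current kubeconfig context
apps = client.AppsV1Api()

def scale(deployment: str, replicas: int, namespace: str = "default") -> None:
    """Expand or contract processing capacity by changing the replica count."""
    apps.patch_namespaced_deployment_scale(
        name=deployment,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

scale("inspection-service", 4)       # e.g. ramp up for a busy production run
scale("inspection-service", 1)       # ...and back down when demand drops
```

In practice this adjustment is usually automated with a HorizontalPodAutoscaler, which scales the replica count against CPU or custom metrics rather than relying on a manual call.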

One of the challenges with containers is security: effectively, they must be managed with a zero-trust approach. Their use in mission critical environments therefore requires a "least privilege" model, whereby applications are granted only the minimum system resources needed to execute their task, together with strong isolation between applications, to give plant or facility managers confidence that the solution meets OT security, availability and performance requirements.
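
The sketch below suggests what such a least-privilege workload definition might look like when expressed with the Kubernetes Python client: the container runs as a non-root user, cannot escalate privileges, gets a read-only root filesystem, drops all Linux capabilities and is capped to explicit CPU and memory budgets. The container name, image and resource figures are placeholders, not values from the Lynx or Google reference design.

```python
# Minimal sketch of a "least privilege" pod built with the Kubernetes Python client:
# non-root, no privilege escalation, read-only root filesystem, all capabilities
# dropped, and explicit CPU/memory requests and limits.
from kubernetes import client

container = client.V1Container(
    name="inspection-worker",                      # placeholder name
    image="inspection-service:0.1",                # placeholder image
    security_context=client.V1SecurityContext(
        run_as_non_root=True,
        allow_privilege_escalation=False,
        read_only_root_filesystem=True,
        capabilities=client.V1Capabilities(drop=["ALL"]),
    ),
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},
        limits={"cpu": "500m", "memory": "512Mi"},
    ),
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inspection-worker"),
    spec=client.V1PodSpec(containers=[container]),
)
# To create it: client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```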

Google Anthos bare metal support enhances security

Support for Google Anthos bare metal deployment creates a solution that allows any containerized service to be deployed to the mission critical edge without compromising security or performance. For example, cloud software services such as Google Cloud's Visual Inspection AI can be implemented to provide a validated solution for secure, video-based quality inspection in industrial and energy facilities.

Google Anthos bare metal support means that an entire Kubernetes cluster can run locally on as little as one hardware system at the edge, with enterprise-grade Kubernetes and workload management, a fully managed service mesh with built-in visibility, and a consistent development and operations experience across on-premise and on-cloud deployments. In deploying this capability, Lynx Software Technologies has enabled 'virtual air gapping', providing isolation between the different parts of the system.

Previously, the merging of the operational technology (OT) and IT worlds – training AI and machine learning models in the cloud and deploying cloud-based workloads at the edge – posed security challenges in mission-critical industrial environments. Lynx ensures the three functions – image capture (camera), insight via an inference engine (Google Anthos) and action via a supervisory controller – are completely sandboxed, with the option of secure one-way (data diode) connections between them.

For visual inspection (VI), the model generation could be an on-cloud service: labeled data would be generated on-premise and fed to the VI model generation service in the cloud. Alternatively, a hybrid approach could be used, whereby the VI model generated in the cloud is deployed within an on-prem Google Anthos environment to perform image inferencing at the edge. There is also the possibility of an all on-premise solution, with both the VI model generation and the image inferencing services provisioned on-premise.
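
The hybrid option can be pictured with a short, hypothetical sketch: a frame is captured at the line, compressed and posted to an inference service running in the on-prem Anthos cluster, where the cloud-trained VI model returns a pass/reject verdict. The service URL and the response format are assumptions made for illustration; the actual Visual Inspection AI interfaces may differ.

```python
# Minimal sketch: capture a frame at the line, send it to an on-prem inference
# service hosting a cloud-trained visual inspection model, and act on the verdict.
# The endpoint URL and the JSON response format are hypothetical.
import cv2
import requests

INFER_URL = "http://inspection-service.default.svc.cluster.local/v1/infer"  # placeholder

cap = cv2.VideoCapture(0)                 # line-side inspection camera
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("no frame available from the camera")

ok, jpeg = cv2.imencode(".jpg", frame)    # compress before sending over the plant network
resp = requests.post(
    INFER_URL,
    data=jpeg.tobytes(),
    headers={"Content-Type": "image/jpeg"},
    timeout=2.0,                          # keep the decision loop close to real time
)
verdict = resp.json()                     # e.g. {"defect": false, "score": 0.03}
print("reject" if verdict.get("defect") else "pass")
```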

Google Anthos deployment model

The solution makes use of Google Anthos as a managed application platform that enables organizations to run Kubernetes and its associated workloads across multiple public clouds, hybrid clouds and on-premise compute clusters. What does deployment of this platform at the mission critical edge look like? The primary building blocks of a typical mission critical edge deployment include:

  • Platform software – deployed on the target systems (or nodes); allows multiple workloads to be hosted on the target systems.
  • Controller software – deployed (mostly) on-premise to manage the various nodes.
  • Application framework – provides a consistent control plane for end-user workloads (served either as standalone applications or as containers).
  • Workloads – software that end-users deploy onto the application framework above.

Anthos clusters on bare metal support three deployment models to meet different requirements: standalone cluster deployment, multi-cluster deployment and hybrid cluster deployment. While all three Google Anthos deployment models are relevant to the mission critical edge, we will focus here on standalone cluster deployment, which uses a single Kubernetes cluster to support both the admin and the user cluster functionality. A Google Anthos user cluster is a Kubernetes cluster that runs user workloads, whilst an admin cluster manages user clusters.

A standalone deployment model requires both control plane and worker nodes. The block diagram (figure 2) offers a high-level view of a standalone Google Anthos cluster running on a separation kernel hypervisor, LynxSecure, within the LYNX MOSA.ic software framework.

Figure 2: A high-level view of a standalone Google Anthos cluster running on LynxSecure, a separation kernel hypervisor.

Five VMs are set up to handle specific tasks: four Google Anthos cluster VMs, and a fifth VM for device management.

The four Google Anthos cluster VMs are:

  • 1 control plane Kubernetes cluster node (as a VM) – no high availability support.
  • 2 worker Kubernetes cluster nodes (as VMs) – includes support for high availability.
  • 1 Workstation VM – used to provision the control plane and worker nodes.

The fifth VM, for device management, handles incoming management requests. Typically, this is used in conjunction with management software (either a company's proprietary backend infrastructure or third-party technology such as ServiceNow).
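
As a rough illustration only (not Lynx's actual implementation), the device management VM can be pictured as exposing a small endpoint that answers incoming management requests, such as a health query from a backend system; the path, port and payload below are hypothetical.

```python
# Rough sketch of a device-management endpoint that answers incoming management
# requests (for example, a health query from a proprietary backend or ServiceNow).
# The path, port and payload are illustrative only.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class ManagementHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok", "managed_vms": 4}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Bind to the management network; the cluster VMs are not reachable from here.
    HTTPServer(("0.0.0.0", 8443), ManagementHandler).serve_forever()
```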

The strict isolation provided by LynxSecure means the individual VMs operate in their own fault zones. The cluster VMs (control plane and worker nodes) and the workstation VM are connected via virtual-ethernet links (implemented over managed shared memory). Although the VM that hosts device management has access to the Lynx management center, it has no internal connectivity to the cluster VMs. This arrangement, coupled with the strict isolation guarantees provided by the LynxSecure separation kernel hypervisor, ensures that the Google Anthos workloads are practically disconnected from management plane activities.
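
One way to picture the cluster side of this arrangement is a simple check run from the workstation VM: it queries the Kubernetes API over the virtual-ethernet link and confirms the expected topology of one control-plane node and two worker nodes. The node-role label used here follows the upstream Kubernetes convention and is an assumption; other releases may label the control plane differently.

```python
# Minimal sketch: from the workstation VM, list the cluster nodes and confirm the
# expected standalone topology (one control-plane node, two worker nodes).
from kubernetes import client, config

config.load_kube_config()
nodes = client.CoreV1Api().list_node().items

for node in nodes:
    labels = node.metadata.labels or {}
    role = "control-plane" if "node-role.kubernetes.io/control-plane" in labels else "worker"
    ready = next(c.status for c in node.status.conditions if c.type == "Ready")
    print(f"{node.metadata.name}: {role}, Ready={ready}")
```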

Conclusion

For industrial and energy companies that are already feeling the strain of supply chain disruptions, labor shortages and more, video-based quality systems enhanced with AI can play a significant role in improving performance and quality of output. As the VHIT example demonstrates, efficient visual inspection can cut false positives by up to 10x, prevent defective parts from being shipped, and provide insight into the cause of defects so that processes can be optimized.

However, the security risks associated with these deployments are real and significant. In this article, we have shown how technology now exists to mitigate these risks and create air-gapped implementations that fully deliver the benefits whilst minimizing the risk of security breaches due to software malfunction or external attack.


Ian Ferguson, Vice President of Marketing and Strategic Alliances - Lynx

Ian Ferguson is VP of sales and marketing at Lynx Software Technologies. As such, he is responsible for all aspects of the company's outward-facing presence to its customer, partner, press and analyst communities. He is also responsible for nurturing its partnership program to accelerate engagement in the automotive, industrial and IT infrastructure markets. Ian spent nearly eleven years at Arm, where he held roles leading teams in vertical marketing, corporate marketing and strategic alliances. Ian is a graduate of Loughborough University (UK) with a BSc in Electronic and Electrical Engineering.

