Running object detection models with Intel DevCloud for the Edge
As demand for compute at the edge continues to increase, developers face a variety of challenges in implementing AI and computer vision solutions. This article describes how Intel DevCloud for the Edge, a remote development environment, can be used to address these challenges and shows how you can run your own object detection model remotely.
Intel DevCloud for the Edge enables users to develop, prototype, benchmark, and test AI inference applications on a broad range of Intel hardware, including CPUs, iGPUs, FPGAs, and VPUs. DevCloud for the Edge uses a Jupyter Notebook interface and comes pre-installed with the latest version of the Intel Distribution of OpenVINO toolkit, which is necessary for optimization and heterogeneous distribution. All supported devices are configured for optimal performance and ready for inference execution.
Challenges addressed by DevCloud for the Edge include:
- Hardware choice paralysis: Developers can run AI applications remotely on a wide range of hardware to determine which is best for their solution based on factors such as inference execution time, power consumption, and cost.
- Access to edge hardware: Immediate remote access to the latest Intel hardware.
- Outdated software: Instant access to the latest version of Intel Distribution of OpenVINO toolkit and compatible Edge hardware.
- Access to performance benchmarks: Application-specific performance benchmarks, in an “easy to compare” side by side format.
- Unsure where to start: Tutorials and samples to help developers get started.
Visit Intel DevCloud for the Edge to learn more or register for free access.
1. Introduction
This tutorial dives into what it takes to perform an object detection task using the pre-trained MobileNet-SSD CNN, the Intel Distribution of OpenVINO toolkit for inference, and the ability to run inference on Intel CPUs, iGPUs, VPUs, and accelerator devices. All software and hardware dependencies are set up, so all you have to do is log in, load the Jupyter notebook, and follow along. The tutorial not only shows you how to use the Intel Distribution of OpenVINO toolkit APIs, but also gives you the flexibility to use your own images or video and your own pre-trained AI model if the ones used in the tutorial don’t satisfy your use case.
Readers can follow along in their own browser by registering and accessing the Object Detection Tutorial here.
1.1. Prerequisites
A series of files needed to run the application are included on the development server in the Jupyter Notebook environment. You can access these files within the Object Detection Tutorial in the Jupyter Notebook file directory. The tutorial includes the following:
- All files are present in the following directory structure:
- tutorial_object_detection_ssd.ipynb – This Jupyter Notebook
- mobilenet-ssd/mobilenet-ssd.bin and mobilenet-ssd.xml – The IR files for the inference model created using Model Optimizer
- mobilenet-ssd/labels.txt – Mapping of numerical labels to text strings
- face.jpg – Test image
- car.bmp – Test image
- doc_*.png – Images used in the documentation
- Optional: URL to user’s image or video to run inference on
Note: It is assumed that the server this tutorial is being run on has Jupyter Notebook, the Intel Distribution of OpenVINO toolkit, and other required libraries already installed. If you download or copy this tutorial to a new server, it may not run.
The pre-trained model used for object detection is “mobilenet-ssd”, which has already been converted to the Intermediate Representation (IR) files needed by the Inference Engine (conversion is not covered here; see the Intel Distribution of OpenVINO toolkit documentation for details). The model is capable of detecting different objects, including airplane, bicycle, bird, boat, bus, car, cat, dog, horse, person, and more (see the complete list here).
1.2. Key concepts
Before going through the tutorial steps, we will go over some key concepts used throughout this tutorial.
1.2.1. Intel Distribution of OpenVINO toolkit overview and terminology
The Intel Distribution of OpenVINO toolkit enables the quick deployment of convolutional neural networks (CNN) for heterogeneous execution on Intel hardware while maximizing performance. This is done using the Intel Deep Learning Deployment Toolkit (Intel DLDT) included within the Intel Distribution of OpenVINO toolkit with its main components shown below.
Many CNNs have been trained and optimized for size and speed and are available publicly. This tutorial uses MobileNet-SSD, an optimization of the MobileNet model. Learn more about MobileNet-SSD here.
The basic flow is:
- Use a tool, such as Caffe, to create and train a CNN inference model
- Run the created model through Model Optimizer to produce an optimized Intermediate Representation (IR) stored in files (.bin and .xml) for use with the Inference Engine
- The User Application then loads and runs models on devices using the Inference Engine and the IR files
This tutorial will focus on the last step, the User Application and using the Inference Engine to run a model on a CPU.
1.2.2. Using the Inference Engine
Below is a more detailed view of the User Application and Inference Engine:
The Inference Engine includes a plugin library for each supported device (CPU, iGPU, and VPU), optimized for that Intel hardware. From here, we will use the terms “device” and “plugin” with the assumption that one infers the other (e.g. the CPU device infers the CPU plugin and vice versa). As part of loading the model, the User Application tells the Inference Engine which device to target, which in turn loads the associated plugin library to later run on the associated device. The Inference Engine uses “blobs” for all data exchanges; these are basically arrays in memory arranged according to the input and output data of the model.
Inference engine API integration flow
Using the inference engine API follows the basic steps outlined briefly below. The API objects and functions will be seen later in the sample code.
- Load the plugin
- Read the model IR
- Load the model into the plugin
- Prepare the input
- Run inference
- Process the output
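To give a sense of how these steps map to actual API calls, here is a minimal end-to-end sketch using the same openvino.inference_engine calls that appear later in this tutorial; the model file names, image file, and device are placeholders for illustration, not files provided with the tutorial:

```python
import cv2
from openvino.inference_engine import IECore

ie = IECore()                                                   # 1. load the plugin(s)
net = ie.read_network(model="model.xml", weights="model.bin")   # 2. read the model IR (placeholder paths)
exec_net = ie.load_network(network=net, device_name="CPU")      # 3. load the model into the plugin

input_blob = next(iter(net.inputs))                             # 4. prepare the input
n, c, h, w = net.inputs[input_blob].shape
frame = cv2.resize(cv2.imread("image.jpg"), (w, h))             #    placeholder image
frame = frame.transpose((2, 0, 1)).reshape((n, c, h, w))

res = exec_net.infer(inputs={input_blob: frame})                # 5. run inference

output_blob = next(iter(net.outputs))                           # 6. process the output (model specific)
detections = res[output_blob]
```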
More details on the Inference Engine can be found in the Inference Engine Development Guide.
1.3. Input preprocessing
This tutorial and the many samples in the Intel Distribution of OpenVINO toolkit use OpenCV to perform resizing of input data. The basic steps performed using OpenCV are:
- Resize image dimensions from the original image to the model’s input W x H:
  frame = cv2.resize(image, (w, h))
- Change data layout from (H x W x C) to (C x H x W):
  frame = frame.transpose((2, 0, 1))
- Reshape to match the model’s input dimensions:
  frame = frame.reshape((n, c, h, w))
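As a concrete illustration, the short sketch below applies those three steps to a dummy image, assuming example model input dimensions of n=1, c=3, h=300, w=300 (the real dimensions are read from the model later in the tutorial):

```python
import cv2
import numpy as np

# assumed model input dimensions, for illustration only
n, c, h, w = 1, 3, 300, 300

# dummy 480x640 BGR image standing in for a real photo
image = np.zeros((480, 640, 3), dtype=np.uint8)

frame = cv2.resize(image, (w, h))     # (480, 640, 3) -> (300, 300, 3), still H x W x C
frame = frame.transpose((2, 0, 1))    # (300, 300, 3) -> (3, 300, 300), now C x H x W
frame = frame.reshape((n, c, h, w))   # (3, 300, 300) -> (1, 3, 300, 300)
print(frame.shape)                    # (1, 3, 300, 300)
```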
2. Sample application
The following sections will guide you through the sample application.
2.1. Imports
We begin by importing all the Python modules that will be used by the sample code:
- os – Operating system specific module (used for file name parsing)
- cv2 – OpenCV module
- time – time tracking module (used for measuring execution time)
- openvino.inference_engine – the IECore object (Inference Engine Python API)
- matplotlib – pyplot is used for displaying output images
Run the cell below to import Python dependencies needed for displaying the results in this notebook.
```python
import os
import cv2
import time
from openvino.inference_engine import IECore
%matplotlib inline
from matplotlib import pyplot as plt
print('Imported Python modules successfully.')
```
2.2. Configuration
Here we will create and set the following configuration parameters used by the sample:
- model_xml – Path to the .xml IR file of the trained model to use for inference
- model_bin – Path to the .bin IR file of the trained model to use for inference (derived from model_xml)
- input_path – Path to input image
- device – The target device to infer on; CPU, GPU, FPGA, or MYRIAD is acceptable, but the device must be present. For this tutorial we use “CPU”, which is known to be present.
- labels_path – Path to labels mapping file used to map outputted integers to strings (e.g. 7=”car”)
- prob_threshold – Probability threshold for filtering detection results
We will set all parameters here only once except for input_path which we will change later to point to different images and video.
```python
# model IR files
model_xml = "./mobilenet-ssd/mobilenet-ssd.xml"
model_bin = os.path.splitext(model_xml)[0] + ".bin"  # create IR .bin filename from path to IR .xml file

# input image file
input_path = "car.bmp"

# CPU extension library to use
cpu_extension_path = os.path.expanduser("~") + "/inference_engine_samples/intel64/Release/lib/libcpu_extension.so"

# device to use
device = "CPU"

# output labels
labels_path = "./mobilenet-ssd/labels.txt"

# minimum probability threshold to detect an object
prob_threshold = 0.5

print("Configuration parameters settings:"
      "\n\tmodel_xml=", model_xml,
      "\n\tmodel_bin=", model_bin,
      "\n\tinput_path=", input_path,
      "\n\tdevice=", device,
      "\n\tlabels_path=", labels_path,
      "\n\tprob_threshold=", prob_threshold)
```
2.3. Create inference engine instance
Next we create the Inference Engine instance to be used by our application.
```python
# create Inference Engine instance
ie = IECore()
print("An Inference Engine object has been created")
```
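If you are unsure which devices are present on the node you are running on, the IECore instance can report them. This optional check is not part of the original tutorial flow, but it is a quick way to confirm a device exists before setting the device parameter:

```python
# list the inference devices visible to the Inference Engine on this node
# (for example ['CPU', 'GPU', 'MYRIAD'], depending on the hardware)
print("Available devices:", ie.available_devices)
```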
2.4. Create network
Here we read the model’s IR files into an IENetwork object. After loading the model, we check to make sure that all the model’s layers are supported by the plugin we will use. We also check that the model’s input and output are as expected for later when we run inference.
```python
# load network from IR files
net = ie.read_network(model=model_xml, weights=model_bin)
print("Loaded model IR files [", model_bin, "] and [", model_xml, "]\n")

# check to make sure that the plugin has support for all layers in the loaded model
supported_layers = ie.query_network(net, device)
not_supported_layers = [l for l in net.layers.keys() if l not in supported_layers]
if len(not_supported_layers) != 0:
    print("ERROR: Following layers are not supported by the plugin for specified",
          "device {}:\n {}".format(device, ', '.join(not_supported_layers)))
    assert 0 == 1, "ERROR: Missing support for all layers in the model, cannot continue."

# check to make sure that the model's input and output are what is expected
assert len(net.inputs.keys()) == 1, \
    "ERROR: This sample supports only single input topologies"
assert len(net.outputs) == 1, \
    "ERROR: This sample supports only single output topologies"
print("SUCCESS: Model IR files have been loaded and verified")
```
2.5. Load model
Here we load the model network into the plugin so that we may run inference. exec_net will be used later to actually run inference. After loading, we store the names of the input (input_blob) and output (output_blob) blobs to use when accessing the input and output blobs of the model. Lastly, we store the model’s input dimensions into the following variables:
- n = input batch size
- c = number of input channels (one channel each for R, G, and B)
- h = input height
- w = input width
```python
# load the model into the Inference Engine for our device
exec_net = ie.load_network(network=net, num_requests=2, device_name=device)

# store name of input and output blobs
input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))

# read the input's dimensions: n=batch size, c=number of channels, h=height, w=width
n, c, h, w = net.inputs[input_blob].shape
print("Loaded model into Inference Engine for device:", device,
      "\nModel input dimensions: n=", n, ", c=", c, ", h=", h, ", w=", w)
```
2.6. Load labels
For each detected object, the output from the model will include an integer to indicate which type (e.g. car, person, etc.) of trained object has been detected. To translate the integer into a more readable text string, a label mapping file may be used. The label mapping file is simply a text file of the format “n: string” (e.g. “7: car” for 7=”car”) that is loaded into a lookup table to be used later while labeling detected objects.
Here, if the labels_path variable has been set to point to a label mapping file, we open the file and load the labels into the variable labels_map.
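To make the parsing in the next cell concrete, here is a small sketch of how lines in the “n: string” format described above would be turned into labels_map entries; the example lines are hypothetical and for illustration only:

```python
# hypothetical label-file lines in the "n: string" format
example_lines = ["6: bus\n", "7: car\n"]

# same parsing expression as the cell below: keep the text after the first space
example_map = [x.split(sep=' ', maxsplit=1)[-1].strip() for x in example_lines]
print(example_map)  # ['bus', 'car']
```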
```python
labels_map = None
# if labels_path points to a label mapping file, then load the file into labels_map
print(labels_path)
if os.path.isfile(labels_path):
    with open(labels_path, 'r') as f:
        labels_map = [x.split(sep=' ', maxsplit=1)[-1].strip() for x in f]
    print("Loaded label mapping file [", labels_path, "]")
else:
    print("No label mapping file has been loaded, only numbers will be used",
          "for detected object labels")
```
2.7. Prepare input
Here we read and then prepare the input image by resizing and re-arranging its dimensions according to the model’s input dimensions. We define the functions loadInputImage() and resizeInputImage() for the operations so that we may reuse them again later in the tutorial.
```python
# define function to load an input image
def loadInputImage(input_path, verbose = True):
    # globals to store input width and height
    global input_w, input_h

    # use OpenCV to load the input image
    cap = cv2.VideoCapture(input_path)

    # store input width and height
    input_w = cap.get(3)
    input_h = cap.get(4)
    if verbose:
        print("Loaded input image [", input_path, "], resolution=", input_w, "w x ", input_h, "h")

    # load the input image
    ret, image = cap.read()
    del cap
    return image

# define function for resizing input image
def resizeInputImage(image, verbose = True):
    # resize image dimensions from image to model's input w x h
    in_frame = cv2.resize(image, (w, h))
    # change data layout from HWC to CHW
    in_frame = in_frame.transpose((2, 0, 1))
    # reshape to input dimensions
    in_frame = in_frame.reshape((n, c, h, w))
    if verbose:
        print("Resized input image from {} to {}".format(image.shape[:-1], (h, w)))
    return in_frame

# load image
image = loadInputImage(input_path)

# resize the input image
in_frame = resizeInputImage(image)

# display input image
print("Input image:")
plt.axis("off")
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
```
2.8. Run inference
Now that we have the input image in the correct format for the model, we run inference on the input image, which was previously set to ./car.bmp:
```python
# save start time
inf_start = time.time()

# run inference
res = exec_net.infer(inputs={input_blob: in_frame})

# calculate time from start until now
inf_time = time.time() - inf_start
print("Inference complete, run time: {:.3f} ms".format(inf_time * 1000))
```
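Note that exec_net was created with num_requests=2, which also allows inference to be started asynchronously. The call above is synchronous; a rough sketch of the asynchronous equivalent (not used in the rest of this tutorial, shown only for orientation) might look like this:

```python
# start the request without blocking, then wait for it to complete
exec_net.start_async(request_id=0, inputs={input_blob: in_frame})
if exec_net.requests[0].wait(-1) == 0:        # 0 indicates the request completed successfully
    res = exec_net.requests[0].outputs        # dict of output blob name -> result array
```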
2.9. Process results
Now we parse the inference results and, for each detected object, draw boxes with text annotations on the image. We define the function processResults() so that we may use it again later in the tutorial to process results.
res is set to the output of the inference model, which is an array of results with one element for each detected object. We loop through res, setting obj to hold the results for each detected object, which appear in obj as:
- obj[1] = Class ID (type of object detected)
- obj[2] = Probability of detected object
- obj[3] = Lower x coordinate of detected object
- obj[4] = Lower y coordinate of detected object
- obj[5] = Upper x coordinate of detected object
- obj[6] = Upper y coordinate of detected object
```python
# create function to process inference results
def processResults(result):
    # get output results
    res = result[output_blob]

    # loop through all possible results
    for obj in res[0][0]:
        # if probability is more than specified threshold, draw and label box
        if obj[2] > prob_threshold:
            # get coordinates of box containing detected object
            xmin = int(obj[3] * input_w)
            ymin = int(obj[4] * input_h)
            xmax = int(obj[5] * input_w)
            ymax = int(obj[6] * input_h)

            # get type of object detected
            class_id = int(obj[1])

            # draw box and label for detected object
            color = (min(class_id * 12.5, 255), 255, 255)
            cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, 4)
            det_label = labels_map[class_id] if labels_map else str(class_id)
            cv2.putText(image, det_label + ' ' + str(round(obj[2] * 100, 1)) + ' %',
                        (xmin, ymin - 7), cv2.FONT_HERSHEY_COMPLEX, 1, color, 2)

processResults(res)
print("Processed inference output results.")
```
2.10. Display results
Now that the results from inference have been processed, we display the image to see what has been detected.
```python
# convert colors BGR -> RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# disable axis display, then display image
plt.axis("off")
plt.imshow(image)
```
3. Exercises
The remainder of the article demonstrates how the application can be run on other images and on video files. As with all deep neural network models, accuracy depends on the training set and its labels, so the model may perform poorly on images that differ from those used in training. Still, running inference on varied image sets is a good exercise for understanding how well a model generalizes to new input data.
You can find more info on the MobileNet-SSD model here, including how to train the model on your own datasets.
3.1. Run a different image
Now that we have seen all the steps, let us run them again on a different image. We also define inferImage() to combine the input processing, inference, and results processing so that we may use it again later in the tutorial.
```python
# define function to prepare input, run inference, and process inference results
def inferImage(image, verbose = True):
    # prepare input
    in_frame = resizeInputImage(image, verbose)

    # run inference
    res = exec_net.infer(inputs={input_blob: in_frame})

    # process inference results
    processResults(res)

# set path to different input image
input_path = "face.jpg"

# load input image
image = loadInputImage(input_path)

# display input image
plt.axis("off")
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# infer image
inferImage(image)

# display image with inference results
# convert colors BGR -> RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# create new figure, disable axis display, then display image
plt.figure()
plt.axis("off")
plt.imshow(image)
```
3.2. Run your own image
Here you may run any image you would like by setting the input_path variable which may be set to a local file or URL. A sample URL is provided as an example.
```python
# input_path may be set to a local file or URL
input_path = "https://github.com/chuanqi305/MobileNet-SSD/raw/master/images/004545.jpg"

# load input image
image = loadInputImage(input_path)

# display input image
plt.axis("off")
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

# infer image
inferImage(image)

# display image with inference results
# convert colors BGR -> RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# create figure, disable axis display, then display image
plt.figure()
plt.axis("off")
plt.imshow(image)
```
3.3. Running inference on video
We have seen how to run inference on individual images; now how do we do video? Running inference on video is much the same as for a single image, except that a loop is necessary to process all the frames in the video. In the code below, we use the same method of loading a video as we did for an image, but now include a while-loop to keep reading frames until cap.isOpened() returns False or cap.read() sets ret to False:
```python
while cap.isOpened():
    # read video frame
    ret, im = cap.read()
    # break if no more video frames
    if not ret:
        break
    ...
```

The full cell below loads the video, skips an initial number of frames, and then runs inference and displays the results for a limited number of frames:

```python
# close and then re-import matplotlib to be able to update output images for video
plt.close()
%matplotlib notebook
#%matplotlib inline
from matplotlib import pyplot as plt

# disable axis display
plt.axis("off")

# input_path may be set to local file or URL
input_path = "/data/reference-sample-data/object-detection-python/cars_1900.mp4"
print(os.path.exists(input_path))
print("Loading video [", input_path, "]")

# use OpenCV to load the input video
cap = cv2.VideoCapture(input_path)

# track frame count and set maximum
frame_num = 0
max_frame_num = 60
skip_num_frames = 100
last_frame_num = cap.get(cv2.CAP_PROP_FRAME_COUNT)
if last_frame_num < 1:
    last_frame_num = "unknown"

while cap.isOpened():
    # read video frame
    ret, image = cap.read()

    # break if no more video frames
    if not ret:
        break

    frame_num += 1

    # skip skip_num_frames frames, then infer max_frame_num frames from there
    if frame_num > skip_num_frames:
        # infer image
        inferImage(image, False)

        # display results: convert colors BGR -> RGB
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # show image then force re-draw to show new image
        plt.imshow(image)
        plt.gcf().canvas.draw()

    # display current frame number
    print("Frame #:", frame_num, "/", last_frame_num, end="\r")

    # limit number of frames; video can be slow and gets slower the more frames that are processed
    if frame_num >= (max_frame_num + skip_num_frames):
        print("\nStopping at frame #", frame_num)
        break

print("\nDone.", frame_num)
```
3.4. Run your own video
If you would like to see inference run on your own video, you may do so by first setting input_path to a local file or URL and then re-executing the cell above. For example, you could use the video https://github.com/intel-iot-devkit/sample-videos/raw/master/person-bicycle-car-detection.mp4 by replacing the input_path = "..." line above with:

```python
input_path = "https://github.com/intel-iot-devkit/sample-videos/raw/master/person-bicycle-car-detection.mp4"
```
You can control which frame to start from by setting skip_num_frames which will skip that many frames. You can also control how many frames to show by setting max_frame_num.
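For example, to start at the beginning of the video and show only 30 frames, you could set the following before re-running the cell above:

```python
# process frames from the start of the video and stop after 30 frames
skip_num_frames = 0
max_frame_num = 30
```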
Note: There are more videos available to choose from at: https://github.com/intel-iot-devkit/sample-videos/
4. Cleanup
Now that we are done running the sample, we clean up by deleting objects before exiting.
```python
del exec_net
del net
del ie
print("Resource objects removed")
```
5. Next steps
You can run this application yourself using Intel DevCloud for the Edge. Register here for free and access the Object Detection Tutorial and other tutorials here.
This tutorial has explained how to run an object detection application using Intel DevCloud for the Edge. Here are things developers can do to build on what’s been done in this tutorial:
- Visit Sample Applications to learn how to target specific hardware edge nodes to benchmark performance on a variety of devices and begin model optimization for specific devices.
- Experiment with model accuracy, including retraining the model with new training data using tools such as TensorFlow, ONNX, and PyTorch.
- Learn how applications can be deployed to edge devices.
6. More information
- More Jupyter Notebook Tutorials – additional sample application Jupyter Notebook tutorials
- Jupyter Notebook Samples – sample applications
- Intel Distribution of OpenVINO toolkit Main Page – learn more about the tools and use of the Intel Distribution of OpenVINO toolkit for implementing inference on the edge
- For technical support, please visit the Intel DevCloud Forums