Running object detection models with Intel DevCloud for the Edge


As demand for compute at the edge continues to increase, developers face a variety of challenges in implementing AI and computer vision solutions. This article explains how Intel DevCloud for the Edge, a remote development environment, can be used to address these challenges and shows how you can run your own object detection model remotely.

Intel DevCloud for the Edge enables users to develop, prototype, benchmark, and test AI inference applications on a broad range of Intel hardware, including CPUs, iGPUs, FPGAs, and VPUs. DevCloud for the Edge uses a Jupyter Notebook interface and comes pre-installed with the latest version of the Intel Distribution of OpenVINO toolkit, which is used to optimize models and distribute inference across heterogeneous hardware. All supported devices are configured for optimal performance and ready for inference execution.

Challenges addressed by DevCloud for the Edge include:

  • Hardware choice paralysis: Developers can run AI applications remotely on a wide range of hardware to determine which is best for their solution based on factors such as inference execution time, power consumption, and cost.
  • Access to edge hardware: Immediate remote access to the latest Intel hardware.
  • Outdated software: Instant access to the latest version of the Intel Distribution of OpenVINO toolkit and compatible edge hardware.
  • Access to performance benchmarks: Application-specific performance benchmarks in an easy-to-compare, side-by-side format.
  • Unsure where to start: Tutorials and samples to help developers get started.

Visit Intel DevCloud for the Edge to learn more or register for free access.

1.   Introduction

This tutorial walks through what it takes to perform an object detection task using a pre-trained MobileNet-SSD CNN, the Intel Distribution of OpenVINO toolkit for inference, and Intel CPUs, iGPUs, VPUs, and accelerator devices as inference targets. All software and hardware dependencies are set up, so all you have to do is log in, load the Jupyter notebook, and follow along. The tutorial not only shows you how to use the Intel Distribution of OpenVINO toolkit APIs, but also gives you the flexibility to use your own images, video, or pre-trained AI model if the ones used in the tutorial don't satisfy your use case.

Readers can follow along in their own browser by registering and accessing the Object Detection Tutorial here.

1.1.  Prerequisites

The files needed to run the application are included on the development server in the Jupyter Notebook environment; you can find them in the Object Detection Tutorial's directory in the Jupyter Notebook file browser. The tutorial includes the following:

  • All files are present and in the following directory structure:
    • tutorial_object_detection_ssd.ipynb – This Jupyter Notebook
    • mobilenet-ssd/mobilenet-ssd.bin and mobilenet-ssd.xml – The IR files for the inference model created using Model Optimizer
    • mobilenet-ssd/labels.txt – Mapping of numerical labels to text strings
    • face.jpg – Test image
    • car.bmp – Test image
    • doc_*.png – Images used in the documentation
  • Optional: URL to user’s image or video to run inference on

Note: This tutorial assumes that the server it runs on already has Jupyter Notebook, the Intel Distribution of OpenVINO toolkit, and the other required libraries installed. If you download it or copy it to a different server, it may not run.

The pre-trained model used for object detection is “mobilenet-ssd”, which has already been converted to the Intermediate Representation (IR) files required by the Inference Engine. Conversion is not covered here; see the Intel Distribution of OpenVINO toolkit documentation for details. The model can detect objects including airplanes, bicycles, birds, boats, buses, cars, cats, dogs, horses, people, and more (the complete list is in mobilenet-ssd/labels.txt).

1.2.  Key concepts

Before going through the tutorial steps, we review some key concepts used throughout this tutorial.

1.2.1  Intel Distribution of OpenVINO toolkit overview and terminology

The Intel Distribution of OpenVINO toolkit enables quick deployment of convolutional neural networks (CNNs) for heterogeneous execution on Intel hardware while maximizing performance. It does this through the Intel Deep Learning Deployment Toolkit (Intel DLDT) included in the Intel Distribution of OpenVINO toolkit, whose two main components, the Model Optimizer and the Inference Engine, are shown below.

Many CNNs have been trained and optimized for size and speed and are publicly available. This tutorial uses MobileNet-SSD, a Single Shot MultiBox Detector (SSD) network that uses the lightweight MobileNet architecture as its feature extractor. Learn more about MobileNet-SSD here.

The basic flow is:

  1. Use a framework, such as Caffe, to create and train a CNN inference model
  2. Run the created model through the Model Optimizer to produce an optimized Intermediate Representation (IR) stored in files (.bin and .xml) for use with the Inference Engine
  3. The User Application then loads and runs models on devices using the Inference Engine and the IR files

This tutorial focuses on the last step: the User Application, which uses the Inference Engine to run a model on a CPU.

1.2.2  Using the Inference Engine

Below is a more detailed view of the User Application and Inference Engine:

The Inference Engine includes a plugin library for each supported device, each optimized for the corresponding Intel hardware: CPU, iGPU, and VPU. From here on, we use the terms “device” and “plugin” interchangeably, with the assumption that one implies the other (e.g., the CPU device implies the CPU plugin and vice versa). When loading the model, the User Application tells the Inference Engine which device to target, and the Inference Engine in turn loads the associated plugin library to run the model on that device. The Inference Engine uses “blobs” for all data exchanges; blobs are essentially arrays in memory arranged according to the model's input and output data layouts.

Inference engine API integration flow

Using the Inference Engine API follows the basic steps outlined below; the corresponding API objects and functions appear later in the sample code.

  1. Load the plugin
  2. Read the model IR
  3. Load the model into the plugin
  4. Prepare the input
  5. Run inference
  6. Process the output

More details on the Inference Engine can be found in the Inference Engine Development Guide.

1.3. Input preprocessing

This tutorial, like many of the samples in the Intel Distribution of OpenVINO toolkit, uses OpenCV to resize input data. The basic preprocessing steps, performed with OpenCV and NumPy array methods, are:

  1. Resize the image from its original dimensions to the model's input W x H:
    frame = cv2.resize(image, (w, h))
  2. Change data layout from (H x W x C) to (C x H x W)
    frame = frame.transpose((2, 0, 1))
  3. Reshape to match input dimensions
    frame = frame.reshape((n, c, h, w))
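To make the layout change concrete, here is a minimal pure-Python sketch of the same HWC to CHW rearrangement plus the leading batch dimension, using nested lists instead of NumPy arrays so it runs without OpenCV. The function name hwc_to_nchw is hypothetical; it only illustrates what transpose((2, 0, 1)) followed by reshape((n, c, h, w)) does when n = 1.

```python
from typing import List

Pixel = List[int]               # one pixel: C channel values
ImageHWC = List[List[Pixel]]    # H rows, each holding W pixels


def hwc_to_nchw(image: ImageHWC) -> List[List[List[List[int]]]]:
    """Rearrange an H x W x C image into a 1 x C x H x W batch.

    Mirrors frame.transpose((2, 0, 1)) followed by
    frame.reshape((1, c, h, w)) for a batch size of 1.
    """
    h = len(image)
    w = len(image[0])
    c = len(image[0][0])
    # chw[ch][y][x] takes the value of image[y][x][ch]
    chw = [[[image[y][x][ch] for x in range(w)] for y in range(h)] for ch in range(c)]
    # add the leading batch dimension (n = 1)
    return [chw]


# 2 x 2 image with 3 channels; each value is 100*row + 10*column + channel
img = [
    [[0, 1, 2], [10, 11, 12]],
    [[100, 101, 102], [110, 111, 112]],
]
batch = hwc_to_nchw(img)
print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))  # 1 3 2 2
print(batch[0][0])  # [[0, 10], [100, 110]], the first channel plane
```

After the rearrangement, each channel is a contiguous H x W plane, which is the layout the model's input blob expects.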

2. Sample application

The following sections guide you through a sample application.

2.1.  Imports

We begin by importing all the Python modules that will be used by the sample code:

  • os – Operating system specific module (used for file name parsing)
  • cv2 – OpenCV module
  • time – time tracking module (used for measuring execution time)
  • openvino.inference_engine – the IECore object, which creates the IENetwork and executable network objects
  • matplotlib – pyplot is used for displaying output images

Run the cell below to import the Python dependencies used by this notebook.

import os
import cv2
import time
from openvino.inference_engine import IECore
%matplotlib inline
from matplotlib import pyplot as plt
print('Imported Python modules successfully.')

2.2.  Configuration

Here we will create and set the following configuration parameters used by the sample:

  • model_xml – Path to the .xml IR file of the trained model to use for inference
  • model_bin – Path to the .bin IR file of the trained model to use for inference (derived from model_xml)
  • input_path – Path to input image
  • device – The target device to run inference on. CPU, GPU (the integrated GPU), FPGA, or MYRIAD (VPU) are acceptable values, but the device must be present. For this tutorial we use “CPU”, which is known to be present.
  • labels_path – Path to labels mapping file used to map outputted integers to strings (e.g. 7=”car”)
  • prob_threshold – Probability threshold for filtering detection results

We set all parameters here only once, except for input_path, which we change later to point to different images and video.

# model IR files
model_xml = "./mobilenet-ssd/mobilenet-ssd.xml"
model_bin = os.path.splitext(model_xml)[0] + ".bin" # create IR .bin filename from path to IR .xml file

# input image file
input_path = "car.bmp"

# CPU extension library path (defined for reference; not used by the code below)
cpu_extension_path = os.path.expanduser("~")+"/inference_engine_samples/intel64/Release/lib/"

# device to use
device = "CPU"

# output labels 
labels_path = "./mobilenet-ssd/labels.txt"

# minimum probability threshold to detect an object
prob_threshold = 0.5
print("Configuration parameters settings:"
    "\n\tmodel_xml=", model_xml,
     "\n\tmodel_bin=", model_bin,
     "\n\tinput_path=", input_path,
     "\n\tdevice=", device, 
     "\n\tlabels_path=", labels_path, 
     "\n\tprob_threshold=", prob_threshold)

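Before running the rest of the sample, it can help to catch configuration mistakes early. The sketch below is a hypothetical helper, check_config(), that is not part of the tutorial or of OpenVINO. It checks that both IR files exist, that the device name is one of the values listed above, and that prob_threshold is a valid probability. The KNOWN_DEVICES set is an assumption; adjust it to the devices actually available (heterogeneous device strings, for example, are not covered).

```python
import os

# Device names accepted by this sketch; an assumption based on the list above,
# adjust to the devices actually present on your DevCloud node.
KNOWN_DEVICES = {"CPU", "GPU", "FPGA", "MYRIAD"}


def check_config(model_xml: str, device: str, prob_threshold: float) -> list:
    """Return a list of problems found in the sample's configuration (hypothetical helper)."""
    problems = []
    if not model_xml.endswith(".xml"):
        problems.append("model_xml should point to an IR .xml file")
    # derive the .bin path the same way the configuration cell does
    model_bin = os.path.splitext(model_xml)[0] + ".bin"
    for path in (model_xml, model_bin):
        if not os.path.isfile(path):
            problems.append("missing IR file: " + path)
    if device not in KNOWN_DEVICES:
        problems.append("unexpected device name: " + device)
    if not 0.0 <= prob_threshold <= 1.0:
        problems.append("prob_threshold must be between 0 and 1")
    return problems


for problem in check_config("./mobilenet-ssd/mobilenet-ssd.xml", "CPU", 0.5):
    print("WARNING:", problem)
```

In the notebook, you could call check_config(model_xml, device, prob_threshold) right after the configuration cell and stop if the returned list is not empty.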
2.3.  Create inference engine instance

Next we create the Inference Engine instance to be used by our application.

# create Inference Engine instance
ie = IECore()
print("An Inference Engine object has been created")

2.4.  Create network

Here we create an IENetwork object and load the model's IR files into it. After loading the model, we check that all of the model's layers are supported by the plugin we will use. We also verify that the model has exactly one input and one output, as the inference code later in the sample expects.

# load network from IR files
net = ie.read_network(model=model_xml, weights=model_bin)
print("Loaded model IR files [",model_bin,"] and [", model_xml, "]\n")

# check to make sure that the plugin has support for all layers in the loaded model
supported_layers = ie.query_network(net, device)
not_supported_layers = [l for l in net.layers.keys() if l not in supported_layers]
if len(not_supported_layers) != 0:
   print("ERROR: Following layers are not supported by the plugin for specified",
         " device {}:\n {}".format(device, ', '.join(not_supported_layers)))
   raise RuntimeError("ERROR: Missing support for all layers in the model, cannot continue.")

# check to make sure that the model's input and output are what is expected
assert len(net.inputs.keys()) == 1, \
   "ERROR: This sample supports only single input topologies"
assert len(net.outputs) == 1, \
   "ERROR: This sample supports only single output topologies"
print("SUCCESS: Model IR files have been loaded and verified")
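The support check above boils down to comparing the model's layer names against the keys returned by ie.query_network(), a dictionary that maps each supported layer name to a device. The minimal sketch below reproduces that logic on plain Python data so it runs without OpenVINO; the layer names, the query result, and the helper unsupported_layers() are made up for illustration.

```python
from typing import Dict, Iterable, List


def unsupported_layers(model_layers: Iterable[str], supported: Dict[str, str]) -> List[str]:
    """Return model layers missing from the query result, preserving model order."""
    return [name for name in model_layers if name not in supported]


# hypothetical layer names and a hypothetical query_network() result
layers = ["data", "conv1", "conv1/relu", "detection_out"]
query_result = {"data": "CPU", "conv1": "CPU", "conv1/relu": "CPU"}

missing = unsupported_layers(layers, query_result)
if missing:
    print("Unsupported layers:", ", ".join(missing))  # Unsupported layers: detection_out
```

Keeping the model's layer order (rather than using a set difference) makes the error message list layers in the order they appear in the network.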

2.5.  Load model

Here we load the network into the plugin so that we can run inference; exec_net is used later to actually run inference. After loading, we store the names of the input (input_blob) and output (output_blob) blobs, which we use to access the model's input and output data. Finally, we store the model's input dimensions in the following variables:

  • n = input batch size
  • c = number of input channels (here 3, one per color channel)
  • h = input height
  • w = input width

# load the model into the Inference Engine for our device
exec_net = ie.load_network(network=net, num_requests=2, device_name=device)

# store name of input and output blobs
input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))

# read the input's dimensions: n=batch size, c=number of channels, h=height, w=width
n, c, h, w = net.inputs[input_blob].shape
print("Loaded model into Inference Engine for device:", device, 
     "\nModel input dimensions: n=",n,", c=",c,", h=",h,", w=",w)

2.6.  Load labels

For each detected object, the model's output includes an integer indicating which type of trained object (e.g., car, person) was detected. To translate the integer into a readable text string, a label mapping file can be used. The label mapping file is a text file with one label per line in the format “n: string” (e.g., “7: car” for 7=”car”), which is loaded into a lookup table used later to label detected objects. The code below keeps the text after the first space on each line and uses the line's position as the class ID, so the labels must be listed in order starting from ID 0.

Here, if the labels_path variable has been set to point to a label mapping file, we open the file and load the labels into the variable labels_map.

labels_map = None
# if labels points to a label mapping file, then load the file into labels_map
if os.path.isfile(labels_path):
   with open(labels_path, 'r') as f:
       labels_map = [x.split(sep=' ', maxsplit=1)[-1].strip() for x in f]
   print("Loaded label mapping file [",labels_path,"]")
else:
   print("No label mapping file has been loaded, only numbers will be used",
         " for detected object labels")

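Because the list built above uses each line's position as the class ID, a missing or out-of-order line silently shifts every label after it. A more defensive variant, sketched below, keys the lookup table by the number written at the start of each line. It assumes every non-blank line follows the “n: string” format described above; parse_labels() is a hypothetical helper, not part of the tutorial.

```python
from typing import Dict, Iterable


def parse_labels(lines: Iterable[str]) -> Dict[int, str]:
    """Parse lines such as '7: car' into {7: 'car'}; blank lines are skipped."""
    labels = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # split on the first colon: the number before it, the label text after it
        number, _, text = line.partition(":")
        labels[int(number)] = text.strip()
    return labels


labels = parse_labels(["0: background", "7: car", "", "15: person"])
print(labels[7])               # car
print(labels.get(3, str(3)))   # unknown ID falls back to the number: 3
```

In the notebook, opening labels_path and passing the file object to parse_labels() would build the table. Since the result is a dictionary, processResults() would then use labels_map.get(class_id, str(class_id)) instead of indexing, so unknown classes fall back to their numeric ID.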
2.7.  Prepare input

Here we read the input image and prepare it by resizing and rearranging its dimensions to match the model's input dimensions. We define the functions loadInputImage() and resizeInputImage() for these operations so that we can reuse them later in the tutorial.

# define function to load an input image
def loadInputImage(input_path, verbose = True):
   # globals to store input width and height
   global input_w, input_h
   # use OpenCV to load the input image
   cap = cv2.VideoCapture(input_path)
   # store input width and height
   input_w = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
   input_h = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
   if verbose: print("Loaded input image [",input_path,"], resolution=", input_w, "w x ",input_h,"h")
   # load the input image
   ret, image = cap.read()
   # release the capture object now that the image has been read
   cap.release()
   return image
# define function for resizing input image
def resizeInputImage(image, verbose = True):
   # resize image dimensions from image to model's input w x h
   in_frame = cv2.resize(image, (w, h))
   # Change data layout from HWC to CHW
   in_frame = in_frame.transpose((2, 0, 1)) 
   # reshape to input dimensions
   in_frame = in_frame.reshape((n, c, h, w))
   if verbose: print("Resized input image from {} to {}".format(image.shape[:-1], (h, w)))
   return in_frame
# load image
image = loadInputImage(input_path)
# resize the input image
in_frame = resizeInputImage(image)
# display input image
print("Input image:")
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

2.8.  Run inference

Now that the input image is in the format the model expects, we run inference on it. Recall that input_path was previously set to ./car.bmp:

# save start time
inf_start = time.time()

# run inference
res = exec_net.infer(inputs={input_blob: in_frame})  

# calculate time from start until now
inf_time = time.time() - inf_start
print("Inference complete, run time: {:.3f} ms".format(inf_time * 1000))
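time.time() reports wall-clock time, which can jump if the system clock is adjusted; for measuring short intervals such as a single inference call, time.perf_counter() is the usual choice. Below is a minimal sketch of a reusable timer built on it. The context manager name timed() and the stand-in workload are hypothetical; in the notebook, the body of the with block would be the exec_net.infer(...) call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, results: dict):
    """Measure the elapsed time of the enclosed block in milliseconds and store it in results."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = (time.perf_counter() - start) * 1000.0


timings = {}
with timed("inference", timings):
    sum(i * i for i in range(100_000))  # stand-in for exec_net.infer(...)
print("Inference complete, run time: {:.3f} ms".format(timings["inference"]))
```

Storing results in a dictionary keyed by label makes it easy to time several stages (preprocessing, inference, postprocessing) with the same helper.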

2.9.  Process results

Now we parse the inference results and, for each detected object, draw a box with a text annotation on the image. We define the function processResults() so that we can reuse it later in the tutorial to process results.

res is set to the model's output blob, an array of shape [1, 1, N, 7] that holds N candidate detections. We loop through res[0][0], setting obj to each detection in turn; its fields are:

  • obj[0] = Image ID within the batch
  • obj[1] = Class ID (type of object detected)
  • obj[2] = Probability (confidence) of the detected object
  • obj[3] = Minimum x coordinate (xmin) of the bounding box
  • obj[4] = Minimum y coordinate (ymin) of the bounding box
  • obj[5] = Maximum x coordinate (xmax) of the bounding box
  • obj[6] = Maximum y coordinate (ymax) of the bounding box

The coordinates are normalized to the range [0, 1], which is why the code multiplies them by the input image's width and height.

# create function to process inference results
def processResults(result):
   # get output results
   res = result[output_blob]
   # loop through all possible results
   for obj in res[0][0]:
       # If probability is more than specified threshold, draw and label box 
       if obj[2] > prob_threshold:
           # get coordinates of box containing detected object
           xmin = int(obj[3] * input_w)
           ymin = int(obj[4] * input_h)
           xmax = int(obj[5] * input_w)
           ymax = int(obj[6] * input_h)
           # get type of object detected
           class_id = int(obj[1])
           # Draw box and label for detected object
           color = (min(class_id * 12.5, 255), 255, 255)
           cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, 4)
           det_label = labels_map[class_id] if labels_map else str(class_id)
           cv2.putText(image, det_label + ' ' + str(round(obj[2] * 100, 1)) + ' %', (xmin, ymin - 7),
                      cv2.FONT_HERSHEY_COMPLEX, 1, color, 2)

# process the inference results from the previous step
processResults(res)
print("Processed inference output results.")
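To check the filtering and scaling arithmetic in processResults() without a model, the sketch below applies the same steps to hand-written detection rows in the [image_id, class_id, confidence, xmin, ymin, xmax, ymax] layout listed above. The sample rows and the helper detections_to_boxes() are made up for illustration. Stopping at an image_id of -1 follows the convention used by OpenVINO's SSD samples to mark the end of valid detections; treat that as an assumption to verify for your own model.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int, int, float]  # class_id, xmin, ymin, xmax, ymax, confidence


def detections_to_boxes(rows: Sequence[Sequence[float]], width: float, height: float,
                        threshold: float) -> List[Box]:
    """Convert normalized SSD detection rows into pixel boxes above a confidence threshold."""
    boxes = []
    for row in rows:
        image_id, class_id, conf, x0, y0, x1, y1 = row
        if image_id < 0:  # treated here as the end-of-detections marker
            break
        if conf > threshold:
            # scale normalized coordinates to pixels, as processResults() does
            boxes.append((int(class_id),
                          int(x0 * width), int(y0 * height),
                          int(x1 * width), int(y1 * height),
                          conf))
    return boxes


rows = [
    [0, 7, 0.92, 0.10, 0.20, 0.50, 0.80],   # confident detection of class 7
    [0, 15, 0.30, 0.00, 0.00, 0.10, 0.10],  # below the 0.5 threshold
    [-1, 0, 0.00, 0.0, 0.0, 0.0, 0.0],      # end marker
    [0, 2, 0.99, 0.0, 0.0, 1.0, 1.0],       # ignored: after the end marker
]
for box in detections_to_boxes(rows, width=640, height=480, threshold=0.5):
    print(box)  # (7, 64, 96, 320, 384, 0.92)
```

Separating the arithmetic from the drawing code this way also makes it straightforward to unit-test the threshold and coordinate handling.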

2.10.  Display results

Now that the results from inference have been processed, we display the image to see what has been detected.

# convert colors BGR -> RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# disable axis display, then display image
plt.axis("off")
plt.imshow(image)

3. Exercises

The remainder of the article demonstrates how the application can be run on other images and on video files. As with all deep neural network models, the model's accuracy depends on its training set and its specific set of labels, so the model may perform poorly on images that differ from those in the training set. Running inference on varied image sets is nonetheless a good exercise for understanding how well a model generalizes to new input data.

You can find more info on the MobileNet-SSD model here, including how to train the model on your own datasets.

3.1. Run a different image

Now that we have seen all the steps, let us run them again on a different image. We also define inferImage() to combine the input processing, inference, and results processing so that we may use it again later in the tutorial.

# define function to prepare input, run inference, and process inference results
def inferImage(image, verbose = True):
   # prepare input
   in_frame = resizeInputImage(image, verbose)
   # run inference
   res = exec_net.infer(inputs={input_blob: in_frame})
   # process inference results
   processResults(res)

# set path to different input image
input_path = "face.jpg"
# load input image
image = loadInputImage(input_path)
# display input image
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
# infer image
inferImage(image)
# display image with inference results
# convert colors BGR -> RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# create new figure, disable axis display, then display image
plt.figure()
plt.axis("off")
plt.imshow(image)

3.2. Run your own image

Here you may run any image you would like by setting the input_path variable which may be set to a local file or URL. A sample URL is provided as an example.

# input_path may be set to a local file or URL
# load input image
image = loadInputImage(input_path)
# display input image
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
# infer image
inferImage(image)
# display image with inference results
# convert colors BGR -> RGB
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# create figure, disable axis display, then display image
plt.figure()
plt.axis("off")
plt.imshow(image)

3.3. Running inference on video

We have seen how to run inference on individual images; running it on video is much the same, except that a loop is needed to process all the frames in the video. In the code below, we load the video the same way we loaded an image, but add a while loop that keeps reading frames until cap.isOpened() returns False or cap.read() sets ret to False:

while cap.isOpened():
   # read video frame
   ret, im = cap.read()
   # break if no more video frames
   if not ret:
       break

# close and then re-import matplotlib to be able to update output images for video
plt.close()
%matplotlib notebook

#%matplotlib inline
from matplotlib import pyplot as plt

# disable axis display
plt.axis("off")

# input_path may be set to local file or URL 
input_path = "/data/reference-sample-data/object-detection-python/cars_1900.mp4"
print (os.path.exists(input_path))
print("Loading video [",input_path,"]")

# use OpenCV to load the input video
cap = cv2.VideoCapture(input_path)
# store the video's width and height; processResults() uses them to scale boxes
input_w = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
input_h = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)

# track frame count and set maximum
frame_num = 0
max_frame_num = 60
skip_num_frames = 100
last_frame_num = cap.get(cv2.CAP_PROP_FRAME_COUNT)
if last_frame_num < 1: last_frame_num = "unknown"
while cap.isOpened():
   # read video frame
   ret, image = cap.read()
   # break if no more video frames
   if not ret:
       break
   frame_num += 1
   # skip skip_num_frames of frames, then infer max_frame_num frames from there
   if frame_num > skip_num_frames:
       # infer image
       inferImage(image, False)
       # display results
       # convert colors BGR -> RGB
       image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
       # show image then force re-draw to show new image
       plt.imshow(image)
       plt.gcf().canvas.draw()
   # display current frame number
   print("Frame #:", frame_num, "/", last_frame_num, end="\r")
   # limit number of frames, video can be slow and gets slower the more frames that are processed
   if frame_num >= (max_frame_num + skip_num_frames):
       print("\nStopping at frame #", frame_num)
       break
# release the video file
cap.release()
3.4. Run your own video

If you would like to see inference run on your own video, first set input_path to a local file or URL and then re-execute the cell above. For example, you could use this video by replacing the input_path=”…” line above with the line:


You can control which frame to start from by setting skip_num_frames, which skips that many frames before inference begins. You can also control how many frames are processed by setting max_frame_num.
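The skip-and-limit logic in the video loop is equivalent to taking a slice of the frame stream. The minimal sketch below expresses it with itertools.islice, using a fake capture object in place of cv2.VideoCapture so it runs without OpenCV. The names frame_window(), read_frames(), and FakeCapture are hypothetical; note that frames are counted from 0 here, whereas frame_num in the notebook counts from 1.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def frame_window(frames: Iterable[T], skip_num_frames: int, max_frame_num: int) -> Iterator[T]:
    """Yield at most max_frame_num frames after skipping the first skip_num_frames."""
    return islice(frames, skip_num_frames, skip_num_frames + max_frame_num)


def read_frames(cap) -> Iterator:
    """Turn an object with a read() -> (ok, frame) method into an iterator of frames."""
    while True:
        ok, frame = cap.read()
        if not ok:
            return
        yield frame


class FakeCapture:
    """Stand-in for cv2.VideoCapture that returns integers as 'frames'."""

    def __init__(self, count: int):
        self.remaining = iter(range(count))

    def read(self):
        frame = next(self.remaining, None)
        return (frame is not None, frame)


selected = list(frame_window(read_frames(FakeCapture(500)), skip_num_frames=100, max_frame_num=60))
print(selected[0], selected[-1], len(selected))  # 100 159 60
```

Because islice is lazy, frames beyond the window are never read, which mirrors the break in the notebook's loop once the frame limit is reached.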

Note: There are more videos available to choose from at:

4. Cleanup

Now that we are done running the sample, we clean up by deleting objects before exiting.

del exec_net
del net
del ie
print("Resource objects removed")

5. Next steps

You can run this application yourself using Intel DevCloud for the Edge. Register for free to access the Object Detection Tutorial and other tutorials.

This tutorial has explained how to run an object detection application using Intel DevCloud for the Edge. Here are things developers can do to build on what’s been done in this tutorial:

  • Visit Sample Applications to learn how to target specific hardware edge nodes to benchmark performance on a variety of devices and begin model optimization for specific devices.
  • Experiment with model accuracy, including retraining the model with new training data using frameworks and formats such as TensorFlow, PyTorch, and ONNX.
  • Learn how applications can be deployed to edge devices.

6. More information

Monique Jones is the technical lead for the OpenVINO™ toolkit on the U.S. team and supports Intel’s portfolio of visual computing products. She holds a Bachelor of Science in Electrical Engineering/Computer Engineering from Texas State University.
