In this post, we demonstrate how to use the OpenCV 3.4.1 deep learning module with the MobileNet-SSD network for object detection.

As of OpenCV 3.4+, the deep neural network (dnn) module is officially included. The dnn module allows loading pre-trained models from the most popular deep learning frameworks, including TensorFlow, Caffe, Darknet, and Torch. Besides MobileNet-SSD, other architectures are compatible with OpenCV 3.4.1:

  • GoogLeNet
  • YOLO
  • SqueezeNet
  • Faster R-CNN
  • ResNet

This API is compatible with both C++ and Python. :-)

Code description

In this section, we'll create the Python script for object detection and explain how to load a deep neural network with OpenCV 3.4, how to pass an image to the network, and how to make a prediction with MobileNet and the dnn module in OpenCV.

We use a pre-trained MobileNet model that was trained with the Caffe-SSD framework. This model can detect 20 classes.

Load and predict with deep neural network module

First, create a new Python file and put in the following code, where we import the libraries:

# Import the necessary libraries
import numpy as np
import argparse
import cv2

Next, add the command-line argument parser:

# construct the argument parser
parser = argparse.ArgumentParser(
    description='Script to run MobileNet-SSD object detection network')
parser.add_argument("--video", help="path to video file. If empty, camera's stream will be used")
parser.add_argument("--prototxt", default="MobileNetSSD_deploy.prototxt",
                    help='Path to text network file: '
                         'MobileNetSSD_deploy.prototxt for Caffe model')
parser.add_argument("--weights", default="MobileNetSSD_deploy.caffemodel",
                    help='Path to weights: '
                         'MobileNetSSD_deploy.caffemodel for Caffe model')
parser.add_argument("--thr", default=0.2, type=float,
                    help="confidence threshold to filter out weak detections")
args = parser.parse_args()

The lines above establish the following arguments:

  • --video: Path to the video file.
  • --prototxt: Network definition file (.prototxt).
  • --weights: Network weights file (.caffemodel).
  • --thr: Confidence threshold.
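With these defaults in place, the parser can be exercised on its own. A minimal standalone sketch, parsing an empty argument list to simulate running the script with no flags:

```python
import argparse

# Standalone copy of the parser above, parsing an empty argument list
# to show the defaults (this simulates running the script with no flags).
parser = argparse.ArgumentParser(
    description='Script to run MobileNet-SSD object detection network')
parser.add_argument("--video", help="path to video file. If empty, camera's stream will be used")
parser.add_argument("--prototxt", default="MobileNetSSD_deploy.prototxt")
parser.add_argument("--weights", default="MobileNetSSD_deploy.caffemodel")
parser.add_argument("--thr", default=0.2, type=float)

args = parser.parse_args([])  # empty list instead of sys.argv
print(args.prototxt)  # MobileNetSSD_deploy.prototxt
print(args.thr)       # 0.2
```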

Next, we define the labels for the classes of our MobileNet-SSD network.

#Labels of network.
classNames = { 0: 'background',
    1: 'aeroplane', 2: 'bicycle', 3: 'bird', 4: 'boat',
    5: 'bottle', 6: 'bus', 7: 'car', 8: 'cat', 9: 'chair',
    10: 'cow', 11: 'diningtable', 12: 'dog', 13: 'horse',
    14: 'motorbike', 15: 'person', 16: 'pottedplant',
    17: 'sheep', 18: 'sofa', 19: 'train', 20: 'tvmonitor' }
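Later in the script, this dictionary maps a numeric class id back to a readable name. A quick sketch of that lookup, with a hypothetical id and confidence:

```python
# Abbreviated copy of the label map above, just to illustrate the lookup;
# class_id and confidence are hypothetical values a detection might return.
classNames = {0: 'background', 12: 'dog', 15: 'person'}

class_id = 15
confidence = 0.87
label = classNames[class_id] + ": " + str(confidence)
print(label)  # person: 0.87
```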

Next, open the video file or capture device, depending on which was chosen, and also load the Caffe model.

# Open video file or capture device.
if args.video:
    cap = cv2.VideoCapture(args.video)
else:
    cap = cv2.VideoCapture(0)

# Load the Caffe model
net = cv2.dnn.readNetFromCaffe(args.prototxt, args.weights)

We pass the prototxt and weights arguments to cv2.dnn.readNetFromCaffe; after that, the network is loaded.

Next, we read the video frame by frame and pass each frame to the network for detection. With the dnn module it is easy to use our deep learning network in OpenCV and make predictions.

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    frame_resized = cv2.resize(frame, (300, 300))  # resize frame for prediction

Here we read a frame from the video and resize it to 300×300, because that is the input image size defined for the MobileNet-SSD model.

    # MobileNet requires fixed dimensions for its input image(s),
    # so we have to ensure that it is resized to 300x300 pixels.
    # We set a scale factor for the image because objects in the image have different sizes.
    # We perform a mean subtraction (127.5, 127.5, 127.5) to normalize the input;
    # after executing this command our "blob" has the shape:
    # (1, 3, 300, 300)
    blob = cv2.dnn.blobFromImage(frame_resized, 0.007843, (300, 300), (127.5, 127.5, 127.5), False)
    # Set the input blob for the network
    net.setInput(blob)
    # Prediction of the network
    detections = net.forward()

After the lines above, we obtain the prediction of the network. It is simple to do in three basic steps:

  • Load an image
  • Pre-process the image
  • Set the image as the input of the network and obtain the prediction result.
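The pre-processing step can be sketched with plain NumPy to show what blobFromImage computes: scale factor, mean subtraction, and HWC→NCHW reshaping. Note that `to_blob` is a hypothetical helper written for illustration, not an OpenCV function:

```python
import numpy as np

# Numpy-only sketch of the pre-processing step: scale factor, mean
# subtraction and HWC -> NCHW reshaping, as blobFromImage does.
# to_blob is a hypothetical helper, not an OpenCV function.
def to_blob(image, scale=0.007843, mean=(127.5, 127.5, 127.5)):
    blob = (image.astype(np.float32) - np.array(mean, dtype=np.float32)) * scale
    blob = blob.transpose(2, 0, 1)[np.newaxis, ...]  # add batch dimension
    return blob

frame = np.zeros((300, 300, 3), dtype=np.uint8)  # dummy 300x300 BGR frame
blob = to_blob(frame)
print(blob.shape)  # (1, 3, 300, 300)
```

A zero pixel maps to roughly -1.0 after the mean subtraction and scaling, which is why this combination of mean and scale factor normalizes the input approximately to the [-1, 1] range.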

The usage of the dnn module is essentially the same for other networks and architectures, so we can replicate this for our own trained models.

Please help us build a community: follow us on Instagram.

Visualize object detection and prediction confidence

After the previous steps, new questions arise: How do we get the object location with MobileNet? How do we know the class of the predicted object? How do we get the confidence of the prediction? Let's go!

We must read the detections array to get the prediction data from the neural network; the following code does this:

    # Size of the resized frame (300x300)
    cols = frame_resized.shape[1]
    rows = frame_resized.shape[0]

    # To get the class and location of the detected object,
    # there is a fixed index for the class, location and confidence
    # values in the detections array.
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2] #Confidence of prediction 
        if confidence > args.thr: # Filter prediction 
            class_id = int(detections[0, 0, i, 1]) # Class label

            # Object location 
            xLeftBottom = int(detections[0, 0, i, 3] * cols) 
            yLeftBottom = int(detections[0, 0, i, 4] * rows)
            xRightTop   = int(detections[0, 0, i, 5] * cols)
            yRightTop   = int(detections[0, 0, i, 6] * rows)

We loop over the detections to read the values. First we get the confidence of the prediction and filter it against the threshold value; then we get the class label, and finally the corners of the object.
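The same decoding can be checked on a hand-made array. The values below are made up to mimic the (1, 1, N, 7) output layout, where each row holds [image_id, class_id, confidence, x_min, y_min, x_max, y_max] with coordinates normalized to the 0–1 range:

```python
import numpy as np

# Hand-made detections array mimicking the (1, 1, N, 7) output layout:
# [image_id, class_id, confidence, x_min, y_min, x_max, y_max],
# with coordinates normalized to the 0-1 range. Values are made up.
detections = np.array([[[[0, 15, 0.9, 0.1, 0.2, 0.5, 0.8]]]], dtype=np.float32)
cols, rows = 300, 300

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.2:
        class_id = int(detections[0, 0, i, 1])
        box = (int(detections[0, 0, i, 3] * cols), int(detections[0, 0, i, 4] * rows),
               int(detections[0, 0, i, 5] * cols), int(detections[0, 0, i, 6] * rows))
        print(class_id, box)  # 15 (30, 60, 150, 240)
```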

With all the information about the predicted object, the last step is to display the results. The next code draws the detected object and displays its label and confidence on the frame.

            # Factor for scale to original size of frame
            heightFactor = frame.shape[0]/300.0  
            widthFactor = frame.shape[1]/300.0 
            # Scale object detection to frame
            xLeftBottom = int(widthFactor * xLeftBottom) 
            yLeftBottom = int(heightFactor * yLeftBottom)
            xRightTop   = int(widthFactor * xRightTop)
            yRightTop   = int(heightFactor * yRightTop)
            # Draw location of object  
            cv2.rectangle(frame, (xLeftBottom, yLeftBottom), (xRightTop, yRightTop),
                          (0, 255, 0))

            # Draw label and confidence of prediction in frame resized
            if class_id in classNames:
                label = classNames[class_id] + ": " + str(confidence)
                labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)

                yLeftBottom = max(yLeftBottom, labelSize[1])
                cv2.rectangle(frame, (xLeftBottom, yLeftBottom - labelSize[1]),
                                     (xLeftBottom + labelSize[0], yLeftBottom + baseLine),
                                     (255, 255, 255), cv2.FILLED)
                cv2.putText(frame, label, (xLeftBottom, yLeftBottom),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0))

                print(label)  # print class and confidence

    cv2.namedWindow("frame", cv2.WINDOW_NORMAL)
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) >= 0:  # Break with any key pressed
        break

Finally, we display the frame in a resizable window so that it fits the screen.
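The scaling step above, which maps boxes from the 300×300 network input back to the original frame, can be checked on plain numbers; the frame dimensions here (600×450) are hypothetical:

```python
# Checking the scaling step on plain numbers, with hypothetical
# original frame dimensions of 600x450 (the network input is 300x300).
frame_h, frame_w = 450, 600
heightFactor = frame_h / 300.0  # 1.5
widthFactor = frame_w / 300.0   # 2.0

xLeftBottom, yLeftBottom, xRightTop, yRightTop = 30, 60, 150, 240
box = (int(widthFactor * xLeftBottom), int(heightFactor * yLeftBottom),
       int(widthFactor * xRightTop), int(heightFactor * yRightTop))
print(box)  # (60, 90, 300, 360)
```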


The code and MobileNet trained model can be downloaded from: