
AI Queue Length Detection: Object Detection in Video Using YOLO

30 Oct 2020 · CPOL · 3 min read
In this article, we’ll implement YOLO on video feeds for queue length detection, using it to detect and count the number of people in a video sequence.

So far in the series, we have been working with still image data. In this article, we’ll use a basic implementation of YOLO to detect and count people in video sequences.

Let’s again start by importing the required libraries.

import cv2
import numpy as np
import time

Here, we’ll use the same files and code pattern as discussed in the previous article. If you haven’t read it, I suggest you do, since it covers the basics of the code. For reference, we’ll load the YOLO model here.

# Load YOLO
net = cv2.dnn.readNet("./yolov3.weights", "./yolov3.cfg")
# Classes
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
# Define output layers
layer_names = net.getLayerNames()
# Note: in OpenCV 4.5.4 and later, getUnconnectedOutLayers() returns a flat
# array of ints, so use layer_names[i - 1] instead of layer_names[i[0] - 1]
output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]

Now, instead of an image, we’ll feed our model a video file. OpenCV provides a simple interface for working with video: we’ll create an instance of the VideoCapture object, whose argument can be either the index of an attached device or a video file. I’ll be working with a video file.

cap = cv2.VideoCapture('./video.mp4')

We’ll save the output as a video sequence as well. To save our video output, we’ll use a VideoWriter object instance from OpenCV.

out_video = cv2.VideoWriter('human.avi', cv2.VideoWriter_fourcc(*'MJPG'), 15., (640, 480))

Now we’ll capture the frames from the video sequence, process them using blob and get the detection. (Check out the previous article for a detailed explanation.)

# Inside the frame-reading loop: ret, frame = cap.read()
frame = cv2.resize(frame, (640, 480))
# Detect objects using a blob; (320, 320) can be used instead of (608, 608)
blob = cv2.dnn.blobFromImage(frame, 1 / 255, (608, 608), swapRB=True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)
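To make the preprocessing step less of a black box, here is a pure-NumPy sketch of roughly what blobFromImage does (the helper name `to_blob` is my own, and it skips the resize step by assuming the frame already matches the network input size): scale pixel values to [0, 1], swap BGR to RGB, and reorder the axes from HxWxC to a 1xCxHxW batch.

```python
import numpy as np

def to_blob(frame, scale=1 / 255.0):
    """Approximate cv2.dnn.blobFromImage for an already-resized frame."""
    rgb = frame[:, :, ::-1]             # BGR -> RGB channel swap
    chw = np.transpose(rgb, (2, 0, 1))  # HxWxC -> CxHxW
    return (chw[np.newaxis, ...] * scale).astype(np.float32)

frame = np.zeros((608, 608, 3), dtype=np.uint8)
blob = to_blob(frame)
print(blob.shape)  # (1, 3, 608, 608)
```

This is only an illustration of the data layout YOLO expects; in the actual pipeline, keep using cv2.dnn.blobFromImage, which also handles resizing and mean subtraction.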

Let’s get the coordinates of the bounding box for each detected object and apply a threshold to eliminate weak detections. Of course, let’s not forget to apply Non-Maximum Suppression.

# Evaluate class ids, confidence scores and bounding boxes for detected objects, i.e. humans
class_ids, confidences, boxes = [], [], []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected: rectangle coordinates (normalized to frame size)
            w, h = int(detection[2] * 640), int(detection[3] * 480)
            x, y = int(detection[0] * 640 - w / 2), int(detection[1] * 480 - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)
# Non-max suppression
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
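If you're curious what cv2.dnn.NMSBoxes actually computes, here is a pure-Python sketch of the same idea (the `iou` and `nms` helpers are hypothetical, not OpenCV's implementation): greedily keep the highest-scoring box and suppress any remaining box whose overlap (IoU) with a kept box exceeds the threshold.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x, y, w, h) form."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_thr=0.5, iou_thr=0.4):
    """Greedy non-max suppression; returns indices of kept boxes."""
    order = sorted((i for i, s in enumerate(scores) if s > score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [[0, 0, 100, 100], [10, 10, 100, 100], [300, 300, 50, 50]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] - the second box overlaps the first too much
```

The two heavily overlapping boxes collapse into one detection, which is exactly why NMS matters for counting people: without it, one person could be counted several times.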

Draw the final bounding boxes on the detected objects and increment the counter for each detection.

count = 0
# Draw bounding boxes; class id 0 is "person" in coco.names
font = cv2.FONT_HERSHEY_PLAIN
for i in range(len(boxes)):
    if i in indexes:
        if int(class_ids[i]) == 0:
            count += 1
            x, y, w, h = boxes[i]
            label = classes[class_ids[i]]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label + " " + str(round(confidences[i], 3)), (x, y - 5), font, 1, (0, 255, 0), 1)

We can draw the frame number as well as a counter right on the screen so that we know how many people there are in each frame.

# Draw counter
cv2.putText(frame, str(count), (100, 200), cv2.FONT_HERSHEY_DUPLEX, 2, (0, 255, 255), 10)
# Write the annotated frame to the output video
out_video.write(frame)


Our model will now count the number of people present in each frame. For a better estimate of queue length, we can also define a region of interest prior to processing.

# Region of interest
ROI = [(100,100),(1880,100),(100,980),(1880,980)]
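Beyond just drawing the ROI, you can use it to filter the count. Here is a minimal sketch (the `in_roi` helper is hypothetical, not part of the article's code) that counts only detections whose box center falls inside the ROI rectangle, using the top-left and bottom-right corners from the list above.

```python
ROI = [(100, 100), (1880, 100), (100, 980), (1880, 980)]

def in_roi(box, roi=ROI):
    """Return True if the center of an (x, y, w, h) box lies inside the ROI."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    (x1, y1), (x2, y2) = roi[0], roi[3]  # top-left and bottom-right corners
    return x1 <= cx <= x2 and y1 <= cy <= y2

boxes = [[500, 400, 80, 160], [0, 0, 50, 50]]
print(sum(in_roi(b) for b in boxes))  # 1 - only the first box is inside
```

In the main loop, you would apply this check before incrementing `count`, so people outside the queue area don't inflate the estimate.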

And then focus the frame only in that region as follows:

# draw Region Of Interest
cv2.rectangle(frame, ROI[0], ROI[3], (255,255,0), 2)

Here’s a snapshot of the final video being processed.

End Note

In this series of articles, we learned to implement deep neural networks for computer vision problems. We implemented the neural networks from scratch, used transfer learning, and used pre-trained models to detect objects of interest. At first glance, it seemed like it might be better to use custom trained models when we’re working with a congested scene where objects of interest are not clearly visible. Custom trained models would be more accurate in such cases, but the accuracy comes at the cost of time and resource consumption. Some state-of-the-art algorithms like YOLO are more efficient than custom trained models and can be implemented in real time while maintaining a reasonable level of accuracy.

The solutions we explored in this series are not perfect and can be improved, but you should now be able to see a clear picture of what it means to work with deep learning models for object detection. I encourage you to experiment with the solutions we went through. Maybe you could fine-tune parameters to get a better prediction, or implement ROI or LOI to get a better estimate of the number of people in a queue. You can experiment with object tracking as well, just don’t forget to share your findings with us. Happy coding!

This article is part of the series 'AI Queue Length Detection'.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By
Ireland
C# Corner MVP, UGRAD alumni, student, programmer, and an author.
