Performing Object Detection on a video file using CodeProject.AI Server

9 Apr 2024 · CPOL · 2 min read
Learn how to process a video file offline using CodeProject.AI Server


Introduction

A lot of attention is given to processing live webcam feeds for object detection or face recognition, but there are also use cases for processing saved or downloaded video. So let's do a very quick walkthrough of using CodeProject.AI Server to process a video clip.

For this we'll use Python, and specifically the OpenCV library, due to its built-in support for many things video. The goal is to load a video file, run it against an object detector, and generate a file containing the objects detected and their locations, as well as viewing the video itself with the detection bounding boxes overlaid.

Setup

This will be bare-bones so we can focus on using CodeProject.AI Server rather than on tedious setup steps. We will be running the YOLOv5 6.2 Object Detection module within CodeProject.AI Server. This module provides decent performance, but most conveniently it runs in the shared Python virtual environment set up in the runtimes/ folder. We will shamelessly use this same venv ourselves.

All our code will run inside the CodeProject.AI Server codebase, with our demo sitting under the /demos/clients/Python/ObjectDetect folder in the video_process.py file.

To run this code, go to the /demos/clients/Python/ObjectDetect folder and run

Shell
# For Windows
..\..\..\..\src\runtimes\bin\windows\python39\venv\Scripts\python video_process.py

# For Linux/macOS
../../../../src/runtimes/bin/macos/python38/venv/bin/python video_process.py

To halt the program, type "q" in the terminal from which you launched the script.

The Code

So how did we do it?

Below is a minimal version of the code for opening a video file and sending each frame to CodeProject.AI Server.

Python
from imutils.video import FileVideoStream
from PIL import Image
import cv2
import imutils
import numpy as np

vs = FileVideoStream(file_path).start()

with open("results.txt", 'w') as log_file:
    while True:
        if not vs.more():
            break

        frame = vs.read()
        if frame is None:
            break

        image = Image.fromarray(frame)
        image = do_detection(image, log_file)
        frame = np.asarray(image)

        if frame is not None:
            frame = imutils.resize(frame, width = 640)
            cv2.imshow("Movie File", frame)
            cv2.waitKey(1)  # give the window a chance to redraw

vs.stop()
cv2.destroyAllWindows()

We open the video file using FileVideoStream, then iterate over the stream until we run out of frames. Each frame is sent to a do_detection method that performs the actual object detection. We've also opened a log file called "results.txt" and passed it to do_detection, which logs the items detected in each frame, and their locations, to this file.
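The introduction mentioned logging timestamps as well. FileVideoStream doesn't hand these to us directly, but a simple approach (a sketch, assuming a constant frame rate, which you could read once via OpenCV's CAP_PROP_FPS before streaming) is to count frames in the loop and derive the timestamp from the frame index:

```python
def frame_timestamp(frame_index: int, fps: float) -> str:
    """Convert a frame index to an hh:mm:ss.mmm timestamp string."""
    seconds = frame_index / fps
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(int(minutes), 60)
    return f"{hours:02d}:{minutes:02d}:{secs:06.3f}"

# e.g. fps = cv2.VideoCapture(file_path).get(cv2.CAP_PROP_FPS)
print(frame_timestamp(450, 30.0))   # frame 450 at 30 fps -> 00:00:15.000
```

Writing this string at the start of each log line turns "results.txt" into a timeline of what was detected and when.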

Python
def do_detection(image, log_file):
   
    # Convert to format suitable for a POST to CodeProject.AI Server
    buf = io.BytesIO()
    image.save(buf, format='PNG')
    buf.seek(0)

    # Send to CodeProject.AI Server for object detection. It's better to have a
    # session object created once at the start and closed at the end, but we
    # keep the code simpler here for demo purposes    
    with requests.Session() as session:
        response = session.post(server_url + "vision/detection",
                                files={"image": ('image.png', buf, 'image/png') },
                                data={"min_confidence": 0.5}).json()

    # Get the predictions (but be careful of a null return)
    predictions = None
    if response is not None and "predictions" in response:
       predictions = response["predictions"]

    if predictions is not None:
        # Draw each bounding box and label onto the image we were passed in
        font = ImageFont.truetype("Arial.ttf", font_size)
        draw = ImageDraw.Draw(image)

        for prediction in predictions:
            label = prediction["label"]
            conf  = prediction["confidence"]
            y_max = int(prediction["y_max"])
            y_min = int(prediction["y_min"])
            x_max = int(prediction["x_max"])
            x_min = int(prediction["x_min"])

            # Ensure the coordinates are correctly ordered
            if y_max < y_min:
                y_min, y_max = y_max, y_min
            if x_max < x_min:
                x_min, x_max = x_max, x_min

            draw.rectangle([(x_min, y_min), (x_max, y_max)], outline="red", width=line_width)
            draw.text((x_min + padding, y_min - padding - font_size),
                      f"{label} {round(conf*100.0,0)}%", font=font)

            log_file.write(f"{label} ({round(conf*100.0,0)}%): ({x_min}, {y_min}), ({x_max}, {y_max})\n")

    # Return our (now labelled) image
    return image

The only tricky parts are:

  1. Extracting frames from a video file
  2. Encoding each frame correctly so it can be sent as an HTTP POST to the CodeProject.AI Server API
  3. Drawing the bounding boxes and labels of detected objects onto the frame, and displaying each frame in turn
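One subtlety worth noting in step 2: OpenCV (and therefore FileVideoStream) returns frames in BGR channel order, while PIL's Image.fromarray assumes RGB, so colours in the encoded PNG will be swapped unless the frame is converted first. A minimal sketch of the conversion and in-memory PNG encoding (using a NumPy array to stand in for a real frame; in the demo code you could equally use cv2.cvtColor):

```python
import io

import numpy as np
from PIL import Image

def frame_to_png_bytes(frame_bgr: np.ndarray) -> io.BytesIO:
    """Convert a BGR frame to an in-memory PNG ready for an HTTP POST."""
    # Reverse the channel axis: BGR -> RGB
    frame_rgb = np.ascontiguousarray(frame_bgr[:, :, ::-1])
    image = Image.fromarray(frame_rgb)
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    buf.seek(0)
    return buf

# A dummy 2x2 "frame": pure blue in BGR is (255, 0, 0)
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[:, :, 0] = 255
png = frame_to_png_bytes(frame)
print(png.getvalue()[:8])   # every PNG starts with b'\x89PNG\r\n\x1a\n'
```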

All the real grunt work has been done by CodeProject.AI Server in detecting the objects in each frame.

Summing up

The techniques explained here are transferable to many of the modules in CodeProject.AI Server: take some data, convert it to a form suitable for an HTTP POST, make the API call, and then display the results. As long as you have the data to send, and a module that satisfies your need, you're good to go.
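As a sketch of that pattern, a single helper can serve any image-based module by varying only the endpoint and parameters (the server address and any endpoint names other than vision/detection are assumptions here; check your server's module list):

```python
import io

import requests

def build_payload(image_bytes: bytes, **params) -> dict:
    """Bundle an in-memory image and parameters in the shape requests.post expects."""
    return {"files": {"image": ("image.png", io.BytesIO(image_bytes), "image/png")},
            "data": params}

def call_module(server_url: str, endpoint: str, image_bytes: bytes, **params) -> dict:
    """POST an image to a CodeProject.AI Server module and return the JSON reply."""
    return requests.post(server_url + endpoint,
                         **build_payload(image_bytes, **params)).json()

# Usage (assumes a server at the default address, and that these endpoints
# are enabled on your installation):
# result = call_module("http://localhost:32168/v1/", "vision/detection",
#                      png_bytes, min_confidence=0.5)
# result = call_module("http://localhost:32168/v1/", "vision/face", png_bytes)
```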

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Founder CodeProject
Canada
Chris Maunder is the co-founder of CodeProject and ContentLab.com, and has been a prominent figure in the software development community for nearly 30 years. Hailing from Australia, Chris has a background in Mathematics, Astrophysics, Environmental Engineering and Defence Research. His programming endeavours span everything from FORTRAN on supercomputers, C++/MFC on Windows, through to high-load .NET web applications and Python AI applications on everything from macOS to a Raspberry Pi. Chris is a full-stack developer who is as comfortable with SQL as he is with CSS.

In the late 1990s, he and his business partner David Cunningham recognized the need for a platform that would facilitate knowledge-sharing among developers, leading to the establishment of CodeProject.com in 1999. Chris's expertise in programming and his passion for fostering a collaborative environment have played a pivotal role in the success of CodeProject.com. Over the years, the website has grown into a vibrant community where programmers worldwide can connect, exchange ideas, and find solutions to coding challenges. Chris is a prolific contributor to the developer community through his articles and tutorials, and his latest passion project, CodeProject.AI.

In addition to his work with CodeProject.com, Chris co-founded ContentLab and DeveloperMedia, two projects focused on helping companies make their software projects a success. Chris's roles included Product Development, Content Creation, Client Satisfaction and Systems Automation.
