Hybrid Edge AI for Facial Recognition: Next Steps

Sergey L. Gladkiy

Aug 4, 2021

CPOL

In this article, we’ll discuss some aspects of developing a facial recognition system from scratch.

Download net.zip - 80.9 MB

Introduction

Face recognition is one area of artificial intelligence (AI) where modern deep learning (DL) approaches have had great success during the last decade. The best face recognition systems can recognize people in images and video as accurately as humans can, or even more accurately.

Our series of articles on this topic is divided into two parts:

Face detection, where the client-side application detects human faces in images or in a video feed, aligns the detected face pictures, and submits them to the server.
Face recognition (this part), where the server-side application performs face recognition.

We assume that you are familiar with DNN, Python, Keras, and TensorFlow. You are welcome to download this project code to follow along.

In the previous article, we had a look at the Kubernetes-managed implementation of our face recognition system. In this article (the last in the series), we’ll discuss development of an AI facial recognition system from scratch.

The DNN Dilemma

As stated in the introductory article, a typical AI face recognition algorithm consists of four steps: face detection, face alignment, feature extraction, and feature matching. The first and third steps are implemented with specially developed DNNs. The face alignment and feature matching algorithms depend on the facial landmarks and features provided by those DNNs.
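
The last of these steps, feature matching, can be sketched in a few lines. This is a minimal illustration, not the code from this series: it assumes that each known person is represented by a single embedding vector, that Euclidean distance is the similarity measure, and that a distance threshold of 1.0 separates known from unknown faces. The function names, the threshold, and the toy three-dimensional vectors are all hypothetical (real embeddings have far more dimensions).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe, database, threshold=1.0):
    """Return (name, distance) of the closest database entry.

    `database` maps a person's name to one embedding vector.
    If even the closest entry is farther than `threshold` (an assumed,
    data-dependent value), the name is None, meaning "unknown person".
    """
    best_name, best_dist = None, float("inf")
    for name, embedding in database.items():
        dist = euclidean(probe, embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist


# Toy embeddings for illustration only.
database = {"alice": [0.0, 0.0, 1.0], "bob": [1.0, 0.0, 0.0]}
name, dist = identify([0.1, 0.0, 0.9], database)
print(name, round(dist, 3))
```

In practice the threshold would be tuned on a validation set, since it controls the trade-off between false accepts and false rejects.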

Developing and training DNNs for facial recognition takes a lot of time and effort. Before starting such development, we must be sure that there is no other way to reach our goal. Two main parameters rate a facial recognition system: accuracy and performance (speed). We may decide to develop our own DNNs for the facial recognition system if one or both parameters of the existing models do not meet our requirements.

For example, the MTCNN model we used was too slow for true real-time operation on a Raspberry Pi 3, so we could decide to develop a new face detection DNN. However, there are a couple of alternative ways to make face detection faster.
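
Whether a detector is fast enough for a given device can be checked with a simple timing loop before deciding to build a new model. The sketch below is an assumption-laden illustration: `measure_fps` and `dummy_detect` are hypothetical names, and `dummy_detect` merely sleeps to stand in for a real detector call.

```python
import time


def measure_fps(detect, frames, warmup=2):
    """Return the average number of frames per second `detect` processes."""
    for frame in frames[:warmup]:
        detect(frame)  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def dummy_detect(frame):
    """Hypothetical stand-in for a real face detector call."""
    time.sleep(0.01)  # pretend that detection takes about 10 ms
    return []


fps = measure_fps(dummy_detect, [None] * 10)
print(f"average speed: {fps:.1f} FPS")
```

Replacing `dummy_detect` with the real detector and `frames` with images captured on the target device gives a number that can be compared directly with the camera frame rate.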

The first option is to optimize the MTCNN model by reducing the number of its parameters and then retrain the optimized model. Training an MTCNN-like model is not an easy task because the model is a cascade of three networks (P-Net, R-Net, and O-Net) applied to a scaled image pyramid, and these networks are trained for three tasks: face classification, bounding box regression, and landmark localization. Examples of how to implement and train MTCNN models can be found here and here.
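
To get a feeling for how much a model shrinks when its layers are made narrower, we can count convolution parameters directly. The sketch below is only arithmetic: the layer widths are illustrative and are not claimed to match any stage of the MTCNN model, and the helper names and the 0.5 width multiplier are assumptions.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of one square convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def count_params(widths, in_channels=3, kernel=3):
    """Total parameters of a plain stack of convolutions.

    `widths` lists the number of output channels of each layer.
    """
    total = 0
    c_in = in_channels
    for c_out in widths:
        total += conv_params(kernel, c_in, c_out)
        c_in = c_out
    return total


def scale_widths(widths, multiplier):
    """Shrink every layer by the same width multiplier (at least 1 channel)."""
    return [max(1, int(round(w * multiplier))) for w in widths]


full = [10, 16, 32]              # illustrative layer widths
slim = scale_widths(full, 0.5)   # half as many channels per layer
print(count_params(full), count_params(slim))
```

Because the parameter count of an inner layer depends on the product of its input and output channels, halving every width cuts the total by almost a factor of four. The reduced model still has to be retrained, and its accuracy checked, before it can replace the original.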

The second option is to select another DNN model for face detection. For example, you can train an SSD model with a compact backbone, such as MobileNet or SqueezeNet, to get better speed. Here is an implementation of MobileNet SSD for face detection. Note that SSD models provide only bounding boxes, not facial landmarks.

Some of the available pre-trained models can recognize faces as well as humans can, or even better. However, you may need to develop your own DNN if none of the existing models is optimized for the specific conditions in which your face recognition system is going to work.

For example, we identified faces in video from an outdoor IP camera, while our model had been trained mainly on pictures taken indoors with various cameras. So we might want to retrain our DNN model using face images taken by cameras in outdoor conditions. To gather face images for training, we can follow the same approach we used when creating our database (see the ‘Creating Database’ part of this series): run the face detector with alignment on video and assign the extracted faces to the known people. Here is a description of the training process for the FaceNet model.

Conclusion

In this series, we showed you how to create a hybrid edge AI system for face recognition. We considered a common face recognition algorithm based on modern DL approaches: face detection, face alignment, feature extraction, and feature matching. We developed a face detector based on a pre-trained MTCNN model, and then tested it on a Raspberry Pi device. We suggested algorithms for face alignment and wrote the code for creating a database of aligned faces.

We explained how to use a pre-trained FaceNet model for face identification. We developed a simple web API to wrap the face identification model, designed a client-server application based on this API, and tested the model and the API together. We containerized the server application with Docker to simplify the development, testing, and deployment of the system. We also showed you how the server application can be scaled to work with multiple client applications by running it on Kubernetes. Finally, we discussed the development of an AI face recognition system from scratch.

Next Steps

The topic of facial recognition on edge devices is too extensive for one series of articles. We’ve implemented only a conceptual face recognition pipeline, and there are a lot of things to improve in the system for commercial use. Here we’ll give you just a few ideas on how to improve parts of the system for better performance (speed and accuracy).

Using a motion detector to increase the speed of face detection on an edge device. The speed of the MTCNN detector directly depends on the frame size. A motion detector lets us select areas of interest in video frames and run the MTCNN detector only on those smaller parts of the frame, which speeds up processing.
Using face tracking. While detecting faces, we can track them using an appropriate tracking algorithm. The main idea of tracking is to get additional information. As we track a face, we know that it belongs to the same person across several frames, so we can identify the person from several probe images instead of one. This multi-identification method can increase the accuracy of the system.
Using a multi-face database. Such a database contains several face samples for every person. This additional information can also increase the accuracy of face identification. Note that it slows down the recognition server because every probe face must be compared with every face sample in the database. This is usually not a big problem because identification typically runs on powerful servers.
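
The tracking and multi-face database ideas combine naturally, as the following minimal sketch shows. It rests on several assumptions that are not part of this series' code: embeddings are compared with Euclidean distance, each probe is assigned to the person owning the single closest sample, and the identity of a track is decided by a simple majority vote over its frames. All function names, the threshold of 1.0, and the toy three-dimensional vectors are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_person(probe, database):
    """Compare the probe with every sample of every person.

    `database` maps a name to a list of sample embeddings.
    Returns (name, distance) of the single closest sample.
    """
    best_name, best_dist = None, float("inf")
    for name, samples in database.items():
        for sample in samples:
            dist = euclidean(probe, sample)
            if dist < best_dist:
                best_name, best_dist = name, dist
    return best_name, best_dist


def identify_track(probes, database, threshold=1.0):
    """Majority vote over the per-frame results of one tracked face.

    Frames whose best distance exceeds `threshold` cast no vote.
    Returns None when no frame produced a confident match.
    """
    votes = Counter()
    for probe in probes:
        name, dist = nearest_person(probe, database)
        if dist <= threshold:
            votes[name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy data for illustration only.
database = {
    "alice": [[0.0, 0.0, 1.0], [0.0, 0.2, 0.9]],
    "bob": [[1.0, 0.0, 0.0]],
}
track = [[0.0, 0.1, 1.0], [0.9, 0.0, 0.1], [0.0, 0.2, 0.8]]
print(identify_track(track, database))
```

The nested loop in `nearest_person` makes the cost of the multi-face database explicit: the number of distance computations grows with the total number of stored samples, not with the number of people.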

We hope that this series has given you many interesting ideas, and we wish you ‘happy face recognition’!