Introduction to Face Identification

Sergey L. Gladkiy

Jul 28, 2021

CPOL

In this article, we focus on developing and testing a face identification algorithm along with the face detection module.

Download net.zip - 80.9 MB

Introduction

Face recognition is one area of artificial intelligence (AI) where the modern approaches of deep learning (DL) have had great success during the last decade. The best face recognition systems can recognize people in images and video with the same precision humans can – or even better.

Our series of articles on this topic is divided into two parts:

Face detection, where the client-side application detects human faces in images or in a video feed, aligns the detected face pictures, and submits them to the server.
Face recognition (this part), where the server-side application performs face recognition.

In this part of the series, we’ll discuss the problem of face identification and combine the previously developed face detector with a face recognizer. We’ll then implement face identification in a Docker container and add a web API for transferring the detected faces to the server running recognition. In addition, we’ll consider certain aspects of running face recognition in Kubernetes. Finally, we’ll talk about building face recognition systems from scratch.

We assume that you are familiar with DNN, Python, Keras, and TensorFlow. You are welcome to download this project code to follow along.

Face Identification

Face identification can be described as finding in a face database the person most similar to the one being identified. In the previous article (the last one in the first half of this series), we created a database of 15 people. In this half of the series, we’ll consider the face identification task in more detail and develop an algorithm for identifying people in a video feed using a pre-trained DNN model.

We have a database of faces that contains sample images of people – one photograph for every person. When faces are detected in images or video, the detector produces a picture of each detected face. We must determine if the face in that picture belongs to one of the people in our database. There are two possible cases:

The detected face belongs to one of the people in the database, in which case we must specify this person’s ID (for example, their name).
The detected face belongs to an unknown person, in which case we must state this fact.

Face Recognition Models

All modern state-of-the-art methods of face recognition use DNN models for face identification. The models can have different architectures and can be trained on different face databases. However, they use similar approaches to achieve the goal. First, a DNN model is used as a feature extractor to get embeddings for a face image. The similarity of two faces is then measured as the "distance" between their embeddings: the smaller the distance, the more similar the faces. By evaluating the distances between the detected face and all faces in the database, we can find the most similar person.
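
To make the notion of distance concrete, here is a minimal, dependency-free sketch of cosine distance, the measure used later in this article. The three-element vectors are made-up stand-ins for real embeddings, and cosine_distance is a hypothetical helper written for illustration, not part of any library.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy "embeddings" (made up for illustration only)
probe = [0.9, 0.1, 0.2]
same_person = [0.8, 0.2, 0.1]
other_person = [0.1, 0.9, 0.3]

d_same = cosine_distance(probe, same_person)
d_other = cosine_distance(probe, other_person)

# The vector pointing in nearly the same direction is the closer one
print(d_same < d_other)
```

Cosine distance ranges from 0 (same direction) to 2 (opposite direction), so any value of 2 or more is a safe starting point when searching for a minimum.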

Developing and training DNN models for face recognition is not a trivial task. Fortunately, there are many free pre-trained models and libraries that implement state-of-the-art DNN architectures for face recognition. For example:

Python face recognition based on the dlib library
The VGGFace model and its Keras implementation
The DeepFace library that supports various DNN models
The Keras library that implements the FaceNet model

Each of the above models has its pros and cons. The best choice depends on the situation. In this series, we’ll use the FaceNet pre-trained model for two reasons:

The model is trained to optimize face embeddings directly, rather than through an intermediate bottleneck layer.
We can run a recognition algorithm with just a few lines of Python code because this model is implemented in Keras.

Face Recognizer Code

It’s time to write code for our face recognizer. A DNN-based face recognizer must implement at least two functions: one that extracts embeddings from a face image, and another that evaluates the distance between the embeddings of two face images. Here is how we coded the recognizer based on the FaceNet DNN model:

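import cv2
import numpy as np
# Or tensorflow.keras.models, depending on the installation
from keras.models import load_model
from scipy.spatial import distance

class FaceNetRec: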
    def __init__(self, model, min_distance):
        self.model = load_model(model)
        self.min_distance = min_distance
    
    def get_model(self):
        return self.model
    
    def embeddings(self, f_img):
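        # Resize to the 160x160 model input; interpolation must be passed
        # by keyword because the third positional argument of resize is dst
        r_img = cv2.resize(f_img, (160, 160), interpolation=cv2.INTER_AREA)
        arr = r_img.astype('float32')
        # Scale pixel values from [0, 255] to [-1, 1]
        arr = (arr-127.5)/127.5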
        samples = np.expand_dims(arr, axis=0)
        embds = self.model.predict(samples)
        return embds[0]
    
    def eval_distance(self, embds1, embds2):
        dist = distance.cosine(embds1, embds2)
        return dist 
    
    def img_distance(self, f_img1, f_img2):
        embds1 = self.embeddings(f_img1)
        embds2 = self.embeddings(f_img2)
        dist = self.eval_distance(embds1, embds2)
        return dist 
    
    def match(self, embds1, embds2):
        dist = self.eval_distance(embds1, embds2)
        return dist <= self.min_distance
    
    def img_match(self, f_img1, f_img2):
        embds1 = self.embeddings(f_img1)
        embds2 = self.embeddings(f_img2)
        return self.match(embds1, embds2)
    
    def recognize(self, embds, f_db):
        minfd = 2.0
        indx = -1
        f_data = f_db.get_data()
        for (i, data) in enumerate(f_data):
            (name, embds_i, p_img) = data
            dist = self.eval_distance(embds, embds_i)
            if (dist<minfd) and (dist<self.min_distance):
                indx = i
                minfd = dist
        if indx>=0:
            (name, embds_i, p_img) = f_data[indx]
            return (name, minfd, p_img)
        
        return None
    
    def img_recognize(self, f_img, f_db):
        embds = self.embeddings(f_img)
        
        return self.recognize(embds, f_db)

The constructor of the class loads the Keras model from a file specified by the model argument. Despite its name, the min_distance parameter is a threshold: a face image is identified as belonging to a specific person only if the distance to that person's embeddings is within this value. The embeddings and eval_distance methods implement the two functions described above. We use the cosine distance to evaluate the similarity of embeddings.

Another method we’d like to mention is recognize. This method takes as input the embeddings of a detected face and the f_db object, which holds the face database described in the ‘Creating Database’ part of the earlier article. Here is the class of this object:

class FaceDB:
    def __init__(self):
        self.clear()
        
    def clear(self):
        self.f_data = []
        
    def load(self, db_path, rec):
        self.clear()
        files = FileUtils.get_files(db_path)
        for (i, fname) in enumerate(files):
            f_img = cv2.imread(fname, cv2.IMREAD_UNCHANGED)
            embds = rec.embeddings(f_img)
            f = os.path.basename(fname)
            p_name = os.path.splitext(f)[0]
            data = (p_name, embds, f_img)
            self.f_data.append(data)
            
    def get_data(self):
        return self.f_data

The load method of the class scans all files in the folder specified by the db_path parameter, creates embeddings for all the images it finds, and builds a name-embeddings-image triplet for every person in the database. Using the data from the face database, the recognize method evaluates the distances between the embeddings of the detected face and those of every person in the database. The minimal distance indicates the most similar person. Note that if this value is not below the specified threshold, the result will be None, which indicates an unknown face.
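
The search-with-threshold logic can be tried out without a model or image files. The sketch below is a simplified stand-in for the recognize method, not the project code: nearest_person is a hypothetical function, the database holds (name, embeddings) pairs instead of triplets, and the embeddings are made-up three-element vectors.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_person(embds, db, max_distance):
    """Return (name, distance) of the closest entry, or None if unknown."""
    best = None
    for name, db_embds in db:
        dist = cosine_distance(embds, db_embds)
        # Keep an entry only if it is below the threshold and beats the best so far
        if dist < max_distance and (best is None or dist < best[1]):
            best = (name, dist)
    return best

# Toy database with made-up embeddings (illustration only)
toy_db = [("Lena", [0.9, 0.1, 0.2]), ("Marat", [0.1, 0.9, 0.3])]

known = nearest_person([0.8, 0.2, 0.1], toy_db, 0.5)
unknown = nearest_person([0.1, 0.1, 0.9], toy_db, 0.5)

print(known[0] if known else "unknown")
print(unknown[0] if unknown else "unknown")
```

The second query has a nearest neighbour too, but its distance is above the threshold, so the function reports an unknown face instead of the closest name.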

Combining the Face Recognizer with the Face Detector

We can now combine the developed classes with the MTCNN video face detector described in the face detection part of this series.

import time
import cv2

class VideoFR:
    def __init__(self, detector, rec, f_db):
        self.detector = detector
        self.rec = rec
        self.f_db = f_db
    
    def process(self, video, align=False):
        detection_num = 0
        rec_num = 0
        capture = cv2.VideoCapture(video)
        img = None

        dname = 'AI face recognition'
        cv2.namedWindow(dname, cv2.WINDOW_NORMAL)
        cv2.resizeWindow(dname, 960, 720)
        
        frame_count = 0
        dt = 0
        if align:
            fa = Face_Align_Mouth(160)
            
        # Capture all frames
        while(True):    
            (ret, frame) = capture.read()
            if frame is None:
                break
            frame_count = frame_count+1
            
            faces = self.detector.detect(frame)
            f_count = len(faces)
            detection_num += f_count
            
            names = None
            if (f_count>0) and (not (self.f_db is None)):
                t1 = time.time()
                names = [None]*f_count
                for (i, face) in enumerate(faces):
                    if align:
                        (f_cropped, f_img) = fa.align(frame, face)
                    else:
                        (f_cropped, f_img) = self.detector.extract(frame, face)
                    if (not (f_img is None)) and (not f_img.size==0):
                        embds = self.rec.embeddings(f_img)
                        data = self.rec.recognize(embds, self.f_db)
                        if not (data is None):
                            rec_num += 1
                            (name, dist, p_photo) = data
                            conf = 1.0 - dist
                            names[i] = (name, conf)
                        
                t2 = time.time()
                dt = dt + (t2-t1)
                    
            if len(faces)>0:
                Utils.draw_faces(faces, (0, 0, 255), frame, True, True, names)
            
            # Display the resulting frame
            cv2.imshow(dname,frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
            
        capture.release()
        cv2.destroyAllWindows()    
        
        if dt>0:
            fps = detection_num/dt
        else:
            fps = 0
        
        return (detection_num, rec_num, fps)

On initialization, the VideoFR class receives the MTCNN face detector, face recognizer, and a database of faces. These components are then combined into a face recognition pipeline.

Testing the Recognizer

We can now run recognition on a video file with the following code:

m_file = r"C:\PI_FR\net\facenet_keras.h5"
rec = FaceNetRec(m_file, 0.5)
print("Recognizer loaded.")
print(rec.get_model().inputs)
print(rec.get_model().outputs)

db_path = r"C:\PI_FR\db"
f_db = FaceDB()
f_db.load(db_path, rec)
d = MTCNN_Detector(50, 0.95)
vr = VideoFR(d, rec, f_db)

v_file = r"C:\PI_FR\video\5_3.mp4"

(f_count, rec_count, fps) = vr.process(v_file, True)

print("Face detections: "+str(f_count))
print("Face recognitions: "+str(rec_count))
print("FPS: "+str(fps))

Here is the resulting video for the test we ran.

Note that, when a person is recognized, we draw their name in the output, along with the confidence of recognition (similarity score). If a person is not recognized, we draw the confidence of the face detection. As you can see from the above test result, our face recognition algorithm works well on the video file. It correctly identified two people from the database (Lena and Marat) in most of the frames, and it did not recognize the unknown person throughout the video.

Let’s run a test on two more video snippets.

In the first video, we got 100% correct results. In the second video, the algorithm made one mistake: it assigned a name from the database to a person who was not in the database.

This showcases one common problem with face recognition systems. When a person is known (their face is in the database), the system correctly identifies them by assigning the greatest similarity value. However, when the detected person is unknown, the algorithm can still find a person in the database who looks a bit like the detected one.

We can resolve the above issue by changing the min_distance argument when initializing the recognizer. Note that the confidence for the woman who was wrongly identified as a known person is never greater than 0.65. Because confidence is computed as 1 - distance, her distance is never below 0.35. So if we lower min_distance from 0.5 to 0.35, the false identification won’t recur.
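
The arithmetic behind this choice can be sketched as follows. The confidences in the list are hypothetical values, not measurements from the test video, and threshold_from_false_matches is an illustrative helper that is not part of the project code.

```python
def threshold_from_false_matches(false_confidences):
    """Largest min_distance that rejects every listed false match.

    Confidence is 1 - distance, and recognize() accepts a face only if
    dist < min_distance, so the threshold is the smallest distance
    observed among the false matches.
    """
    return min(1.0 - c for c in false_confidences)

# Hypothetical confidences logged for a wrongly identified person
false_confidences = [0.58, 0.61, 0.65]

min_distance = threshold_from_false_matches(false_confidences)
print(round(min_distance, 2))
```

A lower threshold can also reject genuine matches, so it is worth checking that the confidences of correctly recognized people stay above 1 - min_distance before adopting the new value.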

Next Step

Now we have a good face recognition system. In the next article, we’ll create a Docker container for this system. Stay tuned!