This post was inspired by a YouTube video I published on my channel some time ago.
Source Code Download
What libraries are we going to use:
- OpenCV (face detection with Haar cascades)
- Deeplearning4j (DL4J) together with ND4J (facial feature extraction)
Who is this article for:
- A developer who is already somewhat familiar with the face recognition problem. If you are struggling with this solution, I will be posting a very detailed tutorial on each workflow step described below, backed up with code files and many examples. For the more experienced readers, we are still not done: this application can be improved a lot.
- I will be answering all additional questions here.
The workflow consists of three phases:
- Detect Face (Phase I)
- Extract face feature vector (Phase II)
- Find the most similar face from our data set (Phase III)
So let’s get started…
In order to create a face detector I used OpenCV. This library provides a great real-time face detection algorithm using Haar cascades. OpenCV for Java has a class called CascadeClassifier that implements the algorithm. When creating an object of this class, you need to provide the location of the XML file that CascadeClassifier uses to detect faces in an image. The file I use, “haarcascade_frontalface_alt.xml”, is a pre-trained cascade shipped with OpenCV, and you will find it included in the project. Detecting faces is as simple as calling the detectMultiScale function.
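A minimal sketch of that call, assuming OpenCV’s Java bindings and native library are installed; the image file name here is a placeholder:

```java
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.objdetect.CascadeClassifier;

public class FaceDetectorSketch {
    public static void main(String[] args) {
        // Load the native OpenCV library before using any OpenCV class.
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // The pre-trained cascade shipped with OpenCV, included in the project.
        CascadeClassifier detector =
                new CascadeClassifier("haarcascade_frontalface_alt.xml");

        Mat image = Imgcodecs.imread("people.jpg"); // placeholder input image
        MatOfRect faces = new MatOfRect();

        // Detect faces at multiple scales; the rectangles land in `faces`.
        detector.detectMultiScale(image, faces);

        System.out.println("Faces found: " + faces.toArray().length);
    }
}
```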
If you want to know more about this function, I suggest reading the OpenCV documentation. This problem and the OpenCV API will be covered in more depth in another tutorial.
The output of detectMultiScale is a MatOfRect, which wraps an array of Rect objects. Each Rect contains the X and Y coordinates, as well as the width and height, of the area where a face is located in the image. Given that rectangle, we can extract each face individually from the image. Since the image is loaded into a Mat, we can extract a face simply by calling the submat function and passing the face’s Rect as an input parameter. Next, I save each face as an image. And that is pretty much it.
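The cropping step can be sketched like this, a simplified version of what ExtractFacesFromImageTask.java does; the class name and output file names are placeholders:

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.imgcodecs.Imgcodecs;

public class FaceExtractorSketch {
    // Crop every detected face out of the source image and save it to disk.
    static void saveFaces(Mat image, MatOfRect faces, String outputDir) {
        Rect[] rects = faces.toArray();
        for (int i = 0; i < rects.length; i++) {
            // submat returns a view of the face region defined by the Rect.
            Mat face = image.submat(rects[i]);
            Imgcodecs.imwrite(outputDir + "/face_" + i + ".png", face);
        }
    }
}
```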
The whole process can be seen in ExtractFacesFromImageTask.java file.
With this we finish the task of Face Detection.
As I said, more on this topic in another tutorial…
Once we have extracted the faces, we can continue with the next item in our workflow: face recognition. Since the goal is to create an app that can recognize an undefined number of faces, we need to split this section in two. The first part explains how to obtain facial features, and the second measures the similarity between the extracted facial features and the features we have in a data set.
The data set in our project is a file that contains a list of face feature vectors and face labels, so every face feature vector is paired with a label.
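Conceptually, each record in that file is just a vector plus a label. A minimal sketch of such a record (the class name is my own, not from the project):

```java
import java.util.ArrayList;
import java.util.List;

// One entry in the data set: a face feature vector plus its label.
class LabeledFace {
    final double[] features;
    final String label;

    LabeledFace(double[] features, String label) {
        this.features = features;
        this.label = label;
    }
}

public class DataSetSketch {
    public static void main(String[] args) {
        List<LabeledFace> dataSet = new ArrayList<>();
        dataSet.add(new LabeledFace(new double[]{0.1, 0.9, 0.3}, "Alice"));
        dataSet.add(new LabeledFace(new double[]{0.8, 0.2, 0.5}, "Bob"));
        System.out.println("Records in data set: " + dataSet.size());
    }
}
```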
Creating a facial features vector
In order to recognize faces we need to extract face features from a given image. We can either train our own neural network to get good-quality facial features, or we can use a pre-trained one. In this project I use a pre-trained one.
A pre-trained network is just a model that was already trained and has its weights already calculated. That’s all…
Usually people like using the first couple of layers of such a network because they tend to generalize better. The main advantage is that I don’t have to train the network myself: sometimes I don’t have a big enough training set, or the computing power necessary to do the training. So I opt for using these pre-trained models.
Creating the facial features vector boils down to using a pre-trained neural network model that we can obtain from the DL4J library. To do that, we load the “VGGFace” computational graph. The idea is to pass the face image into the “VGGFace” computational graph and use the output of the layer named “pool4” as the input to the classification method we will use later. Since this is a pre-trained model, it extracts excellent features for the classifier. I chose the outputs of the “pool4” layer, which I think is good enough for this demo.
So why did I choose that approach?
Since I would like to create a face classifier that can be trained on an unknown number of faces, I could not use the whole pre-trained network, because I don’t know in advance how many faces I will enter. Instead, I decided to take the output from the neural network and store it as an n-dimensional vector. The logic is the following: since similar faces yield similar vectors, they will be grouped in a cluster. Simply put, similar faces end up having similar vector elements. So if I want to classify a new face, I run the same procedure, get the output vector from the neural network, and calculate the distance to the vectors that are already in my data set. My data set contains a vector and a label for each face, so if I find the vector most similar to the one I am processing at the moment, I also find the label. I do that using the Euclidean distance, but an even better solution would be the KNN algorithm.
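The nearest-neighbor lookup described above can be sketched in plain Java; the class and method names here are my own, and the two-element vectors are toy data standing in for the real “pool4” features:

```java
public class NearestFace {
    // Squared Euclidean distance between two feature vectors.
    // The square root is omitted: it does not change which vector is closest.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Return the label of the data-set vector closest to `query`.
    static String classify(double[] query, double[][] vectors, String[] labels) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < vectors.length; i++) {
            double dist = squaredDistance(query, vectors[i]);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return labels[best];
    }

    public static void main(String[] args) {
        double[][] dataSet = { {0.9, 0.1}, {0.1, 0.8} };
        String[] labels = { "Alice", "Bob" };
        // The query vector is closest to Alice's entry.
        System.out.println(classify(new double[]{0.85, 0.15}, dataSet, labels));
    }
}
```

Upgrading this to KNN would mean keeping the k closest vectors and taking a majority vote over their labels, which is more robust to a single bad entry in the data set.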
So the code starts by loading the weights of “VGGFace” into our computational graph. We read in the image using NativeImageLoader, which transforms the image into the correct size for the “VGGFace” neural network. Then we use the featurize function from our TransferLearningHelper object. Finally, we “flatten” the matrix into a vector using the reshape function from the ND4J library.
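A sketch of that pipeline using the DL4J zoo, assuming the DL4J, DataVec, and ND4J dependencies are on the classpath; the input file name is a placeholder, and the mean-subtraction preprocessing step is my addition, since VGG-style models usually expect it:

```java
import java.io.File;
import org.datavec.image.loader.NativeImageLoader;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.transferlearning.TransferLearningHelper;
import org.deeplearning4j.zoo.PretrainedType;
import org.deeplearning4j.zoo.model.VGG16;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.dataset.api.preprocessor.VGG16ImagePreProcessor;

public class FeatureExtractorSketch {
    public static void main(String[] args) throws Exception {
        // Load the pre-trained VGGFace weights into a computational graph
        // (the model is downloaded on first use).
        ComputationGraph vggFace = (ComputationGraph)
                VGG16.builder().build().initPretrained(PretrainedType.VGGFACE);

        // Everything up to and including "pool4" becomes the frozen feature extractor.
        TransferLearningHelper helper = new TransferLearningHelper(vggFace, "pool4");

        // Resize the face image to the 224x224x3 input the network expects.
        NativeImageLoader loader = new NativeImageLoader(224, 224, 3);
        INDArray faceImage = loader.asMatrix(new File("face_0.png")); // placeholder file
        new VGG16ImagePreProcessor().transform(faceImage); // subtract the VGG mean

        // Run the image through the frozen layers, then flatten to a 1 x n vector.
        DataSet featurized = helper.featurize(new DataSet(faceImage, null));
        INDArray featureVector = featurized.getFeatures()
                .reshape(1, featurized.getFeatures().length());
        System.out.println("Feature vector length: " + featureVector.length());
    }
}
```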
Finally we have our vector that represents the facial features of the person, and we have a label. We combine them and put them in the data set. When a new face comes along, we use the same procedure as described above to get the feature vector, flatten it, and compare it to the rest of our data set. The vector with the closest distance (the most similar one) wins, and we display its label (the name of the person).
If the presented label is wrong, we can correct it by entering a new label. Please note that we already have the feature vector for the face, so all we need to do is insert this record into our data set. We use this method to correct misclassifications and increase accuracy.
This is it for now…
We will cover every topic described here in more detail very soon. Also, please note that this is not the most optimal solution to this problem; how to improve accuracy and optimize for speed will be covered in another post.
Complete source code: Face Recognition Project