Here we'll look at which images we need in our dataset and how many pictures are required, revisit age estimation as a classification problem, acquire the images, and organize the dataset.
In this series of articles, we’ll show you how to use a Deep Neural Network (DNN) to estimate a person’s age from an image.
This is the second article of the series, in which we’ll talk about the selection and acquisition of the image dataset. This dataset is the first core component you’ll need to solve any image classification problem, including age estimation. In the DL pipeline, the dataset is used to train our CNN to distinguish images with human faces and classify them into age groups – child, teen, adult, and so on. The data will also be used to estimate the accuracy of our CNN model, that is, how correctly it predicts a person’s age from a picture.
What images do we need in our dataset? As we try to estimate people’s age, we’ll need images of human faces. We must know the age of the person in each image. Using DL terms, our images need to be "labeled" with the age of the person measured in years.
How Many Images?
How many pictures do we need? This question is hard to answer exactly; the number depends on the problem we’re solving and the goal we’ve set. DL researchers suggest that solving a classification problem requires at least 1,000 examples per class. This minimal number of data samples for successful CNN training stems from the problem of overfitting: if our dataset is insufficient, the model will not be able to generalize the information. The CNN would then be very accurate on the training data but give poor results when handling the testing data.
As we stated in the first article of this series, we’ll consider age estimation as a classification problem. So we need to split the entire age range into several age groups, and then classify a person as belonging to one of these groups. For practical purposes, let’s use the following age groups:
| Age group | 1-5 | 6-10 | 11-15 | 16-18 | 19-21 | 22-30 | 31-44 | 45-60 | 61-80 | 81-100 |
|-----------|-----|------|-------|-------|-------|-------|-------|-------|-------|--------|
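To make the grouping concrete, here is a small sketch (not part of the article's code) that maps an age label in years to its class index. It assumes the groups do not overlap, taking the last group as 81-100:

```python
# Age-group boundaries (inclusive), taken from the table above; the last
# group is assumed to be 81-100 so the ranges don't overlap.
GROUPS = [(1, 5), (6, 10), (11, 15), (16, 18), (19, 21),
          (22, 30), (31, 44), (45, 60), (61, 80), (81, 100)]

def age_to_group(age):
    """Return the index of the age group containing `age`, or None."""
    for idx, (lo, hi) in enumerate(GROUPS):
        if lo <= age <= hi:
            return idx
    return None
```

For example, `age_to_group(7)` returns `1` (the 6-10 group), and an out-of-range age like `0` returns `None`.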
With ten groups (classes), we’ll need at least 10,000 images for CNN training. We’ll also need extra images for testing the quality of the CNN after training. Commonly, testing requires 10-20% of the training data. So, we’ll need at least 11-12 thousand images in our dataset to train and test our CNN.
Don’t forget: not only is the total number of images important, but so is the sample distribution. The ideal dataset includes the same number of samples in each class to avoid class imbalance issues. Another important aspect: the subset of images for every age group must include samples for a range of conditions: smiling and serious faces; people with and without glasses; men and women; and so on. This will allow the CNN to extract various features from the images and select only those that help estimate age.
Get the Data
After we decide what data we need, it is time to acquire the images. Fortunately, there are many open datasets of faces, which we can use for our purposes. A quick Internet search produces several datasets with age-labeled human faces:
- FGNET – about 1,000 face images of different sizes with age labels
- UTKFace – about 23,000 face images, sized 200 x 200 pixels, with age, gender, and race labels
- Adience – about 25,000 face images with age and gender labels
- IMDB-WIKI – 500,000+ images with age and gender labels
We’ll use the UTKFace dataset, which contains images with properly aligned and cropped faces, with a single face per image.
As you can see, every file name contains three prefix numbers. The first number is the age of the person in years, the second is the gender label, and the last one is the ethnicity label.
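We can extract these labels in Python by splitting the file name on the underscore separator. The sketch below assumes the UTKFace naming pattern `[age]_[gender]_[race]_[date&time].jpg`; the function name is ours, not part of the dataset:

```python
import os

def parse_utkface_name(fname):
    """Extract the (age, gender, race) labels from a UTKFace file name.

    Assumes names of the form [age]_[gender]_[race]_[date&time].jpg,
    e.g. "25_1_0_20170116.jpg".
    """
    base = os.path.basename(fname)
    parts = base.split("_")
    # The first three underscore-separated fields are the numeric labels.
    return int(parts[0]), int(parts[1]), int(parts[2])
```

For example, `parse_utkface_name("25_1_0_20170116.jpg")` returns `(25, 1, 0)`: a 25-year-old with gender label 1 and ethnicity label 0.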
Arrange the Data
For convenience, let’s organize the dataset on our working computer in the following directory structure:
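Based on the paths used in the splitting code below, the layout might look like this (the drive and folder names are just the ones used in this article's examples):

```
C:\Faces
├── UTKFace     (original dataset images)
├── Training    (training subset)
└── Testing     (testing subset)
```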
The UTKFace folder contains the original images from the UTKFace dataset, with the images for ages greater than 100 removed. Now we need to split the original dataset into the training and testing parts. This simple Python class will do the splitting:
import os
from shutil import copyfile

class DataSplitter:
    def __init__(self, part):
        # Every part-th image goes to the testing set
        self.part = part

    def split(self, folder, train, test):
        filenames = os.listdir(folder)
        for (i, fname) in enumerate(filenames):
            src = os.path.join(folder, fname)
            if (i % self.part) == 0:
                dst = os.path.join(test, fname)
            else:
                dst = os.path.join(train, fname)
            copyfile(src, dst)
The part parameter passed into the class constructor controls what portion of the original data is used for testing: every part-th image goes into the testing set. The folder parameter is the path to the original dataset directory, while the train and test parameters are the paths to the training and testing dataset directories, respectively. Executing the following Python code will split our dataset into the training and testing sets:
ds = DataSplitter(10)
ds.split(r"C:\Faces\UTKFace", r"C:\Faces\Training", r"C:\Faces\Testing")
The original data is split with a ratio of 1:9. That is, every tenth original image will end up in the testing dataset, and the remaining nine will stay in the training one.
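Since the splitter sends indices 0, part, 2*part, ... to the testing set, we can predict the resulting set sizes without touching the file system. This small helper (ours, for illustration only) computes the expected counts:

```python
import math

def split_counts(n_files, part):
    """Predict how many files the splitter sends to each set.

    Indices 0, part, 2*part, ... go to the testing set, so the test
    count is ceil(n_files / part); the rest go to training.
    """
    n_test = math.ceil(n_files / part)
    return n_files - n_test, n_test
```

For example, with 100 files and `part=10`, we get 90 training and 10 testing images, matching the 1:9 split described above.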
In this article, we explained what data we’ll need to solve the problem of age estimation with DL and CNN. We found some open datasets suitable for our purposes and selected one which we’ll use further in this series. We then split our dataset and organized it in our workspace conveniently for further use.
We are now ready for the next step – CNN design.