Classifying Gender in images using Deep Learning

Electricity transformed countless industries: transportation, manufacturing, healthcare, communications, and more. AI will bring about an equally big transformation.
-Andrew Ng

Dec 28, 2020

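What is Deep Learning?
Deep Learning is a subset of Machine Learning that uses neural networks with many layers to learn representations directly from data, including unstructured data such as images, audio, and text. It can be trained on labelled data (supervised learning, as in this post) or on unlabelled data (unsupervised learning).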

Introduction

This blog is part of a course project for Deep Learning with PyTorch: Zero to GANs, a 6-week course by Aakash N S and his team at Jovian. It is a beginner-friendly online course offering a practical, coding-focused introduction to deep learning using the PyTorch framework. It was a novel experience for everyone: the lectures were live-streamed on YouTube (on the beloved freeCodeCamp YouTube channel) to participants from around the world. The lessons are taught through Jupyter Notebooks hosted on Jovian, alongside an interactive discussion forum where you can get help with your questions, errors, and doubts from the worldwide data science community.

Framework and Dataset

The framework used in this project for deep learning is PyTorch. PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, and primarily developed by Facebook’s AI Research lab. Other popular open source deep learning libraries include TensorFlow (developed by Google) and Keras (now integrated with TensorFlow).

Objective: Build a neural network model that classifies images as male or female as accurately as possible.

The dataset for training the model was downloaded from Kaggle and contains 58.7k images of males and females. It has two folders, Training and Validation. The Validation folder will be used for both validation and testing.

Technique and Observation

In this blog, I will use a Convolutional Neural Network (CNN) to classify images as male or female. As you will see at the end, the model reaches more than 95% accuracy after a very short training time. The approach for training and testing is as follows:

1. Importing all the libraries required
2. Exploring the Dataset
3. Defining Training and Validation Dataset
4. Defining the Model (Convolutional Neural Network)
5. Using a GPU
6. Training the Model
7. Testing with Individual Images

Let’s get started.

Importing all the libraries

Exploring the Dataset

Let’s check the folders in the dataset and the classes in each folder.

As stated above, the dataset has two folders, Training and Validation, each containing two classes: male and female.

Now we will check the number of images in each class of the Training folder.
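Counting the images per class can be done with the standard library alone. The helper name `count_images_per_class` below is hypothetical and only illustrates the idea; it assumes the one-folder-per-class layout described above.

```python
import os

def count_images_per_class(split_dir):
    """Return {class_name: number_of_files} for a one-folder-per-class directory."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        # Skip stray files that sit next to the class folders.
        if os.path.isdir(class_dir):
            counts[class_name] = len(os.listdir(class_dir))
    return counts

# Example call, assuming data_dir points at the dataset root:
# count_images_per_class(data_dir + 'Training')
```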

The above directory structure (one folder per class) is used by many computer vision datasets, and most deep learning libraries provide utilities for working with such datasets. We can use the ImageFolder class from torchvision to load the data as PyTorch tensors.

Defining Training and Validation Dataset

The images in the dataset vary in size, which can cause problems during training and testing, so we will also apply the Resize transform to bring them all to a fixed size (in this case 64 x 64).

We can now check the shape of the image tensor of the training dataset

The list of classes is stored in the .classes property of the dataset. The numeric label for each element corresponds to the index of the element's label in the list of classes.

We will now define a helper function to show an image with its label.

The validation dataset can be defined in a similar way

We can now create data loaders to load the data in batches. A DataLoader is an iterable over a dataset that provides batching, shuffling, and parallel loading of data while keeping memory usage nearly constant. It doesn’t load the complete dataset into memory at once, which could otherwise crash the program for large datasets. For a more detailed explanation, you can check this Stanford blog. We’ll use a batch size of 50.
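The sketch below shows how data loaders with a batch size of 50 can be created. To keep it runnable without the Kaggle dataset, random tensors stand in for the image datasets; the variable names and the larger validation batch size are assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 50

# Stand-in data: 120 fake RGB images of size 64 x 64 with labels 0 or 1.
# In the project these would be the ImageFolder datasets instead.
train_ds = TensorDataset(torch.randn(120, 3, 64, 64), torch.randint(0, 2, (120,)))
val_ds = TensorDataset(torch.randn(40, 3, 64, 64), torch.randint(0, 2, (40,)))

# Shuffle only the training data; validation order does not matter.
# num_workers and pin_memory can be set to speed up loading on real data.
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
val_dl = DataLoader(val_ds, batch_size * 2)

images, labels = next(iter(train_dl))
print(images.shape)  # torch.Size([50, 3, 64, 64])
```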

Defining the Model (Convolutional Neural Network)

The 2D convolution is a fairly simple operation at heart: you start with a kernel, which is simply a small matrix of weights. This kernel “slides” over the 2D input data, performing an element-wise multiplication with the part of the input it is currently on, and then summing up the results into a single output pixel. — [Source]
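The sliding-kernel description above can be written out directly. The function below is an illustrative, unoptimized sketch for a single-channel image with no padding and stride 1; the name `conv2d_single` is hypothetical. Like PyTorch's own convolution layers, it does not flip the kernel.

```python
import torch

def conv2d_single(image, kernel):
    """Slide `kernel` over a 2D `image`, multiplying element-wise and summing."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1  # output size without padding, stride 1
    out = torch.zeros(oh, ow)
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = (patch * kernel).sum()
    return out

image = torch.arange(16, dtype=torch.float32).reshape(4, 4)
kernel = torch.ones(2, 2)
result = conv2d_single(image, kernel)
print(result.shape)  # torch.Size([3, 3])
```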

Check out the following articles to gain a better understanding of convolutions:

1. Intuitively understanding Convolutions for Deep Learning by Irhum Shafkat

2. Convolutions in Depth by Sylvain Gugger (this article implements convolutions from scratch)

There are certain advantages offered by convolutional layers when working with image data:

1. Fewer parameters: A small set of parameters (the kernel) is used to calculate outputs of the entire image, so the model has far fewer parameters than a fully connected layer.

2. Sparsity of connections: In each layer, each output element only depends on a small number of input elements, which makes the forward and backward passes more efficient.

3. Parameter sharing and spatial invariance: The features learned by a kernel in one part of the image can be used to detect a similar pattern in a different part of another image.
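To put a number on the first point, the sketch below compares a 3x3 convolution that maps 3 channels to 16 channels with a hypothetical fully connected layer producing an output of the same size on a 64 x 64 image. The fully connected count is computed arithmetically rather than by building the layer, since it would be far too large to allocate comfortably.

```python
import torch.nn as nn

# Convolution: 16 kernels of shape 3 x 3 x 3, plus 16 biases.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv_params = sum(p.numel() for p in conv.parameters())

# Fully connected equivalent: every input value connects to every output value.
in_features = 3 * 64 * 64    # flattened input image
out_features = 16 * 64 * 64  # flattened 16-channel feature map
fc_params = in_features * out_features + out_features

print(conv_params)  # 448
print(fc_params)    # 805371904
```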

We will also use max-pooling layers to progressively decrease the height & width of the output tensors from each convolutional layer.

The Conv2d layer transforms a 3-channel image to a 16-channel feature map, and the MaxPool2d layer halves the height and width. The feature map gets smaller as we add more layers until we are finally left with a small feature map, which can be flattened into a vector. We can then add some fully connected layers at the end to get a vector with one output per target class for each image.

Figure: transformation after two convolutional layers
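The shape changes described above can be checked on a random batch. This is a small sketch; the kernel size and padding are assumptions chosen so that the convolution keeps the height and width unchanged.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

images = torch.randn(50, 3, 64, 64)  # one batch of 50 fake RGB images
features = conv(images)              # 3 channels become 16 channels
pooled = pool(features)              # height and width are halved
flat = pooled.flatten(start_dim=1)   # one vector per image

print(features.shape)  # torch.Size([50, 16, 64, 64])
print(pooled.shape)    # torch.Size([50, 16, 32, 32])
print(flat.shape)      # torch.Size([50, 16384])
```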

Let’s define the model by extending an ImageClassificationBase class which contains helper methods for training & validation.
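The author's ImageClassificationBase is not reproduced in this post. The sketch below follows the general pattern of such a helper class: it computes the cross-entropy loss for training and reports loss and accuracy for validation. The method names and the `accuracy` helper are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    """Fraction of predictions (argmax over classes) that match the labels."""
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # generate predictions
        return F.cross_entropy(out, labels)  # loss used for backpropagation

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Average the per-batch losses and accuracies.
        epoch_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        epoch_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['val_loss'], result['val_acc']))
```

A concrete model then only needs to define its layers and a forward method.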

We’ll use nn.Sequential to chain the layers and activation functions into a single network architecture.

GenderCnnModel by extending ImageClassification base
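The author's exact architecture is not shown, so the following is only an illustrative sketch of what such a model might look like for 64 x 64 inputs. The layer sizes and the two-output final layer are assumptions, and the class extends nn.Module directly here to keep the example self-contained, whereas the post extends ImageClassificationBase.

```python
import torch
import torch.nn as nn

class GenderCnnModel(nn.Module):
    """Illustrative CNN for 3 x 64 x 64 images (hypothetical layer sizes)."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 16 x 64 x 64
            nn.ReLU(),
            nn.MaxPool2d(2, 2),                           # 16 x 32 x 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # 32 x 32 x 32
            nn.ReLU(),
            nn.MaxPool2d(2, 2),                           # 32 x 16 x 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 64 x 16 x 16
            nn.ReLU(),
            nn.MaxPool2d(2, 2),                           # 64 x 8 x 8
            nn.Flatten(),                                 # 4096 values per image
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),                  # one score per class
        )

    def forward(self, xb):
        return self.network(xb)
```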

Now we will instantiate the model and verify that it produces the expected output on a batch of training data. The outputs for each image, one per target class, can be interpreted as class probabilities (after applying softmax), and the class with the highest probability is chosen as the label predicted by the model for the input image.

model = GenderCnnModel()
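How raw model outputs turn into a predicted label can be illustrated with made-up numbers. The logits below are invented for the example and are not outputs of the trained model; the class order assumes the alphabetical ordering used by ImageFolder.

```python
import torch
import torch.nn.functional as F

classes = ['female', 'male']

# Made-up raw outputs (logits) for two images, one score per class.
logits = torch.tensor([[2.0, 0.5],
                       [-1.0, 1.0]])

probs = F.softmax(logits, dim=1)  # each row now sums to 1
preds = probs.argmax(dim=1)       # index of the most probable class

print([classes[i] for i in preds.tolist()])  # ['female', 'male']
```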

Using a GPU

As the sizes of our models and datasets increase, we need to use GPUs to train our models within a reasonable amount of time. GPUs contain hundreds of cores optimized for performing expensive matrix operations on floating-point numbers quickly, making them ideal for training deep neural networks. You can use GPUs for free on Google Colab and Kaggle or rent GPU-powered machines on services like Google Cloud Platform, Amazon Web Services, and Paperspace.

To seamlessly use a GPU, if one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class DeviceDataLoader to move our model & data to the GPU as required.
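The helpers named above are not reproduced in this post. A common way to write them is sketched here as an assumption about their shape rather than the author's exact code.

```python
import torch

def get_default_device():
    """Pick the GPU if one is available, otherwise fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device('cuda')
    return torch.device('cpu')

def to_device(data, device):
    """Move a tensor, a model, or a list/tuple of tensors to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    """Wrap a data loader so each batch is moved to the device when it is requested."""

    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)
```

Because batches are moved one at a time inside the iterator, only the current batch has to fit in GPU memory.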

We can now wrap our training and validation data loaders using DeviceDataLoader to transfer batches of data to the GPU automatically, and use to_device to move our model to the GPU.

Training the model

We’ll define two functions: fit and evaluate to train the model using gradient descent and evaluate its performance on the validation set.
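A self-contained sketch of the two functions follows. It computes the cross-entropy loss directly instead of calling helper methods on the model, so it differs from the author's version; the signature and the default SGD optimizer are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, val_loader):
    """Return average loss and accuracy over the validation set."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    for images, labels in val_loader:
        out = model(images)
        total_loss += F.cross_entropy(out, labels, reduction='sum').item()
        correct += (out.argmax(dim=1) == labels).sum().item()
        count += len(labels)
    return {'val_loss': total_loss / count, 'val_acc': correct / count}

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    """Train with gradient descent and record validation metrics after each epoch."""
    optimizer = opt_func(model.parameters(), lr)
    history = []
    for epoch in range(epochs):
        model.train()
        train_losses = []
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()        # compute gradients
            optimizer.step()       # update the weights
            optimizer.zero_grad()  # reset gradients for the next batch
            train_losses.append(loss.item())
        result = evaluate(model, val_loader)
        result['train_loss'] = sum(train_losses) / len(train_losses)
        history.append(result)
    return history
```

The returned history is a list with one dictionary per epoch, which is convenient for plotting accuracy and loss curves afterwards.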

Before we begin training, let’s instantiate the model once again and see how it performs on the validation set with the initial set of parameters.

The initial accuracy is 0%, which shows that the randomly initialized model has not learned anything useful yet. We’ll use the following hyperparameters (learning rate, number of epochs, batch size, etc.) to train our model.

It is important to record the hyperparameters of every experiment so that you can replicate it later and compare it against other experiments. The Jovian platform provides a method for this: jovian.log_hyperparams.

The training starts now

As you can see, the accuracy achieved after 10 epochs is around 96%, which is a strong result for such a short training run.

We can plot the validation set accuracies to study how the model improves over time.

We can also plot the training and validation losses to study the trend.

Testing with individual images

While we have been tracking the overall accuracy of the model so far, it’s also a good idea to look at the model’s results on some sample images. Let’s test our model with some images from the predefined test dataset. We begin by creating a test dataset using the ImageFolder class on the Validation folder.

test_dataset = ImageFolder(data_dir+'Validation', transform=transform_face)

Let’s define a helper function predict_image, which returns the predicted label for a single image tensor.
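One way such a helper could look is sketched below. The extra `classes` and `device` parameters are assumptions added so that the function does not rely on global variables; the author's signature may differ.

```python
import torch

def predict_image(img, model, classes, device=torch.device('cpu')):
    """Return the predicted class name for a single image tensor (C x H x W)."""
    xb = img.unsqueeze(0).to(device)  # add a batch dimension of size 1
    model.eval()
    with torch.no_grad():
        yb = model(xb)                # shape: 1 x number of classes
    pred = yb.argmax(dim=1)           # index of the highest score
    return classes[pred[0].item()]

# Example call, assuming test_dataset and a trained model exist:
# img, label = test_dataset[0]
# print('Label:', test_dataset.classes[label],
#       'Predicted:', predict_image(img, model, test_dataset.classes))
```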

Conclusion

On the whole, this model identifies males and females with about 96% accuracy on the validation set, which could be improved further with other techniques. It was a fairly simple example in which a computer learned to tell images of males and females apart. Many more complex datasets are available online, and they call for more advanced techniques such as data augmentation, regularisation, and ResNet architectures to achieve state-of-the-art accuracy.