Colombian sign language hand gesture classification

Using residual networks and transfer learning to train a model to classify various hand gestures of sign language.

Attyuttam Saha
5 min read · Aug 14, 2020

In this article, I will dive straight into the code, its caveats, and the points worth discussing about how I approached this classification problem. I will provide notes and links wherever a topic needs explanation, but I will try not to dwell on theory.

Importing Libraries

These are the libraries that I have imported, and I will provide basic explanations as I use them. For a more in-depth understanding, please refer to their documentation, which is easily available.

Splitting into training and validation sets and exploring the dataset

In the code block above, I have done the following three things:

  1. Data augmentation
  2. Splitting data into training and validation sets
  3. Observing the size of each set

As the images are very large and therefore slow to process, I have resized them to 32x32 pixels. Before resizing, I took a random crop of 2500 from each image so that a bit of randomness is also thrown in. To make the data more varied, I carried out data augmentation: random crop, random vertical flip, and random horizontal flip.

To ensure that the dataset is split into training and validation sets randomly, I have used the random_split() method from torch.utils.data, and to make the random split identical every time the code is run, I have set a seed using torch.manual_seed().

I have kept the validation set at 400 images and the training set at 2,925. Since the amount of data is quite small, I haven't created a test set and will rely solely on the validation set to evaluate the model.

Visualizing the data

I will now define a helper function that uses matplotlib.pyplot to plot the images.
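A minimal version of such a helper, assuming the images are CxHxW tensors (the function name and signature here are my own, not the article's):

```python
import matplotlib
matplotlib.use("Agg")          # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import torch

def show_example(img, label, classes=None):
    """Plot one CxHxW image tensor with its label as the title."""
    plt.imshow(img.permute(1, 2, 0).clamp(0, 1))   # CxHxW -> HxWxC for pyplot
    plt.title(classes[label] if classes is not None else str(label))
    plt.axis("off")

show_example(torch.rand(3, 32, 32), 4)
```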

Let us now see some of the examples.

Creating DataLoaders

Now, to load the data in batches during training, we need to create data loaders for the training and validation sets.
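A sketch of the data-loader setup, with synthetic datasets standing in for the real splits (the doubled validation batch size is a common choice, not something the article confirms):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the 2,925/400 train/validation splits created earlier.
train_ds = TensorDataset(torch.randn(2925, 3, 32, 32), torch.randint(0, 11, (2925,)))
val_ds = TensorDataset(torch.randn(400, 3, 32, 32), torch.randint(0, 11, (400,)))

batch_size = 64
# Shuffle the training data each epoch; validation order does not matter.
train_dl = DataLoader(train_ds, batch_size, shuffle=True, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size * 2, pin_memory=True)

images, labels = next(iter(train_dl))
print(images.shape)            # torch.Size([64, 3, 32, 32])
```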

As you can see in the code above, I have set the batch size to 64. Please refer to the documentation for the other parameters.

Let us see a batch of data.

Defining the model

Let us now define an accuracy method, which will help us track the accuracy of our model during training. We shall also define a helper class whose methods compute the loss during training, compute the loss and generate predictions during validation, and print logs to keep us updated on the training process.
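These utilities typically look like the following (a common pattern from the Jovian course this article follows; the exact log format is my guess):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    """Fraction of predictions in this batch that match the labels."""
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ColombianHandGestureImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                       # forward pass
        return F.cross_entropy(out, labels)      # training loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                       # predictions for this batch
        loss = F.cross_entropy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Average the per-batch losses and accuracies over the whole epoch
        epoch_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        epoch_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        # Print a log line to keep us updated during training
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['val_loss'], result['val_acc']))
```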

Now, we will define the model, inheriting from the class ColombianHandGestureImageClassificationBase, which contains all the training and validation utilities defined in the code block above.

As seen in the code, I have used a pre-trained ResNet50 model and replaced its last layer to produce 11 outputs, since my dataset is classified into 11 different hand gestures.

This technique of adapting a pre-trained model to your own problem with small changes, such as replacing the last layer in this case, is called transfer learning.

One interesting thing you can see in the code above is the pair of freeze() and unfreeze() methods.

I will explain their functionality in more detail when I actually use them, but to give a basic idea: the freeze() method freezes all layers of the model except the last one, so that those layers are not trained, meaning their weights do not change at all during training. The unfreeze() method does the opposite: it makes all layers trainable again, so the weights of the unfrozen layers are updated.

Helper methods to use the device dynamically

Now, the device on which our model trains might be a CPU or a GPU (cuda), and the model should train on whichever device is available, so the helper functions below load the data and model onto the available device.
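These helpers follow a standard pattern (the names below are the common Jovian-course convention, which may differ slightly from the article's exact code):

```python
import torch

def get_default_device():
    """Pick the GPU if one is available, otherwise fall back to the CPU."""
    return torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

def to_device(data, device):
    """Move a tensor (or a list/tuple of tensors) to the chosen device."""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader:
    """Wrap a DataLoader so every batch is moved to the device as it is yielded."""
    def __init__(self, dl, device):
        self.dl, self.device = dl, device

    def __iter__(self):
        for batch in self.dl:
            yield to_device(batch, self.device)

    def __len__(self):
        return len(self.dl)

device = get_default_device()
print(device)
```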

As you can see, the available device is cuda, since I trained this model on a GPU accelerator on Kaggle.

Let us now load the model onto the GPU.

Training
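Putting the pieces discussed below together, the training code is roughly the following sketch (in the style of the Jovian course; the default optimizer and the optional gradient clipping are my assumptions):

```python
import torch

@torch.no_grad()                       # no gradients are computed during validation
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    """Read the learning rate the scheduler has currently set."""
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # One Cycle policy: ramp the LR up to max_lr, then anneal it back down
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader))
    for epoch in range(epochs):
        model.train()
        train_losses, lrs = [], []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            if grad_clip:
                torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            lrs.append(get_lr(optimizer))
            sched.step()               # advance to the next learning-rate value
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history
```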

There is a lot going on in the code block above. Let us walk through it.

Let us start with the evaluate() method. This method is essentially used to perform the validation step. It calls validation_step(), which generates the model's predictions and produces the outputs for the validation data that help us assess the model. The @torch.no_grad() decorator signifies that no gradients are computed while the validation step is in progress.

Now, about the get_lr() method. Before discussing it, we need to know about "learning rate scheduling": instead of using a single fixed learning rate, we vary the learning rate throughout training. We provide only the maximum allowed learning rate; the model starts with a significantly smaller rate, gradually increases it to the maximum, and then slowly decreases it again. This particular schedule is called the "One Cycle Learning Rate Policy", and you can read more about it here.

So, back to get_lr(). In the fit_one_cycle() method, which is essentially the training method, we define a variable called sched: the learning rate scheduler provided by torch. Each call to sched.step() moves the learning rate to its next value, and get_lr() lets us observe the current learning rate.

After that, it is the general training process: load a batch, train on it, and then validate.

Let’s start training the model!
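The hyper-parameter block is something like this (the specific values here are illustrative assumptions; the article does not list the exact ones it used):

```python
import torch

epochs = 10            # assumed; the article does not state the exact count
max_lr = 0.01          # peak learning rate for the One Cycle schedule
grad_clip = 0.1        # clip gradients to keep updates stable
weight_decay = 1e-4    # L2 regularization
opt_func = torch.optim.Adam
```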

We define the hyper-parameters in the code block above.

We first freeze all the layers other than the final one using the model.freeze() method. We do this because the pre-trained model (ResNet50) is already trained to identify objects, and we do not want to disturb those weights while the newly added final layer is still untrained.

After this first round of training, the model is doing reasonably well, so we now want to tune all the layers: we unfreeze them and perform another round of training.
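So the overall schedule is two phases (pseudocode; the lower fine-tuning learning rate is common practice that I am assuming, not a detail stated in the article):

```
model.freeze()                       # phase 1: train only the new final layer
history  = fit_one_cycle(epochs, max_lr, model, train_dl, val_dl, ...)

model.unfreeze()                     # phase 2: fine-tune every layer
history += fit_one_cycle(epochs, max_lr / 10, model, train_dl, val_dl, ...)
```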

We can observe that we have reached an accuracy of about 67%.

Visualizing the results

Let us now plot the accuracies, losses and the learning rates.
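Plotting helpers of this shape would do the job (the history values below are made-up placeholders purely to make the sketch runnable; the real history comes from the training loop):

```python
import matplotlib
matplotlib.use("Agg")              # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Placeholder history: a list of per-epoch result dicts like the training loop returns.
history = [{'val_acc': 0.40, 'val_loss': 1.9, 'train_loss': 2.1},
           {'val_acc': 0.55, 'val_loss': 1.5, 'train_loss': 1.6},
           {'val_acc': 0.67, 'val_loss': 1.2, 'train_loss': 1.3}]

def plot_accuracies(history):
    plt.figure()
    plt.plot([x['val_acc'] for x in history], '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. number of epochs')

def plot_losses(history):
    plt.figure()
    plt.plot([x['train_loss'] for x in history], '-bx')
    plt.plot([x['val_loss'] for x in history], '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['training', 'validation'])

plot_accuracies(history)
plot_losses(history)
```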

Conclusion

So, in this article I have focused mainly on the code and the dataset I used, without delving much into the theory. For a deeper understanding of the code, do check out this YouTube video by Jovian.ml, where the concepts are thoroughly explained. There is also plenty of room to improve the accuracy: try a different number of epochs, a different model, or perhaps no transfer learning at all. Please try it yourself and tell me about your findings, and do notify me if you spot any discrepancies in this article. Thank you for taking the time to read it!


Written by Attyuttam Saha

Software Engineer, MCA from NIT Warangal; loves to read and watch horror and to talk about programming.