Generative Adversarial Networks using PyTorch

Here I’ll be talking about GANs and how they can be used to generate images of fashionable items like shirts and shoes from the Fashion-MNIST dataset. The code for doing this will be explained step by step, and I shall be using PyTorch for the purpose.

Attyuttam Saha
9 min readAug 20, 2020
Image courtesy: Liberal Dictionary

A bit about GANs

Now, I am assuming we all have a fair bit of knowledge of Deep Neural Networks, and most of the time we spend applying these concepts goes into use cases like classification, prediction etc. We know these tasks fall under the category of supervised learning, for which DNNs are famous. But what we intend to achieve by using GANs is called Generative Modelling.

According to Jason Brownlee in his article on GANs:

Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset

Now, to create a GAN we need two different kinds of models: Generator Model and Discriminator Model.

The purpose of the Generator model is to generate new examples from random input vectors provided to it. How it does that is something we are going to ponder on pretty soon!

And the Discriminator is a simple classifier that takes an image as input and classifies it as either a Fake image or a Real image.

By Fake image I mean an image created by the Generator, and by Real image I mean an image provided from the dataset. The whole concept of GANs revolves around the generator trying to fool the discriminator!

The Generator Model. Picture by Jason Brownlee in Machine Learning Mastery
The Discriminator Model. Picture by Jason Brownlee in Machine Learning Mastery

Why is it called an Adversarial Network?

Now, as we shall see in the code, the two models are trained in tandem with each other. The generator essentially creates a set of images, and these are provided to the discriminator, which tries to tell whether a given image is real or fake. As the discriminator is also being trained with inputs from the dataset, it becomes well trained in understanding what the real images look like. Hence, if the generator is successful in making the discriminator believe that an image generated by it is real, the generator has done a good job!

We can think of the generator as being like a counterfeiter, trying to make fake money, and the discriminator as being like police, trying to allow legitimate money and catch counterfeit money. To succeed in this game, the counterfeiter must learn to make money that is indistinguishable from genuine money, and the generator network must learn to create samples that are drawn from the same distribution as the training data.

- NIPS 2016 Tutorial: Generative Adversarial Networks, 2016.

Hence, both models act as opponents to each other: the generator is trying to fool the discriminator, and the discriminator is trying to avoid being fooled. Once the generator successfully fools the discriminator, our job is done! Thus, both are adversaries to each other, and hence the term adversarial comes into the picture.

The entire architecture

The GAN. Picture by Jason Brownlee in Machine Learning Mastery

The Code

The code that I will be using is from a notebook by Jovian.ml and has also been inspired by this repository of PyTorch tutorials.

I will also be using Jovian to save and commit the code. Fret not, as this shall not be a blocker in your learning process; you can commit your code anywhere you want!

Let’s dive in!

Imports and Data Loading

In the code block above, all the necessary libraries are imported, and the FashionMNIST class from torchvision.datasets is used to download the Fashion-MNIST dataset. While downloading, I also convert the data into tensors and normalize the images with a mean and standard deviation of 0.5 each, which ensures that all the pixel values of the image range from -1 to 1. As this is a gray-scale image, there is only one channel, which is why both the mean and the standard deviation have one value each.

Viewing the tensor

We should see the data that we have downloaded and converted to tensors, right?

Here, I have printed the label of the first Fashion-MNIST image along with a part of its tensor. To verify the range of pixel values in the image, I have also printed the minimum and the maximum pixel value, and as expected they are -1 and 1 respectively.

Viewing the Image
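Such a helper can be sketched as follows; the example image below is synthetic, just to show the value range:

```python
import torch

def denorm(x):
    # Undo the (0.5, 0.5) normalization: map values from [-1, 1] back to [0, 1]
    out = (x + 1) / 2
    return out.clamp(0, 1)

# A synthetic normalized image as a stand-in for a dataset image
img = torch.randn(1, 28, 28).clamp(-1, 1)
restored = denorm(img)
print(restored.min().item(), restored.max().item())  # values now in [0, 1]
```

To actually view an image, something like `plt.imshow(denorm(img)[0], cmap='gray')` with matplotlib does the job.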

The code above is a helper method to de-normalize the image before viewing.

Now, let’s see one of the images from the dataset.

Setting up the DataLoader

Now, during training we want the data to be loaded in batches, with training done on one batch before the weights are updated. To achieve this we use the DataLoader provided by PyTorch.

We also view the labels of all the images in the first batch, along with one of its images, to ensure that the batch is getting loaded properly.
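These steps might look roughly like this; the synthetic tensors stand in for the Fashion-MNIST dataset so that the sketch runs on its own without a download:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the Fashion-MNIST dataset loaded earlier
images = torch.randn(1000, 1, 28, 28).clamp(-1, 1)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

batch_size = 100
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Inspect the first batch to check that batching works
for img_batch, label_batch in data_loader:
    print('Batch shape:', img_batch.shape)
    print('Labels:', label_batch)
    break
```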

Finding out the appropriate device

Now, it is wise to use the GPU for image-processing problems like this one, but in case we don’t have one we may end up using the CPU. So, to ensure that the models get loaded onto the appropriate device, the following piece of code has been written.
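One common way to write this (the helper names here are my own):

```python
import torch

def get_device():
    # Prefer the GPU when one is available, otherwise fall back to the CPU
    return torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def to_device(data, device):
    # Recursively move tensors (or lists/tuples of tensors) onto the device
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

device = get_device()
print(device)
```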

We can see that the device is of type cuda, which means that the GPU is currently active. I have used the Kaggle environment to run this piece of code.

The Discriminator Model

As explained above, the discriminator is a simple classifier that tries to distinguish between fake and real images. To perform better classification we could definitely use a CNN, but in this case, as the dataset is not that complicated, we are using a simple feed-forward neural network. You can definitely try using a CNN to achieve much better results!

I have added comments describing the steps of the Discriminator model, so I won’t repeat that here :p
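The discriminator is a small feed-forward network along these lines; the 784/256 layer sizes follow the common MNIST GAN setup and should be treated as illustrative:

```python
import torch.nn as nn

image_size = 784    # 28 x 28 pixels, flattened into a vector
hidden_size = 256

D = nn.Sequential(
    # Take a flattened 28x28 image as input
    nn.Linear(image_size, hidden_size),
    nn.LeakyReLU(0.2),
    nn.Linear(hidden_size, hidden_size),
    nn.LeakyReLU(0.2),
    # Single output: the probability that the image is real
    nn.Linear(hidden_size, 1),
    nn.Sigmoid())
```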

For readers who are not familiar with the concept of leaky ReLU: it differs from normal ReLU only in that values less than 0 are not set to 0 but are instead multiplied by a small factor.

Left: ReLU , Right: Leaky ReLU. Source: Quora answer by Daria Shamai

The Generator Model

The input to the generator model is a random input vector, also called a latent variable.

According to Jason Brownlee:

We often refer to latent variables, or a latent space, as a projection or compression of a data distribution. That is, a latent space provides a compression or high-level concepts of the observed raw data such as the input data distribution. In the case of GANs, the generator model applies meaning to points in a chosen latent space, such that new points drawn from the latent space can be provided to the generator model as input and used to generate new and different output examples.
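A generator along those lines might look like this; the sizes are illustrative and mirror the discriminator:

```python
import torch.nn as nn

latent_size = 64    # size of the random input vector
hidden_size = 256
image_size = 784

G = nn.Sequential(
    # Expand the random latent vector through hidden layers...
    nn.Linear(latent_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    # ...and map it to a flattened 28x28 image with values in [-1, 1]
    nn.Linear(hidden_size, image_size),
    nn.Tanh())
```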

Now, one might wonder why we used this new tanh() activation function at the end. What surprise element does it bring to the table?!

Well, these choices come from research papers, and I have merely implemented them. An explanation is provided in the paper by Alec Radford and Luke Metz from indico Research and Soumith Chintala from Facebook AI Research, titled Unsupervised representation learning with deep convolutional generative adversarial networks (https://arxiv.org/pdf/1511.06434.pdf):

The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling.

Loading models to appropriate device

We had seen earlier how to load models and data to CPU/GPU. Now as we are done with defining the models, it is time for us to load the models onto the GPU.
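For example (the models are repeated here in compressed form so the snippet stands alone):

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Compressed stand-ins for the models defined above
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())

# .to(device) moves every parameter of a model onto the chosen device
D = D.to(device)
G = G.to(device)
print(next(D.parameters()).device)
```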

Training the Discriminator

As the discriminator is a binary classifier, I will be using the binary cross-entropy loss function to quantify how well the discriminator is able to classify. I will also be using the Adam optimizer to perform gradient descent.
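Putting that together, the discriminator’s training step might look like this; the setup lines are repeated so the sketch is self-contained, and the device/batch sizes are illustrative:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size, latent_size, image_size, hidden_size = 100, 64, 784, 256

D = nn.Sequential(nn.Linear(image_size, hidden_size), nn.LeakyReLU(0.2),
                  nn.Linear(hidden_size, hidden_size), nn.LeakyReLU(0.2),
                  nn.Linear(hidden_size, 1), nn.Sigmoid()).to(device)
G = nn.Sequential(nn.Linear(latent_size, hidden_size), nn.ReLU(),
                  nn.Linear(hidden_size, hidden_size), nn.ReLU(),
                  nn.Linear(hidden_size, image_size), nn.Tanh()).to(device)

criterion = nn.BCELoss()
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002)

def train_discriminator(images):
    # Target labels: 1 for real dataset images, 0 for generated ones
    real_labels = torch.ones(batch_size, 1).to(device)
    fake_labels = torch.zeros(batch_size, 1).to(device)

    # Loss on a batch of real images from the dataset
    outputs = D(images)
    d_loss_real = criterion(outputs, real_labels)
    real_score = outputs

    # Loss on a batch of fake images produced by the generator;
    # detach() keeps this backward pass from touching the generator
    z = torch.randn(batch_size, latent_size).to(device)
    fake_images = G(z).detach()
    outputs = D(fake_images)
    d_loss_fake = criterion(outputs, fake_labels)
    fake_score = outputs

    # Net loss; only the discriminator's weights are updated
    d_loss = d_loss_real + d_loss_fake
    d_optimizer.zero_grad()
    d_loss.backward()
    d_optimizer.step()
    return d_loss, real_score, fake_score
```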

So, the comments are pretty self-explanatory, but I will try to paint a picture of what’s really going on here.

Essentially, we want the discriminator to classify all the images from the Fashion-MNIST dataset as real and the ones from the generator as fake. So we find the loss for the images from the dataset by setting the target labels in the loss function to 1, and vice versa for the images generated by the generator. After calculating the losses for both the real and fake sets of images, we add the two as the net loss for the discriminator. You can also observe that only the gradients of the discriminator are being calculated, not those of the generator, as in this piece of code we are training only the discriminator.

Training the Generator

The generator produces images, hence training it is a bit of a challenge. But, by using the discriminator as part of the loss function, you will see how easily we achieve this task.
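A sketch of the generator’s training step; the setup is again repeated for completeness:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size, latent_size, image_size, hidden_size = 100, 64, 784, 256

D = nn.Sequential(nn.Linear(image_size, hidden_size), nn.LeakyReLU(0.2),
                  nn.Linear(hidden_size, 1), nn.Sigmoid()).to(device)
G = nn.Sequential(nn.Linear(latent_size, hidden_size), nn.ReLU(),
                  nn.Linear(hidden_size, image_size), nn.Tanh()).to(device)

criterion = nn.BCELoss()
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002)

def train_generator():
    # Generate fake images but score them against labels of 1 ("real"),
    # so the loss measures how badly the generator fails to fool D
    z = torch.randn(batch_size, latent_size).to(device)
    fake_images = G(z)
    labels = torch.ones(batch_size, 1).to(device)
    g_loss = criterion(D(fake_images), labels)

    # Only the generator's weights are updated by this optimizer
    g_optimizer.zero_grad()
    g_loss.backward()
    g_optimizer.step()
    return g_loss, fake_images
```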

Firstly, we generate a batch of images from the generator. Then we pass these images to the discriminator, but we set the labels to 1 to try to fool the discriminator into believing that what we have passed to it are real images. This tells us how far the images generated by the generator are from real images. We then use this loss to update the weights of the generator.

Again, we can observe that while the weights of the generator are being updated, the weights of the discriminator remain intact and are not affected.

Training the Model

We will create a directory to save the intermediate results of the generator during training.

We will save a batch of real images into this directory. This is an optional step, and if you want, you can skip it.

Now that the directory to hold the generator’s intermediate images has been created, we will create a helper function to save the generated images into it.

Let’s start with the training.
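Here’s a compressed, runnable sketch of the loop. It uses a tiny synthetic dataset, shrunken models, and just two epochs so it finishes quickly; the real run iterates over the Fashion-MNIST loader with the full models for many more epochs and also saves sample images each epoch:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
batch_size, latent_size, image_size = 100, 64, 784

# Synthetic stand-in for the Fashion-MNIST data_loader
data = TensorDataset(torch.randn(200, 1, 28, 28).clamp(-1, 1),
                     torch.zeros(200, dtype=torch.long))
data_loader = DataLoader(data, batch_size=batch_size, shuffle=True)

D = nn.Sequential(nn.Linear(image_size, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid()).to(device)
G = nn.Sequential(nn.Linear(latent_size, 256), nn.ReLU(),
                  nn.Linear(256, image_size), nn.Tanh()).to(device)
criterion = nn.BCELoss()
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0002)
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0002)

num_epochs = 2
d_losses, g_losses = [], []

for epoch in range(num_epochs):
    for images, _ in data_loader:
        # Flatten each batch and move it onto the device
        images = images.reshape(images.size(0), -1).to(device)
        n = images.size(0)

        # --- Train the discriminator: real -> 1, fake -> 0 ---
        real_labels = torch.ones(n, 1).to(device)
        fake_labels = torch.zeros(n, 1).to(device)
        z = torch.randn(n, latent_size).to(device)
        d_loss = (criterion(D(images), real_labels) +
                  criterion(D(G(z).detach()), fake_labels))
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()

        # --- Train the generator: try to make D output 1 on fakes ---
        z = torch.randn(n, latent_size).to(device)
        g_loss = criterion(D(G(z)), torch.ones(n, 1).to(device))
        g_optimizer.zero_grad()
        g_loss.backward()
        g_optimizer.step()

    # Track the last batch's losses per epoch
    d_losses.append(d_loss.item())
    g_losses.append(g_loss.item())
    print('Epoch [{}/{}], d_loss: {:.4f}, g_loss: {:.4f}'.format(
        epoch + 1, num_epochs, d_loss.item(), g_loss.item()))
```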

Saving checkpoints and viewing intermediate data

Now that the training is done, I have saved the model checkpoints, as you can see in the piece of code below:
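Saving (and later reloading) the state dicts can be sketched as follows; the tiny models here are stand-ins for the trained `G` and `D`:

```python
import torch
import torch.nn as nn

# Stand-ins for the trained generator and discriminator
G = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())

# Save only the learned parameters (state dicts), the recommended approach
torch.save(G.state_dict(), 'G.ckpt')
torch.save(D.state_dict(), 'D.ckpt')

# Reloading later works like this
G.load_state_dict(torch.load('G.ckpt'))
```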

Now, let us view some of the intermediate results:

Creating the learning video (additional)

Now, as an addition, we can also create a video of the learning process, where we collate all the intermediate images into a video. We use OpenCV to achieve this.

Viewing the losses and accuracies

Our main aim is to ensure that the loss of the generator reduces over time, provided the loss of the discriminator does not become too high.
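Plotting the tracked losses with matplotlib might look like this; the loss values below are dummies standing in for the lists collected during training:

```python
import matplotlib
matplotlib.use('Agg')   # render without a display
import matplotlib.pyplot as plt

# Stand-ins for the per-epoch losses collected during training
d_losses = [1.20, 1.05, 0.95, 0.90]
g_losses = [2.40, 2.10, 1.95, 1.85]

plt.plot(d_losses, '-', label='Discriminator')
plt.plot(g_losses, '-', label='Generator')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Losses')
plt.savefig('losses.png')
```

The real and fake scores from the discriminator can be plotted the same way to see how confidently it separates the two over time.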

Save and Commit (additional)

In this part I will be using Jovian.ml to save my code. Feel free to skip this part if you want to commit on some other platform.

Conclusion

In this article, I have written about my understanding of GANs and have also provided the code for generating images from the Fashion-MNIST dataset. This code has been taken from one of the notebooks on Jovian.ml, and I have applied it to the FashionMNIST dataset. I am also currently learning, so do drop a comment or mail me if you find any discrepancies. Thanks for reading, and do not forget to throw a clap!


Attyuttam Saha

Software Engineer, MCA from NIT Warangal, loves to read and watch horror and talk about programming.