Generative Model: Image Generation

This notebook is an example of Generative Adversarial Nets (GANs) in Kokoyi. Specifically, we are going to use a simple GAN to generate handwritten digit images.

GAN

Generative Adversarial Nets (GANs) are one of the most popular architectures in the class of generative models. A typical GAN simultaneously trains two models: a generative model $G$ and a discriminative model $D$.

The data pipeline looks like this:

[Figure: the GAN data pipeline]

$D$ and $G$ play the following two-player minimax game with value function $V(D, G)$:

$$ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]$$

Following Algorithm 1 in the original paper, training alternates between two steps: the first updates the discriminator $D$, and the second updates the generator $G$.

[Figure: Algorithm 1 from the GAN paper]

We train $D$ to maximize the probability of assigning the correct label to both real images and fake images generated by $G$, so it acts as a binary classifier. In other words, we want to maximize $\log D(x) + \log(1 - D(G(z)))$. We can write the loss in Kokoyi using BCELoss:
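For reference, an equivalent formulation in plain PyTorch looks like the sketch below (the function name discriminator_loss is illustrative, not part of the notebook):

import torch

bce = torch.nn.BCELoss()

def discriminator_loss(d_real, d_fake):
    # d_real = D(x) and d_fake = D(G(z)) are probabilities in (0, 1).
    # BCE with target 1 gives -log D(x); BCE with target 0 gives -log(1 - D(G(z))),
    # so minimizing their sum maximizes log D(x) + log(1 - D(G(z))).
    return bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))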

We train $G$ to generate fake images good enough to fool $D$, so that it can no longer tell real from fake. To do so, we want images generated from the latent code, i.e. $G(z)$, to have a higher probability of being classified as real; that is, we minimize $\log(1 - D(G(z)))$ with $D$ fixed. We can write the generator's loss in Kokoyi:
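Again as a plain-PyTorch sketch (generator_loss is an illustrative name):

def generator_loss(d_fake):
    # d_fake = D(G(z)) with D's parameters held fixed.
    # Literal minimax form: minimize E[log(1 - D(G(z)))].
    # In practice the non-saturating variant bce(d_fake, torch.ones_like(d_fake))
    # is often preferred because it gives stronger gradients early in training.
    return torch.log(1.0 - d_fake).mean()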

Image Generation using GAN

Now we can specify the structure of the discriminator $D$ and the generator $G$. To keep things simple, we use a multilayer perceptron (MLP) as $D$ and a transposed-convolution module as $G$.

You can let Kokoyi set up the initialization for $D$ and $G$ (just copy, paste, and then fill in what's needed):

Below is the default initialization code generated by Kokoyi for this model (you can use the button above to insert such a cell while at a Kokoyi cell):
class D(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.W = None
        self.b = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (W, b)."""
        return None

    forward = kokoyi.symbol["D"]

class TransposedConvBlock(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.ConvTranspose2d = None
        self.BatchNorm2d = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (ConvTranspose2d, BatchNorm2d)."""
        return None

    forward = kokoyi.symbol["TransposedConvBlock"]

class G(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.C = None
        self.H = None
        self.W = None
        self.Linear = None
        self.TransposedConvBlocks = None
        self.Conv2d = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (C, H, W, Linear, TransposedConvBlocks, Conv2d)."""
        return None

    forward = kokoyi.symbol["G"]

Here are the completed module definitions.
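One possible way to fill in the stubs is sketched below; the latent size, layer widths, and channel counts are illustrative choices, and the exact parameter shapes are ultimately dictated by the Kokoyi formulas for D, TransposedConvBlock, and G:

import torch
import kokoyi  # provided by the Kokoyi notebook environment

LATENT_DIM = 100      # assumed size of the latent code z
IMG_PIXELS = 28 * 28  # flattened MNIST image

class D(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Weights and biases of a two-layer MLP, gathered in parameter lists
        # so the Kokoyi formula for D can index them layer by layer.
        self.W = torch.nn.ParameterList([
            torch.nn.Parameter(torch.randn(256, IMG_PIXELS) * 0.01),
            torch.nn.Parameter(torch.randn(1, 256) * 0.01),
        ])
        self.b = torch.nn.ParameterList([
            torch.nn.Parameter(torch.zeros(256)),
            torch.nn.Parameter(torch.zeros(1)),
        ])

    def get_parameters(self):
        return (self.W, self.b)

    forward = kokoyi.symbol["D"]

class TransposedConvBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Upsampling block: transposed convolution (doubling H and W) + batch norm.
        self.ConvTranspose2d = torch.nn.ConvTranspose2d(
            in_channels, out_channels, kernel_size=4, stride=2, padding=1)
        self.BatchNorm2d = torch.nn.BatchNorm2d(out_channels)

    def get_parameters(self):
        return (self.ConvTranspose2d, self.BatchNorm2d)

    forward = kokoyi.symbol["TransposedConvBlock"]

class G(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Shape (C, H, W) of the feature map the latent code is projected to.
        self.C, self.H, self.W = 128, 7, 7
        self.Linear = torch.nn.Linear(LATENT_DIM, self.C * self.H * self.W)
        # Two upsampling blocks: 7x7 -> 14x14 -> 28x28.
        self.TransposedConvBlocks = torch.nn.ModuleList([
            TransposedConvBlock(128, 64),
            TransposedConvBlock(64, 32),
        ])
        # Final convolution producing a single-channel image.
        self.Conv2d = torch.nn.Conv2d(32, 1, kernel_size=3, padding=1)

    def get_parameters(self):
        return (self.C, self.H, self.W,
                self.Linear, self.TransposedConvBlocks, self.Conv2d)

    forward = kokoyi.symbol["G"]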

Let's first do some setup:
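For example, a minimal setup cell might look like this (imports, a fixed random seed, and device selection; the exact choices are up to you):

import torch
import torchvision
import torchvision.transforms as transforms

torch.manual_seed(0)  # fix the seed for reproducibility
device = "cuda" if torch.cuda.is_available() else "cpu"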

The MNIST dataset provided by torchvision is used to train the model. It consists of 2D images of handwritten digits and the corresponding integer labels (from 0 to 9).
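A typical way to load it with torchvision looks like this (the batch size and normalization are illustrative choices):

transform = transforms.Compose([
    transforms.ToTensor(),                 # pixels to [0, 1], shape (1, 28, 28)
    transforms.Normalize((0.5,), (0.5,)),  # rescale pixels to [-1, 1]
])
train_set = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)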

Finally, we can set the hyper-parameters and start training!
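Putting the pieces together, one possible training loop is sketched below; it assumes the modules, losses, and data loader sketched above, and the learning rate, latent dimension, and number of epochs are illustrative:

latent_dim = 100
lr = 2e-4
num_epochs = 20

netD, netG = D().to(device), G().to(device)
optD = torch.optim.Adam(netD.parameters(), lr=lr)
optG = torch.optim.Adam(netG.parameters(), lr=lr)

for epoch in range(num_epochs):
    for real, _ in train_loader:
        real = real.to(device)
        z = torch.randn(real.size(0), latent_dim, device=device)

        # Step 1: update D on real images and on generated images
        # (detach so no gradients flow into G here).
        optD.zero_grad()
        lossD = discriminator_loss(netD(real), netD(netG(z).detach()))
        lossD.backward()
        optD.step()

        # Step 2: update G to fool the now-fixed discriminator.
        optG.zero_grad()
        lossG = generator_loss(netD(netG(z)))
        lossG.backward()
        optG.step()
    print(f"epoch {epoch}: lossD={lossD.item():.3f}, lossG={lossG.item():.3f}")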