Image Generation Using Variational Autoencoder — With Code — Part 2

Jun 19, 2021

This part of the blog explains the variational autoencoder and the math behind it, then walks through the code and the results.

This blog covers the following topics in order:

Recap — Autoencoder
Variational Autoencoder
Code — Image Generation
Conclusion
References

Recap — Autoencoder

In the last blog we understood how an autoencoder works. A variational autoencoder works similarly, with one variation that lets us generate images from the latent space. Let’s dive in and try to understand the concepts behind the variational autoencoder.

Just to recap, an autoencoder takes high-dimensional data and passes it forward through the encoder to create the latent space, or compressed data. The decoder then reconstructs the input from this compressed data. During training, the weights are adjusted through backpropagation to reduce the reconstruction error, which optimises the latent space. With that said, let’s move on to the variational autoencoder.

Variational Autoencoder

In the variational autoencoder, the bottleneck vector is replaced by 2 separate vectors: the mean of the distribution and the standard deviation of the distribution. So whenever data is fed into the decoder, it is a sample drawn from this distribution that is passed through the decoder. The loss function of the variational autoencoder consists of 2 terms. The first one is the reconstruction loss. It is the same as in the autoencoder, except that we have an expectation term because we are sampling from the distribution. The second term is the KL divergence term, which keeps the learned latent distribution close to a standard normal distribution. We basically train to keep the latent space close to a mean of 0 and a standard deviation of 1.

Now, we have a major problem. A latent vector is sampled from the distribution defined by the mean and standard deviation vectors, and this sample is fed to the decoder. The problem is that sampling is a random operation, so we cannot backpropagate, that is, we cannot push the gradients through the sampled vector. In order to run the gradients through the entire network and train it, we will use the reparameterization trick.
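Written out, the two loss terms described above take the following form. This is a sketch in standard notation: the symbols q(z | x) for the encoder distribution, p(x | z) for the decoder and d for the latent dimension are assumptions introduced here and do not appear in the original post. The second line is the closed form of the KL term for a diagonal Gaussian, which is the expression the loss function later in this post computes.

```latex
\mathcal{L}(x) =
  \underbrace{-\,\mathbb{E}_{z \sim q(z \mid x)}\big[\log p(x \mid z)\big]}_{\text{reconstruction loss}}
  + \underbrace{D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,\mathcal{N}(0, I)\big)}_{\text{KL divergence}}

D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2))\,\|\,\mathcal{N}(0, I)\big)
  = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```

The KL term is zero exactly when every mean is 0 and every standard deviation is 1, which is why minimising it pulls the latent space towards the standard normal distribution.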

Fig:2 — Reparameterization trick used to push gradients for training

The trick goes as follows. The latent vector can be written as mu plus sigma multiplied by epsilon, where mu and sigma are the parameters we are learning and epsilon carries the stochastic part. Epsilon is always Gaussian with zero mean and a standard deviation of 1. So the process is: sample epsilon, multiply it by sigma and add mu to get the latent vector. Mu and sigma are the only things we have to train, and the gradients can now be pushed through them to decrease the error and train the network. Epsilon does not need to be trained. We still need its stochasticity, which is what helps us generate images. This can be seen in figure 3 below.

Fig:3 — Sampled Vector Representation for Gradient tuning
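To make the trick concrete, here is a minimal sketch in plain Python (no TensorFlow, so it runs on its own). The helper name reparameterize and the example values of mu and log_var are assumptions chosen for illustration. As in the Keras code later in this post, the network output is treated as the log-variance, so the standard deviation is exp(log_var / 2).

```python
import math
import random
import statistics

def reparameterize(mu, log_var, rng):
    """Return z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

rng = random.Random(25)                     # fixed seed for repeatable samples
mu = [1.5, -0.5]                            # illustrative means
log_var = [math.log(0.25), math.log(4.0)]   # i.e. sigma = 0.5 and sigma = 2.0

# All randomness comes from eps; mu and log_var stay deterministic inputs,
# which is what lets gradients flow through them in a real network.
samples = [reparameterize(mu, log_var, rng) for _ in range(20000)]
z0 = [s[0] for s in samples]
z1 = [s[1] for s in samples]

print("dim 0 mean/std:", statistics.mean(z0), statistics.pstdev(z0))
print("dim 1 mean/std:", statistics.mean(z1), statistics.pstdev(z1))
```

The sample statistics should land close to the chosen mu and sigma, showing that shifting and scaling a standard normal sample is equivalent to sampling from the target distribution directly.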

Code — Image Generation

Here, I will be using the TensorFlow-Keras framework in Python to train the VAE and generate images. Let’s import the required libraries to start with.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense, Conv2D, Conv2DTranspose, Flatten, Lambda, Reshape
from tensorflow.keras.models import Model
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.datasets import mnist

np.random.seed(25)
# The custom loss below reads the symbolic mu and sigma tensors directly,
# so eager execution is disabled and the model runs in graph mode.
tf.compat.v1.disable_eager_execution()

Before jumping into the model, let’s write the helper functions that we will use repeatedly: the latent vector calculation, the loss function and a function for displaying images.

# A function to compute the value of latent space using mu and sigma.
# Note: sigma is treated as the log-variance, so exp(sigma/2) is the standard deviation.
def compute_latent(x):
    mu, sigma = x
    batch = K.shape(mu)[0]
    dim = K.int_shape(mu)[1]
    eps = K.random_normal(shape=(batch, dim))
    return mu + K.exp(sigma/2)*eps

# The loss function for VAE
def kl_reconstruction_loss(true, pred):
    # Reconstruction loss (binary crossentropy), scaled by the number of pixels
    reconstruction_loss = binary_crossentropy(K.flatten(true), K.flatten(pred)) * img_width * img_height
    # KL divergence loss (reads the global mu and sigma tensors of the encoder)
    kl_loss = 1 + sigma - K.square(mu) - K.exp(sigma)
    kl_loss = K.sum(kl_loss, axis=-1)
    kl_loss *= -0.5
    # Total loss = reconstruction loss + KL divergence loss (unweighted sum)
    return K.mean(reconstruction_loss + kl_loss)

# A function to display image sequence
def display_image_sequence(x_start, y_start, x_end, y_end, no_of_imgs):
    x_axis = np.linspace(x_start,x_end,no_of_imgs)
    y_axis = np.linspace(y_start,y_end,no_of_imgs)
    
    x_axis = x_axis[:, np.newaxis]
    y_axis = y_axis[:, np.newaxis]
    
    new_points = np.hstack((x_axis, y_axis))
    new_images = decoder.predict(new_points)
    new_images = new_images.reshape(new_images.shape[0], new_images.shape[1], new_images.shape[2])
    
    # Display some images
    fig, axes = plt.subplots(ncols=no_of_imgs, sharex=False, sharey=True, figsize=(20, 7))
    counter = 0
    for i in range(no_of_imgs):
        axes[counter].imshow(new_images[i], cmap='gray')
        axes[counter].get_xaxis().set_visible(False)
        axes[counter].get_yaxis().set_visible(False)
        counter += 1
    plt.show()
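As a sanity check on the KL term used in kl_reconstruction_loss, the same expression can be evaluated by hand on small vectors. The sketch below is plain Python rather than Keras, and the function name kl_to_standard_normal is a hypothetical helper introduced here. Like the code above, it assumes the second input is the log-variance.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, exp(log_var)) and N(0, 1), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

# Already a standard normal: nothing to penalise.
kl_standard = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Mean shifted by 1 in the first dimension, unit variance everywhere.
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])

# Variance shrunk in the second dimension, means at zero.
kl_narrow = kl_to_standard_normal([0.0, 0.0], [0.0, math.log(0.25)])

print(kl_standard, kl_shifted, kl_narrow)
```

The penalty grows as the mean moves away from 0 or the variance moves away from 1, which is the behaviour the loss relies on to keep the latent space compact.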

Load the dataset and plot a few examples

# Loading dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Displaying data
fig, axes = plt.subplots(ncols=10, sharex=False, sharey=True, figsize=(20, 7))
counter = 0
for i in range(120, 130):
    axes[counter].set_title(y_train[i])
    axes[counter].imshow(X_train[i], cmap='gray')
    axes[counter].get_xaxis().set_visible(False)
    axes[counter].get_yaxis().set_visible(False)
    counter += 1
plt.show()

Normalise the data and define variables for further use

# Normalize values such that all numbers are within
# the range of 0 to 1
X_train = X_train/255
X_test = X_test/255

# Convert from (no_of_data, 28, 28) to (no_of_data, 28, 28, 1)
X_train_new = X_train.reshape(X_train.shape[0], X_train.shape[1], X_train.shape[2], 1)
X_test_new = X_test.reshape(X_test.shape[0], X_test.shape[1], X_test.shape[2], 1)

# Defining some variables
img_height   = X_train_new.shape[1]    # 28
img_width    = X_train_new.shape[2]    # 28
num_channels = X_train_new.shape[3]    # 1
input_shape =  (img_height, img_width, num_channels)   # (28,28,1)
latent_dim = 2    # Dimension of the latent space

Define the encoder, the latent space and the decoder. The layers of the encoder and the dimension of the latent space are hyperparameters.

# Constructing encoder
encoder_input = Input(shape=input_shape)
encoder_conv = Conv2D(filters=8, kernel_size=3, strides=2, padding='same', activation='relu')(encoder_input)
encoder = Flatten()(encoder_conv)

# Latent space
mu = Dense(latent_dim)(encoder)
sigma = Dense(latent_dim)(encoder)
latent_space = Lambda(compute_latent, output_shape=(latent_dim,))([mu, sigma])

# Take the convolution shape to be used in the decoder
conv_shape = K.int_shape(encoder_conv)

# Constructing decoder
decoder_input = Input(shape=(latent_dim,))
decoder = Dense(conv_shape[1]*conv_shape[2]*conv_shape[3], activation='relu')(decoder_input)
decoder = Reshape((conv_shape[1], conv_shape[2], conv_shape[3]))(decoder)
decoder_conv = Conv2DTranspose(filters=8, kernel_size=3, strides=2, padding='same', activation='relu')(decoder)
decoder_conv = Conv2DTranspose(filters=num_channels, kernel_size=3, padding='same', activation='sigmoid')(decoder_conv)
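The decoder mirrors the encoder, so it helps to trace the tensor shapes by hand. The sketch below does that arithmetic in plain Python for the layer settings used above. The helper names conv_same_out and conv_transpose_same_out are hypothetical, and the batch dimension is left out, unlike conv_shape in the Keras code, which carries it at index 0.

```python
import math

def conv_same_out(size, stride):
    """Output length along one axis of a Conv2D layer with padding='same'."""
    return math.ceil(size / stride)

def conv_transpose_same_out(size, stride):
    """Output length along one axis of a Conv2DTranspose layer with padding='same'."""
    return size * stride

img_height, img_width, num_channels = 28, 28, 1
filters = 8

# Encoder: Conv2D(filters=8, strides=2, padding='same') halves height and width.
feature_map = (conv_same_out(img_height, 2), conv_same_out(img_width, 2), filters)

# Decoder: the Dense layer must produce exactly enough units to be reshaped
# back into that feature map.
dense_units = feature_map[0] * feature_map[1] * feature_map[2]

# Decoder: the first Conv2DTranspose (strides=2) doubles height and width,
# the second one (strides=1) keeps them and maps to the image channels.
upsampled = (conv_transpose_same_out(feature_map[0], 2),
             conv_transpose_same_out(feature_map[1], 2), filters)
output_shape = (conv_transpose_same_out(upsampled[0], 1),
                conv_transpose_same_out(upsampled[1], 1), num_channels)

print(feature_map, dense_units, upsampled, output_shape)
```

Reading the shape from the encoder instead of hard-coding it, as the Keras code does with conv_shape, keeps the decoder consistent if the encoder layers are changed later.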

Define Model, view summary of Encoder, Decoder and VAE

# Actually build encoder, decoder and the entire VAE
encoder = Model(encoder_input, latent_space)
decoder = Model(decoder_input, decoder_conv)
vae = Model(encoder_input, decoder(encoder(encoder_input)))

Model Summary of Encoder, Decoder and VAE

Now let’s train the network and view results.

# Compile the model using the combined reconstruction + KL loss
vae.compile(optimizer='adam', loss=kl_reconstruction_loss)

# Training VAE
history = vae.fit(x=X_train_new, y=X_train_new, epochs=30, batch_size=32, validation_data=(X_test_new, X_test_new))

Here is the training curve.

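With a 2D latent space, images are generated by passing latent points to the decoder, for example with the display_image_sequence function defined earlier. Note that a higher latent vector dimension tends to give better generated images.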
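The points that display_image_sequence feeds to the decoder lie on a straight line between two positions in the 2D latent space. The sketch below reproduces only that point construction in plain Python, so it runs without a trained model. The helper name latent_line and the endpoint values are assumptions for illustration, not values from the original experiment.

```python
def latent_line(start, end, no_of_imgs):
    """Evenly spaced 2D latent points from start to end, both endpoints included."""
    (x_start, y_start), (x_end, y_end) = start, end
    if no_of_imgs == 1:
        return [(float(x_start), float(y_start))]
    steps = no_of_imgs - 1
    return [(x_start + (x_end - x_start) * i / steps,
             y_start + (y_end - y_start) * i / steps)
            for i in range(no_of_imgs)]

# Walk diagonally across the region where the KL term keeps most of the data.
points = latent_line((-2.0, -2.0), (2.0, 2.0), 5)
for point in points:
    print(point)
```

With the trained decoder from this post, the equivalent call would be display_image_sequence(-2, -2, 2, 2, 5), which decodes each of these points into a 28 x 28 image and shows them side by side.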

Conclusion

Images can be generated using a variational autoencoder. To some extent, the latent vector dimension can be tuned as a hyperparameter to generate better images.

There is an extension of the variational autoencoder which uses a beta term. This is called the disentangled variational autoencoder (beta-VAE). Also, Generative Adversarial Networks (GANs) can be used to generate images. These will be explored in future blogs.

Written by Praveen Krishna Murthy
