Tutorial: Image Classification with Lasagne – Part 2

In part two of this tutorial we will explore some common techniques to improve ConvNet classification results. We will use the code base from the previous tutorial and add some lines here and there. In this part of the tutorial, we will focus on metric visualization, reproducibility, parameter reduction and learning rate adjustment.

Random Seeds

Before we start, we need to do two very important things. In order to keep our results reproducible, we have to fix the random seeds used in Theano, Lasagne and NumPy. Random numbers are generated throughout the entire training process, e.g. for dataset shuffling, image augmentation or layer weight initialization. Fixing the NumPy and Lasagne seeds is easily done with four lines in our code:
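A minimal sketch of what this could look like (the seed value itself is arbitrary, 1337 is just a placeholder):

import numpy as np
import lasagne

RANDOM_SEED = 1337                           # any fixed integer works
RANDOM = np.random.RandomState(RANDOM_SEED)  # NumPy RNG used throughout the script
lasagne.random.set_rng(RANDOM)               # Lasagne draws its weight initializations from this RNG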

We now have an object called RANDOM which we will use anywhere we need some randomness. By editing the .theanorc file, we can fix the seeds used in Theano. Simply add these lines:
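The exact lines depend on your setup; one addition that helps with reproducibility when training on a GPU with cuDNN (an assumption about your setup) is to force deterministic convolution algorithms:

[dnn.conv]
algo_bwd_filter = deterministic
algo_bwd_data = deterministic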

(Note: It is completely up to you which random seed you choose. Changing the random seed can sometimes negatively impact the classification accuracy due to shifts in the validation split. This effect does not play such a big role on larger, more diverse datasets.)

Metrics

Next, we should look at the results from our previous experiments. You might recall the mediocre validation accuracy of only ~44%. Let’s discuss some of the reasons why this may happen. Have a look at the PyPlot chart from the training of 30 epochs with our base implementation.
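If you want to produce such a chart yourself, a minimal sketch with PyPlot could look like this (it assumes you collect one training and one validation loss value per epoch in two lists):

import matplotlib.pyplot as plt

# train_losses and val_losses are assumed to hold one value per epoch
plt.plot(train_losses, label='training loss')
plt.plot(val_losses, label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()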

As we can see, after a few epochs, training loss and validation loss diverge by a great margin. This is a clear sign of overfitting. Our net becomes more and more capable of distinguishing the images from our training set. But it does so by focusing on noise in the dataset that is semantically unrelated to the classes, not by recognizing the objects and their respective features. After training, our net is nearly perfect at predicting the labels of our training set, but it cannot transfer this knowledge to new images (our validation set). Sooner or later, every ConvNet will start overfitting. In our case, the ConvNet is not very complex and our dataset is pretty small (we might need tens of thousands of images or even millions to train a mighty ConvNet). It is important to keep track of the net accuracy by using validation images – images the net is not allowed to train on.

Another good way to visualize the performance of our ConvNet for classification problems is the so-called “Confusion Matrix”. It shows, for every target class, how many samples were assigned to each predicted class, which allows us to see where our net needs to improve. We can plot the confusion matrix with PyPlot using the following code:
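Here is a sketch of how these helpers could be implemented; the CLASSES list with the class names is an assumption about the rest of the script, the function names match the ones used below:

import numpy as np
import matplotlib.pyplot as plt

# assumption: CLASSES holds the class names of the dataset, e.g. ['cat', 'cow', 'dog', ...]
cmatrix = None

def clearConfusionMatrix():
    # reset the accumulated matrix at the start of every epoch
    global cmatrix
    cmatrix = np.zeros((len(CLASSES), len(CLASSES)), dtype=np.int32)

def updateConfusionMatrix(targets, predictions):
    # targets and predictions are arrays of class indices for one validation batch
    for t, p in zip(targets, predictions):
        cmatrix[t][p] += 1

def showConfusionMatrix():
    # draw the accumulated matrix with the raw counts printed into each cell
    plt.imshow(cmatrix, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title('Confusion matrix')
    plt.xticks(np.arange(len(CLASSES)), CLASSES, rotation=45)
    plt.yticks(np.arange(len(CLASSES)), CLASSES)
    for row in range(len(CLASSES)):
        for col in range(len(CLASSES)):
            plt.text(col, row, cmatrix[row][col], horizontalalignment='center')
    plt.ylabel('target label')
    plt.xlabel('predicted label')
    plt.show()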

(Note: I changed some lines of code here and there in the script, which are not mentioned here. If you find yourself confronted with some errors, please have a look at the complete script from the GitHub repository.)

We predict our validation set in batches. That means we have to accumulate the per-batch matrices of each epoch into one final confusion matrix. To do so, we call clearConfusionMatrix() at the beginning of each epoch, call updateConfusionMatrix() after every validation batch and run showConfusionMatrix() at the end of each epoch to show the classification results.
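In the epoch loop, this might be arranged roughly as follows; predict_fn and getValidationBatches() are placeholders for whatever prediction function and batch iterator your script already uses:

for epoch in range(EPOCHS):
    # ... run the training batches for this epoch first ...
    clearConfusionMatrix()
    for inputs, targets in getValidationBatches():    # hypothetical batch iterator
        probabilities = predict_fn(inputs)            # forward pass of the net
        predictions = np.argmax(probabilities, axis=1)
        updateConfusionMatrix(targets, predictions)
        # ... also accumulate validation loss and accuracy here ...
    showConfusionMatrix()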

Let’s have a look at the confusion matrix from our baseline experiment:

Ideally, all of the predicted labels would be correct (target label = predicted label), so all counts would lie on the diagonal of our confusion matrix. This is clearly not the case. Our most accurate class seems to be “cow”, as the ConvNet correctly labeled 44 of 75 instances, or 58.6%. The least accurate one is “dog”, with only 19 of 78 (24.4%) correct labels assigned. 23 cats were labeled as dogs and 21 dogs were labeled as cats, which is somewhat understandable, as cats and dogs might look alike in some of the 64×64 pixel images. Nevertheless, we should try to increase the ConvNet performance to avoid this confusion.

So, what can we do? We will try some of the most common techniques to raise the validation accuracy. We will keep track of the current results and compare the influence of each technique with our baseline experiment.

So far our stats look like this:

Run | Max. Accuracy | Best Epoch
Randomly guessing 1 of 5 classes | ~20% | n/a
Baseline | 44.0% | 28

The accuracy of your experiments should always exceed the default or random guessing accuracy. Otherwise, your method is worse than guessing. Sometimes it’s easy to get fooled even by a validation dataset. Let’s assume we have two classes in our dataset. One of them contains dogs and consists of 100 images. The second class shows cats and includes 900 images. By default, we can reach an accuracy of 90% by simply assigning the class label “cat” to every image. We need to be careful with the interpretation of our evaluation metrics.

Model Optimization

Of all the things we could try to improve our results, we should first focus on the optimization of our baseline model architecture. When we see heavy overfitting in our loss chart, we can assume that our model has too many parameters. Wait…too many? Yes, indeed. Our classification task is not very complex with only five classes, and we have to adjust the model design accordingly (we have to do this for every new classification task and every new dataset).

Despite the heterogeneous dataset, we do not need as many weights as we currently have. Let’s cut down the parameter count of our model by halving the number of filters in our convolutional layers and the number of hidden units in our dense layers. This reduces the net parameter count by 75% to ~230,000. The buildModel() function now looks like this:
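Since the exact layer stack comes from part one of this tutorial, the following is only a rough sketch of such a reduced model; the filter and unit counts are assumptions and will not necessarily reproduce the ~230,000 parameters mentioned above (the actual buildModel() can be found in the GitHub repository):

import lasagne
from lasagne import layers, nonlinearities

NUM_CLASSES = 5

def buildModel():
    # 64x64 RGB input
    net = layers.InputLayer(shape=(None, 3, 64, 64))
    # conv/pool blocks with halved filter counts (e.g. 16/32/64 if the baseline used 32/64/128)
    net = layers.Conv2DLayer(net, num_filters=16, filter_size=3, pad='same',
                             nonlinearity=nonlinearities.rectify)
    net = layers.MaxPool2DLayer(net, pool_size=2)
    net = layers.Conv2DLayer(net, num_filters=32, filter_size=3, pad='same',
                             nonlinearity=nonlinearities.rectify)
    net = layers.MaxPool2DLayer(net, pool_size=2)
    net = layers.Conv2DLayer(net, num_filters=64, filter_size=3, pad='same',
                             nonlinearity=nonlinearities.rectify)
    net = layers.MaxPool2DLayer(net, pool_size=2)
    # dense layer with halved hidden units, followed by the softmax output
    net = layers.DenseLayer(net, num_units=256, nonlinearity=nonlinearities.rectify)
    net = layers.DenseLayer(net, num_units=NUM_CLASSES, nonlinearity=nonlinearities.softmax)
    return net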

If you run the training script again, you should end up with a validation accuracy of 45.2% in epoch 25. Despite reducing the number of parameters of our net by a great margin, we score better because we delayed overfitting.

You could try to reduce the parameter count even further: you can save a lot of weights by reducing the filter count in the last convolutional layer or by reducing the number of hidden units in the dense layers (remember: dense layers connect every neuron with all neurons from the preceding layer, which results in a lot of weights). Reducing the capacity too much might result in underfitting, so be careful. Training with fewer weights might take longer; you might need to increase the number of epochs from 30 to 50 or even 100.

Our stats now look like this:

Run | Max. Accuracy | Best Epoch
Randomly guessing 1 of 5 classes | ~20% | n/a
Baseline | 44.0% | 28
Half filters, half hidden units | 45.2% | 25

(Note: There is a distinct difference between overfitting and underfitting. Overfitting means our model has more capacity than the task requires, so it memorizes features that are not useful for the classification; training and validation loss differ by a great margin. Underfitting, on the other hand, indicates that our model does not have enough parameters to learn the dataset appropriately, and both losses flatten out.)

Learning Rate Schedule

The learning rate is arguably the single most important hyper-parameter of our model (hyper-parameters are settings we choose, like batch size, learning rate, number of epochs to train and so on). As of now, we use a fixed learning rate for the entire training process. It would be better to decrease the learning rate after every epoch to help the optimization process converge. There are different ways to do this. A common practice is to use learning rate steps, which lower the learning rate by a specific amount at certain key points during training (e.g. after epoch 10, 30 and 50). We could also interpolate the learning rate between the first and last epoch. This is what I did for this tutorial: I wanted the learning rate to be 0.0005 at first and 0.000001 after 20 epochs.

Implementing a dynamic learning rate in Lasagne is a bit complicated. Basically, we need to do five things, sketched in code below:

1. Define a tensor variable, which stores our learning rate

2. Pass this tensor variable to the optimizer

3. Define this tensor variable as input for the training function

4. Interpolate the learning rate before each epoch

5. Pass the learning rate to the training function

(Note: I defined LR_START, LR_END and EPOCHS in the “Config” section at the top of this script.)
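A sketch of how these five steps could fit together; net, loss, input_var, targets and getTrainingBatches() are assumed to exist in the script already, and lasagne.updates.adam() merely stands in for whatever update rule your training script uses:

import theano
import theano.tensor as T
import lasagne

# 1. a tensor variable that holds the current learning rate
learning_rate = T.scalar('learning_rate')

# 2. pass it to the optimizer
params = lasagne.layers.get_all_params(net, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=learning_rate)

# 3. make the learning rate an input of the compiled training function
train_net = theano.function([input_var, targets, learning_rate], loss,
                            updates=updates, allow_input_downcast=True)

# 4. linearly interpolate between LR_START and LR_END over all epochs
def getLearningRate(epoch):
    return LR_START + (LR_END - LR_START) * (epoch / float(EPOCHS - 1))

for epoch in range(EPOCHS):
    lr = getLearningRate(epoch)
    for inputs, labels in getTrainingBatches():    # hypothetical batch iterator
        # 5. pass the current learning rate along with every training batch
        batch_loss = train_net(inputs, labels, lr)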

We fixed the random seeds, added confusion matrix functionality, reduced the number of parameters in our model and implemented a learning rate schedule. The training process should now look like this:

We now reach 49.5% validation accuracy, which is quite an improvement using the same model design. Additionally, we reached the best result after only 10 epochs. This is very important for training on larger datasets: adjusting the learning rate can speed up the entire training process by a great margin.

Our new result table looks like this:

Run | Max. Accuracy | Best Epoch
Randomly guessing 1 of 5 classes | ~20% | n/a
Baseline | 44.0% | 28
Half filters, half hidden units | 45.2% | 25
Learning rate schedule | 49.5% | 10

We now have everything in place to investigate more techniques for optimization. We could try a number of things for baseline optimization; feel free to test different settings for batch size, model parameters, maybe even image size. I will introduce some ways of regularization, initialization and activation as well as different model architectures in future tutorials. Eventually, we will try to improve upon our experiments and come up with the best possible training scheme for our current task.
