The problem
Given a set of labeled images of cats and dogs, we want to learn a machine learning model and then use it to classify a set of new images as cats or dogs. This problem appeared in a Kaggle competition, and the images are taken from this Kaggle dataset.
- The original dataset contains a huge number of images (25,000 labeled cat/dog images for training and 12,500 unlabeled images for testing).
- Only a small sample of images is chosen here (1,100 labeled cat/dog images for training and 1,000 images from the test dataset), just for a quick demonstration of how to solve this problem using deep learning (motivated by the Udacity course Deep Learning by Google); the approach and the results are described in this article.
- The chosen sample test images are manually labeled, so that model accuracy can later be computed against the model-predicted labels.
- For the above reason, the accuracy on the test dataset will not be good in general. To obtain good test accuracy with deep learning, we need to train the models on a large number of input images (e.g., all the training images from the Kaggle dataset).
- A few sample labeled images from the training dataset are shown below.
Dogs
Cats
- As a pre-processing step, all the images are first resized to 50 x 50 pixels.
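A minimal sketch of this pre-processing step is shown below; it is not the original script. The helper name load_resized_images and the folder name 'train' are hypothetical, and PIL / numpy are assumed to be available.

import os
import numpy as np
from PIL import Image

def load_resized_images(folder, size=(50, 50)):
    # Read every .jpg in the folder and resize it to 50 x 50 pixels.
    images = []
    for fname in sorted(os.listdir(folder)):
        if fname.endswith('.jpg'):
            img = Image.open(os.path.join(folder, fname)).resize(size)
            images.append(np.asarray(img))
    return np.array(images)  # shape: (n_images, 50, 50, 3)

train_images = load_resized_images('train')  # hypothetical folder name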
Classification with a few off-the-shelf classifiers
- First, each image from the training dataset is flattened and represented as a 2500-length vector (one for each channel).
- Next, a few sklearn models are trained on this flattened data; the results are shown below.
As shown above, the test accuracy is quite poor even with a few sophisticated off-the-shelf classifiers.
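The exact set of sklearn classifiers used is not listed above; the sketch below shows how such a baseline can be set up with a few typical choices, assuming X_train / X_test hold the gray-scale pixel arrays and y_train / y_test the 0/1 (cat/dog) labels.

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Flatten each 50 x 50 image into a 2500-length feature vector.
X_train_flat = X_train.reshape(len(X_train), -1)
X_test_flat = X_test.reshape(len(X_test), -1)

for clf in (LogisticRegression(max_iter=1000), SVC(), RandomForestClassifier()):
    clf.fit(X_train_flat, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test_flat))
    print(type(clf).__name__, acc)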
Classifying images using Deep Learning with TensorFlow
Now let’s first train a logistic regression model and then a couple of neural network models, introducing L2 regularization for the models.
- First, all the images are converted to gray-scale.
- The following figures visualize the weights learnt for the cat vs. the dog class while training the logistic regression model with SGD and L2 regularization (λ=0.1, batch size=128).
Test accuracy: 53.6%
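A minimal sketch of this logistic regression model, written in the TensorFlow 1.x-style API used in the course; λ and the batch size follow the text, while the learning rate and the initialization are assumed.

import tensorflow as tf

num_pixels, num_labels = 50 * 50, 2   # gray-scale 50 x 50 inputs, 2 classes
lam, batch_size = 0.1, 128            # L2 weight and batch size from the text

graph = tf.Graph()
with graph.as_default():
    tf_x = tf.placeholder(tf.float32, shape=(batch_size, num_pixels))
    tf_y = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    W = tf.Variable(tf.truncated_normal([num_pixels, num_labels]))
    b = tf.Variable(tf.zeros([num_labels]))
    logits = tf.matmul(tf_x, W) + b
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_y, logits=logits)
    ) + lam * tf.nn.l2_loss(W)        # L2 penalty on the weights
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)  # assumed rate

The two columns of W (one per class) can be reshaped back into 50 x 50 images, which is how weight visualizations like the ones above are typically produced.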
- The following animation visualizes the weights learnt for 400 randomly selected hidden units of a neural net with a single hidden layer of 4096 nodes, trained with SGD and L2 regularization (λ1=λ2=0.05, batch size=128). The training log is shown below.

Minibatch loss at step 0: 198140.156250
Minibatch accuracy: 50.0%
Minibatch loss at step 500: 0.542070
Minibatch accuracy: 89.8%
Minibatch loss at step 1000: 0.474844
Minibatch accuracy: 96.9%
Validation accuracy: 60.0%
Minibatch loss at step 1500: 0.571939
Minibatch accuracy: 85.9%
Validation accuracy: 56.0%
Minibatch loss at step 2000: 0.537061
Minibatch accuracy: 91.4%
Validation accuracy: 63.0%
Minibatch loss at step 2500: 0.751552
Minibatch accuracy: 75.8%
Validation accuracy: 57.0%
Minibatch loss at step 3000: 0.579084
Minibatch accuracy: 85.9%
Validation accuracy: 54.0%

Test accuracy: 57.8%
Clearly, the model learnt above overfits the training dataset; the test accuracy improved a bit, but it is still quite poor.
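Building on the previous sketch (same graph, placeholders, and names), the single-hidden-layer model can be set up as follows; the 4096 ReLU units and λ1=λ2=0.05 come from the text, everything else is assumed.

num_hidden = 4096
lam1 = lam2 = 0.05

with graph.as_default():
    W1 = tf.Variable(tf.truncated_normal([num_pixels, num_hidden], stddev=0.03))
    b1 = tf.Variable(tf.zeros([num_hidden]))
    W2 = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.03))
    b2 = tf.Variable(tf.zeros([num_labels]))
    hidden = tf.nn.relu(tf.matmul(tf_x, W1) + b1)   # single hidden layer
    logits = tf.matmul(hidden, W2) + b2
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_y, logits=logits)
    ) + lam1 * tf.nn.l2_loss(W1) + lam2 * tf.nn.l2_loss(W2)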
- Now, let’s train a deeper neural net with two hidden layers, the first with 1024 nodes and the second with 64 nodes. The training log is shown below.
Minibatch loss at step 0: 1015.947266
Minibatch accuracy: 43.0%
Validation accuracy: 50.0%
Minibatch loss at step 500: 0.734610
Minibatch accuracy: 79.7%
Validation accuracy: 55.0%
Minibatch loss at step 1000: 0.615992
Minibatch accuracy: 93.8%
Validation accuracy: 55.0%
Minibatch loss at step 1500: 0.670009
Minibatch accuracy: 82.8%
Validation accuracy: 56.0%
Minibatch loss at step 2000: 0.798796
Minibatch accuracy: 77.3%
Validation accuracy: 58.0%
Minibatch loss at step 2500: 0.717479
Minibatch accuracy: 84.4%
Validation accuracy: 55.0%
Minibatch loss at step 3000: 0.631013
Minibatch accuracy: 90.6%
Validation accuracy: 57.0%
Minibatch loss at step 3500: 0.739071
Minibatch accuracy: 75.8%
Validation accuracy: 54.0%
Minibatch loss at step 4000: 0.698650
Minibatch accuracy: 84.4%
Validation accuracy: 55.0%
Minibatch loss at step 4500: 0.666173
Minibatch accuracy: 85.2%
Validation accuracy: 51.0%
Minibatch loss at step 5000: 0.614820
Minibatch accuracy: 92.2%
Validation accuracy: 58.0%

Test accuracy: 55.2%
- The following animation visualizes the weights learnt for 400 randomly selected hidden units from the first hidden layer, trained with SGD and L2 regularization (λ1=λ2=λ3=0.1, batch size=128, dropout rate=0.6).
- The next animation visualizes the weights learnt for all the 64 hidden units of the second hidden layer.
- Clearly, the second, deeper neural net model learnt above overfits the training dataset even more, and the test accuracy decreased a bit.
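A sketch of this deeper model under the same assumptions as before; note the text’s "dropout rate=0.6" is interpreted here as dropping 60% of the units (keep-probability 0.4), which may differ from the original code.

h1, h2 = 1024, 64
lam = 0.1          # λ1 = λ2 = λ3 = 0.1 from the text
keep_prob = 0.4    # dropout rate 0.6 read as "drop 60% of the units" (assumed)

with graph.as_default():
    W1 = tf.Variable(tf.truncated_normal([num_pixels, h1], stddev=0.03))
    b1 = tf.Variable(tf.zeros([h1]))
    W2 = tf.Variable(tf.truncated_normal([h1, h2], stddev=0.03))
    b2 = tf.Variable(tf.zeros([h2]))
    W3 = tf.Variable(tf.truncated_normal([h2, num_labels], stddev=0.03))
    b3 = tf.Variable(tf.zeros([num_labels]))
    a1 = tf.nn.dropout(tf.nn.relu(tf.matmul(tf_x, W1) + b1), keep_prob)
    a2 = tf.nn.dropout(tf.nn.relu(tf.matmul(a1, W2) + b2), keep_prob)
    logits = tf.matmul(a2, W3) + b3
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_y, logits=logits)
    ) + lam * (tf.nn.l2_loss(W1) + tf.nn.l2_loss(W2) + tf.nn.l2_loss(W3))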
Classifying images with a Deep Convolutional Network
Let’s use the conv-net shown in the next figure; a sketch of it in code is given after the list below.
As shown above, the ConvNet uses:
- 2 convolution layers each with
- 5 x 5 kernel
- 16 filters
- 1 x 1 stride
- SAME padding
- 2 Max pooling layers each with
- 2 x 2 kernel
- 2 x 2 stride
- a fully-connected layer with 64 hidden nodes
- batch size of 128
- 5K iterations
- dropout rate of 0.7
- no learning-rate decay
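Putting the pieces of the list together, the conv-net can be sketched as follows (same 1.x-style API and names as the earlier sketches; layer sizes follow the list, while the initialization and the learning rate are assumed).

with graph.as_default():
    tf_x_img = tf.placeholder(tf.float32, shape=(batch_size, 50, 50, 1))
    tf_y = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    # Two 5 x 5 convolution kernels, 16 filters each.
    W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 16], stddev=0.1))
    b1 = tf.Variable(tf.zeros([16]))
    W2 = tf.Variable(tf.truncated_normal([5, 5, 16, 16], stddev=0.1))
    b2 = tf.Variable(tf.zeros([16]))
    # Fully-connected layers: 13 x 13 x 16 features -> 64 -> 2.
    W3 = tf.Variable(tf.truncated_normal([13 * 13 * 16, 64], stddev=0.1))
    b3 = tf.Variable(tf.zeros([64]))
    W4 = tf.Variable(tf.truncated_normal([64, num_labels], stddev=0.1))
    b4 = tf.Variable(tf.zeros([num_labels]))

    def model(data, keep_prob):
        # conv (1 x 1 stride, SAME padding) -> ReLU -> 2 x 2 max-pool, twice
        c1 = tf.nn.relu(tf.nn.conv2d(data, W1, [1, 1, 1, 1], 'SAME') + b1)
        p1 = tf.nn.max_pool(c1, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')  # 25 x 25
        c2 = tf.nn.relu(tf.nn.conv2d(p1, W2, [1, 1, 1, 1], 'SAME') + b2)
        p2 = tf.nn.max_pool(c2, [1, 2, 2, 1], [1, 2, 2, 1], 'SAME')  # 13 x 13
        flat = tf.reshape(p2, [-1, 13 * 13 * 16])
        fc = tf.nn.dropout(tf.nn.relu(tf.matmul(flat, W3) + b3), keep_prob)
        return tf.matmul(fc, W4) + b4

    logits = model(tf_x_img, keep_prob=0.7)  # rate 0.7 read as keep-prob (assumed)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_y, logits=logits))
    # Constant learning rate, per "no learning-rate decay"; the value is assumed.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)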
Results
Minibatch loss at step 0: 1.783917
Minibatch accuracy: 55.5%
Validation accuracy: 50.0%
Minibatch loss at step 500: 0.269719
Minibatch accuracy: 89.1%
Validation accuracy: 54.0%
Minibatch loss at step 1000: 0.045729
Minibatch accuracy: 96.9%
Validation accuracy: 61.0%
Minibatch loss at step 1500: 0.015794
Minibatch accuracy: 100.0%
Validation accuracy: 61.0%
Minibatch loss at step 2000: 0.028912
Minibatch accuracy: 98.4%
Validation accuracy: 64.0%
Minibatch loss at step 2500: 0.007787
Minibatch accuracy: 100.0%
Validation accuracy: 62.0%
Minibatch loss at step 3000: 0.001591
Minibatch accuracy: 100.0%
Validation accuracy: 63.0%

Test accuracy: 61.3%
The following animations show the features learnt (for the first 16 images in each SGD batch) at the different convolution and max-pooling layers:
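One way (assumed, not necessarily the original script) to extract such feature maps is to evaluate the intermediate tensors directly in the session; here 'c1' is assumed to be exposed from the model above, and 'batch_data' is a hypothetical minibatch of images.

import matplotlib.pyplot as plt

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    # Evaluate the first conv layer on one minibatch, shape (128, 50, 50, 1).
    maps = session.run(c1, feed_dict={tf_x_img: batch_data})
    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    for i, ax in enumerate(axes.flat):
        ax.imshow(maps[0, :, :, i], cmap='gray')  # first image, i-th of 16 filters
        ax.axis('off')
    plt.show()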
- Clearly, the simple convolutional neural net outperforms all the previous models in terms of test accuracy, as shown below.
- Only 1,100 labeled images (randomly chosen from the training dataset) were used to train the model, which then predicted labels for 1,000 test images (randomly chosen from the test dataset). Clearly, the accuracy can be improved a lot if a large number of images are used for training, along with deeper / more complex networks (with more parameters to learn).