In this homework, we continue learning Caffe, and implement dropout and data augmentation in our earlier ConvNet. We then fine-tune a pre-trained model, AlexNet, for style classification on the WikiArt dataset. Finally, we visualize data gradients and learn to generate images to fool a pre-trained ConvNet.
Download the starter code here.
Q1: Dropout and Data Augmentation (15 points)
In this exercise, we'll work with the same two-layer ConvNet we trained on the CIFAR-10 dataset in the previous assignment, and implement two ways of reducing overfitting in Caffe: dropout and data augmentation.
Go through the specification of the DropoutLayer and read the network prototxt files of AlexNet & CaffeNet to see how dropout layers are implemented in Caffe.
There is built-in support for simple data augmentations, such as random crops and mirroring, in Caffe. This is defined by the transform_param parameter inside a DataLayer definition.
layer {
  name: "data"
  type: "Data"
  [...]
  transform_param {
    scale: 0.1
    mean_file: "mean.binaryproto"
    # for images in particular, horizontal mirroring and random cropping
    # can be done as simple data augmentations
    mirror: true  # true = on, false = off
    # crop a `crop_size` x `crop_size` patch:
    # - at random during training
    # - from the center during testing
    crop_size: 227
  }
}
- Use a smaller training set, so that the network overfits (high training accuracy, low validation accuracy)
- Define a dropout layer (a sketch follows this list)
- Add data augmentation parameters to the Data layer
- Train the network again on the smaller set. You should see higher validation accuracy
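For reference, here is a minimal sketch of a dropout layer, following the pattern used in the AlexNet prototxt; the layer name drop1 and the bottom/top blob ip1 are placeholders for whatever your ConvNet uses, and 0.5 is a typical dropout ratio rather than a required value:
layer {
  name: "drop1"
  type: "Dropout"
  bottom: "ip1"
  top: "ip1"  # in-place: the output overwrites the input blob
  dropout_param {
    dropout_ratio: 0.5  # probability of zeroing each activation during training
  }
}
At test time, Caffe disables dropout automatically, so this layer needs no deploy-time changes.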
Optional: Other common data augmentation techniques used to improve accuracy are rotations, shearing & perspective warping. Take a look at the ChenglongChen/caffe-rta repository to see how the author has implemented these.
Deliverables
- Network prototxt with dropout and data augmentation (5 points)
- Validation loss vs. iterations plot, with and without dropout (10 points)
Q2: Fine-tuning AlexNet for Style classification on WikiArt data (20 points)
The WikiArt dataset consists of 10,000 images of paintings of arbitrary sizes from 10 different styles: Baroque, Realism, Expressionism, etc. The goal is to fine-tune a pretrained model, AlexNet, to predict painting style with reasonable performance and minimal training time.
Obtaining the dataset
The dataset consists of 10,000 images in total from 10 different styles of painting, 1,000 images each. Use the download_wikiart.py script to download a subset of the data and split it into training and validation sets.
% python download_wikiart.py -h
usage: download_wikiart.py [-h] [-s SEED] [-i IMAGES] [-w WORKERS]
Download a subset of the WikiArt style dataset to a directory.
optional arguments:
-h, --help show this help message and exit
-s SEED, --seed SEED random seed
-i IMAGES, --images IMAGES
number of images to use (-1 for all [default])
-w WORKERS, --workers WORKERS
num workers used to download images. -x uses (all - x)
cores [-1 default].
% python download_wikiart.py -i 2000 -s 761218
Downloading 2000 images with 7 workers...
Writing train/val for 1996 successfully downloaded images.
Setting up the AlexNet prototxt files
Copy the AlexNet prototxt files, solver.prototxt and train_val.prototxt, from $CAFFE_ROOT/models/bvlc_alexnet to the working directory.
cp $CAFFE_ROOT/models/bvlc_alexnet/solver.prototxt ./
cp $CAFFE_ROOT/models/bvlc_alexnet/train_val.prototxt ./
Since you'll be fine-tuning a network pretrained on the ImageNet dataset, you will also need the ImageNet mean file; run $CAFFE_ROOT/data/ilsvrc12/get_ilsvrc_aux.sh to obtain it. Note that if you train a network from scratch, you should instead compute the mean over your own training data. You will also need the AlexNet pretrained model.
python $CAFFE_ROOT/scripts/download_model_binary.py $CAFFE_ROOT/models/bvlc_alexnet
Transfer Learning
There are two main transfer learning scenarios:
- ConvNet as a fixed feature extractor: We take a ConvNet pretrained on the ImageNet dataset, remove the final fully-connected layer, and treat the rest of the ConvNet as a fixed feature extractor for the new dataset. We can train a linear classifier (linear SVM or softmax classifier) on these extracted features (4096-D vectors for every image in the case of AlexNet) for the new dataset. In Caffe, this is achieved by setting the learning rates of the intermediate layers (blobs_lr) to 0 (a sketch follows this list).
- Fine-tuning the ConvNet: The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation.
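For the fixed-feature-extractor scenario, freezing a layer looks like the sketch below (the conv1 parameters shown are AlexNet's). Note that newer Caffe releases express per-blob learning rates as lr_mult inside param blocks, while older ones use blobs_lr directly in the layer definition:
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 }  # freeze the weights
  param { lr_mult: 0 }  # freeze the biases
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}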
Fine-tuning
Look at train_val.prototxt and solver.prototxt closely. To fine-tune on the WikiArt dataset, we'll start with the weights of the pretrained model for all layers. Since our dataset consists of 10 classes instead of 1000 (for ImageNet), we'll modify the last layer. Note that in Caffe, when we start training with a pretrained model, weights of layers with the same name are retained and new layers are initialized with random weights.
From the Caffe example on fine-tuning CaffeNet for style recognition on Flickr style data:

We will also decrease the overall learning rate base_lr in the solver prototxt, but boost the blobs_lr on the newly introduced layer. The idea is to have the rest of the model change very slowly with new data, but let the new layer learn fast. Additionally, we set stepsize in the solver to a lower value than if we were training from scratch, since we're virtually far along in training and therefore want the learning rate to go down faster. Note that we could also entirely prevent fine-tuning of all layers other than fc8_flickr by setting their blobs_lr to 0.
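Applied to our setting, the solver edits might look like the following minimal sketch; every value below is illustrative, not prescribed:
net: "train_val.prototxt"
base_lr: 0.001   # lowered (e.g. 10x) so the pretrained weights change slowly
lr_policy: "step"
gamma: 0.1
stepsize: 5000   # smaller than the from-scratch value, per the quote above
max_iter: 20000
# [...] keep the remaining fields (momentum, weight_decay, snapshot, ...) as copied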
- Change the data layer
- Change the last layer (a sketch follows this list)
- Modify the hyperparameters (see the solver sketch above)
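As a sketch, the replaced final layer in train_val.prototxt might look like this; the name fc8_wikiart and the learning-rate multipliers are illustrative choices. Renaming the layer is what prevents Caffe from copying the pretrained 1000-way fc8 weights into it:
layer {
  name: "fc8_wikiart"    # new name, so pretrained fc8 weights are not copied
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_wikiart"
  param { lr_mult: 10 }  # boosted learning rate on the new layer's weights
  param { lr_mult: 20 }  # boosted learning rate on its biases
  inner_product_param {
    num_output: 10       # 10 WikiArt styles instead of 1000 ImageNet classes
  }
}
Remember to also point the bottom of the accuracy and loss layers at fc8_wikiart.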
Now you can start training.
$CAFFE_ROOT/build/tools/caffe train -solver solver.prototxt -weights $CAFFE_ROOT/models/bvlc_alexnet/bvlc_alexnet.caffemodel
Deliverables
- Prototxt files (train_val, solver, deploy) (10 points)
- Training loss vs. iterations plot (5 points)
- Kaggle contest (5 points + up to 10 extra points for beating the TA entry and top performers)
Q3: Visualizing and Breaking ConvNets (15 points)
In this exercise, we'll work with the Python interface for Caffe and learn to visualize data gradients and generate images to fool ConvNets.
Class Model Visualizations
We'll be using the method outlined in the paper "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps" [3] to visualize a class model learnt by a convolutional neural network.
In order to generate the class model visualization, we need to optimize the unnormalized class score with respect to the image.
$$ \mathop{\arg\,\max}\limits_I S_c(I) - \lambda \lVert I \rVert^2 $$
This is done by standard backpropagation, as in the training phase of the network, with the difference that instead of updating the network parameters we update the image to maximise the score, a method known as gradient ascent. Also note that we'll drop the final layer of the network and maximise the unnormalized class score instead of the probability, as outlined in the paper.
Copy the AlexNet deploy.prototxt into the working directory and edit it.
cp $CAFFE_ROOT/models/bvlc_alexnet/deploy.prototxt 3_visualizing-breaking-convnets/
- Delete the final layer
- Add "force_backward: true", to propagate the gradients back to the data layer in the backward pass
- Change the number of input dimensions to 1
Open the IPython notebook class-model-visualizations.ipynb and complete the missing code to generate the class model visualizations.
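The core of the notebook is a gradient-ascent loop. A minimal pycaffe sketch follows; the blob names ('data', 'fc8'), the target class index, the step size, and the iteration count are all assumptions for illustration, not values from the assignment:
import numpy as np
import caffe

# net defined by the edited deploy.prototxt (final layer removed, batch size 1)
net = caffe.Net('deploy.prototxt', 'bvlc_alexnet.caffemodel', caffe.TEST)

target = 281                # class index to visualize (assumed)
lam, step_size = 1e-5, 1.0  # lambda from the objective and the ascent step

image = np.zeros(net.blobs['data'].data.shape, dtype=np.float32)
for _ in range(200):
    net.blobs['data'].data[...] = image
    net.forward()
    # backpropagate a one-hot gradient from the unnormalized class scores
    one_hot = np.zeros_like(net.blobs['fc8'].data)
    one_hot[0, target] = 1.0
    net.backward(**{'fc8': one_hot})
    # ascend the objective's gradient; -2*lam*image comes from the regularizer
    image += step_size * (net.blobs['data'].diff - 2 * lam * image)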
Image-Specific Class Saliency Visualisation
Section 3 of the paper [3] describes a method to understand which parts of an image are important for classification by visualizing the gradient of the correct class score with respect to the input image. The core idea behind this is to find the pixels that need to be changed the least to affect the class score the most.
Open the IPython notebook saliency-maps.ipynb and complete the missing code to extract and visualize image-specific saliency maps.
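In pycaffe, the saliency map comes from a single forward/backward pass. A sketch, assuming net is set up as in the previous section and image is an already-preprocessed input in data-blob layout:
correct_class = 281  # ground-truth label index for the image (assumed)
net.blobs['data'].data[...] = image
net.forward()
one_hot = np.zeros_like(net.blobs['fc8'].data)
one_hot[0, correct_class] = 1.0
net.backward(**{'fc8': one_hot})
# saliency: maximum absolute gradient across the three color channels
saliency = np.abs(net.blobs['data'].diff[0]).max(axis=0)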
Generating Fooling Images to Break ConvNets
Several papers [4,5,6] have suggested ways to perform optimization over the input image to construct images that break a trained ConvNet. These papers showed that, given a trained ConvNet, an input image, and a desired label, we can add a small amount of noise to the input image to force the ConvNet to classify it as having the desired label.
We will create a fooling image x_f from the original image x_0 and desired label y by solving the following optimization problem, where L is the classification loss of model m:
$$ x_f = \mathop{\arg\,\min}\limits_x (L (x,y,m) + \frac{\lambda}{2} \lVert x - x_0 \rVert ^2) $$
Open the IPython notebook breaking-convnets.ipynb and complete the missing code to generate fooling images that break pretrained ConvNets.
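As a sketch of the core update, here is one gradient-descent step on the objective above, with the softmax loss standing in for L and the same edited deploy net as before; the blob names and default step sizes are assumptions:
def fooling_step(net, x, x0, target, lam=1e-2, step=1.0):
    """One descent step on L(x, target, net) + (lam/2) * ||x - x0||^2."""
    net.blobs['data'].data[...] = x
    scores = net.forward()['fc8']
    # gradient of the softmax loss w.r.t. the unnormalized scores: p - one_hot
    p = np.exp(scores - scores.max())
    p /= p.sum()
    p[0, target] -= 1.0
    net.backward(**{'fc8': p})
    # descend: loss gradient plus the gradient of the proximity term
    return x - step * (net.blobs['data'].diff + lam * (x - x0))
Iterating this step from x = x0 until the network's prediction flips to the target label yields the fooling image x_f.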
Deliverables
- Completed IPython notebooks class-model-visualizations.ipynb, saliency-maps.ipynb & breaking-convnets.ipynb (5 points x 3)
References:
- Assignment 3, CS231n, Stanford
- Fine-tuning CaffeNet for Style Recognition on “Flickr Style” Data
- Simonyan et al., "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", ICLR 2014
- Nguyen et al., "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images", CVPR 2015
- Szegedy et al., "Intriguing properties of neural networks", ICLR 2014
- Goodfellow et al., "Explaining and Harnessing Adversarial Examples", ICLR 2015