CNN: Solving for Input Images by Optimization – Zhihu column
Generating adversarial examples in Caffe with the Fast Gradient Sign method

A complete example of this article’s code can be downloaded at:

frombeijingwithlove/dlcv_for_beginners

Fast Gradient Sign method

The previous article, CNN: Solving for Input Images by Optimization, described generating adversarial examples by superimposing noise obtained from an optimization over the input image, a method from Christian Szegedy's paper Intriguing properties of neural networks. In Explaining and Harnessing Adversarial Examples, Ian Goodfellow et al. proposed a much simpler way to produce the noise: instead of solving an optimization problem, take a single step along the sign of the gradient of the loss with respect to the input.
Computationally, this method has a huge advantage: only one forward pass and one backward gradient computation are needed. Ian Goodfellow calls it the Fast Gradient Sign (FGS) method.
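Concretely, the perturbation added to the input is (in LaTeX notation)

\eta = \epsilon \cdot \mathrm{sign}\left(\nabla_{x} J(\theta, x, y)\right)

where J(\theta, x, y) is the model's loss with parameters \theta on input x with label y, and \epsilon controls the noise magnitude; the adversarial example is x + \eta.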

Generating adversarial examples with Caffe

The FGS method is so simple that it is easy to implement in any framework (Ian Goodfellow maintains an official implementation as part of a complete TensorFlow-based toolkit: OpenAI/CleverHans). Below are examples using Caffe's Python interface.

First, prepare the model to attack. Here we use SqueezeNet v1.0, pre-trained on the ImageNet dataset, as an example:

DeepScale/SqueezeNet

Two files need to be downloaded:

deploy.prototxt

squeezenet_v1.0.caffemodel

Because gradients need to be propagated back to the input, the first thing to do after downloading deploy.prototxt is to add the following line to it:

force_backward: true

Start by loading the model definition and weight files in Caffe, and initialize a Transformer for reading three-channel color images:

import numpy
import caffe

# model to attack
model_definition = '/path/to/deploy.prototxt'
model_weights = '/path/to/squeezenet_v1.0.caffemodel'
channel_means = numpy.array([104., 117., 123.])

# initialize net
net = caffe.Net(model_definition, model_weights, caffe.TEST)
n_channels, height, width = net.blobs['data'].shape[-3:]
net.blobs['data'].reshape(1, n_channels, height, width)

# initialize transformer
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', channel_means)
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

Since we are only demonstrating how to make an adversarial example, for convenience we process a single image at a time. The next step is to read an image, run a forward pass to compute the class confidences, and a backward pass to compute the gradient. We use the following photo of a small white dog as input:

[figure: the input photo of the small white dog]

The code is as follows:

# Load image & forward
img = caffe.io.load_image('little_white_dog.jpg')
transformed_img = transformer.preprocess('data', img)
net.blobs['data'].data[0] = transformed_img
net.forward()

# Get predicted label index
pred = numpy.argmax(net.blobs['prob'].data.flatten())

# Set gradient direction to reduce the current prediction
net.blobs['prob'].diff[0][pred] = -1.

# Generate attack image with fast gradient sign method
diffs = net.backward()
diff_sign_mat = numpy.sign(diffs['data'])
adversarial_noise = 1.0 * diff_sign_mat

This gives the adversarial noise to superimpose on the original image. What this code does is generate an adversarial example that reduces the confidence of the model's current prediction, with each pixel moving 1.0 along the sign of the gradient. To instead generate an adversarial example that makes the model predict a specified category, change the gradient assignment to the following:

net.blobs['prob'].diff[0][label_index] = 1.

Here label_index is the category we want the model to mispredict. Note that the image read by caffe.io.load_image is an ndarray with values between 0 and 1, whereas the ndarray produced by the Transformer has pixel values between 0 and 255. In addition, the noise obtained above is usually not the final result: after it is added to the original image, the pixel values may overflow, so the code that produces the final adversarial image is as follows:


# clip exceeded values
attack_hwc = transformer.deprocess('data', transformed_img + adversarial_noise[0])
attack_hwc[attack_hwc > 1] = 1.
attack_hwc[attack_hwc < 0] = 0.
attack_img = transformer.preprocess('data', attack_hwc)

attack_img is the adversarial example in Caffe's blob layout, while attack_hwc is in image height × image width × image channel order and can be visualized directly with matplotlib.
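For a quick check, attack_hwc can be displayed directly; a minimal sketch:

from matplotlib import pyplot

# attack_hwc is an H x W x C array with values in [0, 1]
pyplot.imshow(attack_hwc)
pyplot.axis('off')
pyplot.show()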

Visualization and simple analysis

To facilitate analysis, we package the process of generating adversarial examples into a function:

from operator import itemgetter

def make_n_test_adversarial_example(
        img, net, transformer, epsilon,
        data_blob='data', prob_blob='prob',
        label_index=None, top_k=5):
    # Load image & forward
    transformed_img = transformer.preprocess(data_blob, img)
    net.blobs[data_blob].data[0] = transformed_img
    net.forward()
    probs = [x for x in enumerate(net.blobs[prob_blob].data.flatten())]
    num_classes = len(probs)
    sorted_probs = sorted(probs, key=itemgetter(1), reverse=True)
    top_preds = sorted_probs[:top_k]
    pred = sorted_probs[0][0]

    # if label_index is set,
    #   generate an adversarial example toward the label,
    # else
    #   reduce the probability of the predicted label
    net.blobs[prob_blob].diff[...] = 0
    if type(label_index) is int and 0 <= label_index < num_classes:
        net.blobs[prob_blob].diff[0][label_index] = 1.
    else:
        net.blobs[prob_blob].diff[0][pred] = -1.

    # generate attack image with fast gradient sign method
    diffs = net.backward()
    diff_sign_mat = numpy.sign(diffs[data_blob])
    adversarial_noise = epsilon * diff_sign_mat

    # clip exceeded values
    attack_hwc = transformer.deprocess(data_blob, transformed_img + adversarial_noise[0])
    attack_hwc[attack_hwc > 1] = 1.
    attack_hwc[attack_hwc < 0] = 0.
    attack_img = transformer.preprocess(data_blob, attack_hwc)

    # forward the attacked image to get its predictions
    net.blobs[data_blob].data[0] = attack_img
    net.forward()
    probs = [x for x in enumerate(net.blobs[prob_blob].data.flatten())]
    sorted_probs = sorted(probs, key=itemgetter(1), reverse=True)
    top_attacked_preds = sorted_probs[:top_k]

    return attack_hwc, top_preds, top_attacked_preds

This function takes the ndarray read by caffe.io.load_image as the input image, along with the net and transformer; epsilon is the noise magnitude. label_index defaults to None, in which case the resulting adversarial example reduces the confidence of the current prediction; if label_index is set to a specific category, the adversarial example tries to increase the confidence of the model's prediction for that category. Finally, the function returns attack_hwc, which can be visualized directly with matplotlib, along with the top-k predictions for the original image (top_preds) and for the attacked image (top_attacked_preds).

The result of the above function can be visualized with the following function:

from matplotlib import pyplot

def visualize_attack(title, original_img, attack_img, original_preds, attacked_preds, labels):
    pred = original_preds[0][0]
    attacked_pred = attacked_preds[0][0]
    k = len(original_preds)
    fig_name = '{}: {} to {}'.format(title, labels[pred], labels[attacked_pred])

    pyplot.figure(fig_name)
    for img, plt0, plt1, preds in [
        (original_img, 231, 234, original_preds),
        (attack_img, 233, 236, attacked_preds)
    ]:
        pyplot.subplot(plt0)
        pyplot.axis('off')
        pyplot.imshow(img)
        ax = pyplot.subplot(plt1)
        pyplot.axis('off')
        ax.set_xlim([0, 2])
        bars = ax.barh(range(k-1, -1, -1), [x[1] for x in preds])
        for i, bar in enumerate(bars):
            x_loc = bar.get_x() + bar.get_width()
            y_loc = k - i - 1
            label = labels[preds[i][0]]
            ax.text(x_loc, y_loc, '{}: {:.2f}%'.format(label, preds[i][1]*100))

    pyplot.subplot(232)
    pyplot.axis('off')
    noise = attack_img - original_img
    pyplot.imshow(255 * noise)

This code displays the original image with the model's predicted classes and confidences, the adversarial image with its predicted classes and confidences, and the noise superimposed on the original image. For ImageNet data, you can download the synset_words.txt file provided by Caffe and read the labels into a list in order; in the examples below we assume that list is called labels.
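A minimal sketch of building that list, assuming synset_words.txt has been downloaded into the working directory:

# each line of synset_words.txt looks like: '<wnid> <human-readable label>'
with open('synset_words.txt') as f:
    labels = [line.strip().split(' ', 1)[1] for line in f]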

When everything is ready, let's see the effect. First, try to reduce the confidence of the model's prediction with noise of magnitude 1.0:

attack_img, original_preds, attacked_preds = \
    make_n_test_adversarial_example(img, net, transformer, 1.0)
visualize_attack('example0', img, attack_img, original_preds, attacked_preds, labels)

The results are as follows:

[figure: example0 result]

Because the Chinese rural dog is not one of ImageNet's categories, the model predicts Great Pyrenees. Considering the little dog's color and shape, that is a reasonable result and shows that SqueezeNet v1.0 works quite well. After adding noise of magnitude 1, however, the model predicts weasel…

Next, try generating an adversarial example that the model predicts as a given category. Since the original prediction is Great Pyrenees, let's try to turn it into a real big white bear, i.e. ice bear:

attack_img, original_preds, attacked_preds = \
    make_n_test_adversarial_example(img, net, transformer, 1.0, label_index=296)
visualize_attack('example1', img, attack_img, original_preds, attacked_preds, labels)

[figure: example1 result]

The result is still pretty good, with a very high confidence, although weasel comes in second. Next, try something harder: keep the noise magnitude at 1.0 and set label_index to the ostrich category:

[figure: ostrich attempt at noise magnitude 1.0]

It is still a weasel, so try stronger noise and set the magnitude to 2.0:

[figure: ostrich attempt at noise magnitude 2.0]

It works, although the confidence is not very high. Increase the noise magnitude further, to 6.0:


[figure: ostrich attempt at noise magnitude 6.0]

The confidence of the ostrich prediction improves dramatically! So is it true that the larger the noise magnitude, the higher the confidence of the ostrich prediction? Judging from Fig. 4 in Ian's paper, it would seem so.

Try setting the noise level to 18.0:

[figure: ostrich attempt at noise magnitude 18.0]

A toad… One of the main arguments in Ian's paper is that in popular deep networks, the main reason adversarial examples exist is the highly linear behavior of the models, as supported by Fig. 4 of the paper, and that adversarial examples generalize across different models. But why should linearity be the dominant cause? Ian does not seem to offer quantitative, particularly convincing evidence. In fact, Fig. 4 in the paper is only a plot on MNIST, and on more complicated data the degree of linearity weakens, as discussed in the article Ian wrote for KDnuggets:


Deep Learning Adversarial Examples – Clarifying Misconceptions

In essence, adversarial examples exist because it is infeasible to cover a high-dimensional input space by search, so it is natural for adversarial examples to appear in corners that neither the data nor the model can reach. Although the model's linearity and the resulting partitioning of the input space may be the main reason adversarial examples exist, adversarial examples attributable to other factors are not negligible, such as the case of the dog turning into a toad. After all, what makes neural networks universal approximators is nonlinearity.

Using iteration to generate better adversarial examples

Although a classification model has no explicit notion of distance between categories, similar categories are still closer to each other in the input space. The examples above show this as well: turning the dog into an ice bear or a weasel is easy, turning it into an ostrich is a bit harder, and turning it into something even less similar, such as a racket, is harder still. The four noise magnitudes used for the ostrich adversarial examples (1.0, 2.0, 6.0, 18.0) were also tried for generating racket adversarial examples. The results are as follows:


[figure: racket attempts at the four noise magnitudes]
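These four single-step attempts can be reproduced with something like the following sketch (label_index=752 is the racket index used in the iteration code below; the figure titles are made up for illustration):

for eps in [1.0, 2.0, 6.0, 18.0]:
    attack_img, original_preds, attacked_preds = \
        make_n_test_adversarial_example(img, net, transformer, eps, label_index=752)
    visualize_attack('racket_eps_{}'.format(eps), img, attack_img,
                     original_preds, attacked_preds, labels)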


Based on this idea, we take the targeted method above and apply it iteratively, increasing the confidence of racket with a noise magnitude of 0.1 per iteration, for 10 iterations in total:

attack_img, original_preds, attacked_preds = \
    make_n_test_adversarial_example(img, net, transformer, 0.1, label_index=752)
for i in range(9):
    attack_img, _, attacked_preds = \
        make_n_test_adversarial_example(attack_img, net, transformer, 0.1, label_index=752)
visualize_attack('racket_try1', img, attack_img, original_preds, attacked_preds, labels)

Note that iterating by repeatedly calling the function like this is inefficient, since every call performs a redundant forward pass; it is written this way only for simplicity. The result after the iterations is as follows:

[figure: racket_try1 result]

We successfully get racket.
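As an aside, one way to remove the redundant forward pass mentioned above is to fold the iteration into a single loop, so that the forward pass evaluating each attacked image also serves as the forward pass of the next step. A rough sketch of such a helper (hypothetical, but following the same logic as make_n_test_adversarial_example):

def iterative_attack(img, net, transformer, epsilon, label_index, n_iters,
                     data_blob='data', prob_blob='prob'):
    # initial forward pass
    x = transformer.preprocess(data_blob, img)
    net.blobs[data_blob].data[0] = x
    net.forward()
    attack_hwc = None
    for _ in range(n_iters):
        # push the prediction toward label_index
        net.blobs[prob_blob].diff[...] = 0
        net.blobs[prob_blob].diff[0][label_index] = 1.
        diffs = net.backward()
        noise = epsilon * numpy.sign(diffs[data_blob])
        # clip in image space, then convert back to blob layout
        attack_hwc = transformer.deprocess(data_blob, x + noise[0])
        attack_hwc[attack_hwc > 1] = 1.
        attack_hwc[attack_hwc < 0] = 0.
        x = transformer.preprocess(data_blob, attack_hwc)
        # this forward pass scores the new image and doubles as the forward pass of the next iteration
        net.blobs[data_blob].data[0] = x
        net.forward()
    return attack_hwc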