In recent years, I have seen many AI-related challenges in CTF competitions at home and abroad. Some require players to implement their own AI to automate certain operations; others provide a target AI model and ask the player to defeat it. This article focuses on the latter: how can we trick the AI given in a CTF challenge?

AI-deception problems in CTF generally fall into two categories: those based on neural networks and those based on statistical models. When contestants are asked to trick a neural network, a white-box model is given (usually an image classification task). When the player is asked to cheat a statistical learning model, some challenges give the white-box model parameters, while others provide the training dataset.

Let's start with a very simple challenge about deceiving a statistical learning model, to get a feel for the overall process of solving this kind of problem.

Deceiving kNN: [West Lake Sword 2020] Calling a Deer a Horse

Task goal

There is an AI model, and contestants must upload a picture that differs only slightly from deer.png but is identified as a horse by the AI.

import numpy as np
from PIL import Image
import math
import operator
import os
import time
import base64
import random


def load_horse():
    data = []
    p = Image.open('./horse.png').convert('L')
    p = np.array(p).reshape(-1)
    p = np.append(p, 0)
    data.append(p)
    return np.array(data)


def load_deer():
    data = []
    p = Image.open('./deer.png').convert('L')
    p = np.array(p).reshape(-1)
    p = np.append(p, 1)
    data.append(p)
    return np.array(data)


def load_test(pic):
    data = []
    p = Image.open(pic).convert('L')
    p = np.array(p).reshape(-1)
    p = np.append(p, 1)
    data.append(p)
    return np.array(data)


def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)


def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors


def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]


def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0


def check(pic):
    source_p = Image.open('deer.png')
    try:
        c_p = Image.open(pic)
    except:
        print("Please upload right picture.")
        exit()
    diff_pixel = 0
    a, b = source_p.size
    if c_p.size[0] != a and c_p.size[1] != b:
        print("Please upload right picture size("+str(a)+','+str(b)+')')
        exit()
    for y in range(b):
        for x in range(a):
            diff_pixel += abs(source_p.getpixel((x, y)) - c_p.getpixel((x, y)))
    return diff_pixel


def main():
    while 1:
        print('-' * 134)
        print('''(ASCII-art banner, garbled in the original dump)''')
        print('-' * 134)
        print('\t1.show source code')
        print('\t2.give me the source pictures')
        print('\t3.upload picture')
        print('\t4.exit')
        choose = input('>')
        if choose == '1':
            w = open('run.py', 'r')
            print(w.read())
            continue
        elif choose == '2':
            print('this is horse`s picture:')
            h = base64.b64encode(open('horse.png', 'rb').read())
            print(h.decode())
            print('-' * 134)
            print('this is deer`s picture:')
            d = base64.b64encode(open('deer.png', 'rb').read())
            print(d.decode())
            continue
        elif choose == '4':
            break
        elif choose == '3':
            print('Please input your deer picture`s base64(Preferably in png format)')
            pic = input('>')
            try:
                pic = base64.b64decode(pic)
            except:
                exit()
            if b"<?php" in pic or b'eval' in pic:
                print("Hacker!! This is not WEB,It`s Just a misc!!!")
                exit()
            salt = str(random.getrandbits(15))
            pic_name = 'tmp_' + salt + '.png'
            tmp_pic = open(pic_name, 'wb')
            tmp_pic.write(pic)
            tmp_pic.close()
            if check(pic_name) >= 100000:
                print('Don`t give me the horse source picture!!! ')
                os.remove(pic_name)
                break
            ma = load_horse()
            lu = load_deer()
            k = 1
            trainingSet = np.append(ma, lu).reshape(2, 5185)
            testSet = load_test(pic_name)
            neighbors = getNeighbors(trainingSet, testSet[0], k)
            result = getResponse(neighbors)
            if repr(result) == '0':
                os.system('clear')
                print('Yes,I want this horse like deer,here is your flag encoded by base64')
                flag = base64.b64encode(open('flag', 'rb').read())
                print(flag.decode())
                os.remove(pic_name)
                break
            else:
                print('I want horse but not deer!!! ')
                os.remove(pic_name)
                break
        else:
            print('wrong choose!!! ')
            break
    exit()


if __name__ == '__main__':
    main()

After a close look at the code, we find that the AI model is k-nearest neighbors (kNN) with k = 1, and the training set contains only one picture each of deer and horse. The server reads the contestant's picture and does the following:

  1. Check that the total pixel difference between the uploaded picture and deer.png is less than 100000. If the limit is exceeded, an error is reported.
  2. Compute the Euclidean distance from the uploaded picture to deer and to horse. Whichever is closer decides the predicted class.
  3. If the contestant's picture is classified as a horse, the contestant wins.

Deer and Horse are grayscale images as follows:

We recommend using Jupyter Notebook or JupyterLab together with matplotlib to visualize intermediate results; this greatly boosts productivity.
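For instance, a minimal snippet of this kind (assuming horse.png and deer.png have been decoded from the menu's base64 output into the working directory):

import matplotlib.pyplot as plt
from PIL import Image

# Show the two training images side by side.
fig, axes = plt.subplots(1, 2)
for ax, name in zip(axes, ['deer.png', 'horse.png']):
    ax.imshow(Image.open(name).convert('L'), cmap='gray')
    ax.set_title(name)
    ax.axis('off')
plt.show()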

Our current goal is to modify the deer picture so that its Euclidean distance to horse becomes smaller than its Euclidean distance to the original deer.

Try: random noise

In order to construct a legal picture, we need to go back and look at how the "amount of modification" is measured. The code is as follows:

for y in range(b):
    for x in range(a):
        diff_pixel += abs(source_p.getpixel((x, y)) - c_p.getpixel((x, y)))
return diff_pixel

It sums the absolute differences of each pixel between picture A and picture B. In other words, this is the Manhattan (L1) distance. Most of the AI-cheating problems I have encountered in CTF measure the amount of modification by the Manhattan distance.

The image has 5,184 pixels, which means that, on average, each pixel is allowed a deviation of about 19. This is in fact a very loose budget. Let's first try a legal modification:
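Here is a minimal sketch of such a random perturbation (my own illustration; it assumes deer.png and horse.png are in the working directory):

import numpy as np
from PIL import Image

deer = np.array(Image.open('deer.png').convert('L'), dtype=np.int64)
horse = np.array(Image.open('horse.png').convert('L'), dtype=np.int64)

# Spread the budget uniformly: roughly +/-19 per pixel keeps the total under 100000.
noise = np.random.randint(-19, 20, size=deer.shape)
noisy = np.clip(deer + noise, 0, 255)

print('Manhattan diff to deer :', np.abs(noisy - deer).sum())
print('Euclidean dist to deer :', np.linalg.norm(noisy - deer))
print('Euclidean dist to horse:', np.linalg.norm(noisy - horse))
Image.fromarray(noisy.astype(np.uint8)).save('noisy.png')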

The output looks like static on an old TV set. Can it fool the AI?

Unfortunately, no: the noisy picture's Euclidean distance to deer is still smaller than its distance to horse. We now have to ask ourselves: is randomly spreading the 100,000 budget over every pixel really the best strategy?

Solution: Modify the pixels with large differences

Let's think about this in two dimensions. Suppose we want to move a point as far from (0, 0) as possible in Euclidean distance while keeping its Manhattan distance at most 2. If we pick (1, 1), the Euclidean distance is sqrt(2); if we pick (0, 2), the Euclidean distance reaches 2, which is the better choice.

So we make the corresponding guess for this problem: some pixels should be set exactly equal to the corresponding pixels of horse, while the remaining pixels are left untouched. And the pixels worth modifying are the ones where deer and horse differ the most, as in the sketch below.
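A minimal sketch of this greedy attack (my reconstruction, not necessarily the original author's exact script):

import numpy as np
from PIL import Image

deer = np.array(Image.open('deer.png').convert('L'), dtype=np.int64).reshape(-1)
horse = np.array(Image.open('horse.png').convert('L'), dtype=np.int64).reshape(-1)

adv = deer.copy()
budget = 99999  # stay strictly below the 100000 limit
for i in np.argsort(-np.abs(deer - horse)):  # largest pixel difference first
    cost = abs(int(deer[i]) - int(horse[i]))
    if cost > budget:
        break
    adv[i] = horse[i]  # copy horse's value at this pixel
    budget -= cost

print('Euclidean dist to deer :', np.linalg.norm(adv - deer))
print('Euclidean dist to horse:', np.linalg.norm(adv - horse))
side = int(len(adv) ** 0.5)  # 72 for this challenge
Image.fromarray(adv.reshape(side, side).astype(np.uint8)).save('adv.png')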

This produces a very strange-looking picture. Let's verify that it meets the requirements:

The distance to deer is 4003 and the distance to horse is 2538, so the AI is fooled. The total pixel difference is 99,999, within the limit, and we have successfully accomplished the goal set by the problem.

Mathematical proof

We have just solved the problem based on the guess that the more a pixel differs, the more it is worth modifying. Below is a proof of why this works; readers who dislike proofs can skip it.
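A sketch of the argument (my reconstruction): consider one pixel with value d_i in deer and h_i in horse, and assume without loss of generality that h_i > d_i. Spending t units of the Manhattan budget to move that pixel from d_i toward h_i changes the quantity we want to make negative, namely (squared distance to horse) minus (squared distance to deer), by

$$f_i(t) = (d_i + t - h_i)^2 - t^2 = (h_i - d_i)^2 - 2t(h_i - d_i).$$

Each unit of budget spent on pixel i therefore lowers this objective at the constant rate 2(h_i - d_i): the payoff is proportional to the pixel difference, so the budget should go to the pixels where |h_i - d_i| is largest. Note also that the rate does not decay once t exceeds h_i - d_i, so up to the 0/255 clamp it keeps paying to push past horse's value toward an extreme.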

So we have proved mathematically why "the more a pixel differs, the more it is worth changing". The derivation also yields another conclusion: setting a pixel to horse's corresponding value is not optimal; it is better to change it completely, to either 0 or 255. However, the 100,000 pixel-difference limit is a very loose bound, so our simpler algorithm succeeds anyway.

Conclusion

In retrospect, we started with an original picture X, applied a small perturbation, and obtained a sample Y to which the AI reacts very differently than it does to X. Such samples are called "adversarial examples". How to construct high-quality adversarial examples, and how to use them to improve model robustness, has become an increasingly important direction in machine learning research.

Note that attacking statistical learning models often requires some mathematical derivation. Interested readers are encouraged to study kNN, k-means, Gaussian mixture models, and other classical statistical learning methods.

Spoofing a white-box neural network

Overview

Neural networks can solve many problems that are hard for traditional models, and they have developed rapidly in recent years. A neural network generally consists of multiple layers of neurons, each layer with its own parameters. Below is a simple neural network (a multilayer perceptron):

Image source: IBM. This section assumes the reader already has some knowledge of neural networks; if you are starting from scratch, I recommend 3Blue1Brown's machine learning series and PyTorch's official tutorials.

Take the network in the figure above. In an image classification task, each pixel of the image is fed into the first layer, then passed on to the second layer, the third layer, and so on, down to the last layer. Each neuron in the last layer represents one class, and its output is a score for whether the image belongs to that class. We usually take the class with the highest score as the classification result.

Neural-network-spoofing problems in CTF generally go like this: you are given a pre-trained classification model (PyTorch or TensorFlow) and an original image. The task is to modify the original image so slightly that the network misclassifies it into another category.

Means of attack

When we train a neural network, we usually use gradient descent. Each iteration can be understood as the following process: feed in the input X, run net(X) to get the output, and backpropagate the difference between the network's output and the expected output to update the network's parameters.

So, how can we attack this network? First, feed the original image X to the network to obtain the output net(X). Next, compute a loss value from the difference between "the network's classification result" and "the result we want to mislead it into", and backpropagate. Note, however, that we do not modify the network's parameters; instead, we subtract the gradient from the original image. Do this several times until the AI is successfully misled.

Next, we demonstrate this attack, starting by training a network, using the task of recognizing handwritten digits (the MNIST dataset) as an example.

Practice: Training a neural network

We implement the network in PyTorch. The first step is to import the dataset:

import torch
import torchvision
import torch.nn as nn
import torchvision.transforms as transforms
import torch.nn.functional as F
import numpy as np

import matplotlib.pyplot as plt

trans_to_tensor = transforms.Compose([
    transforms.ToTensor()
])

data_train = torchvision.datasets.MNIST(
    './data', 
    train=True, 
    transform=trans_to_tensor, 
    download=True)

data_test = torchvision.datasets.MNIST(
    './data', 
    train=False, 
    transform=trans_to_tensor, 
    download=True)

data_train, data_test


Create a DataLoader that generates randomly shuffled mini-batches for training:

train_loader = torch.utils.data.DataLoader(
    data_train, 
    batch_size=100, 
    shuffle=True)

Let's look at one mini-batch.
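For instance (a sketch; the original visualization cell is not preserved):

# Fetch one mini-batch and display its first few samples.
x, y = next(iter(train_loader))
print(x.shape, y.shape)  # torch.Size([100, 1, 28, 28]) torch.Size([100])

fig, axes = plt.subplots(1, 8)
for i, ax in enumerate(axes):
    ax.imshow(x[i][0], cmap='gray')
    ax.set_title(int(y[i]))
    ax.axis('off')
plt.show()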

Next, define the network. We use a very basic model: the 28×28 grayscale input is flattened into a one-dimensional array, passed through a fully connected layer of 100 neurons with ReLU activation, then through a fully connected layer of 10 neurons with sigmoid activation, whose output is the prediction.

class MyNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 100)
        self.fc2 = nn.Linear(100, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = torch.sigmoid(x)

        return x

net = MyNet()

If the digit in the image is c, we want the 10-dimensional output vector to be 1 in the c-th position and 0 everywhere else. We therefore use the cross-entropy loss function and the Adam optimizer:

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters())

The next step is to train the network.

def fit(net, epoch=1):
    net.train()
    run_loss = 0

    for num_epoch in range(epoch):
        print(f'epoch {num_epoch}')

        for i, data in enumerate(train_loader):
            x, y = data[0], data[1]

            outputs = net(x)
            loss = criterion(outputs, y)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            run_loss += loss.item()

            if i % 100 == 99:
                print(f'[{i+1} / 600] loss={run_loss / 100}')
                run_loss = 0

                test(net)

def test(net):
    net.eval()

    test_loader = torch.utils.data.DataLoader(data_test, batch_size=10000, shuffle=False)  # evaluate on the test split, not the training data
    test_data = next(iter(test_loader))

    with torch.no_grad():
        x, y = test_data[0], test_data[1]

        outputs = net(x)

        pred = torch.max(outputs, 1)[1]
        print(f'test acc: {(pred == y).sum().item()} / {y.shape[0]}')

    net.train()

After 5 epochs:

We trained the network to 97.89% test accuracy. Next, we attack it.

Practice: Spoofing the white-box multilayer perceptron

We know all of the current network's parameters. In CTF, the training code is typically provided, along with a pre-trained model exported via torch.save(), which players can import via model.load_state_dict().
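A sketch of what that import step usually looks like (the file name model.pth is an assumption, not from any particular challenge):

net = MyNet()
net.load_state_dict(torch.load('model.pth'))
net.eval()  # inference mode; we will not update these parameters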

Let’s choose a random data point as the original image:
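For instance (an illustration; the post does not record which index was used, so we simply search for a test image labeled 2):

# Pick a test sample whose true label is 2 and show it.
idx = next(i for i in range(len(data_test)) if data_test[i][1] == 2)
origin = data_test[idx][0]  # shape [1, 28, 28]

plt.imshow(origin[0], cmap='gray')
plt.show()
print('prediction:', torch.max(net(origin), 1)[1].item())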

Our model classifies it, with great confidence, as a 2. Next, we tamper with the original image so that the network misclassifies it as a 3. The process is as follows:

  1. Feed the image into the network and get the network's output.
  2. Compute the loss value between the network's output and the desired (wrong) output (cross entropy is used here).
  3. Without changing the network's parameters, subtract gradient * alpha from the image's pixels.

Repeat the above process until the misdirection succeeds. The code is as follows:

def play(epoch):
    img.requires_grad_(True)  # track gradients on the input image
    loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss function

    for num_epoch in range(epoch):
        output = net(img)
        target = torch.tensor([3])  # the class we want to be misclassified as
        loss = loss_fn(output, target)
        loss.backward()  # compute the gradient w.r.t. the image
        img.data.sub_(img.grad * .05)  # gradient descent on the image itself
        img.grad.zero_()

        if num_epoch % 10 == 9:
            print(f'[{num_epoch + 1} / {epoch}] loss: {loss} pred: {torch.max(output, 1)[1].item()}')

        if torch.max(output, 1)[1].item() == 3:
            print(f'done in round {num_epoch + 1}')
            return

img = origin.view(1, 28, 28)
play(100)

We have successfully constructed an adversarial example: to us it clearly looks like a 2, but the model recognizes it as a 3. Mission accomplished. The comparison is shown below:

Conclusion

Many CTF neural-network-spoofing problems can be solved with the code above. Players do not need to write the training code; they just import the pre-trained model. During iteration, choose an appropriate learning rate alpha (0.05 in my code) and add whatever special constraints the problem imposes (such as limiting the change of each pixel to a certain range), as sketched below. In any case, the main idea of spoofing a white-box neural network is to fix the network's parameters and run gradient descent on the original image.
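For example, a per-pixel constraint can be enforced by projecting after each update (a sketch; eps is an assumed per-pixel budget, and origin is the unmodified image from the code above):

# Right after img.data.sub_(img.grad * .05):
eps = 0.2  # maximum allowed change per pixel (problem-specific)
img.data = torch.max(torch.min(img.data, origin + eps), origin - eps)
img.data.clamp_(0, 1)  # keep pixel values in the valid range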

Further discussion

Step by step, we have fooled a white-box neural network. But in the real world, few neural networks broadcast their parameters, so the routine above cannot be applied directly. Moreover, the image we generated above is not "natural": it contains a lot of background noise that a normal digit image would not have.

On these issues, the ICLR 2018 paper Generating Natural Adversarial Examples can provide some inspiration. The method presented in this paper does not require knowing the network's parameters in advance, or even the network architecture. Moreover, it can generate relatively natural adversarial examples, as shown below.

So how did they do it? Briefly: first, train a generator that maps a latent space to images, together with an encoder that maps an image back to its latent vector z, using an idea similar to CycleGAN. Next, encode the original image into a vector z, randomly sample many z' near z, generate pictures from these z' with the generator, and hand them to the target model for judgment. If any picture successfully misleads the model, report success.

The authors demonstrate the effectiveness of the method in both CV and NLP domains, and successfully attack Google Translate. Their code is open source on GitHub.

This is a widely applicable scheme, but in my opinion it may not be well suited to CTF. Training a GAN is time-consuming, laborious work that demands machine learning skill, which goes beyond the usual scope of a CTF. In addition, since the challenge author's model is a black box, even a contestant who is good at training models may struggle if their substitute model differs substantially from the author's.

All in all, adversarial examples are an interesting research area. I hope this walkthrough of the general steps for deceiving AI in CTF competitions is helpful to CTF players.