Machine learning models are susceptible to adversarial samples that cause misidentification, but what about humans? In this article, Ian Goodfellow of Google Brain and his collaborators use recent techniques to create the first adversarial samples that deceive humans. Their approach transfers adversarial samples built against computer vision models with known parameters and architecture to other models whose parameters and architecture are unavailable, and modifies the models to better match the initial processing of the human visual system. This article is adapted from the arXiv paper by Gamaleldin F. Elsayed and colleagues.


Introduction

Machine learning models are easily fooled by adversarial samples: inputs optimized by an adversary so that the model outputs an incorrect classification (Szegedy et al., 2013; Biggio et al., 2013). In computer vision, an adversarial sample is usually an image formed by applying a small perturbation to a sample image from a dataset. Many popular algorithms for building adversarial samples rely on the model's architecture and parameters to optimize the input by gradient descent. Without access to the brain's "architecture and parameters", however, these methods cannot build adversarial samples against humans.
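To make the gradient-based construction concrete, below is a minimal sketch of the fast gradient sign method (FGSM) of Goodfellow et al. (2014) in PyTorch. The `model`, `image`, and `label` names and the epsilon value are illustrative placeholders rather than the paper's exact setup; the point is only that the attack needs the model's gradients, which is exactly what is unavailable for the human brain.

```python
# Minimal FGSM sketch (Goodfellow et al., 2014). `model` is assumed to be a
# pretrained classifier, `image` a batched tensor in [0, 1], and `label` the
# true class index; epsilon controls the perturbation size.
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=8 / 255):
    """Return an adversarially perturbed copy of `image`."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then clip to the
    # valid pixel range so the result is still a displayable image.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```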

An interesting phenomenon is that adversarial samples can often be transferred from one model to another, making it possible to attack models whose architecture and parameters are unknown. This naturally raises the question of whether adversarial samples can also fool humans. Humans exhibit many cognitive biases and optical illusions, but these rarely resemble small perturbations of natural images, and they currently cannot be generated by optimizing a machine learning loss function. The prevailing view in the field has therefore been that adversarial sample transfer does not affect human visual perception, although researchers had not tested this hypothesis in a thorough empirical study.

Figure 1: While most adversarial samples fool humans only for a fraction of a second of viewing time, the sample described here has a larger, longer-lasting effect. On the left is an image of a cat; on the right is the same image after perturbation, which now looks like a dog.

These questions are worth careful study and create opportunities for machine learning and neuroscience to learn from each other. Neuroscience often provides an existence proof for machine learning: we work on object recognition algorithms on the assumption that they can be built, because the human brain can recognize objects (see Hassabis et al. (2017) on the impact of neuroscience on artificial intelligence). If we knew that the human brain could resist a certain class of adversarial samples, this would provide an existence proof for a similar mechanism in machine learning security. If we knew that the human brain could be fooled by adversarial samples, then machine learning security research should perhaps shift its focus from designing models that are robust to adversarial samples to designing systems that remain secure despite containing non-robust machine learning components. Likewise, if adversarial samples developed for computer vision also affect the human brain, this could help us better understand brain function.

Figure 2: Adversarial samples optimized over more models/viewpoints sometimes look more meaningful to humans. This observation suggests that model-to-human transfer may be possible.

(a) A typical adversarial sample image (Goodfellow et al., 2014). This attack has only a moderate, limited ability to fool other models or to survive geometric transformations, and the models it fools do not extend much beyond the one used to generate the adversarial image. (b) An adversarial attack that causes a cat image to be labeled as a computer and is robust to geometric transformations (Athalye, 2017). Unlike the attack in (a), this image contains features that look semantically more computer-like to humans. (c) An adversarial patch that causes images to be labeled as a toaster and induces misclassification from multiple viewpoints (Brown et al., 2017). As in (b), the patch contains features that look more toaster-like to humans. (d) In this paper, the researchers find a similar effect when adversarial sample images are built to fool multiple models rather than a single model viewed from multiple perspectives. Here the images correspond to a series of adversarial attacks that cause a cat to be identified as a dog. Top: from left to right, the attack targets larger and larger ensembles of models (clean image at far right). Above each image are the class predictions of the two test models. As the number of target models grows, the resulting image looks more dog-like to humans. Bottom: attacks against 10 models with increasing attack strength. Even at eps = 8, the image looks more dog-like to humans.

This study investigates whether adversarial samples that transfer strongly between multiple computer vision models also affect human visual perception. The researchers used three key ideas to test whether adversarial samples have an observable effect on the human visual system. First, they used a recent black-box technique to construct adversarial samples without access to the target model's architecture or parameters. Second, they adapted the machine learning models to mimic the initial visual processing of humans, making it more likely that adversarial samples would transfer from the models to a human observer. Third, they evaluated human observers' classifications in a time-limited setting, so that even subtle effects of the adversarial samples on human perception could be detected.
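As a rough illustration of the first of these ideas, the sketch below shows an iterative attack that maximizes the average classification loss over an ensemble of surrogate models, which is the standard way to encourage transfer to an unseen target. The function name, step size, and iteration count are assumptions for illustration; the paper's exact attack and hyperparameters may differ.

```python
# Hedged sketch of an ensemble-transfer attack: the perturbation is optimized
# against the average loss of several surrogate models so that it is more
# likely to fool a target model (or observer) that was never queried.
import torch
import torch.nn.functional as F

def ensemble_attack(models, image, label, epsilon=8 / 255,
                    step_size=1 / 255, num_steps=20):
    adv = image.clone().detach()
    for _ in range(num_steps):
        adv.requires_grad_(True)
        # Average the classification loss across all surrogate models.
        loss = sum(F.cross_entropy(m(adv), label) for m in models) / len(models)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + step_size * grad.sign()
            # Project back into the epsilon-ball around the clean image
            # and into the valid pixel range.
            adv = image + (adv - image).clamp(-epsilon, epsilon)
            adv = adv.clamp(0.0, 1.0)
    return adv.detach()
```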

The time limit matters because, with unlimited viewing time, humans achieve near-perfect accuracy on these classification tasks, and subtle shifts in perception might not produce a measurable change in accuracy. When an image is presented only briefly, however, humans cannot reach perfect accuracy even on clean images, so small shifts in perception translate into measurable changes in accuracy. In addition, a brief presentation limits the time the brain has to use recurrent and top-down processing pathways (Potter et al., 2014), which is believed to make processing in the brain more similar to a feedforward artificial neural network.

The researchers found that adversarial samples that transfer between multiple computer vision models do indeed influence the perception of human observers, revealing a new class of illusion that affects both computer vision models and human brains.


Adversarial samples

Goodfellow et al. (2017) define adversarial samples as "inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake". In visual object recognition, an adversarial sample is usually a natural image with a small added perturbation that breaks the prediction of a machine learning classifier. Figure 2a is a typical example: adding a small perturbation to a panda image causes the model to misclassify it as a gibbon. The perturbation is usually so small that it is imperceptible (it cannot even be stored in a standard 8-bit PNG file, because it is smaller than 1/255 of a pixel's dynamic range). The perturbation is not noise; it has a carefully chosen structure based on the neural network's parameters, yet even when scaled up to a perceptible level, human observers cannot recognize any meaningful structure in it. Note: adversarial samples also exist in domains such as malware detection (Grosse et al., 2017), but this article focuses on image classification tasks.

Two aspects of the adversarial sample definition are particularly important to this study:

1. Adversarial samples are designed to cause a mistake. They are not defined by disagreement with human judgment; if they were, it would by definition be impossible to create adversarial samples for humans. Some tasks have objectively correct answers, such as predicting whether an input number is prime. The researchers want the model to give the correct answer, not the answer a human would give (and under time pressure, humans may not be good at judging whether a number is prime). Defining what counts as a mistake in visual object recognition is harder: after a perturbation is added, the image may no longer correspond to a photograph of any real physical scene, and defining the true object class of an image is philosophically difficult because an image of an object is not the object itself. In this study, an adversarial image is considered misclassified when the output label differs from the label humans assign to the clean image (i.e., the starting point of the adversarial sample). The researchers keep the adversarial perturbations small and assume they are not large enough to change the true class.

2. Adversarial samples are not required to be imperceptible. If they were, it would by definition be impossible to make adversarial samples for humans, since changing a human's classification would mean changing what the human perceives. Moreover, in many domains imperceptible changes are impossible (in natural language processing, for example, changing even a single character is perceptible). Computer vision algorithms are often fooled by adversarial samples that humans cannot perceive, but this is not part of the general definition (see Figures 2b and 2c).


Model ensemble

The researchers constructed an ensemble of k CNN models (k = 10) trained on ImageNet. Each model is an instance of one of the following architectures: Inception V3, Inception V4, Inception ResNet V2, ResNet V2 50, ResNet V2 101, and ResNet V2 152 (Szegedy et al., 2015; 2016; He et al., 2016). To better match the initial processing of the human visual system, the researchers prepended a retinal layer to each model's input, which incorporates some of the transformations performed by the human eye. In this layer, they apply eccentricity-dependent blurring of the image to approximate the input that a human subject's visual cortex receives through the retinal lattice. Eccentricity-dependent spatial resolution measurements from Van Essen & Anderson (1995), based on the rhesus monkey visual system, together with the known viewing geometry between observer and screen, determine the degree of spatial blur at each image location, restricting the CNNs to information that is also accessible to the human visual system. This layer is fully differentiable, allowing gradients to propagate back through the network when running adversarial attacks. See Appendix B for model details.
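The following is a rough, simplified sketch of what such a differentiable retinal layer could look like in PyTorch. It blends each pixel toward a Gaussian-blurred copy of the image according to its distance from the fixation point (the image center), so acuity falls off with eccentricity. The class name, blur schedule, and kernel size are illustrative guesses and do not reproduce the paper's calibration to Van Essen & Anderson (1995).

```python
# Sketch of an eccentricity-dependent "retinal" preprocessing layer: the
# periphery of the image is pushed toward a blurred copy, while the center
# (fixation point) stays sharp. All constants here are illustrative.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

class RetinalBlur(nn.Module):
    def __init__(self, image_size=299, max_sigma=6.0):
        super().__init__()
        self.max_sigma = max_sigma
        # Per-pixel eccentricity: normalized distance from the image center.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, image_size),
            torch.linspace(-1, 1, image_size),
            indexing="ij",
        )
        ecc = torch.sqrt(xs ** 2 + ys ** 2).clamp(max=1.0)
        self.register_buffer("ecc", ecc[None, None])  # shape (1, 1, H, W)

    def forward(self, x):
        # Fully blurred copy; gaussian_blur is differentiable in x.
        blurred = TF.gaussian_blur(x, kernel_size=25, sigma=self.max_sigma)
        # Blend sharp and blurred images according to eccentricity,
        # so the periphery loses high-frequency detail.
        return (1.0 - self.ecc) * x + self.ecc * blurred
```

Such a layer would be prepended to each CNN, e.g. `nn.Sequential(RetinalBlur(), cnn)`, keeping the whole pipeline differentiable so the attack's gradients can flow back to the input image.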



Figure 3: Experimental setup and task. (a) Apparatus for running and recording the experiments. (b) Task structure and timing. Human observers were repeatedly asked to identify which of two classes a briefly presented image belonged to.

Figure 5: Examples of adversarial images. (a) Images of dogs that human observers in the time-limited setting often mistook for cats. (b) As in (a), a spider image recognized as a snake. Right: comparison of classification accuracy on the adversarial samples under short versus long presentation. (c) Examples of the types of adversarial attack manipulations.

Paper: Adversarial Examples that Fool both Human and Computer Vision



Link to paper: arxiv.org/abs/1802.08…

Abstract: Machine learning models are vulnerable to adversarial samples: small changes to an image can cause a computer vision model to make mistakes, such as identifying a school bus as an ostrich. But it remains unknown whether humans are immune to the same kinds of mistakes. Here, we create the first adversarial samples that deceive humans, using recent techniques that transfer adversarial samples from computer vision models with known parameters and architecture to other models whose parameters and architecture are unknown, and by modifying the models to better match the initial processing of the human visual system. We find that adversarial samples that transfer effectively between computer vision models influence the classifications made by human observers in a time-limited setting.