This article walks you through backdoor attacks on deep neural networks. The authors propose a robust and generalizable system for detecting and mitigating DNN backdoor attacks; this post is an in-depth reading that also covers adversarial examples alongside neural network backdoors.
This article is adapted from [Paper Reading] (02) SP2019 - Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. Author: Eastmount.
Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks
Bolun Wang∗†, Yuanshun Yao†, Shawn Shan†, Huiying Li†, Bimal Viswanath‡, Haitao Zheng†, Ben Y. Zhao†
∗UC Santa Barbara, †University of Chicago, ‡Virginia Tech
2019 IEEE Symposium on Security and Privacy (SP)
The lack of transparency in deep neural networks (DNNs) makes them vulnerable to backdoor attacks, where hidden associations or triggers override normal classification to produce unexpected results. For example, a model with a backdoor always identifies a face as Bill Gates if a specific symbol is present in the input. Backdoors can stay hidden indefinitely until activated by an input, and they pose a serious risk to many security- or safety-critical applications, such as biometric authentication systems or self-driving cars. This paper presents the first robust and generalizable system for detecting and mitigating DNN backdoor attacks. The technique identifies backdoors and reconstructs possible triggers, and it offers multiple mitigation measures via input filters, neuron pruning, and unlearning. The techniques are validated through extensive experiments on a variety of DNNs against the two types of backdoor injection methods identified in prior work, and they also prove robust against a number of backdoor attack variants.
I. Introduction
Deep neural networks (DNNs) play an integral role in a wide range of critical applications, from classification systems such as face and iris recognition, to voice interfaces for home assistants, to creating artistic images and guiding self-driving cars. In the security domain, deep neural networks have applications ranging from malware classification [1],[2] to binary reverse engineering [3],[4] and network intrusion detection [5].
- Face recognition
- Iris recognition
- Home assistant voice interface
- Autonomous driving
- Malware classification
- Reverse engineering
- Network Intrusion Detection
- …
Despite these surprising advances, a lack of interpretability is widely recognized as a key obstacle to wider acceptance and deployment of deep neural networks. In essence, a DNN is a numerical black box that does not lend itself to human understanding. Many consider the need for interpretability and transparency of neural networks one of the greatest challenges in computing today [6],[7]. Despite intense interest and collective efforts, only limited progress has been made in definitions [8], frameworks [9], visualization [10], and limited experimentation [11].
A fundamental problem with the black-box nature of deep neural networks is that their behavior cannot be thoroughly tested. For example, given a face recognition model, it is possible to verify that a set of test images are correctly recognized. But can untested images or unknown faces be correctly identified? Without transparency, there is no guarantee that the model will behave as expected with untested inputs.
Weaknesses of DNNs:
- Lack of interpretability
- Vulnerable to backdoor attacks
- The backdoor can remain hidden indefinitely until it is activated by some trigger in the input
In this context, deep neural networks [12],[13] may contain backdoors or “Trojans”. In short, backdoors are hidden patterns trained into a deep neural network model that produce unexpected behavior which cannot be detected unless activated by some “trigger” in the input. For example, a DNN-based face recognition system could be trained to recognize any face as “Bill Gates” whenever it detects a specific symbol in or near the face, or a sticker could turn any traffic sign into a green light. Backdoors can be inserted during training, for example by a “malicious” employee of the company responsible for training the model, or after the initial model training, for example when someone modifies and publishes an “improved” version of the model. When done well, these backdoors have little effect on the classification results of normal inputs, making them almost impossible to detect. Finally, previous work has shown that backdoors can be inserted into trained models and are effective across DNN applications, from face recognition, speech recognition, and age recognition to autonomous driving [13].
This paper describes our experiments and results in investigating and developing defenses against backdoor attacks in deep neural networks. Given a trained DNN model, the goal is to determine whether there is an input trigger that produces misclassified results when added to the input, what that trigger looks like, and how to mitigate it (remove it from the model); the rest of the paper refers to inputs carrying the trigger as adversarial inputs. This paper makes the following contributions to the defense against backdoors in neural networks:
- A novel, scalable technique is proposed for detecting and reverse engineering hidden triggers embedded in deep neural networks.
- The techniques presented in this paper are implemented and validated in a variety of neural network applications, including handwritten number recognition, traffic sign recognition, face recognition with a large number of tags, and face recognition using transfer learning. We reproduce backdoor attacks as described in previous work [12][13] and use them in our tests.
- In this paper, three mitigation methods are developed and validated through detailed experiments: i) an early filter for adversarial inputs, which identifies inputs carrying a known trigger; ii) a model-repair algorithm based on neuron pruning; and iii) a model-repair algorithm based on unlearning.
- More advanced variants of backdoor attacks are identified, their impact on the detection and mitigation techniques presented in this paper is evaluated experimentally, and optimizations to improve performance are proposed if necessary.
To our knowledge, this paper is the first work to develop robust and general techniques for detecting and mitigating backdoor attacks (Trojans) in DNNs. Extensive experiments show that the detection and mitigation tools presented here are very effective against different backdoor attacks (with and without access to training data), different DNN applications, and a number of complex attack variants. Although the interpretability of deep neural networks remains an elusive goal, we hope these techniques can help limit the risks of using opaque trained DNN models.
II. Background: Backdoor injection in DNNs
Deep neural networks are often described as black boxes because a trained model is a sequence of weights and functions that bears no intuitive connection to the classification function it embodies. Each model is trained to take inputs of a given type (such as face images, handwritten digit images, traces of network traffic, or blocks of text) and perform some computational inference to produce one of a set of predefined output labels, for example, a label representing the name of the person whose face is captured in the image.
Defining backdoors. In this setting, there are multiple ways to train hidden, unexpected classification behavior into a DNN. First, a bad actor with access to the DNN could insert an incorrect label association (for example, a picture of Obama’s face labeled as Bill Gates), either during training or on the trained model. We consider such attacks to be variants of known attacks (adversarial poisoning) rather than backdoor attacks.
A DNN backdoor is defined as a hidden pattern trained into a DNN that produces unexpected behavior if and only if a specific trigger is added to the input. Such a backdoor does not affect the model’s normal performance on clean inputs without the trigger. In the context of a classification task, the backdoor misclassifies arbitrary inputs into the same specific target label when the associated trigger is applied to the input; input samples that should be classified as any other label are “overridden” in the presence of the trigger. In the vision domain, a trigger is usually a specific pattern on the image (e.g., a sticker) that can misclassify images of other labels (e.g., wolves, birds, dolphins) into the target label (e.g., dogs).
Note that a backdoor attack differs from an adversarial attack against a DNN [14]. An adversarial attack produces misclassification through an image-specific modification; in other words, the modification is ineffective when applied to other images. Conversely, adding the same backdoor trigger causes arbitrary samples from different labels to be misclassified into the target label. In addition, a backdoor must be injected into the model, whereas an adversarial attack can succeed without modifying the model.
Supplementary background: adversarial examples
An adversarial example is an input sample perturbed so that a machine learning algorithm produces the wrong result. In image recognition, an image that a convolutional neural network (CNN) originally classifies as one class (such as “panda”) is suddenly misclassified as another class (such as “gibbon”) after changes so subtle that the human eye cannot detect them. For example, if a self-driving model is attacked, a Stop sign might be recognized by the car as “go straight” or “turn”.
Prior work on backdoor attacks. Gu et al. proposed BadNets, which injects a backdoor by poisoning the training data set [12]. Figure 1 shows a high-level overview of the attack. The attacker first selects a target label and a trigger pattern, which is a collection of pixels and associated color intensities; the pattern can be an arbitrary shape, such as a square. Next, a random subset of the training images is stamped with the trigger pattern and their labels are changed to the target label. The modified training data is then used to train the DNN, thus injecting the backdoor. Since the attacker has full access to the training process, the attacker can adjust the training configuration, such as the learning rate and the ratio of modified images, so that the backdoored DNN performs well on both clean and adversarial inputs. BadNets achieves an attack success rate of over 99% (the percentage of adversarial inputs misclassified into the target label) without affecting model performance on MNIST [12].
Liu et al. proposed a more recent method (the Trojan attack) [13]. They do not rely on access to the training set. Instead, they improve trigger generation by not using arbitrary triggers but designing the trigger to maximize the response of specific internal neurons of the DNN. This builds a stronger connection between the trigger and internal neurons, allowing an effective backdoor (> 98% attack success) to be injected with fewer training samples.
To our knowledge, [15] and [16] are the only evaluated defenses against backdoor attacks. Neither approach provides backdoor detection or identification; both assume the model is already known to be infected. Fine-Pruning [15] removes backdoors by pruning redundant neurons that are less useful for normal classification; when we applied it to one of our models (GTSRB), we found that it rapidly degrades the model’s performance. Liu et al. [16] proposed three defensive measures, but the approach incurs high complexity and computational cost and is evaluated only on MNIST. Finally, [13] offers some brief intuitions about detection, while [17] reports some ideas that proved ineffective.
So far, no universal detection and mitigation tool has proven effective against backdoor attacks. We have taken an important step in this direction by focusing on the task of categorization in the visual domain.
III. Overview of methods to deal with backdoor in this paper
Next, this paper gives the basic understanding of how to establish defense against DNN backdoor attack. The attack model is first defined, followed by the assumptions and objectives of this paper, and finally outlined the proposed techniques for identifying and mitigating backdoor attacks.
A. Attack model
Our attack model is consistent with that of existing attacks such as BadNets and the Trojan attack. The user obtains a trained DNN model that has already been infected with a backdoor; the backdoor is inserted either during training (e.g., by outsourcing the model training process to a malicious or compromised third party) or added after training by a third party and then downloaded by the user. The backdoored DNN performs well on most normal inputs but exhibits targeted misclassification when the input contains the attacker’s predefined trigger. Such a backdoored DNN produces the expected results on the test samples available to the user.
If the backdoor results in a targeted misclassification of the output label (class), that output label (class) is considered infected. One or more tags may be infected, but it is assumed that most tags remain unaffected. In essence, these backdoors prioritize stealth, and an attacker is unlikely to risk detection by embedding many backdoors in a single model. An attacker can also use one or more triggers to infect the same target tag.
B. Defensive assumptions and objectives
We make the following assumptions about the resources available to the defender. First, we assume the defender has access to the trained DNN, along with a set of correctly labeled samples, to test the performance of the model. The defender can also use computing resources, such as GPUs or GPU-based cloud services, to test or modify the DNN.
Objectives: Our defense efforts mainly consist of three specific objectives.
- Detecting backdoors: We want to make a binary decision about whether a given DNN has been infected by a backdoor. If infected, we also want to know which label the backdoor attack targets.
- Identifying backdoor: We want to identify the expected operations of the backdoor and, more specifically, Reverse Engineer the triggers used in the attack.
- Mitigating backdoors: Ultimately, we want to render the backdoor ineffective. Two complementary approaches can be used to achieve this. First, we build an active filter that detects and blocks any incoming adversarial input submitted by the attacker (see Section VI-A). Second, we want to “patch” the DNN to remove the backdoor without affecting its ability to classify normal inputs (see Sections VI-B and VI-C for details).
Considering viable alternatives: There are many viable alternatives to our approach, from the high level (why patch models at all) down to the specific techniques used for identification. Some of them are discussed here.
At the high level, we first consider alternatives to mitigation. Once a backdoor is detected, the user could choose to reject the DNN model and find another model or training service to train another model. However, this can be difficult in practice. First, finding a new training service is inherently difficult, given the resources and expertise required. For example, a user may be limited to a specific teacher model that the owner uses for transfer learning, or may have an unusual task that is not supported by other alternatives. Another scenario is that the user has access only to the infected model and validation data, but not the original training data. In this case, retraining is out of the question and mitigation is the only option.
At the detailed level, we consider several methods that search for “signatures” of backdoors, some of which were briefly suggested as potential defenses in prior work [17],[13]. These methods rely on a strong causal relationship between the backdoor and the chosen signal, and in the absence of analytical results in this area they have proved challenging. First, scanning inputs (such as input images) is difficult because the trigger can take an arbitrary shape and can be designed to evade detection (e.g., a small patch of pixels in a corner). Second, analyzing DNN internals to detect anomalies in intermediate states is notoriously hard; interpreting DNN predictions and activations of inner layers remains an open research challenge [18], and it is difficult to find a heuristic that generalizes across DNNs. Finally, the Trojan attack paper suggested looking at incorrect classification results, which might skew toward the infected label. This approach is problematic because backdoors can affect the classification of normal inputs in unexpected ways and may not show a consistent trend across DNNs; in fact, our experiments found that this approach fails to detect the backdoor in one of our infected models (GTSRB).
C. Defense ideas and overview
Next, we describe a high-level approach to detecting and identifying backdoors in DNN.
Key idea. The intuition behind our technique comes from the basic property of a backdoor trigger: it produces a classification into the target label A regardless of which label the normal input belongs to. Think of the classification problem as creating partitions in a multidimensional space, where each dimension captures some features. The backdoor trigger then creates a “shortcut” from within regions of the space belonging to other labels into the region belonging to A.
Figure 2 illustrates this concept with an abstraction. It presents a simplified one-dimensional classification problem with three labels (label A for circles, label B for triangles, and label C for squares). The figure shows the positions of their samples in the input space and the decision boundaries of the model. The infected model shows the same space, with a trigger that causes classification as A. The trigger effectively produces another dimension in the regions belonging to B and C: any input containing the trigger has a higher value in the trigger dimension (the gray circles in the infected model) and is classified as A, whereas its other features would have led to classification as B or C.
The basic property of a backdoor trigger is that it produces a classification into the target label A regardless of which label the normal input belongs to. Key intuition: think of the classification problem as creating partitions in a multidimensional space, with each dimension capturing some features. The backdoor trigger then creates a “shortcut” from within regions of the space belonging to other labels into the region belonging to A.
Intuitively, we detect these shortcuts by measuring the minimum amount of perturbation required to move all inputs from each region to the target region. In other words, what is the smallest change needed to convert any input labeled B or C into an input classified as A? In a region with a trigger shortcut, no matter where the input lies in the space, the amount of perturbation needed to classify it as A is bounded by the size of the trigger (and the trigger itself should be fairly small to avoid detection). The infected model in Figure 2 shows a new boundary along the “trigger dimension”, so that any input in B or C can be moved a short distance and thereby misclassified as A. This leads to the following observation about backdoor triggers.
Observation 1: Let $L$ denote the set of output labels in the DNN model. Consider a label $L_i \in L$ and a target label $L_t \in L$, with $i \neq t$. If there is a trigger $T_t$ that induces misclassification into $L_t$, then the minimum perturbation needed to transform all inputs of $L_i$ (whose correct label is $L_i$) so that they are classified as $L_t$ is bounded by the size of the trigger:

$$\delta_{i \rightarrow t} \leq |T_t|$$

Since the trigger is effective when added to any input, a fully trained trigger effectively adds this extra trigger dimension to all inputs of the model regardless of their true label, so we have:

$$\delta_{\forall \rightarrow t} \leq |T_t|$$

where $\delta_{\forall \rightarrow t}$ represents the minimum amount of perturbation required to make any input be classified as $L_t$. To evade detection, this perturbation should be small; in particular, it should be significantly smaller than the perturbation required to transform any input into an uninfected label.

Observation 2: If a backdoor trigger $T_t$ exists, then

$$\delta_{\forall \rightarrow t} \leq |T_t| \ll \min_{i,\, i \neq t} \delta_{\forall \rightarrow i}$$
Therefore, trigger Tt can be detected by detecting abnormally low values of δ in all output labels. We note that under-trained triggers may not effectively affect all output tags. It is also possible that an attacker intentionally restricts backdoor triggers to only certain types of inputs (perhaps as a countermeasure against detection). With this in mind, a solution is provided in Section 7.
Detecting backdoors. The main intuition for detecting backdoors is that, in an infected model, much smaller modifications are needed to cause misclassification into the target label than into other, uninfected labels (see Observation 2). Therefore, we iterate over all labels of the model and determine whether any label requires a significantly smaller modification to achieve misclassification. The whole system consists of the following three steps.
- Step 1: For a given tag, we treat it as a potential target tag for a target backdoor attack. This paper designs an optimization scheme to find the “minimum” triggers needed to classify errors from other samples. In the visual domain, this trigger defines the smallest set of pixels and their associated color intensity, resulting in misclassification.
- Step 2: Repeat Step 1 for each output label in the model. For a model with N = |L| labels, this produces N potential “triggers”.
- Step 3: After computing the N potential triggers, we measure the size of each trigger by the number of pixels it replaces in the image. We then run an outlier detection algorithm to check whether any candidate trigger is significantly smaller than the other candidates. A significant outlier represents a real trigger, and its associated label is the target label of the backdoor attack. (A code sketch of this three-step loop follows the list.)
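As a rough illustration only, the following Python sketch wires the three steps together; it is not the authors' code. `reverse_engineer_trigger` and `anomaly_index` are placeholder helpers standing in for the per-label optimization of Section IV and the MAD-based outlier test, and sketches of both appear later in that section.

```python
import numpy as np

def detect_backdoor(model, clean_images, labels, threshold=2.0):
    """Three-step detection loop: optimize a candidate trigger per label,
    then flag labels whose trigger is an abnormally small outlier."""
    l1_norms, triggers = [], {}
    for target in labels:                           # Steps 1-2: one optimization per label
        mask, pattern = reverse_engineer_trigger(model, clean_images, target)
        triggers[target] = (mask, pattern)
        l1_norms.append(float(np.abs(mask).sum()))  # trigger size = L1 norm of its mask
    scores = anomaly_index(np.array(l1_norms))      # Step 3: MAD-based anomaly index
    median = np.median(l1_norms)
    infected = [lab for lab, s, n in zip(labels, scores, l1_norms)
                if s > threshold and n < median]    # only the small-L1 (low) end counts
    return infected, triggers
```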
Identifying backdoor triggers. The three steps above tell us whether there is a backdoor in the model and, if so, the attack’s target label. Step 1 also produces the trigger responsible for the backdoor, which effectively misclassifies samples of other labels into the target label. We consider this trigger to be the “reverse-engineered trigger” (reversed trigger for short). Note that our approach looks for the minimal trigger needed to induce the backdoor, which may actually be slightly smaller than, and look slightly different from, the trigger the attacker trained into the model. We compare the visual similarity between the two in Section V-C.
Mitigating backdoors. The reverse-engineered trigger helps us understand how the backdoor misclassifies samples inside the model, for example, which neurons are activated by the trigger. We use this knowledge to build an active filter that detects and filters out adversarial inputs that activate backdoor-related neurons. We also design two methods that remove backdoor-related neurons/weights from the infected model and patch it so that it is robust against adversarial images. Detailed mitigation methods and the related experimental results are discussed in Section VI.
IV. Detailed detection methods
The technical details of detecting and reverse engineering triggers are described next. We begin with the trigger reverse-engineering process, which is used in the first step of detection to find the minimal trigger for each label.
Reverse engineering triggers.
First, we define a generic form of trigger injection:

$$A(x, m, \Delta) = x'$$
$$x'_{i,j,c} = (1 - m_{i,j}) \cdot x_{i,j,c} + m_{i,j} \cdot \Delta_{i,j,c}$$

$A(\cdot)$ represents the function that applies the trigger to the original image $x$. $\Delta$ is the trigger pattern, a 3D matrix of pixel intensities (height, width, and color channel) with the same dimensions as the input image. $m$ is a 2D mask matrix that determines how much of the original image the trigger overwrites. The mask is two-dimensional (height, width), so the same mask value is applied to all color channels of a pixel. Values in the mask range from 0 to 1: when $m_{i,j} = 1$ for a particular pixel $(i,j)$, the trigger completely overwrites the original color ($x'_{i,j,c} = \Delta_{i,j,c}$); when $m_{i,j} = 0$, the original color is left unmodified ($x'_{i,j,c} = x_{i,j,c}$). Previous attacks used only binary mask values (0 or 1), so they also fit this generic form. This continuous form of the mask makes it differentiable and easier to integrate into the optimization objective.
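For concreteness, here is a minimal NumPy sketch of the injection function $A(x, m, \Delta)$ defined above; the function and argument names are my own.

```python
import numpy as np

def apply_trigger(x, mask, pattern):
    """A(x, m, delta): blend the trigger pattern into image x.

    x:       H x W x C original image
    pattern: H x W x C trigger pattern (delta)
    mask:    H x W matrix with values in [0, 1]; 1 means the pixel is fully
             replaced by the trigger, 0 means the original pixel is kept.
    """
    m = mask[..., None]                # broadcast the 2D mask over color channels
    return (1.0 - m) * x + m * pattern
```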
The optimization has two objectives. For the target label ($y_t$) under analysis, the first objective is to find a trigger $(m, \Delta)$ that misclassifies clean images into $y_t$. The second objective is to find a “concise” trigger, i.e., one that modifies only a limited portion of the image. We measure trigger size by the L1 norm of the mask $m$, and formulate a multi-objective optimization by minimizing the weighted sum of the two objectives:

$$\min_{m,\, \Delta} \; \ell\big(y_t,\, f(A(x, m, \Delta))\big) + \lambda \cdot |m| \qquad \text{for } x \in X$$

$f(\cdot)$ is the DNN’s prediction function; $\ell(\cdot)$ is the loss function measuring classification error, which is cross-entropy in our experiments. $\lambda$ is the weight of the second objective: a smaller $\lambda$ puts less weight on controlling the trigger size but yields a higher misclassification success rate. In our experiments, the optimization adjusts $\lambda$ dynamically to ensure that more than 99% of clean images are successfully misclassified. We use the Adam optimizer [19] to solve this optimization problem.
$X$ is the set of clean images used to solve the optimization task; it comes from the clean data the user has access to. In our experiments we use the training set and feed it into the optimization until convergence. Alternatively, the user can sample a small portion of the test set.
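The following PyTorch sketch shows what this per-label optimization could look like. It is a simplification under assumed conditions: $\lambda$ is fixed rather than dynamically adjusted, the input shape and a [0, 1] pixel range are assumptions, and `model` and `loader` are hypothetical stand-ins for the classifier and a clean-image data loader.

```python
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, loader, target_label, steps=1000,
                             lam=1e-3, input_shape=(3, 32, 32)):
    """Sketch of the trigger reverse-engineering optimization for one label."""
    c, h, w = input_shape
    mask_logit = torch.zeros(h, w, requires_grad=True)   # unconstrained mask variable
    pattern = torch.rand(c, h, w, requires_grad=True)    # trigger pattern (delta)
    opt = torch.optim.Adam([mask_logit, pattern], lr=0.1)
    for _, (x, _) in zip(range(steps), loader):          # batches of clean images
        m = torch.sigmoid(mask_logit)                    # keep mask values in (0, 1)
        x_adv = (1 - m) * x + m * pattern                # A(x, m, delta)
        target = torch.full((x.size(0),), target_label, dtype=torch.long)
        loss = F.cross_entropy(model(x_adv), target) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_logit).detach(), pattern.detach()
```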
Detecting backdoors via outlier detection.
Using the optimization method above, we obtain the reverse-engineered trigger and its L1 norm for each target label. We then identify the triggers (and associated labels) that show up in the distribution as outliers with abnormally small L1 norms. This corresponds to Step 3 of the detection process.
To detect outliers, we use a technique based on the median absolute deviation, which is resilient in the presence of multiple outliers [20]. It first calculates the absolute deviation between each data point and the median; the median of these absolute deviations is called the MAD and provides a reliable measure of the dispersion of the distribution. The anomaly index of a data point is then defined as its absolute deviation from the median divided by the MAD. When the underlying distribution is assumed to be normal, a constant estimator (1.4826) is applied to normalize the anomaly index. Any data point with an anomaly index greater than 2 has a greater than 95% probability of being an outlier. We mark any label with an anomaly index greater than 2 as an outlier and infected, and we focus only on outliers at the low end of the distribution (labels with small L1 norms are the vulnerable ones).
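A compact NumPy sketch of this MAD-based anomaly index follows (my own helper, matching the description above); only labels whose L1 norm lies below the median would then be treated as infected.

```python
import numpy as np

def anomaly_index(l1_norms):
    """MAD-based anomaly index of each label's trigger L1 norm.

    1.4826 is the consistency constant under an assumed normal distribution;
    an index above 2 flags the label as an outlier (potentially infected)."""
    l1_norms = np.asarray(l1_norms, dtype=float)
    median = np.median(l1_norms)
    mad = 1.4826 * np.median(np.abs(l1_norms - median))
    return np.abs(l1_norms - median) / mad
```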
Detecting backdoors in models with a large number of labels.
For DNNs with a large number of labels, detection can incur a high computational cost proportional to the number of labels. For example, for the YouTube Face recognition model with 1,283 labels [22], our detection method takes an average of 14.6 seconds per label, for a total cost of about 5.2 hours on an Nvidia Titan X GPU. This time can be reduced by a constant factor by parallelizing across multiple GPUs, but the overall computation remains a burden for resource-constrained users.
Instead, we propose a low-cost detection scheme for large models. We observe that the optimization process (Section IV) finds an approximate solution within the first few gradient-descent iterations and uses the remaining iterations to fine-tune the trigger. We therefore terminate the optimization early to narrow the candidates down to a small set of potentially infected labels. We then concentrate resources on fully optimizing these suspicious labels, and also fully optimize a small random set of labels to estimate the MAD value (the dispersion of the L1 norm distribution). This modification greatly reduces the number of labels that need full analysis (most labels are ignored) and thus greatly reduces computation time.
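A sketch of the early-termination test described above is given below; the iteration-tracking representation and all parameter names are assumptions of mine. The per-label optimizations stop once the set of the 100 labels with the smallest current L1 norms stays largely stable (overlap above 50) for 10 consecutive iterations.

```python
import numpy as np

def should_terminate(l1_history, top_k=100, overlap_min=50, patience=10):
    """l1_history: one array per optimization iteration, holding the current
    L1 norm of every label's candidate trigger. Returns True once the top-k
    sets (smallest L1 norms) of consecutive iterations overlap by more than
    `overlap_min` labels for `patience` consecutive iterations."""
    if len(l1_history) < patience + 1:
        return False
    def top_set(norms):
        return set(np.argsort(norms)[:top_k])          # smallest L1 norms first
    recent = l1_history[-(patience + 1):]
    overlaps = [len(top_set(a) & top_set(b))
                for a, b in zip(recent[:-1], recent[1:])]
    return all(o > overlap_min for o in overlaps)
```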
V. Experimental verification of backdoor detection and trigger recognition
In this section, we evaluate our defense techniques against BadNets and the Trojan attack across multiple classification application domains.
A. Experimental setup
For BadNets evaluation, this paper uses four experimental tasks and injects backdoors into their data sets, including:
- (1) Handwritten digit Recognition (MNIST)
- (2) Traffic Sign Recognition (GTSRB)
- (3) Face recognition with a large number of labels (YouTube Face)
- (4) Face recognition based on Complex Model (PubFig)
For the Trojan attack evaluation, we use the two infected face recognition models used in the original work and shared by its authors, namely:
- Trojan Square
- Trojan Watermark
Details of each task and associated data set are described below. Table I includes a short summary. To be more concise, we have included more detailed information about training configurations in Appendix Table VI, and detailed their model architectures in Tables VII, VIII, IX, and X.
- Handwritten digit recognition (MNIST). This task is commonly used to evaluate DNN vulnerability. The goal is to recognize the 10 handwritten digits (0-9) in grayscale images [23]. The dataset contains 60K training images and 10K test images. The model used is a standard 4-layer convolutional neural network (see Table VII). This model was also evaluated in the BadNets work.
- Traffic Sign Identification (GTSRB) This task is also commonly used to assess DNN attacks. Its task is to identify 43 different traffic signs to simulate the application scenarios of autonomous vehicles. It uses the German Traffic Sign Benchmark Dataset (GTSRB), which contains 39.2K color training images and 12.6K test images [24]. The model consists of six convolution layers and two fully connected layers (see Table VIII).
- Face recognition with a large number of labels (YouTube Face). This task simulates a security screening scenario via face recognition, in which the model tries to recognize the faces of 1,283 different people. The large label set increases the computational complexity of the detection scheme, making this a good candidate for evaluating the low-cost detection method. It uses the YouTube Face dataset, which contains images extracted from YouTube videos of different people [22]. We applied the preprocessing used in previous work to obtain a dataset containing 1,283 labels, 375.6K training images, and 64.2K test images [17]. Following prior work, we also select the 8-layer DeepID architecture [17][25].
- Face recognition based on a complex model (PubFig). This task is similar to YouTube Face and recognizes the faces of 65 people. The dataset used includes 5,850 color training images with a resolution of 224×224 and 650 test images [26]. The limited amount of training data makes it difficult to train a model for such a complex task from scratch. Therefore, we use transfer learning: a 16-layer VGG teacher model (Table X) is fine-tuned on our training set, updating only its last 4 layers. This task helps evaluate BadNets attacks on a large, complex (16-layer) model.
- Trojan Square and Trojan Watermark, the two models attacked by the Trojan attack, are derived from the VGG-Face model (16 layers), which is trained to recognize the faces of 2,622 people [27],[28]. Similar to YouTube Face, these models also call for the low-cost detection scheme because of their large number of labels. Note that the two models are identical in their uninfected state but differ in the injected backdoor (discussed below). The original dataset contains 2.6 million images. Since the authors did not specify the exact training/test split, we randomly selected a subset of 10K images as the test set for the following experiments.
BadNets attack configuration. We follow the attack methodology proposed by BadNets for injecting a backdoor during training [12]. For each application domain we test, a target label is randomly chosen and the training data is modified by injecting a portion of adversarial inputs labeled as the target label. Adversarial inputs are generated by applying the trigger to clean images. For a given task and dataset, we vary the proportion of adversarial inputs in training so that the attack success rate exceeds 95% while maintaining high classification accuracy; the proportion ranges from 10% to 20%. The modified training data is then used to train the DNN model until convergence.
Triggers are white squares located in the bottom-right corner of the image, positioned so that they do not cover any important part of the image, such as faces or logos. The shape and color of the trigger are chosen to ensure it is unique and does not occur naturally in any input image. To keep triggers unobtrusive, we limit their size to roughly 1% of the entire image, i.e., 4×4 for MNIST and GTSRB, 5×5 for YouTube Face, and 24×24 for PubFig. Examples of triggers and adversarial images are shown in the appendix (Figure 20).
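A hedged NumPy sketch of this kind of training-set poisoning is shown below. It assumes images stored as an N x H x W x C array with pixel values in [0, 1]; the function name, the default poison ratio, and the trigger size are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def poison_dataset(images, labels, target_label, trigger_size=4,
                   poison_ratio=0.1, rng=np.random.default_rng(0)):
    """BadNets-style poisoning: stamp a white square into the bottom-right
    corner of a random subset of training images and relabel them as the
    attacker's target label."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_ratio * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -trigger_size:, -trigger_size:, :] = 1.0   # white square trigger
    labels[idx] = target_label                             # point them at the target
    return images, labels
```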
To measure the performance of backdoor injection, we calculate the classification accuracy on the test data as well as the attack success rate when the trigger is applied to test images. “Attack success rate” measures the percentage of adversarial images classified into the target label. As a benchmark, we also measure the classification accuracy of a clean version of each model (i.e., trained with the same configuration on a clean dataset). Table II reports the final performance of each attack on the four tasks. All backdoor attacks achieve a success rate above 97% with little effect on classification accuracy; the largest drop in classification accuracy is 2.62%, on PubFig.
Trojan attack configuration. For the Trojan attack, we directly use the infected Trojan Square and Trojan Watermark models shared by the authors of [13]. The trigger used in Trojan Square is a square in the bottom-right corner, 7% of the size of the entire image. Trojan Watermark uses a trigger composed of text and a symbol, resembling a watermark, also 7% of the size of the entire image. The two backdoors achieve attack success rates of 99.9% and 97.6%, respectively.
B. Detection performance
We first check whether an infected DNN can be identified, following the method described in Section IV. Figure 3 shows the anomaly index for all six infected models and their matching original clean models, covering both BadNets and the Trojan attack. All infected models have an anomaly index greater than 3, indicating a probability of infection greater than 99.7% (recall that the threshold for flagging infection was previously set at an anomaly index of 2, Section IV). Meanwhile, all clean models have an anomaly index below 2, meaning the outlier detection method correctly marks them as clean.
To see where infected labels sit in the L1 norm distribution, Figure 4 plots the distributions for uninfected and infected labels. For the distribution of uninfected labels, the minimum and maximum L1 norms, the 25/75 quartiles, and the median are plotted. Note that only one label is infected, so a single L1 norm data point represents the infected label. Compared with the “distribution” of uninfected labels, the infected label is always far below the median and much smaller than the smallest uninfected label. This further validates our intuition that the L1 norm of the trigger needed to attack an infected label is smaller than that needed to attack uninfected labels.
Finally, our method can also determine which labels are infected. Simply put, any label with an anomaly index greater than 2 is flagged as infected. In most models, namely MNIST, GTSRB, PubFig, and Trojan Watermark, the infected label, and only the infected label, is flagged as adversarial, without any false positives. On YouTube Face and Trojan Square, however, in addition to the infected label, 23 and 1 uninfected labels, respectively, are incorrectly flagged as adversarial. This is not actually a problematic situation. First, these false-positive labels are flagged because they are more vulnerable than other labels, and this information is useful to the model user. Second, in later experiments (Section VI-C), our mitigation technique patches all vulnerable labels without affecting the model’s classification performance.
Low-cost detection performance. Figures 3 and 4 already show results using the low-cost detection scheme for Trojan Square, Trojan Watermark, and the clean VGG-Face model (each with 2,622 labels). To better quantify the computational savings and detection performance of the low-cost method, we evaluate it on YouTube Face as an example.
We first describe the low-cost detection setup for YouTube Face in more detail. To identify a small set of likely infected candidates, we track the top 100 labels in each iteration, ranked by their current L1 norm (i.e., labels with a smaller L1 norm get a higher rank). The red curve in Figure 5 measures the overlap between the top-100 sets of consecutive iterations, showing how the top 100 labels change across iterations. After the first 10 iterations the set overlap is largely stable, fluctuating around 80. This means the top 100 labels can be selected by running the full optimization for only a few iterations, ignoring the remaining labels. To be more conservative, we terminate when the overlap stays above 50 labels for 10 consecutive iterations. How accurate is this early-termination scheme? Like the full-cost scheme, it correctly flags the infected label and yields nine false positives. The black curve in Figure 5 tracks the rank of the infected label over the iterations; its rank stabilizes after about 12 iterations, close to our termination point of roughly 10 iterations. In addition, the anomaly indices of the low-cost and full-cost schemes are very similar, at 3.92 and 3.91, respectively.
This method greatly reduces computation time: early termination takes 35 minutes. After termination, we run the full optimization for the top 100 labels, plus another random sample of 100 labels to estimate the L1 norm distribution of uninfected labels. This takes another 44 minutes, so the entire process takes 1.3 hours, a 75% reduction compared with the full scheme.
C. Identifying original triggers
When our method identifies an infected label, it also reverse-engineers a trigger that causes misclassification into that label. A natural question is whether the reverse-engineered trigger “matches” the original trigger, i.e., the one used by the attacker. If there is a strong match, the reverse-engineered trigger can be used to design effective mitigation.
This article compares the two triggers in three ways.
- End-to-end effectiveness. Similar to the original trigger, the reversed trigger leads to a high attack success rate, in fact higher than the original trigger: all reversed triggers have an attack success rate above 97.5%, compared with above 97.0% for the original triggers. This is not surprising, given that the trigger is inferred using a scheme that optimizes for misclassification (Section IV); our detection method effectively identifies the minimal trigger that produces the same misclassification results.
- Visual similarity. Figure 6 compares the original and reversed triggers (m · Δ) in the four BadNets models. We find that the reversed trigger is roughly similar to the original trigger: in all cases it appears at the same position as the original trigger. However, small differences remain between the reversed and original triggers. For example, in MNIST and PubFig, the reversed trigger is slightly smaller than the original one and is missing a few pixels; in models that use color images, the reversed trigger has many non-white pixels. These differences can be attributed to two causes. First, when the model is trained to recognize the trigger, it may not learn its exact shape and color, meaning that the most “effective” way to trigger the backdoor in the model is not the originally injected trigger but a slightly different form. Second, our optimization objective penalizes larger triggers, so some redundant pixels of the trigger are pruned away during optimization, resulting in a smaller trigger. Combined, the optimization process finds a backdoor trigger that is more “compact” than the original.
In the two Trojan attack models, the mismatch between the reversed trigger and the original trigger is more pronounced, as shown in Figure 7. In both cases the reversed trigger appears at different places in the image and looks visually different. It is at least an order of magnitude smaller than the original trigger and much more compact than in the BadNets models. It turns out that our optimization scheme found a much more compact trigger in pixel space that can exploit the same backdoor to achieve a similar end-to-end effect. This also highlights the difference between the Trojan attack and BadNets: because the Trojan attack targets specific neurons to connect the input trigger to the misclassified output, it cannot avoid side effects on other neurons. The result is a broader attack that can be activated by a wider range of triggers, of which the reverse-engineered trigger is the most compact.
- Similarity of neuron activations. We further investigate whether inputs carrying the reversed trigger and the original trigger have similar neuron activations in the inner layers. Specifically, we examine neurons in the second-to-last layer, since this layer encodes the relevant representative patterns of the input. We identify the neurons most relevant to the backdoor by feeding clean and adversarial images and observing the differences in neuron activation at the target layer (the second-to-last layer), ranking neurons by the difference in their activation. Empirically, we find that the top 1% of neurons are sufficient to enable the backdoor; in other words, if we keep the top 1% of neurons and mask the rest (set them to zero), the attack still works.
We consider neuron activations to be “similar” if the top 1% of neurons activated by the original trigger are also activated by the reverse-engineered trigger, but not by clean inputs. Table III shows the average activation of the top 1% of neurons over 1,000 randomly selected clean and adversarial images. In all cases, neuron activation on adversarial images is between three and seven times higher than on clean images. These experiments show that, when added to inputs, both the reversed and the original trigger activate the same backdoor-related neurons. We use neuron activations as a basis for our backdoor-mitigation techniques in Section VI.
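As an illustration of this ranking (not the authors' code), the following PyTorch sketch picks out the top 1% of second-to-last-layer neurons by activation gap; `penultimate` is a hypothetical hook that returns that layer's activations as a (batch, neurons) tensor.

```python
import torch

def top_backdoor_neurons(penultimate, clean_x, adv_x, top_frac=0.01):
    """Rank neurons by (adversarial - clean) average activation and return
    the indices of the top `top_frac` fraction (the backdoor-related ones)."""
    with torch.no_grad():
        clean_act = penultimate(clean_x).mean(dim=0)   # average over clean images
        adv_act = penultimate(adv_x).mean(dim=0)       # average over triggered images
    gap = adv_act - clean_act
    k = max(1, int(top_frac * gap.numel()))
    return torch.topk(gap, k).indices
```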
VI. Mitigating backdoors
Once a backdoor is detected, mitigation techniques should be applied to remove it while preserving the model’s performance. We describe two complementary techniques. First, we create a filter for adversarial inputs that recognizes and rejects any input containing the trigger while the model is being patched; depending on the application, this approach can also be used to assign a “safe” output label to adversarial inputs instead of rejecting them. Second, we patch the DNN so that it no longer responds to the detected backdoor trigger. We describe two repair methods, one based on neuron pruning and the other on unlearning.
A. Filter for detecting adversarial inputs
Experimental results in Section V-C show that neuron activations are a better way to capture the similarity between original and reverse-engineered triggers. We therefore build the filter on the neuron activation profile of the reversed trigger, measured as the average activation of the top 1% of neurons in the second-to-last layer. Given an input, the filter flags it as a potential adversarial input if its activation profile is higher than a certain threshold. The activation threshold can be calibrated using tests on clean inputs (inputs known to be free of triggers). Using clean images from the test set, we evaluate the filter by creating adversarial images that apply the original trigger to test images (mixed at a 1:1 ratio). We compute the false positive rate (FPR) and false negative rate (FNR) obtained at different thresholds on the average neuron activation; the results are shown in Figure 8. At an FPR of 5%, the four BadNets models achieve high filtering performance, with FNR values all below 1.63%. The Trojan attack models are harder to filter, likely because of the differences in neuron activation between the reversed and original triggers: the FNR is much higher for FPRs below 5%, reaching 4.3% and 28.5% at an FPR of 5%. Once again, we observe the consequence of the different injection methods chosen by the Trojan attack and BadNets.
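The sketch below shows one way such a filter could be set up, reusing the hypothetical `penultimate` hook and the neuron indices from the earlier sketch; the calibration on clean inputs simply picks the activation threshold that yields the desired FPR.

```python
import torch

def calibrate_threshold(penultimate, clean_x, neuron_idx, fpr=0.05):
    """Choose the threshold so that roughly `fpr` of clean inputs are flagged."""
    with torch.no_grad():
        scores = penultimate(clean_x)[:, neuron_idx].mean(dim=1)
    return torch.quantile(scores, 1.0 - fpr)

def is_adversarial(penultimate, x, neuron_idx, threshold):
    """Flag inputs whose average activation on backdoor neurons is too high."""
    with torch.no_grad():
        scores = penultimate(x)[:, neuron_idx].mean(dim=1)
    return scores > threshold
```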
B. Patching DNNs via neuron pruning
We propose two techniques for actually repairing the infected model. In the first approach, we use the reversed trigger to help identify and remove backdoor-related components of the DNN, e.g., neurons. We propose pruning backdoor-related neurons out of the DNN, i.e., setting their output to 0 during inference. We rank the target neurons by the difference in their activation between clean inputs and adversarial inputs built with the reversed trigger. Again targeting the second-to-last layer, we prune neurons in order of highest rank first, i.e., prioritizing those that show the largest activation gap between clean and adversarial inputs. To minimize the impact on the classification accuracy of clean inputs, we stop pruning once the pruned model no longer responds to the reversed trigger.
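A minimal sketch of this pruning loop is shown below, assuming a PyTorch model. `eval_attack` is a hypothetical callback returning the current attack success rate of the reversed trigger, and the forward hook simply zeroes the outputs of the "pruned" neurons of the chosen layer.

```python
import torch

def prune_neurons(model, layer, neuron_ranking, eval_attack, succ_threshold=0.01):
    """Zero out neurons (highest activation gap first) until the reversed
    trigger's attack success rate drops below `succ_threshold`.

    `neuron_ranking` lists every neuron index of `layer`, ordered by its
    clean-vs-adversarial activation gap, largest gap first."""
    mask = torch.ones(len(neuron_ranking))
    # The hook stays registered, so the pruning persists for later inference.
    layer.register_forward_hook(lambda mod, inp, out: out * mask)
    for idx in neuron_ranking:
        mask[idx] = 0.0                  # "prune": zero this neuron's output
        if eval_attack() < succ_threshold:
            break
    return mask
```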
Figure 9 shows the classification accuracy and attack success rate when pruning different proportions of neurons in GTSRB. Pruning 30% of the neurons reduces the attack success rate to nearly 0%. Note that the attack success rate of the reversed trigger follows a trend similar to that of the original trigger, so it serves as a good proxy for how well the defense works against the original trigger. Meanwhile, classification accuracy drops by only 5.06%. As Figure 9 also shows, the defender can trade off between attack success rate and classification accuracy.
Recall from Section V-C that the top 1% of neurons is sufficient to cause misclassification. In this case, however, we have to remove nearly 30% of the neurons to effectively mitigate the attack. This can be explained by the large amount of redundancy in neural pathways in DNNs [29]: even if the top 1% of neurons are removed, other, lower-ranked neurons can still enable the backdoor. Such high redundancy has also been noted in prior work on DNN compression [29].
Applying our scheme to the other BadNets models yields very similar results on MNIST and PubFig, as shown in Figure 21: pruning 10% to 30% of the neurons reduces the attack success rate to 0%. However, we observe a larger negative impact on classification accuracy for YouTube Face, also shown in Figure 21: when the attack success rate drops to 1.6%, classification accuracy falls from 97.55% to 81.4%. This is because the second-to-last layer has only 160 output neurons, which means clean neurons and adversarial neurons are mixed together, so clean neurons get pruned in the process and classification accuracy suffers. We therefore experiment with pruning at multiple layers and find that pruning at the last convolutional layer produces the best results: in all four BadNets models, the attack success rate is reduced to below 1% with a minimal drop in classification accuracy (less than 0.8%), while at most 8% of the neurons are pruned. Figure 22 in the appendix plots these detailed results.
Neuron pruning in the Trojan attack models. We apply the same pruning methodology and configuration to the Trojan attack models, but pruning works poorly there. As Figure 10 shows, when 30% of the neurons are pruned, the attack success rate of the reverse-engineered trigger drops to 10.1%, but the success rate of the original trigger remains high at 87.3%. This discrepancy is due to the difference in neuron activations between the reversed and original triggers: if neuron activation is a poor way of matching the reverse-engineered trigger to the original trigger, pruning performs poorly against attacks using the original trigger. The unlearning experiments on the Trojan attack models, described in the next subsection, work much better.
Advantages and limitations. An obvious advantage is that this method requires very little computation, most of which is running inference on clean and adversarial images. However, its performance depends on choosing the right layer to prune, which requires experimenting with multiple layers. It also depends heavily on how well the neuron activations of the reverse-engineered trigger match those of the original trigger.
C. Patching DNNs via unlearning
Our second mitigation approach trains the DNN to unlearn the original trigger. We can use the reversed trigger to train the infected DNN to recognize the correct label even when the trigger is present. Compared with neuron pruning, unlearning lets the model decide, through training, which weights (not neurons) are problematic and should be updated.
For all models, including the Trojan attack models, we fine-tune the model for only one epoch using an updated training data set. To create this new training set, we take a 10% sample of the original training data (clean, with no triggers) and add the reversed trigger to 20% of this sample without modifying the labels. To measure the effectiveness of the patch, we measure the attack success rate of the original trigger and the classification accuracy of the fine-tuned model.
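The construction of such an unlearning set could look like the NumPy sketch below (assuming an N x H x W x C image array in [0, 1]; the helper name and RNG seed are mine). The infected model would then be fine-tuned for one epoch on the returned data with its usual training loss.

```python
import numpy as np

def build_unlearning_set(images, labels, mask, pattern,
                         sample_frac=0.10, trigger_frac=0.20,
                         rng=np.random.default_rng(0)):
    """Take 10% of the clean training data, stamp the reversed trigger onto
    20% of that sample, and keep the original labels unchanged so the model
    unlearns the trigger-to-target association."""
    idx = rng.choice(len(images), size=int(sample_frac * len(images)), replace=False)
    x, y = images[idx].copy(), labels[idx].copy()
    t = rng.choice(len(x), size=int(trigger_frac * len(x)), replace=False)
    m = mask[..., None]                        # broadcast 2D mask over channels
    x[t] = (1.0 - m) * x[t] + m * pattern      # A(x, m, delta); labels stay correct
    return x, y
```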
Table IV compares the attack success rate and classification accuracy before and after unlearning. In all models, the attack success rate is reduced to below 6.70% without significantly affecting classification accuracy. The largest drop in classification accuracy occurs on GTSRB and is only 3.6%. In some models, especially the Trojan attack models, classification accuracy even improves after patching. Note that the classification accuracy of the Trojan attack models had decreased when the backdoor was injected; the original uninfected Trojan model reaches 77.2% (not shown in Table IV), so accuracy improves when the backdoor is patched.
We compare the effect of this unlearning with two variants. First, we retrain against the same 20% of samples using the original trigger instead of the reverse-engineered trigger. As shown in Table IV, unlearning with the original trigger achieves a slightly lower attack success rate with similar classification accuracy; thus, unlearning with the reversed trigger is a good approximation of unlearning with the original one. Second, we compare against unlearning that uses only clean training data with no added triggers. The results in the last column of Table IV show that such unlearning is ineffective for all BadNets models, where the attack success rate remains above 93.37%, but is effective for the Trojan attack models, reducing the success rate of Trojan Square and Trojan Watermark to 10.91% and 0%, respectively. This suggests that the Trojan attack models, which are highly tuned toward a few specific neurons, are more sensitive to unlearning: clean inputs help reset those few key neurons and disable the attack. In contrast, BadNets injects the backdoor by updating all layers with a poisoned dataset, which appears to require significantly more retraining effort to mitigate. We also examine the effect of patching the falsely flagged labels on YouTube Face and Trojan Square (from Section V-B); patching them reduces classification accuracy by less than 1%, so the impact of false positives on mitigation is negligible.
Parameters and cost. Experiments show that unlearning performance is generally insensitive to parameters such as the amount of training data and the ratio of modified training data.
Finally, unlearning has a higher computational cost than neuron pruning, but it is still one to two orders of magnitude cheaper than retraining the model from scratch. Our experimental results show that unlearning clearly provides the best mitigation performance among the alternatives.
VII. Robustness against advanced backdoors
The previous sections described and evaluated the detection and mitigation of backdoor attacks under base-case assumptions, e.g., a small number of triggers, each prioritizing stealth and misclassifying arbitrary inputs into a single target label. Here we explore a number of more complex scenarios and, where possible, experimentally evaluate the effectiveness of each defense mechanism against them.
This article discusses five specific types of advanced backdoor attacks, each challenging assumptions or limitations in current defense designs.
- Complex triggers. Our detection scheme depends on the success of the optimization process. Does a more complex trigger make it harder for the optimization to converge?
- Larger triggers. By increasing the trigger size, an attacker can force reverse engineering to converge to a larger trigger with a larger norm.
- Multiple infected tags with different triggers. Consider a scenario where multiple backdoors for different tags are inserted into a single model to evaluate the maximum number of infected tags detected.
- A single infected label with multiple triggers. Consider multiple triggers for the same tag.
- A (partial) backdoor specific to source labels. Our detection scheme looks for triggers that cause misclassification of arbitrary inputs. A “partial” backdoor that is effective only on inputs from a subset of source labels would be harder to detect.
A. Complex trigger patterns
As observed with the Trojan attack models, optimization converges with more difficulty for triggers with more complex patterns. A more random trigger pattern might therefore make triggers harder to reverse-engineer.
We perform a simple test by changing the white square trigger into a noisy square, where each trigger pixel is assigned a random color. We inject this backdoor attack into MNIST, GTSRB, YouTube Face, and PubFig and evaluate detection performance. The anomaly index produced for each model is shown in Figure 11: our technique detects the complex trigger pattern in all cases. We also test our mitigation techniques on these models. For filtering, the FNR of all models is below 0.01% at an FPR of 5%. Patching via unlearning reduces the attack success rate to below 4.2% with at most a 3.1% drop in classification accuracy. Finally, we test backdoors in GTSRB with different trigger shapes (e.g., triangle, checkerboard), and all detection and mitigation techniques work as expected.
B. Larger triggers
Larger triggers may produce larger reverse engineered triggers. This helps infected labels to be closer to uninfected labels in the L1 standard, making anomaly detection less effective. Sample tests were performed on the GTSRB, increasing the size of the triggers from 4×4 (1.6% of the image) to 16×16 (25%), with all triggers still white squares. In this paper, the detection techniques using the same structure in previous experiments are evaluated. Figure 12 shows the L1 norm for reverse triggers of infected and uninfected labels. When the original flip flop becomes larger, the reverse flip flop becomes larger as expected. When the trigger exceeds 14×14, the L1 norm is mixed with the uninfected label, so that the abnormal index is reduced below the detection threshold. The abnormal index index is shown in Figure 13.
The maximum detectable trigger size depends largely on one factor: the trigger size of uninfected labels, i.e., the amount of change required to cause misclassification of all inputs between uninfected labels. The trigger size of uninfected labels is itself a proxy for the input differences between labels: more labels means uninfected labels require larger triggers, which in turn gives greater capacity to detect larger injected triggers. On YouTube Face, triggers covering up to 39% of the entire image were detected. On MNIST, which has fewer labels, only triggers up to 18% of the image size could be detected. In general, a larger trigger is visually more obvious and easier for humans to recognize; however, there may be ways to increase trigger size while remaining inconspicuous, which will be explored in future work.
C. Multiple infected labels with different triggers
The experiment considers a scenario in which an attacker inserts multiple independent backdoors into a single model, each targeting a different label. Inserting backdoors for many target labels in L together may collectively reduce the change each one needs, making the effect of any single trigger less of an outlier and its net effect harder to detect. The trade-off is that models likely have a "maximum capacity" for learning backdoors while maintaining their classification accuracy.
The experiment generates unique triggers with mutually exclusive color patterns. Most models, namely MNIST, GTSRB, and PubFig, have sufficient capacity to support a trigger for every output label without affecting classification accuracy. But on YouTube Face, which has 1,283 labels, once triggers infected more than 15.6% of the labels, the average attack success rate dropped significantly. As shown in Figure 14, the average attack success rate falls when too many triggers are inserted, which confirms the earlier conjecture.
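The paper does not spell out how "mutually exclusive color patterns" are constructed, so the sketch below shows one simple, assumed realization: each target label gets a solid-color square trigger whose color is taken from an evenly spaced RGB grid, so no two labels share a pattern.

```python
import numpy as np
from itertools import product

def make_label_triggers(num_labels, size=4, channels=3):
    """One distinct solid-color (size x size) trigger per target label.

    Colors come from an evenly spaced grid in RGB space so that the triggers
    of different labels never share a color.
    """
    levels = max(2, int(np.ceil(num_labels ** (1.0 / channels))))
    grid = np.linspace(0.0, 1.0, levels)
    colors = list(product(grid, repeat=channels))[:num_labels]
    return {label: np.ones((size, size, 1)) * np.array(color)
            for label, color in enumerate(colors)}

# e.g. make_label_triggers(43) builds one unique trigger per GTSRB class.
```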
Defenses against multiple distinct backdoors are evaluated on GTSRB. As shown in Figure 15, once more than eight labels (18.6%) are infected with backdoors, anomaly detection has difficulty recognizing the impact of the triggers. The results show that detection handles up to 3 infected labels (30%) on MNIST, 375 infected labels (29.2%) on YouTube Face, and 24 infected labels (36.9%) on PubFig.
Although the outlier detection approach fails in this case, the underlying reverse engineering approach still works: the correct trigger was successfully reverse-engineered for every infected label. Figure 16 shows the L1 norms of the reversed triggers for infected and uninfected labels; all infected labels have smaller norms than the uninfected ones. Further manual analysis verified that the reversed triggers look visually similar to the original triggers, so a conservative defender can manually inspect the reversed triggers and judge whether the model is suspicious. Subsequent tests showed that proactive "patching" can successfully mitigate potential backdoors. When all labels in GTSRB are infected, using the reversed triggers to patch all labels reduces the average attack success rate to 2.83%. Proactive patching provides similar benefits for the other models. Finally, in all BadNets models, filtering also effectively detects adversarial inputs, with low FNR at an FPR of 5%.
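As a concrete illustration of patching via unlearning, here is a minimal PyTorch sketch, assuming the reversed trigger is given as a (mask, pattern) pair and a loader of clean labeled samples is available; the 20% stamping ratio and the single fine-tuning epoch follow the paper's description of unlearning, while the function name and optimizer settings are assumptions.

```python
import torch
import torch.nn.functional as F

def unlearn_backdoor(model, clean_loader, mask, pattern,
                     stamp_frac=0.2, lr=1e-3, epochs=1, device="cpu"):
    """Patch a model via unlearning: fine-tune on clean samples where a fraction
    is stamped with the reversed trigger but keeps the correct labels, so the
    model learns to ignore the trigger.

    mask, pattern: tensors shaped like one input image; the trigger is applied
    as x' = (1 - mask) * x + mask * pattern, the paper's trigger injection form.
    """
    model.train().to(device)
    mask, pattern = mask.to(device), pattern.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            # Stamp the reversed trigger onto a random subset; labels stay unchanged.
            stamp = torch.rand(x.size(0), device=device) < stamp_frac
            x[stamp] = (1 - mask) * x[stamp] + mask * pattern
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```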
D. Single infected label with multiple triggers
Consider a case where multiple distinct triggers cause misclassification into the same label. In this case, the detection technique might detect and patch only one of the existing triggers. To test this, nine white 4×4 square triggers are injected for the same target label in GTSRB. The triggers have the same shape and color but are placed at different locations in the image: the four corners, the midpoints of the four edges, and the center. The attack achieves a success rate of more than 90% for every trigger.
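Below is a minimal sketch of this nine-position setup, assuming 32×32 images in [0, 1] with NHWC layout (GTSRB images are 32×32 in the paper's setup); the helper names are illustrative.

```python
import numpy as np

def nine_trigger_positions(img_size=32, trig=4):
    """Top-left coordinates of nine trig x trig squares: the four corners,
    the midpoints of the four edges, and the center of the image."""
    lo, mid, hi = 0, (img_size - trig) // 2, img_size - trig
    return [(r, c) for r in (lo, mid, hi) for c in (lo, mid, hi)]

def stamp_white_square(images, top, left, trig=4):
    """Stamp a white square at (top, left); images in [0, 1], shape (N, H, W, C)."""
    out = images.copy()
    out[:, top:top + trig, left:left + trig, :] = 1.0
    return out

# Each of the nine positions defines one trigger, all mapped to the same target label.
```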
The detection and patching results are shown in Figure 17. As suspected, a single run of the detection technique identified and patched only one of the injected triggers. Fortunately, it takes only three iterations of the detect-and-patch procedure to reduce the success rate of all triggers, one after another, to below 5%. The experiment was repeated on MNIST, YouTube Face, and PubFig, where the attack success rate for all triggers was reduced to below 1%, 5%, and 4%, respectively.
E. Source-label-specific (partial) backdoors
Section II defines a backdoor as a hidden pattern that misclassifies arbitrary inputs from any label into the target label. The detection scheme is designed to find these "complete" backdoors, but a weaker "partial" backdoor can be designed whose trigger causes misclassification only when applied to inputs belonging to a subset of source labels, and does nothing when applied to other inputs. Detecting such a backdoor with the existing method is a challenge.
Detecting partial backdoors requires a slight modification to the detection scheme: instead of reverse-engineering a trigger for each target label, all possible source-target label pairs are analyzed. For each pair, the optimization problem is solved using samples belonging to the source label, and the resulting reversed trigger is only valid for that specific pair. Then, by comparing the L1 norms of the triggers across pairs, the same outlier detection method can identify particularly vulnerable label pairs that behave as anomalies. This is validated by injecting a backdoor into MNIST that targets a single source-target label pair: while the injected backdoor worked well, the updated detection and mitigation techniques were also successful. Analyzing all source-target pairs increases the computational cost of detection by a factor of N, where N is the number of labels. However, divide-and-conquer can reduce the cost to the order of log N; a detailed evaluation is left for future work.
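To illustrate the per-pair analysis, below is a simplified PyTorch sketch of reverse-engineering a trigger for a single (source, target) label pair: it optimizes a mask and pattern so that stamped source-label inputs are classified as the target while keeping the mask's L1 norm small, in the spirit of the paper's objective. The fixed regularization weight, step count, and function name are assumptions; the paper adjusts the weight dynamically during optimization.

```python
import torch
import torch.nn.functional as F

def reverse_trigger_for_pair(model, source_loader, target_label,
                             img_shape, lam=0.01, steps=500, lr=0.1, device="cpu"):
    """Reverse-engineer a trigger for one (source, target) label pair.

    Optimizes a mask m and pattern delta so that stamped source-label inputs,
    A(x) = (1 - m) * x + m * delta, are classified as target_label while the
    mask's L1 norm stays small. source_loader yields batches drawn only from
    the source label; img_shape is (C, H, W).
    """
    model.eval().to(device)
    # Unconstrained variables; sigmoid keeps m and delta within [0, 1].
    m_raw = torch.zeros(1, *img_shape[1:], device=device, requires_grad=True)  # (1, H, W)
    d_raw = torch.zeros(*img_shape, device=device, requires_grad=True)         # (C, H, W)
    opt = torch.optim.Adam([m_raw, d_raw], lr=lr)
    data = iter(source_loader)
    for _ in range(steps):
        try:
            x, _ = next(data)
        except StopIteration:
            data = iter(source_loader)
            x, _ = next(data)
        x = x.to(device)
        m, delta = torch.sigmoid(m_raw), torch.sigmoid(d_raw)
        stamped = (1 - m) * x + m * delta
        y = torch.full((x.size(0),), target_label, dtype=torch.long, device=device)
        # Classification loss toward the target label plus L1 penalty on the mask.
        loss = F.cross_entropy(model(stamped), y) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(m_raw).detach(), torch.sigmoid(d_raw).detach()
```

Running this for every (source, target) pair and feeding the resulting mask L1 norms into the same MAD-based outlier test sketched earlier would flag the anomalous pair.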
VIII. Related work
Traditional machine learning assumes a benign environment, but adversaries violate this assumption at training or test time. Additional backdoor attacks and defenses: besides the attacks mentioned in Section II, Chen et al. proposed a backdoor attack under a more restrictive attack model, in which the attacker can only pollute a limited portion of the training set [17]. Another line of work directly tampers with the hardware the DNN runs on [30], [31]; such backdoor circuits would also change the model's behavior when a trigger appears.
Poisoning attacks. Poisoning attacks contaminate the training data to change the model's behavior. Unlike backdoor attacks, poisoning attacks do not rely on a trigger and change the model's behavior on a set of clean samples. Defenses against poisoning attacks mainly focus on sanitizing the training set and removing poisoned samples [32], [33], [34], [35], [36], [37]. The underlying assumption is to find samples that significantly alter the model's performance [32], which has proven less effective against backdoor attacks [17], because injected samples do not affect the model's performance on clean samples. Moreover, this is impractical under this article's attack model, since the defender does not have access to the poisoned training set.
Other adversarial attacks against DNNs. Many non-backdoor adversarial attacks have been proposed against general DNNs [38], [39], [40], [41], [42], typically causing misclassification through subtle modifications to the image. Various defenses have been proposed [43], [44], [45], [46], [47], but they have been shown to perform poorly against adaptive adversaries [48], [49], [50], [51]. Some recent work attempts to craft universal perturbations that trigger misclassification of multiple images on an uninfected DNN [52], [53]. This line of work considers a different threat model that assumes an uninfected victim model, which is not the target scenario of this paper's defense.
IX. Conclusion
This work describes and empirically validates a robust and general set of detection and mitigation tools against backdoor (Trojan) attacks on deep neural networks. Beyond the defense results on basic and complex backdoors, one unexpected takeaway is the marked difference between the two backdoor injection methods: trigger-driven BadNets, which mount an end-to-end attack with full access to model training, and neuron-driven Trojan attacks, which do not. The experiments show that the Trojan injection method usually introduces unnecessary perturbations and unpredictable changes to non-targeted neurons. This makes its triggers harder to reverse-engineer and makes it more resistant to filtering and neuron pruning. The trade-off is that its focus on specific neurons makes it extremely sensitive to mitigation via unlearning. In contrast, BadNets introduce more predictable changes to neurons and can be more easily reverse-engineered, filtered, and mitigated by neuron pruning.
Finally, while the results are robust against a range of attacks across different applications, limitations remain. The first is the question of generalization beyond the current vision domain. The high-level intuition and design of the detection and mitigation methods may well generalize: detection assumes that infected labels are more vulnerable than uninfected labels, which should be domain-agnostic. The main challenge in adapting the whole pipeline to non-vision domains is formulating the backdoor attack process and designing a metric that measures the vulnerability of a particular label (e.g., Equations 2 and 3). Second, the space of potential countermeasures available to an attacker may be large. This paper examines five different countermeasures targeting different components/assumptions of the defense, but further exploration of other potential countermeasures remains future work.