Original link: tecdat.cn/?p=7563

Original source: Tuo Duan Data Tribe WeChat public account (tecdat)

 

Automatic detection of nude images has been a central problem in computer vision for more than two decades, and because of its long history and well-defined goal, it is a good example of how the field has evolved. In this blog post, I will use the nudity-detection problem to illustrate how training modern convolutional neural networks (ConvNets) differs from research done in the past.

 

(Warning and disclaimer: This article contains visualizations of nudity for scientific purposes. If you are under 18 or offended by nudity, don’t read on.)

In 1996,

 

The seminal work in this field was the aptly named “Finding Naked People” by Fleck et al. Published in the mid-1990s, it provides a good example of the kind of work that computer vision researchers were doing before convolutional networks took over.

In 2014,

Instead of designing formal rules that describe how the input data should be represented, deep learning researchers design network architectures and data sets so that the system can learn representations directly from the data. But because they do not specify exactly how the network should behave for a given input, a new problem arises: how do we understand what the convolutional network is responding to?

 

Understanding how a convolutional network operates requires interpreting the activity of its feature maps at different layers. In the remainder of this article, we examine an earlier version of the NSFW model by mapping activity at the top layers back down into the input pixel space. This lets us see what input pattern caused a given activation in a feature map (that is, why an image was labeled "NSFW").

Occlusion sensitivity

To build the heat map on the left, we slid a window across the image, sent each crop to the ConvNet, and averaged the "NSFW" scores over the pixels each crop covers. Convolutional networks tend to predict "NSFW" when they see a crop full of skin, which produces the large red region over Lena's body. To create the heat map on the right, we systematically blotted out portions of the original image and reported 1 minus the average "NSFW" score (i.e., the "SFW" score). When the most "NSFW" regions are occluded, the "SFW" score increases, and we see higher values in the heat map. To make this concrete, the figure below shows which images are fed to the convolutional network in each of the two experiments:

 

One of the advantages of these occlusion experiments is that they can be performed when the classifier is a complete black box. Here is a code snippet that reproduces these results through our API:

# NSFW occlusion experiment

from StringIO import StringIO

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image, ImageDraw
import requests
import scipy.sparse as sp

from clarifai.client import ClarifaiApi

CLARIFAI_APP_ID = '...'
CLARIFAI_APP_SECRET = '...'
clarifai = ClarifaiApi(app_id=CLARIFAI_APP_ID,
                       app_secret=CLARIFAI_APP_SECRET,
                       base_url='https://api.clarifai.com')

def batch_request(imgs, bboxes):
  """Use the API to tag a batch of occluded images."""
  assert len(bboxes) < 128
  # convert each PIL image to JPEG bytes
  stringios = []
  for img in imgs:
    stringio = StringIO()
    img.save(stringio, format='JPEG')
    stringios.append(stringio)
  # call the API and parse the response
  output = []
  response = clarifai.tag_images(stringios, model='nsfw-v1.0')
  for result, bbox in zip(response['results'], bboxes):
    # look up the "sfw" class (i.e. 1 - NSFW) for each occluded image
    sfw_idx = result['result']['tag']['classes'].index('sfw')
    sfw_score = result['result']['tag']['probs'][sfw_idx]
    output.append((sfw_score, bbox))
  return output

 
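The snippet above stops at batch_request; the driver that generates the occluded images and assembles the heat map is not shown. A minimal sketch of that missing part follows (the patch size, stride, helper names, and the file name 'lena.jpg' are hypothetical choices of ours, not code from the original post); the sliding-window heat map on the left can be produced the same way by scoring plain crops instead of occluded copies:

# Occlusion heat-map driver -- a sketch of the part the snippet above leaves out.
# PATCH, STRIDE, occluded_images, build_heatmap and 'lena.jpg' are hypothetical.

PATCH, STRIDE = 64, 32

def occluded_images(img, patch=PATCH, stride=STRIDE):
  """Yield (copy of img with one gray patch drawn over it, patch bbox) pairs."""
  w, h = img.size
  for top in range(0, h - patch + 1, stride):
    for left in range(0, w - patch + 1, stride):
      bbox = (left, top, left + patch, top + patch)
      occluded = img.copy()
      ImageDraw.Draw(occluded).rectangle(bbox, fill=(128, 128, 128))
      yield occluded, bbox

def build_heatmap(img):
  """Average the 'SFW' score of every occlusion window covering each pixel."""
  w, h = img.size
  scores = np.zeros((h, w))
  counts = np.zeros((h, w))
  pairs = list(occluded_images(img))
  for i in range(0, len(pairs), 127):          # batch_request asserts < 128 images
    imgs, bboxes = zip(*pairs[i:i + 127])
    for sfw_score, (l, t, r, b) in batch_request(list(imgs), list(bboxes)):
      scores[t:b, l:r] += sfw_score
      counts[t:b, l:r] += 1
  return scores / np.maximum(counts, 1)        # avoid dividing by zero at the borders

heatmap = build_heatmap(Image.open('lena.jpg').convert('RGB'))
plt.imshow(heatmap, cmap='jet')
plt.colorbar()
plt.show()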

Although these types of experiments provide a simple way to display the output of a classifier, one drawback is that the resulting visualizations are often fuzzy. This prevents us from gaining meaningful insight into how the network actually works.

Deconvolutional networks (deconvnets)

After training the network on a given data set, we would like to be able to take an image and a class and ask the ConvNet which parts of the image are responsible for that class:

 

Here is how we visualized the Lena photo using the deconvnet (note: the deconvnet used here requires a square image to work properly, so we padded the full Lena image to get the correct aspect ratio):
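The deconvnet maps activations back into pixel space; a simpler, closely related way to do the same thing with any differentiable model is gradient saliency (Simonyan et al., 2014): back-propagate the class score to the input and look at the gradient magnitude. Below is a minimal PyTorch sketch, where `model`, `nsfw_class`, and the file name are hypothetical stand-ins rather than anything from the original post:

# Gradient-saliency sketch (Simonyan et al., 2014) -- related to, but simpler
# than, a deconvnet. `model` and `nsfw_class` are hypothetical stand-ins for
# the NSFW classifier and its output index.

import torch
import torchvision.transforms as T
from PIL import Image

def saliency_map(model, img, nsfw_class):
  """Return per-pixel |d(class score)/d(pixel)| for a single image."""
  x = T.Compose([T.Resize((224, 224)), T.ToTensor()])(img).unsqueeze(0)
  x.requires_grad_(True)
  model.eval()
  score = model(x)[0, nsfw_class]        # the "NSFW" score for this image
  score.backward()                       # gradients of that score w.r.t. the input pixels
  return x.grad.abs().max(dim=1)[0][0]   # collapse the color channels

# e.g. saliency_map(model, Image.open('lena_padded.jpg').convert('RGB'), nsfw_class=1)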

 

Based on our deconvnet, we can retouch Barbara's image by adding red to her lips, which makes it look less "PG" to the model:

 

This picture of Honey Ryder, played by Ursula Andress in the James Bond film Dr. No, was voted number one in a 2003 British survey of the 100 greatest sexy moments in screen history:

 

A remarkable feature of the above experiments is that the convolutional network learned to associate red lips and navels with "NSFW". This probably means we did not include enough images of red lips and navels in our "SFW" training data. If we had evaluated the model only by checking precision/recall and the ROC curve (as shown below; test set size: 428,271), we would never have found this problem, because our test data has the same weakness. This highlights a fundamental difference between training rule-based classifiers and modern AI research: rather than redesigning features by hand, we redesign the training data until the features the model discovers improve.
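For reference, precision/recall and ROC curves of the kind mentioned above are typically computed along these lines. This is a generic scikit-learn sketch with placeholder labels and scores, not the evaluation code behind the original figure:

# Generic evaluation sketch (placeholder data, not the original 428,271-image test set).
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve, auc

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=1000)   # hypothetical ground-truth labels (1 = NSFW)
y_score = rng.rand(1000)                # hypothetical model "NSFW" scores

precision, recall, _ = precision_recall_curve(y_true, y_score)
fpr, tpr, _ = roc_curve(y_true, y_score)
print("ROC AUC: %.3f" % auc(fpr, tpr))

Of course, curves computed this way inherit whatever biases the test set has, which is exactly the failure mode described above.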