Please credit the source when reprinting.
Background
In some image-related projects, duplicate image detection is very important. Examples include hot-image (trending) leaderboards, which need to identify duplicates; deep learning projects on images, whose training data must have duplicates removed; and original-image or original-video programs, which need to recognize duplicated images.
What is the same picture
What is the same picture? I believe the answer differs from scenario to scenario. Some scenarios treat images that look the same to the naked eye as identical, some treat filtered images as identical, and some treat only exact copies of the original as identical. Here the cases are divided by degree of sameness. From high to low, there are 3 levels:
- Absolutely original
- The same to the naked eye
- From the original image
Let’s take a closer look at each of these three categories.
Absolutely original
At this level the degree of sameness is the highest, for example 1.png and a file generated by copying it directly. It is impossible to tell from the picture content which one is the original, so the two can only be matched at the file level. Generally the two files are compared directly by MD5, which is a hash of the image file itself rather than of what the image shows.
PS: In general, every scenario filters by MD5 first, because its computational cost is very low and it does not need to understand the picture.
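The file-level check described above can be sketched with the standard library's `hashlib`. This is a minimal illustration rather than the author's code; the helper names `md5_of_chunks` and `file_md5` and the 1 MiB chunk size are assumptions made for the example.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the MD5 hex digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large images are never fully loaded."""
    with open(path, "rb") as handle:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def same_file(path_a, path_b):
    """True when both files have byte-identical content (same MD5 digest)."""
    return file_md5(path_a) == file_md5(path_b)
```

Two files match at this level only when every byte is equal, so re-saving or resizing an image is enough to change the digest.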
The same to the naked eye
This level covers the most scenarios, such as de-duplicating image training data, hot-image leaderboards, and so on. Typical examples are images generated from 1.png by compression, resizing, transcoding, and other image processing. They look the same to the naked eye but are definitely not the original file. MD5 cannot recognize this; only a perceptual hash of the image can deal with it. There are four main types of perceptual hash (AHash, DHash, PHash, and WHash). All of them hash the image content, but in different ways. They are introduced one by one below:
AHash
This is the simplest perceptual hash, with the lowest computational cost. It needs only 2 steps: preprocessing + binarization.
- The specific flow chart is as follows:
Its binarization method is relatively simple: it just compares each pixel with the mean, so its results are only passable.
- Python source code is as follows:
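import numpy
from PIL import Image

def ahash(image, hash_size=8):
    # 1. [Preprocessing] convert to grayscale and resize to hash_size x hash_size.
    #    Image.LANCZOS is the current name of the filter formerly called Image.ANTIALIAS.
    image = image.convert("L").resize((hash_size, hash_size), Image.LANCZOS)
    pixels = numpy.asarray(image)
    # 2. Compute the mean of all pixels.
    avg = numpy.mean(pixels)
    # 3. [Binarization] greater than the mean is 1, less than or equal to the mean is 0.
    diff = pixels > avg
    return diff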
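The article does not show how two perceptual hashes are compared. A common approach is to count the differing bits (the Hamming distance) and treat a small distance as "the same to the naked eye". The sketch below assumes the hashes are equally shaped boolean grids such as the `diff` value returned above. The helper names and the default threshold of 5 are illustrative assumptions, not values from the article.

```python
from itertools import chain


def hamming_distance(hash_a, hash_b):
    """Count differing bits between two equally shaped 2-D boolean grids."""
    bits_a = list(chain.from_iterable(hash_a))
    bits_b = list(chain.from_iterable(hash_b))
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same number of bits")
    return sum(bool(x) != bool(y) for x, y in zip(bits_a, bits_b))


def looks_same(hash_a, hash_b, max_distance=5):
    """Hypothetical decision rule: few differing bits means a likely duplicate."""
    return hamming_distance(hash_a, hash_b) <= max_distance


# Tiny hand-made 2x2 "hashes" that differ in exactly one position.
example_a = [[True, False], [False, True]]
example_b = [[True, True], [False, True]]
print(hamming_distance(example_a, example_b))
```

The right threshold depends on the hash size and on how aggressive the expected edits are, so it is best tuned on labelled pairs from the target data.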
DHash
This perceptual hash also has very low complexity. The key point is that it performs better than AHash, mainly because its binarization compares adjacent pixels, which makes the algorithm more robust. (Of course this is just one idea. One could also compare fixed pairs of pixels, so that every pixel has a designated partner.) The algorithm flow chart is as follows (similar to AHash, but with a different binarization step):
- Python source code is as follows:
import numpy
from PIL import Image

def dhash(image, hash_size=8):
    # 1. [Preprocessing] grayscale, then resize to (hash_size + 1) wide by hash_size high
    #    so that each row yields hash_size neighbour comparisons.
    #    Image.LANCZOS is the current name of the filter formerly called Image.ANTIALIAS.
    image = image.convert("L").resize((hash_size + 1, hash_size), Image.LANCZOS)
    pixels = numpy.asarray(image)
    # 2. [Binarization] compare 2 adjacent elements: right greater than left is 1,
    #    right less than or equal to left is 0. (This can also be changed to compare
    #    the upper and lower elements, or 2 fixed elements.)
    diff = pixels[:, 1:] > pixels[:, :-1]
    return diff
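The remark about comparing upper and lower neighbours instead of left and right ones can be made concrete on a tiny grid. This sketch uses plain nested lists so it runs without NumPy or Pillow and covers only the binarization step. The function names are illustrative assumptions.

```python
def dhash_horizontal(rows):
    """Bit is True when a pixel is brighter than its left neighbour."""
    return [[row[i + 1] > row[i] for i in range(len(row) - 1)] for row in rows]


def dhash_vertical(rows):
    """Bit is True when a pixel is brighter than the pixel directly above it."""
    return [
        [below > above for above, below in zip(upper, lower)]
        for upper, lower in zip(rows, rows[1:])
    ]


# A 2-row, 3-column grayscale grid: the top row brightens, the bottom row darkens.
grid = [
    [1, 2, 3],
    [3, 2, 1],
]
print(dhash_horizontal(grid))  # one row of 2 bits per input row
print(dhash_vertical(grid))    # one row of 3 bits per pair of adjacent rows
```

For the vertical variant applied to a real image, the resize step would need one extra row rather than one extra column, that is, a size of `(hash_size, hash_size + 1)`, to still produce `hash_size * hash_size` bits.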
PHash
PHash gives better results. It introduces the DCT transform to discard the high-frequency information in the picture and focus on the low-frequency information, because the human eye is not very sensitive to fine detail. For the algorithm principle, see "[PHash] A perceptual hash that better understands the human eye". There are many variations of PHash, but the best one is shown below. Its algorithm flow chart is as follows:
- Python source code is as follows:
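import numpy
from PIL import Image

def phash(image, hash_size=8, highfreq_factor=4):
    import scipy.fftpack
    img_size = hash_size * highfreq_factor
    # 1. [Preprocessing] grayscale and resize to img_size x img_size.
    #    Image.LANCZOS is the current name of the filter formerly called Image.ANTIALIAS.
    image = image.convert("L").resize((img_size, img_size), Image.LANCZOS)
    pixels = numpy.asarray(image)
    # 2. [DCT] 2-D DCT (columns, then rows); keep the top-left low-frequency block.
    dct = scipy.fftpack.dct(scipy.fftpack.dct(pixels, axis=0), axis=1)
    dctlowfreq = dct[:hash_size, :hash_size]
    # 3. [Binarization] greater than the median is 1, less than or equal to the median is 0.
    med = numpy.median(dctlowfreq)
    diff = dctlowfreq > med
    return diff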
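To see why keeping only the first DCT coefficients focuses on low-frequency content, the following pure-Python sketch computes a 1-D DCT-II for a slowly varying signal and for a rapidly alternating one. It is an illustration only. `dct2_1d` is written to use the same unnormalized type-2 scaling as the default of `scipy.fftpack.dct`, and the helper `high_freq_share` and both sample signals are made up for this example.

```python
import math


def dct2_1d(x):
    """Unnormalized DCT-II: y[k] = 2 * sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    count = len(x)
    return [
        2.0 * sum(
            x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * count))
            for n in range(count)
        )
        for k in range(count)
    ]


def high_freq_share(x):
    """Fraction of non-DC energy that sits in the upper half of the spectrum."""
    ac = dct2_1d(x)[1:]  # drop k = 0, which only encodes the mean brightness
    total = sum(c * c for c in ac)
    high = sum(c * c for c in ac[len(ac) // 2:])
    return high / total


smooth = [10, 11, 12, 13, 14, 15, 16, 17]  # gentle gradient, like a soft shadow
busy = [10, 17, 11, 16, 12, 15, 13, 14]    # pixel-to-pixel flicker, like fine texture

print(high_freq_share(smooth), high_freq_share(busy))
```

The gradient keeps almost all of its energy in the first few coefficients, while the flickering signal pushes most of it into the last ones. Cropping the spectrum to the top-left `hash_size` block therefore keeps the coarse structure and drops the fine detail.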
WHash
WHash performs slightly better than PHash, but it is also somewhat more complex. It uses a wavelet transform to separate low-frequency and high-frequency information and keeps the low-frequency part. One advantage over PHash is that it also preserves spatial information about the original image. For the algorithm principle, see "[WHash] A more spatially aware hash". The WHash flow chart is as follows. The source code is attached below. It is very short and can also be skipped:
- Python source code is as follows:
import numpy
import pywt
from PIL import Image

def whash(image, hash_size=8):
    # Check the arguments
    assert hash_size & (hash_size - 1) == 0, "hash_size is not power of 2"
    image_scale = max(2 ** int(numpy.log2(min(image.size))), hash_size)
    ll_max_level = int(numpy.log2(image_scale))
    level = int(numpy.log2(hash_size))
    assert level <= ll_max_level, "hash_size in a wrong range"
    # Preprocessing: grayscale, resize to a power-of-two square, scale to [0, 1].
    # Image.LANCZOS is the current name of the filter formerly called Image.ANTIALIAS.
    image = image.convert("L").resize((image_scale, image_scale), Image.LANCZOS)
    pixels = numpy.asarray(image) / 255.
    # Wavelet transform, Haar
    coeffs = pywt.wavedec2(pixels, 'haar', level=ll_max_level)
    # Remove the lowest frequency (zero the coarsest approximation coefficient)
    coeffs[0] *= 0
    # Inverse wavelet transform using only the coarsest `level` detail levels
    dwt_low = pywt.waverec2(coeffs[:level + 1], 'haar')
    # Binarization against the median
    med = numpy.median(dwt_low)
    diff = dwt_low > med
    return diff
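The claim that the wavelet approach preserves spatial information can be illustrated with a single level of the 2-D Haar transform written in plain Python. This is a simplified sketch, not the multi-level PyWavelets computation used above. It assumes the orthonormal Haar convention, in which each low-frequency (LL) coefficient is the sum of one 2x2 block divided by 2. The function name `haar_ll` is an assumption.

```python
def haar_ll(pixels):
    """One-level 2-D Haar approximation (LL) band of a grid with even sides.

    Each output value summarizes exactly one 2x2 block of the input, so the
    LL band is a half-resolution picture laid out like the original image.
    """
    height, width = len(pixels), len(pixels[0])
    if height % 2 or width % 2:
        raise ValueError("both sides must be even")
    return [
        [
            (pixels[r][c] + pixels[r][c + 1]
             + pixels[r + 1][c] + pixels[r + 1][c + 1]) / 2.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


# Bright patch in the top-left corner, brighter patch in the bottom-right corner.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
print(haar_ll(image))
```

The two bright patches stay in their corners in the output. A DCT coefficient, by contrast, mixes contributions from every pixel in the image.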
From the original image
There are many such scenarios, and each one has its own requirements. For example, in the original-video program of a video platform, if versions produced by adding filters, changing the audio, or cropping must be judged to be the same content, perceptual hashing is no longer applicable and deep learning on images must be used. In general a very strong model is not needed, but it does need to be trained for the specific transformations, such as filters, logos, and black borders. As a filter example, 1.png passed through a filter produces image 4. Another scenario is video de-duplication in the game field. Since the game background is the same and only a small region, such as a character or a name, differs, this also needs training on targeted data.
Here, a deep learning method such as MoCo might be appropriate.
Conclusion
Duplicate image detection is needed in most image-related projects, and different scenarios call for different algorithms.
Method | Complexity | Applicable scenario |
---|---|---|
MD5 | Very low | Absolutely original |
Perceptual hash | Low | The same to the naked eye |
Deep learning | High | Same in a specific scenario |
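The first two rows of the table can be combined into a cascade, in line with the earlier note that MD5 is usually applied first. The sketch below is one possible arrangement, not a prescribed pipeline. It assumes each image arrives as a `(name, file_bytes, perceptual_bits)` tuple with the perceptual hash already computed. The function names and the default threshold are hypothetical.

```python
import hashlib


def hamming(bits_a, bits_b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def find_duplicates(items, max_distance=5):
    """Map each duplicate's name to (kind, name_of_earlier_image).

    items: iterable of (name, file_bytes, perceptual_bits) tuples.
    max_distance: hypothetical Hamming threshold; tune it on real data.
    """
    first_by_md5 = {}  # MD5 hex digest -> first name seen with that digest
    unique = []        # (name, perceptual_bits) of images kept as distinct
    duplicates = {}
    for name, data, bits in items:
        # Tier 1: cheap exact-file check.
        digest = hashlib.md5(data).hexdigest()
        if digest in first_by_md5:
            duplicates[name] = ("exact", first_by_md5[digest])
            continue
        first_by_md5[digest] = name
        # Tier 2: perceptual-hash check against images kept so far.
        near = next(
            (other for other, other_bits in unique
             if hamming(bits, other_bits) <= max_distance),
            None,
        )
        if near is None:
            unique.append((name, bits))
        else:
            duplicates[name] = ("near", near)
    return duplicates


items = [
    ("a.png", b"x", (0, 0, 0, 0, 1, 1, 1, 1)),
    ("b.png", b"x", (0, 0, 0, 0, 1, 1, 1, 1)),  # byte-identical copy of a.png
    ("c.png", b"y", (0, 0, 0, 0, 1, 1, 1, 0)),  # one bit away from a.png
    ("d.png", b"z", (1, 1, 1, 1, 0, 0, 0, 0)),  # unrelated image
]
print(find_duplicates(items, max_distance=1))
```

Images that survive both tiers would be the candidates for a scenario-specific deep learning model, which is left out here because it depends on the training data.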