Background
A simple, fast way to check whether two images are similar is a perceptual hash. It extracts coarse features from each image and condenses them into a fingerprint (hash), so comparing two images reduces to comparing their fingerprints. The variant described here thresholds every pixel against the mean gray value and is often called average hash (aHash).
Implementation
Step 1. Reduce the size
Shrink the image to 8×8 pixels, 64 pixels in total. This discards detail and keeps only basic information such as structure and light and shade, and it removes the differences between images that come from different sizes and aspect ratios.
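To make the shrinking step concrete, here is a minimal pure-Python sketch that averages rectangular blocks of a grayscale pixel grid down to 8×8. The function name `shrink_to_8x8` is hypothetical, and the sketch assumes the input is already a 2D list of gray values whose width and height are multiples of 8. In practice an imaging library's resize (as in the Python code later in this article) does this job.

```python
def shrink_to_8x8(pixels):
    """Downscale a 2D list of gray values to 8x8 by averaging blocks.

    Assumption: width and height are both multiples of 8.
    """
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // 8, width // 8
    small = []
    for by in range(8):
        row = []
        for bx in range(8):
            # Collect every source pixel that falls into this output cell.
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


# The same horizontal gradient at two sizes and two aspect ratios.
small_a = shrink_to_8x8([[x * 8 for x in range(32)] for _ in range(16)])
small_b = shrink_to_8x8([[x * 4 for x in range(64)] for _ in range(64)])
print(small_a[0])
print(small_b[0])
```

Both inputs end up as 8×8 grids with nearly the same values, which is why size and proportion stop mattering after this step.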
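Step 2. Convert to grayscale
Convert the reduced image to a 64-level grayscale image, so that each pixel takes one of only 64 gray values. (The Python code below uses Pillow's 'L' mode, which has 256 gray levels. The hash still works, because each pixel is only compared with the mean.)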
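The article does not say how the gray value is computed, so the following is only a sketch under assumptions: it uses the ITU-R BT.601 luma weights (the ones Pillow documents for its 'L' mode) and then drops the two lowest bits to go from 256 levels to 64. The name `to_gray64` is hypothetical.

```python
def to_gray64(r, g, b):
    """Map an 8-bit RGB pixel to one of 64 gray levels (0..63)."""
    # Assumed weights: ITU-R BT.601 luma, in integer arithmetic.
    gray256 = (299 * r + 587 * g + 114 * b) // 1000
    # Dropping the two low bits maps 0..255 onto 0..63.
    return gray256 >> 2


print(to_gray64(0, 0, 0))        # 0
print(to_gray64(255, 255, 255))  # 63
print(to_gray64(255, 0, 0))      # 19
```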
Step 3. Calculate the average value
Compute the mean gray value of all 64 pixels.
Step 4. Calculate the hash
Compare the gray value of each of the 64 pixels with the average. If a pixel's gray value is greater than or equal to the average, record a 1; if it is less than the average, record a 0.
The comparison results for all pixels are combined into a 64-bit binary integer, which is the fingerprint of the image. The order in which the bits are combined does not matter, as long as every image uses the same order.
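Steps 3 and 4 can be sketched with a plain loop over 64 gray values. The function name `average_hash_bits` is hypothetical, and the choice of putting pixel i into bit i (least significant bit first) is just one possible convention.

```python
def average_hash_bits(gray_values):
    """Build a 64-bit fingerprint from 64 gray values (row-major order)."""
    if len(gray_values) != 64:
        raise ValueError("expected exactly 64 gray values")
    mean = sum(gray_values) / 64
    fingerprint = 0
    for position, value in enumerate(gray_values):
        # A pixel at or above the mean contributes a 1 bit at its position.
        if value >= mean:
            fingerprint |= 1 << position
    return fingerprint


# Each row: four dark pixels followed by four bright pixels (mean is 105).
pixels = [10, 10, 10, 10, 200, 200, 200, 200] * 8
fingerprint = average_hash_bits(pixels)
print(hex(fingerprint))  # 0xf0f0f0f0f0f0f0f0
```

Every row produces the byte 0xF0 because the four bright pixels occupy the upper four bit positions of that row.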
Step 5. Compare hashes
To compare two images, count how many bits differ between their 64-bit hashes (the Hamming distance). As a rule of thumb, a distance of at most 5 means the two images are very similar, and a distance greater than 10 means they are very likely different images.
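The rule of thumb above can be written as a small helper. This is a sketch: the name `compare_hashes` and the label "inconclusive" for distances from 6 to 10 are assumptions, since the text only gives the two outer thresholds.

```python
def compare_hashes(hash_a, hash_b):
    """Return (Hamming distance, verdict) for two 64-bit fingerprints."""
    # XOR leaves a 1 wherever the hashes differ; count those bits.
    distance = bin(hash_a ^ hash_b).count("1")
    if distance <= 5:
        verdict = "very similar"
    elif distance > 10:
        verdict = "likely different"
    else:
        # Assumed label: the text does not name the 6..10 range.
        verdict = "inconclusive"
    return distance, verdict


base = 0xF0F0F0F0F0F0F0F0
print(compare_hashes(base, base ^ 0b111))   # (3, 'very similar')
print(compare_hashes(base, base ^ 0xFF))    # (8, 'inconclusive')
print(compare_hashes(base, base ^ 0xFFFF))  # (16, 'likely different')
```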
Code (Python)
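Compute pHash:
from functools import reduce

from PIL import Image


def phash(img):
    # Shrink to 8x8, then convert to 8-bit grayscale (Pillow mode 'L').
    img = img.resize((8, 8), Image.LANCZOS).convert('L')
    pixels = list(img.getdata())
    avg = sum(pixels) / 64.
    # Pixel i sets bit i of the hash when its gray value is >= the average.
    return reduce(
        lambda x, yz: x | (yz[1] << yz[0]),
        enumerate(map(lambda i: 0 if i < avg else 1, pixels)),
        0
    )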
Calculate hamming distance:
def hamming_distance(a, b):
    # XOR leaves a 1 wherever the two hashes differ; count those bits.
    return bin(a ^ b).count('1')
Calculate whether two images are similar:
def is_imgs_similar(img1, img2):
    # Treat the images as similar when at most 5 hash bits differ.
    return hamming_distance(phash(img1), phash(img2)) <= 5
The hash calculation uses lambda expressions and reduce. For background on both, see Liao Xuefeng's article "Map and Reduce".
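As a small illustration of how reduce packs a list of bits into one integer, the sketch below folds four bits and checks the result against an explicit loop. It targets Python 3, where reduce lives in functools and a lambda cannot unpack a tuple parameter, so the (position, bit) pair is indexed instead. The variable names are illustrative only.

```python
from functools import reduce

bits = [1, 0, 1, 1]

# Fold the (position, bit) pairs into one integer, starting from 0.
packed = reduce(
    lambda acc, pair: acc | (pair[1] << pair[0]),
    enumerate(bits),
    0,
)

# The same computation written as an explicit loop.
packed_loop = 0
for position, bit in enumerate(bits):
    packed_loop |= bit << position

print(packed, bin(packed))  # 13 0b1101
```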