Search products have long offered “search by image” or “find similar items” features, such as Google reverse image search, Baidu image search, and Taobao's photo-based product search. To build a similar image-similarity feature, you don't necessarily need anything as lofty-sounding as “artificial intelligence”; with JS and a few simple algorithms you can achieve comparable results.

Before reading this article, I strongly recommend reading Ruan Yifeng's article “Principles of Similar Image Search”, written years ago; the algorithms used in this article come from it.

You can try the demo at img-compare.netlify.com/

Feature extraction algorithm

To make things easier to follow, each algorithm is described in two stages: “feature extraction” and “feature comparison”. The “feature extraction” stage of each algorithm is explained in detail next, and “feature comparison” is covered separately afterwards.

Average hash algorithm

Following Ruan Yifeng's article, the “average hash algorithm” consists of the following steps:

Step 1: Reduce the size to 8×8. This strips away detail, keeping only basic information such as structure and light/shade, and removes differences caused by size and aspect ratio.

Step 2: Simplify the colors by converting the reduced image to grayscale.

Step 3: Calculate the average gray value of all pixels.

Step 4: Compare each pixel's gray value with the mean. A pixel greater than or equal to the average is recorded as 1; a pixel below the average is recorded as 0.

Step 5: Compute the hash. Combining the results of the previous step yields a 64-bit integer, which is the fingerprint of the image.

Step 6: Compare the fingerprints of two images (using Hamming distance or cosine similarity) to obtain their similarity.

Once you understand the principle and steps of the “average hash algorithm”, you can start coding. To make the code more readable, all the examples in this article are written in TypeScript.

Image compression:

We use the drawImage() method of Canvas to achieve image compression, and then use the getImageData() method to obtain the ImageData object.

export function compressImg (imgSrc: string, imgWidth: number = 8): Promise<ImageData> {
  return new Promise((resolve, reject) => {
    if (!imgSrc) {
      reject('imgSrc can not be empty!')
    }
    const canvas = document.createElement('canvas')
    const ctx = canvas.getContext('2d')
    const img = new Image()
    img.crossOrigin = 'Anonymous'
    img.onload = function () {
      canvas.width = imgWidth
      canvas.height = imgWidth
      // Draw the image onto a small canvas to shrink it to imgWidth × imgWidth
      ctx?.drawImage(img, 0, 0, imgWidth, imgWidth)
      const data = ctx?.getImageData(0, 0, imgWidth, imgWidth) as ImageData
      resolve(data)
    }
    img.src = imgSrc
  })
}

Some readers may ask why a canvas can be used to compress an image. Simply put, in order to draw a “large picture” onto a “small canvas”, adjacent pixels with similar colors are discarded, which effectively reduces the amount of information in the image and thus compresses it.

In the compressImg() function above, we load the image with new Image(), set a default width and height to compress it to the specified size, and finally obtain the compressed image's ImageData, which gives us access to every pixel of the image.

For details on ImageData, refer to the MDN documentation.
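
As a quick illustration, compressImg() returns a Promise, so it can be used like this (the image URL below is just a placeholder, not part of the original demo):

// Hypothetical usage: shrink an image to 8×8 and inspect its pixel data
compressImg('https://example.com/cat.jpg', 8).then(imageData => {
  console.log(imageData.width, imageData.height) // 8 8
  console.log(imageData.data.length) // 8 * 8 * 4 = 256 RGBA values
})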

Image grayscaling

In order to transform color images into grayscale images, we must first understand the concept of “grayscale”. Wikipedia describes grayscale images as follows:

In computing, a grayscale digital image is an image with only one sample color per pixel.

In most cases, a color pixel is made up of the brightness of three color channels (R, G, B) plus an alpha channel (A), which is why each pixel corresponds to an RGBA value. “Only one sample color per pixel”, however, means that the three primary color channels of that pixel have equal brightness, so we simply need to compute the average of its R, G and B values:

// Generate ImageData from the RGBA array
export function createImgData (dataDetail: number[]) {
  const canvas = document.createElement('canvas')
  const ctx = canvas.getContext('2d')
  const imgWidth = Math.sqrt(dataDetail.length / 4)
  const newImageData = ctx?.createImageData(imgWidth, imgWidth) as ImageData
  for (let i = 0; i < dataDetail.length; i += 4) {
    let R = dataDetail[i]
    let G = dataDetail[i + 1]
    let B = dataDetail[i + 2]
    let Alpha = dataDetail[i + 3]

    newImageData.data[i] = R
    newImageData.data[i + 1] = G
    newImageData.data[i + 2] = B
    newImageData.data[i + 3] = Alpha
  }
  return newImageData
}

export function createGrayscale (imgData: ImageData) {
  const newData: number[] = Array(imgData.data.length)
  newData.fill(0)
  imgData.data.forEach((_data, index) => {
    if ((index + 1) % 4 === 0) {
      const R = imgData.data[index - 3]
      const G = imgData.data[index - 2]
      const B = imgData.data[index - 1]

      const gray = ~~((R + G + B) / 3)
      newData[index - 3] = gray
      newData[index - 2] = gray
      newData[index - 1] = gray
      newData[index] = 255 // Set Alpha to 255
    }
  })
  return createImgData(newData)
}

ImageData.data is a Uint8ClampedArray, which can be thought of as an “RGBA array”: each number in the array is a value from 0 to 255, and every group of 4 numbers represents the RGBA value of one pixel. Since ImageData is a read-only object, we write a separate createImgData() method that uses context.createImageData() to create a new ImageData object.

Once we have the grayscale image, we can move on to fingerprint extraction.

Fingerprint extraction

In the “average hash algorithm”, a pixel in the grayscale image whose gray value is greater than or equal to the average is treated as 1, otherwise as 0. Stringing these bits together gives the fingerprint. Since we already have the grayscale image's ImageData object, extracting the fingerprint is easy:

export function getHashFingerprint (imgData: ImageData) {
  // Collect the gray value of each pixel (in the grayscale image R, G and B are equal,
  // so reading the channel just before each alpha value is enough)
  const grayList = imgData.data.reduce((pre: number[], cur, index) => {
    if ((index + 1) % 4 === 0) {
      pre.push(imgData.data[index - 1])
    }
    return pre
  }, [])
  const length = grayList.length
  const grayAverage = grayList.reduce((pre, next) => (pre + next), 0) / length
  return grayList.map(gray => (gray >= grayAverage ? 1 : 0)).join('')
}


Through the above series of steps, we can obtain an image's fingerprint via the “average hash algorithm” (the example here uses an 8×8 grayscale image).
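
Putting the pieces together, a minimal sketch of the whole average-hash pipeline might look like the following (the wrapper function and the image URL are my own illustration, not part of the original demo):

// A sketch of the full aHash pipeline, assuming the helpers defined above
async function getAHashFingerprint (imgSrc: string) {
  const compressed = await compressImg(imgSrc, 8) // 1. shrink to 8×8
  const grayscale = createGrayscale(compressed)   // 2. convert to grayscale
  return getHashFingerprint(grayscale)            // 3-5. 64-bit 0/1 fingerprint string
}

getAHashFingerprint('https://example.com/a.jpg').then(fp => console.log(fp))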

Perceptual hashing algorithm

For a detailed introduction to the “perceptual hashing algorithm”, please refer to this article: Visual Object Tracking Based on Perceptual Hashing Algorithm.

To put it simply, the algorithm uses the discrete cosine transform (DCT) to convert the image from the pixel domain to the frequency domain. The low-frequency components, which carry the effective information, are concentrated in the upper-left corner of the DCT coefficient matrix, and we can exploit this property to extract features.

The steps of the algorithm are as follows:

  • Reduce size: pHash also starts from a small image, but one larger than 8×8; 32×32 works best. The purpose is to simplify the DCT computation, not to reduce the frequencies.
  • Simplify colors: convert the image to grayscale to further simplify the computation.
  • Calculate the DCT: compute the image's DCT to obtain a 32×32 matrix of DCT coefficients.
  • Reduce the DCT: although the DCT result is a 32×32 matrix, we keep only the 8×8 matrix in its top-left corner, which represents the lowest frequencies in the picture.
  • Calculate the mean: compute the mean of the DCT coefficients, just as with the average hash.
  • Compute the hash value: this is the most important step. Based on the 8×8 DCT matrix, build a 64-bit string of 0s and 1s: entries greater than or equal to the DCT mean become “1”, and entries below it become “0”. Combined in order, these bits form a 64-bit integer, which is the fingerprint of the image.
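
For reference, the one-dimensional transform computed by the dct() helper below is (up to the scale factor used in the code) the standard DCT-II; this is my own restatement of what the cosine table in memoizeCosines() implements:

X_k = \mathrm{scale} \times \sum_{n=0}^{N-1} x_n \cos\left[ \frac{\pi}{N} \left( n + \frac{1}{2} \right) k \right], \quad k = 0, 1, \ldots, N - 1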

Going back to the code, first add a DCT method:

// Precompute the cosine table for an N-point DCT
function memoizeCosines (N: number, cosMap: any) {
  cosMap = cosMap || {}
  cosMap[N] = new Array(N * N)

  let PI_N = Math.PI / N

  for (let k = 0; k < N; k++) {
    for (let n = 0; n < N; n++) {
      cosMap[N][n + (k * N)] = Math.cos(PI_N * (n + 0.5) * k)
    }
  }
  return cosMap
}

// One-dimensional DCT: each output coefficient is the scaled sum of
// signal[n] * cos(π/N * (n + 0.5) * k)
function dct (signal: number[], scale: number = 2) {
  let L = signal.length
  let cosMap: any = null

  if (!cosMap || !cosMap[L]) {
    cosMap = memoizeCosines(L, cosMap)
  }
  let coefficients = signal.map(function () { return 0 })

  return coefficients.map(function (_, ix) {
    return scale * signal.reduce(function (prev, cur, index) {
      return prev + (cur * cosMap[L][index + (ix * L)])
    }, 0)
  })
}

Then add two matrix helpers: one to reshape the one-dimensional array produced by the dct() method into a two-dimensional array (a matrix), and one to extract the “upper-left corner” of that matrix.

// Reshape a one-dimensional array into a square matrix
function createMatrix (arr: number[]) {
  const length = arr.length
  const matrixWidth = Math.sqrt(length)
  const matrix = []
  for (let i = 0; i < matrixWidth; i++) {
    const _temp = arr.slice(i * matrixWidth, i * matrixWidth + matrixWidth)
    matrix.push(_temp)
  }
  return matrix
}

// Get the range × range "upper-left corner" of a matrix
function getMatrixRange (matrix: number[][], range: number = 1) {
  const rangeMatrix = []
  for (let i = 0; i < range; i++) {
    for (let j = 0; j < range; j++) {
      rangeMatrix.push(matrix[i][j])
    }
  }
  return rangeMatrix
}

By reusing the grayscale conversion function createGrayscale() written for the “average hash algorithm”, we can obtain the feature values of the “perceptual hash algorithm”:

export function getPHashFingerprint (imgData: ImageData) {
  const dctData = dct(imgData.data as any)
  const dctMatrix = createMatrix(dctData)
  // Keep only the top-left part of the DCT matrix (the lowest frequencies)
  const rangeMatrix = getMatrixRange(dctMatrix, dctMatrix.length / 8)
  const rangeAve = rangeMatrix.reduce((pre, cur) => pre + cur, 0) / rangeMatrix.length
  return rangeMatrix.map(val => (val >= rangeAve ? 1 : 0)).join('')
}
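
As with the average hash, a rough end-to-end sketch (the wrapper function and URL are my own illustration, using the helpers defined above) could look like this:

// A sketch of the pHash pipeline, assuming the helpers defined above
async function getPHash (imgSrc: string) {
  const compressed = await compressImg(imgSrc, 32) // 32×32, as recommended above
  const grayscale = createGrayscale(compressed)
  return getPHashFingerprint(grayscale)
}

getPHash('https://example.com/a.jpg').then(fp => console.log(fp))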

Color distribution method

Let's first recall Ruan Yifeng's description of the “color distribution method”.

Ruan simplified the 256 possible values of each color channel into 4 zones. Following this principle, when designing our own color distribution algorithm we can make the zone partitioning configurable; the only requirement is that 256 must be divisible by the number of zones. The algorithm is as follows:

// Divide the color values into zones; the default number of zones is 4,
// i.e. simplify the 256 possible values of each channel down to 4
export function simplifyColorData (imgData: ImageData, zoneAmount: number = 4) {
  const colorZoneDataList: number[] = []
  const zoneStep = 256 / zoneAmount
  const zoneBorder = [0] // Zone boundaries
  for (let i = 1; i <= zoneAmount; i++) {
    zoneBorder.push(zoneStep * i - 1)
  }
  imgData.data.forEach((data, index) => {
    if ((index + 1) % 4 !== 0) {
      // Map each R/G/B value to the index of the zone it falls into (alpha is skipped)
      for (let i = 0; i < zoneBorder.length; i++) {
        if (data > zoneBorder[i] && data <= zoneBorder[i + 1]) {
          data = i
        }
      }
    }
    colorZoneDataList.push(data)
  })
  return colorZoneDataList
}

After simplifying the color values, we can group them back into per-pixel [R, G, B] triplets:

export function seperateListToColorZone (simplifiedDataList: number[]) {
  const zonedList: string[] = []
  let tempZone: number[] = []
  simplifiedDataList.forEach((data, index) => {
    if ((index + 1) % 4 !== 0) {
      tempZone.push(data)
    } else {
      zonedList.push(JSON.stringify(tempZone))
      tempZone = []
    }
  })
  return zonedList
}

Finally, we just need to count how many times each group occurs:

export function getFingerprint (zonedList: string[], zoneAmount: number = 16) {
  const colorSeperateMap: {
    [key: string]: number
  } = {}
  for (let i = 0; i < zoneAmount; i++) {
    for (let j = 0; j < zoneAmount; j++) {
      for (let k = 0; k < zoneAmount; k++) {
        colorSeperateMap[JSON.stringify([i, j, k])] = 0
      }
    }
  }
  zonedList.forEach(zone => {
    colorSeperateMap[zone]++
  })
  return Object.values(colorSeperateMap)
}
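
Chained together, a rough sketch of the whole color-distribution pipeline (the wrapper function and zone count of 4 are my own illustration, using the helpers above) might be:

// A sketch of the color-distribution pipeline, assuming the helpers above
async function getColorFingerprint (imgSrc: string) {
  const imgData = await compressImg(imgSrc, 8)      // shrink the image first
  const simplified = simplifyColorData(imgData, 4)  // 256 values per channel -> 4 zones
  const zoned = seperateListToColorZone(simplified) // group into per-pixel [R, G, B]
  return getFingerprint(zoned, 4)                   // count each of the 4^3 = 64 combinations
}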

Content feature method

The “content feature method” converts an image to grayscale and then to a “binary image”, and forms a fingerprint from the pixel values (black or white) for comparison. The core of the algorithm is finding a suitable “threshold” for generating the binary image.

Unlike the RGB-averaging approach used in the “average hash algorithm”, here we generate the grayscale image with a weighted method. Why? This involves some concepts from colorimetry.

For details, please refer to this article “Grayscale to RGB Conversion”.

Taking the RGB mean is the simplest way to produce a grayscale image, but it ignores the differences between red, green and blue light and their effect on the overall image. If the RGB mean is used directly as the gray value, the resulting grayscale image tends to be too dark overall, which seriously interferes with the subsequent generation of the binary image.

So how can this be improved? The answer is to give the three RGB channels different weights. Since the human eye perceives green as much brighter than red, and red as brighter than blue, we deliberately lower the weight of red and raise the weight of green. A commonly used weight ratio is R:G:B = 0.299:0.587:0.114.

So we can get the grayscale processing function:

enum GrayscaleWeight {
  R = .299,
  G = .587,
  B = .114
}

function toGray (imgData: ImageData) {
  const grayData: number[] = []
  const data = imgData.data

  for (let i = 0; i < data.length; i += 4) {
    // Weighted grayscale: gray = 0.299 * R + 0.587 * G + 0.114 * B
    const gray = ~~(data[i] * GrayscaleWeight.R + data[i + 1] * GrayscaleWeight.G + data[i + 2] * GrayscaleWeight.B)
    data[i] = data[i + 1] = data[i + 2] = gray
    grayData.push(gray)
  }

  return grayData
}

The function above returns a grayData array in which each element represents the gray value of one pixel (since R, G and B are now equal, one value per pixel is enough). Next, “Otsu's method” is used to compute the threshold for the binary image. Ruan Yifeng's article already explains Otsu's method in detail, so it won't be expanded on here. I found a Java implementation of Otsu's method here and adapted it into a JS version:

// OTSU algorithm
// rewrite from http://www.labbookpages.co.uk/software/imgProc/otsuThreshold.html
export function OTSUAlgorithm (imgData: ImageData) {
  const grayData = toGray(imgData)
  let ptr = 0
  let histData = Array(256).fill(0)
  let total = grayData.length

  while (ptr < total) {
    let h = 0xFF & grayData[ptr++]
    histData[h]++
  }

  let sum = 0
  for (let i = 0; i < 256; i++) {
    sum += i * histData[i]
  }

  let wB = 0
  let wF = 0
  let sumB = 0
  let varMax = 0
  let threshold = 0

  for (let t = 0; t < 256; t++) {
    wB += histData[t]
    if (wB === 0) continue
    wF = total - wB
    if (wF === 0) break

    sumB += t * histData[t]

    let mB = sumB / wB          // mean gray value of the background class
    let mF = (sum - sumB) / wF  // mean gray value of the foreground class

    // Between-class variance: wB * wF * (mB - mF)^2; Otsu picks the threshold that maximizes it
    let varBetween = wB * wF * (mB - mF) ** 2

    if (varBetween > varMax) {
      varMax = varBetween
      threshold = t
    }
  }

  return threshold
}

The OTSUAlgorithm() function takes an ImageData object, obtains the list of gray values via the toGray() method from the previous step, computes the optimal threshold according to Otsu's method, and returns it. This threshold is then used to process the original image into a binary image:

export function binaryzation (imgData: ImageData, threshold: number) {
  const canvas = document.createElement('canvas')
  const ctx = canvas.getContext('2d')
  const imgWidth = Math.sqrt(imgData.data.length / 4)
  const newImageData = ctx?.createImageData(imgWidth, imgWidth) as ImageData
  for (let i = 0; i < imgData.data.length; i += 4) {
    let R = imgData.data[i]
    let G = imgData.data[i + 1]
    let B = imgData.data[i + 2]
    let Alpha = imgData.data[i + 3]
    let sum = (R + G + B) / 3

    newImageData.data[i] = sum > threshold ? 255 : 0
    newImageData.data[i + 1] = sum > threshold ? 255 : 0
    newImageData.data[i + 2] = sum > threshold ? 255 : 0
    newImageData.data[i + 3] = Alpha
  }
  return newImageData
}

If the image size is N×N, then thanks to the “black or white” nature of the binary image we obtain an N×N 0-1 matrix, which is the fingerprint.
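
A rough sketch of the full content-feature pipeline, under the same assumptions as the earlier sketches (the wrapper function and the 0/255 mapping are my own illustration), might be:

// A sketch of the content-feature pipeline, assuming the helpers above
async function getContentFingerprint (imgSrc: string) {
  const imgData = await compressImg(imgSrc, 8)        // shrink to N×N
  const threshold = OTSUAlgorithm(imgData)            // best threshold via Otsu's method
  const binaryData = binaryzation(imgData, threshold) // black-and-white image
  // Map each pixel to 1 (white) or 0 (black) to form the fingerprint
  const bits: number[] = []
  for (let i = 0; i < binaryData.data.length; i += 4) {
    bits.push(binaryData.data[i] === 255 ? 1 : 0)
  }
  return bits.join('')
}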

Feature matching algorithm

Now that we have different ways of obtaining different kinds of image fingerprints (features), how should they be compared? Here we introduce two comparison algorithms and then analyze which situations each is suited to.

Hamming distance

Here's an excerpt from Wikipedia's description of the Hamming distance:

In information theory, the Hamming distance between two strings of equal length is the number of different characters in the corresponding positions of the two strings. In other words, it is the number of characters that need to be replaced to transform one string into another.

Such as:

  • The Hamming distance between 1011101 and 1001001 is 2.

  • The Hamming distance between 2143896 and 2233796 is 3.

  • The Hamming distance between “Toned” and “Roses” is 3.

Now that we know what this means, we can write the method for calculating the Hamming distance:

export function hammingDistance (str1: string, str2: string) {
  let distance = 0
  const str1Arr = str1.split('')
  const str2Arr = str2.split('')
  str1Arr.forEach((letter, index) => {
    // Count the positions where the two strings differ
    if (letter !== str2Arr[index]) {
      distance++
    }
  })
  return distance
}

Use the hammingDistance() method to verify the Wikipedia example:

The results were verified as expected.

Knowing the Hamming distance, we can also compute the similarity of two strings of equal length (the smaller the Hamming distance, the greater the similarity):

Similarity = (string length - Hamming distance) / string length
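
Expressed in code, a small helper (my own addition, not part of the original demo) might look like this:

// Similarity derived from the Hamming distance of two equal-length fingerprints
export function hammingSimilarity (fp1: string, fp2: string) {
  return (fp1.length - hammingDistance(fp1, fp2)) / fp1.length
}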

Cosine similarity

The definition of cosine similarity can be found in Wikipedia:

Cosine similarity measures the similarity between two vectors by measuring the cosine of the angle between them. The cosine of a 0° angle is 1, while the cosine of any other angle is less than 1, with a minimum of -1. The cosine of the angle between two vectors therefore indicates whether they point in roughly the same direction: when two vectors have the same direction, the cosine similarity is 1; when the angle between them is 90°, it is 0; when they point in opposite directions, it is -1. The result depends only on the directions of the vectors, not on their lengths. Cosine similarity is usually used in positive space, so the values produced lie between 0 and 1.

Note that these upper and lower bounds hold for vector spaces of any dimension, and cosine similarity is most often applied in high-dimensional positive spaces.

Cosine similarity measures the angle between two vectors, giving an intuitive indication of whether they point in similar directions, which is very useful for comparing two N×N 0-1 matrices. From the cosine similarity formula, we can write a JS implementation:

export function cosineSimilarity (sampleFingerprint: number[], targetFingerprint: number[]) {
  // cos θ = (∑ Ai * Bi) / (√(∑ Ai²) * √(∑ Bi²)) = A·B / (|A| * |B|)
  const length = sampleFingerprint.length
  let innerProduct = 0
  for (let i = 0; i < length; i++) {
    innerProduct += sampleFingerprint[i] * targetFingerprint[i]
  }
  let vecA = 0
  let vecB = 0
  for (let i = 0; i < length; i++) {
    vecA += sampleFingerprint[i] ** 2
    vecB += targetFingerprint[i] ** 2
  }
  const outerProduct = Math.sqrt(vecA) * Math.sqrt(vecB)
  return innerProduct / outerProduct
}

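As a quick usage sketch (the two small vectors are made-up examples; in practice they would be fingerprints such as the output of getFingerprint() above):

// Hypothetical usage: compare two small fingerprint vectors
console.log(cosineSimilarity([1, 0, 1, 1], [1, 0, 1, 0])) // ≈ 0.82, closer to 1 means more similar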

Applicable scenarios of two comparison algorithms

Having covered the two feature comparison algorithms, “Hamming distance” and “cosine similarity”, let's see which feature extraction algorithms each of them is suited to.

Let's start with the “color distribution method”. In it, we divide an image's colors into zones and obtain features by counting how many pixels fall into each zone combination, so the feature values are counts, and the resulting matrix is not a 0-1 matrix.

Obviously, the Hamming distance is not applicable for comparing two “color distribution” features; their similarity can only be computed with cosine similarity.

Next, look at the “average hash algorithm” and the “content feature method”. In terms of results, both feature extraction algorithms yield an N×N 0-1 matrix, in which the element values are unrelated to quantity and are only 0 or 1. So both Hamming distance and cosine similarity are suitable for computing their similarity.
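
A tiny sketch of comparing two such 0-1 fingerprints with both metrics (the fingerprint strings here are made up for illustration; real ones would be 64 bits or longer):

const a = '10110011'
const b = '10100111'
console.log(hammingDistance(a, b)) // 2
console.log(cosineSimilarity(a.split('').map(Number), b.split('').map(Number))) // 0.8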

Calculation accuracy

Having seen how to extract image features and how to compare them, the most important question is how accurate the resulting similarity calculations are.

The similarity discussed in this article is computed purely by objective algorithms, whereas judging whether two pictures “look alike” is quite a subjective matter. So I wrote a simple online service that computes the similarity of two images with the different algorithms and precisions independently:

img-compare.netlify.com/

After many comparisons of different materials, I came to the following very subjective conclusions.

  • For two pictures with rich colors and more details, the result of “color distribution method” is the most intuitive.
  • For two images with similar content but different colors, both the content feature method and the average/perceptual hash algorithm can give intuitive results.
  • For the “color distribution method”, the number of intervals has a great influence on the calculation results, so it is very important to select the appropriate intervals.

To sum up, the three feature extraction algorithms and the two feature comparison algorithms have their own advantages and disadvantages, and should be flexibly selected according to different situations in practical application.

Conclusion

This article sums up my own practice after reading Ruan Yifeng's two articles on the “Principles of Similar Image Search”. Since my understanding of color science, mathematics and other fields is only superficial, the article inevitably contains mistakes. If you find any incorrect statements, please point them out in the comments and I will correct them in time.