When the Sheikah Script Meets OCR

Zelda players will be familiar with it: the Sheikah script is a fictional writing system from "The Legend of Zelda: Breath of the Wild", and traces of it can be found on Sheikah buildings throughout the game. I previously built a simple Sheikah translation tool, but the keyword-parsing part of that implementation was not elegant: it used an invisible watermark to hide key information in the exported picture, and that hidden information was easily lost when the image was compressed, causing parsing to fail. If you are interested, check out the previous article: Touch a Zelda Sheikah Word Converter.

Afterwards I studied how OCR is actually implemented and hand-rolled a simple Sheikah OCR parser. This post walks through that simple implementation; my skills are limited, so pointers are welcome ~

Optical Character Recognition (OCR) refers to the process of analyzing and recognizing image files of text data and obtaining text and layout information.

The tool address is here:

  • Tool demo address: kinglisky.github.io/zelda-words
  • nlush.com/zelda-words…
  • Repository address: github.com/kinglisky/z…

Fictional scripts are usually based on real ones: the Sheikah symbols map to the 26 English letters, the digits 0-9 and four special symbols (40 characters in total). The rules are simple, as shown below:

Our exported Sheikah image looks like this:

Let's get started ~

Image binarization

The color and text layout of a picture exported from the Sheikah tool are not fixed, so we first need to normalize the picture. Since we only care about the text content, we start by removing the interference of color and convert every picture to plain black and white.

This process is called binarization. A binarized image is free of color interference, its content features stand out, and it can also easily be serialized into an image fingerprint.

Binarization is the simplest method of image segmentation. It converts a grayscale image into a binary image: every pixel whose gray value is above a chosen threshold is set to the maximum gray value, and every pixel below it is set to the minimum gray value.

The main process of image binarization is as follows:

  • Image grayscale processing
  • Calculate the binarization threshold of the grayscale image
  • Image binarization

Image grayscale processing

Taking the picture above as an example, grayscaling is fairly simple: the gray value is obtained by weighting the RGB channels as R * 0.299 + G * 0.587 + B * 0.114. Since the R, G and B values of a gray pixel are identical, we only need to keep one channel's value for the next step.

const canvasToGray = (canvas) => {
    const ctx = canvas.getContext('2d');
    const data = ctx.getImageData(0, 0, canvas.width, canvas.height);
    const calculateGray = (r, g, b) => parseInt(r * 0.299 + g * 0.587 + b * 0.114);
    const grayData = [];
    for (let x = 0; x < data.width; x++) {
        for (let y = 0; y < data.height; y++) {
            const idx = (x + y * data.width) * 4;
            const r = data.data[idx + 0];
            const g = data.data[idx + 1];
            const b = data.data[idx + 2];
            const gray = calculateGray(r, g, b);
            grayData.push(gray);
        }
    }
    return grayData;
};

The image after gray processing is as follows:

Binarization threshold

Threshold calculation is the key step in image binarization, and there are many algorithms for it. Here we first try the simplest one, the aHash (average hash) approach: sum the gray values of all pixels, divide by the number of pixels, and use the mean as the binarization threshold. Straight to the code:

const average = (grayData) => {
    let sum = 0;
    for (let i = 0; i < grayData.length; i += 1) {
        sum += grayData[i];
    }
    return sum / grayData.length;
};

Other algorithms for calculating thresholds include:

  • Perceptual hash (pHash)
  • Otsu algorithm

Interested readers can look into these. OTSU generally gives a better binarization result, so we will use OTSU to calculate the image threshold. Here is an OTSU implementation:

const otsu = (grayData) => {
    let ptr = 0;
    // Histogram: number of pixels for each gray value from 0 to 255, initialized to 0
    let histData = Array(256).fill(0);
    let total = grayData.length;

    while (ptr < total) {
        let h = grayData[ptr++];
        histData[h]++;
    }
    // Total sum of (gray value x count)
    let sum = 0;
    for (let i = 0; i < 256; i++) {
        sum += i * histData[i];
    }
    // Number of background pixels (below the threshold)
    let wB = 0;
    // Number of foreground pixels (above the threshold)
    let wF = 0;
    // Background sum of (gray value x count)
    let sumB = 0;
    // Maximum between-class variance found so far
    let varMax = 0;
    // Threshold
    let threshold = 0;

    for (let t = 0; t < 256; t++) {
        // Accumulate the number of background pixels (below the threshold)
        wB += histData[t];
        if (wB === 0) continue;
        // Number of foreground pixels (above the threshold)
        wF = total - wB;
        if (wF === 0) break;
        // Accumulate the background sum of (gray value x count)
        sumB += t * histData[t];

        // Mean gray value of the background (below the threshold)
        let mB = sumB / wB;
        // Mean gray value of the foreground (above the threshold)
        let mF = (sum - sumB) / wF;
        // Between-class variance
        let varBetween = wB * wF * (mB - mF) ** 2;

        if (varBetween > varMax) {
            varMax = varBetween;
            threshold = t;
        }
    }
    return threshold;
};

Image binarization

Once we have the threshold, binarization itself is straightforward, but there is one thing to watch out for: because the text color and background color of the generated images are not fixed, the background of an image may be either above or below the threshold, so we cannot output all pictures in a single uniform way. We need to fix the binarized output: here the background is always set to black (value 0) and the text to white (value 255).

Since the images we generate are relatively simple, the background color can be determined by reading the RGB value of the first pixel, and the code is also very simple:

const binaryzationOutput = (originCanvas, threshold) => {
    const ctx = originCanvas.getContext('2d');
    const imageData = ctx.getImageData(0, 0, originCanvas.width, originCanvas.height);
    const { width, height, data } = imageData;
    // The first pixel is taken as the background color
    const head = (data[0] + data[1] + data[2]) / 3 | 0;
    // If the background color is greater than the threshold, the background and text color values need to be swapped
    const color = head > threshold
        ? { foreground: 0, background: 255 }
        : { foreground: 255, background: 0 };
    for (let x = 0; x < width; x++) {
        for (let y = 0; y < height; y++) {
            const idx = (x + y * width) * 4;
            const avg = (data[idx] + data[idx + 1] + data[idx + 2]) / 3 | 0;
            const v = avg > threshold ? color.foreground : color.background;
            data[idx] = v;
            data[idx + 1] = v;
            data[idx + 2] = v;
            data[idx + 3] = 255;
        }
    }
    ctx.putImageData(imageData, 0, 0);
    return originCanvas.toDataURL();
}

Another point to note is that the binarization here is applied to the original image, not to the grayscaled image from the previous step.

The complete code can be found here; the binarized image is shown below:

Text cutting

After the binarization above, every picture has been reduced to plain black and white. The operation itself is simple, but production-grade OCR usually involves much more complex image preprocessing, such as projection correction, rotation correction, cropping, denoising and sharpening. All of this preprocessing exists to produce a clean image containing only the text, because it greatly affects the quality of the next step: text cutting.

As the name suggests, we need a way to extract each Sheikah character from the image. Below is a simple cutting algorithm: projection-based cutting.

The basic idea is:

  • Scan each row of pixels from top to bottom to cut out the lines that contain text
  • Scan each column of pixel values from left to right to cut out individual characters

Cutting rows

The image is 700 x 600. Scan the pixels of each row from top to bottom, recording black pixels as 0 and white pixels as 1, and count the 1s in each row:

The x-axis corresponds to the image height (the row index), and the y-axis to the number of white pixels in each row. We can see directly that the regions where the y-value is 0 are blank, while the regions with non-zero values are the lines containing text, and the span of such a region is the line height.
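To make this concrete, here is a minimal sketch of turning the row projection into line ranges. The helper name findTextLines is mine, not from the original tool; the article's actual implementation uses the more general countPixel / countRanges functions shown later.

// Minimal sketch: given the white-pixel count of every row, collect the
// [start, end] row ranges with non-zero counts -- each range is one text line.
function findTextLines(rowCounts) {
    const lines = [];
    let start = -1;
    rowCounts.forEach((count, row) => {
        if (count > 0 && start === -1) {
            start = row; // a text line begins
        } else if (count === 0 && start !== -1) {
            lines.push({ start, end: row - 1 }); // the line ended on the previous row
            start = -1;
        }
    });
    if (start !== -1) {
        lines.push({ start, end: rowCounts.length - 1 });
    }
    return lines;
}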

Cutting characters (cutting columns)

Next we scan each column of pixels in a text line from left to right, again recording black as 0 and white as 1 and counting the 1s. Taking the first line of text as an example, the resulting chart is as follows:

Yes, it is the same as cutting rows: wherever the y-value is non-zero, there is something to cut out!

The problem is that if we simply split on the intervals where the y-value is non-zero, the last character gets split into a left part and a right part:

The reason is easy to see: the last character has a left-right structure with a gap in the middle, so it gets split in two.

Take a look at the following special characters. In general, when cutting text we need to account for characters with left-right or top-bottom structures.

Characters with a top-bottom structure:

Characters with a left-right structure:

So what do we do with these characters? It is easy to observe that Sheikah characters are square, so each character's width-to-height ratio should be 1:1. If we know a character's width or height, we can piece the split parts back into a whole character region. So how do we work out the character width or height?

The process is simple: scan the whole image once horizontally and once vertically to get the projection of the text in both directions, and take the largest span in either projection as the standard character size. Any text block smaller than the standard size is then merged with the next projected block, until the merged block reaches the standard character size.

Let's first implement the horizontal and vertical scans and the method for getting the largest text block:

// Horizontal and vertical scanning
function countPixel(imageData, isRow = false) {
    const { width, height, data } = imageData;
    const offsets = [0, 1, 2];
    // The background color (taken from the first pixel)
    const head = offsets.map((i) => data[i]);
    const pixel = [];
    if (isRow) {
        // Scan horizontally from top to bottom
        for (let i = 0; i < height; i++) {
            let count = 0;
            for (let j = 0; j < width; j++) {
                const index = (i * width + j) * 4;
                const isEqual = offsets.every(
                    (offset) => head[offset] === data[index + offset]
                );
                count += isEqual ? 0 : 1;
            }
            pixel.push(count);
        }
    } else {
        // Scan vertically from left to right
        for (let i = 0; i < width; i++) {
            let count = 0;
            for (let j = 0; j < height; j++) {
                const index = (j * width + i) * 4;
                const isEqual = offsets.every(
                    (offset) => head[offset] === data[index + offset]
                );
                count += isEqual ? 0 : 1;
            }
            pixel.push(count);
        }
    }
    return pixel;
}

// Split text and background
function countRanges(counts) {
    const groups = [];
    let foreground = 0;
    let background = 0;
    counts.forEach((count) = > {
        if (count) {
            foreground += 1;
            if (background) {
                groups.push({ background: true.value: background });
                background = 0; }}else {
            background += 1;
            if (foreground) {
                groups.push({ foreground: true.value: foreground });
                foreground = 0; }}});if (foreground) {
        groups.push({ foreground: true.value: foreground });
    }
    if (background) {
        groups.push({ background: true.value: background });
    }
    return groups;
}

// Get the maximum range of text content
function getMaxRange(data) {
    return data.reduce((max, it) => {
        if (it.foreground) {
            return Math.max(max, it.value);
        }
        return max;
    }, 0);
}

Calculate the text size in the image:

const imageData = {}; // placeholder: the ImageData of the binarized image
// Line by line scan
const rowsRanges = countRanges(countPixel(imageData, true));
// Scan column by column
const colsRanges = countRanges(countPixel(imageData, false));

// Calculate horizontal and vertical pixel distribution to get the size of the font content (square area of the font)
const fontRange = Math.max(
    getMaxRange(rowsRanges),
    getMaxRange(colsRanges)
);

Merge the character blocks that were split left-right or top-bottom:

// Merge the ranges of characters that were split by their structure
function mergeRanges(data, size) {
    const merge = [];
    // chunks holds areas smaller than the standard character size
    let chunks = [];
    data.forEach((item) => {
        if (chunks.length) {
            chunks.push(item);
            const value = chunks.reduce((sum, chunk) => sum + chunk.value, 0);
            // Once the accumulated area is larger than or close to the standard character size, merge it
            if (value >= size || Math.pow(value - size, 2) < 4) {
                merge.push({
                    foreground: true,
                    value,
                });
                chunks = [];
            }
            return;
        }
        // Areas smaller than the standard character size are pushed into chunks
        if (item.foreground && item.value < size) {
            chunks = [item];
            return;
        }
        merge.push(item);
    });
    return merge;
}

After this processing, the block information looks like the following. We only need to walk through it in order and cut out each foreground block, using its value as the block size.

[{"background": true."value": 115
    },
    {
        "foreground": true."value": 70
    },
    {
        "background": true."value": 30
    },
    {
        "foreground": true."value": 70
    },
    {
        "background": true."value": 30
    },
    {
        "foreground": true."value": 70
    },
    {
        "background": true."value": 30
    },
    {
        "foreground": true."value": 70
    },
    {
        "background": true."value": 115}]Copy the code

All that is left is to calculate the various offsets, cut the individual character blocks out of the canvas, and record their position information. The full implementation can be found here, so I won't go into every detail; the cut-out characters look like this:
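The cropping step itself is not listed in this article; as a rough sketch of the idea (cropBlock is a hypothetical helper name, and the repo's splitImage additionally records each block's position):

// Minimal sketch: copy a w x h region starting at (x, y) from the source canvas
// into a new canvas representing one character block.
function cropBlock(sourceCanvas, x, y, w, h) {
    const block = document.createElement('canvas');
    block.width = w;
    block.height = h;
    const ctx = block.getContext('2d');
    // drawImage(source, sx, sy, sw, sh, dx, dy, dw, dh)
    ctx.drawImage(sourceCanvas, x, y, w, h, 0, 0, w, h);
    return block;
}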

Similar image detection

After cutting out the characters, all that remains is to translate them. For the Sheikah script we know the mapping rule to English: each Sheikah symbol corresponds to one English character. We can generate a Sheikah image of the 40 English characters as the standard character image, so translating a Sheikah picture simply means comparing each cut-out image against the 40 known standard character images one by one; the most similar one is the target character.

The image above shows the Sheikah symbols corresponding to the characters abcdefghijklmnopqrstuvwxyz0123456789.-!?

How do we compare the similarity of two pictures? We have actually already done most of the work and only need the last piece. Since the images have already been binarized into black and white, we can output 0 for black pixels and 1 for white pixels to get a binary hash of the image. Comparing the similarity of two images then amounts to counting how many positions of their hashes differ, which is simply the Hamming distance between the two hashes: the smaller the Hamming distance, the more similar the images. We only need to slightly modify the binarization output code to get the image hash.

const binaryzationHash = (originCanvas, threshold) => {
    const ctx = originCanvas.getContext('2d');
    const imageData = ctx.getImageData(0, 0, originCanvas.width, originCanvas.height);
    const { width, height, data } = imageData;
    // The first pixel is taken as the background color
    const head = (data[0] + data[1] + data[2]) / 3 | 0;
    // If the background color is greater than the threshold, the background and text color values need to be swapped
    const color = head > threshold
        ? { foreground: 0, background: 255 }
        : { foreground: 255, background: 0 };
    const hash = [];
    for (let x = 0; x < width; x++) {
        for (let y = 0; y < height; y++) {
            const idx = (x + y * width) * 4;
            const avg = (data[idx] + data[idx + 1] + data[idx + 2]) / 3 | 0;
            const v = avg > threshold ? color.foreground : color.background;
            hash.push(v ? 1 : 0);
        }
    }
    return hash;
}

The Hamming distance comparison is also quite simple:

const hammingDistance = (hash1, hash2) => {
    let count = 0;
    hash1.forEach((it, index) => {
        count += it ^ hash2[index];
    });
    return count;
};
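For example, hammingDistance([1, 0, 1, 1], [1, 1, 1, 0]) returns 2, because the two hashes differ in the second and the last positions.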

This is the core code for comparing similar images. Because we cannot guarantee that a cut-out character block is exactly the same size as the standard image, we shrink both the cut-out image and the standard image to 8 x 8 before comparing (a possible resizeCanvas helper is sketched after the list below). The main steps of the similarity comparison are:

  • Scale both comparison images down to 8 x 8
  • Grayscale the images
  • Calculate the binarization threshold
  • Binarize the images and compute their hashes
  • Compare the Hamming distance of the two image hashes
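The resizeCanvas helper used in the code further below is not listed in the article; here is a minimal sketch, assuming it simply scales the source canvas down to a size x size square and returns the small canvas (the repo's version may differ in what exactly it returns):

// Minimal sketch of a resizeCanvas helper: scale a canvas down to size x size pixels.
function resizeCanvas(canvas, size) {
    const small = document.createElement('canvas');
    small.width = size;
    small.height = size;
    const ctx = small.getContext('2d');
    // Draw the full source canvas into the small target canvas
    ctx.drawImage(canvas, 0, 0, canvas.width, canvas.height, 0, 0, size, size);
    return small;
}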

If you are interested in similar image recognition, you can read this article: A simple implementation of similar image recognition.

Back to our Sheikah translation: there are only three steps left:

  1. Scale the 40 standard images down to 8 x 8 and generate their image hashes
  2. Scale the cut-out character images down to 8 x 8 and generate their image hashes
  3. Compare each cut-out character's hash with the 40 standard image hashes one by one; the one with the smallest difference (highest similarity) is the target character

The code implementation is also relatively simple:

async function createImageFingerprints(image) {
    const contents = splitImage(image);
    return contents.map(({ canvas, ...args }) => {
        // Scale down to 8 x 8 pixels
        const imageData = resizeCanvas(canvas, 8);
        const hash = binaryzationOutput(imageData);
        return {
            ...args,
            hash,
        };
    });
}

// Generate the standard character fingerprints
function createSymbols(fingerprints) {
    const WORDS = 'abcdefghijklmnopqrstuvwxyz0123456789.-!?';
    return fingerprints.map((it, index) => {
        return {
            name: WORDS[index],
            value: it.hash,
        };
    });
}

// Match the most similar character
function mapSymbols(fingerprints, symbols) {
    return fingerprints.map(({ hash, ...position }) => {
        const isEmpty = hash.every((v) => v === hash[0]);
        if (isEmpty) {
            return ' ';
        }
        let diff = Number.MAX_SAFE_INTEGER;
        let word = '*';
        symbols.forEach((symbol) => {
            const distance = hammingDistance(hash, symbol.value);
            // A smaller Hamming distance means higher similarity; keep the closest symbol
            if (distance < diff) {
                diff = distance;
                word = symbol.name;
            }
        });
        return {
            ...position,
            word,
            diff,
        };
    });
}

Putting it all together, it looks something like this:

/**
 * @param imageUrl the image to parse
 * @param mapUrl   the standard character image
 */
export async function readMetaInfo(imageUrl, mapUrl) {
    const mapImage = await loadImage(mapUrl);
    const mapImageFingerprints = await createImageFingerprints(mapImage, false);
    const symbols = createSymbols(mapImageFingerprints);
    const readImage = await loadImage(imageUrl);
    const readImageFingerprints = await createImageFingerprints(
        readImage,
        true
    );
    const results = mapSymbols(readImageFingerprints, symbols);
    console.log(results);
}
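The loadImage helper used above is also not listed in the article; a minimal sketch, assuming it resolves to an HTMLImageElement, might look like this:

// Minimal sketch of a loadImage helper: load a URL into an HTMLImageElement.
function loadImage(url) {
    return new Promise((resolve, reject) => {
        const image = new Image();
        image.crossOrigin = 'anonymous'; // assumption: needed for canvas access to cross-origin images
        image.onload = () => resolve(image);
        image.onerror = reject;
        image.src = url;
    });
}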

The complete code implementation can be found here. With that, a simple and crude version of the Sheikah OCR translator is complete.

Other

  • Touch a Zelda Sheikah word converter
  • Simple implementation of similar image recognition
  • Using JS to achieve a variety of image similarity algorithm
  • Text cutting algorithm – based on projection cutting
  • Text cutting algorithm – projection cutting optimization

A little something built while slacking off at work; getting a rough understanding of how OCR is implemented was a lot of fun. This last picture is for the special you, yeah ~