Before ES6, detecting a file's encoding was left to the back end.

Even now, the most reliable approach is to let the back end do it, since it can read the BOM header.

Q: What is a BOM header? A: Software such as Notepad on Windows inserts three invisible bytes (0xEF 0xBB 0xBF) at the beginning of a UTF-8 encoded file when saving it. Editors like Notepad use this hidden byte sequence to tell that the file is encoded in UTF-8. For GBK, ANSI, and UTF-8 files, the first character will be one of 0-9, a-z, A-Z, -, /.
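As a rough illustration of the BOM check described above, the sketch below inspects the first bytes of a buffer. The helper name `detectBom` is hypothetical and not part of the original method, and the two UTF-16 signatures (0xFF 0xFE and 0xFE 0xFF) are included as an assumption about what a caller might also want to recognize.

```javascript
// Hypothetical helper: report which BOM, if any, starts the byte sequence.
// UTF-32 signatures are deliberately ignored in this sketch.
function detectBom(bytes) {
    if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
        return 'utf-8';
    }
    if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
        return 'utf-16le';
    }
    if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
        return 'utf-16be';
    }
    return null; // no BOM found; the encoding has to be guessed some other way
}
```

A file saved without a BOM returns `null` here, which is exactly the situation the rest of this article deals with.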

However, there is one case the front end can handle on its own: when we only need to tell whether a TXT file is encoded as UTF-8 or UTF-16, and the first character of the UTF-8 file is an ASCII character, because ASCII characters occupy a single byte in UTF-8. (The relationship between byte, character and encoding is here)

In that case, you can use the method described in this article, although it is better to validate on both the front end and the back end.

Let’s start with some concepts:

Next, look at a table of byte lengths.

(Image from Hongda: Characters, bytes and encodings)

According to the table, an English character takes 1 byte in UTF-8 but more than 1 byte in UTF-16. In other words, we can read the first byte through a Uint8Array and check whether it is an ASCII character, and from that decide whether the file is UTF-8 encoded. This is also the limitation of the method: some characters take more than one byte in UTF-8, so when the file starts with such a character, the first byte alone cannot tell us that the file is UTF-8.

After the file is read in binary form, the result is wrapped in a Uint8Array (one of the TypedArray types). Each element is then a byte value, which for these characters is the ASCII code, so it can be checked against an ASCII table.
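To make the byte counts concrete, here is a small sketch that encodes the same character both ways. `encodeUtf16le` is a hypothetical helper written for this example (the built-in `TextEncoder` only produces UTF-8). Note one assumption it exposes: a UTF-16 LE file without a BOM also starts with 0x41 for the letter A, so the first-byte check appears to rely on the UTF-16 file beginning with a BOM, as files saved by Notepad do.

```javascript
// Hypothetical helper: encode a string as UTF-16 little-endian bytes by hand.
function encodeUtf16le(text, withBom) {
    const bytes = withBom ? [0xff, 0xfe] : [];
    for (let i = 0; i < text.length; i++) {
        const unit = text.charCodeAt(i);    // one 16-bit code unit
        bytes.push(unit & 0xff, unit >> 8); // low byte first
    }
    return new Uint8Array(bytes);
}

const utf8 = new TextEncoder().encode('A');         // 1 byte:  0x41
const utf16 = encodeUtf16le('A', false);            // 2 bytes: 0x41 0x00
const utf16Bom = encodeUtf16le('A', true);          // 4 bytes: 0xff 0xfe 0x41 0x00
const utf8Chinese = new TextEncoder().encode('中'); // 3 bytes: 0xe4 0xb8 0xad

console.log(utf8.length, utf16.length, utf16Bom[0], utf8Chinese[0]);
// prints: 1 2 255 228
```

The last value shows the limitation in practice: a UTF-8 file that begins with a Chinese character starts with 0xE4, which is outside the ASCII range.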
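The conversion step can be tried without a real file. This is a minimal sketch in which a hand-built ArrayBuffer stands in for the result that `readAsArrayBuffer` would deliver; the variable names are chosen for this example only.

```javascript
// Stand-in for the ArrayBuffer a FileReader would produce: the bytes of "Hello".
const buffer = new Uint8Array([72, 101, 108, 108, 111]).buffer;

// View the raw buffer as unsigned 8-bit integers, one element per byte.
const view = new Uint8Array(buffer);

const firstByte = view[0];                         // 72
const firstChar = String.fromCharCode(firstByte);  // 'H', looked up as an ASCII code
const isPrintableAscii = firstByte >= 33 && firstByte <= 126;

console.log(firstByte, firstChar, isPrintableAscii);
// prints: 72 H true
```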

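/**
 * @description Read a TXT file in binary format and determine whether it is UTF-8.
 * @param {File} file the TXT file to read
 * @return {Promise<ArrayBuffer>} the raw bytes of the file
 */
function readFile(file) {
    return new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = function (evt) {
            resolve(evt.target.result);
        };
        // Reject on a read failure so the returned promise cannot hang forever
        reader.onerror = function () {
            reject(reader.error);
        };
        reader.readAsArrayBuffer(file);
    });
}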

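export const isUtf8 = async function (file) {
    const res = await readFile(file);
    // Only the first byte of the file is inspected
    const firstCode = new Uint8Array(res)[0];
    // 33-126 is the printable ASCII range, excluding the space character
    return firstCode >= 33 && firstCode <= 126;
};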
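One possible way around the first-byte limitation, which is not part of the method above, is to ask the built-in `TextDecoder` to validate the whole buffer. The function name `looksLikeUtf8` is hypothetical. Treat this as a sketch under one caveat: bytes that happen to be valid UTF-8 do not prove the file was written as UTF-8, so a back-end check is still worthwhile.

```javascript
// Hypothetical alternative: validate every byte rather than only the first one.
function looksLikeUtf8(bytes) {
    try {
        // With fatal: true, decode() throws a TypeError on malformed UTF-8
        new TextDecoder('utf-8', { fatal: true }).decode(bytes);
        return true;
    } catch (err) {
        return false;
    }
}

// A UTF-8 file that starts with a multi-byte character is still accepted
console.log(looksLikeUtf8(new TextEncoder().encode('中文')));
// A UTF-16 LE file with a BOM is rejected, because 0xFF never appears in UTF-8
console.log(looksLikeUtf8(new Uint8Array([0xff, 0xfe, 0x41, 0x00])));
```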