Foreword
Some time ago I ran into an interesting bug: TXT files opened for preview in the browser showed garbled text. Here I want to share the solution I ended up with.
Problem and analysis
Common encodings for TXT files include ANSI, UTF-8, and GB2312. On a Simplified Chinese system, ANSI is the same as GB2312, so we only need to handle GB2312 and UTF-8.
⚠️: The default encoding of TXT files in Windows Office 2003 is GB2312.
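To see why the encoding matters, here is a small sketch comparing the bytes of the same text in both encodings. It assumes a modern browser or Node.js 18+ built with full ICU, where the built-in `TextDecoder` accepts the label `'gb2312'` (the WHATWG Encoding Standard maps it to GBK). The GB2312 bytes for "中文" are hard-coded because `TextEncoder` only produces UTF-8.

```javascript
// The same text in both encodings (GB2312 bytes hard-coded; TextEncoder only emits UTF-8)
const utf8Bytes = new TextEncoder().encode('中文') // E4 B8 AD E6 96 87
const gb2312Bytes = Uint8Array.from([0xd6, 0xd0, 0xce, 0xc4]) // 中 = D6 D0, 文 = CE C4

// Format bytes as uppercase hex pairs for display
const toHex = bytes =>
  Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, '0')).join(' ')

console.log('UTF-8 :', toHex(utf8Bytes))
console.log('GB2312:', toHex(gb2312Bytes))

// Each byte sequence yields the original text only when decoded with its own encoding
const fromUtf8 = new TextDecoder('utf-8').decode(utf8Bytes)
const fromGb2312 = new TextDecoder('gb2312').decode(gb2312Bytes)
console.log(fromUtf8 === fromGb2312) // true

// ASCII is encoded identically, so English-only files decode the same either way
const ascii = new TextEncoder().encode('Hello, TXT')
console.log(new TextDecoder('gb2312').decode(ascii) === new TextDecoder('utf-8').decode(ascii)) // true
```

The last comparison is why an English-only file can safely go through either decoder.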
Scheme 1
The first scheme is the more interesting one. It uses FileReader.readAsText, which reads a file as plain text in an encoding you specify. However, if the specified encoding differs from the file's actual encoding, the extracted text is garbled. So the idea is:
- First, read the text as "UTF-8".
- Check whether the string contains Chinese characters. If it does not, treat it as garbled. (A pure-English file also takes this path, which is harmless because GB2312 and UTF-8 encode English characters identically.)
- If the previous step detected garbled text, read the file again as "GB2312". No further check is needed this time; the text comes out correctly.
- Create a new file (Blob) from the correctly decoded text.
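The steps above can be sketched as a pure function over raw bytes, which makes the decision logic easy to test outside the browser. `decodeTxtBytes` is a hypothetical helper, not part of the original code: it swaps `FileReader.readAsText` for the built-in `TextDecoder`, and its Chinese check assumes the CJK Unified Ideographs range U+4E00 to U+9FA5 plus the range used in the original regex.

```javascript
// Hypothetical helper mirroring Scheme 1's steps on raw bytes (Uint8Array).
// Chinese check (assumption): common CJK ideographs plus the original fullwidth range.
const CHINESE_RE = /[\u4E00-\u9FA5\uFE30-\uFFA0]/

function decodeTxtBytes(bytes) {
  // Step 1: decode as UTF-8 first (invalid bytes become U+FFFD, no exception)
  const asUtf8 = new TextDecoder('utf-8').decode(bytes)
  // Step 2: Chinese characters found, so the UTF-8 reading is trusted
  if (CHINESE_RE.test(asUtf8)) {
    return { encoding: 'utf-8', text: asUtf8 }
  }
  // Step 3: no Chinese found, so treat it as garbled and decode as GB2312 instead.
  // English-only files land here too; ASCII decodes identically in both encodings.
  return { encoding: 'gb2312', text: new TextDecoder('gb2312').decode(bytes) }
}

// Step 4 (in the browser): wrap the text in a new UTF-8 Blob, e.g.
//   new Blob([decodeTxtBytes(bytes).text], { type: 'text/plain;charset=utf-8' })

const utf8File = new TextEncoder().encode('你好，世界')
const gbFile = Uint8Array.from([0xd6, 0xd0, 0xce, 0xc4]) // "中文" in GB2312
console.log(decodeTxtBytes(utf8File).encoding) // utf-8
console.log(decodeTxtBytes(gbFile).encoding) // gb2312
```

In the browser the bytes can come from `new Uint8Array(await file.arrayBuffer())`, so the same function replaces both `FileReader` passes.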
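```javascript
function txt2utf8(file, callback) {
  const reader = new FileReader()
  reader.onload = e => {
    const txtString = e.target.result
    // Chinese-character check: the CJK Unified Ideographs range (U+4E00 to U+9FA5)
    // plus the fullwidth/compatibility range from the original pattern
    const patrn = /[\u4E00-\u9FA5\uFE30-\uFFA0]/
    // No Chinese found: treat the text as garbled and read the file again as GB2312.
    // Pure-English files also land here, which is harmless (ASCII is the same in both).
    if (!patrn.test(txtString)) {
      const readerGb2312 = new FileReader()
      readerGb2312.onload = e2 => {
        const newBlob = new Blob([e2.target.result], { type: 'text/plain;charset=utf-8' })
        callback && callback(newBlob)
      }
      readerGb2312.readAsText(file, 'gb2312')
    } else {
      // The UTF-8 reading is fine: wrap it in a new Blob as-is
      const newBlob = new Blob([txtString], { type: 'text/plain;charset=utf-8' })
      callback && callback(newBlob)
    }
  }
  // readAsText decodes the file into plain text using the specified encoding
  reader.readAsText(file, 'utf-8')
}
```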
⚠️: Do not swap the order of UTF-8 and GB2312. Reading a UTF-8 TXT file as GB2312 and then checking for Chinese characters is not reliable: the UTF-8 bytes largely decode into valid (but wrong) Chinese characters, so the check passes even though the text is garbled.
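The asymmetry behind this warning can be checked directly. The sketch below decodes each file with the wrong encoding and runs the Chinese check on the result. It makes the same assumptions as before (browser or Node.js 18+ with full ICU), and the widened regex is an assumption rather than the author's exact pattern.

```javascript
// Widened Chinese-character check (assumption: CJK ideographs + original fullwidth range)
const HAS_CHINESE = /[\u4E00-\u9FA5\uFE30-\uFFA0]/

const utf8Bytes = new TextEncoder().encode('中文测试')
const gbBytes = Uint8Array.from([0xd6, 0xd0, 0xce, 0xc4]) // "中文" in GB2312

// Wrong order: a UTF-8 file decoded as GB2312 is garbled, yet still "looks Chinese"
const utf8AsGb = new TextDecoder('gb2312').decode(utf8Bytes)
console.log(utf8AsGb, HAS_CHINESE.test(utf8AsGb)) // the check passes on wrong text

// Author's order: a GB2312 file decoded as UTF-8 yields only U+FFFD, so the check fails
const gbAsUtf8 = new TextDecoder('utf-8').decode(gbBytes)
console.log(gbAsUtf8.includes('\uFFFD'), HAS_CHINESE.test(gbAsUtf8)) // true false
```

Because invalid UTF-8 turns into replacement characters instead of plausible Chinese, a failed check after the UTF-8 pass is a meaningful signal, while a passed check after a GB2312 pass is not.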
[Figure: implementation flowchart of Scheme 1]
Scheme 2
This scheme is cruder: it brings two Node.js libraries (iconv-lite and jschardet) into the front end and runs them there. I only made a quick attempt at it, but it is far more extensible than Scheme 1, so anyone interested is welcome to try it.
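```javascript
import iconv from 'iconv-lite'
import jschardet from 'jschardet'

function txt2utf8(file, callback) {
  const reader = new FileReader()
  reader.onload = e => {
    const txtBinary = e.target.result
    // Detect the file's encoding from the raw binary string with jschardet
    const binaMg = jschardet.detect(txtBinary)
    // Convert the binary string back to bytes ('binary' = one byte per character).
    // Buffer is a Node.js global; in the browser it needs a polyfill (e.g. the buffer package).
    const buf = Buffer.from(txtBinary, 'binary')
    // Decode with the detected encoding; fall back to UTF-8 if detection fails
    // or returns an encoding iconv-lite does not support
    const encoding =
      binaMg.encoding && iconv.encodingExists(binaMg.encoding) ? binaMg.encoding : 'utf-8'
    const str = iconv.decode(buf, encoding)
    const newBlob = new Blob([str], { type: 'text/plain;charset=utf-8' })
    callback && callback(newBlob)
  }
  // Read the raw bytes as a binary string so jschardet can inspect them
  reader.readAsBinaryString(file)
}
```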
⚠️: Scheme 2 has only been tested locally, where it runs without problems. Its availability and compatibility in other environments are uncertain.
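Scheme 2's structure (detect the encoding, then decode with it) can also be sketched without the two libraries, as long as only the encodings supported by the built-in `TextDecoder` are needed. This is a hypothetical stand-in, not a replacement for jschardet: `detectEncoding` only sniffs a byte-order mark, tries a strict UTF-8 decode (`fatal: true` makes `TextDecoder` throw on invalid bytes), and otherwise assumes GB2312. `TextDecoder` then plays the role of `iconv.decode`.

```javascript
// Hypothetical, dependency-free stand-in for jschardet.detect. Far less capable:
// it only distinguishes BOM-marked Unicode, valid UTF-8, and "assume GB2312".
function detectEncoding(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return 'utf-8'
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return 'utf-16le'
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return 'utf-16be'
  try {
    // fatal: true throws a TypeError on any invalid UTF-8 sequence
    new TextDecoder('utf-8', { fatal: true }).decode(bytes)
    return 'utf-8'
  } catch {
    return 'gb2312'
  }
}

// TextDecoder takes the place of iconv.decode; it strips a matching BOM by default
function decodeTxt(bytes) {
  const encoding = detectEncoding(bytes)
  return { encoding, text: new TextDecoder(encoding).decode(bytes) }
}

// Usage with raw bytes (in the browser: new Uint8Array(await file.arrayBuffer()))
console.log(decodeTxt(new TextEncoder().encode('中文')).encoding) // utf-8
console.log(decodeTxt(Uint8Array.from([0xd6, 0xd0, 0xce, 0xc4])).encoding) // gb2312
```

New detection rules can be added to `detectEncoding` one at a time, which keeps some of Scheme 2's extensibility without shipping Node.js libraries to the front end.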