The Qt Chinese garbled text problem

Article source: blog.csdn.net/brave_heart…

The following is dbZhang's explanation of what causes garbled Chinese text in Qt. I found it very good:

First of all, QString has no problem supporting Chinese. The trouble many people run into is not in QString itself, but in failing to assign the intended string to the QString correctly.

The question is very simple. When “我是中文” ("I am Chinese") is written like this, it is a traditional narrow string of type char, and all we need is some way to tell QString which encoding these four Chinese characters use. The trouble usually arises because many users have little idea which encoding they are currently using, so …

A simple Qt program

The small program below will probably look familiar. Quite a few Chinese users seem to have tried writing code like this:

#include <QtGui/QApplication>
#include <QtGui/QLabel>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    // Narrow string literal ("I am Chinese characters"); its bytes depend on the source file's encoding
    QString a= "我是汉字";
    QLabel label(a);
    label.show();
    return app.exec();
}

Writing, saving, compiling, and running all go smoothly, but here is the result:

Most users see	Other users see
ÎÒÊÇºº×Ö	æˆ‘æ˜¯æ±‰å­—

Unexpectedly, the window shows no Chinese at all, only characters nobody can read. So people start searching the web, posting on forums, or complaining.

Eventually someone tells them that one of the following two statements solves the problem:

QTextCodec::setCodecForCStrings(QTextCodec::codecForName("GB2312"));
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));

Try the two statements one at a time and one of them really does solve it (the first for most users, the second for the rest). So why does this happen?

When do the two kinds of garbled text appear?

Before continuing, let's list the cases in which each of the two kinds of garbled text appears.

We consider only the three most commonly used compilers (cl from Microsoft Visual Studio, g++ from MinGW, and g++ on Linux), with the source file saved as GBK, as UTF-8 without BOM, and as UTF-8 with BOM.

Source encoding	Compiler	Result	Common case
GBK	cl	1	*
GBK	mingw-g++	1	*
GBK	g++	1	
UTF-8 (without BOM)	cl	2	
UTF-8 (without BOM)	mingw-g++	2	
UTF-8 (without BOM)	g++	2	*
UTF-8 (with BOM)	cl	1	
UTF-8 (with BOM)	mingw-g++	2	
UTF-8 (with BOM)	g++	Compilation fails	

Saving the source file in 3 different encodings and compiling each with 3 different compilers gives 9 combinations. Leaving out the one that does not compile, the two kinds of garbled text each account for half of the rest.

We can also see that the kind of garbled text has nothing to do with the operating system as such. But we usually use GBK on Windows and UTF-8 without BOM on Linux, so if we consider only the cases marked *, we could also say that the two kinds of garbled text are related to the system.

Why is the QString garbled?

Is it really QString that garbles the text? We should ask ourselves whether we are blaming the wrong party.

Before continuing, let's make a few concepts clear:

Concept 0:

“我是汉字” is a C string literal, that is, a narrow string of type char. The example above could equally be written as

const char * str = "我是汉字";
QString a= str;

or

char str[] = "我是汉字";
QString a= str;

and so on.

Concept 1:

A source file is saved in some encoding, but a plain text file does not record which encoding it uses.

This is the root of the problem. As a test, save the source file above as GBK and open it in a hex editor. Between the quotation marks you will see these 8 bytes: ce d2 ca c7 ba ba d7 d6.

Now copy this file to a Traditional Chinese Windows system and open it with Notepad. What does it look like?

...
QString a= "扂岆犖趼";
QLabel label(a);
label.show();
...

And if you then move it to a European or American Windows system and open it with Notepad?

...
QString a= "ÎÒÊÇºº×Ö";
QLabel label(a);
label.show();
...

It is the same file, without any modification, yet the 8 bytes ce d2 ca c7 ba ba d7 d6 look like completely different text to mainland users of GBK, to users of BIG5 in Hong Kong, Macao and Taiwan, and to Europeans using Latin-1.
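The hex-editor view can also be reproduced in code. The sketch below is plain C++ without Qt, and toHex is a hypothetical helper name. It prints the bytes of a narrow string exactly as they are stored, whatever encoding a text editor might guess for them.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Render every byte of a narrow string as two lowercase hex digits,
// separated by spaces, similar to what a hex editor shows.
std::string toHex(const std::string &bytes)
{
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (i != 0)
            out += ' ';
        out += buf;
    }
    return out;
}
```

Called on the string literal from the example program, this should give the 8 bytes listed above when the source is saved as GBK, and 12 different bytes when it is saved as UTF-8.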

Concept 2:

As we all know, 'A' is equivalent to '\x41'.

When the source file is saved as GBK,

const char * str = "我是汉字";

is equivalent to

const char * str = "\xce\xd2\xca\xc7\xba\xba\xd7\xd6";

When the source file is saved as UTF-8, it is equivalent to

const char * str = "\xe6\x88\x91\xe6\x98\xaf\xe6\xb1\x89\xe5\xad\x97";

Note: this is not entirely accurate. For example, when the file is saved as UTF-8 with BOM and compiled with cl, the characters in the source are UTF-8 encoded, but the program stores the corresponding GBK bytes.

Concept 3:

QString uses Unicode internally.

Because QString uses Unicode internally, it can hold the GBK characters “我是汉字”, the BIG5 characters “扂岆犖趼”, and the Latin-1 characters “ÎÒÊÇºº×Ö” alike.

The real question is which rule should be used to convert the 8 bytes "\xce\xd2\xca\xc7\xba\xba\xd7\xd6" in the source code into Unicode and store them in the QString: GBK, BIG5, Latin-1, or something else?

If you do not tell it, it defaults to Latin-1, so the Unicode code points of the 8 characters “ÎÒÊÇºº×Ö” are stored in the QString. In the end, 8 Latin characters appear where you expected to see 4 Chinese characters, and the so-called garbled text appears.
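To see why a Latin-1 default produces exactly one Latin character per byte, the conversion can be sketched without Qt. In Latin-1 every byte value is its own Unicode code point. The hypothetical helper latin1ToUtf8 below (not a Qt function) makes that mapping explicit and re-encodes the result as UTF-8 so it can be printed on a UTF-8 terminal.

```cpp
#include <string>

// Interpret each input byte as a Latin-1 character (code point == byte value)
// and return the text encoded as UTF-8.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (char ch : bytes) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            // ASCII range: one byte, unchanged.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF: two UTF-8 bytes, 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Feeding it the 8 GBK bytes yields 8 separate Latin characters, which is the first kind of garbled text. No byte is lost or damaged. The bytes are simply read under the wrong rule.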

How QString works

const char * str = "我是汉字";
QString a= str;

When you convert a narrow char * string into a Unicode QString, you need to tell QString which encoding your char * string uses: GBK, BIG5, or Latin-1?

Ideally, you pass the char * to QString and tell it the encoding at the same time.

The following QString member functions, for example, know which encoding to use when processing a C string:

QString QString::fromAscii ( const char * str, int size = -1 )
QString QString::fromLatin1 ( const char * str, int size = -1 ) 
QString QString::fromLocal8Bit ( const char * str, int size = -1 )
QString QString::fromUtf8 ( const char * str, int size = -1 )

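These few functions cover only a few encodings. On Simplified Chinese Windows, for example, Local8Bit means GBK, but what if the char string is BIG5 or Latin-2?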

Then bring in the powerful QTextCodec. A QTextCodec certainly knows which encoding it is responsible for, so when you hand it a char string it can convert it to Unicode correctly.

QString QTextCodec::toUnicode ( const char * chars ) const
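What "knowing the encoding" buys can be illustrated with a small Qt-free sketch of a UTF-8 decoder. This is not how Qt implements its codecs, and decodeUtf8 is a name made up for this example. It only shows that the 12 UTF-8 bytes from Concept 2 become 4 code points once the right rule is applied, while the GBK bytes are rejected under the same rule.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into Unicode code points.
// Handles 1- to 4-byte sequences and throws on truncated or malformed input.
// Sketch only: overlong forms and surrogate code points are not rejected.
std::u32string decodeUtf8(const std::string &bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out += cp;
        i += extra + 1;
    }
    return out;
}
```

With the 12 UTF-8 bytes as input, the result is the four code points U+6211, U+662F, U+6C49, U+5B57, that is, 我是汉字.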

But calling QTextCodec explicitly every time is too cumbersome. What if I simply want to write

QString a= str;

or

QString a(str);

and nothing more?

Written like this, there is no way to tell QString the encoding of str at the same time, so it has to be done some other way. That is what the statements mentioned at the beginning are for:

QTextCodec::setCodecForCStrings(QTextCodec::codecForName("GBK"));
QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));

These set the encoding that QString uses by default for C strings. As for which one to use: generally, if the source file is GBK, use GBK, and if the source file is UTF-8, use UTF-8. There is one exception: if the file is saved as UTF-8 with BOM and compiled with Microsoft's cl compiler, it is still GBK.
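The rule and its exception can be written down as a small lookup. This is only a sketch of the table given earlier, in plain C++. The enum and function names are invented here, and the cl row assumes a Simplified Chinese Windows system, where the local code page is GBK.

```cpp
#include <string>

enum class SourceEncoding { Gbk, Utf8NoBom, Utf8WithBom };
enum class Compiler { MsvcCl, MingwGpp, LinuxGpp };

// Name of the codec matching the bytes that end up in the compiled program.
// Rule from the text: use the source file's encoding, except that cl stores
// GBK bytes for a UTF-8-with-BOM source.
// Returns an empty string for the combination listed as failing to compile.
std::string codecNameForCStrings(SourceEncoding enc, Compiler cc)
{
    if (enc == SourceEncoding::Utf8WithBom && cc == Compiler::LinuxGpp)
        return "";       // the table reports a compilation failure here
    if (enc == SourceEncoding::Gbk)
        return "GBK";
    if (enc == SourceEncoding::Utf8WithBom && cc == Compiler::MsvcCl)
        return "GBK";    // the one exception to "source encoding wins"
    return "UTF-8";
}
```

A non-empty result is the string one would pass to QTextCodec::codecForName before calling QTextCodec::setCodecForCStrings.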

In summary, the main reasons for garbled text are as follows:

QString uses Unicode internally, so it can hold the GBK characters “我是汉字”, the BIG5 characters “扂岆犖趼”, and the Latin-1 characters “ÎÒÊÇºº×Ö” alike.

When you convert a narrow char * string into a Unicode QString, you need to tell QString which encoding your char * string uses: GBK, BIG5, or Latin-1?

If you do not tell it, it defaults to Latin-1, so the Unicode code points of the 8 characters “ÎÒÊÇºº×Ö” are stored in the QString. In the end, 8 Latin characters appear where you expected to see 4 Chinese characters, and the so-called garbled text appears.

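Many of the suggested fixes simply set the codecs directly in main.cpp:

QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForTr(codec);        // codec for tr() arguments
QTextCodec::setCodecForLocale(codec);    // codec for local 8-bit conversions
QTextCodec::setCodecForCStrings(codec);  // codec for const char * to QString conversions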

In fact, this can cause problems in some cases. The program may read a system path that contains Chinese characters, or launch an external program located under such a path, and things go wrong if the system encoding is GB2312.

The Chinese path is converted to and from the QString as UTF-8, while the system decodes the path as GB2312, so the external program under the Chinese path cannot be launched.

The problem above can be solved as follows:

QTextCodec *codec = QTextCodec::codecForName("UTF-8");
QTextCodec::setCodecForTr(codec);
// Keep the system's own encoding for local 8-bit and C string conversions
QTextCodec::setCodecForLocale(QTextCodec::codecForLocale());
QTextCodec::setCodecForCStrings(QTextCodec::codecForLocale());

This way, strings exchanged with the outside world are all encoded and decoded using the local encoding.
