How does a computer display characters?

As we explained in our post “Rambling: How to Explain to Your Girlfriend Why Computers Only Know Zeros and Ones”, the computer world contains only 0s and 1s, so all data must be represented in binary. The 52 English letters (uppercase and lowercase), the Arabic numerals, and common symbols are all stored in a computer as binary.

So every character we see on a computer needs a way to be represented in binary.

This transformation works by mapping characters through a character encoding, as we discussed in “Ramble: How to Explain to Your Girlfriend what a Charles Shackle is”. To convert characters into binary, many character encoding standards have been developed, including Unicode and GBK.

With a character encoding, the computer can recognize the character we want to enter. Displaying it is more complicated, though. The general process is as follows:

When we type a character on the keyboard, the computer converts the character to binary through Unicode.

Next, the character map (Charmap) in the font file is queried with the resulting Unicode value, which converts that value into a glyph index.

Once you have the glyph index, you can load the corresponding glyph image.

The glyph image can then be graphically rendered and displayed on the monitor.
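The four steps above can be sketched in Python. This is only a toy model, not how an operating system actually renders text: `TOY_CMAP` and `TOY_GLYPHS` are hypothetical stand-ins for the character map and glyph data inside a font file, and the glyph index values are made up.

```python
# Toy model of the display pipeline. TOY_CMAP and TOY_GLYPHS are hypothetical
# stand-ins for tables inside a real font file; the index values are made up.
TOY_CMAP = {0x41: 36, 0x4E2D: 1200}  # code point -> glyph index
TOY_GLYPHS = {36: "<outline of A>", 1200: "<outline of 中>"}  # glyph index -> glyph image


def to_code_point(char):
    """Step 1: map the typed character to its Unicode code point."""
    return ord(char)


def lookup_glyph_index(code_point, cmap):
    """Step 2: query the character map; None means the font lacks the character."""
    return cmap.get(code_point)


def load_glyph(glyph_index, glyphs):
    """Step 3: load the glyph image that belongs to a glyph index."""
    return glyphs[glyph_index]


def display(char):
    """Step 4: real rendering is reduced to building a description string."""
    code_point = to_code_point(char)
    glyph_index = lookup_glyph_index(code_point, TOY_CMAP)
    if glyph_index is None:
        return f"U+{code_point:04X}: no glyph in this font"
    glyph = load_glyph(glyph_index, TOY_GLYPHS)
    return f"U+{code_point:04X} -> glyph {glyph_index} -> {glyph}"


print(format(to_code_point("A"), "08b"))  # the code point of "A" written in binary
print(display("A"))
print(display("中"))
print(display("Z"))  # not in the toy character map
```

The last call shows the failure case: the character has a code point, but the toy font has no entry for it, so there is nothing to draw.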

You may not know much about font indexing and graphics rendering. All you need to know is that for a character to appear on a computer, the following three conditions must hold:

1. The input method supports entering the character

2. Unicode includes the character, so it can be converted to binary

3. A font installed on your computer contains the character

Character sets supported by input methods

Since there are tens of thousands of Chinese characters, a computer keyboard cannot provide a key for each one. People therefore devised input codes (codes used to retrieve Chinese characters) so that a Chinese character can be entered with several keystrokes. The tool that converts these keystrokes into Chinese characters is the Chinese input method.

At present, most of the more common Chinese input methods on the market use GBK as the character set.


GBK contains a total of 21,886 Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols. However, there are far more than 20,000 Chinese characters, so many rare characters cannot be typed with these input methods, such as the character “biang” in “Biang biang noodles”.
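Whether a character falls inside GBK can be checked with a few lines of Python. This is a minimal sketch that relies on Python's built-in `gbk` codec. The helper name `fits_in_gbk` is made up for illustration, and the “biang” character is written by its Unicode code point (U+30EDD) so that the example does not depend on your font being able to show it.

```python
def fits_in_gbk(char):
    """Return True if Python's built-in GBK codec can encode the character."""
    try:
        char.encode("gbk")
        return True
    except UnicodeEncodeError:
        return False


common = "中"          # U+4E2D, an everyday character
biang = chr(0x30EDD)   # the "biang" character, given by code point

print(fits_in_gbk(common))
print(fits_in_gbk(biang))
```

An input method whose candidate list is limited to GBK has no way to offer the second character, however it is typed.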

In addition, some input methods, such as Zhengma and Cangjie, use a more complete character set (Unicode, for example) and can enter some rare characters.

Unicode

A character can be displayed on a computer only if it can be converted to binary. In other words, a character that is not included in Unicode cannot be displayed at all.

Therefore, many rare Chinese characters and some emoji still cannot be typed.

However, Unicode is constantly being updated. At the time of writing, the most recent update was Unicode 13.0, released on March 10, 2020, which added 5,930 characters for a total of 143,859.

Unicode 13.0 includes the “biang” character of “Biangbiang noodles” in the CJK Unified Ideographs Extension G block. Its code points are U+30EDD and U+30EDE (the traditional and simplified forms, respectively).
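To see what such a code point looks like once it is turned into bytes, here is a small Python sketch. The helper name `describe` is made up for illustration. Because U+30EDD lies above U+FFFF, it takes four bytes in UTF-8 and two 16-bit units (a surrogate pair) in UTF-16.

```python
def describe(char):
    """Show a character's code point and its byte sequences in two encodings."""
    code_point = ord(char)
    return {
        "code_point": f"U+{code_point:04X}",
        "above_U+FFFF": code_point > 0xFFFF,
        "utf8": char.encode("utf-8").hex(),
        "utf16": char.encode("utf-16-be").hex(),
    }


print(describe("A"))
print(describe("中"))
print(describe(chr(0x30EDD)))  # the "biang" character, given by code point
```

Software that assumes every character fits in a single 16-bit unit can mishandle characters like this one, which is one more reason newly added rare characters tend to expose compatibility problems.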

A note on CJK Unified Ideographs: CJK stands for Chinese, Japanese, and Korean. The goal is to take the characters used in Chinese, Japanese, Korean, Vietnamese, and Zhuang writing, identify ideographs that share the same origin and meaning and have the same or only slightly different shapes, and give them the same code in the ISO 10646 and Unicode standards.

But even though Unicode 13.0 has been released, character encoding support is built into the operating system, so the underlying operating system must be updated before the new characters are supported.
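The same dependence on a bundled Unicode version can be observed in a programming language runtime. As an analogy only (it says nothing about your operating system), the sketch below asks Python which Unicode version its `unicodedata` module ships with and whether that version knows U+30EDD. The helper name `is_assigned` is made up for illustration.

```python
import unicodedata


def is_assigned(char):
    """True if this runtime's Unicode database assigns the code point.

    Unassigned code points are reported with the general category "Cn".
    """
    return unicodedata.category(char) != "Cn"


print("Unicode database version:", unicodedata.unidata_version)
print("U+30EDD assigned here:", is_assigned(chr(0x30EDD)))
```

On a runtime whose database predates Unicode 13.0, the second line reports `False` even though the standard itself already contains the character.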

Fonts

If a rare character is already included in Unicode, then at display time its Unicode value is looked up in the font file and converted into a glyph index.

However, if the installed fonts do not contain the character, it cannot be displayed.
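The lookup across installed fonts can be pictured with the following sketch. Everything in it is hypothetical: the font names, the character maps, and the glyph index values are invented for illustration, and real systems use more elaborate fallback rules.

```python
# Hypothetical installed fonts: each maps code points to made-up glyph indices.
FONTS = [
    ("BasicLatinFont", {0x41: 1, 0x42: 2}),
    ("CommonCJKFont", {0x4E2D: 10, 0x6587: 11}),
]


def find_font(char, fonts):
    """Return the name of the first font whose character map covers char, else None."""
    code_point = ord(char)
    for name, cmap in fonts:
        if code_point in cmap:
            return name
    return None


def render(text, fonts):
    """List, for each character, which font would supply its glyph."""
    result = []
    for char in text:
        font = find_font(char, fonts)
        # With no covering font, systems typically draw a placeholder box instead.
        result.append((f"U+{ord(char):04X}", font or "missing glyph"))
    return result


print(render("A中" + chr(0x30EDD), FONTS))
```

The third character is valid Unicode, yet none of the toy fonts covers it, so it ends up as a missing glyph.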

In other words, even if the input method is compatible with Unicode 13.0 and the operating system has been upgraded to support the latest Unicode version, that does not mean newly included rare characters like “biang” can be displayed directly.

That also depends on whether a font installed in the operating system contains the character. Some commercial fonts already support many characters in the CJK extension blocks.

As operating systems and input methods move to newer versions of the Unicode character set, some fonts will likely begin to support the new characters.

The inconvenience caused by rare characters

Many parents today like to use rare characters in their children's names. Rare characters with auspicious meanings are especially popular, such as “di”.

According to news reports, a college student had the character “you page” in his name. The traditional form, “zuo”, could be typed, but the simplified form could not be typed with either the pinyin or the Wubi input method.

Strangely, his name was registered successfully when it was first entered into the public security system. Later in life and at university, however, he ran into many obstacles: he could not complete real-name authentication on Alipay, buy tickets under his real name, or even register for the college entrance examination without trouble.

Later, staff of the public security system explained that the population information database used by public security has the most complete set of special characters: it has a rare-character database and even covers many ethnic minority scripts. However, education departments, banks, airlines, real estate agencies, and other organizations do not use the same character database, so many rare characters cannot be shared between systems, which causes a series of problems.
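A rough simulation shows how such a mismatch damages a name. The sketch assumes one system stores text as UTF-8 and another is limited to GBK. Both the helper `store_name` and the sample name are hypothetical, and real systems may fail in other ways, for example by rejecting the record outright.

```python
def store_name(name, encoding):
    """Simulate a system that can only keep characters its encoding covers.

    Characters the encoding cannot represent come back as "?", which is what
    Python's errors="replace" handler substitutes when encoding.
    """
    return name.encode(encoding, errors="replace").decode(encoding)


name = "张" + chr(0x30EDD)  # hypothetical name ending in a rare character

print(store_name(name, "utf-8") == name)  # the name survives unchanged
print(store_name(name, "gbk"))            # the rare character is lost
```

Once the name has been saved in the second system, it no longer matches the record in the first, which is the kind of mismatch that makes real-name checks fail.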

A rare character that can be typed or displayed in one system may not display in another. So think carefully before using rare characters!