Collation: also called a collation rule or sort order.

Character set: also called a character encoding or coded character set.

A plain explanation of character sets and collations

A character set is a set of symbols and their encodings. A collation is a set of rules for comparing the characters in a character set.

The following imaginary character set illustrates both concepts:

Suppose we have an alphabet with four letters: A, B, a, b. We give each letter a number: A = 0, B = 1, a = 2, b = 3. The letter A is a symbol, the number 0 is the encoding for A, and the combination of all four letters and their encodings is a character set.
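The toy alphabet can be written down as a small lookup table. This is only a sketch: the names `TOY_CHARSET`, `encode`, and `decode` are made up for illustration and do not come from any database system.

```python
# Toy character set: each symbol is paired with its numeric encoding.
TOY_CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def encode(text):
    """Return the list of codes for a string written in the toy alphabet."""
    return [TOY_CHARSET[ch] for ch in text]


def decode(codes):
    """Map codes back to symbols using the inverse table."""
    inverse = {code: symbol for symbol, code in TOY_CHARSET.items()}
    return "".join(inverse[code] for code in codes)


print(encode("Ab"))    # [0, 3]
print(decode([1, 2]))  # Ba
```

The table alone says nothing about ordering or equality. That is the job of a collation.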

Suppose we want to compare two string values, A and B. The simplest way to do this is to look at the encodings: 0 for A and 1 for B. Because 0 is less than 1, we say A is less than B. What we have just done is apply a collation to our character set. The collation is a set of rules (only one rule in this example): “compare the encodings.” We call this simplest of all collations a binary collation.
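A binary collation over the toy alphabet might look like the sketch below. The function names are hypothetical; the only rule implemented is "compare the encodings".

```python
# Toy character set: uppercase A, B and lowercase a, b with codes 0..3.
TOY_CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def binary_key(text):
    """Sort key for the binary collation: just the sequence of codes."""
    return [TOY_CHARSET[ch] for ch in text]


def binary_compare(left, right):
    """Return -1, 0, or 1 by comparing the code sequences lexicographically."""
    left_key, right_key = binary_key(left), binary_key(right)
    return (left_key > right_key) - (left_key < right_key)


print(binary_compare("A", "B"))                      # -1, so A is less than B
print(sorted(["b", "A", "a", "B"], key=binary_key))  # ['A', 'B', 'a', 'b']
```

Note that under this collation every uppercase letter sorts before every lowercase one, simply because of how the codes were assigned.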

But what if we want to say that lowercase and uppercase letters are equivalent?

Then we need at least two rules: (1) treat the lowercase letters a and b as equivalent to A and B; (2) then compare the encodings. This is known as a case-insensitive collation. It is a little more complicated than a binary collation.
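The two rules can be sketched as a key function: fold lowercase to uppercase first, then compare codes. The `FOLD` table and `ci_key` are illustrative names, not part of any real collation implementation.

```python
# Toy character set: uppercase A, B and lowercase a, b with codes 0..3.
TOY_CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}

# Rule 1: lowercase letters are treated as their uppercase counterparts.
FOLD = {"a": "A", "b": "B"}


def ci_key(text):
    """Rule 2: after folding, compare the encodings."""
    return [TOY_CHARSET[FOLD.get(ch, ch)] for ch in text]


print(ci_key("ab") == ci_key("AB"))              # True: equal under this collation
print(sorted(["b", "A", "a", "B"], key=ci_key))  # ['A', 'a', 'b', 'B']
```

Because Python's sort is stable, letters that compare equal keep their input order, which is why `A` stays ahead of `a` and `b` ahead of `B` here.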

In real life, most character sets have many characters: not just A and B but whole alphabets, sometimes several alphabets or writing systems from languages around the world, along with many special symbols and punctuation marks.

Also in real life, most collations have many rules: not only whether to distinguish letter case, but also whether to distinguish accents (an “accent” is a mark attached to a character, as in the German Ö), and how to handle multi-character mappings (such as the rule that Ö = OE in one of the two German collations).
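To make the accent and multi-character rules concrete, here is a rough sketch using Python's standard `unicodedata` module. The two key functions and the `EXPANSIONS` table are assumptions made for illustration; real database collations follow far more complete rule sets.

```python
import unicodedata


def accent_insensitive_key(text):
    """Drop combining marks after NFD decomposition, then ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Hypothetical multi-character mapping: O with diaeresis counts as "OE".
EXPANSIONS = {"\u00d6": "OE", "\u00f6": "oe"}


def expansion_key(text):
    """Expand mapped characters to their multi-character form, then ignore case."""
    composed = unicodedata.normalize("NFC", text)
    expanded = "".join(EXPANSIONS.get(ch, ch) for ch in composed)
    return expanded.casefold()


words = ["Ofen", "\u00d6l", "Oder"]  # "\u00d6l" is the word Oel spelled with O-diaeresis

# Accent-insensitive: the word is compared as "ol", so it sorts last.
by_accent = sorted(words, key=accent_insensitive_key)

# Expansion rule: the word is compared as "oel", so it sorts between the others.
by_expansion = sorted(words, key=expansion_key)

print(by_accent)
print(by_expansion)
```

The same three words come out in a different order depending on which rule is applied, which is exactly why a language can have more than one collation.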

How database systems handle character sets and collations

A database system therefore has to support and process different character sets and collations in order to cope with text from a variety of regions and languages.

An RDBMS needs to:

  • Store character text using multiple character sets.
  • Compare character text using multiple collations.
  • Allow character text with different character sets or collations to be mixed in the same server, the same database, or even the same table.
  • Allow the character set and collation to be specified at the server, database, table, and column levels.

It must also provide other features and operations related to character encoding and collation.
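As a small, hedged illustration of collations at the column and query level, the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in here: it does not offer server-level or database-level character set settings the way a full RDBMS does, and the table name `words` and the collation `reverse_binary` are invented for this example.

```python
import sqlite3


def reverse_binary(left, right):
    """Hypothetical custom collation: plain code-point order, reversed."""
    return (left < right) - (left > right)


conn = sqlite3.connect(":memory:")
conn.create_collation("reverse_binary", reverse_binary)

# Column level: "folded" is declared with SQLite's built-in NOCASE collation.
conn.execute("CREATE TABLE words (plain TEXT, folded TEXT COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [(w, w) for w in ["b", "A", "a", "B"]],
)

# Same data, different column collations, different comparison results.
plain_hits = conn.execute(
    "SELECT COUNT(*) FROM words WHERE plain = 'a'"
).fetchone()[0]
folded_hits = conn.execute(
    "SELECT COUNT(*) FROM words WHERE folded = 'a'"
).fetchone()[0]

# Query level: COLLATE overrides the column's collation for one expression.
override_hits = conn.execute(
    "SELECT COUNT(*) FROM words WHERE plain = 'a' COLLATE NOCASE"
).fetchone()[0]
custom_order = [
    row[0]
    for row in conn.execute(
        "SELECT plain FROM words ORDER BY plain COLLATE reverse_binary"
    )
]

print(plain_hits, folded_hits, override_hits)  # 1 2 2
print(custom_order)                            # ['b', 'a', 'B', 'A']
conn.close()
```

Other systems use different syntax for the same idea, so check the documentation of the database you actually run.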

RDBMSs usually come with a default character set and collation, which rarely change in practice. Support for character sets and collations differs from one database system to another, so consult the official documentation of the system you use.

Even so, you should have some understanding of the character sets and collations available in your RDBMS, how to change the default settings, and how to use them in databases, tables, columns, and queries.

Different character sets and collations have different effects: they influence how character text is compared, how string operations and functions behave, and how indexing and physical data storage are handled.
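The storage side of this is easy to see outside a database. The sketch below encodes the same two-character string with several encodings available in Python and counts the bytes. The choice of encodings is just an example, and real storage engines add their own overhead on top.

```python
# Two characters: O with diaeresis followed by "l".
text = "\u00d6l"

# Bytes needed to store the same text under different character encodings.
sizes = {
    encoding: len(text.encode(encoding))
    for encoding in ("latin-1", "utf-8", "utf-16-le", "utf-32-le")
}

print(len(text))  # 2 characters
print(sizes)      # {'latin-1': 2, 'utf-8': 3, 'utf-16-le': 4, 'utf-32-le': 8}
```

The character count stays the same while the byte count changes, which is why column size limits and index sizes can depend on the chosen character set.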


Character Sets and Collations in General