Storage rules for vARCHAR

Below version 4.0, varchar(20), which means 20 bytes, can only store 6 characters (3 bytes per character) if UTF8 is stored. Varchar (20), which refers to 20 characters, can store 20 characters whether they are numbers, letters or UTF8 characters (3 bytes per character), with a maximum size of 65532 bytes. The VARCHAR field stores the actual content separately from the cluster index, starting with 1 or 2 bytes indicating the actual length. Here’s what the official said: Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. In contrast to CHAR, VARCHAR values are stored as a one-byte or two-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.

Difference between varchar and char

Difference one, fixed length and variable length

Char indicates constant length, varchar indicates variable length, that is, variable length. When strings are inserted beyond their length, the insertion is rejected with an error message in strict mode, and truncated and inserted in loose mode. If the string length is less than the defined length, it is handled differently, such as char (10), indicating that 10 characters are stored, whatever number you insert is 10, and if it is less than 10 it is filled with Spaces. Varchar (10), on the other hand, stores as many characters as you insert if it is less than 10. How does vARCHar know the length of the stored string? In fact, for a VARCHAR field, one (if the string length is less than 255) or two bytes (if the string length is greater than 255) are needed to store the length of the string. But since he needs a prefix to indicate his exact number of bytes (since varchar is variable-length, he doesn’t know how to read the data without this length value).

The second difference is that storage capacity is different

For char, the maximum number of characters can be 255, regardless of encoding. Varchar, on the other hand, can store 65532 characters at most. The maximum effective length of a VARCHAR is determined by the maximum line size and the character set used. The overall maximum length is 65,532 bytes

The encoding length limit of vARCHAR

If the character type is GBK, the value can contain a maximum of 2 bytes and cannot exceed 32766 bytes. If the character type is UTF8, each character contains a maximum of 3 bytes and cannot exceed 21845 bytes. If the limit is exceeded, the VARCHAR field will be forced to text and warning will be generated.

Line limit

It is the length of a row definition that causes the vARCHAR length limit in practice. MySQL requires that a row definition be no longer than 65535. ERROR 1118 (42000) is displayed if the length of the defined table exceeds this value: Row size too large. The maximum row size for the used table type, not counting BLOBs, Is 65535. You have to change some columns to TEXT or BLOBs. This means, for example, that if you create a table with two varhCAR type fields in the table structure, the total length of the two fields cannot exceed 65535. The official instructions are as follows: Every table has a maximum row size of 65,535 bytes. This maximum applies to all storage engines, but a given engine might have additional constraints that result in a lower effective maximum row size.

Varchar control bit

The Varchar character type in MySQL also reserves 1 byte for other control information.

The sample

Example 1: If a table has only one field of type VARCHAR(N), utF8 encoding, what is the maximum value of N?

For example, create table tb_name1(a vARCHar (N)) default charset= UTF8, the maximum value of N =(655,5-2)/3=21844. The reason for the reduction is that the actual line storage starts with the second byte. The reason for the 2 minus is that the 2 bytes in the vARCHar header represent the length. The reason for dividing by 3 is that the character encoding is UTF8. SQL test:

create table tb_name1(a varchar(21844)) default charset=utf8; Query OK, 0 rows affected (0.38 SEC) drop table tb_name1; Query OK, 0 rows affected (0.00 SEC) create table tb_name1(a vARCHar (21845)) default charset=utf8; ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columnsCopy the code

Example 2: If a table has a field of type VARCHAR(N) and other field types, utF8 encoding, what is the maximum value of N?

Create table tb_name2(a int, b char(20), C vARCHar (N)) default charset=utf8; N The maximum value is (655-1-2-4-20)

3)/3=21822 minus 1 because the actual line storage starts with the second byte. The reason for the 2 minus is that the 2 bytes in the vARCHar header represent the length. The reason for subtracting 4 is that the int of field A takes up 4 bytes. Minus 20

The reason for 3 is that char(20) takes up 60 bytes and is encoded utF8.

SQL test:

create table tb_name2(a int, b char(20), c varchar(21822)) default charset=utf8; Query OK, 0 rows affected (0.28 SEC) drop table tb_name2; Query OK, 0 rows affected (0.20 SEC) create table tb_name2(a int, B char(20), C vARCHar (21823)) default charset=utf8; ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBsCopy the code

Example 3: If a table has a multi-field VARCHAR(N) type, and other field types, GBK encoding, what is the maximum value of N?

Create table tb_name3(a int, B char(20), C vARCHar (50), d varchar(N)) default charset= GBK; N The maximum value is (655,5-1-1-2-4-20)

2-50

2)/2=32693 The reason for the first minus 1 is that the actual line storage starts with the second byte. The second minus 1 indicates the second varchar(50) header each 1 byte indicates the length (less than 255). The reason for the 2 minus is that the 2 bytes in the vARCHar header represent the length. Minus 20

The reason for 2 is that char(20) takes up 40 bytes and is encoded GBK. Lose 50

The reason for 2 is that vARCHar (50) occupies 100 bytes and the encoding is GBK. SQL test:

create table tb_name3(a int, b char(20), c varchar(50), d varchar(32694)) default charset=gbk;
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. You have to change some columns to TEXT or BLOBs


create table tb_name3(a int, b char(20), c varchar(50), d varchar(32693)) default charset=gbk;
Query OK, 0 rows affected (0.18 sec)
Copy the code