Brief introduction: PostgreSQL is a powerful, open source client/server relational database management system (RDBMS). It supports NoSQL data types and is mainly oriented to enterprise complex QUERY SQL OLTP service scenarios. PostGIS geographic information engine, Ali Cloud self-developed multidimensional multi-mode spatio-temporal information engine, etc. This article focuses on the PostgreSQL data directory, which stores important files such as configuration files, data files, transaction logs, and WAL logs. All data files and initial configuration files created by customers can be found in the data directory. Therefore, the data directory is an important customer value.

The authors hidden source | | city ali technology to the public

An overview of

PostgreSQL is an open source client/server relational database management system (RDBMS). PostgreSQL supports NoSQL data types (JSON/XML/ hStore). It is mainly oriented to OLTP business scenarios of enterprise complex QUERY SQL, and provides PostGIS geographic information engine and multi-dimensional and multi-mode space-time information engine developed by Ali Cloud.

This article focuses on the PostgreSQL data directory, which stores important files such as configuration files, data files, transaction logs, and WAL logs. All data files and initial configuration files created by customers can be found in the data directory. Therefore, the data directory is an important customer value.

Two nouns

1 OID

A database object is a data structure stored or referenced by a database, and the database itself is also a database object, including tables, indexes, views, sequences, and functions. The Object ID is the unique identifier of a database Object. It is stored in an unsigned four-byte integer variable. Each database Object corresponds to an OID. PostgreSQL has two views that store different TYPES of OID. Pg_database stores the OID of database objects, and pg_class stores the OID of tables, indexes, and sequences.

2 Relation

Relationships represent database objects that are not the database itself, including tables, views, indexes, toasts, and so on, not the database itself.

3 MVCC

Multi-version-concurrency Control is a Concurrency Control mechanism that allows the database engine to check the visibility of tuples by querying transaction snapshots and transaction commit logs for different transaction isolation levels. MVCC is essential if you want to understand the mechanics of the database.

4 Page

Database files are managed on Linux by default 8K fixed-length pages, which can be preset by the startup parameter BLCKSZ. If the page is set to a lower value, files with the same amount of data need to be split into more pages. As a result, the I/O count and index split count increase, and the performance decreases. If the page is set high, the efficiency of data retrieval within the page will be reduced, and the performance will also be reduced a lot. Generally speaking, 8K and 16K are the optimal solution for database systems.

3 Data Directory

Data directories are in /var/lib/pgsql/data by default and can be managed using the environment variable $PG_DATA. The following figure shows the primary structure of the data directory. Later, we will focus on representative important files and directories, such as base and PG_xact.

Four base

1 overview

The base directory stores database files created by users and all relationships, such as tables and indexes, that belong to user databases.

2 Level-1 Directory

The directory structure is divided into two levels. The first-level structure is shown in the following figure. The first-level directory name is the OID of the user database object, and 1 represents the Postgres database.

3 Secondary File

Secondary subfiles are shown in the figure below, which store all the relationships in a database, including tables, indexes, and views. The postgres database directory is used as an example. Second-level subfiles are divided into three types: the first type is the master data file named after the relation OID, the second type is the free space mapping file whose file name ends in _FSM, and the third type is the visibility mapping file whose file name ends in _VM.

4 Master data file

The master data file stores database relationship files, including data and indexes, that belong to the corresponding database. The most important service data of customers is stored in the master data file.

When the size of the relational file is smaller than RELSEG_SIZE x BLCKSZ, the database engine creates a single file named pg_class.relfilenode. Otherwise, it splits the file into multiple files named pg_class.relfilenode.segno. A single relational file is internally divided into multiple pages of a fixed size of default 8K and stored on disk. 8K can be modified with the BLCKSZ parameter at initDB. When written to the master data file, the tuple data is stacked from the bottom of the row pointer array until space runs out.

A single row of data queried by a user in SQL corresponds to a single tuple. Due to the MVCC mechanism, the tuple may be unable to query the data of the old version or may be active data of the new version. The data of the old version will be cleared at a certain point in the future. If the query does not match the index to trigger the sequential scan, the database engine reads the tuple from the row pointer of the sequential scan page; otherwise, if it matches the B-tree index, the engine reads the tuple from the index file and the TID value of the index key.

The following figure shows the hierarchy of the master data file.

The following table shows metadata information about the internal structure of the page as shown above.

The following table shows metadata information about the internal structure of a tuple as shown above.

5 FSM

FSM is an idle space mapping file, which records the free space of each page on the heap and index. This helps quickly locate the page that has sufficient free space for storing tuples. If the page does not have sufficient free space, you need to expand a new page. There is no FSM file except the Hash Index file. The other heap and Index files require FSM files.

In general, the FSM page is organized in a 3-4 level multi-tree structure. A single FSM page is managed in a complete binary tree structure. The leaf nodes of the higher-level FSM page are associated with the lower-level FSM page. Lower-level FSM Page leaf nodes store the number of available heap and index pages, rather than the maximum number of available heap and index pages stored by leaf nodes. Each node occupies 1 byte.

6 VM

VM is a visibility mapping file that records visibility information for each heap page, so there is no VM file for index Page. On the one hand, it can improve the execution efficiency of vacumn, on the other hand through the vm file can perceive all tuples is visible within the page, if all visible, query engine query index tuples can be direct access to the data, do not have to access data tuples check visibility, decrease the number back to the table, and greatly improve the efficiency of the query.

The VM uses a bitmap structure to store visibility information. Each heap page stores only two bits in the VM file. The first represents whether all tuples are visible, and the second represents whether all tuples are frozen.

#define VISIBILITYMAP_ALL_VISIBLE  0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02
Copy the code

Five global

1 overview

The global directory stores pg_control and database cluster dimension databases and their relationships, and non-customer dimension data, such as PG_DATABASE and PG_class. The file structure in the directory is the same as base.

The following figure shows the file structure of the global directory.

2 pg_control

The pg_control file records database cluster control information, including initDB initialization, WAL, and checkpoint information.

Six pg_wal

1 overview

Pg_wal is the directory for storing WAL logs in the WAL mechanism. In PG10 and later versions, the directory is named pg_wal. Before 10, the directory is named pg_xlog.

Mechanism of 2…

Write-ahead-logging: Logging Ahead mechanism. Data changes are written to the log file preferentially. If the transaction fails, the change record is ignored. When the transaction succeeds, data is written to the data file at an appropriate time. If LSN is greater than PD_LSN, wal records are replayed. If LSN is greater than PD_LSN, WAL records are replayed. If LSN is greater than PD_LSN, WAL records are replayed.

3 File Structure

4 wal segment

Wal section files store database row record details. Each record detail is used for database recovery operations to ensure data consistency. Wal segment files record any modification of data, including INSERT, UPDATE, and DELETE. Wal segment files also record some management actions of the system, such as transaction commit and VACUUM.

The name of a WAL file is 00000001 00000001 00000092. The name contains 24 bits. The first eight bits are timeline, the middle eight bits are Logid, the last eight bits are Logseg, and the first six bits of logSEG are always 0. According to the last two bits of the WAL segment file name, WAL records are recorded in different WAL segment files based on the LSN.

5 .history

The. History file contains the original. History file, the current timeline switchover record, the switchover reason, and the point-in-time recovery behavior of the database. When the database engine recovers from a backup of multiple timelines, the database recovers from the. History file all wal segment files between the start_timeline of PG_control and the specified recovery_target_timeline.

6 archive_status

Archive_status is the backup directory for WAL files, including. Ready and. Done files. Wal logs that exceed the limit of wal_keep_segments are marked in the archive_status directory and removed after archiving.

7 .ready

Ready is the mark file of the wal segment file with the same name in the archive_status directory, indicating that the WAL segment file can be archived. There is a maximum number of WAL segment files that can be stored in a data directory. This parameter is generally controlled by wal_keep_segments. Therefore, when the number of WAL segment files reaches the upper limit, the database engine adds a marker file that can be removed to the archive_status directory. The file name is added with the suffix. Ready to be archived by the archiving tool.

8 .done

Done is the mark file of the wal segment file with the same name in the archive_status directory, indicating that the WAL segment file has been archived and can be cleared. By default, the database engine uses archive_command to archive. Ready files, depending on whether the archive_command command returns true or false. When archive_command returns true, The wal file with the same name as the. Ready file is archived, and the engine changes the extension of the file to. Done until the database engine clears the original WAL file at its next checkpoint.

Seven pg_xact

1 overview

Pg_xact is the directory where transaction Commit logs are stored. The default transaction Commit Log file name is 256KB, and the file name is NNNN. After system initialization, the file name increases from 0000 to FFFF. For PG 10 and later versions, the directory is changed to pg_xact, and the directory before PG 10 is pg_clog.

The following image shows clog files in pg_xact. The files before 027E are vacuumed because transactions have been frozen.

2 Commit Log

The transaction commit log stores the individual transaction running state of the database. The Commit Log consists of a set of 8KB pages in shared memory, each page containing an array, each array element containing the XID and the real-time state of the transaction. When pages are insufficient, new pages are created to store new transactions.

8 Configuration File

1 postgresql.conf

The postgresql.conf file stores configuration file location, resource limit, and cluster replication. It is the most important configuration file used when a database is running.

2 postgresql.auto.conf

The postgresql.auto. Conf file stores global configuration parameters of the database. After the database engine loads the postgresql.auto.

3 pg_hba.conf

The pg_hba.conf file connects and authenticates clients and acts as a firewall. The format of the file is TYPE/DATABASE/USER/ADDRESS/METHOD.

Nine summary

This paper from the naked eye visible data directory and its sub-files began to expand, from shallow to deep until the invisible source structure dimension, a detailed introduction to the data directory and its sub-files structure and function. By understanding the file architecture of the data directory, you can get a general overview of the PostgreSQL database, whether it is daily operations, kernel development, or business development.

The original link

This article is the original content of Aliyun and shall not be reproduced without permission.