Database technology studies the structure, storage, design, management, and application of databases, covering both the underlying theory and its implementation methods, and applies them to process, analyze, and transform the data held in a database. As the core of computer data processing and information management systems, it addresses how to organize and store large volumes of data effectively during information processing: reducing storage redundancy, enabling data sharing, ensuring data security, and retrieving and processing data efficiently.
This article introduces the technical principles of TcaplusDB, a distributed database developed by Tencent.
3.1 Storage Principle
TcaplusDB shards a table by hashing: a table can be split into at most 10K shards, a limit set by the length of the route array (default: 10K). In the following figure, a TcaplusDB table is divided into five shard files that are distributed across different storage nodes. Each node holds one or more shards.
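The hash-and-route-array lookup described above can be sketched roughly as follows. This is a minimal illustration, not TcaplusDB's actual implementation: the article does not name the hash function, so CRC32 stands in for it, and the helper names (`route_slot`, `build_route_array`, `shard_for_key`) and the contiguous slot-to-shard layout are assumptions made for the example.

```python
import zlib

ROUTE_ARRAY_LENGTH = 10_000  # default route array length given in the text


def route_slot(key: str) -> int:
    """Map a record key to a slot in the route array.

    CRC32 is only a stand-in; the real hash function is not specified.
    """
    return zlib.crc32(key.encode("utf-8")) % ROUTE_ARRAY_LENGTH


def build_route_array(num_shards: int) -> list:
    """Assign contiguous, equally sized slot ranges to shards 0..num_shards-1.

    The contiguous layout is an assumption made for this sketch.
    """
    return [slot * num_shards // ROUTE_ARRAY_LENGTH
            for slot in range(ROUTE_ARRAY_LENGTH)]


def shard_for_key(key: str, route_array: list) -> int:
    """Resolve key -> route slot -> shard id."""
    return route_array[route_slot(key)]


# A table divided into five shards, as in the figure.
route_array = build_route_array(5)
shard_id = shard_for_key("player:42", route_array)
```

Because a key is resolved through the route array rather than hashed directly to a shard, changing the shard layout only means rewriting entries in the array; the key-to-slot mapping stays fixed.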
Figure 3.1 TcaplusDB storage technology
3.2 System Capacity Expansion
TcaplusDB capacity expansion is performed separately at the storage layer and the access layer. As the architecture diagram in Section 2 shows, the access layer is the Tcap Proxy layer and the storage layer is the Tcapsvr layer (active and standby nodes). The access layer is stateless, so it can be scaled out and in horizontally without affecting online services.
At the storage layer, because tables are sharded, expansion means migrating shards horizontally from the original machine to a new machine to gain storage space. Take Figure 3.2 as an example. Before expansion, Table A had a single shard, Shard 1, with a route array length of 10K. During expansion, the table is split into two shards: route items 0-5K are placed in Shard 1 and route items 5001-10K in Shard 2. The two shards are stored on two separate storage nodes.
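The Table A split can be pictured as reassigning half of the route array to a new shard. The sketch below is an illustration under assumptions: `split_shard` is a hypothetical helper, and it uses zero-based slot indices, so the boundary falls at slot 5000 rather than at the one-based numbers quoted in the text.

```python
ROUTE_ARRAY_LENGTH = 10_000  # route array length from the Table A example


def split_shard(route_array, old_shard, new_shard):
    """Reassign the upper half of old_shard's slots to new_shard.

    Returns the list of moved slots, i.e. the route items whose data
    would have to be migrated to the new storage node.
    """
    owned = [slot for slot, shard in enumerate(route_array) if shard == old_shard]
    moved = owned[len(owned) // 2:]
    for slot in moved:
        route_array[slot] = new_shard
    return moved


# Before expansion: every route item points at Shard 1.
route_array = [1] * ROUTE_ARRAY_LENGTH

# After expansion: the lower half stays on Shard 1, the upper half moves to Shard 2.
moved = split_shard(route_array, old_shard=1, new_shard=2)
```

Only the slots returned in `moved` need their data copied; keys that hash to the lower half keep resolving to Shard 1 throughout.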
Figure 3.2 Storage node capacity expansion
The data migration process is shown in Figure 3.3. Data on the original TcaplusDB slave node is copied to the new TcaplusDB master node, and binlog synchronization keeps the data complete. Data requests from the access-layer Tcap Proxy are then redirected to the new TcaplusDB cluster.
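One common way to combine a bulk copy with binlog synchronization is to snapshot the source, remember the binlog position, and then replay everything logged after that position. The sketch below illustrates that general pattern only; the `Node` class, the `migrate` function, and the in-memory binlog of `(key, value)` pairs are all hypothetical and do not reflect TcaplusDB's internal formats.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy storage node: a key-value store plus an ordered write log."""
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)  # ordered (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))


def migrate(source: Node, target: Node, writes_during_copy) -> Node:
    """Copy source to target, then catch up from the binlog."""
    # 1. Take a snapshot and remember how far the binlog had advanced.
    snapshot = dict(source.data)
    position = len(source.binlog)

    # Writes keep arriving at the source while the bulk copy is running.
    for key, value in writes_during_copy:
        source.write(key, value)

    # 2. Load the snapshot into the new node.
    target.data.update(snapshot)

    # 3. Replay every binlog entry recorded after the snapshot position,
    #    so the target ends up identical to the source.
    for key, value in source.binlog[position:]:
        target.data[key] = value

    # 4. At this point the caller can redirect requests to the target.
    return target


old_node = Node()
old_node.write("a", 1)
old_node.write("b", 2)
new_node = migrate(old_node, Node(), writes_during_copy=[("b", 3), ("c", 4)])
```

After the replay, `new_node.data` matches `old_node.data`, including the writes that arrived mid-copy.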
Figure 3.3 Schematic diagram of request redirection after capacity expansion
Figure 3.4 shows capacity expansion at the access layer: the routes forwarded by four Tcap Proxies are evenly reallocated across five Tcap Proxies through consistent-hash route switching. No messages are lost during route switching.
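To see why consistent hashing suits this kind of route switching, the following sketch builds a generic hash ring with virtual nodes and compares key placement for four and five proxies. It is a textbook consistent-hashing example, not TcaplusDB's routing code; the `HashRing` class, the proxy names, the SHA-256 hash, and the choice of 100 virtual nodes are assumptions for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """64-bit ring position derived from SHA-256 (an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; each node is placed at `vnodes` positions."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring position after the key's hash."""
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"request-{i}" for i in range(10_000)]
before = HashRing([f"proxy-{n}" for n in range(1, 5)])  # four proxies
after = HashRing([f"proxy-{n}" for n in range(1, 6)])   # five proxies

# Keys whose owning proxy changes when the fifth proxy joins.
moved = [key for key in keys if before.lookup(key) != after.lookup(key)]
```

Every key in `moved` lands on the newly added `proxy-5`, and the remaining keys keep their original proxy, so only a minority of routes change when the fifth proxy joins.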
Figure 3.4 Capacity expansion at the access layer
TcaplusDB capacity expansion is driven by disk usage and QPS (queries per second). Expansion is triggered when the capacity usage of a storage node reaches a certain threshold.
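A threshold check of this kind might look like the sketch below. The article does not give the actual thresholds or say how the two metrics are combined, so the values `0.8` and `50_000`, the "either metric" rule, and the names `NodeStats` and `needs_expansion` are purely hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the article does not state the real values.
DISK_THRESHOLD = 0.8      # fraction of disk capacity in use
QPS_THRESHOLD = 50_000    # queries per second


@dataclass
class NodeStats:
    disk_used_ratio: float  # 0.0 to 1.0
    qps: float


def needs_expansion(stats: NodeStats,
                    disk_threshold: float = DISK_THRESHOLD,
                    qps_threshold: float = QPS_THRESHOLD) -> bool:
    """Return True when either metric has reached its threshold.

    Treating the two metrics as independent triggers is an assumption.
    """
    return stats.disk_used_ratio >= disk_threshold or stats.qps >= qps_threshold


busy_disk = needs_expansion(NodeStats(disk_used_ratio=0.85, qps=1_000))
idle_node = needs_expansion(NodeStats(disk_used_ratio=0.40, qps=1_000))
busy_qps = needs_expansion(NodeStats(disk_used_ratio=0.40, qps=60_000))
```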
TcaplusDB is a distributed NoSQL database developed by Tencent, with storage and scheduling code written entirely in-house. Its features include an architecture that integrates caching with persistent storage, PB-level storage, millisecond latency, lossless horizontal scaling, and support for complex data structures. It also offers a rich ecosystem, easy migration, very low operation and maintenance costs, and five-nines high availability. Its customers span gaming, Internet, government, finance, manufacturing, the Internet of Things, and other fields.