Introduction | The WeChat Pay log system uses Hermes for full-text retrieval, and since it was connected the log volume has grown continuously. At present, the total daily log volume has passed the trillion level, the daily log volume of a single cluster has also passed the trillion level, and the storage scale has reached the PB level. This article introduces how the WeChat Pay log system is run on Hermes, in the hope that the experience is useful to others. Author: Song Xincun, senior operation and maintenance engineer, Tencent Big Data.

One, business scale

At present, the peak daily inflow of WeChat Pay logs has reached the trillion level in number of entries and the PB level in volume, and it is expected to grow further during the Spring Festival and other major holidays.

The Hermes cluster used by the WeChat Pay log service has reached trillion-level daily ingestion in a single cluster, with more than 200 nodes deployed and total storage in a single cluster at the PB level. In addition, the number of concurrent search queries per day is about 6000:

Even at this log storage scale, the overall WeChat Pay log query SLA reaches four nines (99.99%), and 95% of queries complete in less than 5 s.
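The two SLA figures above can be checked against measured data roughly as follows. This is a minimal sketch: the sample latencies and query counts are made up for illustration, the helper names are hypothetical, and reading "four nines" as the fraction of successful queries is an assumption, since the article does not define it.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def availability(total_queries, failed_queries):
    """Fraction of queries that succeeded (assumed meaning of the SLA)."""
    return 1 - failed_queries / total_queries


# Hypothetical measurements, not taken from the article.
latencies_s = [0.4, 0.8, 1.2, 0.9, 2.5, 3.1, 0.7, 4.2, 1.1, 0.6]

p95 = percentile(latencies_s, 95)
meets_latency_target = p95 < 5.0                            # 95% of queries under 5 s
meets_availability_target = availability(1_000_000, 50) >= 0.9999  # four nines
```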

Two, separation of storage and compute

The underlying storage of Hermes is implemented on HDFS. All storage-related policies rely on the mature capabilities of HDFS, including:

1. Multi-copy disaster recovery

Logs are stored in multiple copies by default for disaster recovery. The number of copies can be flexibly reduced for historical data to cut storage costs, and flexibly increased for important log data to improve disaster recovery capability.

2. Disk fault tolerance

HDFS automatically migrates copies when a single disk or a single machine fails, and the entire fault tolerance process is transparent to the upper computing layer.

3. Hot and cold data tiering

Using the heterogeneous storage capability provided by HDFS together with the daily partition storage of Hermes, data can easily be tiered into hot and cold. The tiering is transparent to upper-layer services, which do not need to care where the data is stored.

4. Erasure coding (EC)

HDFS 3.0 supports erasure coding, which can further reduce storage costs. It has not yet been deployed in production.

On the one hand, this architecture simplifies the design of the upper computing layer. On the other hand, when the computing layer builds the index, it only needs to compute a single copy while still getting multi-copy disaster recovery, which greatly reduces the CPU and memory consumption of the computing layer and doubles the write QPS.
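To make the storage-cost side of the list above concrete, the sketch below compares the raw storage needed for the same logical data under plain replication and under erasure coding. It is illustrative only: the function names are hypothetical, and the RS(6,3) layout is used because it is one of the built-in HDFS 3 erasure coding policies, not because the article says Hermes would use it.

```python
def replication_overhead(copies):
    """Raw bytes stored per logical byte with plain replication."""
    return float(copies)


def ec_overhead(data_units, parity_units):
    """Raw bytes stored per logical byte with Reed-Solomon erasure coding."""
    return (data_units + parity_units) / data_units


def raw_bytes(logical_bytes, overhead):
    """Total raw storage consumed for a given logical data size."""
    return logical_bytes * overhead


ONE_PB = 10 ** 15  # hypothetical logical data size

two_copies = raw_bytes(ONE_PB, replication_overhead(2))  # two copies, as in the article
rs_6_3 = raw_bytes(ONE_PB, ec_overhead(6, 3))            # assumed EC policy

# Fraction of raw storage saved by the assumed EC policy relative to two copies.
saving = 1 - rs_6_3 / two_copies
```

Under these assumptions RS(6,3) stores 1.5 raw bytes per logical byte while tolerating the loss of any three units, compared with 2 raw bytes for two copies.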

Three, asynchronous index merge

Hermes uses an LSM-like data writing mode. Data is first written to memory plus a WAL, and once a certain amount has accumulated it is written to HDFS for persistent storage, which keeps writes sequential and efficient. When a node fails, the system replays the WAL to recover data.
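The write path described above can be sketched as follows. This is a toy model under stated assumptions, not Hermes code: `TinyLogStore`, its file names, and the flush threshold are all hypothetical, and a local directory stands in for HDFS.

```python
import json
import os
import tempfile


class TinyLogStore:
    """Toy LSM-like writer: WAL append, in-memory buffer, periodic flush."""

    def __init__(self, directory, flush_threshold=3):
        self.directory = directory
        self.flush_threshold = flush_threshold
        self.wal_path = os.path.join(directory, "wal.log")
        self.memtable = []
        # Continue numbering after any segments already on disk.
        self.segments = len(
            [n for n in os.listdir(directory) if n.startswith("segment-")]
        )
        self._replay_wal()

    def _replay_wal(self):
        """Rebuild the in-memory buffer from the WAL after a restart."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            self.memtable = [json.loads(line) for line in wal if line.strip()]

    def write(self, record):
        # 1. Sequential append to the WAL, made durable before acknowledging.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(record) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Buffer in memory.
        self.memtable.append(record)
        # 3. Flush once enough records have accumulated.
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        """Persist the buffer as one immutable segment, then drop the WAL."""
        name = f"segment-{self.segments:04d}.json"
        with open(os.path.join(self.directory, name), "w", encoding="utf-8") as seg:
            json.dump(self.memtable, seg)
        self.segments += 1
        self.memtable = []
        os.remove(self.wal_path)


with tempfile.TemporaryDirectory() as demo_dir:
    store = TinyLogStore(demo_dir)
    store.write({"msg": "pay ok"})
    store.write({"msg": "pay failed"})
    # Simulate a crash and restart: the unflushed records come back from the WAL.
    recovered_records = TinyLogStore(demo_dir).memtable
```

Each flush produces one small segment, which is why a store written this way accumulates many small files over time.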

One of the problems with this efficient writing approach is that as data is constantly written, a large number of small indexes are created, which puts a lot of pressure on queries and HDFS storage.

Hermes itself continuously merges small indexes to reduce the number of index files. During the off-peak period at night, we also merge historical partition data at a coarser granularity, to improve the query efficiency of the whole system as much as possible. For WeChat Pay the merge was run at 2:06 am, which avoided the New Year's Eve red envelope rush at 10:01 am.
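The idea of merging many small indexes into one can be sketched as below. This is a simplified illustration rather than the Hermes merge algorithm: it assumes each small index is an in-memory mapping from term to row IDs and that row IDs are unique across the indexes being merged; `merge_indexes` is a hypothetical name.

```python
from collections import defaultdict


def merge_indexes(small_indexes):
    """Merge inverted indexes (term -> list of row IDs) into a single index
    whose posting lists are sorted and free of duplicates."""
    merged = defaultdict(set)
    for index in small_indexes:
        for term, row_ids in index.items():
            merged[term].update(row_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


# Three small indexes, as produced by three separate flushes.
flushed = [
    {"pay": [1, 2], "ok": [1]},
    {"pay": [5], "refund": [4]},
    {"failed": [7], "refund": [7]},
]

merged_index = merge_indexes(flushed)
```

After the merge, a query for one term reads one posting list instead of probing every small index.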

Four, index and data separation

An important feature of business scenarios such as logs is that data is retrieved by tokenized terms plus field information, after which the whole log line is pulled for analysis.

In this scenario, traditional columnar storage is often inefficient at fetching whole rows, and mixing index and data causes severe read and write I/O amplification during index merges.

To this end, in addition to indexing the tokens of each log, Hermes can be configured to store the complete log row:

As shown in the figure above, with the index separated from the data, the index directory stores only the inverted index, and the row data belonging to each index directory in the same shard is stored in RowData. Query results are then read from RowData using the Offset and RowId recorded for each index directory.

By separating index and data, the number of index directories is reduced by 68%, memory usage is reduced by 70%, disk usage is reduced by 14%, and retrieval performance is improved by 80%.
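A minimal sketch of the separation follows, under loudly stated assumptions: the class names, the length-prefixed row format, and whitespace tokenization are all invented for illustration, and the real on-disk layout of Hermes is not described in this article. The point it shows is that the inverted index holds only row IDs, while full log lines live in a separate append-only store addressed by offset.

```python
import io


class RowData:
    """Append-only row store; each row is a 4-byte length followed by UTF-8 bytes."""

    def __init__(self):
        self._buf = io.BytesIO()

    def append(self, line):
        """Store one log line and return the byte offset where it starts."""
        data = line.encode("utf-8")
        offset = self._buf.seek(0, io.SEEK_END)
        self._buf.write(len(data).to_bytes(4, "big") + data)
        return offset

    def read(self, offset):
        """Read back the log line that starts at the given offset."""
        self._buf.seek(offset)
        length = int.from_bytes(self._buf.read(4), "big")
        return self._buf.read(length).decode("utf-8")


class Shard:
    """Inverted index (term -> row IDs) kept apart from the row data."""

    def __init__(self):
        self.rows = RowData()
        self.offsets = []  # row ID -> offset into RowData
        self.index = {}    # term -> list of row IDs

    def add(self, line):
        row_id = len(self.offsets)
        self.offsets.append(self.rows.append(line))
        for term in set(line.lower().split()):
            self.index.setdefault(term, []).append(row_id)
        return row_id

    def search(self, term):
        """Look up row IDs in the index, then fetch whole lines by offset."""
        row_ids = self.index.get(term.lower(), [])
        return [self.rows.read(self.offsets[r]) for r in row_ids]


shard = Shard()
shard.add("pay ok order=1")
shard.add("refund failed order=2")
shard.add("pay failed order=3")
failed_lines = shard.search("failed")
```

Because the index holds no row content, merging index files in this model does not rewrite the log lines themselves.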

Five, hot and cold storage tiering

90% of WeChat Pay's log modules are long-tail modules with very little data. It is therefore appropriate to introduce some high-performance SSD devices to speed up queries for these small modules. To keep SSD costs to a minimum, business data needs to be tiered into hot and cold.

Hermes uses the heterogeneous storage capability of HDFS to implement hot and cold data tiering. The storage type of each copy can be flexibly specified by configuring different copy placement policies, and the whole process is transparent to upper-layer services.

The HDFS heterogeneous storage policy is as follows:
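For reference, the built-in HDFS storage policies can be modeled as in the sketch below. The policy names and the storage type each one assigns to the first copy and to the remaining copies follow the HDFS archival storage documentation; the `placement` helper is hypothetical and exists only to make the rules concrete. Which policies Hermes actually applies is not stated here.

```python
# policy name -> (storage type of the first copy, storage type of the other copies)
STORAGE_POLICIES = {
    "HOT": ("DISK", "DISK"),
    "WARM": ("DISK", "ARCHIVE"),
    "COLD": ("ARCHIVE", "ARCHIVE"),
    "ALL_SSD": ("SSD", "SSD"),
    "ONE_SSD": ("SSD", "DISK"),
    "LAZY_PERSIST": ("RAM_DISK", "DISK"),
}


def placement(policy, copies):
    """Return the storage type chosen for each copy under the given policy."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    first, rest = STORAGE_POLICIES[policy]
    return [first] + [rest] * (copies - 1)


# With two copies, ONE_SSD keeps one copy on SSD and one on ordinary disk.
one_ssd_two_copies = placement("ONE_SSD", 2)

# On a real cluster a policy is attached to a path with the HDFS CLI, e.g.
#   hdfs storagepolicies -setStoragePolicy -path <path> -policy ONE_SSD
```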

HDFS heterogeneous storage practices in Hermes:

Six, copy reduction for historical partitions

The underlying storage of Hermes uses multiple HDFS copies for disaster recovery, and generally two copies are stored by default. At present, the longest retention period for WeChat Pay logs is 30 days, so a large amount of data is stored.

To reduce the storage cost for the business as much as possible, we consulted with the business side and learned that logs older than three days are rarely queried and that somewhat lower reliability is acceptable for them. The Hermes operations side therefore routinely reduces the number of copies for data older than three days, which directly lowers the overall storage cost by more than 70%. The entire copy reduction is transparent to both the upper computing layer and the business layer, so the business does not notice it.
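A routine copy-reduction job of this kind might select partitions and build the corresponding HDFS commands as sketched below. Everything specific here is an assumption: the base path, the `YYYYMMDD` partition directory naming, the target copy count of 1, and the exact three-day boundary are hypothetical. Only the `hdfs dfs -setrep <numReplicas> <path>` command itself is standard HDFS, and on a directory it applies to all files underneath.

```python
from datetime import date, timedelta


def partitions_to_downgrade(partition_days, today, keep_days=3):
    """Daily partitions strictly older than keep_days, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in partition_days if d < cutoff)


def setrep_commands(base_path, partition_days, today, target_copies, keep_days=3):
    """Build one `hdfs dfs -setrep` command per partition to downgrade."""
    return [
        f"hdfs dfs -setrep {target_copies} {base_path}/{d:%Y%m%d}"
        for d in partitions_to_downgrade(partition_days, today, keep_days)
    ]


today = date(2024, 2, 10)                             # hypothetical run date
days = [today - timedelta(days=n) for n in range(6)]  # last six daily partitions

commands = setrep_commands("/hermes/logs", days, today, target_copies=1)
for command in commands:
    print(command)
```

The sketch only prints the commands; a real job would run them on a schedule and verify the result.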

Seven, batch log export

Colleagues at WeChat Pay often need to export in bulk the logs that match certain keywords within a specified period of time:

Hermes supports asynchronous batch export of logs to storage media such as HDFS. After a user submits an export request, the system exports a copy of all matched logs to the TDW HDFS. The user can then fetch the logs with the TDW HDFS client or pull them to the Hermes interface machine.
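The submit-then-fetch pattern described above can be sketched as follows. This is not the Hermes export API: `ExportService` and its methods are hypothetical, logs are an in-memory list of `(timestamp, line)` pairs, and the matched lines are returned in memory where the real system writes them to the TDW HDFS.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ExportService:
    """Toy asynchronous export: submit returns at once, results come later."""

    def __init__(self, logs, max_workers=2):
        self._logs = logs  # list of (timestamp, line) pairs
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}    # job ID -> Future

    def submit(self, keyword, start, end):
        """Queue an export of lines containing keyword with start <= ts < end."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(self._export, keyword, start, end)
        return job_id

    def _export(self, keyword, start, end):
        return [
            line for ts, line in self._logs
            if start <= ts < end and keyword in line
        ]

    def done(self, job_id):
        """True once the export job has finished."""
        return self._jobs[job_id].done()

    def result(self, job_id, timeout=None):
        """Block until the job finishes and return the exported lines."""
        return self._jobs[job_id].result(timeout=timeout)


service = ExportService([(1, "pay ok"), (2, "pay failed"), (9, "pay failed")])
job = service.submit("failed", start=0, end=5)
exported = service.result(job, timeout=5)
```

Returning a job ID immediately lets the caller continue working while a large export runs in the background.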

Export user logs from the TDW HDFS:

Eight, epilogue

Since WeChat Pay was connected to Hermes, the log volume has grown from the ten-billion level at the beginning to the trillion level now, posing continuous challenges to the storage capacity, scalability, disaster recovery capability and resource planning of Hermes itself.

Thanks to its storage architecture, Hermes can flexibly move business data around even at massive scale, and so cope calmly with a wide range of business challenges.