An analysis of disk caching in MySQL, Redis, Kafka, and ZooKeeper
Most of these components are based on disk storage, but because of the speed gap between disk and CPU, they use caching to improve performance. A cache is simply a region of memory: data is first read from disk into the cache, later queries hit the cache, and modifications operate on the cached data, which is flushed back to disk at some frequency. What you cache, how much you cache, and when you flush all affect the performance of the whole component. Looking at the architecture of MySQL and the other components, we find that whether it is the disk-based MySQL database, the Kafka messaging system, the ZooKeeper distributed coordination framework, or the memory-based Redis database, each has designed a careful data exchange between memory and disk, striking a balance between reading data quickly and persisting it safely.

Caching also exploits the locality of reads, in both space and time. Spatially, data adjacent to hot data is likely to be accessed soon after; temporally, hot data accessed once is likely to be accessed again.
MySQL disk cache (InnoDB engine only)
To analyze what data MySQL caches, we can go to the root of the matter: the buffer pool of MySQL's InnoDB engine.
InnoDB can be configured with multiple buffer pool instances, which increases the database's concurrency. The total buffer pool size is configurable, and each page in the buffer pool is 16KB. The buffer pool is managed with an LRU algorithm: when a page in the LRU list is modified, it no longer agrees with the data on disk and is called a dirty page. The database's CHECKPOINT mechanism flushes dirty pages back to disk, and dirty pages are also tracked in a Flush list. In short, the LRU list manages the availability of pages in the buffer pool, while the Flush list manages flushing dirty pages back to disk. You can query the number of dirty pages with a command.
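For example, a quick way to inspect these settings and the current dirty-page count (a minimal sketch; the names below are standard MySQL server variables and status counters):

```sql
-- Buffer pool sizing and instance count
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_buffer_pool_instances';

-- Number of dirty pages currently held in the buffer pool
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
```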
The next concern is the caches associated with the disk files.
Redo log cache
The redo log buffer is unique to the InnoDB engine and corresponds to the redo log file on disk. By default it is 8MB; because the redo log buffer is flushed to the log file as often as every second, it does not need to be large. It is flushed in three cases:
- The Master Thread flushes the redo log buffer to the redo log file every second
- The redo log buffer is flushed to the redo log file at each transaction commit (controlled by innodb_flush_log_at_trx_commit; see the sketch after this list)
- When less than half of the redo log buffer remains free, it is flushed to the redo log file
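A minimal my.cnf sketch of the commit-time flushing trade-off (the values and comments reflect the standard meanings of this parameter):

```ini
# How aggressively InnoDB flushes the redo log buffer at commit:
#   0 - write and fsync the log file about once per second (fastest, least safe)
#   1 - write and fsync at every transaction commit (default, fully durable)
#   2 - write at every commit, fsync about once per second
innodb_flush_log_at_trx_commit = 1

# Redo log buffer size; 8MB default, as noted above
innodb_log_buffer_size = 8M
```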
Because the cache and the disk cannot be kept consistent in real time, transactional databases generally adopt the Write-Ahead Logging (WAL) strategy to prevent data loss: when a transaction commits, the redo log is written first, and the data pages are modified afterwards. If a crash loses in-memory data, it can be recovered from the log, satisfying the durability requirement of transactions. For high reliability you can also configure multiple mirrored log groups.
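To make the ordering concrete, here is a minimal, self-contained sketch of the WAL idea (all names are illustrative, not MySQL internals): the log record is forced to disk before the cached page changes, so the page itself can be flushed lazily.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class WalSketch {
    private final FileOutputStream log;                        // the "redo log" file
    private final Map<String, String> pages = new HashMap<>(); // the "buffer pool"

    public WalSketch(String logPath) throws IOException {
        this.log = new FileOutputStream(logPath, true);        // append mode
    }

    public void commit(String page, String newValue) throws IOException {
        // 1. Append the redo record and force it to stable storage first...
        log.write(("SET " + page + "=" + newValue + "\n").getBytes());
        log.getFD().sync();                 // the durability point of the commit
        // 2. ...only then modify the cached page; flushing the page itself to
        //    disk can happen lazily, because the log makes recovery possible.
        pages.put(page, newValue);
    }
}
```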
Data page / index page cache (Insert Buffer)
The Insert Buffer (Change Buffer) is used when modifying data. InnoDB gives every table a primary key, which is the unique identifier of a row, and rows are stored in primary-key order in the clustered index, so inserts into the clustered index generally do not require random access. But a table usually also has several non-clustered secondary indexes, and on insertion the corresponding secondary index pages are updated in a discrete, random order; this random I/O degrades performance. InnoDB therefore uses the Insert Buffer to cache changes to secondary index pages and merges them into the real index pages periodically, turning many random writes into fewer sequential ones.
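You can see what the change buffer is allowed to cache, and watch its merge activity, with standard commands (a minimal sketch):

```sql
-- What the change buffer may cache: all, none, inserts, deletes, changes, purges
SHOW VARIABLES LIKE 'innodb_change_buffering';

-- The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of this output
-- reports buffered secondary-index changes and merge activity
SHOW ENGINE INNODB STATUS\G
```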
Binary log cache
The binary log records all operations that change the MySQL database; operations that do not change data, such as SELECT and SHOW, are excluded. Binary logs are used for database recovery, for replication (primary-secondary data synchronization), and for security auditing of the operations recorded in the log.
Note that when a transactional storage engine is used, uncommitted binary log records are first written to a cache and are written out to the binary log file only when the transaction commits. binlog_cache_size is session-scoped, not global, and defaults to 32KB.
By default, the binary log is not synchronized to disk on every write; the sync_binlog parameter controls this. The default value of 0 means MySQL does not itself control flushing of the binlog but leaves it to the operating system, which gives the best performance.
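A minimal my.cnf sketch of the binlog durability knobs discussed above:

```ini
# Binary log durability vs. performance:
#   0 - let the operating system decide when to flush the binlog (fastest)
#   1 - fsync the binlog at every commit (safest)
#   N - fsync after every N commit groups
sync_binlog = 0

# Per-session cache for transactional binlog writes (32K default)
binlog_cache_size = 32K
```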
Undo log cache
Undo logs are stored in the shared tablespace. innodb_purge_batch_size sets the number of undo pages that each purge batch processes; the default is 300.
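This parameter is dynamic, so it can be inspected and tuned at runtime (a minimal sketch):

```sql
-- How many undo log pages one purge batch processes
SHOW VARIABLES LIKE 'innodb_purge_batch_size';
SET GLOBAL innodb_purge_batch_size = 300;  -- the default
```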
Wherever a cache is used, its contents must eventually be flushed back to disk, so the next question is which threads do the flushing. Digging step by step, MySQL mainly runs the following four background threads:
- Master Thread: asynchronously flushes data from the buffer pool to disk, including dirty page flushing, merging insert buffer entries, and reclaiming undo pages
- IO Thread: handles callbacks for asynchronous I/O requests; there are write, read, insert buffer, and log I/O threads
- Purge Thread: reclaims undo pages that are no longer needed once their transactions have committed
- Page Cleaner Thread: dedicated to flushing dirty pages
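The sizes of these thread pools are visible through server variables (a minimal sketch; the names are standard InnoDB variables):

```sql
SHOW VARIABLES LIKE 'innodb_read_io_threads';
SHOW VARIABLES LIKE 'innodb_write_io_threads';
SHOW VARIABLES LIKE 'innodb_purge_threads';
SHOW VARIABLES LIKE 'innodb_page_cleaners';
```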
The above is how MySQL keeps cache and disk in sync, covering the four caches discussed here (redo log, insert buffer, binary log, undo log) and how each is synchronized with disk.
Redis disk cache
Strictly speaking, Redis differs from the other components: it is natively an in-memory store, and saving data to disk is optional, configurable behavior. Persistence is not mandatory; Redis is mainly used to cache data, and durable storage should be left to the backing database. The typical access pattern is to check Redis first and query the database only on a miss. Over-relying on Redis persistence may lead to inconsistent results being returned.
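A minimal sketch of that cache-aside read path (the Cache and Db interfaces below are illustrative placeholders, not a real client API):

```java
// Illustrative placeholders standing in for a Redis client and a database client.
interface Cache { String get(String key); void setex(String key, int ttl, String value); }
interface Db    { String query(String key); }

class CacheAside {
    private final Cache redis;
    private final Db db;

    CacheAside(Cache redis, Db db) { this.redis = redis; this.db = db; }

    String read(String key) {
        String value = redis.get(key);
        if (value == null) {                     // miss: the DB is the source of truth
            value = db.query(key);
            if (value != null) redis.setex(key, 60, value);  // cache with a TTL
        }
        return value;
    }
}
```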
Redis has two persistence mechanisms: snapshots (RDB) and the AOF log. A snapshot is a full backup, a binary serialization of the in-memory data; the AOF log is a continuous incremental backup that records, as text, the commands that modified the data, similar to ZooKeeper's transaction log discussed later. The AOF log grows over time, so it is periodically rewritten and compacted. Snapshot frequency is set with the save directive, whose first parameter is a period in seconds and whose second is a number of write operations.
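A minimal redis.conf sketch of the save directive (these three lines are the long-standing defaults shipped with Redis):

```conf
# Snapshot if at least <changes> writes occurred within <seconds>
save 900 1
save 300 10
save 60 10000
```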
AOF log
After receiving a client command, Redis validates it, appends it to the AOF log, and then executes it, so that after a crash the pre-crash state can be restored by replaying the commands in the AOF log. When Redis writes to the AOF, the content actually lands in an in-memory buffer the kernel keeps for the file descriptor, and the kernel flushes that dirty data back to disk asynchronously. Linux provides the fsync system call to force a file's cached data onto disk, but if Redis called fsync for every write, the disk I/O would seriously hurt performance. By default Redis runs fsync once per second; the policy is configurable: it can also never fsync and leave scheduling entirely to the operating system, or fsync once per command.
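The corresponding redis.conf settings (a minimal sketch; appendfsync is the standard directive for this policy):

```conf
appendonly yes         # enable the AOF log
appendfsync everysec   # fsync once per second (the default trade-off)
# appendfsync always   # fsync after every command (safest, slowest)
# appendfsync no       # let the OS decide when to flush (fastest)
```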
Kafka disk cache
Kafka's heavy use of the operating system's page cache is an important factor in its high throughput. Anyone who has worked with Java knows two things:
- The memory overhead of objects is very high, often several times the size of the underlying data, so space utilization is low.
- Java garbage collection becomes slower and slower as more data accumulates on the heap.
For these reasons, using the file system and relying on the page cache is clearly better than maintaining an in-process cache or any other structure. At a minimum it saves the memory an in-process cache would duplicate, and storing compact byte records instead of Java objects saves even more space; this way, 28GB to 30GB of a 32GB machine can serve as cache without any worry about GC performance. Furthermore, the page cache stays warm even if the Kafka service restarts, whereas an in-process cache would have to be rebuilt. It also greatly simplifies the code: keeping the page cache consistent with the files is the operating system's job, which is safer and more efficient than doing the same thing in-process.
Put another way, Kafka can be viewed as a database: the producer is the inserting user, the consumer is the querying user, and the broker is the only party that touches the disk cache. The broker writes produced data directly into the page cache; when data is consumed, zero-copy transfers it from the cache to the socket, and the disk is read only when the required data is not in the cache. Most operations in Kafka are sequential reads and writes, appending messages to log files, so performance is high even on ordinary disks.
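The zero-copy step relies on the sendfile mechanism, exposed in Java as FileChannel.transferTo, which Kafka's documentation describes using for consumer reads. A minimal sketch (the file name and destination address are illustrative):

```java
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopyDemo {
    public static void main(String[] args) throws Exception {
        // Moves file bytes from the page cache straight to the socket,
        // without copying them through user space (sendfile(2) on Linux).
        try (FileChannel file = new FileInputStream("segment.log").getChannel();
             SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("localhost", 9000))) {
            long pos = 0;
            long size = file.size();
            while (pos < size) {
                // transferTo may move fewer bytes than requested, so loop
                pos += file.transferTo(pos, size - pos, socket);
            }
        }
    }
}
```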
Index files are used to locate data. Kafka builds its index files as sparse indexes, split into offset indexes and timestamp indexes; sparse indexing keeps the memory footprint of the index small.
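A minimal sketch of the sparse-index lookup idea (an illustrative structure, not Kafka's actual classes): only every few kilobytes of log get an index entry, and a read finds the greatest indexed offset at or below the target, then scans the segment forward from there.

```java
import java.util.Map;
import java.util.TreeMap;

public class SparseIndexSketch {
    // message offset -> byte position in the log segment file
    private final TreeMap<Long, Long> index = new TreeMap<>();
    private long bytesSinceLastEntry = 0;

    /** Called as messages are appended; indexes only every ~4KB of log. */
    public void maybeIndex(long offset, long filePos, int messageBytes) {
        bytesSinceLastEntry += messageBytes;
        if (bytesSinceLastEntry >= 4096) {   // sparse: most messages are skipped
            index.put(offset, filePos);
            bytesSinceLastEntry = 0;
        }
    }

    /** File position to start scanning from when looking for targetOffset. */
    public long lookup(long targetOffset) {
        Map.Entry<Long, Long> entry = index.floorEntry(targetOffset);
        return entry == null ? 0L : entry.getValue();
    }
}
```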
Kafka only writes messages into the system page cache and does not itself guarantee when dirty data is flushed to disk. Forced flushing can be tuned with the log.flush.interval.messages and log.flush.interval.ms broker parameters, but the reliability of Kafka messages rests on the multi-replica mechanism, not on synchronous flushing, which would severely hurt performance.
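For completeness, a server.properties sketch of those flush parameters (both are real broker settings; Kafka's documentation recommends leaving them at their defaults and relying on replication instead):

```properties
# Force an fsync after this many messages have accumulated...
log.flush.interval.messages=10000
# ...or after a message has sat unflushed for this many milliseconds
log.flush.interval.ms=1000
```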
ZooKeeper disk cache
ZooKeeper maintains in memory a node data model resembling a tree-structured file system, containing the whole tree: all node paths and node data. In the code this is the DataTree structure, whose underlying storage is a ConcurrentHashMap of key-value pairs. Since the data lives in memory, it must have a corresponding persistent form on disk; similar to Redis, ZooKeeper keeps both a transaction log and snapshot data.
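Roughly, the structure looks like the sketch below (class and field names are illustrative; the real implementation is org.apache.zookeeper.server.DataTree): a flat hash map from full path to node gives O(1) lookup, while each node records its children to preserve the tree shape.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class NodeSketch {
    byte[] data;          // the node's content
    Set<String> children; // names of its child nodes
}

class DataTreeSketch {
    // full path (e.g. "/app/config") -> node; a flat concurrent map gives
    // O(1) lookup, while the children sets above preserve the tree shape
    private final ConcurrentHashMap<String, NodeSketch> nodes =
            new ConcurrentHashMap<>();

    NodeSketch get(String path) {
        return nodes.get(path);
    }
}
```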
The transaction log
Transaction log files live under the dataDir directory by default and are named after the ZXID of the first transaction record in the log. Each transaction log file is 64MB, because ZooKeeper preallocates the disk space for it. ZooKeeper writes every transactional client request to the transaction log file, so the write performance of the transaction log directly determines the server's responsiveness to transaction requests. Continuously appending to a file forces the underlying disk to keep allocating new blocks; preallocating the space in advance reduces disk seeks and improves disk I/O efficiency. After a transaction has been written to the file stream's cache, the cached data must be forcibly flushed to disk; the forceSync parameter configures this. forceSync=yes means that when a transaction commits, the write is synchronized from the cache down to the disk.
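A minimal configuration sketch (dataDir and dataLogDir are standard zoo.cfg keys; forceSync and preAllocSize may need to be passed as the Java system properties zookeeper.forceSync and zookeeper.preAllocSize, depending on the ZooKeeper version):

```properties
dataDir=/var/lib/zookeeper            # snapshots (and txn logs if unset below)
dataLogDir=/var/lib/zookeeper-txnlog  # dedicated disk for transaction logs
forceSync=yes                         # fsync the log before a commit returns
preAllocSize=65536                    # log preallocation in KB (64MB default)
```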
ZooKeeper's update flow is therefore: write the transaction log first, then update memory, and periodically flush memory to disk by serializing it into a snapshot file. Because the transaction log so strongly affects write performance, it is recommended to mount snapshot files and transaction log files on different disks, keeping the disk holding dataLogDir fast and free of competing I/O.