- Designing Data Intensive Applications
- Easy to understand data structure and algorithm series (NoSQL storage -LSM tree)
- MySQL indexing principle and slow query optimization
- PostgreSQL 9.5+ Efficient partition table implementation – pg_pathman
- High Performance MySQL
- 30,000 words, 91 MySQL interview questions and answers
Database execution logic
MySQL has two parts:
- Server layer: implements most of MySQL's core service functionality, including query parsing, analysis, optimization, caching, and all built-in functions (date, time, math, encryption, etc.). Everything that works across storage engines (stored procedures, triggers, views, etc.) is also implemented in this layer
- Engine layer: responsible for the details of data storage and retrieval; storage engines are pluggable
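Because engines are pluggable below the Server layer, you can ask a running server which ones it supports. A quick check (the `demo` table name is made up for illustration; output varies by build and version):

```sql
-- List the storage engines this MySQL server supports;
-- the Support column marks the default engine (InnoDB in modern versions).
SHOW ENGINES;

-- The engine is chosen per table at creation time, e.g.:
CREATE TABLE demo (id INT PRIMARY KEY) ENGINE = InnoDB;
```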
Redo log
For example, if a page is split because an insertion made it overflow, you need to write the two pages that were split and also overwrite their parent page to update the references to the two children. This is a dangerous operation: if the database crashes after only some of the pages have been written, you end up with a corrupted index (for example, there may be an orphan page that is not a child of any parent). To make the database resilient to crashes, B-tree implementations typically include an additional on-disk data structure: a write-ahead log (WAL, also known as a redo log). This is an append-only file to which every B-tree modification must be written before it can be applied to the pages of the tree itself. When the database comes back up after a crash, this log is used to restore the B-tree to a consistent state.
The redo log is specific to the InnoDB engine and is what gives it its crash-safe capability.
- WAL (Write-Ahead Logging): the key idea is to write the log first, and write to disk later
- When a record needs updating, InnoDB first writes the change to the redo log and updates memory, at which point the update counts as complete; InnoDB then flushes the change to the data file on disk at an appropriate time, usually when the system is idle
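The redo log's on-disk footprint is fixed in size and written circularly. A quick way to inspect it on a given server (these are standard InnoDB variable names, though defaults vary by version):

```sql
-- Size and count of the redo log files (total space is fixed,
-- which is why the redo log is written in a circular fashion).
SHOW VARIABLES LIKE 'innodb_log_file%';

-- Size of the in-memory log buffer that is flushed into the redo log files.
SHOW VARIABLES LIKE 'innodb_log_buffer_size';
```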
Binlog (Archive log)
The binlog is the Server layer's own log. How the redo log differs from the binlog:
redo log | binlog |
---|---|
Specific to the InnoDB engine | Implemented in the Server layer; usable by all engines |
Physical log: records "what change was made on which data page" | Logical log: records the original logic of the statement, e.g. "add 1 to the c field of the row with id=2" |
Written circularly into a fixed amount of space, which gets used up and reused | Append-only: when a file reaches a certain size, writing switches to the next file without overwriting earlier logs |
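To see the append-only behavior in the right-hand column for yourself, two standard commands (binary logging must be enabled):

```sql
-- Confirm binary logging is on and where the binlog files live.
SHOW VARIABLES LIKE 'log_bin%';

-- List the binlog files; when one reaches max_binlog_size the server
-- switches to the next file instead of overwriting earlier ones.
SHOW BINARY LOGS;
```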
The update process
```sql
update T set c = c + 1 where id = 2;
```
Update internal process:
- The executor first asks the engine for the row with id = 2; since id is the primary key, the engine searches the primary key tree directly for that row
- If the data page containing the row is already in memory, it is returned to the executor directly
- Otherwise, the page is first read from disk into memory and then returned
- The executor takes the row returned by the engine, adds 1 to c to produce the new row, and calls the engine interface to write the new data
- The engine updates the row in memory and records the change in the redo log, which is now in the prepare state; it then tells the executor that execution is finished and the transaction can be committed at any time
- The executor generates a binlog for this operation and writes the binlog to disk
- The executor calls the engine's commit-transaction interface, and the engine marks the redo log it just wrote as committed
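Whether the two logs written in this two-phase sequence actually reach disk at commit time is governed by two well-known settings. A minimal sketch of the "double 1" configuration commonly recommended for crash safety:

```sql
-- Flush the redo log to disk at every transaction commit,
-- so both the prepare and commit records are durable.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

-- fsync the binlog at every commit, so a transaction whose redo log
-- reached the commit step always has its binlog persisted as well.
SET GLOBAL sync_binlog = 1;
```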
B+ Tree
If you want to update the value of an existing key in a B-tree, search for the leaf page containing the key, change the value in that page, and write the page back to disk (any references to the page remain valid). If you want to add a new key, find the page whose range contains the new key and add it there. If there is not enough free space in the page for the new key, it is split into two half-full pages, and the parent page is updated to account for the new subdivision of key ranges.
(Figure: inserting a new key causes a page to split)
This algorithm ensures that the tree stays balanced: a B-tree with n keys always has a depth of O(log n). Most databases fit into a B-tree three or four levels deep, so you don't need to follow many page references to find the page you are looking for (a four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB).
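Checking that capacity claim: with a branching factor of 500, four levels of branching reach $500^4 = 6.25 \times 10^{10}$ pages, and at 4 KB each,

$$500^4 \times 4\,\text{KB} \approx 2.56 \times 10^{14}\ \text{bytes} \approx 256\ \text{TB}$$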
The index
An index is an additional structure derived from the primary data. Many databases allow indexes to be added and removed without affecting the data itself, only the performance of queries. Maintaining the additional structure incurs overhead, especially on writes. For write performance, it is hard to beat simply appending to a file, because that is the simplest possible write operation. Any kind of index usually slows down writes, because the index also has to be updated every time data is written.
- Advantages of N-ary trees:
To read as little from disk as possible, a query must access as few data blocks as possible, so instead of a binary tree we should use an N-ary tree, where N depends on the size of a data block. For an integer-field index in InnoDB, N is about 1200. When such a tree is four levels deep it can hold 1200 cubed values, about 1.7 billion. Since the root block is always in memory, an index on an integer field of a billion-row table needs at most three disk accesses to find a value. In fact, the second level of the tree is also very likely to be in memory, so the average number of disk accesses is even smaller.
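The arithmetic behind the 1.7 billion figure, assuming N ≈ 1200:

$$1200^3 = 1{,}728{,}000{,}000 \approx 1.7 \times 10^9$$

With the root (and often the second level) cached in memory, a lookup in such a four-level tree touches at most the three lower levels on disk.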
The index type
Example:
```sql
mysql> create table T(
    id int primary key,
    value int not null,
    name varchar(16),
    index (value)) engine = InnoDB;
```
id | value |
---|---|
1 | 100 |
2 | 200 |
3 | 300 |
4 | 400 |
5 | 500 |
- Primary key index: leaf nodes store entire rows of data.
- Non-primary key indexes: leaf nodes store the primary key value. A query first searches the non-primary key index tree to get the primary key, then searches the primary key index tree again (this extra lookup is called "going back to the table")
- Covering index: a query whose required values can all be obtained directly from the index, with no need to go back to the table (see the example below)
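Using the table T defined above: because the leaves of the index on value store the primary key id, a query that only needs id is covered by that index. A minimal illustration (EXPLAIN output details vary by MySQL version):

```sql
-- Covered: both the filter column (value) and the selected column (id)
-- live in the value index, so no primary-key lookup is needed;
-- EXPLAIN's Extra column typically shows "Using index".
EXPLAIN SELECT id FROM T WHERE value BETWEEN 200 AND 400;

-- Not covered: name lives only in the primary key leaves, so each
-- matching entry must go back to the table.
EXPLAIN SELECT name FROM T WHERE value BETWEEN 200 AND 400;
```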
- Normal index
- Search process: after the first record that meets the condition is found, scanning continues to the next record, until the first record that does not meet the condition is reached
- Update process, depending on whether the target page is in memory:
- In memory: find the position to update, insert the value, and the statement finishes
- Not in memory: the update is recorded in the change buffer, and the statement finishes
- Unique index
- Search process: since the index guarantees uniqueness, the search stops after the first record that meets the condition is found
- Update process, depending on whether the target page is in memory:
- In memory: find the position to update, check that there is no conflict, insert the value, and the statement finishes
- Not in memory: the data page must first be read into memory to check for conflicts before the value can be inserted, so the change buffer cannot be used here (see the sketch below)
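The two definitions differ by one keyword, but the update paths above differ materially. A sketch, using a hypothetical table t with a single indexed column k:

```sql
-- Hypothetical table for illustration.
CREATE TABLE t (id INT PRIMARY KEY, k INT) ENGINE = InnoDB;

-- Normal index: an insert whose target page is not in memory can be
-- recorded in the change buffer without reading the page from disk.
CREATE INDEX idx_k ON t (k);

-- Unique index (alternative definition): the uniqueness check forces
-- the target page into memory, so the change buffer cannot help.
-- CREATE UNIQUE INDEX uk_k ON t (k);
```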
- Composite (joint) index
- Leftmost prefix principle: not only the full index definition but any leftmost prefix of it can be used to speed up retrieval
- When deciding the column order inside the index, the first principle is: if adjusting the order lets you maintain one less index, that order is usually the one to prefer
- Index condition pushdown: while traversing the index, fields contained in the index can be checked first, filtering out records that fail the conditions and reducing the number of times the table has to be revisited
Without pushdown: starting from the first qualifying position, go back to the table row by row, locate each row on the primary key index, and compare the field values there.
With pushdown: the extra conditions are checked inside the index itself, so only entries that pass all index-level checks trigger a lookup on the primary key index (see the sketch below).
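A sketch of the leftmost prefix and index condition pushdown together, on a hypothetical tuser table (the names and data are made up; "Using index condition" in EXPLAIN's Extra column indicates pushdown on MySQL 5.6+):

```sql
-- Hypothetical table with a composite index on (name, age).
CREATE TABLE tuser (
  id INT PRIMARY KEY,
  name VARCHAR(16),
  age INT,
  INDEX idx_name_age (name, age)
) ENGINE = InnoDB;

-- Leftmost prefix: the index narrows the scan using the prefix of name.
-- Index condition pushdown: "age = 10" is also checked inside the
-- index, so rows failing it never trigger a primary-key lookup.
EXPLAIN SELECT * FROM tuser WHERE name LIKE '张%' AND age = 10;
```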
Change buffer update process:
- When a data page needs to be updated and the page is in memory, it is updated directly
- If the data page is not in memory, InnoDB caches the update in the change buffer (as long as data consistency is not affected), so the page does not have to be read from disk first. The next time a query needs the page, the page is read into memory and the change buffer operations for it are applied (a merge), which keeps the data logically correct
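The change buffer itself is tunable; two standard InnoDB variables to inspect (the defaults noted in the comments are the usual ones, but check your version):

```sql
-- Which operation types may be buffered (default: all).
SHOW VARIABLES LIKE 'innodb_change_buffering';

-- Maximum change buffer size as a percentage of the buffer pool
-- (default: 25).
SHOW VARIABLES LIKE 'innodb_change_buffer_max_size';
```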
OLTP vs OLAP
These OLTP systems are usually expected to be highly available and to process transactions with low latency, since they are often critical to the operation of the business. Database administrators therefore watch their OLTP databases closely and are usually reluctant to let business analysts run ad hoc analytic queries on them, because those queries are often expensive, scan large parts of the dataset, and can harm the performance of concurrently executing transactions.
A data warehouse, by contrast, is a separate database that analysts can query to their hearts' content without affecting OLTP operations. The warehouse contains a read-only copy of the data in all the company's various OLTP systems. Data is extracted from the OLTP databases (using either periodic dumps or a continuous stream of updates), transformed into an analysis-friendly schema, cleaned up, and loaded into the warehouse. This process of getting data into the warehouse is known as Extract-Transform-Load (ETL).
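A minimal sketch of the periodic-dump flavor of the extract and load steps, assuming a hypothetical orders table on the OLTP side and a warehouse_staging schema on the warehouse side; real pipelines insert cleaning and schema transformation between these two steps:

```sql
-- Extract: dump yesterday's rows from the OLTP database to a flat file.
SELECT id, customer_id, amount, created_at
INTO OUTFILE '/tmp/orders_dump.csv'
FIELDS TERMINATED BY ','
FROM orders
WHERE created_at >= CURDATE() - INTERVAL 1 DAY;

-- Load: bulk-load the file into a staging table in the warehouse,
-- where transformation queries then reshape it for analysis.
LOAD DATA INFILE '/tmp/orders_dump.csv'
INTO TABLE warehouse_staging.orders
FIELDS TERMINATED BY ',';
```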
Column-oriented storage
Column compression
Encoding
Replication
Synchronous and asynchronous
RedoLog BinLog
Problems with reading from replicas
Monotonic reads guarantee that this anomaly does not happen. It is a weaker guarantee than strong consistency, but stronger than eventual consistency. When you read data you may still see an old value; monotonic reads only mean that if a user makes several reads in sequence, they will not see time go backward: after reading newer data, a later read will not return older data.
Multi-leader replication
Partitioning
Hash partitioning
Rebalance
Three types of routing
ACID
Dirty read