Interviewer: Why don’t you talk about how you tune MySQL?
Candidate: Wow, that’s a big topic… In my opinion, for developers, MySQL tuning generally focuses on “development conventions”, “database indexing”, and fixing slow queries in production.
Candidate: As for tuning MySQL’s internal parameters, that’s usually handled by a professional DBA.
Interviewer: In other words, you’re saying you don’t know how to tune MySQL parameters?
Candidate: Damn, busted.
Interviewer: Then why don’t you talk about the conventions and indexing work you do?
Candidate: Well, first of all, in production we create database tables through a work-order system (which naturally requires DBA approval) (:
Candidate: In theory, once a table reaches a certain size, it should have the appropriate indexes. There are quite a few things to watch out for when querying the database, mostly accumulated from experience. For example:
Candidate: 1. Can you use a “covering index” to reduce the cost of “going back to the table”? That means when we SELECT, we should specify the exact columns we need instead of SELECT *.
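Candidate: A rough sketch of what I mean (the table and index here are made up for illustration):

```sql
-- idx_user_status covers both columns the query needs, so InnoDB can
-- answer from the secondary index alone, without a second lookup into
-- the clustered (primary key) index.
CREATE INDEX idx_user_status ON orders (user_id, status);

-- Covered: every selected column is in the index.
SELECT user_id, status FROM orders WHERE user_id = 42;

-- Not covered: `amount` forces a lookup back into the table.
SELECT user_id, amount FROM orders WHERE user_id = 42;
```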
Candidate: 2. Consider whether a “composite index” is needed. If you do build one, try to put the most selective column on the left, and keep the “leftmost-prefix matching rule” in mind.
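Candidate: For example (hypothetical table, just to show the leftmost-prefix rule):

```sql
CREATE INDEX idx_city_age ON users (city, age);

SELECT * FROM users WHERE city = 'Beijing' AND age = 30; -- uses the full index
SELECT * FROM users WHERE city = 'Beijing';              -- uses the leftmost prefix
SELECT * FROM users WHERE age = 30;                      -- generally can't use this index
```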
Candidate: 3. Applying a function or expression to an indexed column stops the index from being used.
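Candidate: A quick made-up example:

```sql
-- Wrapping the indexed column in a function hides it from the optimizer;
-- rewrite the predicate as a range on the bare column instead.
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-01';     -- index not used

SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02'; -- index used
```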
Candidate: 4. Optimize deep-pagination scenarios with a subquery. In MySQL, LIMIT offset, n reads offset + n rows and then throws the first offset away, so a large offset wastes a lot of work. The subquery trick is to first locate the starting ID through the index, then fetch the n rows from that ID, which makes the query much more efficient.
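Candidate: Roughly like this (hypothetical table):

```sql
-- Naive version: scans and discards 1,000,000 rows before returning 10.
SELECT * FROM orders ORDER BY id LIMIT 1000000, 10;

-- Subquery version: walk the much smaller primary key index to find the
-- starting id, then fetch only the 10 rows that are actually needed.
SELECT * FROM orders
WHERE id >= (SELECT id FROM orders ORDER BY id LIMIT 1000000, 1)
ORDER BY id
LIMIT 10;
```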
Interviewer: Hmm…
Candidate: 5. Use the EXPLAIN command to look at the SQL execution plan: whether the statement hits an index, and which index it actually uses. SHOW PROFILE can show how a statement consumes system resources (although this one is rarely used).
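Candidate: For instance (note that SHOW PROFILE is deprecated in newer MySQL versions, but it still works):

```sql
-- EXPLAIN shows which index (if any) the query will use. The key columns
-- to read are `type` (avoid ALL, a full scan), `key` (the chosen index)
-- and `rows` (estimated rows examined).
EXPLAIN SELECT user_id, status FROM orders WHERE user_id = 42;

-- SHOW PROFILE needs profiling enabled for the session first:
SET profiling = 1;
SELECT user_id, status FROM orders WHERE user_id = 42;
SHOW PROFILES;            -- list recent statements with their query IDs
SHOW PROFILE FOR QUERY 1; -- per-stage resource breakdown for statement 1
```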
Candidate: 6. Once a transaction is started, only touch the database inside it where you really have to, and consciously shorten the time locks are held. For example, if a transaction needs to both insert and update data, insert first and update last: the update takes a row lock, and if it ran first, other transactions might sit waiting for that row lock until your transaction commits.
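Candidate: Something like this (made-up tables):

```sql
-- The UPDATE takes a row lock that is held until COMMIT, so putting it
-- last keeps the lock window as short as possible.
BEGIN;
INSERT INTO order_log (order_id, action) VALUES (1001, 'paid'); -- no hot-row contention
UPDATE orders SET status = 'paid' WHERE id = 1001;              -- row lock from here…
COMMIT;                                                         -- …until here
```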
Interviewer: Well, you mentioned transactions and transaction isolation levels. What isolation level do you use in production?
Candidate: Well, we use Read Committed, while MySQL uses Repeatable Read by default. Which isolation level to choose depends on the application scenario, because the lower the isolation level, the better the transaction concurrency.
Candidate: (Internet companies generally choose Read Committed as their main isolation level.)
Candidate: An isolation level like Repeatable Read is more likely to cause deadlocks because of its “gap locks”.
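Candidate: Setting it is just one statement, for example:

```sql
-- Per-session (it can also be set globally or in the server config):
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Verify; on MySQL versions before 8.0 the variable is @@tx_isolation.
SELECT @@transaction_isolation;
```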
Candidate: But as you probably know, MySQL’s default isolation level is Repeatable Read. That is largely because MySQL’s binlog did not have a row format in the early days, and at the Read Committed level that could cause “master/slave data inconsistency” problems.
Candidate: The binlog records “changes” to table structure and table data, such as update/delete/insert/truncate/create. MySQL implements master/slave replication by shipping the binlog (:
Candidate: For that historical reason, MySQL set the default isolation level to Repeatable Read.
Interviewer: Well, by the way, have you ever run into this kind of problem: the index is right, but online queries are still slow?
Candidate: Yes, of course
Interviewer: How did you handle it?
Candidate: If the index is correct but the query is still slow, it’s generally because the table has simply grown too large.
Candidate: First, consider whether old data can be “trimmed”. In our case, we synchronize all data to Hive, so a copy is already stored offline.
Candidate: So if the “old data” is no longer queried by the business, the simplest approach is to “delete” part of the data. Once the data volume drops, retrieval naturally speeds up…
Interviewer: Well, usually you can’t just delete it.
Candidate: Yes, very few businesses can delete data (:
Candidate: Then consider another option: put a layer of cache (Redis) in front of the query.
Candidate: With a cache, you have to check whether the business can tolerate reads that are not strictly “real-time” (after all, consistency between Redis and MySQL has to be maintained). And if the query conditions are complex and keep changing (all kinds of GROUP BY and SUM), the cache becomes hard to maintain and is not a good fit…
Candidate: If it’s “string” search scenarios that make the query inefficient, then consider importing the table data into a search engine like Elasticsearch.
Candidate: (A fuzzy match with a leading wildcard, like LIKE '%xxx%', can’t use an index in MySQL, which is exactly the scenario a search engine handles well.)
Candidate: If not, then consider building an aggregate table along the dimensions of the query conditions, and have online requests query the aggregate table instead of the original table.
Candidate: For example, users place orders and the order table has grown huge, but the query feature exposed on the product side (the user-facing front end) only displays data at the “day” level. So each user’s data can be aggregated by day: in the aggregate table, one user has just one summary row per day.
Candidate: Querying the aggregate table is naturally fast (its data volume is far smaller than the original table’s).
Candidate: The idea is roughly “trading space for time”: store another copy of the same data somewhere else, organized for the query, to improve query efficiency.
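Candidate: A minimal sketch of such an aggregate table (the schema and job schedule are made up):

```sql
-- One summary row per user per day.
CREATE TABLE order_daily_agg (
  user_id      BIGINT        NOT NULL,
  stat_date    DATE          NOT NULL,
  order_count  INT           NOT NULL,
  total_amount DECIMAL(12,2) NOT NULL,
  PRIMARY KEY (user_id, stat_date)
);

-- Populated periodically, e.g. by a nightly job rolling up yesterday:
INSERT INTO order_daily_agg (user_id, stat_date, order_count, total_amount)
SELECT user_id, DATE(created_at), COUNT(*), SUM(amount)
FROM orders
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at <  CURDATE()
GROUP BY user_id, DATE(created_at);
```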
Interviewer: I also want to ask: besides reads, write performance can also hit a bottleneck. What do you do then?
Candidate: Now that you mention it, I’m not sleepy anymore.
Candidate: If MySQL reads or writes hit a bottleneck, first look at the current MySQL architecture.
Candidate: If it’s a single instance, consider upgrading to a master-slave architecture with read/write splitting.
Candidate: The master takes the write requests and the slaves take the read requests. The slaves then apply the binlog shipped from the master to keep the data consistent (in common setups this replication is asynchronous, so master/slave consistency is eventual).
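Candidate: Wiring up a replica looks roughly like this (all values are placeholders; newer MySQL versions phrase the same thing as CHANGE REPLICATION SOURCE TO / START REPLICA):

```sql
-- On the slave: point it at the master's binlog position and start.
CHANGE MASTER TO
  MASTER_HOST = '10.0.0.1',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '***',
  MASTER_LOG_FILE = 'binlog.000042',
  MASTER_LOG_POS = 12345;
START SLAVE;

-- Check replication health and lag: look at Slave_IO_Running,
-- Slave_SQL_Running and Seconds_Behind_Master.
SHOW SLAVE STATUS\G
```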
Interviewer: Hmm…
Candidate: If reads and writes still hit a bottleneck under a master-slave architecture, then consider sharding: splitting the databases and the tables.
Candidate: At my previous company, at least, the businesses were already separated: traffic had its own databases, advertising had its own, and merchandise had its own. So what I mean here is taking one table in an existing database and splitting it further.
Candidate: For example, say there’s an order table in the advertising database with 100 million rows, and now I want to shard that order table.
Candidate: That means splitting this one table across multiple advertising databases and multiple tables (:
Candidate: The most obvious benefit of sharding is spreading the requests evenly. With 100 million rows in one table in a single database, splitting into 8 databases leaves about 12.5 million rows per database, and with 8 tables per database, roughly 1.56 million rows per table.
Interviewer: What do you use as the sharding key?
Candidate: In our experience, it’s generally userId (because queries are most often in the user dimension). If queries run on some other dimension, the idea is still the same as above (trading space for time).
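Candidate: The routing itself is done in the application; a hypothetical 8-database x 8-table scheme might look like:

```sql
-- The application computes the target before issuing SQL, e.g.:
--   db_index    = (user_id / 8) % 8
--   table_index = user_id % 8
-- A query for user_id = 42 would then land on database 5, table 2:
SELECT * FROM ad_db_5.ad_order_2 WHERE user_id = 42;
```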
Interviewer: And after sharding, how are the IDs generated?
Candidate: This is where distributed ID generation comes in, and there are many approaches: some auto-increment with MySQL, some increment with Redis, and some generate IDs with the “snowflake algorithm”. Depending on the company’s technology stack, the Redis and “snowflake” implementations are the common choices.
Candidate: As for why “increasing” is emphasized: again, it has to do with index ordering. An ever-increasing primary key always appends at the tail of the B+ tree instead of splitting pages in the middle (you should remember this from the earlier discussion).
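Candidate: For completeness, here’s the classic MySQL-based version (the “ticket server” pattern; the table name is made up):

```sql
-- A single-row counter bumped atomically per business key.
CREATE TABLE id_ticket (
  name VARCHAR(32) PRIMARY KEY,
  id   BIGINT NOT NULL
);
INSERT INTO id_ticket (name, id) VALUES ('ad_order', 0);

-- Each call hands out the next id. LAST_INSERT_ID(expr) makes the new
-- value readable on the same connection without racing other clients.
UPDATE id_ticket SET id = LAST_INSERT_ID(id + 1) WHERE name = 'ad_order';
SELECT LAST_INSERT_ID();
```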
Interviewer: Well, if you do have to shard, what does the migration process look like?
Candidate: We usually migrate with a “double write” approach. The general steps are:
Candidate: 1. Write every incremental change to both the new table and the old table (double write).
Candidate: 2. Migrate the historical data from the old table to the new table.
Candidate: 3. Sooner or later the new table catches up with the old table (at some point the data is fully in sync).
Candidate: 4. Verify that the data in the new table and the old table are consistent (mainly checking that they match).
Candidate: 5. Turn on double reads (part of the traffic reads the new table, part reads the old table), which is effectively a gray release.
Candidate: 6. Cut all read traffic over to the new table and stop writing to the old table.
Candidate: 7. Prepare a rollback mechanism in advance: if the switch fails, normal service can be restored quickly, and data-repair programs are on hand.
Interviewer: Hmm… let’s call it a day.
To sum up this article:
- Once a database table reaches a certain size, it needs the corresponding indexes
- When you find a slow query, check whether the indexing is right, whether a better index could speed the query up, and whether you’re actually using the index correctly
- When indexes alone can’t fix a slow query, it’s usually because the business table is too large; apply the idea of trading space for time
- When read/write performance hits a bottleneck, upgrade the database architecture (master-slave) first; if that’s still not enough, consider sharding
- Although sharding can solve read/write bottlenecks, it brings a pile of problems of its own, so research the solutions and work through the pitfalls in advance
Production is not the place to show off your skills: do things the simple way, not the complicated way.
Welcome to follow my WeChat official account [Java3y] to talk about Java interviews. The Online Interviewer series keeps updating!
Online Interviewer (mobile) series: two updates a week!
Online Interviewer (desktop) series: two updates a week!
Original content isn’t easy!! Please like, comment, and share!!