Preface
The author took over development and maintenance of the company’s financial system two years ago. At the beginning of the handover, the author and the team found a huge table with more than 50 million rows. Tracing the code revealed that this table stores the fund flow records, is tied to a number of function points, and is consumed by several downstream systems. Further observation showed that the table was still growing by more than 6 million rows per month; in other words, within half a year it would reach 100 million rows!
The author’s inner reaction: numb.
This amount of data is simply impossible for a MySQL database to keep sustaining, so two months after taking over the system, we started the dedicated project of splitting the large table. (The two months were actually mostly spent getting familiar with the system and digesting the backlog of requirements.)
System status before the table split
- Interfaces related to the flow table frequently timed out, and some were essentially unusable
- Writing each day’s new flow records was slow, mainly because inserts into the table were very slow
- The single table occupied too much disk space, and the DBA’s database monitoring frequently raised alarms
- Schema changes were impossible: any ALTER operation caused high latency and long table locks on both master and slave
Goals of the table split
- Split the large table’s data into sub-tables, keeping each sub-table at around 10 million rows (experience suggests 10 million rows puts no great pressure on MySQL)
- While splitting the table, optimize the query conditions of the various interfaces so that every external and internal interface stays available, and eliminate slow MySQL queries completely
Difficulty analysis
- The data in this table is arguably the most fundamental data of the whole financial system, with many related functions and downstream systems. This demands a very rigorous development, testing, and release process; any small mistake could cause big problems.
- Many scenarios are involved: 26 scenarios in total, with 32 Mapper methods to modify, not counting the individual business methods that need changes.
- A large amount of data must be migrated while keeping the system stable.
- Many users depend on important functions. When the sharded tables go live, the unavailable window must be minimized and system availability ensured. This requires the team to design a complete and reliable launch process, data migration scheme, rollback scheme, and downgrade strategy.
- As mentioned above, splitting the table inevitably changes some interfaces, which in turn forces changes in other systems. How to push those systems to adapt, and how to coordinate development, testing, and release across multiple parties, is another difficulty.
The whole process
Specific details
Sub-table middleware research
Sub-table plugin: Sharding-JDBC was chosen as the table-sharding plugin. Its advantages: 1) it supports multiple sharding strategies and automatically recognizes `=` or `IN` conditions to determine which sub-table to route to; 2) it is lightweight, introduced as a single Maven dependency, with very low intrusion into business code.
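To make the "low intrusion" point concrete, here is a minimal sketch of wiring Sharding-JDBC purely in Java, assuming the 4.x API. All names here (the data source key, table names, and sharding column) are hypothetical, and the month-based sharding column anticipates the choice discussed later in this article.

```java
// Minimal Sharding-JDBC 4.x wiring sketch; names are hypothetical.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import javax.sql.DataSource;

import org.apache.shardingsphere.api.config.sharding.ShardingRuleConfiguration;
import org.apache.shardingsphere.api.config.sharding.TableRuleConfiguration;
import org.apache.shardingsphere.api.config.sharding.strategy.StandardShardingStrategyConfiguration;
import org.apache.shardingsphere.api.sharding.standard.PreciseShardingAlgorithm;
import org.apache.shardingsphere.shardingjdbc.api.ShardingDataSourceFactory;

public class ShardingDataSourceBuilder {

    public static DataSource build(DataSource mysql) throws Exception {
        // Logic table t_fund_flow maps to monthly physical tables
        // t_fund_flow_202301 .. t_fund_flow_202312.
        TableRuleConfiguration flowRule = new TableRuleConfiguration(
                "t_fund_flow", "ds0.t_fund_flow_${202301..202312}");
        // Route a row to its monthly sub-table by the trade_time column.
        flowRule.setTableShardingStrategyConfig(new StandardShardingStrategyConfiguration(
                "trade_time",
                (PreciseShardingAlgorithm<Date>) (tables, value) ->
                        "t_fund_flow_" + new SimpleDateFormat("yyyyMM").format(value.getValue())));

        ShardingRuleConfiguration ruleConfig = new ShardingRuleConfiguration();
        ruleConfig.getTableRuleConfigs().add(flowRule);

        Map<String, DataSource> dataSources = new HashMap<>();
        dataSources.put("ds0", mysql);

        // The factory returns a regular javax.sql.DataSource, so MyBatis and
        // plain JDBC code use it unchanged: this is the "low intrusion" claim.
        return ShardingDataSourceFactory.createDataSource(
                dataSources, ruleConfig, new Properties());
    }
}
```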
To improve query speed, at the very start of the project the team considered introducing ES to store the flow records. After two rounds of discussion with the ES maintenance team, we found that the ES service the company provides did not fit our business scenarios (see the table below). After much deliberation we finally dropped the ES plan and query the database directly, using one query thread per sub-table to improve query efficiency.
Technical solution | Read | Write | Development cost | Disadvantages
---|---|---|---|---
sharding-jdbc + ES | ES | database + ES | 1. Write data to the Sharding data source and push it to ES. 2. Develop an API that reads from ES for queries | 1. ES does not handle the funds-specific multi-field match searches well. 2. ES caps the number of items returned per query and cannot support the occasional large-scale query
sharding-jdbc | database | database | 1. Write data to both the Sharding data source and the original data source. 2. Decide from the query conditions whether to read from the Sharding data source or the original one | 1. In funds business scenarios, sharding’s support for paged queries is not friendly enough, so paging logic must be implemented by hand. 2. …
Choosing the basis for the split
There are many ways to split a table: vertical splits, horizontal splits, splitting into a fixed number of tables and routing by taking a modulus, and so on. For our particular business, the only suitable approach is a horizontal split. Fund flow data is special in that it can never be purged, so a vertical split or a fixed set of modulus tables cannot solve the problem of single-table data growing without bound. A horizontal split, by contrast, keeps the amount of data in each table roughly constant, and the financial data can be archived after a certain period.
The basis for the split is generally one or several fields of the table, and choosing it is really the outcome of analyzing the data and the business. In general the principles are:
- Prefer the fields that appear most often in query conditions, which reduces the amount of method rework
- Consider whether splitting on the field distributes the data evenly and keeps each table within the 10-million-row target
- The field must be mandatory; null values are not allowed
A comprehensive analysis of our data and business pointed to "transaction time" as the basis for the split. First, it is one of the most important fields and always present in a flow record; second, splitting by transaction month puts roughly 6 to 7 million rows in each table; finally, 70% of queries use "transaction time" as a query condition.
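Since most queries filter on transaction time, often as a range (`BETWEEN`, `>=`, `<`), a range sharding algorithm lets Sharding-JDBC skip the months a query cannot possibly touch. The sketch below assumes the 4.x API and the monthly `t_fund_flow_yyyyMM` layout from the earlier wiring sketch; it is an illustration, not the author’s actual implementation.

```java
// Range routing for trade_time: keep only the monthly sub-tables whose
// yyyyMM suffix falls inside the queried time range.
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Date;
import java.util.List;

import com.google.common.collect.Range;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingAlgorithm;
import org.apache.shardingsphere.api.sharding.standard.RangeShardingValue;

public class TradeMonthRangeAlgorithm implements RangeShardingAlgorithm<Date> {

    @Override
    public Collection<String> doSharding(Collection<String> availableTargetNames,
                                         RangeShardingValue<Date> shardingValue) {
        Range<Date> range = shardingValue.getValueRange();
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMM");
        // An open-ended range matches every month on that side.
        String lo = range.hasLowerBound() ? fmt.format(range.lowerEndpoint()) : "000000";
        String hi = range.hasUpperBound() ? fmt.format(range.upperEndpoint()) : "999999";

        List<String> hit = new ArrayList<>();
        for (String target : availableTargetNames) {
            String month = target.substring(target.length() - 6); // trailing yyyyMM
            if (month.compareTo(lo) >= 0 && month.compareTo(hi) <= 0) {
                hit.add(target); // e.g. t_fund_flow_202304
            }
        }
        return hit;
    }
}
```

In the 4.x API this is registered next to the precise algorithm via `new StandardShardingStrategyConfiguration("trade_time", preciseAlgorithm, rangeAlgorithm)`.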
Technical difficulties
- Transactions across multiple data sources
When Sharding-JDBC is used, it needs its own independent data source, so transactions spanning multiple data sources inevitably arise. We solved this with a custom annotation and a custom aspect that opens the transactions and then commits or rolls back along the method stack. For confidentiality reasons, the code details are not shown here.
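Since the author’s code is withheld, the following is only a generic sketch of the same idea using Spring AOP: a hypothetical marker annotation, an aspect that opens a transaction on every `PlatformTransactionManager` (the Sharding-JDBC data source plus the original one), and commits them all on success or rolls them all back on failure. Note this is best-effort ordering, not a true two-phase commit.

```java
// Generic multi-data-source transaction aspect sketch; not the author's code.
import java.util.List;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.TransactionStatus;
import org.springframework.transaction.support.DefaultTransactionDefinition;

@Aspect
public class MultiDataSourceTxAspect {

    // One manager per data source: the sharding one and the original one.
    private final List<PlatformTransactionManager> managers;

    public MultiDataSourceTxAspect(List<PlatformTransactionManager> managers) {
        this.managers = managers;
    }

    // com.example.tx.MultiDataSourceTransactional is a hypothetical marker annotation.
    @Around("@annotation(com.example.tx.MultiDataSourceTransactional)")
    public Object aroundTransactional(ProceedingJoinPoint pjp) throws Throwable {
        TransactionStatus[] statuses = new TransactionStatus[managers.size()];
        for (int i = 0; i < managers.size(); i++) {
            statuses[i] = managers.get(i).getTransaction(new DefaultTransactionDefinition());
        }
        try {
            Object result = pjp.proceed();
            // Unwind the stack: commit in reverse order of opening.
            for (int i = managers.size() - 1; i >= 0; i--) {
                managers.get(i).commit(statuses[i]);
                statuses[i] = null;
            }
            return result;
        } catch (Throwable t) {
            // Roll back everything still open.
            for (int i = managers.size() - 1; i >= 0; i--) {
                if (statuses[i] != null && !statuses[i].isCompleted()) {
                    managers.get(i).rollback(statuses[i]);
                }
            }
            throw t;
        }
    }
}
```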
- Paging across multiple tables
Splitting the table inevitably makes paged queries harder: the original SQL `LIMIT` clause no longer works, because each sub-table holds a different number of matching rows, so a new way of deriving the paging information is needed. Here is one paging approach for reference (it is the team’s joint result, which I dare not claim as my own):
Considering the complexity of the business and of development, the project team decided to use one thread per table for cross-table queries to improve query efficiency. The hard part of this scheme is converting the paging parameters. For example, suppose the page passes in offset=8 and pageSize=20, and the numbers of matching rows in the three sub-tables are 10, 10, and 50 respectively. We then need to convert the overall paging condition into a paging condition for each of the three sub-tables, as follows:
sub-table 1: offset=8, pageSize=2; sub-table 2: offset=0, pageSize=10; sub-table 3: offset=0, pageSize=8. The whole calculation process is:
1) Query the number of matching rows in each sub-table, one thread per table
2) Accumulate the counts in sub-table order to form a cumulative number line (10, 20, 70 in the example above)
3) Determine which tables hold the first and the last requested rows
4) For every table between those two: offset=0 and pageSize equal to all of its matching rows
5) Compute the offset and pageSize for the table holding the first row
6) Compute the pageSize for the table holding the last row, with offset set to 0
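A compact sketch of that conversion (class and method names are mine, not from the project): given the global offset and pageSize plus the per-table counts gathered in step 1, it produces a local offset and pageSize for every sub-table, covering steps 2 through 6 in a single pass rather than the literal steps.

```java
// Convert a global (offset, pageSize) into per-sub-table paging conditions.
import java.util.ArrayList;
import java.util.List;

public class ShardedPageCalculator {

    public static class Page {
        public final long offset;
        public final long pageSize;
        public Page(long offset, long pageSize) {
            this.offset = offset;
            this.pageSize = pageSize;
        }
        @Override public String toString() {
            return "offset=" + offset + ", pageSize=" + pageSize;
        }
    }

    /**
     * counts[i] = number of matching rows in sub-table i (tables in time order).
     * Returns one Page per sub-table; pageSize == 0 means "skip this table".
     */
    public static List<Page> split(long offset, long pageSize, long[] counts) {
        List<Page> pages = new ArrayList<>();
        long cumulative = 0;       // rows in all earlier tables
        long remaining = pageSize; // rows still needed
        for (long count : counts) {
            long start = Math.max(0, offset - cumulative);       // local offset
            long take = Math.max(0, Math.min(count - start, remaining));
            pages.add(new Page(start, take));
            remaining -= take;
            cumulative += count;
        }
        return pages;
    }

    public static void main(String[] args) {
        // The example from the text: offset=8, pageSize=20, counts 10/10/50
        // -> offset=8,pageSize=2 / offset=0,pageSize=10 / offset=0,pageSize=8
        for (Page p : split(8, 20, new long[]{10, 10, 50})) {
            System.out.println(p);
        }
    }
}
```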
Data migration scheme
Before migrating the data, the team discussed two schemes: 1) ask the DBA to migrate the data; 2) migrate it with hand-written code. Each has its own advantages and disadvantages:
Migration scheme | Description | Advantages | Disadvantages
---|---|---|---
DBA migrates the data | 1. Switch traffic to the master database during the migration. 2. … | 1. If problems arise during the migration, the DBA can assist | 1. Access traffic moves to the master database, so unknown problems may surface, such as database network access issues. 2. Master/slave replication lag becomes large, affecting the whole cluster. 3. Migration is slow, since the whole table is scanned and the indexes are not carried over
Code migrates the data | Run a script around `SELECT * FROM table WHERE id > ? LIMIT ?` to query the data and insert it into the new tables batch by batch | 1. The migration speed is controllable. 2. … | 1. Prone to large-transaction problems. 2. Prone to operator error. 3. Migrated historical data can end up inconsistent. 4. The migration takes a long time; in practice the whole migration took up to two weeks
Weighing the time cost against the impact on the online database, the team decided to combine the two schemes. Cold data, whose transaction time is more than three months old, is unlikely to be updated, so it was migrated with code, with the batch size controlled by hand: small amounts, many rounds, like ants moving house. Hot data, whose transaction time falls within the last three months, was still being updated frequently before launch, so writes were stopped just before going live and the DBA then migrated it in one pass. This amortizes the time cost into ordinary days, and the system is unavailable for only about two hours during the final pre-launch migration. At the same time, for everything except that last DBA migration, the volume of each batch was manually controlled, avoiding high latency at the database instance level.
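For the cold-data half, a hand-rolled migrator might look like the sketch below: keyset pagination on the primary key, small insert batches, and a short pause between rounds. Table and column names, the cutoff parameter, and the batch size are illustrative assumptions, not the author’s actual values.

```java
// "Ants moving house": batch-migrate cold rows from the old large table into
// the sharded tables through the Sharding-JDBC data source.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import javax.sql.DataSource;

public class ColdFlowMigrator {

    private static final int BATCH = 500; // small batches avoid large transactions

    public static void migrate(DataSource oldDs, DataSource shardingDs,
                               Timestamp coldCutoff) throws Exception {
        long lastId = 0;
        while (true) {
            int moved = 0;
            try (Connection read = oldDs.getConnection();
                 Connection write = shardingDs.getConnection();
                 PreparedStatement select = read.prepareStatement(
                         "SELECT id, trade_time, amount FROM t_fund_flow_old "
                       + "WHERE id > ? AND trade_time < ? ORDER BY id LIMIT ?");
                 PreparedStatement insert = write.prepareStatement(
                         "INSERT INTO t_fund_flow (id, trade_time, amount) VALUES (?, ?, ?)")) {
                select.setLong(1, lastId);
                select.setTimestamp(2, coldCutoff);
                select.setInt(3, BATCH);
                try (ResultSet rs = select.executeQuery()) {
                    while (rs.next()) {
                        insert.setLong(1, rs.getLong("id"));
                        insert.setTimestamp(2, rs.getTimestamp("trade_time"));
                        insert.setBigDecimal(3, rs.getBigDecimal("amount"));
                        insert.addBatch();
                        lastId = rs.getLong("id"); // keyset cursor for the next round
                        moved++;
                    }
                }
                // Sharding-JDBC routes each row to its monthly sub-table on insert.
                insert.executeBatch();
            }
            if (moved < BATCH) {
                break;          // no more cold rows
            }
            Thread.sleep(50);   // throttle to limit replication lag
        }
    }
}
```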
Overall launch process
To ensure the stability of the new sharded-table function and a safe offline of the large table, the team divided the whole project into three stages:
Stage 1: create the sub-tables, migrate the large table’s data into them, and, online, double-write new records to both the new and old tables while all queries read from the sub-tables
(Verify and observe)
Stage 2: stop writing to the old table, and switch other business systems that read the database directly onto the external interfaces provided by the funds system
(Verify and observe)
Stage 3: take the large table offline
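The stage-1 double write could be as simple as the following sketch, with both inserts wrapped in the multi-data-source transaction from earlier so they commit or roll back together; the DAO and record types are hypothetical placeholders.

```java
// Stage-1 double write: new records go to both the sharded tables and the old
// large table, while reads already go to the sub-tables.
public class FlowWriteService {

    public interface FlowDao {
        void insert(FlowRecord record);
    }

    public static class FlowRecord {
        // id, tradeTime, amount, ... omitted for brevity
    }

    private final FlowDao shardedDao; // backed by the Sharding-JDBC data source
    private final FlowDao oldDao;     // backed by the original data source

    public FlowWriteService(FlowDao shardedDao, FlowDao oldDao) {
        this.shardedDao = shardedDao;
        this.oldDao = oldDao;
    }

    // @MultiDataSourceTransactional -- the custom annotation sketched earlier,
    // so both inserts commit or roll back together.
    public void addFlow(FlowRecord record) {
        shardedDao.insert(record); // sub-tables: the source of truth going forward
        oldDao.insert(record);     // old table: kept consistent until stage 2
    }
}
```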
Conclusion
- Research sharding middleware thoroughly beforehand. Because of the particular basis we chose for the split, many of Sharding-JDBC’s features could not be used, and it simplified the query logic less than expected. Moreover, Sharding-JDBC’s requirement for an independent data source created the multi-data-source transaction problem and added development work.
- With multithreading, carefully size the thread pools’ core threads, and analyze whether several pools existing at once would create too many core threads and exhaust the machine’s threads.
- When retrofitting sharding onto an existing project, list all the scenarios explicitly, trace each one down to the classes and methods in the code, and cover every business scenario.
- When migrating historical data, make a data migration plan and a plan for handling data inconsistencies. Time cost, data accuracy, the impact on online functionality, and many other factors must all be considered.
- When launching a complex scheme, design the rollback scheme and downgrade measures in advance; this goes a long way toward ensuring stability.
A little digression
Why a digression? Mainly to record my feelings about a project that stretched over more than five months. During the project, overlap with the teams maintaining other systems was inevitable, and their cooperation was needed. This is a great test of a programmer’s communication skills; the best programmers can pull the other side over to their position and make them feel the work benefits them too, so that they cooperate willingly and both sides win. If programming and learning ability are a programmer’s hard skills, then communication is the soft skill: hard skills get the work done, soft skills get it launched. Programmers who neglect their communication skills are, in effect, walking on one leg, because no one can do everything alone.
In addition, at least in our organization, the demands on back-end programmers’ overall quality are actually the highest. A back-end programmer straddles business and technology, needing both a strong grasp of the business and excellent technical skills. At the same time, the back-end owns most of the work: the back-end programmer usually sets the development rhythm for front-end, back-end, and QA, coordinates the milestones, and flags risks. This requires understanding business and technology alike, plus some management skill, and it is in fact great training.