There are many articles about sharding on the Internet, but most of them are scattered and focus on individual knowledge points; few describe the complete process of splitting a large table, designing the new architecture, and bringing it online end to end.
Therefore, based on a large database-and-table sharding project I did last year, this article reviews the complete process from architecture design all the way to launch.
1. Introduction
Why do we need to shard databases and tables? I believe most readers are already somewhat familiar with the reasons.
The storage and access of massive data has become a bottleneck for MySQL: ever-growing business data puts a considerable load on the database and raises high requirements for the stability and scalability of the system.
Moreover, the resources of a single server (CPU, disk, memory, etc.) are always limited, so the data volume and processing capacity of a single database will eventually hit a bottleneck.
At present there are generally two options.
One is to replace MySQL with a different storage engine, such as HBase, PolarDB, or TiDB distributed storage.
If, for whatever reason, you still want to use MySQL, the usual choice is the second option: sharding, that is, splitting databases and tables.
As mentioned at the beginning, there are already many articles on the Internet explaining sharding knowledge points, so this article will not spend much space on the standard sharding schemes.
Instead, the focus is on sorting out the entire process from architecture design to launch, together with considerations and best practices, including:
- Business refactoring
- Storage architecture design
- Transformation and launch
- Stability guarantee
- Project management
In particular, the best practices in each stage are lessons condensed from blood and tears.
2. Stage 1: Service Refactoring (Optional)
For systems that are already reasonably divided into microservices, sharding usually only requires attention to the storage architecture change and the transformation within individual services; service refactoring is not needed, so this stage is optional.
The first difficulty of this project, however, was business refactoring.
The two large tables A and B involved in this project, each with nearly 80 million rows, were left over from the monolithic-application era. There was no proper domain-driven / microservice architecture design from the start, and the logic has diverged badly. By now, 50+ online services and 20+ offline jobs read and write these tables directly.
Therefore, ensuring that the business transformation is thorough and comprehensive, with no omissions, is the top priority.
In addition, table A and table B each have 20 to 30 fields, and their primary keys have a one-to-one correspondence. Therefore, this sharding project also needs to refactor and merge the two tables into one and remove redundant and useless fields.
2.1 Query Statistics
Online queries are collected through the distributed tracing system: search by table name, aggregate by service, find all related services, and record the owning teams and services in a document.
Pay special attention to the fact that many tables are used not only by online applications but also by many offline algorithm and data-analysis jobs. These must be sorted out as well, and cross-team communication and research with the offline side must be done properly, so that data analysis is not affected after the switch.
2.2 Query Splitting and Migration
Based on the statistics from 2.1, create a JAR package (call it projectdb) and work with each service owner to migrate the relevant queries from the services into this JAR (the version at this point is 1.0.0-SNAPSHOT).
Then, in the original services, change every xxxMapper.xxxMethod() call to projectdb.xxxMethod().
This has two advantages:
- It facilitates the subsequent query split analysis.
- Later, the SQL queries inside the JAR can be replaced directly with RPC calls to the transformed mid-platform service; by simply upgrading the JAR version, the calling services switch from SQL calls to RPC queries (a sketch follows the list).
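As a rough illustration (the class and method names below are hypothetical, not from the original project), the JAR can expose a thin facade that initially delegates to the existing MyBatis mapper and is later re-implemented on top of the mid-platform RPC client, so the calling services never change their own code:

```java
// Hypothetical sketch of the "projectdb" facade JAR (names are not from the original project).

// Minimal placeholder types so the sketch compiles.
class OrderRecord { long pk1; }
interface OrderMapper { OrderRecord selectByPk1(long pk1); }     // existing MyBatis mapper
interface OrderRpcClient { OrderRecord queryByPk1(long pk1); }   // future mid-platform RPC client

// The facade interface that calling services depend on instead of xxxMapper.
interface OrderQueryFacade {
    OrderRecord findByPk1(long pk1);
}

// 1.0.0-SNAPSHOT: still reads the old table through the original mapper.
class MapperBackedOrderQueryFacade implements OrderQueryFacade {
    private final OrderMapper mapper;
    MapperBackedOrderQueryFacade(OrderMapper mapper) { this.mapper = mapper; }
    public OrderRecord findByPk1(long pk1) { return mapper.selectByPk1(pk1); }
}

// 3.0.0-SNAPSHOT: same interface, now backed by the mid-platform RPC service,
// so callers only upgrade the JAR version without changing their own code.
class RpcBackedOrderQueryFacade implements OrderQueryFacade {
    private final OrderRpcClient rpc;
    RpcBackedOrderQueryFacade(OrderRpcClient rpc) { this.rpc = rpc; }
    public OrderRecord findByPk1(long pk1) { return rpc.queryByPk1(pk1); }
}
```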
This step took several months in practice, so it is important to sort through every service and migrate comprehensively, without omissions; otherwise the split analysis may be incomplete and relevant fields may be missed.
The main reason for migrating the queries is that this project involves too many services; collecting them into one JAR makes the later transformation much easier. If only one or two services are involved, this step is not necessary.
2.3 Split Analysis of Joint Queries
Based on the queries collected into the JAR in 2.2, classify them and make judgments according to the actual business, and sort out historical legacy issues and abandoned fields along the way.
Some questions to think about:
1) Which queries cannot be split? For example, paging queries (transform them as far as possible; where that is not feasible, fall back to redundant fields)
2) Which joint queries can be split into separate queries at the business layer?
3) Which tables/fields can be fused?
4) Which fields need redundancy?
5) Which fields can be discarded directly?
6) Based on specific business scenarios and overall SQL statistics, identify the key shard keys; the remaining queries go through the search platform.
After thinking these questions through, you arrive at a general idea and plan for the query transformation.
At the same time, this project needs to merge the two tables into one and discard the redundant and invalid fields.
2.4 New Table Design
This step takes the query split analysis from 2.3 and the resulting decisions on merging, redundancy, and abandonment of old-table fields, and produces the field design of the new table.
Once the new table structure is produced, it must be sent to all relevant business parties for review, and every business party must approve the design. An offline review meeting can be held if necessary.
If any field is dropped in the new table, all business parties must be notified and must confirm it.
For the new table design, besides sorting out the fields, the indexes also need to be redesigned and optimized according to the specific queries.
2.5 First Upgrade
After the new table design is complete, first modify the SQL queries in the JAR package, updating all old fields to the new table's fields.
This is version 2.0.0-SNAPSHOT.
Then upgrade all services to this JAR version to verify that the deprecated fields are indeed no longer used and that the new table structure can fully cover the old business scenarios.
In particular, because many services are involved, distinguish non-core services from core services and launch them in batches to avoid a serious failure or a large-scale rollback.
2.6 Best Practices
2.6.1 Try not to change the field names of the original table
When we designed the merged table, we started by simply combining the fields of table A and table B, so many fields with the same name were renamed.
Later, while streamlining the fields, many duplicate fields were removed, but the renamed fields were not changed back.
As a result, during the later launch, the services inevitably had to refactor around the renamed fields.
Therefore, when designing a new table, do not change the field names of the original table unless you absolutely have to!
2.6.2 The index of a new table needs careful consideration
The indexes of the new table cannot simply be copied from the old tables; they need to be redesigned according to the query split analysis.
In particular, after some fields are merged, it may be possible to merge some indexes, or to design indexes with better performance.
2.7 Summary of this chapter
At this point, the first stage of the sharding project comes to an end. Depending on the business, this stage can take several months or even half a year if the historical burden is heavy.
The quality of this stage is very important; otherwise you may have to redesign the table structure and rebuild the full data later in the project.
To repeat: for systems already reasonably divided into microservices, sharding generally only requires attention to the storage architecture change and the transformation within individual services, and the "service refactoring" stage is generally not needed.
3. Stage 2: Storage Architecture Design (Core)
The storage architecture design is the most important part of any sharding project.
3.1 Overall Architecture
Based on the queries sorted out in the first stage, we summarized the following query pattern:
- More than 80% of queries go through field PK1, PK2, or PK3, where PK1 and PK2 have a one-to-one correspondence for historical reasons
- The remaining 20% of queries are all kinds of odd ones: fuzzy queries, queries on other fields, and so on
Therefore, we designed the following overall architecture, introducing database middleware, a data synchronization tool, a search engine (Alibaba Cloud OpenSearch / Elasticsearch), and so on.
The rest of the discussion revolves around this architecture.
3.1.1 MySQL Sharded Table Storage
The sharding dimensions of the MySQL tables are determined by the results of the query split analysis.
We found that PK1/PK2/PK3 can cover more than 80% of the main queries, so these queries go directly to MySQL by their shard keys.
In principle, keep as few full copies of the data (one per shard key) as possible, because each extra full copy wastes storage, adds data-synchronization overhead, increases instability, and makes scaling harder.
However, because the queries by PK1 and PK3 in this project both have high real-time requirements, we maintain two full copies of the data, sharded by PK1 and PK3 respectively.
Since PK2 and PK1 have a one-to-one correspondence for historical reasons, only a mapping table is kept for PK2, storing just the PK1 and PK2 columns.
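As a minimal sketch (the table count, naming, and hashing scheme are illustrative assumptions; in the real project the database middleware performs this routing transparently), routing by shard key looks roughly like this:

```java
// Minimal sketch of shard-key routing; table count and naming are illustrative.
public final class ShardRouter {
    private static final int TABLE_COUNT = 128;   // assumed number of physical tables per copy

    // Full copy #1 is sharded by PK1: t_order_pk1_000 ... t_order_pk1_127
    public static String tableForPk1(long pk1) {
        return String.format("t_order_pk1_%03d", Math.floorMod(Long.hashCode(pk1), TABLE_COUNT));
    }

    // Full copy #2 is sharded by PK3.
    public static String tableForPk3(String pk3) {
        return String.format("t_order_pk3_%03d", Math.floorMod(pk3.hashCode(), TABLE_COUNT));
    }

    // A query by PK2 first hits the small mapping table (pk2 -> pk1),
    // then routes to the PK1 copy with the resolved pk1.
    private ShardRouter() { }
}
```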
3.1.2 Search Platform Index Storage
The search platform index covers the remaining 20% of scattered queries.
These queries are often not based on a shard key, or require fuzzy matching.
Generally, the search platform does not store the full data (especially large VARCHAR fields); it stores only the primary key and the fields needed by the query conditions. After getting the search results, the caller goes back to MySQL with the primary keys to fetch the required records.
Of course, judging from later practice, there are still some trade-offs to make:
1) Some non-indexed fields, if they are not too large, can also be stored redundantly in the index, similar to a covering index, to avoid an extra SQL lookup;
2) If the table structure is relatively simple and the fields are small, we can even consider storing the full data in the index to improve query performance and reduce the pressure on MySQL.
In particular, there is always some synchronization delay between the search engine and the database, so queries by shard key should go directly to the database wherever possible, to avoid hidden consistency problems.
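As a rough sketch of this search-then-fetch pattern (the SearchClient and mapper interfaces below are hypothetical placeholders, not the project's actual APIs):

```java
import java.util.List;

// Hypothetical placeholder interfaces, not the project's real search or DAO APIs.
interface SearchClient { List<Long> searchPk1s(String keyword, int limit); }
interface OrderRecordMapper { List<OrderRecord> selectByPk1List(List<Long> pk1s); }
class OrderRecord { long pk1; }

class FuzzyOrderQuery {
    private final SearchClient search;
    private final OrderRecordMapper mapper;

    FuzzyOrderQuery(SearchClient search, OrderRecordMapper mapper) {
        this.search = search;
        this.mapper = mapper;
    }

    // The search index stores only pk1 plus the fields used as query conditions;
    // the full rows (including large VARCHAR columns) live only in MySQL,
    // so after the search we fetch the records back by primary key.
    List<OrderRecord> findByKeyword(String keyword) {
        List<Long> pk1s = search.searchPk1s(keyword, 100);
        return pk1s.isEmpty() ? List.of() : mapper.selectByPk1List(pk1s);
    }
}
```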
3.1.3 Data Synchronization
Generally, old and new tables can be kept consistent either by data synchronization or by double-writing; each approach has its advantages and disadvantages.
Usually one of them is chosen according to the actual online situation.
The synchronization relationships in this project are shown in the overall storage architecture and consist of four parts:
1) Full synchronization from the old tables to the new primary table
To reduce code intrusion and make extension easier, data synchronization was adopted at the beginning. Also, because so many services were involved, some services missed in the statistics might not be modified in time; data synchronization prevents data loss in those cases.
However, during the launch we found that when synchronization lag exists, many newly written records cannot be read back immediately, which seriously affected specific business scenarios. (See 4.5.1 for the detailed reasons.)
Therefore, to meet the application's real-time requirements, we switched to double-writing on top of data synchronization in version 3.0.0-SNAPSHOT.
2) Full synchronization from the new primary table to the secondary full table
3) Full synchronization from the new primary table to the mapping table
4) Synchronization from the new primary table to the search engine data source
2), 3), and 4) are all data synchronization from the new primary table to other data sources. Since they have no strong real-time requirement, data synchronization is used so the system stays easy to extend without adding more write paths.
3.2 Capacity Evaluation
Before applying for MySQL storage and search index resources, the storage capacity and performance targets need to be evaluated.
The online traffic can be assessed through the QPS monitoring system, and the storage capacity can be roughly taken as the sum of the existing online tables' storage.
However, during full synchronization we found that the actual capacity needed was larger than the estimate; see 3.4.5 for details.
The specific performance stress-testing process is not described here.
3.3 Data Verification
As can be seen from the above, this project involves a lot of business transformation and is a heterogeneous migration.
In many past sharding projects, the migration was homogeneous (a peer-to-peer split) without much complex logic, so verification of the migrated data was often neglected.
For a fully homogeneous migration, there are generally few problems.
But for a heterogeneous migration with this much transformation, verification is absolutely a top priority!!
Therefore, the data synchronization results must be verified to ensure that the business logic transformation is correct and the data synchronization is consistent. This is extremely important.
In this project there was a lot of business logic optimization and many field changes, so we built a separate verification service to verify both the full data and the incremental data.
During this process, we discovered many data synchronization and business logic inconsistencies in advance, which provided the most important guarantee for the smooth launch of this project!!
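As a minimal sketch of the idea (the field names and fusion rule are illustrative assumptions), the verification service applies the same field-mapping rules as the migration to an old-table row and compares the result with the corresponding new-table row; typically the full check scans by primary-key ranges, while the incremental check verifies rows shortly after change messages arrive:

```java
import java.util.Objects;

// Illustrative row types; real rows would carry the project's actual fields.
class OldRow { long pk1; String fieldA; String fieldB; }
class NewRow { long pk1; String mergedField; }

class RowVerifier {
    // Apply the same fusion rule the migration uses (assumption: fieldB wins when fieldA is blank).
    static String expectedMergedField(OldRow oldRow) {
        return (oldRow.fieldA == null || oldRow.fieldA.isEmpty()) ? oldRow.fieldB : oldRow.fieldA;
    }

    // Returns null when consistent, otherwise a human-readable diff for the alarm/report.
    static String verify(OldRow oldRow, NewRow newRow) {
        if (newRow == null) {
            return "missing row in new table, pk1=" + oldRow.pk1;
        }
        String expected = expectedMergedField(oldRow);
        if (!Objects.equals(expected, newRow.mergedField)) {
            return "pk1=" + oldRow.pk1 + " expected=" + expected + " actual=" + newRow.mergedField;
        }
        return null;
    }
}
```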
3.4 Best Practices
3.4.1 Traffic Amplification Caused by Sharding
There is an important issue to watch when estimating capacity: sharded queries bring traffic amplification.
There are two reasons for this amplification:
- Secondary queries through the mapping table. For example, a query by PK2 first needs to look up PK1 from PK2, and then query again by PK1 to return the result.
- Batch IN queries. When a select ... in ... is issued on the shard key, the database middleware splits the query across the corresponding physical tables, so what used to be one query is amplified into several. (The middleware does merge IN values that land on the same physical table into a single batch query, so the amplification factor is not fixed.)
Therefore, we need to pay attention to:
- At the business level, limit the number of values in IN queries to avoid excessive traffic amplification (a sketch follows this list).
- In capacity evaluation, take this amplification factor into account and leave appropriate redundancy. Also, as mentioned later, roll out the service transformation in batches so capacity can be expanded in time.
- Whether to split into 64, 128, or 256 tables should be estimated reasonably: the more tables, the larger the theoretical amplification, so do not split into more tables than the business size warrants.
- For queries against the mapping table, because there is an obvious hot/cold data pattern, we added a cache layer in the middle to reduce database pressure.
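As a rough sketch (the batch limit is an illustrative assumption), the business layer can cap the size of IN queries before they reach the middleware, so a single call cannot fan out into an unbounded number of physical-table queries:

```java
import java.util.ArrayList;
import java.util.List;

// Caps the size of IN queries at the business layer so one call cannot fan out
// into an unbounded number of physical-table queries after middleware splitting.
final class InQueryLimiter {
    private static final int MAX_IN_SIZE = 200;   // illustrative limit, tune per capacity plan

    static List<List<Long>> chunk(List<Long> shardKeys) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < shardKeys.size(); i += MAX_IN_SIZE) {
            batches.add(shardKeys.subList(i, Math.min(i + MAX_IN_SIZE, shardKeys.size())));
        }
        return batches;
    }
    // Callers issue one SELECT ... IN (...) per batch, which keeps the amplification
    // of each call bounded and predictable for capacity planning.

    private InQueryLimiter() { }
}
```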
3.4.2 Shard Key Change Scheme
In this project, one business scenario changes the value of PK3, but PK3 is a shard key and the database middleware does not allow it to be modified in place. Therefore the update logic for PK3 has to be rewritten as delete-then-insert.
Note the transactional atomicity of the delete and insert operations here. You can also use logs for alarms and reconciliation.
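A minimal sketch of the delete-then-insert change, assuming a Spring-managed transactional service (the mapper and row types are placeholders):

```java
import org.springframework.transaction.annotation.Transactional;

// Placeholder types; the real mapper would be generated by the project's ORM.
class OrderRow { long pk1; String pk3; }
interface OrderShardMapper {
    int delete(long pk1, String oldPk3);   // oldPk3 lets the middleware route to the old shard
    int insert(OrderRow row);              // the new pk3 routes the row to its new shard
}

// Assumes this is a Spring-managed bean so that @Transactional actually takes effect.
class Pk3UpdateService {
    private final OrderShardMapper mapper;
    Pk3UpdateService(OrderShardMapper mapper) { this.mapper = mapper; }

    // The shard key (pk3) cannot be updated in place, so the row is deleted
    // from its old shard and re-inserted with the new pk3.
    @Transactional
    public void changePk3(OrderRow row, String newPk3) {
        mapper.delete(row.pk1, row.pk3);
        row.pk3 = newPk3;
        mapper.insert(row);
    }
}
```

Note that if the old and new rows land in different physical databases, a local transaction alone cannot guarantee atomicity across them; that is where the log-based alarms and reconciliation mentioned above come in.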
3.4.3 Data Synchronization Consistency Problems
As we all know, a key point of data synchronization is data ordering. If the order in which changes are consumed cannot be guaranteed to match the order in which they were produced, out-of-order messages may let old data overwrite new data and cause inconsistency.
The message queue underlying our in-house data synchronization tool is Kafka, which only guarantees local ordering (specifically, ordering within each partition). By routing messages with the same primary key to the same partition, consistency can generally be guaranteed (a sketch of this routing is given at the end of this subsection). However, when there is a one-to-many relationship between source and target rows, the ordering of each target row's changes cannot be guaranteed.
In that case, you need to back-check the data source for the latest data to ensure consistency.
However, back-checking is not a silver bullet; two issues need to be considered.
1) If the change messages come from one instance but the back-check queries a different instance (for example, a read-only replica), the replication lag between them can itself cause inconsistency. Make sure the back-check hits the same instance that the change messages come from.
2) Back-checking adds extra query load to the database, which needs to be carefully evaluated, especially during full synchronization.
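A minimal sketch of the partition-routing idea with the plain Kafka producer API (the topic name and string serialization are illustrative assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Routes every change event for the same row (same primary key) to the same partition,
// so Kafka's per-partition ordering keeps that row's changes in order.
public class ChangeEventPublisher {
    private final KafkaProducer<String, String> producer;

    public ChangeEventPublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    public void publish(long primaryKey, String changeJson) {
        // Using the primary key as the record key -> same partition -> in-order per row.
        producer.send(new ProducerRecord<>("table-change-events", Long.toString(primaryKey), changeJson));
    }
}
```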
3.4.4 Data Timeliness
Pay attention to the following sources of delay, and evaluate and measure them against the actual business:
1) The second-level delay of the data synchronization platform
2) If both the message subscription and the back-check database sit on read-only instances, there is master-slave replication delay on top of the synchronization platform's second-level delay
3) The second-level delay from the wide table to the search platform
Only a solution that can satisfy the business scenario is an appropriate solution.
3.4.5 Storage Capacity Optimization after Sharding
During data synchronization, inserts into a single physical table are no longer strictly in increasing primary-key order, which produces many "storage holes". As a result, the total storage after synchronization is much larger than the estimated capacity.
Therefore, when applying for the new databases, we requested 50% more storage.
For the detailed reasons, see my article "Why does the total storage size become larger after a MySQL table is sharded?".
3.5 Summary of this chapter
At this point, the second stage of the sharding project comes to an end.
There were quite a few pitfalls in this stage.
One part is designing a highly available and easily extensible storage architecture. As the project progressed, many modifications and discussions took place, covering the number of redundant MySQL copies, the search platform's index design, traffic amplification, shard key modification, and other issues.
The other part is that "data synchronization" is itself a very complex operation. As mentioned in this chapter's best practices, timeliness, consistency, one-to-many relationships, and other issues all need close attention.
Therefore, it is all the more necessary to rely on data verification to confirm that the final business logic and data synchronization are both correct!
After completing this stage, you can officially enter the service switchover stage. Note that data verification still plays a key role in the next stage.
4. Stage 3: Transformation and Launch (Be Cautious)
After the first two stages are complete, the service switchover begins. The main steps are as follows:
1) The mid-platform service adopts single-read, double-write mode
2) Turn on data synchronization from the old tables to the new tables
3) Upgrade the projectDB version that all services depend on and bring the RPC calls online; if problems occur, roll back by downgrading the version (after a successful launch, services read only the new database and double-write both the old and new databases)
4) Check monitoring to make sure no service other than the mid-platform service accesses the old database and old tables
5) Stop data synchronization
6) Delete the old table
4.1 Query Transformation
How do we verify that the designs of the first two stages are reasonable? Whether they can fully cover the query transformation is the prerequisite.
Once the new table is designed, the old queries can be modified against the new table as the standard.
Here, the old SQL needs to be transformed and moved into the new mid-platform service.
1) Read query transformation
The queries may involve the following aspects:
A) Depending on the query conditions, change the inner joins on PK1 and PK2 into queries against the new table corresponding to the shard key
B) Handle the discarded fields in some of the SQL
C) Change non-shard-key queries into search platform queries, taking care to keep the semantics consistent
D) Write unit tests to avoid low-level mistakes, mainly at the DAO layer.
Only if the new table structure and storage architecture can fully adapt to the query transformation can the earlier design be considered safe for the time being.
Of course, this assumes that all relevant queries have been collected with no omissions.
2) Write query transformation
Besides the related field changes, more importantly the write path needs to be changed to double-write the old and new tables.
This may touch specific business write logic. Because this project was particularly complex, we had to communicate fully with the business side during the transformation to make sure the write logic was correct.
A configuration switch can be added to the double-write so it is easy to toggle; if the writes to the new database have problems, they can be turned off quickly.
At the same time, the data synchronization from the old database to the new database is not turned off during the double-write period.
Why? Mainly because of the particularities of this project. Since dozens of services are involved, they have to be launched in batches to reduce risk, so there are awkward intermediate states where some services still run the old logic while others run the new logic. The data correctness of these intermediate states must be guaranteed; see the analysis in 4.5.1.
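As a minimal sketch of the switchable double-write mentioned above (the configuration mechanism and mapper interfaces are assumptions; in practice the flag would come from a configuration center):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Placeholder mappers for the old and new storage.
interface OldOrderMapper { void insert(OrderRow row); }
interface NewOrderMapper { void insert(OrderRow row); }
class OrderRow { long pk1; }

class DoubleWriteOrderWriter {
    // In practice this flag would come from a configuration center and be hot-reloadable.
    private final AtomicBoolean writeNewTable = new AtomicBoolean(true);

    private final OldOrderMapper oldMapper;
    private final NewOrderMapper newMapper;

    DoubleWriteOrderWriter(OldOrderMapper oldMapper, NewOrderMapper newMapper) {
        this.oldMapper = oldMapper;
        this.newMapper = newMapper;
    }

    void insert(OrderRow row) {
        oldMapper.insert(row);                 // the old table is still the source of truth
        if (writeNewTable.get()) {
            try {
                newMapper.insert(row);         // best-effort write to the new table
            } catch (Exception e) {
                // Log and alarm; the still-running old->new synchronization
                // plus data verification will repair the gap.
                System.err.println("write to new table failed: " + e.getMessage());
            }
        }
    }

    void disableNewTableWrite() { writeNewTable.set(false); }  // the "switch"
}
```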
4.2 Service-oriented transformation
Why create a new service to host the transformed query?
On one hand, it makes upgrades and rollback switching convenient; on the other hand, it gathers the queries in one place and exposes them as the query capabilities of a service.
The transformed new queries are placed in this service, and the original queries in the JAR package are replaced with client calls to the service.
At the same time, the JAR package version is upgraded to 3.0.0-SNAPSHOT.
4.3 Launching Services in Batches
To reduce risk, the launch needs to be arranged in batches, from non-core services to core services.
Note that during the batch launches the write services are usually core services and therefore scheduled later, while non-core read-only services may go online first; this creates an intermediate state of reading the new tables while some services still write the old tables.
1) All related services upgrade projectDB to 3.0.0-SNAPSHOT on a refactoring branch and deploy to the intranet environment;
2) Business services add a dependency on the mid-platform service and subscribe to it;
3) Keep the refactoring branch open (do not merge it with the normal iteration branches) and deploy it to the intranet; intranet testing is expected to last more than two weeks.
Using a separate refactoring branch for the two weeks of intranet testing avoids interrupting normal business iteration: each week the regular iteration branch is merged into the refactoring branch and deployed to the intranet, while the production branch is merged into master and deployed to production as usual.
Of course, if the online and offline code branches must stay consistent, the refactoring branch and business branches can also be tested and released together, which puts greater pressure on development and testing.
4) During the batch launch, if dependency conflicts arise, resolve them promptly and keep this document updated;
5) Before a service goes online, business development or QA must clearly evaluate the specific APIs and risk points and run regression tests.
One more reminder: after going online, do not forget the offline data analysis business! Do not forget the offline analysis business! Do not forget the offline analysis business!
4.4 Process for bringing old tables offline
1) Check monitoring to make sure no service other than the mid-platform service accesses the old database and old tables
2) Check the SQL audit on the database to ensure that no other services are still reading the old table data
3) Stop data synchronization
4) Delete the old table
4.5 Best Practices
4.5.1 Read-after-Write May Fail to Read the Data
During the batch launch we ran into cases where a record could not be read back immediately after it was written. Because of the large number of services, we launched in batches to reduce risk, so some applications had been upgraded while others had not. The un-upgraded services still wrote to the old tables, while the upgraded applications read from the new tables. When synchronization lag exists, many newly written records cannot be read back, which seriously affected business scenarios.
There are two main sources of the delay:
1) The write services had not been upgraded and were still writing to the old tables, so there was an intermediate state of reading the new tables while writing the old tables, with synchronization lag between the old and new tables.
2) To avoid pressure on the primary database, the new tables' data was synchronized from the old tables' read-only replica, so master-slave replication lag was added on top.
There are generally two solutions:
1) Change the data synchronization to double-write logic.
2) Add compensation on the read path: if the record cannot be found in the new table, fall back and check the old table (sketched below).
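A minimal sketch of the read-path compensation (the mapper interfaces are placeholders):

```java
import java.util.Optional;

// Placeholder mappers for the new and old tables.
interface NewTableMapper { OrderRow selectByPk1(long pk1); }
interface OldTableMapper { OrderRow selectByPk1(long pk1); }
class OrderRow { long pk1; }

class CompensatingReader {
    private final NewTableMapper newMapper;
    private final OldTableMapper oldMapper;

    CompensatingReader(NewTableMapper newMapper, OldTableMapper oldMapper) {
        this.newMapper = newMapper;
        this.oldMapper = oldMapper;
    }

    // Read the new table first; if the row has not been synchronized yet
    // (written by a not-yet-upgraded service), fall back to the old table.
    Optional<OrderRow> findByPk1(long pk1) {
        OrderRow row = newMapper.selectByPk1(pk1);
        if (row == null) {
            row = oldMapper.selectByPk1(pk1);
        }
        return Optional.ofNullable(row);
    }
}
```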
4.5.2 Replacing Auto-increment Primary Keys with Unique IDs from the Database Middleware
After sharding, if each table keeps its own auto-increment primary key, global primary key conflicts can occur, so distributed unique IDs are needed instead of auto-increment keys. There are many ID-generation algorithms; this project uses a database-backed auto-increment sequence.
The database-backed distributed ID generator depends on MySQL. Its basic principle is to store a current value in MySQL; each time an application instance needs IDs, it adds a step (for example, 2000) to the current value, writes the new value back, and receives the range it just reserved. Each instance repeats this operation to obtain its own unique, non-overlapping ID ranges.
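A minimal sketch of this segment-style allocator over plain JDBC (the table name, column names, step size, and SQL are illustrative assumptions, not the project's actual generator):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hands out IDs from a locally cached range; when the range is exhausted,
// it reserves the next range from a MySQL row (illustrative table: id_sequence).
public class SegmentIdGenerator {
    private static final long STEP = 2000;

    private final DataSource dataSource;
    private long current;
    private long max;   // exclusive upper bound of the reserved range

    public SegmentIdGenerator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public synchronized long nextId() throws SQLException {
        if (current >= max) {
            reserveNextRange();
        }
        return current++;
    }

    private void reserveNextRange() throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement lock = conn.prepareStatement(
                         "SELECT current_value FROM id_sequence WHERE name = 'order_id' FOR UPDATE");
                 ResultSet rs = lock.executeQuery()) {
                rs.next();
                long start = rs.getLong(1);
                try (PreparedStatement update = conn.prepareStatement(
                             "UPDATE id_sequence SET current_value = ? WHERE name = 'order_id'")) {
                    update.setLong(1, start + STEP);
                    update.executeUpdate();
                }
                conn.commit();
                current = start;
                max = start + STEP;
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```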
But is having globally unique IDs enough? Obviously not, because the IDs of the old and new tables can still conflict with each other.
Because of the large number of services, the launch has to proceed in batches to reduce risk, so some services are still single-writing the old tables while others are already double-writing.
In this state, the old tables still use auto_increment for their IDs. If data only flowed one way (from the old tables to the new tables), it would be enough to reserve a range for the old tables' IDs and start the new sequence at a large value to avoid conflicts.
However, in this project the new-table data is also double-written back into the old tables. With the scheme above, the larger sequence-generated IDs written into the old table would push the old table's auto_increment up to that value, and the auto-increment IDs subsequently generated by services that still only write the old table would inevitably conflict.
The final scheme: the old table's auto_increment starts from an even larger value, while the new sequence starts from an ID greater than the current maximum ID in the old table and smaller than the auto_increment value set for the old table. This way, ID conflicts are avoided.
1) Before switching:
Set the sequence's starting ID to the old table's current auto-increment value, then change the old table's auto-increment to a much larger value, reserving the segment in between. This prevents ID conflicts when data written into the old tables by not-yet-upgraded services is synchronized into the new database.
2) After switching
Disconnect the data synchronization; no other modification is needed.
3) Advantages
Only one small piece of code is needed;
The switch can be done with a configuration toggle, without another upgrade;
If the old table's auto_increment is bumped abnormally, it causes no problems.
4) Disadvantages
If the write to the old table fails but the write to the new table succeeds, log-based handling is required.
4.6 Summary of this chapter
Once the old tables are taken offline, the transformation of the entire sharding project is complete.
Throughout this process, you need to stay in awe of the online business, think carefully about every possible problem, and prepare a fast rollback plan (the JAR version iteration of projectDB was mentioned across the three stages, from 1.0.0-SNAPSHOT to 3.0.0-SNAPSHOT, each bundling the changes of its stage; rolling back by JAR version played a huge role during the batched launches), so as to avoid major failures.
5. Stability guarantee
This chapter re-emphasizes the means of guaranteeing stability. As one of the important objectives of this project, stability actually runs through the entire project cycle and has basically been covered in the sections above. Every step deserves full attention, and plans should be carefully designed and evaluated so that you stay in control rather than leaving things to chance:
1) The new table design must be fully communicated with the business side and reviewed.
2) For "data synchronization", there must be data verification to ensure data correctness. Many possible causes of incorrect data have been mentioned above, including timeliness and consistency problems. Correct data is the basic precondition for going online.
3) For every change at every stage, there must be a fast rollback plan.
4) Launch in batches, starting with non-core services as pilots, to prevent a failure from spreading.
5) Monitoring and alarms must be configured comprehensively so that problems can be detected and handled quickly; never ignore them. This is very important: several minor data problems in this project were discovered and resolved in time thanks to alarms.
6) Unit tests and business functional tests must be sufficient.
6. Cross-team collaboration in project management
This article devotes a separate chapter to "cross-team collaboration".
In a large cross-team transformation project like this, scientific team collaboration is an indispensable factor for completing the overall project on time and with high quality.
Below are a few experiences and lessons.
6.1 All documents come first
What team collaboration must avoid most is relying on verbal agreements with nothing written down.
Whether it is division of work, scheduling, or anything else that requires multiple people to cooperate, there must be a document to track progress and drive the process.
6.2 Business communication and confirmation
All table structure changes must be communicated with the relevant business sides, and possible historical logic must be sorted out comprehensively;
All field modifications that have been discussed must be confirmed by the owner of each service.
6.3 Responsibilities in Place
For a project involving multiple teams and many people, each team should designate a single contact person, and the project leader communicates with that contact to confirm the team's overall progress and completion quality.
7. Looking Forward
As the length of this article shows, a sharding project costs a great deal of time and energy because of the complex business logic transformation, and the transformation process can very easily cause unstable online problems.
This article reviewed the whole process of splitting, design, and launch, and I hope it is helpful to everyone.
Having read this far, you may want to ask: is there a better way?
Perhaps the future lies in combining with the industry's newer database middleware technologies to implement sharding quickly.
Perhaps, in the future, new data storage technologies and products (PolarDB, TiDB, HBase) will be introduced, and sharding will not be needed at all?
Keep up with new technologies, and I am sure we will find the answers.