
For huge data tables that may grow to hundreds of millions or even billions of rows, projects in their early stage usually adopt a single-table design so they can launch quickly, without considering sharding across databases and tables. As the business grows and the single table exceeds ten million or even a hundred million rows, sharding becomes unavoidable, and the migration has to happen without downtime. After all, an Internet product cannot hang up a banner saying **"system down for maintenance from 10:00 tonight to 10:00 tomorrow"**, and what would an interviewer think of such a migration plan?

Learning from Codis

I ran into exactly this problem recently and implemented a zero-downtime sharding migration scheme based on some ideas from Codis. Codis itself is not the focus of this article; what we borrow from it is its rebalance behavior:

When data is accessed during a slot migration, the proxy sends the SLOTSMGRTTAGSLOT command to Redis to force an immediate migration of the key the client is about to access, and then serves the client's request. (SLOTSMGRTTAGSLOT is a command Codis adds on top of Redis.)
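The idea can be sketched with two in-memory maps standing in for the source and target nodes (the class and method names here are illustrative, not Codis's real API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the Codis rebalance trick: while a slot is migrating,
// a read of a key first force-migrates that single key to the target, then
// serves the request, so clients never see a half-migrated key.
public class ProxySketch {
    private final Map<String, String> source = new HashMap<>();
    private final Map<String, String> target = new HashMap<>();
    private boolean migrating = true;

    public void put(String key, String value) {
        source.put(key, value);
    }

    public String get(String key) {
        if (migrating && source.containsKey(key)) {
            // Analogue of SLOTSMGRTTAGSLOT: migrate just this key, on demand
            target.put(key, source.remove(key));
        }
        return target.getOrDefault(key, source.get(key));
    }
}
```

The point is that the migration of any particular key can be forced to finish *before* the request that needs it is processed; the sharding scheme below applies the same idea at the granularity of one user.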

Sharding migration

With that idea in mind, the zero-downtime sharding migration is easier to understand. Next I describe the concrete scheme in detail, using the installed_app table, the table of apps installed by each user, as the example.

1. Choose the sharding column

Choosing the sharding column is by far the most important decision in a sharding project: it directly determines whether the whole scheme can land successfully. A well-chosen sharding column lets the vast majority of traffic on the table reach a single sharded table through that column, with no cross-database or cross-table queries. The most common sharding column is user_id, and that is what I chose here as well.
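As a minimal illustration of why this matters, routing by user_id might look like the following (the table count and naming convention are assumptions for the example, not the article's actual layout):

```java
// Illustrative shard routing: all rows of one user land in exactly one table,
// so any query filtered by user_id never has to cross tables.
public class ShardRouter {
    // Assumed layout: 64 physical tables named installed_app_0 .. installed_app_63
    private static final int TABLE_COUNT = 64;

    public static String tableFor(long userId) {
        return "installed_app_" + (userId % TABLE_COUNT);
    }
}
```

A query that lacks the sharding column, by contrast, would have to be fanned out to all 64 tables and merged, which is exactly what a good sharding column choice avoids for the hot paths.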

2. The sharding migration scheme

After choosing the sharding column that best fits the business, the next step is to settle on the migration scheme. I adopted a combination of active and passive migration:

  1. **Active migration**: an independent program that traverses the installed_app table to be sharded and copies its data into the target sharded tables.
  2. **Passive migration**: migration logic embedded in the business code that touches the installed_app table, which migrates a user's data to the corresponding sharded table before serving the request.

Both approaches are described in detail below.

2.1 Active Migration

Active migration is an independent, external migration program. Its job is to traverse the installed_app table and copy the data into the target sharded tables. Because active and passive migration run at the same time, conflicts between the two must be handled. The pseudocode of my active migration is as follows:

public void migrate(){
    // Query the current max(id) of the table to decide when the traversal is done
    long maxId = execute("select max(id) from installed_app");
    long tempMinId = 0L;
    long stepSize = 1000;
    long tempMaxId = 0L;
    do{
        try {
            tempMaxId = tempMinId + stepSize;
            // Thanks to the InnoDB clustered index, a "where id>=? and id<?" range
            // scan is cheap and stays cheap for the whole traversal
            String scanSql = "select * from installed_app where id>=#{tempMinId} and id<#{tempMaxId}";
            List<InstalledApp> installedApps = executeSql(scanSql);
            Iterator<InstalledApp> iterator = installedApps.iterator();
            while (iterator.hasNext()) {
                InstalledApp installedApp = iterator.next();
                // help GC
                iterator.remove();

                long userId = installedApp.getUserId();
                String status = executeRedis("get MigrateStatus:${userId}");

                if ("COMPLETED".equals(status)) {
                    // This user has already been migrated; nothing to do
                    continue;
                }
                if ("MIGRATING".equals(status)) {
                    // This user is being migrated elsewhere; nothing to do
                    continue;
                }

                // Acquire the lock before migrating, e.g. "set MigrateStatus:18 MIGRATING ex 86400 nx"
                String result = executeRedis("set MigrateStatus:${userId} MIGRATING ex 86400 nx");
                if ("OK".equals(result)) {
                    // Lock acquired: first query all of this user's installed apps
                    String sql = "select * from installed_app where user_id=#{user_id}";
                    List<InstalledApp> userInstalledApps = executeSql(sql);

                    // Insert them into the sharded target table (user_id determines which table)
                    shardingInsertSql(userInstalledApps);

                    // Mark this user's migration as complete
                    executeRedis("setex MigrateStatus:${userId} 864000 COMPLETED");
                } else {
                    // Lock not acquired: a passive migration holds it, so hand this user over to it.
                    // Alternatively, since passive migration is short-lived, you could poll the
                    // status here a few times until it becomes COMPLETED.
                    logger.info("Migration conflict. userId = {}", userId);
                }
            }
            if (tempMaxId >= maxId) {
                // Double writes during migration keep growing max(id), so re-check it
                // before deciding the traversal is finished
                maxId = execute("select max(id) from installed_app");
            }
            logger.info("Migration process id = {}", tempMaxId);
        }catch (Throwable e){
            // Any exception here can only come from Redis or MySQL: exit, fix the problem,
            // then restart with tempMinId set to the last logged "Migration process id"
            // so that already-migrated ranges are not repeated
            System.exit(0);
        }
        tempMinId += stepSize;
    }while (tempMaxId < maxId);
}

Here are a few things to note:

  1. max(id) is queried up front so that it is queried as rarely as possible. If the first query returns 10000000, there is no need to query max(id) again until the traversed id reaches 10000000.
  2. Traverse with id>=? and id<?, not with id>=? limit n or limit m, n: LIMIT performance is mediocre and degrades as the traversal advances, while the range condition may occasionally hit empty windows but does no harm, and keeps the whole performance curve smooth, with no jitter. The migration program is, after all, only auxiliary and must not impact the business too much.
  3. The List<InstalledApp> queried by id range is converted to an Iterator<InstalledApp> and each element is removed after it is processed; otherwise GC pressure or even an OOM may occur.
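The batching strategy from note 2 can be sketched as a pure window calculation (a simplified model; the real traversal also re-checks max(id) as in the code above):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative primary-key range batching: each window is [min, min+step).
// Windows that happen to contain no rows ("treading empty") are skipped at
// full speed, unlike LIMIT m, n, which rescans ever more rows as m grows.
public class RangeScan {
    public static List<long[]> windows(long maxId, long stepSize) {
        List<long[]> result = new ArrayList<>();
        for (long min = 0; min < maxId; min += stepSize) {
            result.add(new long[]{min, min + stepSize});
        }
        return result;
    }
}
```

Each window becomes one `where id>=? and id<?` query, so the cost per batch is bounded by the window size regardless of how far the traversal has progressed.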

2.2 Passive Migration

Passive migration inserts the migration logic ahead of the normal business logic for the installed_app table. Taking "user installs a new app" as the example, the pseudocode is as follows:

// Passive migration is common logic, so it must be called before any business
// logic that touches the 'installed_app' table
public void migratePassive(long userId) throws Exception{
    String status = executeRedis("get MigrateStatus:${userId}");

    if ("COMPLETED".equals(status)) {
        // This user's data has already been migrated
        logger.info("user's installed app migration completed. user_id = {}", userId);
    }else if ("MIGRATING".equals(status)) {
        // Migration in progress: wait until it completes. To avoid an endless
        // loop, add a maximum-wait-time guard in real code.
        do{
            Thread.sleep(10);
            status = executeRedis("get MigrateStatus:${userId}");
        }while (!"COMPLETED".equals(status));

    }else {
        // Not migrated yet: try to migrate this user here
        String result = executeRedis("set MigrateStatus:${userId} MIGRATING ex 86400 nx");
        if ("OK".equals(result)) {
            // Lock acquired: first query all of this user's installed apps
            String sql = "select * from installed_app where user_id=#{user_id}";
            List<InstalledApp> userInstalledApps = executeSql(sql);

            // Insert them into the sharded target table (user_id determines which table)
            shardingInsertSql(userInstalledApps);

            // Mark this user's migration as complete
            executeRedis("setex MigrateStatus:${userId} 864000 COMPLETED");
        }else {
            // Lock not acquired: someone else is migrating this user; wait until the migration completes
        }
    }
}

// Business logic on the 'installed_app' table -- record an app installed by the user
public void addInstalledApp(InstalledApp installedApp) throws Exception{
    // Try a passive migration first
    migratePassive(installedApp.getUserId());

    // Insert the record into the sharded target table
    shardingInsertSql(installedApp);

    // Insert into the old single table as well. Whether to keep this old code depends
    // on how cautious the plan is: keep it if you need to be able to roll back.
    insertSql(installedApp);
}

In either case, first check MigrateStatus:${userId} in the cache:

  1. If the value is COMPLETED, the user's data has already been migrated, and the request is handled against the sharded tables.
  2. If the value is MIGRATING, a migration is in progress: wait in a loop until the value becomes COMPLETED, then handle the request against the sharded tables.
  3. Otherwise the value is empty: try to acquire the lock and migrate the data, set the cached value to COMPLETED when done, and finally handle the request against the sharded tables.

3. Scheme improvement 1

Once all data has been migrated, deciding every CRUD operation based on MigrateStatus:${userId} in the cache becomes unnecessary overhead. At that point, broadcast a "migration finished" switch, for example by publishing a message to a TOPIC; every service that receives it caches the switch locally, skips the MigrateStatus:${userId} lookup from then on, and sends requests straight to the sharded tables.
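The switch itself is trivial; the interesting part is that it lives in local memory, so no Redis round-trip is needed after the broadcast (the broadcast mechanism itself, e.g. which message middleware carries the TOPIC, is assumed here):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative "migration finished" switch: flipped once by the TOPIC listener,
// after which the per-request MigrateStatus cache lookup is skipped entirely.
public class MigrationSwitch {
    private static final AtomicBoolean migrationFinished = new AtomicBoolean(false);

    // Called by the message listener when the "all data migrated" message arrives
    public static void onMigrationFinishedMessage() {
        migrationFinished.set(true);
    }

    // Business code consults this before deciding whether to check MigrateStatus
    public static boolean shouldCheckMigrateStatus() {
        return !migrationFinished.get();
    }
}
```

Because the flag only ever flips from false to true, there is no harm if different services receive the message at slightly different times: a service that has not flipped yet simply keeps doing the (now always-COMPLETED) cache check.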

4. Scheme improvement 2

A further drawback of an implementation like addInstalledApp above is that the migration code is tightly coupled to the business code, and the double write increases the latency of the business interface. This can be optimized away by subscribing to the binlog of the installed_app table (see Alibaba's Canal), leaving the business code unchanged:

// Business logic on the 'installed_app' table -- record an app installed by the user -- the old code stays unchanged
public void addInstalledApp(InstalledApp installedApp) throws Exception{
    insertSql(installedApp);
}

Binlog consumption:

// Consumer logic for INSERT events on the installed_app binlog (EventType.INSERT in Canal terms):
public void insertBinlogConsumer(InstalledApp installedApp){
    // Try a passive migration first
    migratePassive(installedApp.getUserId());

    // insert the user's installedApp information (installedApp) into the target table after the sub-table
    shardingInsertSql(installedApp);
}

Note: insert, update, and delete operations all generate binlog, so interfaces of those types can be optimized this way. Query interfaces generate no binlog and do not modify data, so they need no changes at all, because during the migration the original installed_app table always holds the full data set.

5. Remaining work

After the migration completes, take the active migration program offline, remove the migratePassive() calls from the business code, and consider integrating a third-party sharding middleware such as Sharding-JDBC; for details, see the Sharding-JDBC integration guide.

Review

Looking back at this scheme, its biggest drawback is that if some sharding column value (such as a userId) has a very large number of rows and an active migration of that user is in progress, passive migration may have to wait a long time (the binlog-based variant does not have this drawback).

However, judged by typical DB performance, a batch insert of 1,000 rows is usually on the order of 10 ms, and all records with the same sharding column value land in exactly one table after sharding, with no cross-table access involved. So as long as a SQL count before the migration confirms that the table contains no such abnormally large sharding column values, the migration can proceed safely.
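That pre-migration check is a single GROUP BY statistic; a sketch of building it (the threshold of 1000 is an assumption, to be tuned to your own batch-insert timing):

```java
// Illustrative pre-migration scan: list the sharding column values whose row
// counts are abnormally large, so they can be handled separately (e.g. migrated
// first during off-peak hours) before the general migration starts.
public class HeavyShardCheck {
    public static String heavyUsersSql(int threshold) {
        return "select user_id, count(*) as cnt from installed_app"
             + " group by user_id having cnt > " + threshold;
    }
}
```

Running this once against the source table tells you up front whether any user could make a passive migration wait noticeably.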

When I migrated the installed_app table, no user had more than 200 apps, so there was little need to worry about performance problems caused by conflicts. There is no universal plan, only the plan that fits your own situation.

If some sharding column values do have thousands of records, cache those values in advance, bring the migration program online at night, and migrate the data for those cached values first, minimizing the impact on those users as much as possible. Or, of course, use whatever better approach you come up with.