In the last article we talked about sharding a single database into multiple tables for a brand-new business, which is relatively simple. In this article we continue the topic.
Single-database table sharding – old business
An old business means the table design already exists and the system has been online for some time, so there is historical data.
That brings up the problem of data migration: the direction is from a single old table to multiple new tables.
There are a few things to note when the table structure does not change:
- Primary key ID problem
- Data migration
Primary key ID problem
The old table's primary key is most likely auto-increment, while the new tables use distributed IDs, probably generated with the snowflake algorithm, so the two are different. First, make sure the IDs in the new tables do not duplicate those in the old table; in practice the probability of a collision is fairly small.
Whether the IDs of the old table are carried over into the new tables or regenerated depends on the situation. If your business never uses the old table's ID for anything beyond a primary-key identifier, it does not matter either way; for the sake of uniformity you can regenerate them with the new ID algorithm. If the old ID itself participates in the business, for example your SQL joins other tables on that field, then leave it alone: the cost of changing it is very high, it touches too many things, and in my view it is not necessary.
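To make the collision argument concrete, here is a minimal sketch of the snowflake layout (41-bit timestamp, 10-bit worker id, 12-bit sequence). It is not ShardingSphere's exact implementation, the custom epoch is an assumption, and clock-rollback handling is omitted. The point is that snowflake IDs land in the 10^17 range and above, far beyond typical auto-increment values, which is why duplicates with the old table are unlikely.

```java
// Minimal snowflake sketch: 41 bits of timestamp + 10 bits of worker id + 12 bits of sequence.
public final class SnowflakeSketch {
    private static final long EPOCH = 1609459200000L; // assumed custom epoch: 2021-01-01 UTC
    private final long workerId;                      // 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeSketch(long workerId) {
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;   // 12-bit sequence within the same millisecond
            if (sequence == 0) {                 // sequence exhausted, spin until the next millisecond
                while ((now = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0L;                       // clock-rollback handling omitted for brevity
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }

    public static void main(String[] args) {
        // Prints a value in the 10^17 range today -- nowhere near an auto-increment counter.
        System.out.println(new SnowflakeSketch(123).nextId());
    }
}
```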
Data migration
Data migration, or online scaling: the scenario discussed here only covers going from a single database with a single table to a single database with multiple tables.
Downtime migration
The simplest option is to stop the system, synchronize and re-route the old table's data into the new tables, and then start it again, so there is no chance of inconsistency. Whether this is acceptable depends on the business: many systems run 7*24 and cannot be shut down, so downtime migration is off the table; but if a period of downtime is allowed, for example a government system that can notify its users of a maintenance window, it is perfectly fine.
As for the specific migration operation, you can use tools or write your own program; choose based on the performance and impact window of each approach, since of course we want the downtime to be as short as possible.
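If you go the write-your-own-program route, a minimal sketch might look like the following. It assumes the old single table is t_order_sharding with columns id, order_id and amount (the amount column is invented for illustration) and that the new tables follow the order_id % 2 rule used later in this article; a real job would copy every column, batch the inserts, and checkpoint its progress so it can resume.

```java
import java.sql.*;

// One-off downtime migration sketch: copy every row of the old single table into
// t_order_sharding_0 / t_order_sharding_1 according to order_id % 2.
public class DowntimeMigration {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://127.0.0.1:3306/test?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "root", "password");
             Statement read = conn.createStatement();
             ResultSet rs = read.executeQuery("SELECT id, order_id, amount FROM t_order_sharding")) {
            while (rs.next()) {
                long orderId = rs.getLong("order_id");
                String table = "t_order_sharding_" + (orderId % 2);   // same modulo rule as the sharding config
                try (PreparedStatement insert = conn.prepareStatement(
                        "INSERT INTO " + table + " (id, order_id, amount) VALUES (?, ?, ?)")) {
                    insert.setLong(1, rs.getLong("id"));              // keep the original primary key
                    insert.setLong(2, orderId);
                    insert.setBigDecimal(3, rs.getBigDecimal("amount"));
                    insert.executeUpdate();
                }
            }
        }
    }
}
```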
Zero-downtime migration
The common approach is double-writing: the business writes its data to both the old table and the new tables. From the moment double-writing begins, newly written data is guaranteed to be consistent on both sides; the historical data is then migrated gradually by a program or a tool. Once the migration is finished and the data in the old and new tables is consistent, the application can switch entirely to the new tables and stop accessing the old one, as sketched below.
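As a rough illustration of the double-write phase, here is a hypothetical service-layer sketch; the Order class and the two DAO interfaces are not from this article and are assumed to exist in your codebase. Whether the two writes share one local transaction (as here) or are decoupled through a message queue is a design choice that depends on how much lag you can tolerate on the new tables.

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderWriteService {

    private final OrderOldTableDao oldTableDao;   // hypothetical DAO writing to the old single table
    private final OrderShardingDao shardingDao;   // hypothetical DAO writing to the new sharded tables

    public OrderWriteService(OrderOldTableDao oldTableDao, OrderShardingDao shardingDao) {
        this.oldTableDao = oldTableDao;
        this.shardingDao = shardingDao;
    }

    @Transactional
    public void createOrder(Order order) {
        oldTableDao.insert(order);    // the old table stays the source of truth during migration
        shardingDao.insert(order);    // from the double-write point on, the new tables get the same data
    }
}
```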
Migration tools provided by ShardingSphere
The migration schemes above, with or without downtime, have nothing to do with ShardingSphere. Now let's look at what ShardingSphere offers, that is, how to do data migration with ShardingSphere.
ShardingSphere-Proxy + ShardingSphere-Scaling is the combination designed specifically to handle data migration when scaling databases and tables.
Let's start with the data migration itself.
The example in this article migrates data from a single table to multiple tables governed by a sharding rule. If you think about it, the principle is simple: read the data from the single table and insert it into the new tables according to the sharding rule.
Let's see how ShardingSphere-Proxy is combined with ShardingSphere-Scaling.
Software versions:
- MySQL 8.0
- ShardingSphere-Proxy 5.0.0-beta
- ShardingSphere-Scaling 5.0.0-beta
ShardingSphere-Proxy is relatively simple and similar to the proxy configuration introduced in the previous article. Here is my config-myapp.yaml:
```yaml
schemaName: my-app

dataSources:
  write_ds:
    url: jdbc:mysql://mysql.local.test.myapp.com:23306/test?allowPublicKeyRetrieval=true&useSSL=false&allowMultiQueries=true&serverTimezone=Asia/Shanghai&useSSL=false&autoReconnect=true&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull
    username: root
    password: nicai
    connectionTimeoutMilliseconds: 3000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
    minPoolSize: 1
    maintenanceIntervalMilliseconds: 30000
  read_ds_0:
    url: jdbc:mysql://mysql.local.test.read1.myapp.com:23306/test?allowPublicKeyRetrieval=true&useSSL=false&allowMultiQueries=true&serverTimezone=Asia/Shanghai&useSSL=false&autoReconnect=true&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull
    username: root
    password: nicai
    connectionTimeoutMilliseconds: 3000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
    minPoolSize: 1
    maintenanceIntervalMilliseconds: 30000

rules:
- !SHARDING
  tables:
    # Virtual (logical) table name
    t_order_sharding:
      actualDataNodes: write_ds.t_order_sharding_$->{0..1}
      tableStrategy:
        standard:
          # Sharding key
          shardingColumn: order_id
          shardingAlgorithmName: table_inline
      keyGenerateStrategy:
        column: id
        keyGeneratorName: snowflake # Primary key generation strategy -- snowflake algorithm
  shardingAlgorithms:
    table_inline:
      type: INLINE
      props:
        # Table sharding rule
        algorithm-expression: t_order_sharding_$->{order_id % 2}
  keyGenerators:
    snowflake:
      type: SNOWFLAKE
      props:
        worker-id: 123
```
The sharding rule is also very simple: 2 tables, routed by order_id modulo 2 (for example, an order with order_id 1001 lands in t_order_sharding_1).
Since both the Proxy and Scaling need to connect to ZooKeeper, I deployed a ZooKeeper locally before starting the proxy:
```bash
cd apache-zookeeper-3.6.3-bin/
cd conf
cp zoo_sample.cfg zoo.cfg
...
cd ../bin
./zkServer.sh start
```
ZooKeeper started normally. My laptop is a MacBook and my ZooKeeper client is PrettyZoo (github.com/vran-dev/Pr…). Then I started the proxy without any exceptions and tested the connection with a MySQL client. This step is done.
Now for ShardingSphere-Scaling: first modify server.yaml and point the governance registry configuration at the ZooKeeper we just started.
```yaml
scaling:
  port: 8888
  blockQueueSize: 10000
  workerThread: 30

governance:
  name: governance_ds
  registryCenter:
    type: ZooKeeper
    serverLists: localhost:2181
```
Then go to the bin directory and run server_start.sh; it starts normally. You can also see these files in the bin directory:
- server_start.bat
- server_start.sh
- server_stop.sh
- worker_start.bat
- worker_start.sh
- worker_stop.sh
Well, I'm sure you guessed that worker_start.sh is also needed, so I started it, but it gave me this prompt:
```
ERROR: The ShardingSphere-Scaling already started!
PID: 11946
12336
```
At first I took the message at face value: it was already started, no need to start it again, so I ignored it. After fiddling with the configuration for a whole day without success, I realized the worker really is needed. The fix was to copy the Scaling directory, change the port number in the copy, and then run server_start.sh and worker_start.sh so that both processes are running.
There are two other points to note, which are also mentioned in the official documents:
- If the back end is a MySQL database, download mysql-connector-java-5.1.47.jar and put it in the ${shardingsphere-scaling}/lib directory.
- MySQL binlog must be enabled in ROW format, and the user used for the migration must have replication privileges (see the quick check sketched below).
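As a convenience, here is a small sketch that verifies those two prerequisites over a JDBC connection; the connection details are just the ones used elsewhere in this article and may differ in your environment.

```java
import java.sql.*;

// Pre-flight check: binlog should be in ROW format and the migration user
// should hold replication privileges.
public class MigrationPreflight {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://mysql.local.test.myapp.com:23306/test?useSSL=false";
        try (Connection conn = DriverManager.getConnection(url, "root", "nicai");
             Statement stmt = conn.createStatement()) {
            try (ResultSet rs = stmt.executeQuery("SHOW VARIABLES LIKE 'binlog_format'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));   // expect ROW
                }
            }
            try (ResultSet rs = stmt.executeQuery("SHOW GRANTS FOR CURRENT_USER()")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));   // look for REPLICATION SLAVE / REPLICATION CLIENT
                }
            }
        }
    }
}
```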
After that, I created and started a migration task through the Scaling API. Here is my "create migration job" request:
```bash
curl -X POST \
  http://localhost:8888/scaling/job/start \
  -H 'content-type: application/json' \
  -d '{
    "ruleConfig": {
      "source": {
        "type": "shardingSphereJdbc",
        "parameter": "
          dataSources:
            write_ds:
              dataSourceClassName: com.zaxxer.hikari.HikariDataSource
              jdbcUrl: jdbc:mysql://mysql.local.test.myall.com:23306/test?allowPublicKeyRetrieval=true&useSSL=false&allowMultiQueries=true&serverTimezone=Asia/Shanghai&useSSL=false&autoReconnect=true&useUnicode=true&characterEncoding=utf-8&zeroDateTimeBehavior=convertToNull
              username: root
              password: nicai
          rules:
          - !SHARDING
            tables:
              t_order_sharding:
                actualDataNodes: write_ds.t_order_sharding
                tableStrategy:
                  standard:
                    # sharding key
                    shardingColumn: order_id
                    shardingAlgorithmName: table_inline
            shardingAlgorithms:
              default_db_inline:
                type: INLINE
                props:
                  algorithm-expression: write_ds
              table_inline:
                type: INLINE
                props:
                  # table sharding rule
                  algorithm-expression: t_order_sharding
          "
      },
      "target": {
        "type": "jdbc",
        "parameter": "
          username: root
          password: root
          jdbcUrl: jdbc:mysql://127.0.0.1:3307/my-app?serverTimezone=UTC&useSSL=false
          "
      }
    },
    "jobConfiguration": {
      "concurrency": "1"
    }
  }'
```
Since I want to migrate data from a single table to multiple tables, the single table is configured in the source rules, and the target is our proxy. Because the sharding rule is already configured in the proxy, Scaling can move the data from the source (the single table) to the target (the sharded tables behind the proxy).
After the task is created and starts executing, you can check its progress with the following request:
```bash
curl -X GET \
  http://localhost:8888/scaling/job/progress/655462928387932161
```
The last number is your job ID. You can also list all jobs or stop a job; for details see the official documentation: shardingsphere.apache.org/document/5….
This is the background log of my worker from the beginning to the end of the task:
From the database results, the data in my single table is indeed divided into different tables according to the table partitioning rules.
The official documentation says the shardingsphere-ui project can be used to manage migration jobs visually. I downloaded and started apache-shardingsphere-5.0.0-alpha-shardingsphere-ui, but (perhaps because of the version mismatch) it threw all kinds of null pointer exceptions. Due to time constraints I did not dig into the source code; hopefully later versions work properly.
What happens if we do it again?
If you run the same job again, an error is reported saying the target table is not empty, so apparently you cannot run it purely as an incremental migration. ShardingSphere does support incremental migration, but it happens automatically after the full migration starts: if new data is written to the old table during the migration, ShardingSphere-Scaling migrates it to the new tables based on MySQL's binlog.
Distributed governance
ShardingSphere provides a distributed governance solution, implemented with the following motivations:
- Centralized configuration: with more and more runtime instances, scattered configuration becomes hard to manage, and configuration drift between nodes causes serious problems. Centralizing configuration in a configuration center allows more efficient management.
- Dynamic configuration: distributing configuration changes is another important capability of a configuration center; it supports dynamic switching of data sources and rules.
- Storing runtime dynamic/temporary state, such as the available ShardingSphere instances and data sources that need to be disabled or circuit-broken.
- Orchestration and governance capabilities, such as circuit-breaking database access and disabling access to replica (slave) databases. The governance module still has a lot of unfinished functionality (flow control, etc.).
Below we use distributed governance to implement dynamic switching and updating of configuration (such as sharding rules and data sources).
Software environment:
- MySQL 8
- SpringBoot 2
- ShardingSphere 5.0.0-beta
First, introduce related dependencies
```xml
<!-- Sharding-JDBC dependency -->
<!--<dependency>-->
<!--    <groupId>org.apache.shardingsphere</groupId>-->
<!--    <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>-->
<!--    <version>5.0.0-beta</version>-->
<!--</dependency>-->
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-jdbc-governance-spring-boot-starter</artifactId>
    <version>5.0.0-beta</version>
</dependency>

<!-- This module is required when using ZooKeeper -->
<dependency>
    <groupId>org.apache.shardingsphere</groupId>
    <artifactId>shardingsphere-governance-repository-zookeeper-curator</artifactId>
    <version>5.0.0-beta</version>
</dependency>
```
Note that the commented-out dependency is what you would use for plain ShardingSphere-JDBC; if you want the governance feature, comment that dependency out and use the governance starter instead.
Modify the Spring configuration file
```yaml
spring:
  profiles:
    include: common-local
  shardingsphere:
    governance:
      name: governance_ds_test
      overwrite: false
      registry-center:
        server-lists: localhost:2181
        type: Zookeeper
```
Here we use the local ZooKeeper as the configuration and registry center. Older ShardingSphere versions supported Apollo and Nacos as configuration centers, but that support has been removed: github.com/apache/shar…
Write the registry data according to the data structure described in the official documentation (the configuration is very similar to the one earlier), for example with a small program like the one below.
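One way to do this programmatically is with Curator, which the governance dependency already pulls in. The sketch below creates or updates the rules node for our schema; the node path follows the 5.0.0-beta governance structure described in the docs (/{governance-name}/metadata/{schema}/rules), but verify the exact paths and YAML layout against your version before relying on it.

```java
import java.nio.charset.StandardCharsets;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

// Writes (or updates) the sharding rule YAML under the governance namespace in ZooKeeper.
public class GovernanceConfigWriter {
    public static void main(String[] args) throws Exception {
        String rulesPath = "/governance_ds_test/metadata/my-app/rules";   // assumed node path, check your version
        String rulesYaml = String.join("\n",
                "- !SHARDING",
                "  tables:",
                "    t_order_sharding:",
                "      actualDataNodes: write_ds.t_order_sharding_$->{0..1}",
                "      tableStrategy:",
                "        standard:",
                "          shardingColumn: order_id",
                "          shardingAlgorithmName: table_inline",
                "  shardingAlgorithms:",
                "    table_inline:",
                "      type: INLINE",
                "      props:",
                "        algorithm-expression: t_order_sharding_$->{order_id % 2}");

        try (CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3))) {
            client.start();
            byte[] data = rulesYaml.getBytes(StandardCharsets.UTF_8);
            if (client.checkExists().forPath(rulesPath) == null) {
                client.create().creatingParentsIfNeeded().forPath(rulesPath, data);
            } else {
                client.setData().forPath(rulesPath, data);   // the same call is how a later dynamic rule update works
            }
        }
    }
}
```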
That is basically it; you can now start the application and check that it connects to the database and can operate on the data.
Then we can modify the data in the registry. For example, if no sharding rule was configured before, the application reads from the single table; after adding a sharding rule, it reads from multiple tables. In this way we update the configuration on the ZooKeeper side and the application dynamically refreshes its data sources or sharding rules.
The overall plan
Based on all the above, my overall plan is as follows, roughly four steps:
- Introduce distributed governance first
- Then migrate the data
- Then dynamically update the configuration
- Finally backfill the remaining data manually
The first step is to modify the application configuration file, introduce the ShardingSphere distributed governance dependency, and move the database configuration out into ZooKeeper, while keeping the old single-table configuration unchanged and not sharding anything yet. Then release the updated application. This step is invisible to both users and developers because nothing behavioral changes.
The second step is to estimate the migration time based on the volume of data in the tables to be migrated, then pick the window with the least traffic and migrate the data from the single table to the sharded tables using the ShardingSphere-Proxy + ShardingSphere-Scaling scheme above. All requests still go to the old table; only the data is being migrated, and ShardingSphere-Scaling can move both the existing data and the incremental data generated during the migration to the new tables.
The third step is to modify the ZooKeeper configuration after the migration finishes and add the sharding rules; from then on all requests go to the new sharded tables.
Step four: between the completion of the migration and the completion of the configuration switch there may be a gap of seconds or minutes. Even though we picked the quietest period, there may still be user writes during that window, and that data ends up in the old table but not in the new ones. We need to find this data and backfill it into the new tables as soon as possible. Since there will not be much of it, this is not very troublesome; it is best to write the backfill program in advance and simply run it when the time comes until both sides are in sync. A sketch of such a program follows.
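A hypothetical sketch of that backfill program, under the same assumptions as earlier (an amount column, plus a create_time column that marks the switch-over window); it reads the old table directly and writes through the proxy's logical table so the sharding rule does the routing. Deduplication of rows that Scaling already migrated is left out and should be added in a real job, for example by checking whether the id already exists.

```java
import java.sql.*;

// Backfill for the switch-over window: rows written to the old table after the
// migration finished are copied into the sharded tables via the proxy.
public class BackfillJob {
    public static void main(String[] args) throws SQLException {
        String oldUrl = "jdbc:mysql://mysql.local.test.myapp.com:23306/test?useSSL=false";
        String proxyUrl = "jdbc:mysql://127.0.0.1:3307/my-app?serverTimezone=UTC&useSSL=false";
        Timestamp cutover = Timestamp.valueOf("2021-09-01 03:00:00");   // example: when the migration finished

        try (Connection oldConn = DriverManager.getConnection(oldUrl, "root", "nicai");
             Connection proxyConn = DriverManager.getConnection(proxyUrl, "root", "root");
             PreparedStatement select = oldConn.prepareStatement(
                     "SELECT id, order_id, amount FROM t_order_sharding WHERE create_time >= ?");
             PreparedStatement insert = proxyConn.prepareStatement(
                     "INSERT INTO t_order_sharding (id, order_id, amount) VALUES (?, ?, ?)")) {
            select.setTimestamp(1, cutover);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    insert.setLong(1, rs.getLong("id"));
                    insert.setLong(2, rs.getLong("order_id"));
                    insert.setBigDecimal(3, rs.getBigDecimal("amount"));
                    insert.executeUpdate();   // the proxy routes each row by order_id % 2
                }
            }
        }
    }
}
```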
This whole solution is completely transparent to the application: not a single line of business code changes.
Other situations
The scenario in this article is sharding from a single database with a single table to a single database with multiple tables. Other scenarios include:
- Online capacity expansion for single-library and multi-table sharding (e.g., originally divided into 2 tables, now divided into 4)
- Multi-database and multi-table sharding online capacity expansion (for example, it was originally divided into 2 databases and 2 tables, but now it is divided into 4 databases and 4 tables)
With the groundwork laid in this article, these scenarios are handled in much the same way, just with different configurations.
References
- blog.51cto.com/u_15057819/…
- tech.meituan.com/2016/11/18/…
- shardingsphere.apache.org/document/5….