1. The background
1.1 Problems with online modification tools based on triggers
Pt-online-schema-change, LHM, and Oak-online-alter -table all use synchronization. When there are changes to the original table, these changes are synchronized to the temporary table using some time between transactions. All of these tools use triggers to identify changes to the original table, with potential problems as follows:
- Triggers are stored in interpreted code. MySQL does not precompile this code. They are called each time in the transaction space and are added to the parser and interpreter before each query behavior of the table being manipulated, incurs additional overhead.
- When the load of the main library increases, we can suspend the row copy operation of the tool, but we cannot suspend the work of the trigger, which needs to be present throughout the operation and may cause system resource usage
- Triggers share the same transaction space in the original table query, which can affect concurrent write performance of the primary library in extreme cases
1.2 Advantages of GH-OST over flip-flop type tools
1. There is no trigger
Gh-ost does not use triggers. It listens for data changes in tables by analyzing binlog logs. So it works in asynchronous mode, synchronizing changes to temporary tables (Ghost Tables) only when changes to the original table have been committed.
2. Lightweight
Because triggers are not required, GH-OST decouples the load of changing table definitions from the normal business load. It does not need to take into account concurrent operations and contention on the modified table, which are serialized in binary logs. Gh-ost only operates on temporary tables, completely irrelevant to the original table. In fact, GH-OST also serializes row copy writes from binary log writes, so that there is only one connection for the master to write sequentially to the temporary table, which is quite different from the common ETL behavior.
3. Can be suspended
Since gh-OST itself is a binary file generation operation in the execution process, it can randomly pause and continue the operation of copying data and applying binlog to the shadow table through flag bit file or socat command
4. Dynamic adjustment
Unlike other DDL tools, which require repeated operations after the configuration is modified, GH-OST can obtain the latest operation parameters, such as chunk-size, max-lag-millis, and max-load, through Unix socket files. The configuration takes effect immediately and does not need to be run again
5. High availability
Gh-ost provides the test function, the — test-on-Replica option, which allows you to modify the table structure from running on the slave, pause the master/slave replication at the end of the operation, leave both tables in sync and ready, then switch tables and back again. In addition, GH-OST also supports switching delay, which delays the last exchange between the new table and the original table until a specific file is deleted, while continuing to synchronize data, keeping the data of the temporary table in sync.
2. Working principle
- Check for foreign keys and triggers.
- Check the primary key information of the table.
- Check whether there is a master or slave library, whether log_slave_updates is enabled, and binlog information
- Check whether temporary tables at the end of GHO and del exist
- Create GHC end table, store data migration information, binlog information, etc
Above verification phase
- Initialize the stream connection and add a binlog listener
The following migration phases
- Create ghO end temporary table, execute DDL on GHO end temporary table
- Start transaction, write source table data to ghO end table by primary key ID, commit, and binlog apply
The following cut-over phase
- Lock source table, rename source table to source _DEL table, ghO table to source table
- Clean up the GHC style table
Let’s focus on two key points in the process:
2.1 Migrating Data Consistency
Gh-ost performs three operations on the original table and shadow table during DDL change: Row copy on the original table (we use A operation instead), DML operation on the original table (B), and apply binlog(C) on the shadow table. In addition, the binlog is generated based on DML operations, so the apply binlog to the shadow table must be followed by the DML of the original table, which has the following sequence:
As you can see from these combinations, the data is ultimately consistent, and when copy ends, only the apply binlog operation has a problem:
Both b-C-A and B-A-C modes are built on gh-OST’s own binlog save/just-in-time application. For cases where data has not been copied to the shadow table and DML operations have been performed on the original table, the gh-OST will display the Backlog: N / 1000 N refers to the backlog of events in the binary log number, time parameters and I haven’t see the changes, should be a program written to death, so if the original the DML operations on the table is very active, the backlog is full, will not be able to record subsequent DML operations into binary log, the loss of data, in order to avoid this situation, Gh-ost sets the priority of apply binlog over row copy
2.2 the cut – switch over
Gh-ost switching is atomic switching, basically through the operation of two sessions to complete.
Use MySQL’s internal mechanisms:
If the lock table is blocked, rename takes precedence over DML. That is, rename table is executed first and then DML is executed
We assume that the sessions for GH-OST operations are C10 and C20, DML requests for other businesses are C1-C9, C11-C19, c21-C29, table B is the table where DDL changes, and _b_GHO is the shadow table for table B
- Session C1-C9: DML operations are performed on table B normally
- Session C10: Created
_b_del
This prevents data loss caused by the early rename of a table - Session C10 executes LOCK TABLES
b
WRITE,_b_del
WRITE
At this point, session C11-C19 will have a new DML request, but will wait because there is a lock on table B
- Session C20: Set the lock wait time and rename
set session lock_wait_timeout:=1; rename /* gh-ost */ table test.b to test._b_del, test._b_gho to test.b
- C20 operates because C10 locks the table
b
and_b_del
While waiting for - C21-c29 waits for new requests from table B because of lock table and rename table
- Session C10 checks with SQL that session C20 is rename and is waiting for an MDL lock
- C10 Based on the judgment in the previous step, run the drop table _b_del command. After the command is executed, data to table B still cannot be written. All DML requests are still blocked
- C10 Runs UNLOCK TABLES. The RENAME command of C20 is executed first
Whether the RENAME table or DML operation is executed first, the blocked rename table is always executed before the DML operation, and then other sessions C1-C9, C11-C19, and C21-C29 can manipulate the new table B
So what happens if any part of the cut-over process fails?
- If c10 create
_b_del
Failure. Gh-ost program exits - If the c10 lock statement fails, the GH-OST program exits because the table is not locked and the DML request can proceed normally
- If an exception occurs on C10 before RENAME is executed on C20
- The lock held by C10 is released, and the query for C1-C9, c11-C19 can be immediately executed on B
- because
_b_del
Rename table B to_b_del
Will fail - Some queries are waiting for a while and may need to be retried
- If c10 fails to exit when RENAME is blocked on C20 and the lock is released, then THE RENAME operation on C20 still fails because the b_del table exists and all requests are restored
- If C20 fails, GH-OST will not capture RENAME, session C10 will continue to run, release the lock, and all requests will return to normal
- If both c10 and C20 fail, the lock in c10 is cleared, and the rename lock in c20 is cleared. C1-c9, C11-C19, and C21-C29 can be executed on B.
Therefore, it can be concluded that:
Impact on the program: During cut-over, the application is prevented from writing to the table until the shadow table swap succeeds or until the operation fails. If successful, the application continues to operate on the new table. If the switch fails, the application continues operating on the original table.
Impact on replication: slave Because lock statements are not copied in the binlog file, atomic operations can only be performed using the RENAME statement.
3. Working mode
Mode 1: Connect to the slave library and modify it on the master library
This is the default working mode of GH-OST, which looks at the secondary libraries, finds the cluster’s primary library and connects to it.
The modification steps are as follows:
- Read and write row data on the primary library;
- Read binary log events from the slave library and apply changes to the master library;
- View tables, fields, primary keys, total rows, etc.
- Read gh-OST internal event logs (such as heartbeat) from the library;
- Complete the table switch on the main library;
Disadvantages: Because the binlog in this mode is read from the slave library, there may be a risk of master/slave inconsistency
Pattern 2: Modify directly on the main library (recommended)
If you do not have a slave library or do not want to operate on a slave library, or are in a highly available architecture (such as MGR), you can make changes directly on the master library by configuring the parameter –allow-on-master, and the binary log format of the master library is Row
Disadvantages: All operations are on the main library, causing some burden, which can be reduced by adjusting the load parameters
Pattern 3: Modify and test on the slave library
This mode is modified on the slave library.
Gh-ost will still connect to the master library, but all operations will be done on the slave library and will not have any impact on the master library. Gh-ost is also paused from time to time during operations so that data from the library can be kept up to date. — Migrate – on-Replica allows GH-OST to modify tables directly on the slave library. The final switch is also done with a normal copy from the library. –test-on-replica Indicates that the operation is for test purposes only. Replication is stopped before the final switch operation. The original table and the temporary table switch back and forth, and eventually the original table has not been touched. When the master/slave replication is suspended, we can check and compare the data in the two tables.
Disadvantages: In mode 3, the operation of Stop slave is performed during cut over. In order to facilitate the comparison between the shadow table before and after switching and the original table, this time may cause service impact, so it is not suitable for online libraries
4. Configure and install the environment
- Install golang:
yum -y install golang
- Install socat:
yum install -y socat
- Download the latest binary package from github publishing address:
Github.com/github/gh-o…
After decompressing only a binary gh-ost is very concise, almost no dependencies
5. Common parameters
- –chunk-size
The number of rows processed in each iteration, that is, the number of rows copied from the original table to the shadow table at a time. Default: 1000
- –max-load
If the load threshold exceeds this value, the copy operation is suspended and application logs continue until the value falls below this threshold. You need to customize the Settings, such as Threads_running=50 and Threads_connected=200
- –critical-load
Major threshold. When the value reaches this parameter, gh-OST is forcibly exited
- –ok-to-drop-table
Whether to delete the old table after cut-over. If this parameter is not added, the _b_del table mentioned above will be retained
- –initially-drop-ghost-table
Whether to delete the go-ost shadow table and log table XX_gho, XX_ghc before the command is executed
- –initially-drop-socket-file
Whether to delete the socket file generated during the last execution
- –allow-on-master
All operations are performed on the main library
- –dml-batch-size
The number of events contained in each transaction of GH-OST in the Apply Binlog phase. The default value is 10. If the value is set to 1, it means that no grouping is carried out
- –postpone-cut-over-flag-file
When the specified file exists, the cut-over phase of GH-OST will be delayed and the data will still be copied until the file is deleted (automatically created after execution begins).
- –nice-ratio
The sleep time after each copy is 0 by default, that is, it does not sleep. If it is 1, each copy takes 1 second, then sleep takes 1 second. If it is 0.7, then each copy takes 10 seconds, then sleep 7
- –panic-flag-file
When the specified file is created, GH-OST will exit immediately
- –verbose
The output log
- –execute
Alter&migrate confirm that alter&migrate is executed, default to noop, just test and exit
6. Use tests
The test table is an online table with a size of about 10W, which is imported into the test environment 192.168.9.186.186 is the read and write node of the test environment MGR
Gh-ost \ --ok-to-drop-table \ --initially-drop-ghost-table \ --initially-drop-socket-file \ --host="192.168.9.186" \ --port=3306 \ --user="ghost" \ --password="1q2wABC" \ --database="ghost" \ --table="key_inventory" \ --verbose \ --alter="add column test_field varchar(256) default '111';" \ --panic-flag-file=/tmp/ghost.panic.flag \ --allow-on-master \ --postpone-cut-over-flag-file=/tmp/wait.flag \ --executeCopy the code
Log output:
- Copy: 6000/109125 of 5.5%;
109125 refers to the total number of rows that need to be migrated, 6000 refers to the number of rows that have been migrated, and 5.5 % refers to the percentage of rows that have been migrated.
- Applied: 0
Refers to the number of events processed in binary logs. In the example above, the migration table has no traffic, so no log event is applied.
- Backlog: 0/1000
Indicates that we are doing well with binary logging and that there are no Backlog events in the binary log queue.
- Backlog: 7/1000
When copying rows, some events are backlogged in the binary log and need to be applied.
- Backlog: 1000/1000
The buffer for 1000 events is full (1000 event buffer for the program is written to death, 100 event buffer for the lower version). At this point, notice that the amount of binlog writing is very large, gh-OST can not process events, so it may need to suspend the binlog reading, and need to apply the events in the buffer first.
- streamer:binlog.000001:95531478
Indicates that it is currently applied to the binlog file location
- ETA
Estimated time required for completion
It can be seen that the shadow table is changed from the beginning of the table structure, and the OSC idea is consistent
The number of copies is the same as the chunk size
After the cut-over hold flag file is added, the Migrate phase will be delayed until the Flag file is deleted
Observe the structure, content and size of the log table (the log table contains some contents of heartbeat detection)
Changes in the log table when the copy is complete and the log table enters the cut Over phase
Now I’m going to add some data
The Change log changes
Meanwhile, the changes have been successfully applied to the new table
Delete the wait flag file, and the log changes. You can see that the information about the lock table at the last stage of cut-over is also printed
Default SOCK files and automatically generated WAIT files
Pause operation:
Dynamically changing rate limiting parameters: Each time you change the rate limiting parameters, the latest parameters are printed
Recovery operation:
Mysql > select * from ‘go’; mysql > select * from ‘go’; mysql > select * from ‘go’
7. To summarize
Advantages: It can delay switchover, dynamically adjust load parameters, pause and continue at any time, the whole DDL process is controllable, the switchover stage is safe and reliable, the rollback cost is small, and it is friendly to o&M personnel
Disadvantages: Only 1000 transactions are written by the application log program. If binlog is not applied in time, the speed of applying change log cannot keep up with copy table, which is not applicable in high concurrency scenarios.
Although GH-Ost sets the priority of apply binlog over copy row, its apply binlog phase is single-threaded, not MTS, so if the original table is under constant pressure, then GH-Ost DDL will not complete
And pt – online – schema – change
- When the table is not written and the parameters are default, the DDL operation time of the two operations is similar, after all, the operation is copy row.
- Table has a large number of write cases, because PT-OSC is multi-threaded processing, can be executed quickly, while GH-OST is simulated “from” single thread applications, extreme cases, DDL operations are very difficult to complete execution.
It can be seen that although GH-OST does not need triggers and has much less impact on the pressure and performance of the master library, DDL efficiency for high concurrency scenarios is still lower than that of PT-OSC, so it still needs to be processed at the time of low business peak.
8. Reference documents
Github.com/github/gh-o…
www.cnblogs.com/zhoujinyi/p…
Opensource.actionsky.com/20190918-my…
Blog.csdn.net/poxiaonie/a…