preface

Relay log similar to the Binary log, Relay log is a group of files that contain database change events, plus related INDEX and MATA files. Refer to the official documentation for details. Enabling relay log for an upstream DM has the following advantages over not enabling relay log:

  • If relay log is not enabled, each subtask will connect to the upstream database to fetch binlog data, which will cause great pressure on the upstream database. However, after relay log is enabled, only one connection needs to be created to fetch binlog data locally. Each subtask can read the local relay log data.
  • Upstream databases typically have an expiration date on the binlog or purge the binlog to clean up space. If the RELAY log is not enabled, if the DM synchronization progress is backward and the binlog is cleared, the synchronization fails and full migration can only be performed again. After the relay log function is enabled, binlog data is pulled and written to the local PC in real time, regardless of the current synchronization progress, effectively avoiding the previous problem.

However, in DM <= V2.0.7, the following problems occur after relay log is enabled:

  • The latency of data synchronization increases significantly compared to that without relay log enabled. The following table shows the test results of a single task benchmark, which shows that the average latency increases significantly. The following table is marked with “. We start with delayed percentile data.

  • After relay is enabled, CPU consumption increases. (Due to latency increases, in some simple scenarios (for example, only one task), the resource usage decreases compared with that without relay log enabled. However, as the number of tasks increases, the CPU cost of enabling relay increases.

Due to the above problems, in the new version, we have made some performance optimizations for DM relay log.

Current Relay implementation

Before the introduction of specific optimization, first of all, a brief introduction of the current DM relay implementation, detailed implementation, details can refer to the DM source read series article (six) relay log implementation, this article does not do too much description.

The relay module can be divided into two parts: Relay Writer and relay Reader. The structure of the relay module is as follows:

Relay writer

Relay writers do the following three things in sequence:

  1. Select * from MySQL/MariaDB; select * from MariaDB;
  2. Convert the read Binlog events using binlog Transformer;
  3. The converted Binlog event is stored in the local relay directory as a relay log file using a binlog writer.

Relay reader

After relay is enabled, Syncer gets binlog events from the relay reader. The relay reader does the following:

  • Read the binlog file in realy Directory and send it to syncer;
  • At the end of the file, the system periodically (currently every 100ms) checks whether the current binlog file size and meta file contents have changed. If they have changed, the system either continues reading (binlog file changes) or switches to a new file (meta file changes).

Relay readers and relay writers are independent of each other. They interact with each other through binlog, meta, and index files in the relay directory.

Test Environment description

Before we get into tuning, let’s take a look at the environment used for tuning

  • Upstream MySQL, version 5.7.35-log;

  • Downstream is single-instance TiDB, version 5.7.25-TIDB-V5.2.1;

  • The DM uses 1 master and 1 worker

    • Select * from master branch where hash = d2dc22d and hash = d2DC22d
    • Relay reconstruction branch with CPU benchmark version 2021-11-01 (commit hash 9f7CE1d6)
  • In small-scale test (<= 4 task), MySQL/TiDB/DM runs on the same host, which is convenient for testing. The host environment is 8-core 16G.

  • In the large-scale test (> 4 tasks), MySQL, TiDB, and DM were respectively running on a host with 8 cores and 16 GB

  • The migration delay is measured using the downstream self-update time column, and the third way to calculate the delay time for DM to synchronize data to TiDB is to refer to various methods in detail.

Latency optimization

As you can see from the “Current Relay implementation” section above, there are two points that can affect latency:

  • The implementation of the current relay reader timing check itself affects latency. In the worst case, a binlog event can be synchronized downstream at least 100ms later.

  • The relay writer writes to disks, and the relay Reader reads from disks. Does this latency affect latency?

    • Research has found that Linux systems (and Mac OS systems have a similar mechanism) have a page cache mechanism, the recently written file is read not from disk, but from the OS memory cache, so the theoretical impact is limited.

After investigation and testing of the above problems, we concluded two solutions:

  • Cache in memory the binlog that the relay writer prepares to write in the recent period. If the relay Reader requests this binlog, the relay Reader reads it directly from memory.
  • The Relay Reader still reads files. The Relay writer notifies the Relay Reader when a new event is written.

Scheme 1 requires switching between read memory and read file according to the downstream write speed, which is relatively complex to implement, and because of the OS layer page cache, adding another layer of cache by the application itself has limited impact on latency.

In solution 2, we do some preliminary tests. When increasing the frequency of relay reader check, enabling relay can achieve latency at least when relay is not enabled. After investigating the MySQL relay log, we find that it is also by reading files. So we chose plan 2.

The implementation is relatively simple. Add a Listener to the Relay Writer, notify the Listener when a new binlog event occurs (send a message to a channel), and then in the Relay Reader, Change the timed check to listen for messages in a channel.

Latency is at least 1 latency in table 4 task. Latency is at least 1 latency in Table 4 task.

CPU optimization

After the Latency optimization is completed, we also test CPU usage after enabling relay log and find that CPU usage is relatively high after enabling relay log. The following is the test result in Table 4 task (Note: It can be seen that CPU consumption increases significantly after relay is enabled, and spikes become larger:

Using golang’s own Pprof to make a CPU profile, it can be seen from the following figure that the main parts are syncer/relay Reader /relay Writer. After comparing the code logic, it can be found that:

  • Relay Reader uses go-mysql’s ParseFile interface, which reopens the file each time it is called and reads the first FORMAT_DESCRIPTION event, the location of the first blue notation in the image below;
  • Since relay readers and writers are independent from each other, at least one channel can be used to notify at least one new binlog write. The new binlog may have been read in the last read. This results in many invalid checks for meta and index files.

For the above problems, we made the following optimization:

  • Use go-mysql’s ParseReader to eliminate repeated open and read costs;
  • Reconstruct the relay module to integrate the relay writer and reader together to facilitate communication between them. After receiving the notification through a channel, the relay reader checks whether the file being written is the same as the file being read, that is, whether the file is in active state, and obtains the position of the file being written. You can avoid checking invalid meta and index files.

It can be seen from the following figure that the CPU is greatly reduced after optimization, but the spikes are still large:

Since the sysbench we tested generated write events at a fairly steady rate, there was no special execution code in DM, and Golang was a compiled GC language, we guessed that the spikes were mainly from GC, but this was not obvious from the CPU profile. See below:

Use GODEBUG=gctrace=1 to enable GC logging.

  • When relay is enabled, resident memory is nearly double in size (239 MB -> 449 MB) and the total Heap space is nearly double in size.

    • After investigation, this problem is due to memory leakage caused by TIDB embedded in DM, which has not been dealt with for the time being.
  • After relay is enabled, the CPU occupied by GC increases significantly, especially the background GC time and idle GC time.

Below is the flame map of the ALloc_space section of the heap profile optimized above:

Note: The heap profile of pprof is a profile of the program up to now, not over a period of time, so you can also see some resident memory applications in the figure below.

Through the Heap profile and comparing the code, we found the following points that can be optimized:

  • When go-mysql parses binlog events from files, each event is reassigned a bytes.buffer, which is expanded as it reads. After optimization, one buffer pool is used to reduce the overhead caused by continuous expansion.

  • After is used for heatBeat events in Local Streamer. This interface is simpler, but the channel created by this interface is released only when the timer is triggered. In addition, the Local Streamer read event is a high frequency call, and creating a Timer channel for each call is also expensive. After optimization, it changed to reuse timer;

  • Relay uses a Timeout Context to read events from upstream. Each read creates an additional channel, which is not necessary in the current scenario. After optimization, the timeout context is removed;

  • Relay readers and Relay writers write debug logs without detecting the log level, and some Field objects are created each time. Although the Field objects are not large, they are invoked frequently and incur some overhead. After optimization, log level determination is added to debug logs that are frequently invoked.

    • Note: THE ZAP Logger used by DM to write logs has good performance. It is generally no problem to call log.debug directly under non-high-frequency calls.

After optimization, the result is as follows, which can be seen that the CPU is reduced quite a lot, and the spikes are also much less:

The following figure shows the test results in the 20 Table 20 Task scenario:

Legacy issues & future work

After the above optimization, there is a small difference in latency between enabling relay and not enabling relay, and CPU growth is also at a relatively low level. However, there are still some points that can be further optimized, which are expected to be gradually added in the subsequent version, as follows:

  • IO.CopyN is used by go-mysql to read files. This function will apply a small object internally.

  • Some optimizations for both no relay and relay are not done this time, such as the timeout Context created when the Streamer reads the Event;

  • Reading multiple readers from the same file is currently expensive, and retuning is possible:

    • Reduce the number of reads, such as one reader reading, other readers reading memory and so on, or increase the memory cache as previously envisaged;
    • Merge tasks in the same downstream to reduce the number of tasks.