preface

Canal is Alibaba's open-source binlog parsing framework. Its main purpose is to provide incremental data subscription and consumption based on incremental log parsing of the MySQL binlog.

In my previous article, Quick Start with Alibaba's Open-Source MySQL Middleware Canal, I introduced the basic principles and basic usage of Canal.

While deploying Canal to a production environment, I, as a newcomer, stepped into a few pits. I recorded and summarized them as I went, then analyzed the cause of each one, which is how this article came about.

Main contents of this article

Cause analysis and solutions for three common Canal problems:

  • Binlog parsing errors: repeated parsing, or DML parsed as QUERY
  • Filter not taking effect
  • Consumption lagging: consumption is delayed or stuck

Canal pitfalls and cause analysis

Problem: binlog parsing errors (repeated parsing, or DML parsed as QUERY)

This problem mainly shows up in the following typical cases:

  • An INSERT/UPDATE/DELETE is parsed as a QUERY or DDL statement
  • The binlog is parsed twice, that is, one operation produces both a QUERY message and the corresponding INSERT/UPDATE/DELETE message

Both problems are mainly caused by the binlog not being in ROW mode. Let's review the three binlog modes.

A review of the three running modes of the MySQL binlog

MySQL uses the binlog to replicate data from the master to the slave. The binlog can run in three different modes: ROW, STATEMENT, and MIXED.

1. ROW mode

The binlog records only which rows were modified and how. The details of every row change are recorded clearly, so the slave modifies exactly the same rows as the master.

Advantages: the row-based log clearly records the details of every row modification, making it easy to understand. And there are none of the cases where stored procedures, functions, or trigger invocations fail to replicate correctly.

Disadvantages: in ROW mode, every executed statement is recorded as per-row changes, which can generate a large amount of log content.

2. STATEMENT mode

Every SQL statement that modifies data is recorded in the master's binlog. During replication, the slave's SQL thread parses and executes the same SQL that was originally run on the master.

Advantages: STATEMENT mode avoids the drawback of ROW mode. It does not need to record every row change, which reduces binlog volume, saves I/O and storage, and improves performance, because it only needs to record the statement executed on the master and the context in which it was executed.

Disadvantages: since STATEMENT mode records the executed statements, to guarantee that they run correctly on the slave it must also record some context information for each statement, so that every statement produces the same result on the slave as it did on the master. In addition, because MySQL evolves quickly and new features keep being added, replication faces no small challenge, and the more complex the replicated content, the more likely bugs are to appear. In STATEMENT mode, a number of situations can break MySQL replication, mainly when certain functions or features are used to modify data. For example, the sleep() function does not replicate correctly in some versions, and using last_insert_id() in stored procedures may produce inconsistent IDs between slave and master. Since ROW mode logs changes per row, these problems do not occur there.

3. MIXED mode

MySQL used only statement-based replication until MySQL 5.1.5 started to support row-based replication. Since 5.0, MySQL replication had already solved a number of problems where older versions did not replicate correctly. However, with the advent of stored procedures, MySQL replication faced a new and even greater challenge.

In addition to STATEMENT and ROW, MySQL offers a third replication mode, MIXED, as of version 5.1.8; it is essentially a combination of the two.

In MIXED mode, MySQL decides the log format according to the specific SQL statement being executed, that is, it chooses between STATEMENT and ROW.

Ordinary statements are recorded in STATEMENT format as before; not everything is recorded in ROW format. For example, table structure changes are recorded in STATEMENT format, while an UPDATE or DELETE statement that actually modifies data still has all of its row changes recorded.

Now let's look at how these three modes affect Canal. In short, the non-ROW modes cause problems with how Canal parses QUERY events.

A snippet of my client code:

String tableName = header.getTableName();
String schemaName = header.getSchemaName();

RowChange rowChange = null;

try {
    rowChange = RowChange.parseFrom(entry.getStoreValue());
} catch (InvalidProtocolBufferException e) {
    LOGGER.error("Error parsing data changes", e);
    return; // skip entries that cannot be parsed instead of hitting an NPE below
}

EventType eventType = rowChange.getEventType();

LOGGER.info("Currently performing operation on table {}.{} = {}", schemaName, tableName, eventType);

After running it, you can see the output:

The part marked with the red box should be a single operation, but because of MIXED mode, Canal parses it into two messages: one QUERY and one UPDATE.

The official documentation actually explains this:

Github.com/alibaba/can…

Question 1: why is an INSERT/UPDATE/DELETE parsed as a QUERY or DDL statement?

Answer 1: the main reason is that the received binlog event is a Query event, for example:

  1. The binlog format is not ROW. You can check it by running show variables like 'binlog_format'. In statement/mixed mode, DML statements appear as SQL statements.
  2. On MySQL 5.6+, if the switch binlog-rows-query-log-events=true is enabled (show variables also shows this variable), DML statements additionally produce a RowsQueryLogEvent in the binlog alongside the corresponding row records. PS: Canal can filter these out via the property canal.instance.filter.query.dml = true.

Once I realized the problem lay with the binlog, it was not a big deal, but it was confusing at first.
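If you cannot switch the server to ROW format or enable canal.instance.filter.query.dml, a stop-gap is to skip non-DML entries on the client before parsing the RowChange. A minimal sketch follows; plain strings stand in for Canal's EventType enum so it compiles without the Canal client dependency:

```java
public class DmlFilter {
    // Treat only row-level change events as DML; QUERY/DDL entries produced
    // under statement/mixed binlog format should be skipped by the caller.
    public static boolean isDml(String eventType) {
        switch (eventType) {
            case "INSERT":
            case "UPDATE":
            case "DELETE":
                return true;
            default:
                return false;
        }
    }
}
```

In a real client you would test entry.getHeader().getEventType() against EventType.INSERT and friends instead of comparing strings.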

Problem: the filter does not take effect

Canal provides a filter to exclude tables that do not need to be listened to (a blacklist) or to specify the tables that should be listened to (a whitelist).

We usually set it in the conf/example/instance.properties file on the canal-server side:

# table regex
canal.instance.filter.regex=.*\\..*
# table black regex
canal.instance.filter.black.regex=

The rules can be set as follows:

Perl regular expressions describing the tables that MySQL data parsing should pay attention to. Multiple regexes are separated by commas (,), and escape characters require a double backslash (\\). Common examples:

  1. All tables: .* or .*\\..*
  2. All tables under the canal schema: canal\\..*
  3. Tables under the canal schema whose names start with canal: canal\\.canal.*
  4. A single table under the canal schema: canal.test1
  5. Multiple rules combined: canal\\..*,mysql.test1,mysql.test2 (comma separated)
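The matching behaviour of these rules can be approximated with standard Java regexes. The sketch below is an illustrative approximation, not Canal's actual filter class: it splits the comma-separated rule list and requires a full match of "schema.table" against any rule.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class FilterDemo {
    // Illustrative approximation of Canal's table filter: a name passes
    // if it fully matches at least one of the comma-separated regexes.
    public static boolean matches(String ruleList, String schemaDotTable) {
        return Arrays.stream(ruleList.split(","))
                .anyMatch(rule -> Pattern.matches(rule, schemaDotTable));
    }
}
```

Note that in a properties file the backslash is written doubled (canal\\..*), which yields the regex canal\..* after unescaping, the same pattern used in the Java string literals below.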

When the client connects to Canal, connector.subscribe("xxxxxxx") can also be used to override the settings made at server initialization.

The Canal maintainers probably received a bit too much feedback about filter settings not taking effect: since version 1.1.3+, Canal records the filter conditions last used in the log, so you can check whether the effective filter matches your expectations:

c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table filter : ^.*\..*$
c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter :

Possible cause one: the client calls connector.subscribe("xxx")

If the filter does not take effect, first check whether you have called connector.subscribe("xxxxxxx") on the client, which overrides the settings made at server initialization.

Possible cause two: the binlog mode is not ROW

If the binlog is not in ROW mode, the filter does not take effect:

The filter condition only applies to data in ROW mode (ps. it does not work under mixed/statement).

In my case with the two messages in the screenshot above, the first message was a QUERY whose table name could not be determined, so filtering could not be applied to it.

Problem: consumption lags behind

Canal's current architecture uses a single consumer. Even in a high-availability setup, to guarantee the order of binlog consumption it is still a single active consumer with hot standbys: when the active consumer dies, one of the standby consumers takes over. (This is my understanding of the current version; a future version may support concurrent consumption.) See below:

In this case, the consuming process may not be able to keep up with a large volume of binlog data, which eventually shows up as consumption falling behind, progress lagging, or even getting stuck. In the Issues section of the Canal open-source repository you can see many similar reports:

Github.com/alibaba/can…

After I deployed Canal, I ran into data-latency issues during peak database write times. A small delay is trivial, but once the delay grows until the in-memory buffer fills up without being consumed, no new messages can come in.

To analyze this problem further, Canal's overall architecture is shown below:

For message storage, Canal uses a RingBuffer, structured as shown in the following figure:

As can be seen, Canal currently caches messages in memory and does not persist data. Moreover, the cache has a fixed size, so if no ACK is submitted, the in-memory cache eventually fills up.

Here are some of the better analyses I have seen of Canal's consumer backlog, with links to the originals:

Zqhxuyuan. Making. IO / 2017/10/10 /…

Assuming the maximum size of the ring buffer is 15 (16 MB in the source code), the two batches above produce 15 elements in total, just enough to fill the ring buffer. If another PUT event arrives, since the ring buffer is full and there is no available slot, the PUT operation blocks until something is consumed.
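This blocking behaviour can be reproduced with any fixed-capacity buffer. The sketch below uses java.util.concurrent.ArrayBlockingQueue rather than Canal's actual ring buffer, just to show why a full, unacked store stops accepting new events:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class FullStoreDemo {
    // A fixed-capacity stand-in for Canal's in-memory event store.
    private final ArrayBlockingQueue<String> slots;

    public FullStoreDemo(int capacity) {
        this.slots = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking put: returns false when the store is full. This is the
    // point where Canal's real put would block until events are consumed.
    public boolean tryPut(String event) {
        return slots.offer(event);
    }

    // Consuming AND acking frees a slot; get-without-ack alone would not.
    public String getAndAck() {
        return slots.poll();
    }
}
```

The names FullStoreDemo, tryPut, and getAndAck are made up for this illustration; Canal's real store lives behind the CanalEventStore interface.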

Blog.csdn.net/zhanlanmg/a…

Looking through the Canal source to see whether Canal persists anything: in general it does not write files; the only storage found is the temporary store behind the CanalEventStore interface, used by the CanalServerWithEmbedded.getWithoutAck() method. Data is processed further in the AbstractEventParser class, which starts a thread that submits a dump request to the master and waits for data to flow through. It then calls the EventTransactionBuffer.add(CanalEntry.Entry entry) method, whose put caches data in memory; when the cache is full, it flushes. The flush calls the flush implementation of the TransactionFlushCallback interface, which in AbstractEventParser is an anonymous class; it takes the data out and calls the sink method inside consumeTheEventAndProfilingIfNecessary, which in turn calls the EntryEventSink.doSink(List events) method. That method checks whether the cache is full and, if so, waits until a slot becomes free. So it blocks when the cache is full. This is why, once Canal's data backs up and is not acked, no more data can come in. In addition, due to problems with my test method, yesterday's description was not accurate: the difference was not between insert and update but in how I operated (a batch update versus single inserts), and whether the cache fills depends on the number of rows fetched (that is, one master-to-slave dump is one piece of data), not on the data volume.

A possible solution is to write the messages to a message queue (such as RabbitMQ or Kafka) and let the queue absorb the backlog, so that Canal does not get stuck when a large number of messages pile up, while also gaining data persistence.
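A minimal sketch of that decoupling, with an in-memory Producer standing in for a real Kafka/RabbitMQ client (the Producer interface and relay method here are illustrative, not part of Canal's API):

```java
import java.util.List;

public class MqRelay {
    // Stand-in for a durable message-queue producer (Kafka/RabbitMQ in practice).
    public interface Producer {
        void send(String message);
    }

    // Hand each Canal message off to the queue quickly; the caller then acks
    // the whole batch to Canal so its in-memory store can free slots.
    public static int relay(List<String> canalBatch, Producer producer) {
        for (String msg : canalBatch) {
            producer.send(msg);
        }
        return canalBatch.size(); // number of messages handed off (then ack)
    }
}
```

The key design point is that the consumer does almost no work between getWithoutAck and ack, so the ring buffer drains fast and the slow processing happens downstream of the durable queue.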

My own guess is that Canal wants professional tools to do professional jobs: Canal is just binlog-reading middleware, not a professional message queue, and message buffering should be handled by a proper message queue.

conclusion

Having actually used Canal, and especially after reading its documentation carefully, I find it still has quite a few problems and rough edges; you need plenty of practice and research before you know what fits your business. If I run into more Canal pits, I will keep writing them down.

reference

  • www.jianshu.com/p/fea54a984…
  • www.jianshu.com/p/8259f0813…
  • Blog.csdn.net/keda8997110…
  • Github.com/alibaba/can…
  • ifeve.com/disruptor/
  • Blog.csdn.net/zhanlanmg/a…
  • Zqhxuyuan. Making. IO / 2017/10/10 /…

Follow me

I'm a back-end development engineer, focusing on back-end development, data security, crawlers, the Internet of Things, edge computing, and other areas. Feel free to reach out.

You can find me on all these platforms:

  • WeChat official account: A Ramble on Back-end Technology
  • GitHub: @qqxx6661
  • CSDN: @Pretty three knives
  • Zhihu: @A Ramble on Back-end Technology
  • Jianshu: @Pretty three knives
  • Juejin: @Pretty three knives
  • Tencent Cloud+ Community: @A Ramble on Back-end Technology

Main topics of my original articles

  • Back-end development
  • Java interviews
  • Design patterns / data structures / algorithm problems
  • Crawlers / edge computing / IoT
  • Reading notes / anecdotes / programmer life

Personal official account: A Ramble on Back-end Technology

If this article helped you, feel free to bookmark it, share it, and give it a like~