This is the third day of my participation in the August Text Challenge.More challenges in August

Service data Collection

process

  1. Users generate service data
  2. Back-end services are stored in Mysql
  3. Open the Mysql binlog
  4. Use Maxwell to send business data to Kafka

Maxwell’s introduction

Maxwell is a Real-Time MySQL capture software written in Java and open source by Zendesk. Read the MySQL binary log Binlog in real time and generate JSON messages that are sent as producers to Kafka, Kinesis, RabbitMQ, Redis, Google Cloud Pub/Sub, files, or other platform applications.

Maxwells-daemon. IO /

Maxwell has the same functionality as Debezium and Canal

Maxwell is selected because a row of data generates a JSON file, which has a simple structure and no table field types. It is clear and meets business requirements.

Maxwell’s theory

It’s very simple to pretend to be a slave and copy data from the master

1. MySQL primary/secondary replication process
  • The Master will write the changes to the binary log

  • Slave sends dump protocol to mysql master to copy binary log events from master to its relay log.

  • Slave reads and redo events in the relay logs from the library and synchronizes the changed data to its own database.

2. MySQL的binlog
  1. What is the binlog

    The binary log of MySQL is the most important log of MySQL. It records all DDL and DML statements (except data query statements) in the form of events, and contains the elapsed time of the statement execution. The binary log of MySQL is transaction safe.

    Generally speaking, there is a performance loss of about 1% when binary logging is enabled. Binary has two most important usage scenarios:

    ø First: MySQL Replication enables binlog on Master and Master passes its binary log to Slaves to achieve master-slave data consistency.

    Data recovery is natural, by using the mysqlBinlog tool to restore data.

    Binary log files are classified into two types: Binary log index files (file name extension:. Index), which record all binary files. Binary log files (file name extension:.00000*), which record all DDL and DML statements (except data query statements) of the database.

  2. The opening of binlog

    1. Find the location of the MySQL configuration file

    2. Linux: /etc/my.cnf

      If the /etc directory does not contain the location, you can locate my.cnf to find the location

    3. In the mysql configuration file, modify the configuration

      In [mysqld] block, set/add log-bin=mysql-bin

    123456. Each time mysql restarts or reaches the threshold for the size of a single file, a new file is created, and the file is numbered in order.

  3. Binlog classification Settings

    Mysql binlog has three formats: STATEMENT,MIXED, and ROW.

    In the configuration file you can choose to configure binlog_format = statement | mixed | row

    Differences between the three formats:

    1. statement

      At the statement level, binlog records the statements that perform write operations one at a time.

      Compared to row mode, it saves space, but can produce inconsistencies, such as

      update  tt set create_date=now()

      If the recovery is performed using binlog logs, different data may be generated due to different execution times.

      Advantages: Saves space

      Disadvantages: Data inconsistency may occur.

    2. row

      At the row level, the binlog records the change of each row after each operation.

      Advantages: Maintain the absolute consistency of data. Because no matter what the SQL is and what function it refers to, it only records the effect of execution.

      Disadvantages: Takes up a lot of space.

    3. mixed

      The updated version of Statement resolves to some extent the problem of inconsistent statement schema

      The default is still statement, in some cases such as:

      When a function contains UUID(); When a table containing the AUTO_INCREMENT field is updated; INSERT DELAYED; When using a UDF;Copy the code

      Will be processed as ROW

      Advantages: space saving, while taking into account a certain consistency.

      Disadvantages: There are still some inconsistencies in very rare cases. In addition, statement and mixed are not convenient for cases where binlog monitoring is required.

    Based on the above comparison, Maxwell wanted to do monitoring analysis and chose row as the appropriate format

implementation

1. Configure the MySQL
  1. Configuration binlog

    sudo vim /etc/my.cnf
    Copy the code
    server-id=1
    log-bin=mysql-bin
    binlog_format=row
    binlog-do-db=tmall
    Copy the code

    Note: Binlog-do-db is modified on its own to specify the specific database to be synchronized

  2. Restart MySQL for the configuration to take effect

sudo systemctl restart mysqld
Copy the code
2. Configuration Maxwell
  1. Download the Maxwell

    Maxwells-daemon. IO /

    Github:github.com/zendesk/max…

    I’m using version 1.25.0

  2. Unpack the Maxwell

    Tar -zxvf /opt/software/ Maxwell-1.25.0.tar. gz -c /opt/module/Copy the code
  3. Initialize the Maxwell metadata database

    1. createmaxwellThe database
      CREATE DATABASE maxwell ;
      Copy the code
    2. Assign an account to operate the database
      GRANT ALL ON maxwell.* TO 'maxwell'@'%' IDENTIFIED BY '666666';
      Copy the code
    3. Assign this account to monitor permissions on other databases
      GRANT SELECT ,REPLICATION SLAVE , REPLICATION CLIENT  ON *.* TO maxwell@'%';
      Copy the code
  4. Configuration Maxwell

    1. Copying configuration Files
      CD/opt/module/maxwell - 1.25.0 / cp config. The properties. The example config. The propertiesCopy the code
    2. Modifying a Configuration File
      # configuration kafka producer = kafka kafka. The bootstrap. The servers = hd1:9092, hd2:9092, hd3:9092 # need to add the topic kafka_topic = ods_base_db_m # Mysql host=hd3 user=maxwell password=666666 client_id=maxwell_1Copy the code

    Note: The default is to output to a single Kafka partition of the specified Kafka topic, as multiple partitions running in parallel may disrupt the order of the binlog

    To increase parallelism, first set the number of partitions in Kafka > 1, and then set the producer_partition_by property

    Optional value producer_partition_by = database | table | primary_key | random | column

  5. Start the Maxwell

    /opt/module/maxwell-1.25.0/bin/maxwell --config  /opt/module/maxwell-1.25.0/config.properties >/dev/null 2>&1 &
    Copy the code
  6. Test Maxwell

    CD /opt/module/kafka_2.11-2.4.1/ bin/kafka-console-consumer.sh --bootstrap-server HD1:9092 --topic ods_base_db_mCopy the code

The column continues to be updated at 👇🏻👇🏻👇🏻👇🏻👇🏻👇🏻👇🏻👇🏻