1 Introduction

DataX 3.0 is an offline synchronization tool for heterogeneous data sources. It supports many types of data sources and can handle complex business scenarios. This article describes how to install it and the different ways to run it.

2 Tool installation and deployment

2.1 Installing JDK 1.8

For details, see a JDK 1.8 installation and configuration tutorial.

2.2 Installing Python 2.x

For details, see the Python 2.x installation and configuration tutorial.

2.3 Installing Maven 3.x

For details, see a Maven 3.x installation and configuration tutorial.

2.4 Installing DataX

There are two ways to install DataX; choose whichever you prefer.

2.4.1 Downloading the DataX Tool Package

Download link:

Dataxopensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.g…

DataX source code:

2.4.2 Compiling and Installing

The prebuilt DataX package may have version incompatibility issues, in which case you need to modify the source code. You can then deploy DataX by compiling it from source. The source code address is: github.com/alibaba/Dat… After the build, the package is located in the target directory of datax-core. Its directory structure is the same as that of the downloaded package, except that there is no plugin folder. You need to create the plugin folder yourself and add it to that directory.

3 Using DataX (starting it with the python command)

The following steps are required to use DataX:

3.1 Determining the data synchronization scenario and the data to synchronize

Omitted here.

3.2 Writing a Job Configuration File for the Synchronization Task

Here is a sample file template:
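As an illustration, a minimal job file for a MySQL-to-MySQL synchronization might look like the sketch below. It assumes the mysqlreader and mysqlwriter plugins; the host, database names, table names, columns, and credentials are placeholders and do not come from this article.

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "******",
            "column": ["id", "name"],
            "connection": [
              {
                "table": ["table1"],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/source_db"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "******",
            "writeMode": "insert",
            "column": ["id", "name"],
            "connection": [
              {
                "table": ["table2"],
                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/target_db"
              }
            ]
          }
        }
      }
    ]
  }
}
```

Note that in this sketch the reader takes jdbcUrl as a list while the writer takes a single string; check the documentation of the plugins you actually use before relying on it.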

3.3 Table synchronization

Data flows from the source table through the DataX reader and a channel to the DataX writer, which writes it into table2.

3.4 Going to the DataX bin directory

After writing the JSON configuration file, go to the bin directory of DataX.

3.5 Running datax.py

python datax.py "/Users/zcy/Desktop/mysql2mysql copy.json"
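When another program has to trigger the same command, it can spawn the process instead of relying on a shell. The following is a minimal sketch, not part of DataX: the class DataxLauncher and its methods are hypothetical names, and it assumes that python on the PATH is a Python 2.x interpreter and that datax.py sits in the bin directory of the DataX home. Passing each argument as a separate list element avoids quoting problems with paths that contain spaces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that starts datax.py as a child process.
public class DataxLauncher {

    // Builds the command line: python <dataxHome>/bin/datax.py <jobFile>
    public static List<String> buildCommand(String dataxHome, String jobFile) {
        return Arrays.asList("python", dataxHome + "/bin/datax.py", jobFile);
    }

    // Runs the job and returns the exit code of the python process.
    public static int run(String dataxHome, String jobFile) throws Exception {
        Process process = new ProcessBuilder(buildCommand(dataxHome, jobFile))
                .inheritIO() // forward the DataX log output to this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DataxLauncher <dataxHome> <jobFile>");
            return;
        }
        System.exit(run(args[0], args[1]));
    }
}
```

For example, `java DataxLauncher /Users/zcy/datax "/Users/zcy/Desktop/mysql2mysql copy.json"` would run the same job as the command above.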

3.6 Viewing the Running Result

The following figure shows the result of CPU monitoring and task execution:

![image.png](https://p1-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/f372a20711cb491eb73e5edd37c12cfd~tplv-k3u1fbpfcp-watermark.image)

4 Running and debugging the DataX source code locally

4.1 Obtaining the DataX source code

First, pull the DataX 3.0 source code from Git and find the main method of the Engine class, which is the entry point of the program.

4.2 Modifying Command-Line Parameters

Set two system properties: datax.home (the local installation path of DataX) and now (the current time). Then specify the input arguments: job (the path to the JSON configuration file), jobid (default: -1), and mode (the task mode, such as standalone). The modified main method is as follows:

// Modified main method of the Engine class; it also needs
// import java.text.SimpleDateFormat; and import java.util.Date;
public static void main(String[] args) throws Exception {
    int exitCode = 0;
    // Local installation path of DataX
    System.setProperty("datax.home", "/Users/zcy/datax");
    // The current time, used to replace placeholders in the job
    System.setProperty("now", new SimpleDateFormat("yyyy/MM/dd-HH:mm:ss:SSS").format(new Date()));
    //String[] datxArgs = {"-job", dataxPath + "/job/text.json", "-mode", "standalone", "-jobid", "-1"};
    // JSON configuration file path, mode, and jobId
    String[] datxArgs = {"-job", "/Users/zcy/Desktop/learning target/job/mysql2mysql copy.json", "-mode", "standalone", "-jobid", "1"};
    args = datxArgs;
    try {
        Engine.entry(args);
    } catch (Throwable e) {
        exitCode = 1;
        LOG.error("\n\nAccording to DataX intelligent analysis, the most likely cause of error for this task is:\n" + ExceptionTracker.trace(e));

        // Map framework errors to their dedicated exit codes
        if (e instanceof DataXException) {
            DataXException tempException = (DataXException) e;
            ErrorCode errorCode = tempException.getErrorCode();
            if (errorCode instanceof FrameworkErrorCode) {
                FrameworkErrorCode tempErrorCode = (FrameworkErrorCode) errorCode;
                exitCode = tempErrorCode.toExitValue();
            }
        }

        System.exit(exitCode);
    }
    System.exit(exitCode);
}

4.3 Starting DataX

Run the main method. The JSON configuration file is written in the same way as above.

4.4 Viewing Results

The information printed to the console is basically the same as above.

5 Starting DataX from Java code

  1. At present, DataX does not support distributed execution (this can be solved later by writing a scheduling system). For now, other programs that need to run DataX must call it directly, which requires the DataX JAR package. Use the Maven clean and install commands to compile and package DataX.
  2. In the calling Maven project, add the datax-core JAR as a dependency so that DataX can be called.
  3. DataX is started by calling the entry method of the Engine class and passing arguments in the same way as above. An example follows:
// Import the DataX core package
import com.alibaba.datax.core.Engine;

import java.text.SimpleDateFormat;
import java.util.Date;

public class aaa {
    public static void main(String[] args) throws Throwable {
        System.setProperty("datax.home", "/Users/zcy/datax");
        // Used to replace placeholders in the job
        System.setProperty("now", new SimpleDateFormat("yyyy/MM/dd-HH:mm:ss:SSS").format(new Date()));
        //String[] datxArgs = {"-job", dataxPath + "/job/text.json", "-mode", "standalone", "-jobid", "-1"};
        String[] datxArgs = {"-job", "/Users/zcy/Desktop/stream2stream.json", "-mode", "standalone", "-jobid", "-1"};
        Engine.entry(datxArgs);
    }
}
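The dependency on datax-core mentioned in step 2 might be declared as in the fragment below. The coordinates are an assumption based on the upstream DataX repository; after running clean and install, confirm the groupId and version against the pom.xml of your own checkout.

```xml
<!-- Assumed coordinates; verify against the pom.xml of your DataX checkout -->
<dependency>
    <groupId>com.alibaba.datax</groupId>
    <artifactId>datax-core</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>
```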

6 Precautions

  • DataX 3.0 does not support MySQL 8.0 or later out of the box. The main reason is that the MySQL driver JAR bundled with DataX 3.0 is version 5, and the driver class name changed in version 8.
  • If you must use MySQL 8 or later, you can replace the driver JAR in the mysqlReader and mysqlWriter plugins: in the DataX plugin directory, put the MySQL 8 driver JAR into each plugin's libs folder.
  • With this workaround no error is reported on local startup for now, but its long-term behavior has not been verified. The complete solution is to modify the source code: first change the driver JAR version in the POM file, then change the MySQL-related configuration to match MySQL 8.0.26 (change the driver name and the value of the zeroDateTimeBehavior property), and then repackage.

You can search the project globally for mysql to find the remaining MySQL-related properties and update them for MySQL 8 or later, which resolves the problem completely.
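The two changes named above, the driver name and zeroDateTimeBehavior, can be illustrated with a small sketch. The driver class names and property values are those of MySQL Connector/J 5.x and 8.x; the class MySql8Settings and its methods are hypothetical helpers for illustration and are not part of DataX.

```java
// Hypothetical helper showing what differs between Connector/J 5.x and 8.x.
public class MySql8Settings {

    // Driver class shipped with Connector/J 5.x
    public static final String LEGACY_DRIVER = "com.mysql.jdbc.Driver";
    // Driver class shipped with Connector/J 8.x
    public static final String MYSQL8_DRIVER = "com.mysql.cj.jdbc.Driver";

    // Picks the driver class name for a given Connector/J major version.
    public static String driverFor(int connectorMajorVersion) {
        return connectorMajorVersion >= 8 ? MYSQL8_DRIVER : LEGACY_DRIVER;
    }

    // Rewrites the 5.x spelling of the zeroDateTimeBehavior value to the
    // 8.x spelling; URLs without that parameter are returned unchanged.
    public static String upgradeJdbcUrl(String url) {
        return url.replace("zeroDateTimeBehavior=convertToNull",
                "zeroDateTimeBehavior=CONVERT_TO_NULL");
    }

    public static void main(String[] args) {
        System.out.println(driverFor(8));
        System.out.println(upgradeJdbcUrl(
                "jdbc:mysql://127.0.0.1:3306/demo?zeroDateTimeBehavior=convertToNull"));
    }
}
```

In the DataX source, the equivalent edits would be made wherever the driver class name and the JDBC URL suffix are defined, which is what the global search for mysql is meant to locate.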