Kettle is an open-source ETL tool implemented in pure Java. It runs on Windows, UNIX, and Linux, and provides a graphical interface in which data transfer topologies can be defined by dragging and dropping controls. This article introduces how to move data to the cloud with the Kettle-based MaxCompute plugin.
Kettle version: 8.2.0.0-342
MaxCompute JDBC Driver version: 3.2.8
Setup
- Download and install Kettle
- Download the MaxCompute JDBC Driver
- Place the MaxCompute JDBC Driver in the lib subdirectory under the Kettle installation directory (data-integration/lib)
- Download and compile the MaxCompute Kettle plugin: https://github.com/aliyun/ali…
- Place the compiled MaxCompute Kettle Plugin in the lib subdirectory under the Kettle installation directory (data-integration/lib)
- Start Spoon (the Kettle GUI)
Job
We can use Kettle together with the MaxCompute JDBC Driver to orchestrate and execute tasks in MaxCompute.
First, you need to do the following:
- Create a new Job
- Create a new Database Connection, configured as follows:
  - JDBC connection string: jdbc:odps:<maxcompute_endpoint>?project=<maxcompute_project_name>
  - JDBC driver class: com.aliyun.odps.jdbc.OdpsDriver
  - Username: your Alibaba Cloud AccessKey ID
  - Password: your Alibaba Cloud AccessKey Secret

For more JDBC configuration options, see: https://help.aliyun.com/docum…
MaxCompute can then be accessed through SQL nodes as required by the business. Let’s take a simple ETL procedure as an example:
The CREATE TABLE node is configured as follows:
Note:
- The Connection must be set to the Database Connection configured above
- Do not check Send SQL as Single Statement
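As a rough sketch of what the node's SQL might look like (the table name and columns below are hypothetical placeholders, not taken from the original configuration):

```sql
-- Hypothetical staging table for the raw data to be loaded from OSS;
-- adjust the columns to match your actual data.
CREATE TABLE IF NOT EXISTS ods_user_log (
    user_id  BIGINT,
    action   STRING,
    log_time DATETIME
);
```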
The Load from OSS node is configured as follows:
The points to note are the same as for the Create Table node. For more usage of the LOAD statement, see: https://help.aliyun.com/docum…
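As a hedged sketch, the LOAD statement in this node might look like the following; the table name, OSS endpoint, bucket, path, and file format are all hypothetical and must match your own data, and MaxCompute must be authorized to read the bucket:

```sql
-- Load text files from an OSS directory into the staging table.
-- The OSS endpoint, bucket, and path below are placeholders.
LOAD OVERWRITE TABLE ods_user_log
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/user_log/'
STORED AS textfile;
```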
The Processing node is configured as follows:
The points to note are the same as for the Create Table node.
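A sketch of the processing step, again with hypothetical table and column names, could aggregate the staged data into a result table:

```sql
-- Aggregate the staged data into a hypothetical result table.
CREATE TABLE IF NOT EXISTS dws_user_action_count AS
SELECT user_id, action, COUNT(*) AS action_count
FROM ods_user_log
GROUP BY user_id, action;
```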
Transformation
We can use the MaxCompute Kettle Plugin to move data into or out of MaxCompute.
Create a new Aliyun MaxCompute Input node with the following configuration:
Create an empty table in MaxCompute with the same schema as test_partition_table.
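One way to do this is with a CREATE TABLE ... LIKE statement, which copies the schema but not the data:

```sql
-- Create an empty table with the same schema as test_partition_table.
CREATE TABLE IF NOT EXISTS test_partition_table_2 LIKE test_partition_table;
```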
Create a new Aliyun MaxCompute Output node with the following configuration:
When the Transformation is executed, data is downloaded from test_partition_table and uploaded to test_partition_table_2.
Other
Set the MaxCompute flags
Before executing DDL/DML/SQL statements, configure flags by prepending set key=value; statements to the script.
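For example (odps.sql.type.system.odps2 is a real MaxCompute flag; the SELECT that follows is just a placeholder for your actual statement):

```sql
-- Enable the MaxCompute 2.0 type system for the statements that follow;
-- the SELECT is a placeholder for the actual DDL/DML/SQL.
set odps.sql.type.system.odps2=true;
SELECT CAST(1 AS TINYINT);
```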
Script mode
Script mode is not supported yet.
Original link
This article is original content from Alibaba Cloud and may not be reproduced without permission.