Kettle Real-life
Kettle Introduction: Kettle is an open-source ETL tool written in Pure Java and runs on Windows, Linux, and Unix to extract data efficiently and stably. MATT, the lead programmer on the project, wants to put all kinds of data into a Kettle and then flow it out in a specified format. Kettle is an ETL toolset that allows you to manage data from different databases by providing a graphical user environment that describes what you want to do, rather than how you want to do it. There are two types of script files in the Kettle: Transformation and Job. Transformation converts basic data, and Job controls the entire workflow. (Quoting Baidu Baike)
[TOC]
preface
The KETTLE is not that difficult !!!!
There are a lot of people looking at a huge number of controls is not start, this control is what, that control is what? Why does this thing do these things
KETTLE can be used in all scenarios involving data integration, conversion, and migration. It replaces manual coding to complete data conversion tasks, reducing the difficulty of development. Everyone can be a developer and do big data.
The following is a set of original learning processes
In addition, I also have a more easy to understand video tutorial, full strength level development, focus on [JAVA master] reply [kettle] keyword acquisition
Install and run the first instance
1. Download and install Kettle (the Kettle version used in this document is 7.0.0.0-25)
2. Start the kettle. Before starting the kettle, ensure that the JDK is installed on your PC. Double-click spoon. Bat to go to the kettle main screen.
3. Create a transform
1) Create a transformation
2) After the transformation is created, enter the interface as shown below. The core object is displayed on the left, and the white box on the right is the main interface
3) Click input, enter the interface below, select table input
4) Drag the table input to the main screen
5) Double-click table input to enter the main interface of table input
6) Modify the step name and click Add to modify database connection to enter the new interface. After input, click OK.
During the process, a message may be displayed indicating that the OJDBC JAR package is missing. Download the JAR package and put it in the lib folder.
7) Enter the main interface, double-click the table input, click Preview, click OK to display the table content.
The database automatically generates files
In my daily work, I will encounter the following situations:
1. Automatically generate TXT or EXCEL files (mostly needed in telecom industry) and upload them to a server;
2. Data interconnection between the two parties and database view provided by the other party
1. Automatically generate EXCEL files
1) As shown above, select table input (read view or table data), EXCEL output (prepare EXCEL file for output), and connect the two controls.
2) Configure the database connection (JDBC was described in the previous section), which can then be made through JNDI and implemented through configuration (as in springMVC configuration).
Write the query SQL statement and click Preview to check whether the SQL statement has an error. If no error occurs, the following preview interface is displayed
3) Double-click EXCEL output to enter The EXCEL configuration interface, as shown below
4) Click Browse to configure the path and file name of the generated file, and specify the date format to add the date after the file name, which is very suitable for periodically generating files
5) Select the content page. Append means that the append operation will be performed after the file and the source file will not be overwritten
6) Click “Get the field”, the table fields queried in the previous steps will be automatically filled in, and the generated ID is of type BigNumber. If you want a numeric id, you can change it to Integer. After all the fields are filled in, click “Ok”
7) Hit Run
8) The following EXCEL files will be automatically generated
2, automatically generate TXT files
1) As shown in the figure above, change Excel into TXT. After shift connection, double click the left mouse button to open the configuration and enter the file name
2) switch to the content, more common is to choose the delimiter, this paper use the common | instead,
3) Field options, obtain the field, select the corresponding field format, and click OK
4) As shown in the figure, the generation is complete
Two methods of synchronizing a KETTLE table
In ordinary work, will encounter this situation, and very common:
Reading multiple views or tables of the other party and writing to a table in the target library involves synchronization of multiple tables.
Multi-table synchronization can be implemented in the following two ways, and the author will introduce the implementation methods respectively:
Method 1: After associated query of multiple tables, add one table
1, according to the data of these two tables, table has student table and class table, write the target table: student class table
2, select table input, double-click table input, write associated SQL statement in the SQL column, click preview, query relevant data, click OK
3. In the core object, select Table Output, hold down the Shift key, mouse to connect table input and table output, double-click table Output, select database connection, select target table and commit quantity, and click OK4. Click the “Run” button to write successfully 5. The verification is successful and the data has been written into the relevant table
Method 2: As shown in figure 2, you need to create table input 1 (student), table input 2 (class), recordset join (join student and class table), table output (write to target table)
1, table input 1, get the corresponding write field
2. Enter 2 in the table to obtain the corresponding write fields of the class
3, Table input 1 and table input 2, hold shift connect recordset connect control, click recordset connect control, enter connection field 1 and connection field 2, select connection type inner
4, finally join the table output, select the database field (must be consistent with the target table table structure), click OK
5, double-click the operation, the execution is completed, to the database verification
Data timing automatic (automatic extraction) synchronization operation
1, in order to give you a more intuitive display, first create 4 tables in the database, table structure is as follows:
- T_student_kettle Student data source table;
- T_student_kettle_target Student target data table;
- T_class class data source table;
- T_class_target Class target data table;
- T_tbrz Synchronization log table
Create table Create table T_STUDENT_KETTLE (ID INTEGER, name VARCHAR2(2000), sex VARCHAR2(2000), age INTEGER, cjsj DATE, zhgxsj DATE default sysdate ) tablespace MYSPACE pctfree 10 initrans 1 maxtrans 255 storage ( initial 64K next 1M minextents 1 maxextents unlimited ); -- Create table Create table T_STUDENT_KETTLE_TARGET (ID INTEGER, name VARCHAR2(2000), sex VARCHAR2(2000), age INTEGER, cjsj DATE, zhgxsj DATE default sysdate ) tablespace MYSPACE pctfree 10 initrans 1 maxtrans 255 storage ( initial 64K next 1M minextents 1 maxextents unlimited ); Create table Create table T_CLASS (id NUMBER, class VARCHAR2(100), CJSJ DATE, zhgxsj DATE ) tablespace MYSPACE pctfree 10 initrans 1 maxtrans 255 storage ( initial 64K next 1M minextents 1 maxextents unlimited ); Create table Create table T_CLASS_TARGET (id NUMBER, class VARCHAR2(100), CJSJ DATE, zhgxsj DATE ) tablespace MYSPACE pctfree 10 initrans 1 maxtrans 255 storage ( initial 64K next 1M minextents 1 maxextents unlimited ); -- Create tablecreate table T_TBRZ(id NUMBER, --id TBCGSJ DATE, -- successful synchronization time (end time) TBKSSJ DATE, -- Synchronization start time BM VARCHAR2(100), -- synchronization table name TBJG CHAR(1) Tablespace MYSPACE PCtfree 10 INITrans 1 maxtrans 255 storage (Initial 64K next 1M minextents 1 maxextents unlimited ); Create sequence SEQ_T_TBRZminvalue 1maxValue 99999999start with 81increment by 1cache 20;Copy the code
2. Configure the database connection address. Generally, the project will directly connect to the database by configuring the JNDI data source, not configuring the JDBC data source, which is similar to the Java configuration
3. Set up the database connection through JNDI
4. Overall operation process
It takes one job and six transformations to operate on two tables, one table is three transformations to complete, and the total number of transformations for the tables is N*3 transformations.
The following figure shows the process of the whole operation:
Start is a class table and student table. Start is a class table and student table.
1) SET T_CLASS data variable
As shown in the figure above, the T_CLASS data variable is set. This step is to get the successful time of the last synchronization as the start time of the next synchronization and set it into the environment variable for subsequent data flow calls
2) T_CLASS data variable setting
As shown above, the T_CLASS data update step, which is to get the data since the last successful synchronization time, inserts the update into the table.
3) T_CLASS saves synchronization logs
As shown in the figure above, T_CLASS saves synchronization logs. This step is to save the synchronization success time of the current synchronization and insert it into the synchronization log table to prepare for the start time of the next synchronization. The SQL is as follows
View Code
3) T_CLASS synchronization succeeds
As shown in the figure above, after the T_CLASS synchronization step succeeds, write the successful step, indicating success.
5. Data synchronization in the T_STUDENT_KETTLE table is the same as that in the T_CLASS table. There are three conversion steps
5. Run the conversion, double-click Start to set the scheduled scheduling of jobs, and run the conversion after completing the setting
6. Running results. Different results are displayed for successful running and failure running, which can be used to troubleshoot errors
Note: Due to some limitations, the latest source code and subsequent general configuration implementation data extraction have been placed on the author’s official account. Please follow wechat official account: [JAVA Master], reply keyword: [instance], obtain the kettle installer and running instance (table structure and KJB and KTR files).
At the same time, the author created a set of video tutorials: now free distribution of 100 sets, first come first served. How to obtain: Scan the qr code below, pay attention to obtain