Introduction to Kettle
ETL (Extract-Transform-Load)
ETL stands for data extraction, transformation, and loading. Data developers regularly have to process, transform, and migrate many kinds of data, so it is essential to understand and master an ETL tool. The ETL tool covered here is Kettle.
What is Kettle
Kettle is an open-source ETL tool from abroad, and it places no restrictions on commercial use. It is written in pure Java and runs on Windows, Linux, and UNIX. Kettle is portable ("green") software that needs no installation, and its data extraction is efficient and stable. Its Chinese name means "water kettle": the idea is that you pour data from different databases into one kettle and let it flow out in a specified format. Kettle has two types of script files, transformations and jobs: a transformation performs the basic transformation of the data, while a job controls the entire workflow. The Start entry of a job has a scheduling function, so a job can be set to run daily, weekly, and so on.
Kettle’s core components
Component | Function
---|---
Spoon | Lets you design ETL transformations through a graphical interface
Pan | Command-line tool for running transformations
Kitchen | Command-line tool for running jobs
Carte | Lightweight web container for setting up a dedicated, remote ETL server
- Jobs and transformations can be run from Spoon's graphical interface, but only during the development, testing, and debugging phases. Once development is complete and the work is deployed to production, the Kitchen and Pan command-line tools are used to run it.
- In production, jobs and transformations are typically run from the command line: the commands are placed in a shell script that is scheduled to run periodically.
- Kitchen and Pan are Kettle's command-line executors and are only thin wrappers around the Kettle execution engine. They simply parse the command-line arguments and pass them to the engine.
- Kitchen and Pan are very similar in concept and usage, and their arguments are largely the same. The only difference is that Kitchen runs jobs and Pan runs transformations.
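The production workflow described above, a script that calls Pan for a transformation or Kitchen for a job, can be sketched in Python. This is a minimal sketch, assuming Kettle is unpacked at the hypothetical path `/opt/data-integration` and that `/etl/hello_world.ktr` is a hypothetical saved transformation. It uses only the `-file` and `-level` options that Pan and Kitchen share and treats exit code 0 as success; a scheduler such as cron could then call a script like this periodically.

```python
import subprocess
from pathlib import Path

# Hypothetical install location: wherever the Kettle archive was unpacked.
KETTLE_HOME = Path("/opt/data-integration")


def build_command(kettle_file: str, level: str = "Basic") -> list[str]:
    """Pick Pan for .ktr transformations and Kitchen for .kjb jobs."""
    suffix = Path(kettle_file).suffix.lower()
    if suffix == ".ktr":
        tool = "pan.sh"
    elif suffix == ".kjb":
        tool = "kitchen.sh"
    else:
        raise ValueError(f"not a Kettle transformation or job: {kettle_file}")
    # Pan and Kitchen accept the same -file and -level options.
    return [str(KETTLE_HOME / tool), f"-file={kettle_file}", f"-level={level}"]


def run(kettle_file: str) -> bool:
    """Run the file with the matching tool; exit code 0 means success."""
    result = subprocess.run(build_command(kettle_file))
    return result.returncode == 0


if __name__ == "__main__":
    # Only print the command here; call run(...) on a machine with Kettle installed.
    print(build_command("/etl/hello_world.ktr"))
```

Dispatching on the file extension mirrors the rule that Pan runs transformations and Kitchen runs jobs, so one scheduled script can handle both.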
Kettle conceptual model
Kettle work is organized at two levels: jobs (.kjb files) and transformations (.ktr files).
Simply put, a transformation is a single ETL process, while a job is a collection of transformations and other jobs, which the job can run in sequence, on a schedule, and so on.
In practice, an individual transformation should not be made too complex. When data extraction needs several stages, split it into multiple transformations, chain them in order inside a job, and run the job.
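The idea of chaining transformations in order inside a job can be modelled in a few lines of plain Python. This is a conceptual sketch, not Kettle's engine: it assumes every hop in the job only follows on success, so the job stops at the first failing transformation. The transformation names are hypothetical, and each "transformation" is just a function returning `True` or `False`.

```python
from typing import Callable

# A stand-in for a transformation: any callable that returns True on success.
Transformation = Callable[[], bool]


def run_job(steps: list[tuple[str, Transformation]]) -> list[str]:
    """Run transformations in order, like a job whose hops all follow on success.

    Returns the names of the transformations that completed successfully;
    execution stops at the first failure.
    """
    completed: list[str] = []
    for name, transform in steps:
        if not transform():
            print(f"{name} failed; stopping job")
            break
        completed.append(name)
    return completed


# Hypothetical three-stage extraction split into separate transformations.
job = [
    ("extract_orders", lambda: True),
    ("clean_orders", lambda: False),  # simulate a failing stage
    ("load_orders", lambda: True),
]
print(run_job(job))  # ['extract_orders']
```

Keeping each stage small means a failure points to one transformation, and the job decides what runs next.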
Directory and file descriptions
Download and Installation
Downloads for each version: https://sourceforge.net/projects/pentaho/files/Data%20Integration/
Chinese Kettle community forum: https://www.kettle.net.cn/
Kettle is open-source software written in pure Java. It requires a JDK with the environment variables configured, and it can be used directly after unpacking, without installation.
Also prepare the JDBC drivers for your databases and place the driver JAR files in the lib folder under the Kettle root directory.
To open Kettle, run spoon.bat (Windows) or spoon.sh (Linux/macOS) to launch the Spoon graphical designer.
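Choosing the right launcher can be scripted. The sketch below is a convenience wrapper, not part of Kettle: it picks `spoon.bat` on Windows and `spoon.sh` elsewhere and, as a rough check of the JDK prerequisite, looks for a `JAVA_HOME` variable or a `java` executable on `PATH`. The Kettle directory you pass in is wherever you unpacked the archive; running from that directory is an assumption that keeps relative paths predictable.

```python
import os
import platform
import shutil
import subprocess
from pathlib import Path


def spoon_script(kettle_home: Path) -> Path:
    """Return spoon.bat on Windows and spoon.sh elsewhere (Linux, macOS)."""
    name = "spoon.bat" if platform.system() == "Windows" else "spoon.sh"
    return kettle_home / name


def java_available() -> bool:
    """Rough JDK check: JAVA_HOME is set or a java executable is on PATH."""
    return bool(os.environ.get("JAVA_HOME")) or shutil.which("java") is not None


def launch_spoon(kettle_home: Path) -> subprocess.Popen:
    """Start Spoon without waiting for it to exit."""
    if not java_available():
        raise RuntimeError("Kettle needs a JDK: set JAVA_HOME or put java on PATH")
    script = spoon_script(kettle_home)
    # Batch files are run through cmd on Windows; shell scripts run directly.
    cmd = ["cmd", "/c", str(script)] if script.suffix == ".bat" else [str(script)]
    return subprocess.Popen(cmd, cwd=kettle_home)
```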
Start Kettle
Run the ./spoon.sh command, as shown.
The welcome page
HelloWorld
Copy data from a CSV file to an Excel file
CSV file input
Drag "CSV File Input" into the workspace on the right, double-click it to edit, browse to and select the prepared test file, then click "Get Fields" to read the header information from the CSV file automatically. That completes the input configuration; the output is configured in the next step.
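Conceptually, "Get Fields" reads the header row for field names and looks at sample rows to propose a type for each field. The sketch below imitates that with Python's `csv` module. The type-guessing rules (Integer, then Number, otherwise String) are a deliberate simplification, not Kettle's actual algorithm, and the sample data is hypothetical.

```python
import csv
import io


def guess_type(values: list[str]) -> str:
    """Very rough type guess from sample values (not Kettle's actual rules)."""

    def all_parse(parse) -> bool:
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "Integer"
    if values and all_parse(float):
        return "Number"
    return "String"


def get_fields(csv_text: str, sample_rows: int = 100) -> list[tuple[str, str]]:
    """Use the header row for field names and sample rows to guess each type."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = [row for _, row in zip(range(sample_rows), reader)]
    fields = []
    for i, name in enumerate(header):
        # Ignore missing or empty cells when guessing the type.
        column = [row[i] for row in samples if i < len(row) and row[i] != ""]
        fields.append((name, guess_type(column)))
    return fields


sample = "id,name,price\n1,apple,3.5\n2,pear,4\n"
print(get_fields(sample))
# [('id', 'Integer'), ('name', 'String'), ('price', 'Number')]
```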
Excel output
Drag "Excel Output" into the workspace on the right and double-click it to edit. This step is simple: browse to the output directory and set the file name, and the configuration is complete.
Connect and save the transformation
Hold Shift and drag with the left mouse button from the input step to the output step to create a hop (connection), then save the transformation.
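Saving the transformation writes a .ktr file, which is XML. The sketch below lists the steps and the enabled hops of such a file with Python's standard `xml.etree.ElementTree`. The sample XML is hand-written and heavily trimmed; the element names (`step`, `name`, `type`, `order/hop`, `from`, `to`, `enabled`) and the step type codes reflect the usual .ktr layout only as an assumption, so check them against a file saved by your Kettle version.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed-down stand-in for a saved transformation.
# Real .ktr files contain many more elements; this layout is an assumption.
KTR_SAMPLE = """<transformation>
  <info><name>hello_world</name></info>
  <order>
    <hop><from>CSV file input</from><to>Excel output</to><enabled>Y</enabled></hop>
  </order>
  <step><name>CSV file input</name><type>CsvInput</type></step>
  <step><name>Excel output</name><type>ExcelOutput</type></step>
</transformation>"""


def describe_transformation(
    xml_text: str,
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (step name, step type) pairs and (from, to) pairs of enabled hops."""
    root = ET.fromstring(xml_text)
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    hops = [
        (h.findtext("from"), h.findtext("to"))
        for h in root.findall("order/hop")
        if h.findtext("enabled", default="Y") == "Y"
    ]
    return steps, hops


steps, hops = describe_transformation(KTR_SAMPLE)
print(steps)  # [('CSV file input', 'CsvInput'), ('Excel output', 'ExcelOutput')]
print(hops)   # [('CSV file input', 'Excel output')]
```

Because the saved file is plain XML, it can also be kept under version control and diffed like any other text file.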
Run the transformation
View the results
Conclusion
Gained a first understanding of Kettle's core components and how they are used
Walked through a HelloWorld example step by step
Follow the public account HelloTech for more content.