2 Using Spoon

After downloading the Kettle archive from the official website, extract it.

After extracting, you get the data-integration folder. To launch Spoon, double-click the spoon.bat script in it (on Linux or macOS, run spoon.sh instead).

At this point, if the JDK has not been configured on your computer, the program will report an error. Make sure a Java Development Kit (JDK) is installed and configured before running Spoon; it is the only prerequisite for using Kettle.
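Before launching Spoon, you can check whether Java is visible at all. The Python sketch below is a simplified check, not what spoon.bat actually does: it looks for `bin/java` under `JAVA_HOME` first and otherwise falls back to whatever `java` is on `PATH`. The function name `find_java` is hypothetical.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_java(
    environ: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a path to a java executable, or None if none is found.

    Simplified check (assumption): JAVA_HOME first, then `java` on PATH.
    """
    java_home = environ.get("JAVA_HOME", "").strip()
    if java_home:
        exe = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe)
        if os.path.isfile(candidate):
            return candidate
    # JAVA_HOME unset or not pointing at a JDK: try the PATH instead.
    return which("java")


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("No Java found: install a JDK and set JAVA_HOME before running Spoon.")
    else:
        print(f"Java found at {java}")
```

The `environ` and `which` parameters only exist so the lookup can be exercised without touching the real environment.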

The initial interface is shown below:

In Spoon, users design transformations and jobs in the right-hand pane by using the component tree on the left, and view the results in the Logging pane.

2.1 Kettle Resource Library Management

This tutorial is written against the current latest version (V8.2). The version I had been using was V6.0, so I skipped an entire major version, and many features, such as the repository, have changed in the meantime. To avoid unnecessary explanation, I won't cover every difference; please forgive me if anything in the introduction is wrong.

2.1.1 Resource Library Types

V8.2 has three repository types, one more than V6.0: the new one is the Pentaho Repository. The Database Repository and the File Repository are grouped under "Other Repositories".

  • Pentaho Repository: a plug-in (Kettle Enterprise Edition) that is essentially a content management system (CMS) with all the features of an ideal repository, including version control and dependency integrity checking.
  • Database Repository: stores all ETL design information in a database, including database connections, jobs, transformations, and configurations, which makes saving, management, and remote scheduling easy.
  • File Repository: defines a repository in a file directory. Because Kettle uses a virtual file system (Apache VFS), "file directory" is a broad concept here that includes ZIP files, web services, and FTP services.
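To make that "broad concept" more concrete: Apache Commons VFS addresses locations with URIs such as `file:///...`, `zip:file:///...!/...` or `ftp://host/...`. The sketch below only illustrates the idea by pulling the scheme out of a location string; it is not how Kettle or VFS resolve paths internally, and `vfs_scheme` is a hypothetical helper.

```python
import re

# Matches a URI scheme such as "file", "zip", "ftp", "sftp" or "http".
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*):")


def vfs_scheme(location: str) -> str:
    """Return the scheme of a repository location.

    Plain paths (including Windows paths such as C:\\kettle) count as "file".
    """
    match = _SCHEME.match(location)
    if match is None or len(match.group(1)) == 1:
        # No scheme at all, or a single-letter Windows drive such as "C:".
        return "file"
    return match.group(1).lower()
```

For example, `vfs_scheme("zip:file:///data/repo.zip!/etl")` gives `"zip"` and `vfs_scheme("ftp://user@host/etl")` gives `"ftp"`, while a plain local path gives `"file"`.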

The Database Repository is the one most often used.

2.1.2 Creating a Database Repository

1) Click “Connect” on the right of the toolbar, then click “Other Repositories” in the pop-up dialog:

2) Select Database Repository and click Get Started:

3) Configure connection information:

4) Create a database connection for the first time:

Choose MySQL as the connection type and create the database connection:

Note that the first time you click Test, you should get an error: the driver jar for the database connection cannot be found.

Just download mysql-connector-java-5.1.47.jar from the MySQL website, place it in \data-integration\lib, and restart Spoon.
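If Test still complains after the restart, check that the jar really landed in the lib folder. A minimal Python sketch, assuming you pass in the folder Kettle was extracted to as `kettle_home` (a hypothetical parameter name); the pattern `mysql-connector-*.jar` also matches the `mysql-connector-j-*.jar` naming used by newer Connector/J releases.

```python
import glob
import os
from typing import List


def find_mysql_drivers(kettle_home: str) -> List[str]:
    """List MySQL JDBC driver jars found in <kettle_home>/lib."""
    pattern = os.path.join(kettle_home, "lib", "mysql-connector-*.jar")
    return sorted(glob.glob(pattern))
```

For example, `find_mysql_drivers(r"D:\data-integration")` should return the path of the jar you copied. An empty list means Spoon cannot see the driver; if several versions turn up, keeping only one avoids ambiguity about which driver gets loaded.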

Some of you might ask: why not cover this first, so you wouldn't have to restart Spoon and repeat the steps above? Friends, please bear with me. If I had you download and place the jar in advance, you would only know what to do, not why. Since you will probably use databases other than MySQL, you would run into the same problem again with them and not know the cause. So please go through it once to deepen your impression.

After the restart, test again; you should get the following dialog. Congratulations ~~

6) Once the configuration succeeds, select Connect Now:

Before this page is displayed, Kettle automatically creates tables whose names start with R_ in the database. A partial excerpt is shown here for illustration; these tables will be explained later as they come up.
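To see which repository tables Kettle created, filter the table names by the R_ prefix. Note that `_` is a single-character wildcard in SQL `LIKE`, so it has to be escaped; in MySQL that is `SHOW TABLES LIKE 'R\_%';`. The runnable sketch below uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption made only so it runs anywhere), and `repository_tables` is a hypothetical helper.

```python
import sqlite3
from typing import List


def repository_tables(conn: sqlite3.Connection) -> List[str]:
    """Return names of tables whose name starts with "R_".

    The underscore is escaped so that it matches a literal "_" rather
    than any single character (otherwise a table named "RX" would match).
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'R\\_%' ESCAPE '\\' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

SQLite's `LIKE` is case-insensitive for ASCII letters, so lowercase `r_` tables would be listed too.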

7) Enter the user name and password. The defaults are admin and admin.

You might think: everyone knows the default username and password, so I should change them for safety. How do you do that?

1) Modifying through SQL: look at the R_USER table in the repository database:

You can see that this table holds two users: admin (administrator) and guest (read-only user).

The passwords are encrypted and I don't know the encryption method, so I can't change them through SQL, haha.

2) Modifying through the interface is a bit more involved:

  • From the menu bar, choose Tools > Repository > Explore Repository, or press Ctrl+E. (Remember to log in to the Database Repository as admin; otherwise the Repository submenu is greyed out.)
  • In the dialog, open the Security tab to modify, add, and delete users.

This concludes how to create a Database Repository.

Once connected, the jobs and transformations you save are stored directly in the database.

2.1.3 Creating a File Repository

Creating a File Repository is simpler than creating a Database Repository: you only need to fill in the repository name and the directory where files will be saved:

The directory I selected here is local; in principle you should also be able to choose a directory on a server in the LAN (haha ~ I haven't verified this ~).

Kettle also initializes some content in that directory so that it can read and manage it.

There is no connection test here, because jobs and transformations are simply saved in the directory you chose.
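Because a File Repository is just a directory, you can inspect what was saved there with ordinary file tools. This Python sketch groups `.ktr` (transformation) and `.kjb` (job) files under a repository directory, ignoring anything else Kettle may keep there; `list_repository_files` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict, List

# File extension -> kind of Kettle artifact.
_KINDS = {".ktr": "transformation", ".kjb": "job"}


def list_repository_files(base_dir: str) -> Dict[str, List[str]]:
    """Group .ktr/.kjb files under base_dir by kind, as relative POSIX paths."""
    found: Dict[str, List[str]] = {"transformation": [], "job": []}
    base = Path(base_dir)
    for path in sorted(base.rglob("*")):  # walk subfolders too
        kind = _KINDS.get(path.suffix.lower())
        if kind is not None and path.is_file():
            found[kind].append(path.relative_to(base).as_posix())
    return found
```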

2.1.4 Why use a Resource Library

When no repository is connected, jobs and transformations can only be saved to local disk as .kjb and .ktr files.
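Both file types are plain XML. As an illustration, assuming the usual layout (a `<transformation>` root with the name under `<info><name>`, and a `<job>` root with a top-level `<name>`), this sketch tells the two apart and reads the name. `describe_kettle_xml` is hypothetical; check against a file saved by your own Spoon version.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def describe_kettle_xml(xml_text: str) -> Tuple[str, Optional[str]]:
    """Return ("transformation" | "job", name) for a .ktr/.kjb document."""
    root = ET.fromstring(xml_text)
    if root.tag == "transformation":
        return "transformation", root.findtext("info/name")
    if root.tag == "job":
        return "job", root.findtext("name")
    raise ValueError(f"not a Kettle job or transformation: root <{root.tag}>")
```

To read a file from disk, pass its contents, e.g. `describe_kettle_xml(Path("load.ktr").read_text(encoding="utf-8"))`.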

If a repository is used, jobs and transformations are stored in it instead. In the case of a Database Repository, the repository is effectively a database, such as MySQL, that stores metadata about the elements defined in Kettle.

Put simply, it is a metadata database that makes management and collaboration easier.

Once a repository is created, its information is stored in the repositories.xml file in the hidden .kettle directory under your home directory. On Windows the path is C:\Users\<username>\.kettle.
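To check which repositories Spoon knows about without opening it, you can read that file directly. A minimal Python sketch, assuming each entry is a `<repository>` element with `<id>` (the repository type) and `<name>` children; the layout may differ between versions, so compare with your own file. The helper names are hypothetical, and `default_repositories_xml` ignores the `KETTLE_HOME` environment variable, which can relocate the .kettle folder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Optional, Tuple


def default_repositories_xml() -> Path:
    """~/.kettle/repositories.xml (KETTLE_HOME is not consulted here)."""
    return Path.home() / ".kettle" / "repositories.xml"


def list_repositories(xml_text: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (type id, name) for every top-level <repository> entry."""
    root = ET.fromstring(xml_text)
    return [
        (repo.findtext("id"), repo.findtext("name"))
        for repo in root.findall("repository")
    ]
```

Usage: `list_repositories(default_repositories_xml().read_text(encoding="utf-8"))`. Top-level `<connection>` entries (database connections used by a Database Repository) are skipped, because only direct `<repository>` children are read.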

Here is a screenshot of the information saved when the Database Repository was created earlier:

Besides the different storage method, the menu bar and some options also differ. I won't go into them here; just watch for them carefully as you use it.

Pentaho Repository: not covered here, because I haven't used it, ha ha ~