Why the Kettle column?

After nearly five years of using Kettle, I have come to deeply appreciate the convenience this tool brings, especially for Java engineers, for whom it is almost a gift. In the spirit of open source, I have decided to share my years of experience with readers, in the hope that we can grow together.

Why use Kettle? For programmers, most of the work is solving business problems with code, and the core of that process is processing business data. In plain terms, solving a business problem means processing its data through code.

As you can see, a programmer's main job is processing data. The first problem is data format: format differences, that is, all kinds of heterogeneous data, mean you need a lot of code just to handle the different formats (XML, JSON, EDI, TXT, and so on). Unifying data formats by hand is tedious, so is there another way besides hard coding? Of course there is: Kettle. The second headache is that data comes from different systems, local documents (Word, Excel, TXT, etc.), or different kinds of databases (HBase, Access, Elasticsearch, etc.). Consolidating and formatting data from all these sources would otherwise mean a great deal of development work, so is there another solution? Of course there is: Kettle. These are Kettle's basic capabilities; more powerful uses will be covered in detail in future tutorials.
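To make the hard-coding pain concrete, here is a minimal Python sketch (the record fields are invented for illustration) of the kind of per-format converter you would otherwise write by hand for every new source, e.g. flattening a JSON record into a delimited TXT line:

```python
import json

def json_to_txt_line(raw: str) -> str:
    """Flatten one JSON record into a pipe-delimited TXT line."""
    record = json.loads(raw)
    # Every new source format (XML, EDI, ...) would need another
    # hand-written parser like this one.
    return f"{record['id']}|{record['name']}|{record['amount']}"

raw = '{"id": 1, "name": "sensor-A", "amount": 3.5}'
print(json_to_txt_line(raw))  # 1|sensor-A|3.5
```

Multiply this by every format and every source system, and the appeal of a graphical tool that does the conversion for you becomes clear.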

What is Kettle?

  • What is ETL?

ETL (Extract-Transform-Load) is the process of extracting, transforming, and loading data. The idea is to take data from different sources, process it (reformatting, protocol conversion, and so on), and make the processed data available to other systems. This process is among the most core work in software development, and in back-end development in particular.
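The three ETL stages can be sketched in a few lines of Python; the sample data and transformation rules below are invented purely for illustration:

```python
def extract():
    # Extract: in practice this would read from a database, file, or API.
    return [{"city": "beijing", "temp": 21}, {"city": "shanghai", "temp": 25}]

def transform(rows):
    # Transform: normalize the format (capitalize names, Celsius to Fahrenheit).
    return [{"city": r["city"].title(), "temp_f": r["temp"] * 9 / 5 + 32} for r in rows]

def load(rows):
    # Load: in practice this would write to the target system; here we
    # just render the output lines.
    return [f'{r["city"]}: {r["temp_f"]}' for r in rows]

for line in load(transform(extract())):
    print(line)
```

Kettle implements exactly this pipeline, but lets you wire the stages together graphically instead of coding them.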

ETL process

  • Kettle concept

Obviously, whatever liquid you pour into a kettle eventually flows smoothly out of the spout. By analogy, no matter what type of data Kettle takes in, it eventually flows out in a specific format. Kettle is in fact a tool written in Java that embodies the ETL idea, and since it is implemented in Java, it is naturally cross-platform. Kettle official website

A first look at Kettle

  • Kettle files

Since Kettle is a tool for processing heterogeneous data from different data sources, it needs to let users work with those data sources graphically and perform data governance through its UI. All of this graphical processing must ultimately be saved in files that Kettle can recognize.

Kettle generates two kinds of files: transformation files and job files. A transformation performs the basic data conversion, while a job controls the overall workflow.

These two kinds of files can call each other to accomplish the ultimate goal of data cleansing.

Kettle Development Tool

Kettle transformation

Kettle job

Kettle Application Scenarios (Demo)

  • Scenario 1: Obtain REST interface data and save it as a text file

Use a REST interface as the data source: the Kettle transformation fetches the IoT data, converts it, and saves the result to a TXT file. As you can see from the figure below, there is a node called Get Parameter, which receives the data from the REST interface, and a node called Response Result, which saves the returned telemetry data to a TXT file.
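Outside of Kettle, the same flow can be sketched in Python. The fetch below is stubbed with hard-coded sample records (the device names, values, and output filename are all invented) so the sketch runs without a real REST endpoint; a real version might call the actual IoT interface with `urllib.request`:

```python
import json
from pathlib import Path

def fetch_iot_records() -> list:
    # Stubbed REST response; in reality this would be an HTTP GET.
    payload = '[{"device": "d-01", "value": 18.2}, {"device": "d-02", "value": 19.7}]'
    return json.loads(payload)

def save_as_txt(records: list, path: Path) -> None:
    # One tab-separated line per record, mirroring the TXT output node.
    lines = [f'{r["device"]}\t{r["value"]}' for r in records]
    path.write_text("\n".join(lines))

out = Path("iot_data.txt")
save_as_txt(fetch_iot_records(), out)
print(out.read_text())
```

In Kettle, the Get Parameter and Response Result nodes play the roles of `fetch_iot_records` and `save_as_txt`, with no code to write.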

Process REST interface data

  • Scenario 2: Task Scheduling (Periodically Executing Scenario 1)

If you want the transformation to run periodically, you need to take a task-scheduling approach; here, the scheduling is applied to Scenario 1. As you can see from the figure below, there is a node called Transformation, which is configured with the transformation file from Scenario 1, and a node called START, which kicks off the task-scheduling process.
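Conceptually, the START node does what a simple scheduler does: trigger the transformation at fixed intervals. A minimal Python sketch of that idea, using the standard `sched` module (the interval and run count are arbitrary):

```python
import sched
import time

runs = []

def transformation():
    # Stand-in for Scenario 1's REST-to-TXT transformation.
    runs.append(time.monotonic())

# Schedule the transformation three times, 0.1 s apart, mimicking a
# START-node style periodic trigger.
scheduler = sched.scheduler(time.monotonic, time.sleep)
for i in range(3):
    scheduler.enter(i * 0.1, 1, transformation)
scheduler.run()

print(f"transformation ran {len(runs)} times")
```

In Kettle, the START node provides this trigger declaratively, including calendar-style repeat settings, without any scheduling code.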

Task scheduling process

The two scenarios above are only demo-level attempts; I hope they spark your interest. In later articles I will give a detailed introduction to the truly powerful use cases that come up in real work.

Conclusion

This article introduced the basic concept of Kettle, its basic components, and two simple application scenarios. I hope you now have a preliminary understanding of Kettle and will think of it as a powerful tool when you run into data processing problems. Drawing on my five years of experience, I sincerely hope you will use it as I do, to add a little fun to our coding lives. In the next article, I will explain how to set up a Kettle development environment, how to use the Kettle development tool, and how to complete a HelloWorld transformation and job. Stay tuned!