As a back-end engineer, most of your day-to-day work is actually data processing, commonly summed up as CRUD: create, read, update, delete. Doing such repetitive work for long stretches, combined with the 996 routine, can get boring. So where does a back-end engineer end up? Many feel lost about this. If you aim to become a so-called architect, you will find that the architecture design architects do is largely copying common architectural patterns from the Internet industry, and some are little more than PowerPoint architects. I believe many readers have had the same experience as me: the people who truly innovate on patterns are usually researchers in institutes. Reading more books may be necessary to become an architect, but in the end you will find you are still a back-end engineer. So can a back-end engineer only do CRUD? If a tool could take over that routine CRUD work, back-end engineers would finally be freed from it.
The ideal back-end engineer, then, works like this: when a pile of new requirements and new business arrives, the engineer uses a configuration tool to handle the data processing, task scheduling, and interface generation, finishes those requirements in a very short time, and leaves the remaining integration work to the front-end engineers. I have done a month's worth of work in a single day this way (this is my real work experience). Such a development tool is worth having for every engineer. As the saying goes, give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime. I am willing to share my years of experience with fellow engineers and walk them through building this development tool step by step. That tool is Kettle.
Knowledge system
The knowledge system of this book is explained in the following order: start with the basic Kettle components to lay a solid foundation, then build a data integration platform on top of them, and finally move on to the advanced part, writing Kettle plug-ins by hand to adapt to changing business needs.
- Kettle basics: This part explains the fundamental components. Input components are the source of data: they can read files in a variety of formats (TXT, Excel, CSV, XML, JSON, JS), query database tables, or take OLAP input, parsing data from different channels and shapes into a standard row format. Output components write Kettle data to different storage media, for example Excel files, text files, databases, or Elasticsearch. Transformation components are the data processing components, covering operations such as string cutting, value mapping, de-duplication, adding an increasing sequence, sorting, flattening, and row-to-column pivoting; most data processing work can be completed with them. Query components can be thought of as a dynamic way to fetch data from a data source, for example from a REST interface through the HTTP or REST steps, from a WebService interface, from a database, or even from a streaming query; together with the input components they can handle data obtained in any form from any source. Script components supplement the transformation components and support Java code, JavaScript, rule-engine scripts, SQL scripts, regular expressions, and more, making data processing even more powerful. In short, once you have learned the Kettle basics, you can solve most of the data processing problems that appear in business requirements. (A minimal sketch of running a Kettle transformation from Java appears after this list.)
- Kettle data integration platform: This part moves the basic Kettle functionality online and turns it into a data platform that can be used in production. It is divided into the service pool, job pool, resource pool, repository, and execution strategy modules. The service pool exposes Kettle transformations as service interfaces and acts as a service gateway, providing REST, WebSocket, and WebService endpoints; this covers most front-end/back-end integration needs. The job pool provides task scheduling and is also a shortcut for streaming data processing, for example acting as the producer and consumer clients of a message queue. The resource pool stores Kettle files developed locally, that is, files built in PDI and uploaded to the platform. The repository stores transformations directly in a database through PDI's repository feature. The execution strategy module is an auxiliary to the job pool and provides Quartz configuration for task execution. (A Quartz scheduling sketch appears after this list.)
- Kettle advanced: This is the advanced part. Once you are familiar with the Kettle basics and the Kettle-based data integration platform, you may find that some functions still cannot be implemented; for example, Kettle does not provide a WebSocket interface for the platform out of the box, so you need to write plug-ins to extend both Kettle and the platform. The advanced part focuses on how to write custom plug-ins and how to combine MQTT with WebSocket. (A skeleton of a custom step plug-in appears after this list.)
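To make the component model concrete, here is a minimal Java sketch of loading and running a transformation (.ktr) that was designed in Spoon. It assumes the PDI core and engine libraries are on the classpath; demo.ktr is a placeholder path standing in for a transformation with your own input, transformation, and output steps.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (loads the step and plugin registries).
        KettleEnvironment.init();

        // Load the transformation definition created in Spoon.
        // "demo.ktr" is a placeholder; it would contain your own
        // input -> transform -> output steps.
        TransMeta transMeta = new TransMeta("demo.ktr");

        // Create and execute the transformation.
        Trans trans = new Trans(transMeta);
        trans.execute(null);        // null = no extra command-line arguments
        trans.waitUntilFinished();  // block until all steps complete

        if (trans.getErrors() > 0) {
            throw new RuntimeException("Transformation finished with errors");
        }
    }
}
```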
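The execution strategy module mentioned above is built on Quartz. As an illustration only, here is a minimal sketch of how a job-pool entry might be wired to a Quartz cron trigger; the KettleJob class, the job names, and the cron expression are assumptions made for the example, not the platform's actual classes.

```java
import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

public class ExecutionStrategyDemo {

    // A hypothetical Quartz job that would run one Kettle transformation or job.
    public static class KettleJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            String ktrPath = context.getJobDetail().getJobDataMap().getString("ktrPath");
            // In the real platform this is where the transformation would run,
            // e.g. via the PDI API shown in the previous sketch.
            System.out.println("Running transformation: " + ktrPath);
        }
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        JobDetail job = JobBuilder.newJob(KettleJob.class)
                .withIdentity("sync-orders", "job-pool")
                .usingJobData("ktrPath", "demo.ktr")   // placeholder path
                .build();

        // Execution strategy: run every five minutes (example cron expression).
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("every-5-min", "job-pool")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0/5 * * * ?"))
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```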
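To give a feel for what the advanced part covers, below is a heavily simplified sketch of the heart of a custom step plug-in: the processRow method of a class extending BaseStep. The class name is illustrative, and a real plug-in also needs the accompanying StepMeta, StepData, and dialog classes, which the advanced chapters walk through.

```java
import org.pentaho.di.core.exception.KettleException;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;
import org.pentaho.di.trans.step.BaseStep;
import org.pentaho.di.trans.step.StepDataInterface;
import org.pentaho.di.trans.step.StepInterface;
import org.pentaho.di.trans.step.StepMeta;
import org.pentaho.di.trans.step.StepMetaInterface;

// Illustrative step class; the matching meta, data, and dialog classes are omitted here.
public class DemoStep extends BaseStep implements StepInterface {

    public DemoStep(StepMeta stepMeta, StepDataInterface stepDataInterface,
                    int copyNr, TransMeta transMeta, Trans trans) {
        super(stepMeta, stepDataInterface, copyNr, transMeta, trans);
    }

    @Override
    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi)
            throws KettleException {
        Object[] row = getRow();   // read one row from the previous step
        if (row == null) {         // no more rows: signal the end of the stream
            setOutputDone();
            return false;
        }

        // ... custom processing of the row would go here ...

        putRow(getInputRowMeta(), row);  // pass the (possibly modified) row on
        return true;                     // ask the engine to call processRow again
    }
}
```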
What readers will gain
After working through the whole book, readers should have an in-depth understanding of Kettle. In future work, no matter what business requirements you face, using the Kettle tool and the Kettle-based data integration platform properly will let you achieve twice the result with half the effort. Of course, my ability is limited; if there are any mistakes, I will correct them promptly, and I hope to make progress together with my readers.