From Layered Architecture to Microservices Architecture is a series of articles introducing the eight architectural patterns covered in Fundamentals of Software Architecture. Rather than going into every detail, we pick out the key points; read the original book for the full treatment.

Past highlights:

  • From Layered Architecture to Microservices Architecture (1)
  • From Layered Architecture to Microservices Architecture (2): Layered Architecture

Preface

Pipeline Architecture, also known as Pipes and Filters Architecture, is one of the most commonly used architectural patterns. Most software engineers first encounter it in the Unix terminal, whose shell language has native support for pipes and filters.

For example, suppose you need to implement a feature that reads the contents of a text file, finds the five most frequently used words, and prints those words together with their frequencies, ordered by frequency of use.

Using the shell, you can do this:

    cat content.txt |        # step 1: read the file content
    tr -cs 'A-Za-z' '\n' |   # step 2: output one word per line
    tr 'A-Z' 'a-z' |         # step 3: convert all words to lowercase
    sort |                   # step 4: sort the words
    uniq -c |                # step 5: count the frequency of each word
    sort -rn |               # step 6: sort words by frequency, descending
    head -n 5                # step 7: take the top five words

    # Example output:
    #   4 to
    #   4 and
    #   3 the
    #   3 networks
    #   3 linux

This shell code is a simple implementation of the pipeline architecture: the | operator acts as the pipe, and each step is equivalent to a filter. Each filter takes the output of the previous filter as its input, processes the data, and writes the result back to the pipe.

Besides the shell language, MapReduce is also built on a pipeline architecture: the Map and Reduce tasks can be regarded as filters, except that they communicate through HDFS rather than through pipes.
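For intuition, here is a toy, in-memory stand-in for the two MapReduce stages in Python. A real MapReduce job runs distributed and exchanges data through HDFS, so everything below is an illustrative assumption, not actual MapReduce code:

    # Toy sketch: Map and Reduce as two filters in a word-count pipeline.
    # An in-memory generator stands in for the HDFS channel between stages.
    from collections import defaultdict

    def map_stage(lines):                  # filter 1: emit (word, 1) pairs
        for line in lines:
            for word in line.split():
                yield (word.lower(), 1)

    def reduce_stage(pairs):               # filter 2: sum the counts per word
        counts = defaultdict(int)
        for word, n in pairs:
            counts[word] += n
        return dict(counts)

    print(reduce_stage(map_stage(["Linux networks", "linux tools"])))
    # {'linux': 2, 'networks': 1, 'tools': 1}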

The shell language and the MapReduce programming model can be regarded as low-level implementations of the pipeline architecture, but the pattern applies equally well to higher-level application systems. Let's look at the architectural view of the pipeline architecture pattern.

Architectural view

The pipeline architecture consists of two kinds of components, pipes and filters:

A pipe is the data transmission channel between filters, and is usually one-way and point-to-point. Such a design is not only simple to implement but also performs well. In addition, there is no uniform format for the data transferred over pipes; each system can choose an appropriate data structure based on its own characteristics.

A filter is a data-processing component and is typically stateless. Each filter should do only one job, satisfying the single responsibility principle; complex workflows should be composed of multiple filters. Filters are generally divided into the following types (see the sketch after this list):

  • Producer: sometimes called a Source, it is the starting point of a pipeline; it receives data from a data source and outputs it to a pipe.
  • Transformer: receives input data from a pipe, transforms some or all of it, and outputs the result to a pipe. In functional programming, this step is often referred to as map.
  • Tester: receives data from a pipe, applies a conditional test to it, and decides based on the result whether to pass the data downstream. Note that a Tester does not make any changes to the data.
  • Consumer: the end point of a pipeline; it typically persists the data read from the pipe to a database or renders it on a user interface.
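To make these four roles concrete, here is a minimal pipes-and-filters sketch in Python. It is not from the book; the generator-based pipes and the word-oriented filters are illustrative assumptions:

    # Minimal pipes-and-filters sketch: Python generators play the role of
    # one-way, point-to-point pipes between stateless filters.

    def producer(lines):
        """Producer (Source): feeds raw data from a data source into the pipe."""
        for line in lines:
            yield line

    def transformer(pipe):
        """Transformer: splits lines into lowercase words, like a 'map' step."""
        for line in pipe:
            for word in line.split():
                yield word.lower()

    def tester(pipe):
        """Tester: passes a word through unchanged only if it is alphabetic."""
        for word in pipe:
            if word.isalpha():           # drops tokens with digits/punctuation
                yield word

    def consumer(pipe):
        """Consumer: ends the pipeline, e.g. by printing or persisting data."""
        for word in pipe:
            print(word)

    # Compose the pipeline: Producer -> Transformer -> Tester -> Consumer
    consumer(tester(transformer(producer(["Hello world", "Linux networks 101"]))))

Note how each filter knows nothing about its neighbours; the composition line at the bottom is the only place where the pipeline topology is fixed.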

A system can have multiple producers and consumers. For example, we can receive input data through Kafka and a REST interface at the same time, and after processing, store the result in MySQL while sending a copy to a data warehouse for analysis. In short, the pipeline architecture pattern is highly flexible.

Application example

The pipeline architecture pattern appears in a wide range of applications. Let's use an ETL system as an example to see how the pattern works.

ETL (Extract, Transform, Load) is the process of extracting data from business systems, cleaning and transforming it, and loading it into a data warehouse. Its goal is to integrate an enterprise's scattered and non-standardized data and provide a basis for decision-making analysis.

As the business application systems run, they output various data to Kafka; the ETL system consumes the relevant data and, after processing, stores the results in a database. The filters of this ETL system work as follows:

  • Service Info Capture: subscribes to a Kafka topic, consumes the data generated by the business systems, and pipes it to the downstream filters.
  • Duration Filter: determines whether the data is relevant to calculating the service request processing duration metric; if so, it passes the data to the Duration Calculator, otherwise to the Uptime Filter.
  • Duration Calculator: calculates the processing duration of the service request and passes the result to the Database Output.
  • Uptime Filter: determines whether the data is relevant to calculating the system uptime metric; if so, it passes the data to the Uptime Calculator; otherwise the data is considered irrelevant to the ETL system and its processing ends here.
  • Uptime Calculator: calculates the uptime of the system and passes the result to the Database Output.
  • Database Output: persists the data to MongoDB.

This ETL system is composed of 1 Producer filter, 2 Tester filters, 2 Transformer filters, and 1 Consumer filter, and its main data-processing logic is to calculate telemetry metrics for the system. Architecturally, the system is highly extensible. For example, to add a new metric calculation, we can add a new Tester and Transformer after the Uptime Filter, leaving the existing metric calculations untouched. Likewise, if the system later plans to replace MongoDB with HBase, we can develop a new HBase Output to replace the original Database Output without changing the rest of the system.
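The following sketch shows one way this composition might look in code. The filter names follow the description above, but the message format, the metric logic, and the in-memory wiring are assumptions made purely for illustration:

    # Illustrative wiring of the ETL filters described above (not the book's code).
    def service_info_capture(messages):
        """Producer: stands in for a Kafka topic subscription."""
        yield from messages

    def is_duration_related(msg):            # Duration Filter (Tester)
        return msg.get("type") == "request"

    def calc_duration(msg):                  # Duration Calculator (Transformer)
        return {"metric": "duration", "value": msg["end"] - msg["start"]}

    def is_uptime_related(msg):              # Uptime Filter (Tester)
        return msg.get("type") == "heartbeat"

    def calc_uptime(msg):                    # Uptime Calculator (Transformer)
        return {"metric": "uptime", "value": msg["uptime"]}

    def database_output(record):             # Consumer: would write to MongoDB
        print("persist:", record)

    def run_etl(messages):
        for msg in service_info_capture(messages):
            if is_duration_related(msg):
                database_output(calc_duration(msg))
            elif is_uptime_related(msg):
                database_output(calc_uptime(msg))
            # otherwise: irrelevant to the ETL system, drop the message

    run_etl([
        {"type": "request", "start": 100, "end": 142},
        {"type": "heartbeat", "uptime": 86400},
        {"type": "debug"},                   # rejected by both Testers
    ])

In this shape, swapping MongoDB for HBase means replacing only database_output, and a new metric means one more Tester/Transformer pair; all existing filters stay unchanged.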

Architecture score

The pipeline architecture pattern is usually implemented as a monolith and, like the layered architecture, scores low on Elasticity, Fault tolerance, and Scalability as a result. Simplicity is one of its key advantages: pipes and filters are simple to implement, so a pipeline-style system can be built quickly, which also earns it a high score on Overall cost.

In addition, compared with the layered architecture pattern, the pipeline architecture pattern scores higher on Modularity, Evolutionary, and Testability, thanks to the loose coupling between filters: we can easily extend the system with new filters and test each filter on its own.
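The Testability point is easy to demonstrate: a single filter can be unit-tested with no pipes, Kafka, or database involved. A hypothetical test for the Duration Calculator sketched earlier:

    # Hypothetical unit test for one filter in isolation.
    def calc_duration(msg):   # the Transformer under test (from the ETL sketch)
        return {"metric": "duration", "value": msg["end"] - msg["start"]}

    def test_calc_duration():
        msg = {"type": "request", "start": 100, "end": 142}
        assert calc_duration(msg) == {"metric": "duration", "value": 42}

    test_calc_duration()      # passes: no other filter or pipe is needed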

Conclusion

This article introduced the pipeline architecture pattern, which consists of pipes and filters. According to their data-processing roles, filters are divided into Producers, Transformers, Testers, and Consumers, making this a typical technically partitioned architecture style. The pipeline architecture is widely used thanks to its simplicity and extensibility, from low-level implementations such as the shell language to higher-level applications such as ETL systems.

While this pattern is usually implemented as a monolith, there are also distributed implementations such as the MapReduce programming model. In short, if you need to build a data-processing system, the pipeline architecture pattern deserves serious consideration.

Each architectural pattern has its own suitable application scenarios, and only by becoming familiar with the commonly used patterns can we design better software systems. In the next article we will continue with the Microkernel Architecture.