Probably the best article on distributed systems

If you were asked to define a “distributed system,” what is the first thing that would come to mind? A poem by Su Dongpo captures our understanding of distributed systems vividly:

Seen from the front it is a ridge, from the side a peak; from near or far, from high or low, each view is different.

I suspect what pops into most people’s heads is something very concrete.

If you immediately think of some “XX hub” or “XX service,” you are mistakenly equating service-oriented patterns (SOA, ESB, microservices) with distributed systems.

So what is “servitization”? It is like an enterprise putting all staff who hold the same position into one department and managing them together. Work of one kind converges on a single entry point and is then redistributed, which raises staff utilization and lets the results of their labor be reused. The essence of servitization is “divide and conquer,” and divide and conquer requires that you first split the whole apart before you can talk about governing the parts. High cohesion and low coupling matter so much during the split because they minimize the complexity of collaboration between the resulting components. So what matters is how you split, and how you do it incrementally, not which servitization pattern (SOA, ESB, microservices, etc.) you adopt.

Why is “how to split” the most important question? Let me give you an example. An enterprise’s organizational structure can take one of three forms: functional, project-based, or matrix. Think of the enterprise as a “distributed system” and of these three forms as three shapes that system can take. As the owner of this “system,” you need to decide how to break it up so that its functional components work well together. Suppose you split a company of 10,000 employees into 20 functional departments, 500 people each.

Now suppose the work flows like a pipeline, with clear upstream and downstream relationships: when one department finishes, it hands off to the next.

That is high cohesion and low coupling, because each piece of work is connected to only one other piece, and only once.

However, if the work frequently requires people from different functions to work on it at the same time, a single department may have to connect with many other departments.

That is low cohesion and high coupling, because each piece of work has to interact with many others, and more than once.
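To put rough numbers on the two cases, the sketch below counts the coordination links between the 20 departments. It makes two simplifying assumptions for illustration. In the pipeline case, only adjacent departments hand off to each other. In the worst cross-functional case, every pair of departments must coordinate. Real organizations sit somewhere in between, and the department names are hypothetical.

```python
from itertools import combinations

def pipeline_links(departments):
    """Pipeline: each department hands off only to the next one."""
    return list(zip(departments, departments[1:]))

def mesh_links(departments):
    """Worst case for cross-functional work: every pair of departments coordinates."""
    return list(combinations(departments, 2))

EMPLOYEES = 10_000
departments = [f"dept-{i}" for i in range(1, 21)]  # 20 hypothetical functional departments

print(EMPLOYEES // len(departments))      # 500 people per department
print(len(pipeline_links(departments)))   # 19 links
print(len(mesh_links(departments)))       # 190 links
```

Going from 19 to 190 potential links, with each link possibly exercised many times, is what “high coupling” costs in coordination effort.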

Servitization, then, is an application of “divide and conquer,” which is also the core idea of distributed systems. In that essential sense servitization is indeed a form of distributed system, but distributed systems are not limited to these service-oriented patterns.

I’m sure any software system you build at work has to be split up in many places, unless it is so trivial that it only computes 1+1. For example, when we click “Submit order” on an e-commerce platform, the system creates the order, deducts reward points, deducts inventory, and so on. Early in an e-commerce system’s life, all of these functions may live in a single system, so could all of these operations be written in one method body? Most people don’t care how you write your code as long as it works. But what if you then need to add a red-envelope (cash gift) feature? You have probably felt the pain of adding functionality to a method hundreds or thousands of lines long.

The solution is to split, sort, and categorize the code, gathering closely related parts into separate logical units such as functions, classes, or namespaces. From this point of view, “divide and conquer” is already part of our everyday work, whether or not we pay attention to it. It is not something we only need to think about when adopting servitization.
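As a minimal sketch of that splitting, the order-submission steps can each live in their own small function, with one short function orchestrating them. The `Order` and `Store` types and all function names are hypothetical. In-memory dictionaries stand in for real tables. Failure handling across steps is deliberately left out.

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    order_id: int
    user_id: int
    sku: str
    quantity: int
    points_used: int

@dataclass
class Store:
    """Hypothetical in-memory state standing in for real database tables."""
    orders: dict = field(default_factory=dict)
    points: dict = field(default_factory=dict)
    inventory: dict = field(default_factory=dict)

def create_order(store: Store, order: Order) -> None:
    store.orders[order.order_id] = order

def deduct_points(store: Store, user_id: int, amount: int) -> None:
    if store.points.get(user_id, 0) < amount:
        raise ValueError("not enough points")
    store.points[user_id] -= amount

def deduct_inventory(store: Store, sku: str, quantity: int) -> None:
    if store.inventory.get(sku, 0) < quantity:
        raise ValueError("not enough stock")
    store.inventory[sku] -= quantity

def submit_order(store: Store, order: Order) -> None:
    """Orchestrates the steps; each step is a small, separately testable unit.
    If a later step raises, earlier steps are NOT undone in this sketch."""
    deduct_inventory(store, order.sku, order.quantity)
    deduct_points(store, order.user_id, order.points_used)
    create_order(store, order)

store = Store(points={1: 100}, inventory={"sku-1": 5})
submit_order(store, Order(order_id=1, user_id=1, sku="sku-1", quantity=2, points_used=30))
print(store.inventory["sku-1"], store.points[1])  # 3 70
```

With this shape, adding the red-envelope feature means writing one more function (say, a hypothetical `apply_red_envelope`) and adding one call in `submit_order`, rather than editing a method a thousand lines long.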

To do this well, we need to master the ability to split things well. Suppose we simply copy what we see others doing. By the 80/20 rule, imitation (“drawing a ladle by copying a gourd”) may get us 80% of the way. But the remaining 20% often turns into the big trouble that consumes 80% of our effort. Understanding the core principles helps you reach a high-cohesion, low-coupling design more quickly.

Is a “distributed system” just a collection of middleware?

Or perhaps, when you hear “distributed system,” you think of some XYZ MQ framework, XYZ RPC framework, or XYZ DAL framework, and mistakenly equate middleware with distributed systems.

It is important to understand that middleware exists to standardize. Middleware is just a medium, a tool that embodies these standardization ideas. By acting as both a guide and a constraint, it can greatly reduce system complexity and the cost of collaboration. Let’s look at each in turn:

The MQ framework standardizes non-real-time asynchronous communication between different applications.

The RPC framework standardizes the way different applications communicate in real time.

The DAL (Data Access Layer) framework standardizes the way applications and databases communicate.
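To make the MQ/RPC distinction concrete, here is a small in-process sketch of the two interaction styles. The first is a blocking request/response call, the shape RPC standardizes. The second puts a message on a queue that a separate consumer drains later, the shape MQ standardizes. Everything runs in one Python process using the standard `queue` and `threading` modules. The “remote” call is simulated by a direct function call, and there is no network, serialization, or broker. `inventory_service` is a hypothetical stand-in.

```python
import queue
import threading

def inventory_service(sku, qty):
    """Hypothetical stand-in for a service that would normally run in another process."""
    return f"reserved {qty} x {sku}"

def rpc_call(func, *args):
    """RPC style (simulated): the caller waits until the reply comes back."""
    return func(*args)

messages = queue.Queue()  # MQ style: producer and consumer share only the queue
handled = []

def consumer():
    while True:
        msg = messages.get()
        if msg is None:  # sentinel value: stop consuming
            break
        handled.append(inventory_service(*msg))

worker = threading.Thread(target=consumer)
worker.start()

reply = rpc_call(inventory_service, "sku-1", 2)  # synchronous: result available right away
messages.put(("sku-2", 1))                       # asynchronous: enqueue and move on
messages.put(None)                               # tell the consumer to stop
worker.join()

print(reply)    # reserved 2 x sku-1
print(handled)  # ['reserved 1 x sku-2']
```

The producer never learns when, or even whether, the queued message was processed unless it arranges for that separately. That trade-off is exactly what an MQ framework standardizes, together with delivery guarantees that this toy queue does not provide.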

So although distributed systems use middleware, a distributed system is not defined by the middleware it uses. You need to know what each type of middleware standardizes, why, what side effects it has, and so on. Only then can you really tell different technical frameworks apart and pick the one that actually fits your current system.

So are these standards just made up on a whim? Definitely not. As I said before, every standardization has a purpose and must create value. For example, most middleware delivers the following value:

As a software system evolves, it avoids spending too much effort on several slightly different ways of implementing the same sub-function.

In practice, this value shows up most often in technical middleware. For example, a database access framework standardizes operations across different databases, so upper-layer applications don’t have to care how to talk to MySQL or SQL Server. The technical side is much more “stable” than the business side, so standardizing it is more valuable and pays off over a longer time. But “stable” is relative: even at the pure business level there are relatively stable parts.
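A minimal sketch of what a data access layer standardizes: business code depends on one small interface, and each database gets its own implementation. The `UserStore` interface and its method names are hypothetical. SQLite, via Python’s built-in `sqlite3`, stands in for MySQL or SQL Server, which would need their own driver-specific implementations of the same two methods.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional

class UserStore(ABC):
    """Hypothetical data-access interface: callers depend on this, not on a database."""

    @abstractmethod
    def save(self, user_id: int, name: str) -> None: ...

    @abstractmethod
    def find(self, user_id: int) -> Optional[str]: ...

class InMemoryUserStore(UserStore):
    def __init__(self) -> None:
        self._rows = {}

    def save(self, user_id, name):
        self._rows[user_id] = name

    def find(self, user_id):
        return self._rows.get(user_id)

class SqliteUserStore(UserStore):
    """SQLite-backed implementation exposing exactly the same methods."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")

    def save(self, user_id, name):
        with self._conn:  # commits on success, rolls back if an exception is raised
            self._conn.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)", (user_id, name)
            )

    def find(self, user_id):
        row = self._conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

def rename_user(store: UserStore, user_id: int, new_name: str) -> Optional[str]:
    """Upper-layer business code: identical no matter which store it receives."""
    store.save(user_id, new_name)
    return store.find(user_id)

for store in (InMemoryUserStore(), SqliteUserStore()):
    print(rename_user(store, 1, "alice"))  # prints "alice" once per store
```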

For example, picture the everyday act of serving rice (“sheng fan”) and ask which parts are relatively stable and which usually vary. Consider the following example:

Base class: Person; subclasses: Man, Woman
Base class: Bowl; subclasses: LargeBowl, SmallBowl, SoupBowl
Base class: Spoon; subclasses: IronSpoon, CeramicSpoon, PlasticSpoon

function ServeRice(person, bowl, spoon) {
    person picks up the bowl
    person picks up the spoon
    person scoops up rice from the top with the spoon
    person holds the spoon over the bowl and tips the rice in
}…

This example shows that the unstable parts have already become variables (the parameters). The rest of the method body therefore plays the same role as the middleware discussed earlier: it standardizes the rice-serving process. What we need to master is identifying the relatively stable parts, extracting them, and standardizing around them. This ability is not exercised, or needed, only in distributed systems.
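Translated into Python, the rice-serving pseudocode might look like the sketch below. The class names and step wording are illustrative choices, not taken from the original. The point is that `serve_rice` depends only on the base classes. The variable parts (who, which bowl, which spoon) arrive as arguments, while the fixed sequence of steps is standardized in one place.

```python
class Person:
    kind = "person"

class Man(Person):
    kind = "man"

class Woman(Person):
    kind = "woman"

class Bowl:
    kind = "bowl"

class LargeBowl(Bowl):
    kind = "large bowl"

class SmallBowl(Bowl):
    kind = "small bowl"

class SoupBowl(Bowl):
    kind = "soup bowl"

class Spoon:
    kind = "spoon"

class IronSpoon(Spoon):
    kind = "iron spoon"

class CeramicSpoon(Spoon):
    kind = "ceramic spoon"

class PlasticSpoon(Spoon):
    kind = "plastic spoon"

def serve_rice(person: Person, bowl: Bowl, spoon: Spoon) -> list:
    """The stable part: the same four steps whatever person, bowl, or spoon is passed in."""
    return [
        f"{person.kind} picks up the {bowl.kind}",
        f"{person.kind} picks up the {spoon.kind}",
        f"{person.kind} scoops up rice from the top with the {spoon.kind}",
        f"{person.kind} tips the rice from the {spoon.kind} into the {bowl.kind}",
    ]

for step in serve_rice(Woman(), SoupBowl(), CeramicSpoon()):
    print(step)
```

Adding a new kind of bowl or spoon means adding a subclass. `serve_rice` itself does not change.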

I list these phenomena to make one point. When we try to understand a distributed system, what is inside matters more than its outward appearance, and a solid theoretical foundation matters most. Opportunities to practice are everywhere.

The mirage of “distributed systems”

Since the arrival of the mobile era, we have seen more and more impressive system architecture diagrams. You are bombarded with a dizzying array of technical frameworks, mainstream and otherwise. Awestruck, you say to yourself: “Yes, this is where I want to be. I want to be part of, or even build, an awesome distributed system like this. I don’t want to spend every day just writing CRUD code.”

What we cannot have always looks better, and we tend to overestimate it. The system behind a lofty architecture diagram may indeed look like a mature distributed system, but let’s be clear: Rome wasn’t built in a day.

Moreover, “distributed” only describes the system’s form: it is spread out. There is no essential difference between “splitting one into two” and “splitting one into N.” Many projects, small or large, start with the basic “single program + single database” setup, and that setup can also be understood as a distributed system. Many of the problems it runs into also exist in mature distributed systems.

Consider whether either of the following has ever happened in a “single program + single database” project:

The log says the operation succeeded, but the data in the database did not change.

The in-process cache data was updated, but the database update failed.

Let’s pause here for 30 seconds and think about why these problems happen.

Here we need to think about what “software” is. In essence, software is a body of code, and code is just text. Apart from the information the text expresses, it cannot “move” by itself. To make it move and do what we specify, it needs a host to give it life. That host is the computer, which turns code into a series of executable “actions” that are then triggered by the “fuel” of data. This continuous activity is what we call a running “process.”

So the system we develop is software, and so is the database. The former is responsible for computation. The latter is responsible for storing the results of that computation (also known as “state”). The two divide the labor and cooperate.

This makes it clear why “single program + single database” is also a distributed system: the program we write runs in a different process from the database. Getting the two processes (systems) to each do their part and together accomplish one complete task is therefore no longer as simple as one individual doing it alone. It is like a three-legged race: how do you make the pair look like one from the outside and move forward as naturally as possible?

We can therefore say that any system in which multiple processes work together to provide a complete function is a “distributed system.”

Return to the two examples above. We can reason about what causes those problems in a single-program + single-database project, and how to solve them, in the same way we would in a mature distributed system. Both problems are about data consistency. Of course, this is only the tip of the iceberg of the core concepts of distributed systems.
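The sketch below reproduces both symptoms in miniature. `FlakyDatabase` is a hypothetical stand-in for a database running in another process. Its `fail_next_write` flag simulates a write that fails, for example from a network error or timeout. The naive version updates the cache and logs success before the write, so a failed write leaves the cache and the log disagreeing with the database. Writing the database first is one common mitigation, not a complete solution. If the process dies between the database write and the cache update, the cache is simply stale. Full solutions are the data-consistency topic itself.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orders")

class FlakyDatabase:
    """Hypothetical stand-in for a database in another process.
    Setting fail_next_write makes the next write raise, simulating a failed call."""

    def __init__(self) -> None:
        self.rows = {}
        self.fail_next_write = False

    def write(self, key: str, value: int) -> None:
        if self.fail_next_write:
            self.fail_next_write = False
            raise ConnectionError("database write failed")
        self.rows[key] = value

def update_naive(cache: dict, db: FlakyDatabase, key: str, value: int) -> None:
    cache[key] = value                               # 1. in-process cache updated
    log.info("stock for %s set to %s", key, value)   # 2. log already claims success
    db.write(key, value)                             # 3. the write may still fail

def update_db_first(cache: dict, db: FlakyDatabase, key: str, value: int) -> None:
    db.write(key, value)                             # 1. persist the state first
    cache[key] = value                               # 2. only then touch the cache
    log.info("stock for %s set to %s", key, value)   # 3. log after the fact

cache, db = {}, FlakyDatabase()
db.fail_next_write = True
try:
    update_naive(cache, db, "sku-1", 5)
except ConnectionError:
    pass
print(cache.get("sku-1"), db.rows.get("sku-1"))    # 5 None: cache and log disagree with the DB

cache2, db2 = {}, FlakyDatabase()
db2.fail_next_write = True
try:
    update_db_first(cache2, db2, "sku-1", 5)
except ConnectionError:
    pass
print(cache2.get("sku-1"), db2.rows.get("sku-1"))  # None None: still consistent
```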

Wikipedia’s macro definition of a “distributed system” looks like this:

A distributed system is one in which components reside on different networked computers and communicate and coordinate by passing messages to each other. These components interact to achieve a common goal.

We can also explain it at the micro level. Take a problem that requires a great deal of computation and divide its data into small pieces. Have multiple computers compute the pieces separately, then combine the results to draw conclusions about the data. This is essentially “divide and conquer.” A “single program + single database” system also contains at least two processes. As the saying goes, “the sparrow may be small, but it has all five organs,” so this too is a “distributed system.”
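The split, compute, and combine steps can be sketched as follows, with a sum of squares as the hypothetical workload. A `ThreadPoolExecutor` stands in for the “multiple computers” so the example stays self-contained. Because of CPython’s global interpreter lock, this shows the structure of divide and conquer rather than a real speedup. Swapping in `ProcessPoolExecutor`, inside an `if __name__ == "__main__":` guard, would run the pieces in separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, n_chunks):
    """Divide: cut the data into roughly equal contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def work(chunk):
    """Compute one piece independently (here: a sum of squares)."""
    return sum(x * x for x in chunk)

def combine(partials):
    """Conquer: merge the partial results into the final answer."""
    return sum(partials)

def divide_and_conquer(data, workers=4):
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, chunks))  # each worker handles one chunk
    return combine(partials)

data = list(range(1, 1001))
print(divide_and_conquer(data))  # 333833500
```

The combine step only works this simply because addition is associative. Workloads whose partial results cannot be merged this easily need a more careful combine step.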

Conclusion

We now know that a “distributed system” should be judged by its inside rather than its outside. Any system in which multiple processes collaborate to provide a complete function is a “distributed system.”

Many other images may come to mind, but most of them are by-products of the “chemical reactions” that the nature of distributed systems sets off. If we stop at these appearances, we will never find the essence of distributed systems. We will never grasp the real “Tao,” and we will never truly master the many different forms distributed systems take.

So I hope that when you study distributed systems, you won’t lose the “Tao” (the underlying principles) by chasing the “Shu” (the techniques). Shu without Tao is an empty shell that eventually leads you astray. The more you learn, the more confused you become, with contradictions and doubts everywhere.

So besides teaching best practices for specific scenarios, this series will also show you why different options exist and how to weigh them. We won’t dwell on specific technical frameworks. Most of the content revolves around theory, so that you can master the basic theories and ideas of distributed systems and build your own internal skills.

In future articles, I will use a project’s path from its early stage to maturity as a roadmap and lead you step by step into distributed systems. We will peel away the layers to reach their essence and reason around it: what the problem is, what ways exist to solve it, when to use which, and so on. That way you will know not only the what but also the why, and form a complete body of knowledge, the core “skeleton.” After that, you can gradually fill in the “flesh and blood” as you study on your own. In the end, people will differ in carrying a little more or a little less flesh. But as long as you can do the job well, what does it matter whether you are a bit fat or a bit thin? For more in-depth articles, follow the WeChat public account Xinghe (BDG-Store).
