One, foreword

As we all know, whether at large companies like BAT, at all kinds of small companies, or even at traditional-industry enterprises that have only just moved onto the Internet, distributed architecture is coming into use. So what is distributed architecture? What are its benefits? How did it evolve? Which company opened the era of distributed architecture? By the end of this article you will have the answers. Let’s begin this magical journey through the distributed overview!

Two, the history of distributed architecture

On February 14, 1946, a romantic Valentine’s Day, the world’s first electronic digital computer, ENIAC, was born at the University of Pennsylvania in the United States. The machine covered 170 square meters, weighed 30 tons, and could perform 5,000 addition operations per second.

The birth of the first electronic computer marked the arrival of a rapidly changing IT era. The performance of individual computers has improved enormously: from the earliest 8-bit CPUs to today’s 64-bit CPUs; from megabytes of memory to gigabytes; from slow mechanical hard drives to solid-state SSD storage.

After ENIAC, electronic computers entered the mainframe era dominated by IBM. On April 7, 1964, the first IBM mainframe, the SYSTEM/360, was launched after three years of development at a cost of $5 billion, under the leadership of Gene Amdahl, the father of the IBM mainframe and considered one of the greatest computer designers of all time. This enabled IBM to dominate the mainframe industry in the 1950s and 1960s and laid the foundation for IBM’s computer empire. IBM mainframes once underpinned the US space missions to the moon, and they have long served key areas of core industries such as finance. Even amid the rapid development of x86 and cloud computing, IBM mainframes still firmly occupy a share of the high-end market thanks to their superior computing power and high reliability.

In the 1980s, under mainframe hegemony, computer architecture evolved in two directions simultaneously:

  • Personal-oriented, inexpensive PCs based on CISC (Complex Instruction Set Computer) CPUs.

  • Enterprise-oriented, small but expensive UNIX servers built on RISC (Reduced Instruction Set Computer) CPUs.


Three, milestones in the development of distributed architecture

With their enormous computing and I/O processing capacity, security, and stability, mainframes led the development of the computer industry and commercial computing for a long period, and the centralized system architecture gradually became mainstream. However, as society developed, this architecture found it increasingly difficult to meet the needs of enterprises. For example:

  • Mainframes are highly complex, and training a skilled mainframe operator is expensive.

  • Mainframes themselves are costly; usually only wealthy institutions (government, telecom, finance) can afford them.

  • There is a single point of failure: once the mainframe fails, the entire system becomes unusable, and for mainframe users the loss caused by this unavailability is very large.

Thanks to advances in science and technology, PC performance improved continuously, so many enterprises abandoned the mainframe and built their system architecture on minicomputers and ordinary PCs instead.

Alibaba’s “de-IOE” campaign ushered in a new era

IOE refers to IBM minicomputers, Oracle databases, and EMC high-end storage. Alibaba launched its “de-IOE” strategy in 2009, and according to its technology director, Alibaba’s last IBM minicomputer went offline in Alipay on May 17, 2013.

Why de-IOE?

With the rapid growth of its business, Alibaba’s business volume and data volume exploded, and the traditional centralized Oracle database architecture hit a bottleneck in system scalability. Traditional commercial database software (Oracle, DB2) is mainly built on a centralized architecture: its defining characteristic is that all data is concentrated in a single database, which can only rely on large, high-end machines to provide processing power and scalability. Such a centralized database scales mainly by scaling up, that is, increasing the system’s processing capacity by adding CPUs, memory, and disks. This centralized architecture makes the database the bottleneck of the whole system, increasingly unable to meet the computing demands of massive data.

Four, the significance of distributed systems

The reason for the development of distributed system architecture is that standalone systems have the following disadvantages that need to be addressed:

1. The cost performance of upgrading a single machine’s processing capacity keeps dropping. A single machine’s processing capacity depends mainly on its CPU, memory, and disk, and improving performance by scaling up the hardware vertically becomes increasingly expensive: the performance gained per unit of cost keeps shrinking.

2. There is a ceiling on stand-alone processing capacity. CPU, memory, and disk each have their own performance limits; even with an unlimited hardware budget, the pace and limits of hardware development still cap what one machine can do.

3. The stability and availability of a single-machine system are hard to guarantee, and these two indicators are problems we urgently need to solve.

Five, common concepts in distributed architecture

1. Cluster

Suppose Xiao Zhang opens a small restaurant. At the beginning there is only one cook, who cuts, washes, preps, and stir-fries the vegetables himself. Later, because the food is delicious, the flow of customers grows and one cook is too busy, so Xiao Zhang hires two more. Now three cooks make the same dishes and do the same work of cutting, washing, prepping, and stir-frying; the relationship among the three cooks is a cluster. That is, when a customer’s order comes in, only one of the cooks serves that order.

2. Distributed

After another period of time, business becomes even more popular. To let the cooks concentrate on stir-frying and perfect their dishes, Xiao Zhang hires a prep cook responsible for cutting, prepping, and stocking ingredients. The relationship between a cook and a prep cook is then distributed: they do different work and cooperate to finish a dish. Later one prep cook cannot keep up either, so Xiao Zhang hires two more; the relationship among the three prep cooks is again a cluster.
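The restaurant analogy can be sketched in code. In this minimal sketch (all names invented for illustration), a cluster is several interchangeable workers doing the same whole job, while a distributed setup splits one job across different cooperating roles:

```java
// Illustrative sketch of the restaurant analogy above (names invented here):
// a cluster is several interchangeable workers doing the same whole job,
// while a distributed setup splits one job across different roles.
public class RestaurantSketch {

    interface Worker { String handle(String dish); }

    // A cook handles an order end to end.
    static class Cook implements Worker {
        public String handle(String dish) { return "cooked " + dish; }
    }

    // Cluster: identical cooks; exactly one of them serves a given order.
    static String serveFromCluster(String dish, int orderNo, Worker... cooks) {
        return cooks[orderNo % cooks.length].handle(dish);
    }

    // Distributed: prep and cooking are separate roles that cooperate
    // to complete a single order.
    static String serveDistributed(String dish) {
        String prepped = "prepped " + dish; // the prep cook's responsibility
        return "cooked " + prepped;         // the chef's responsibility
    }

    public static void main(String[] args) {
        System.out.println(serveFromCluster("tofu", 7, new Cook(), new Cook(), new Cook()));
        System.out.println(serveDistributed("tofu"));
    }
}
```

Either way the customer gets the same dish; the difference is whether the workers are copies of each other (cluster) or split the job by role (distributed).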

3. Node

A node is an individual program that can independently complete a set of logic according to a distributed protocol. In a concrete project, a node is a process on an operating system. So each of the prep cooks and cooks above is a node.

4. Replica mechanism

A replica (copy) refers to redundancy of data or services in a distributed system. A data replica means persisting the same data on different nodes; when the data on one node is lost, it can be recovered from a replica. Data replication is the only means by which a distributed system solves the problem of data loss. A service replica means multiple nodes providing the same service, typically achieving high service availability through a master-slave arrangement.
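The data-replica idea can be illustrated with a toy in-memory sketch (all names invented, not a real storage system): every write lands on all replica nodes, so a node that loses its data can be rebuilt from any surviving copy.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory sketch of data replicas (illustrative, not a real system):
// every write lands on all replica nodes, so a node that loses its data
// can be rebuilt from any surviving copy.
public class ReplicaSketch {

    // Each "node" is just a key/value store here.
    static class Node {
        final Map<String, String> data = new HashMap<>();
    }

    // Persist the same record on every replica node.
    static void replicatedPut(List<Node> replicas, String key, String value) {
        for (Node n : replicas) n.data.put(key, value);
    }

    // Simulate a node losing its data, then recover from a healthy replica.
    static void recover(Node failed, Node healthy) {
        failed.data.clear();              // the failure
        failed.data.putAll(healthy.data); // rebuild from the surviving copy
    }

    // Demo: write with two replicas, lose one, recover it from the other.
    static String demo() {
        Node a = new Node(), b = new Node();
        replicatedPut(List.of(a, b), "order:1", "paid");
        recover(a, b);
        return a.data.get("order:1");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "paid"
    }
}
```

Real systems must additionally keep replicas consistent while writes are in flight, which is exactly the consistency difficulty discussed later in this article.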

5. Middleware

Middleware provides services on top of the operating system but does not belong to the application itself. It sits between the application and the system layer, making it easier for developers to handle communication, input, and output, so that users need only care about their own application.

Six, changes to the von Neumann model in the distributed domain

In the classical von Neumann architecture, computer hardware consists of the arithmetic unit, the controller, memory, input devices, and output devices. However architectures change, computers remain within this model.

Input device changes

In a distributed system architecture, input devices come in two types: the first is the other interconnected nodes, whose messages serve as input to a given node; the other is the traditional human-computer interaction input device.

Output device changes

In a distributed system architecture, output likewise comes in two types. One is a node transmitting information to other nodes, in which case the node can be regarded as an output device. The other is the traditional human-computer interaction output device, such as the user’s terminal.

Controller changes

In a single machine, the controller refers to the CPU’s control unit. In a distributed system, the controller’s main role is to coordinate or control the actions and behavior between nodes; examples include hardware load balancers, soft load balancers such as LVS, and rule servers.

Arithmetic unit

In a distributed system, the arithmetic unit is composed of many nodes, using the computing power of multiple nodes to complete the overall computing task cooperatively.

Memory

In a distributed system, the storage of multiple nodes is organized together to form one logical memory; examples include databases and Redis (key-value storage).

Seven, the difficulties of distributed systems

There is no doubt that distributed systems are more complex to implement than centralized systems. They are harder to understand, design, build, and manage, which means the root cause of an application problem is harder to find.

Three states

In a centralized architecture, calling an interface returns only two results: success or failure. In a distributed architecture, however, there is a third state: “timeout.”
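The third state can be made concrete with a timed call. In this sketch (illustrative names, a local thread standing in for a remote service), the caller waits a bounded time; besides success and failure it can observe a timeout, in which case it cannot know whether the remote side actually executed:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the three outcomes of a call in a distributed system:
// SUCCESS, FAILURE, or TIMEOUT (where the real outcome is unknown).
public class ThreeStates {

    static String call(Callable<String> remote, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = pool.submit(remote);
            return "SUCCESS: " + f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT";   // third state: the caller cannot tell what happened
        } catch (Exception e) {
            return "FAILURE";   // the call definitely failed
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(call(() -> "ok", 1000));                              // SUCCESS: ok
        System.out.println(call(() -> { throw new RuntimeException(); }, 1000)); // FAILURE
        System.out.println(call(() -> { Thread.sleep(500); return "late"; }, 50)); // TIMEOUT
    }
}
```

Note that in the TIMEOUT branch the task may still complete after the caller has given up; that ambiguity is exactly what makes the distributed transaction problem below so hard.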

Distributed transaction

This is actually an age-old problem. A transaction is a series of operations with an atomicity guarantee. On a single machine we can easily control transactions with the local database connection and components, but under a distributed architecture an atomic business operation is likely to span services, which produces a distributed transaction. For example, suppose services A and B each perform operations within the same transaction, and A calls B. If A could know clearly whether B committed successfully, A could control its own commit or rollback accordingly. But as we have seen, a call in a distributed system has a third possible state, timeout: A cannot know whether B succeeded or failed. Should A commit its local transaction or roll it back?

This is a genuinely hard problem. To forcibly guarantee transactional consistency you can use distributed locks, but that increases system complexity and overhead, and the more services a transaction spans, the more resources it consumes and the lower the performance. So the best solution is to avoid distributed transactions altogether.

Another approach is a retry mechanism. Retrying a query interface is safe, but retrying an interface that changes the database is not: if the first call actually succeeded but returned no result, the caller’s second attempt looks like a retry from its side but is a duplicate call from the callee’s side. For example, if A transfers 100 to B (A minus 100, B plus 100), the duplicate would leave A having lost 100 and B having gained 200, which is not the desired result. Therefore write interfaces require idempotent design (multiple calls have the same effect as a single call). A common technique is to set a unique key and check whether it already exists at write time, so that duplicate writes are avoided.

A prerequisite of idempotent design, however, is that the service is highly available. Otherwise, no matter how many retries are made, the call may never return a clear result and the caller would wait forever. The number of retries can be limited, but at that point the system has already entered an abnormal state, and extreme cases even require manual compensation. In fact, according to the CAP theorem and the BASE theory, a highly available distributed system cannot achieve strong consistency; what is generally guaranteed is eventual consistency.
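The idempotent-write idea described above can be sketched with a unique request key. In this toy example (hypothetical names, in-memory state standing in for a database), retrying a transfer with the same request ID applies the change exactly once:

```java
import java.util.HashSet;
import java.util.Set;

// Toy sketch of an idempotent transfer: a unique request key is recorded on
// the first write, so retries with the same key are detected and skipped.
public class IdempotentTransfer {

    private final Set<String> processed = new HashSet<>(); // seen request IDs
    private long balanceA = 100, balanceB = 100;

    // Returns true if the transfer was applied, false if it was a duplicate.
    boolean transfer(String requestId, long amount) {
        if (!processed.add(requestId)) {
            return false;          // duplicate call: change nothing
        }
        balanceA -= amount;        // debit A
        balanceB += amount;        // credit B
        return true;
    }

    long balanceA() { return balanceA; }
    long balanceB() { return balanceB; }

    public static void main(String[] args) {
        IdempotentTransfer svc = new IdempotentTransfer();
        svc.transfer("req-1", 100); // first call succeeds
        svc.transfer("req-1", 100); // retry after a timeout: skipped
        System.out.println(svc.balanceA() + " / " + svc.balanceB()); // 0 / 200
    }
}
```

In a real system the set of processed request IDs would live in durable storage (for example a unique-key column in the database), not in memory, so that the dedup check survives restarts.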

Load balancing

To achieve high availability, each service is deployed on at least two machines. Internet companies generally use ordinary commodity machines with limited reliability, so the probability of an outage during long-term operation is high; two machines greatly reduce the chance of the service being unavailable, and large projects often deploy a service on dozens or even hundreds of machines, both to ensure high availability and to raise the service’s QPS. But this raises a question: which machine should a request be routed to?

There are many routing algorithms, such as DNS routing. If sessions are kept on the local machine, requests can be routed to a fixed machine based on a user ID or cookie information; of course, application servers today are usually designed to be stateless for the sake of easy scaling, with sessions saved to a dedicated session server, so session problems don’t normally arise. Should routing simply be random? That is one method, but as far as I know the real situation is more complicated: routing is random only within a certain range, and a large deployment is divided into many domains. For example, to support active-active operation across distant data centers, where cross-datacenter calls are too expensive, a request will prefer services in its own datacenter; exactly which machine it lands on depends on specific considerations.
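Two of the routing rules mentioned above can be sketched directly: round-robin distribution, and sticky routing that hashes a user ID to a fixed machine (illustrative code with invented server names, not any specific load balancer):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of two routing rules a load balancer might use (illustrative only):
// round-robin spreads requests evenly; hash routing pins a user to one server.
public class RoutingSketch {

    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoutingSketch(String... servers) { this.servers = List.of(servers); }

    // Round-robin: each request goes to the next server in turn.
    String roundRobin() {
        return servers.get(Math.floorMod(counter.getAndIncrement(), servers.size()));
    }

    // Sticky routing: the same user ID always lands on the same server,
    // which keeps session-style state on a fixed machine.
    String byUser(String userId) {
        return servers.get(Math.floorMod(userId.hashCode(), servers.size()));
    }

    public static void main(String[] args) {
        RoutingSketch lb = new RoutingSketch("s1", "s2", "s3");
        System.out.println(lb.roundRobin()); // s1
        System.out.println(lb.roundRobin()); // s2
        System.out.println(lb.byUser("user-42").equals(lb.byUser("user-42"))); // true
    }
}
```

One drawback of plain modulo hashing is that adding or removing a server remaps most users; production systems often use consistent hashing to limit that churn.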

Consistency

When data is distributed or replicated to different machines, it is difficult to ensure data consistency among hosts.

Failure independence

A distributed system is composed of multiple nodes. Although the probability of the entire system failing at once exists, in practice it is more likely that a single node fails while the other nodes are fine. This is why we need to think more comprehensively when implementing distributed systems.

Finally: you are welcome to leave comments and discuss, follow, and watch for continued updates!
