As large websites face more and more scenarios involving highly concurrent access and massive data processing, achieving high availability, scalability, extensibility, and security becomes increasingly important. To address these problems, the architecture of large websites keeps evolving, and making that architecture highly available requires distribution. This article briefly introduces the concept of a distributed system, its characteristics, common distributed solutions, and the difference between distributed systems and clusters.

Centralized system

Before looking at distributed systems, let's look at their counterpart, the centralized system.

A centralized system can be summed up in one phrase: one host with multiple terminals. The terminals have no data processing capability and are responsible only for data input and output; computing, storage, and everything else take place on the host. Most banking systems today are centralized systems of this kind, and such systems are also widely used in large enterprises, scientific research institutions, the military, and government. Centralized systems were popular mainly in the last century.

The biggest characteristic of a centralized system is its very simple deployment structure. The underlying layer generally consists of expensive mainframes purchased from IBM, HP, and other manufacturers, so there is no need to consider how to deploy services across multiple nodes or how nodes collaborate with one another. However, because everything is deployed on a single node, the system can become large, complex, and difficult to maintain; it has a single point of failure (the failure of that one node brings down the entire system or network); and it scales poorly.

Distributed system

The book Distributed Systems: Concepts and Design defines a distributed system as follows:

A distributed system is one in which hardware or software components located on networked computers communicate and coordinate their actions only by passing messages.

Simply put, a distributed system is a collection of independent computers that together provide a service, yet to the users of the system it appears to be a single computer. Distribution means that a larger number of ordinary computers (as opposed to expensive mainframes) can be grouped to provide the service. The more computers there are, the more CPU, memory, and storage resources are available, and the more concurrent requests the system can handle.

As the definition shows, the hosts in a distributed system communicate and coordinate mainly over the network, so there are almost no restrictions on where the computers are located: they may sit in different cabinets, in different machine rooms, or in different cities, and for large sites they may even be spread across different countries and regions. However they are distributed in space, a standard distributed system should have the following main characteristics:

Distribution

The computers in a distributed system can be located anywhere in space, and there is no master/slave distinction among them: no host controls the whole system, and no machine is controlled by another.

Transparency

System resources are shared by all computers. The users of each computer can use not only that computer's local resources but also the resources (including CPU, files, printers, and so on) of the other computers in the distributed system.

Unity

Several computers in the system can cooperate to complete a common task, and a single program can be spread across several computers to run in parallel.

Communication

Any two computers in the system can exchange information by communicating with each other.
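The definition quoted earlier says that components coordinate only by passing messages. As a minimal sketch of that idea, the example below uses a thread and in-process queues as stand-ins for two networked computers; the message kinds ("add", "get", "stop") and the function names are hypothetical and chosen only for illustration.

```python
import queue
import threading


def counter_node(inbox):
    """A node whose state (total) is private; others can only send it messages."""
    total = 0
    while True:
        kind, payload = inbox.get()
        if kind == "add":
            total += payload
        elif kind == "get":
            # For "get", the payload is the queue on which to send the reply.
            payload.put(total)
        elif kind == "stop":
            return


def run_demo():
    """Drive the counter node purely through messages and return its total."""
    inbox = queue.Queue()
    node = threading.Thread(target=counter_node, args=(inbox,))
    node.start()

    for number in (1, 2, 3):
        inbox.put(("add", number))

    reply_box = queue.Queue()
    inbox.put(("get", reply_box))
    total = reply_box.get(timeout=5)

    inbox.put(("stop", None))
    node.join(timeout=5)
    return total


if __name__ == "__main__":
    print(run_demo())
```

The caller never reads the node's total directly; it has to ask for it and wait for the reply, which is the same constraint that a real network imposes between machines.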

Compared with a centralized system, a distributed system offers better cost-effectiveness, greater processing capacity, higher reliability, and good scalability. However, while it solves the problem of high concurrency for websites, distribution also introduces problems of its own. First, distribution depends on the network, which can affect performance and even the ability to provide service at all. Second, the more servers there are in a cluster, the greater the probability that some server is down at any given time. In addition, because services are deployed across a cluster and each user request lands on only one machine, data consistency problems arise easily if this is not handled properly.
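To make the second point concrete, here is a rough sketch of how the chance of an outage grows with the number of servers. It assumes that servers fail independently and that each one is down with the same probability at a given moment; both are simplifying assumptions, and the value 0.001 is a made-up figure used only for illustration.

```python
def prob_any_server_down(n_servers, p_down):
    """Probability that at least one of n independent servers is down.

    Uses 1 - (1 - p)^n: the complement of "every server is up".
    """
    if n_servers < 0:
        raise ValueError("n_servers must not be negative")
    if not 0.0 <= p_down <= 1.0:
        raise ValueError("p_down must be between 0 and 1")
    return 1.0 - (1.0 - p_down) ** n_servers


if __name__ == "__main__":
    # Hypothetical per-server probability of being down at a given moment.
    for n in (1, 10, 100, 1000):
        print(n, prob_any_server_down(n, 0.001))
```

Under these assumptions, a failure that is rare for one machine becomes something a cluster of a thousand machines has to expect as routine, which is why distributed designs treat server outages as a normal case.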

Common distributed solutions

Distributed applications and services

Applications and services are split into layers and modules, and the resulting application and service modules are deployed in a distributed manner. This improves concurrent access capacity, reduces database connections and resource consumption, and lets different applications reuse common services, which makes it easier to extend the business.

Distributed static resources

Deploying static website resources such as JS, CSS, and images in a distributed manner reduces the load on application servers and improves access speed.
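One possible way to picture this is a helper that sends static files to dedicated static hosts and leaves everything else on the application server. The sketch below is only an illustration: the hostnames, the extension list, and the asset_url function are all hypothetical, and picking a host by checksum is just one simple strategy.

```python
import zlib

# Hypothetical hostnames, used only for illustration.
STATIC_HOSTS = ["static1.example.com", "static2.example.com"]
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif")


def asset_url(path, app_host="www.example.com"):
    """Return the URL for a path, sending static files to a static host.

    The static host is picked from a checksum of the path, so the same
    file always maps to the same host.
    """
    host = app_host
    if path.lower().endswith(STATIC_EXTENSIONS):
        index = zlib.crc32(path.encode("utf-8")) % len(STATIC_HOSTS)
        host = STATIC_HOSTS[index]
    return f"https://{host}{path}"


if __name__ == "__main__":
    print(asset_url("/js/app.js"))
    print(asset_url("/cart/checkout"))
```

Because requests for scripts, stylesheets, and images never reach the application host in this arrangement, the application servers are left to handle dynamic requests only.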

Distributed data and storage

Large websites often need to process large amounts of data, and a single computer often cannot provide enough storage space, so the data has to be stored in a distributed manner across multiple computers.
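A common way to spread data over several machines is to choose the machine from a hash of the record's key. The following is a minimal sketch of that idea, with plain in-memory dictionaries standing in for storage servers; ShardedStore and shard_for are hypothetical names, and simple modulo hashing is assumed here rather than anything the article prescribes.

```python
import hashlib


def shard_for(key, n_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


class ShardedStore:
    """Key-value store whose data is split across several shards.

    Each shard is a dict here; in a real deployment each would be a
    separate storage server.
    """

    def __init__(self, n_shards):
        if n_shards < 1:
            raise ValueError("n_shards must be at least 1")
        self._shards = [{} for _ in range(n_shards)]

    def put(self, key, value):
        self._shards[shard_for(key, len(self._shards))][key] = value

    def get(self, key):
        return self._shards[shard_for(key, len(self._shards))].get(key)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]


if __name__ == "__main__":
    store = ShardedStore(4)
    for i in range(1000):
        store.put(f"user:{i}", i)
    print(store.get("user:42"), store.shard_sizes())
```

Note that with modulo hashing, changing the number of shards moves most keys to a different shard, so this sketch ignores the question of how to add or remove storage servers.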

Distributed computing

As computing technology has developed, some applications have come to require very large amounts of computing power, and running them in a centralized manner takes a long time. Distributed computing breaks the application down into many small parts and assigns them to multiple computers for processing. This shortens the overall computation time and greatly improves computing efficiency.
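The split, process, and combine pattern described above can be sketched as follows. This is an illustration under stated assumptions: worker threads from Python's standard concurrent.futures module stand in for separate computers, the task (a sum of squares) is arbitrary, and the function names are hypothetical. Threads in CPython do not speed up CPU-bound work, so the example shows only the structure, not the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """The small task that each worker performs on its own chunk."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(numbers, n_workers=4):
    """Split the work, process the chunks in parallel, combine the results."""
    chunks = split_into_chunks(numbers, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(1, 101))))
```

The approach works here because the partial results can be computed independently and then combined with a simple sum; tasks whose parts depend on each other need extra coordination.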

Distributed and clustered

Distributed means that different service modules are deployed on different servers and work together through remote invocation to provide services externally.

A cluster means that the same application or service module is deployed on multiple servers, which together provide services externally through a load-balancing device.
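The two ideas can be combined, as the rough sketch below shows: different modules live on different groups of servers (distributed), and each group holds several identical copies behind a load balancer (cluster). The module and server names are hypothetical, and round-robin is assumed as the balancing strategy purely for simplicity.

```python
import itertools


class RoundRobinBalancer:
    """Hands out the servers of one cluster in rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("a cluster needs at least one server")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        return next(self._cycle)


class Deployment:
    """Maps each service module to its own cluster of servers."""

    def __init__(self, clusters):
        # clusters: {module name: list of servers running that module}
        self._balancers = {
            module: RoundRobinBalancer(servers)
            for module, servers in clusters.items()
        }

    def route(self, module):
        """Pick the server that should handle the next call to a module."""
        return self._balancers[module].next_server()


if __name__ == "__main__":
    deployment = Deployment({
        "orders": ["orders-1", "orders-2"],
        "payments": ["payments-1", "payments-2", "payments-3"],
    })
    print([deployment.route("orders") for _ in range(3)])
    print(deployment.route("payments"))
```

Choosing the module to call is the distributed part, and choosing which copy of that module answers is the cluster part.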