Distributed architecture

1. Foreword

In big data systems, distributed systems have become an unavoidable component; ZooKeeper, for example, has become an industry standard. Research on big data must therefore also study the characteristics of distributed systems.

2. Centralized system

A centralized system consists of a central node made up of one or more computers. Data is stored centrally, all service units of the system are deployed centrally, and all functions of the system are processed centrally. It is easy to deploy and does not need to consider distributed collaboration among multiple nodes.

3. Distributed system

A distributed system is one in which hardware or software components are distributed across different networked computers and communicate and coordinate with each other only by passing messages. It has the following characteristics.

3.1 Distribution

In a distributed system, multiple computers are arbitrarily distributed in space, and the distribution of machines can change at any time.

3.2 Peer equality

There is no master/slave distinction among the computers in a distributed system: there is neither a host that controls the whole system nor a machine that is controlled. All computer nodes that make up a distributed system are peers.

A replica is a form of redundancy that a distributed system provides for data and services. To provide highly available services, we tend to replicate both data and services.

A data replica means that the same data is persisted on different nodes. When the data stored on one node is lost, it can be read from a replica. This is the most effective means of addressing data loss in distributed systems.

A service replica means that multiple nodes provide the same service, and each node is able to accept external requests and process them accordingly.
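As a minimal sketch of the data-replica idea, assuming a hypothetical in-memory `Node` class and helper functions that do not come from any real system, the following Python writes the same value to several nodes and falls back to another copy when one node loses its data:

```python
class Node:
    """A hypothetical storage node holding a local key-value copy."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def read(self, key):
        # Raises KeyError when this node has lost (or never had) the key.
        return self.store[key]


def replicated_write(nodes, key, value):
    """Persist the same value on every node (a data replica per node)."""
    for node in nodes:
        node.store[key] = value


def replicated_read(nodes, key):
    """Try each replica in turn and return the first copy found."""
    for node in nodes:
        try:
            return node.read(key)
        except KeyError:
            continue  # this copy is lost; fall through to the next replica
    raise KeyError(key)


nodes = [Node("a"), Node("b"), Node("c")]
replicated_write(nodes, "user:1", "alice")
del nodes[0].store["user:1"]              # simulate data loss on node a
value = replicated_read(nodes, "user:1")  # served from node b
```

A real system would also have to repair the lost copy and deal with replicas that disagree; this sketch leaves both out.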

3.3 Concurrency

Multiple nodes in the same distributed system may concurrently operate on shared resources, such as a database or distributed storage. Coordinating these distributed concurrent operations efficiently has become one of the biggest challenges in distributed system architecture and design.

3.4 Lack of a global clock

A typical distributed system consists of multiple processes that are arbitrarily distributed in space. These processes communicate with each other by exchanging messages. Because the system lacks a global clock to order events, it is difficult to determine which of two events happened first.
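One well-known way to order events without a global clock is a logical clock in the style of Lamport timestamps. The text above does not name this technique, so the sketch below is only an illustration; the `LamportClock` class and its method names are assumptions made for this example.

```python
class LamportClock:
    """A per-process logical counter (hypothetical helper class)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Record a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Return the timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """Advance past both the local time and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p = LamportClock()
q = LamportClock()

p.tick()                    # local event on p, p.time == 1
t_send = p.send()           # p sends a message stamped 2
q.tick()                    # unrelated local event on q, q.time == 1
t_recv = q.receive(t_send)  # q jumps to max(1, 2) + 1 == 3
```

The receive event always gets a larger timestamp than the matching send, so causally related events are ordered consistently even though the two processes never share a physical clock.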

3.5 Failures always happen

Any computer in a distributed system can fail, in any number of ways. Any abnormal situation considered at design time will eventually occur during the actual operation of the system.

4. Problems in distributed environments

4.1 Communication exceptions

The transition from centralized to distributed inevitably introduces the network, and the unreliability of the network itself introduces additional problems. Even when network communication between the nodes of a distributed system proceeds normally, its latency is much greater than that of an operation on a single machine. In the process of sending and receiving messages, message loss and message delay are very common.

4.2 Network partition

When the network behaves abnormally, the latency between some nodes in the distributed system keeps increasing, until eventually only some of the nodes that make up the system can communicate normally while the others cannot. This phenomenon is called a network partition. When a network partition occurs, small local clusters appear within the distributed system. In extreme cases, these small local clusters independently carry out functions that would otherwise require the entire distributed system, including transaction processing of data, which poses a great challenge to distributed consistency.
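A common mitigation, which the text above does not prescribe, is to let only the side of a partition that can still reach a majority of nodes keep processing. The sketch below assumes a hypothetical five-node cluster and a helper named `has_quorum`; both are made up for illustration.

```python
def has_quorum(reachable, cluster_size):
    """Return True when the reachable nodes form a strict majority."""
    return len(reachable) > cluster_size // 2


cluster = {"n1", "n2", "n3", "n4", "n5"}

# A partition splits the cluster into two local groups.
side_a = {"n1", "n2", "n3"}
side_b = cluster - side_a

a_may_proceed = has_quorum(side_a, len(cluster))  # 3 of 5 is a majority
b_may_proceed = has_quorum(side_b, len(cluster))  # 2 of 5 is not
```

Because two disjoint groups cannot both hold a strict majority, at most one local cluster continues to accept updates under this rule.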

4.3 Three states

Because the network can fail in many ways, every request and response in a distributed system has three possible outcomes: success, failure, and timeout. When the network is abnormal, a timeout can arise in the following two situations:

1. Because of network problems, the request was lost in transit and never reached the receiver.
2. The receiver successfully received and processed the request, but the response was lost on its way back to the sender.

In either case, the sender cannot tell whether the request was processed.
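The two timeout situations can be sketched with a simulated lossy network. The `Server` class, the `call` function, and the `drop_request` / `drop_response` flags are hypothetical names used only to make the two cases reproducible.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


class Server:
    """Hypothetical receiver that applies deposits to a balance."""

    def __init__(self):
        self.balance = 0

    def handle(self, amount):
        if amount <= 0:
            return False           # rejected: an explicit failure
        self.balance += amount
        return True


def call(server, amount, drop_request=False, drop_response=False):
    """Send one request over a simulated lossy network."""
    if drop_request:
        return Outcome.TIMEOUT     # case 1: request never reached the receiver
    ok = server.handle(amount)
    if drop_response:
        return Outcome.TIMEOUT     # case 2: processed, but the reply was lost
    return Outcome.SUCCESS if ok else Outcome.FAILURE


s1, s2 = Server(), Server()
r1 = call(s1, 10, drop_request=True)   # nothing was applied on s1
r2 = call(s2, 10, drop_response=True)  # the deposit was applied on s2
```

Both calls return `Outcome.TIMEOUT` to the sender, yet `s1.balance` is still 0 while `s2.balance` is 10. The sender sees the same outcome for two different server states.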

4.4 Node faults

A node fault means that a server node in a distributed system has crashed or become unresponsive. Any node may fail at any time.

5. From ACID to CAP/BASE

5.1 ACID

A transaction is a program execution unit consisting of a series of operations that access and update data in the system. In the narrow sense, a transaction refers to a database transaction.

On the one hand, when multiple applications concurrently access a database, transactions can provide a means of isolation between those applications to prevent their operations from interfering with each other.

Transactions, on the other hand, provide a way for a sequence of database operations to recover from a failure to a normal state, while also providing a way for the database to maintain data consistency even in an abnormal state.

Transactions have four properties: Atomicity, Consistency, Isolation, and Durability, abbreviated as ACID.

Atomicity

A transaction must be an atomic unit of operations. During a single execution, the operations contained in a transaction can only end in one of two states: either all of them are executed successfully, or none of them are executed.

If any operation fails, the entire transaction will fail, and all other operations that have been performed will be undone and rolled back. The entire transaction is considered to have completed successfully only if all operations succeed.
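All-or-nothing behavior can be seen with Python's built-in `sqlite3` module. In this sketch the `accounts` table, the `transfer` function, and the `CHECK` constraint are assumptions chosen for illustration. The credit is deliberately executed before the debit so that a failing debit has something to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; undo everything if any step fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 500)   # debit would go negative
after_failed = balances(conn)                  # credit to bob was rolled back
succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

After the failed transfer both balances are unchanged, even though the credit to `bob` had already been executed inside the transaction.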

Consistency

The execution of a transaction must not violate the integrity and consistency of the data in the database: the database must be in a consistent state both before and after a transaction executes. In other words, a transaction must transform the database from one consistent state to another. When the database contains only the results of successfully committed transactions, it is said to be in a consistent state. If, however, the database system fails during operation, some transactions are interrupted before completing, and part of the changes made by those unfinished transactions has already been written to the physical database, then the database is in an incorrect, or inconsistent, state.

Isolation

In a concurrent environment, concurrent transactions are isolated from one another: the execution of one transaction must not be interfered with by other transactions. When different transactions operate on the same data concurrently, each transaction has its own complete data space. That is, the operations and data used inside a transaction are isolated from other concurrent transactions, and concurrently executing transactions cannot interfere with each other.

Durability

Also called persistence. Once a transaction is committed, its changes to the corresponding data in the database must be permanent. In other words, once a transaction ends successfully, the updates it made to the database must be preserved permanently. Even if the system crashes or goes down, as long as the database can be restarted, it must be possible to restore it to the state it was in when the transaction ended successfully.

5.2 Distributed Transactions

A distributed transaction is one in which the participants of the transaction, the servers supporting the transaction, the resource servers, and the transaction manager are located on different nodes of a distributed system. Usually, a distributed transaction involves operations on multiple data sources or business systems.

A distributed transaction can be regarded as consisting of multiple distributed sequences of operations, often called sub-transactions. Because the execution of each sub-transaction is itself distributed, implementing a distributed transaction processing system that guarantees the ACID properties is particularly complex.
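One classic way to coordinate sub-transactions is two-phase commit. The text above does not name this protocol, so the following is only an illustrative sketch; the `Participant` class and the `two_phase_commit` function are hypothetical, and coordinator failure, timeouts, and recovery are all left out.

```python
class Participant:
    """Hypothetical node that executes one sub-transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote on whether the local sub-transaction can commit."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:                    # phase 2: commit
            p.commit()
        return True
    for p in participants:                        # phase 2: roll back
        p.rollback()
    return False


ok_group = [Participant("orders"), Participant("inventory")]
ok_result = two_phase_commit(ok_group)

bad_group = [Participant("orders"), Participant("payments", can_commit=False)]
bad_result = two_phase_commit(bad_group)
```

In the second group a single "no" vote causes every participant to roll back, which is how the protocol extends atomicity across nodes.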

5.3 CAP

The CAP theorem tells us that a distributed system cannot simultaneously satisfy the three basic requirements of consistency, availability, and partition tolerance; it can satisfy at most two of them.

Consistency

Consistency refers to whether data can be kept consistent among multiple replicas. Under this requirement, when a system in a consistent state performs an update operation, the data of the system must still be in a consistent state afterward.

Consider data whose replicas are distributed across different nodes. If an update succeeds on the first node but the corresponding update does not reach the second node, a read from the second node still returns the old data (stale data). This is a typical case of distributed data inconsistency. In a distributed system, a system is considered to have strong consistency if all users can read the latest value after an update to a data item succeeds.
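The stale-read case can be reproduced with two in-memory copies. The `Replica` class and the `write_one` / `write_all` helpers are hypothetical names for this sketch, and `write_all` stands in for an update that reaches every copy before it is reported as successful.

```python
class Replica:
    """Hypothetical node holding one copy of the data."""

    def __init__(self):
        self.data = {}


def write_one(replica, key, value):
    """Update a single node without propagating the change."""
    replica.data[key] = value


def write_all(replicas, key, value):
    """Update every copy before reporting success."""
    for replica in replicas:
        replica.data[key] = value


first, second = Replica(), Replica()
write_all([first, second], "x", "old")

write_one(first, "x", "new")   # the update reaches only the first node
stale = second.data["x"]       # the second node still returns "old"

write_all([first, second], "x", "new")
fresh = second.data["x"]       # every copy now returns "new"
```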

Availability

The services provided by the system must always be available: for every operation a user requests, the system must return a result within a bounded time.

Partition tolerance

When a distributed system encounters any network partition failure, it must still be able to provide services that satisfy consistency and availability, unless the entire network environment fails.
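The trade-off can be illustrated with a toy replica that is cut off by a partition. It must either refuse the read, which favors consistency, or answer with a possibly stale value, which favors availability. The `Replica` class, the `Unavailable` exception, and the `"CP"` / `"AP"` mode strings are hypothetical names used only in this sketch.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer during a partition."""


class Replica:
    def __init__(self, value):
        self.value = value         # last value this replica has seen
        self.partitioned = False   # True when cut off from the other replicas


def read(replica, mode):
    """Read under a partition policy: "CP" refuses, "AP" may be stale."""
    if replica.partitioned and mode == "CP":
        raise Unavailable("cannot confirm the latest value during a partition")
    return replica.value


replica = Replica("v1")
replica.partitioned = True         # an update made elsewhere never arrives

ap_result = read(replica, "AP")    # answers, but the value may be stale
try:
    read(replica, "CP")
    cp_available = True
except Unavailable:
    cp_available = False           # stays consistent by giving up availability
```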

5.4 BASE

BASE is an acronym for Basically Available, Soft state, and Eventually consistent.

Basically available

When unpredictable failures occur, a distributed system is allowed to lose part of its availability, for example through longer response times or reduced functionality.

Soft state

Also known as weak state. The data in the system is allowed to exist in an intermediate state, and this intermediate state is considered not to affect the overall availability of the system. In other words, delay is allowed while data is synchronized between the replicas on different nodes.

Eventual consistency

All data replicas in the system reach a consistent state after a period of synchronization. The essence of eventual consistency is therefore that the system must guarantee that the data eventually becomes consistent, but it does not need to guarantee strong consistency of the data in real time.
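A minimal sketch of replicas converging after a synchronization round follows. It assumes each key carries a version number and that the higher version wins; this rule and the `merge` / `sync` helpers are assumptions for illustration, not something the text prescribes.

```python
def merge(a, b):
    """Merge two replica states, keeping the higher version of each key."""
    merged = dict(a)
    for key, (version, value) in b.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


def sync(replicas):
    """One synchronization round: every replica adopts the merged state."""
    merged = {}
    for state in replicas:
        merged = merge(merged, state)
    return [dict(merged) for _ in replicas]


# Each state maps key -> (version, value). The replicas start out diverged.
r1 = {"x": (2, "new")}
r2 = {"x": (1, "old"), "y": (1, "only-on-r2")}
diverged = r1 != r2

r1, r2 = sync([r1, r2])
converged = r1 == r2
```

Between the update and the synchronization round the replicas disagree, which is the soft state; after the round every copy holds the same data.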