Distributed theory

Distributed system: a system whose hardware or software components are spread across networked computers and communicate and coordinate with each other by passing messages

1. Problems solved (shortcomings of a monolithic architecture)

  1. Limited processing capacity when serving massive numbers of users
  2. The more complex the program, the less efficient the development
  3. A serious bug in production stops every service
  4. As the amount of code grows, compilation efficiency drops
  5. Locked into a single technology stack

2. Explanation of terms

  1. Distributed – multiple people doing [different] things together

  2. Cluster – a group of people doing the same thing together

  3. Network partition (split brain) – network failures split a distributed system into small clusters; the network between the small clusters is broken while the network inside each small cluster is normal

3. Architecture evolution

  1. Monolithic application architecture
  2. Application server and database server are separated
  3. Application server cluster
  4. Database read/write separation
  5. Add a search engine to relieve database read pressure
  6. Add a cache to relieve database read pressure
  7. Database is split horizontally/vertically
  8. Application servers are split vertically
  9. Application servers are split horizontally

4. Consistency models

  1. Strong consistency – whatever is written must be immediately readable by the system – impacts performance
  2. Weak consistency – no promise of how soon data becomes consistent, but it tries to reach consistency at some granularity (seconds, minutes, hours)
  3. Read-your-writes consistency – a user immediately sees the content they updated themselves; no guarantee for other users
    • Read that specific content from the master – puts the master under pressure
    • Read newly updated content from the master and, after some time, from the slave
  4. Eventual consistency – only guarantees that all replicas in the system eventually hold the correct data

5. CAP theorem

  1. Consistency – all replicas are consistent; data read from any node is the latest
  2. Availability – the externally provided service stays normal; no response timeouts or errors occur
  3. Partition tolerance – the system can still provide service when a network partition occurs

At most two of the three requirements can be met at the same time

Proof: a user sends a request to N1 to change the value from V0 to V1; at that moment the network between N1 and N2 is interrupted, while another user sends a request to N2 to read the value. There are three options:

(1) Return V0 [AP mode, sacrificing consistency]

(2) Wait for the network to recover, then return V1 [CP mode, sacrificing availability]

(3) Merge N1 and N2 into one node [CA mode, abandoning distribution]

6. BASE theory

BASE is the result of trading off the CAP theorem: if strong consistency cannot be achieved, eventual consistency should be reached in a way appropriate to the business characteristics

  1. Basically Available – allows a partial loss of availability when the distributed system fails
    • Time: normally responds within 0.5 seconds; under failure this rises to 1~2 seconds
    • Function: when traffic surges, some users are directed to a degraded page
  2. Soft state – data is allowed to exist in intermediate states (some replicas not yet synchronized) without affecting overall system availability
  3. Eventually consistent – data becomes consistent after a period of synchronization

7. Consistency protocols (handling distributed database transactions)

1. 2PC (two-phase commit)

Process

  1. Prepare phase – the coordinator sends a Prepare message to each participant; each participant runs its local transaction but does not commit
  2. Commit phase – if any participant failed or timed out, the coordinator sends a Rollback message to the participants; otherwise it sends a Commit message
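
The coordinator's decision rule fits in a few lines. Below is a minimal Java sketch; the Participant interface and its methods are illustrative stand-ins for real resource managers, and a production coordinator would also log its decision durably before phase 2.

```java
import java.util.List;

// Hypothetical participant interface: runs the local transaction on
// prepare() without committing, then commits or rolls back on command.
interface Participant {
    boolean prepare();  // true = ready to commit
    void commit();
    void rollback();
}

class TwoPhaseCommitCoordinator {
    // Returns true if the global transaction committed.
    boolean execute(List<Participant> participants) {
        // Phase 1: send Prepare to every participant.
        boolean allPrepared = true;
        for (Participant p : participants) {
            try {
                if (!p.prepare()) { allPrepared = false; break; }
            } catch (Exception timeoutOrFailure) {
                allPrepared = false; break;    // failure or timeout counts as "no"
            }
        }
        // Phase 2: Commit only if everyone prepared, otherwise Rollback.
        for (Participant p : participants) {
            if (allPrepared) p.commit(); else p.rollback();
        }
        return allPrepared;
    }
}
```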

Disadvantages

  1. Synchronous blocking – participants remain blocked from the end of phase 1 until phase 2 arrives
  2. Single point of failure – if the coordinator crashes before running phase 2, the participants' transactions stay locked
  3. Inconsistent data – if the coordinator crashes partway through sending Commit messages, only some participants commit, leaving the data inconsistent
  4. Too conservative – the failure of any single node fails the entire transaction

2. 3PC (three-phase commit)

Process

  1. CanCommit – the coordinator sends each participant a request containing the transaction and asks whether it can be run
  2. PreCommit – the coordinator asks the participants to run the transaction (still without committing)
  3. DoCommit – the coordinator asks the participants to commit the transaction; if a participant receives no message from the coordinator at this stage, it commits by default after a timeout

3PC reduces 2PC's blocking scope but still does not completely solve the data-inconsistency problem

8. Consensus algorithms (choosing the final value or a Leader)

1. The Paxos algorithm

Roles

  1. Client – sends requests to the distributed system
  2. Proposer – proposes values and negotiates with the Acceptors to reach agreement
  3. Acceptor – decision maker; approves proposals
  4. Learner – learns the final decision

Constraints

  1. An Acceptor must accept the first proposal it receives
  2. Once a proposal is chosen, any later-chosen proposal must carry the same value
  3. A proposal is chosen once it has been accepted by more than half of the Acceptors

Process 1

  1. A Proposer sends a prepare request numbered N, carrying no value, to more than half of the Acceptors
  2. An Acceptor that has not yet accepted any proposal returns null
  3. The Proposer then sends an accept request numbered N carrying its own value
  4. The Acceptors accept the proposal with number N and that value

Process 2

  1. Another Proposer sends a prepare request numbered N+1, carrying no value, to more than half of the Acceptors
  2. An Acceptor that has already accepted proposal N returns the value of proposal N
  3. The Proposer then sends an accept request numbered N+1 carrying that returned value
  4. The Acceptors accept the proposal with number N+1 and that value

Extreme case: two Proposers keep submitting proposals with successively higher numbers, so no value is ever chosen (an endless loop, i.e. livelock)

Solution: stipulate that only a master Proposer may submit proposals
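
To make the two processes concrete, here is a minimal Java sketch of the Acceptor side. All names are illustrative, and a real Acceptor must persist this state to stable storage before replying.

```java
class Acceptor {

    // Reply to a prepare request: whether the promise was made, plus the
    // previously accepted value (null if none, as in step 2 of Process 1).
    static class Promise {
        final boolean promised;
        final String acceptedValue;
        Promise(boolean promised, String acceptedValue) {
            this.promised = promised;
            this.acceptedValue = acceptedValue;
        }
    }

    private long promisedN = -1;   // highest prepare number promised so far
    private String acceptedValue;  // value of the last accepted proposal

    // Prepare request numbered n (carries no value).
    synchronized Promise prepare(long n) {
        if (n <= promisedN) {
            return new Promise(false, null);  // already promised a higher number
        }
        promisedN = n;                        // promise to ignore proposals below n
        return new Promise(true, acceptedValue);
    }

    // Accept request numbered n with a value: accept it unless a
    // higher-numbered prepare has been promised in the meantime.
    synchronized boolean accept(long n, String value) {
        if (n >= promisedN) {
            promisedN = n;
            acceptedValue = value;
            return true;
        }
        return false;
    }
}
```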

2. The Raft algorithm

Roles

  1. Leader – interacts with the client; only one exists
  2. Candidate – nominates itself during the election and becomes the Leader if the election succeeds
  3. Follower – voter; waits to be notified to vote

Process

  1. When the election begins, all nodes are Followers
  2. A node stays a Follower as long as it receives RequestVote (vote for me) or AppendEntries (a Leader has been elected) requests
  3. If no request arrives within a period of time (randomly 150-300 ms), the Follower becomes a Candidate and runs for Leader; if it receives more than half of the votes, it becomes the Leader
  4. If no Leader is elected in the end, Term + 1 and the next round of election starts
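
A minimal Java sketch of the election timeout in step 3. The class and method names are illustrative; a real node also handles per-term voting and log replication.

```java
import java.util.concurrent.ThreadLocalRandom;

enum Role { FOLLOWER, CANDIDATE, LEADER }

class RaftNode {
    Role role = Role.FOLLOWER;
    long term = 0;
    long lastHeartbeat = System.currentTimeMillis();
    // Each node picks a random timeout in [150, 300) ms, so followers
    // rarely time out at the same moment and split the vote.
    final long electionTimeoutMs = ThreadLocalRandom.current().nextLong(150, 300);

    // Called when RequestVote or AppendEntries arrives: stay a Follower.
    void onLeaderMessage() {
        lastHeartbeat = System.currentTimeMillis();
    }

    // Called periodically by a timer thread.
    void tick(int votesReceived, int clusterSize) {
        if (role == Role.FOLLOWER
                && System.currentTimeMillis() - lastHeartbeat > electionTimeoutMs) {
            role = Role.CANDIDATE;   // no request within the timeout: run for Leader
            term++;                  // Term + 1 starts the next round of election
        }
        if (role == Role.CANDIDATE && votesReceived > clusterSize / 2) {
            role = Role.LEADER;      // more than half of the votes
        }
    }
}
```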

9. Idempotence

Running an operation multiple times has the same effect as running it only once

1. Cases that require idempotence

  • A user submits a form repeatedly

  • A user casts malicious repeated votes

  • An interface call is retried after a timeout

  • A message is consumed more than once

    Unknown system problems caused by retries must be avoided

2. Solutions

  • Unique database field (see the JDBC sketch after this list)

    • Applicable operations
      • add
      • delete
    • Constraint
      • the data table must contain a unique field
    • Process
      1. The upstream service generates the (unique) ID and passes it in when invoking the downstream service
      2. The downstream service's insert fails with an error if the data already exists
  • Optimistic database locking (see the JDBC sketch after this list)

    • Applicable operations
      • update
    • Constraint
      • an extra version field must be added to the data table
    • Process
      1. The upstream service passes the record's current version value to the downstream service

      2. Before modifying the record, the downstream service checks that the version matches; on a match it updates the record and increments the version by 1

  • Token (see the Redis sketch after this list)

    • Applicable operations
      • add
      • update
      • delete
    • Constraint
      • Redis is used for verification
    • Applicable scenario
      • a user submits an order
    • Process
      1. When the database record is added successfully, the user's token is used as a Redis key and a record with a lifetime (10 seconds) is inserted into Redis
      2. Before adding data to the database, look the token up in Redis; if the record exists, this is a repeated addition
  • Unique serial number (see the Redis sketch after this list)

    • Applicable operations
      • add
      • update
      • delete
    • Constraint
      • Redis is used for verification
    • Applicable scenario
      • services invoked indirectly
    • Process
      1. The upstream service generates a unique serial number, stores it in Redis as a key, and passes it to the downstream service
      2. Before adding data, the downstream service checks whether the serial number exists in Redis; if it exists, this is the first call, and the Redis record is deleted after the data is added
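
A minimal JDBC sketch of the two database-based solutions above. The orders table, its columns, and the method names are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

class DbIdempotency {

    // Unique field: the upstream-generated id has a UNIQUE constraint,
    // so a retried insert fails instead of creating a second row.
    static boolean insertOnce(Connection conn, long id, String payload) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO orders (id, payload) VALUES (?, ?)")) {
            ps.setLong(1, id);
            ps.setString(2, payload);
            ps.executeUpdate();
            return true;
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            return false;  // the row already exists: this is a repeated call
        }
    }

    // Optimistic locking: the update succeeds only if the version still
    // matches; a retry carrying the old version updates zero rows.
    static boolean updateOnce(Connection conn, long id, int version, String payload)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE orders SET payload = ?, version = version + 1 "
                + "WHERE id = ? AND version = ?")) {
            ps.setString(1, payload);
            ps.setLong(2, id);
            ps.setInt(3, version);
            return ps.executeUpdate() == 1;  // 0 rows = stale version / duplicate
        }
    }
}
```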
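
And a minimal sketch of the two Redis-based solutions, using the Jedis client. The key prefixes and the 10-second lifetime follow the text; everything else is illustrative. Note that the separate check-then-set shown here has a race window; a production version would prefer an atomic SET NX EX.

```java
import redis.clients.jedis.Jedis;

class RedisIdempotency {

    // Token scheme: refuse the insert if this token was already used,
    // and mark it as used (with a lifetime) once the insert succeeds.
    static boolean submitWithToken(Jedis redis, String token, Runnable insert) {
        if (redis.exists("token:" + token)) {
            return false;                          // record exists: repeated addition
        }
        insert.run();                              // add the database record
        redis.setex("token:" + token, 10, "used"); // 10-second lifetime
        return true;
    }

    // Serial-number scheme: the upstream stored the serial in Redis; its
    // presence means "first call", and it is deleted after the data is added.
    static boolean submitWithSerial(Jedis redis, String serial, Runnable insert) {
        if (!redis.exists("serial:" + serial)) {
            return false;                          // already consumed: duplicate call
        }
        insert.run();
        redis.del("serial:" + serial);
        return true;
    }
}
```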

10. Distributed system design strategies

1. Heartbeat detection

Heartbeats usually carry status and metadata, which makes the nodes easier to manage

  • Periodic heartbeat detection – a node whose response times out is declared dead
  • Cumulative failure detection – make a limited number of retries against a node suspected of dying before declaring it dead
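
A minimal Java sketch of both detection styles; the timeout and retry threshold are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HeartbeatMonitor {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final Map<String, Integer> misses = new ConcurrentHashMap<>();
    static final long TIMEOUT_MS = 3000;  // response timeout
    static final int MAX_MISSES = 3;      // retries before declaring death

    // A node reported in (the heartbeat may carry status and metadata).
    void onHeartbeat(String nodeId) {
        lastSeen.put(nodeId, System.currentTimeMillis());
        misses.put(nodeId, 0);
    }

    // Called periodically. Pure periodic detection would return true on the
    // first missed deadline; cumulative detection only suspects the node
    // and declares it dead after MAX_MISSES consecutive misses.
    boolean isDead(String nodeId) {
        long silence = System.currentTimeMillis() - lastSeen.getOrDefault(nodeId, 0L);
        if (silence > TIMEOUT_MS) {
            return misses.merge(nodeId, 1, Integer::sum) >= MAX_MISSES;
        }
        return false;
    }
}
```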

2. High availability

  • Active/standby mode (common) – when the active host fails, the standby host takes over all of its work; after the active host recovers, services are switched back automatically (hot standby) or manually (cold standby), e.g. MySQL, Redis
  • Mutual backup mode (multi-master) – two hosts run at the same time and monitor each other
  • Cluster mode – multiple nodes run at the same time and share requests through a primary node – the high availability of the primary node itself must then be addressed

3. Load balancing

Solutions

  • Hardware – F5
  • Software – LVS, HAProxy, Nginx

Strategies

  • Random
  • Round robin
  • Weighted
  • Least connections
  • IP hash
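
Two of the strategies fit in a few lines of Java; the server list is illustrative.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class LoadBalancer {
    private final List<String> servers;
    private final AtomicLong counter = new AtomicLong();

    LoadBalancer(List<String> servers) {
        this.servers = servers;
    }

    // Round robin: spread requests evenly in arrival order.
    String roundRobin() {
        return servers.get((int) (counter.getAndIncrement() % servers.size()));
    }

    // IP hash: the same client IP always lands on the same server,
    // which preserves session affinity without shared state.
    String ipHash(String clientIp) {
        return servers.get(Math.floorMod(clientIp.hashCode(), servers.size()));
    }
}
```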

11. Network communication

1. RPC

Remote Procedure Call

Roles

  1. Client – the service caller
  2. Client Stub – packages the client's request into a network message and sends it to the Server Stub over the network
  3. Server Stub – receives the Client Stub's message, unpacks it, and invokes the local method
  4. Server – the service provider
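
The Client Stub role can be sketched with a JDK dynamic proxy. Here sendOverNetwork is a placeholder for the real transport, and the message format is purely illustrative.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class ClientStubFactory {

    // Returns a proxy implementing the service interface: every call is
    // packaged into a message and shipped to the Server Stub.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> serviceInterface, String serverAddress) {
        InvocationHandler handler = (proxy, method, args) -> {
            // 1. Marshal the request into a network message.
            String request = method.getName() + ":" + java.util.Arrays.toString(args);
            // 2. Send it to the Server Stub, which unpacks the message and
            //    invokes the local method; the transport is left abstract here.
            return sendOverNetwork(serverAddress, request);
        };
        return (T) Proxy.newProxyInstance(
                serviceInterface.getClassLoader(),
                new Class<?>[] { serviceInterface }, handler);
    }

    static Object sendOverNetwork(String address, String request) {
        throw new UnsupportedOperationException("transport not shown");
    }
}
```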

2. RMI

Remote Method Invocation

Java's native support for communication between different Java virtual machines

Roles

Client side

  1. Stub – client-side proxy
  2. Remote Reference Layer – parses and executes the remote reference protocol
  3. Transport layer – sends the remote method call and receives the result

Server side

  1. Transport layer – receives the client's request and forwards it to the Remote Reference Layer
  2. Remote Reference Layer – calls the Skeleton's method
  3. Skeleton – calls the actual method

Registry – registers remote objects under a URL and returns references to those remote objects to clients
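
A minimal end-to-end example using Java's built-in RMI API; the Greeting interface and the registry name are illustrative. Server and client sides are shown in one main method for brevity.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

interface Greeting extends Remote {
    String hello(String name) throws RemoteException;
}

class RmiDemo {
    static class GreetingImpl implements Greeting {
        public String hello(String name) { return "hello, " + name; }
    }

    public static void main(String[] args) throws Exception {
        // Server side: export the object (the Skeleton side) and register it.
        Greeting stub = (Greeting) UnicastRemoteObject.exportObject(new GreetingImpl(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("greeting", stub);

        // Client side: look up the remote reference and call it through the Stub.
        Greeting remote = (Greeting) LocateRegistry.getRegistry("localhost", 1099)
                .lookup("greeting");
        System.out.println(remote.hello("RMI"));
    }
}
```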

12. IO

1. Explanation of terms

zhuanlan.zhihu.com/p/22707398

  1. Blocking – after the call is made, the caller is suspended until the kernel has read the data and returned it to the application layer

  2. Non-blocking – the call returns immediately

  3. Synchronous – the application actively asks the kernel whether the data has been read

  4. Asynchronous – the application does not ask the kernel; the kernel actively notifies it once the read completes


2. Types

  1. BIO (synchronous blocking IO) – one thread per connection

    • simple
    • high resource overhead

  2. NIO (synchronous non-blocking IO) – one channel per connection – connections are registered with a multiplexer (selector), which polls the channels and starts thread processing only when a channel has data (see the sketch below)

  3. AIO (asynchronous non-blocking IO) – IO operations are delegated to the OS, which notifies the server application and invokes the completion callback when the operation finishes
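
A minimal sketch of the NIO multiplexer described above, using java.nio; the port and buffer size are illustrative, and a real server would hand ready channels to worker threads instead of reading inline.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class NioServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);               // non-blocking mode
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                         // poll the registered channels
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {              // new connection: register it
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {         // a channel has data: read it
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (client.read(buf) == -1) client.close();
                }
            }
            selector.selectedKeys().clear();
        }
    }
}
```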
