What technology is hot right now?

Big data, artificial intelligence, blockchain, edge computing, microservices: yet underneath, so many of these cutting-edge technologies depend on distributed systems.

The core of distributed systems: splitting

The difference between microservices and distributed systems

  • Distributed: any split counts, whether horizontal or vertical
  • Microservices: vertical splits only (split by business logic, e.g. e-commerce: users, payment, shopping…), made as fine-grained as possible

Horizontal split: the JSP/servlet, Service, and DAO layers are separated into different tiers. Vertical split: the system is divided by business logic into independent small projects.

The CAP theorem

A principle that must be considered in any distributed system.

  • C: Consistency (strong consistency): data on all nodes is identical at every moment
  • A: Availability: the system as a whole remains available
  • P: Partition tolerance: partial failures are tolerated

CAP theorem: in any distributed system, C, A, and P cannot all hold at once; at most two can be satisfied.

In practice, P must almost always be accepted, because distributed deployments frequently suffer network partitions (weak networks). The real choice is therefore between C and A.

Here’s an example:

Suppose machine A fails: letting the system keep running satisfies partition tolerance. To preserve consistency, any write made during the failure must be rolled back; otherwise machine B holds data that machine A does not, and consistency is violated.

The BASE theory

The purpose of BASE theory: to make up for the limitations of CAP.

Concepts to understand:

  • Strong consistency: data is consistent at every moment (or within a very short window)
  • Eventual consistency: it is enough for the data to become consistent eventually
  • Soft state: when multiple nodes are deployed, temporary data inconsistency is allowed

The goal is to approximate all three of CAP as closely as possible, substituting eventual consistency for strong consistency (C).

BASE theory: preferentially satisfy A and P, and therefore give up C, replacing it with eventual consistency instead.

BASE: Basically Available, Soft state, Eventually consistent

Distributed cache

The cache problem

Cache breakdown:

A single hotspot entry expires, and a flood of requests for it goes straight to the database.

Solution:

  • Use a background thread to monitor hotspot entries and refresh them before they expire
  • Set expiration times in advance so hotspot data does not expire during peak hours

Cache avalanche:

A large number of cache entries become invalid at once (many entries share the same expiration time, or the cache server itself fails).

Solution:

  • Spread out cache expiration times
  • Deploy the cache as a cluster

Cache penetration:

A malicious attack. Normally, meaningless data (keys that do not exist) is not cached, so a malicious tool can repeatedly request such keys and every request falls through to the database.

Solution:

Cache the meaningless results too, with a relatively short expiration time.

The essence of all three problems is cache invalidation; a common mitigation is two-level caching (local plus distributed).

In general, the local cache is the level-1 cache and the distributed cache is the level-2 cache.
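The mitigations above can be combined in a single cache-aside read path. A minimal in-memory sketch, assuming map-backed stand-ins for the cache and the database (all names here, such as CacheAside, NULL_MARKER, and the TTL values, are invented for illustration): caching misses with a short TTL guards against penetration, and randomized TTL jitter spreads expirations out against avalanche.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class CacheAside {
    static final String NULL_MARKER = "\u0000NULL"; // marker meaning "key known to be absent"

    static class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    final Map<String, String> db = new ConcurrentHashMap<>();   // stand-in for the database
    final Map<String, Entry> cache = new ConcurrentHashMap<>(); // stand-in for the cache

    // Read path: cache first, fall back to the DB, then cache the result.
    String get(String key) {
        Entry e = cache.get(key);
        if (e != null && e.expiresAt > System.currentTimeMillis()) {
            return NULL_MARKER.equals(e.value) ? null : e.value; // cached misses count too
        }
        String value = db.get(key); // cache miss or expired: this request hits the DB
        long ttl = (value == null)
                ? 5_000   // penetration guard: cache the miss, but only briefly
                : 60_000 + ThreadLocalRandom.current().nextLong(30_000); // avalanche guard: jitter
        cache.put(key, new Entry(value == null ? NULL_MARKER : value,
                                 System.currentTimeMillis() + ttl));
        return value;
    }

    public static void main(String[] args) {
        CacheAside c = new CacheAside();
        c.db.put("user:1", "alice");
        System.out.println(c.get("user:1"));                    // alice
        System.out.println(c.get("no-such-key"));               // null
        System.out.println(c.cache.containsKey("no-such-key")); // true: the miss was cached
    }
}
```

Note that a real deployment would also need a strategy for refreshing the hotspot entries mentioned above; this sketch only shows the expiry-related guards.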

Consistent hashing

Hash algorithm: a mapping

Strings, pictures, and objects are converted to numbers:

hash(a.png) converts to 12312313

Consistent hashing was originally devised to solve distributed caching problems.

One problem with mapping data to cache servers by hash is that when the number of servers changes, most of the cache becomes invalid.
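The invalidation can be seen with the naive placement scheme, hash(key) % serverCount. A small sketch (the class and key names are made up for this example) counts how many keys change servers when one server is added:

```java
public class ModuloMapping {
    // Naive placement: floorMod because String.hashCode can be negative.
    static int serverFor(String key, int serverCount) {
        return Math.floorMod(key.hashCode(), serverCount);
    }

    public static void main(String[] args) {
        int before = 3, after = 4; // one cache server is added
        int moved = 0, total = 1000;
        for (int i = 0; i < total; i++) {
            String key = "item-" + i;
            if (serverFor(key, before) != serverFor(key, after)) moved++;
        }
        // Most keys land on a different server, so their cached copies are lost.
        System.out.println(moved + " of " + total + " keys moved");
    }
}
```

With uniform hashes, only keys whose hash agrees modulo both 3 and 4 stay put, so roughly three quarters of the keyspace is remapped by adding a single server.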

Consistent hashing resolves the problem caused by changes in the server count.

The hash skew problem

Resolving hash skew: virtual nodes

For each real node, generate multiple virtual nodes on the ring. A request first locates a virtual node, then follows it to the corresponding real node.
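The ring with virtual nodes can be sketched on top of a sorted map. Everything here (the MD5-based hash, 100 replicas per node, the node names) is an illustrative choice, not prescribed by the text; the point is that removing a node only remaps the keys that lived on it:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // position -> real node
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Stable 64-bit hash from MD5 (String.hashCode clusters too much for a ring).
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++)
            ring.put(hash(node + "#VN" + i), node); // each virtual node points to its real node
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++)
            ring.remove(hash(node + "#VN" + i));
    }

    // Walk clockwise: the first virtual node at or after the key's position.
    String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long slot = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(slot);
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing(100);
        r.addNode("cache-a");
        r.addNode("cache-b");
        r.addNode("cache-c");
        String before = r.nodeFor("a.png");
        r.removeNode("cache-b"); // only keys that lived on cache-b move elsewhere
        System.out.println(before + " -> " + r.nodeFor("a.png"));
    }
}
```

The many virtual nodes per real node are what smooth out hash skew: each real node owns many small arcs of the ring instead of one large, possibly lopsided arc.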

Cache consistency

The approaches that come to mind first, but are not recommended: update the cache as soon as the DB is updated; or delete the cache first and then update the DB.

Recommendation: Delete the cache immediately after DB is updated

Analysis (counter-examples):

Updating the cache as soon as the DB is updated (not recommended)

Problem:

If thread A updates first and thread B updates later, the final cached value may still be thread A's. Cause: thread execution speeds can differ, so the two cache writes may arrive out of order.

Recommended: delete the cache immediately after updating the DB. Reason: deletions have no ordering problem (there is no speed-inconsistency problem), since the cache ends up empty either way.

(With this recommended approach, there is still a very small probability of error.)

The error case for this recommendation, though its probability is extremely small: a reader misses the cache and reads the old value from the DB; before it writes that value back into the cache, a writer updates the DB and deletes the cache; the reader then fills the cache with the stale value. The precondition is that the writing thread runs faster than the reading thread, which is almost impossible in practice.

Q: Can this extremely small error window be avoided?

A: Yes. The premise of the error above is that read and write operations interleave.

Solution: keep the two from interleaving, via locking or read/write separation. The general advice, however, is not to bother, given how unlikely the error is.

The order can also be switched: delete the cache first, then update the DB.

Example:

The premise of the error in that case is a slow write and a fast read, so the probability of the error occurring is much greater.
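The recommended order, update the DB and then delete the cache, can be sketched with two maps standing in for the database and the cache (the class and key names are invented for the example):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WriteThenEvict {
    final Map<String, String> db = new ConcurrentHashMap<>();    // stand-in for the database
    final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for the cache

    void write(String key, String value) {
        db.put(key, value); // 1. update the database first
        cache.remove(key);  // 2. then evict (never update) the cached copy
    }

    // Cache-aside read: reload from the DB on a miss.
    String read(String key) {
        return cache.computeIfAbsent(key, db::get);
    }

    public static void main(String[] args) {
        WriteThenEvict s = new WriteThenEvict();
        s.write("price:1", "100");
        System.out.println(s.read("price:1")); // 100, now cached
        s.write("price:1", "90");              // evicts the stale 100
        System.out.println(s.read("price:1")); // 90, reloaded from the DB
    }
}
```

Evicting rather than updating is what sidesteps the out-of-order-write problem: two concurrent deletions leave the same state regardless of which lands first.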

Distributed locks

Distributed locks can be implemented with database unique constraints, Redis, or ZooKeeper.

In architecture, there is no best choice, only the right one for the situation.

Using ZooKeeper to implement distributed locks

ZooKeeper: a distributed coordination framework whose data is organized as a tree (similar to a DOM tree).

The tree contains many nodes; the node type used for distributed locks is the temporary (ephemeral) sequential node.

ZooKeeper provides two kinds of support for this:

  • Listeners (watchers): a client can watch a node's state and be notified when it changes
  • Temporary node deletion: a temporary node is deleted automatically when the client that created it disconnects

Lock: only one thread/process can hold it at a time.
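The lock recipe built from those two primitives: each client creates a temporary sequential node, the client with the smallest sequence number holds the lock, and each waiter watches the node just ahead of it. Below is an in-memory simulation of that queue discipline only; it uses no real ZooKeeper client, and all names are invented for the sketch:

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

public class SequentialNodeLock {
    private final AtomicLong counter = new AtomicLong();             // ZK assigns sequence numbers
    private final ConcurrentSkipListSet<Long> nodes = new ConcurrentSkipListSet<>();

    // "Create a temporary sequential node": returns this client's node id.
    long enqueue() {
        long id = counter.getAndIncrement();
        nodes.add(id);
        return id;
    }

    // The lock is held by the client whose node currently has the smallest number.
    boolean holdsLock(long id) {
        return nodes.first() == id;
    }

    // "Session closed": the temporary node disappears, which is what wakes the successor.
    void release(long id) {
        nodes.remove(id);
    }

    public static void main(String[] args) {
        SequentialNodeLock lock = new SequentialNodeLock();
        long a = lock.enqueue();               // client A arrives first
        long b = lock.enqueue();               // client B queues behind A
        System.out.println(lock.holdsLock(a)); // true
        System.out.println(lock.holdsLock(b)); // false
        lock.release(a);                       // A's session ends; its node is removed
        System.out.println(lock.holdsLock(b)); // true: B is now the smallest
    }
}
```

The temporary-node semantics are what make this lock crash-safe: if a lock holder dies, its session closes, its node vanishes, and the next waiter is notified, so the lock is never held by a dead client.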

Distributed transactions

Local transactions: one transaction per database connection (transaction : connection = 1 : 1).

Local transactions cannot be used when the operation does not fit this 1:1 relationship. For example:

1. One node uses multiple database instances. Two databases are set up locally: an order database and a payment database.

An order operation = order database + payment database

2. In a distributed system, multiple services deployed on different nodes access the same database.

In such cases, the usefulness of local transactions is very limited.

Implementing distributed transactions

An example:

An order operation = order database + payment database

Server A: order operation, order database

Server B: payment database

First, simulate the transaction as a single-machine order operation:

begin transaction:

1. write the order database;

2. write the payment database;

commit/rollback;

In a distributed environment, the same flow has problems:

begin transaction:

1. write the order database locally (on server A);

2. remotely invoke the payment database on server B;

commit/rollback;

If the local order step succeeds and the remote payment also succeeds, but the response is lost because of a network problem, the user will be told the order failed even though the payment went through.

This is the situation where local transactions cannot cope and distributed transactions are needed.

Using 2PC to implement distributed transactions

2PC: Two-Phase Commit, consisting of one coordinator and multiple participants (similar to a master-slave structure).

Coordinator: transaction manager (TM)

Participant: resource manager (RM)

The two phases are:

Prepare phase: when the transaction begins, the coordinator sends a prepare message to all participants, asking them to execute the transaction. Each participant either agrees or refuses. If it agrees, it executes the transaction locally and writes the log, but does not commit.

Commit phase: if all participants agreed, the coordinator sends a commit request to all of them; otherwise it orders a rollback.

The general idea is to perform a task in two steps: 1. ask every node to do the work; 2. if everyone agrees, ask them all to commit together.
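That two-step idea can be simulated in-process. This is a toy sketch of the decision rule only: the participant names and the canCommit flag are invented, and real resource managers would be remote and would write redo/undo logs during prepare.

```java
import java.util.List;

public class TwoPhaseCommit {
    static class Participant {
        final String name;
        final boolean canCommit; // whether this RM votes yes in phase 1
        String state = "initial";
        Participant(String name, boolean canCommit) { this.name = name; this.canCommit = canCommit; }

        boolean prepare() {      // phase 1: execute locally, write the log, do not commit
            state = canCommit ? "prepared" : "aborted";
            return canCommit;
        }
        void commit()   { state = "committed";  } // phase 2, success path
        void rollback() { state = "rolledback"; } // phase 2, failure path
    }

    static String run(List<Participant> participants) {
        boolean allAgreed = true;
        for (Participant p : participants)   // phase 1: collect every vote
            allAgreed &= p.prepare();
        for (Participant p : participants) { // phase 2: one decision applied to everyone
            if (allAgreed) p.commit(); else p.rollback();
        }
        return allAgreed ? "committed" : "rolledback";
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(
                new Participant("order-db", true),
                new Participant("pay-db", true))));  // committed
        System.out.println(run(List.of(
                new Participant("order-db", true),
                new Participant("pay-db", false)))); // rolledback: one refusal aborts all
    }
}
```

A single "no" vote forces every participant to roll back, which is exactly why 2PC blocks: prepared participants hold their locks until the coordinator's decision arrives.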

Disadvantages of 2PC: 1. execution is blocking, which introduces latency; 2. if the coordinator hits a weak network during the commit phase, some nodes may commit while others do not; 3. the centralized architecture makes the coordinator a single point of failure.

Other distributed transaction solutions: three-phase commit (3PC), TCC, and message-queue-based distributed transactions.

Note: if you need strict transaction consistency, look at the Paxos algorithm (used by Google's Chubby lock service).

Distributed authentication & distributed authorization

Authentication scenarios: your own system, or a third-party platform

Own system: distributed authentication

Third-party platform: distributed authorization (SSO, single sign-on)

Distributed authorization commonly uses the OAuth 2.0 authorization protocol.

Original link: blog.csdn.net/qq_41620020… (reproduced for learning purposes only; contact for removal in case of infringement).
