Redis data sharding


First, the problems with single-node Redis

Single point of failure
Limited data capacity
Connection count and request pressure

The master-replica + Sentinel architecture mentioned earlier solves the single point of failure and the request pressure problem, but every node still holds a full 1:1 copy of the data. The data capacity problem remains, because the data is not distributed across the nodes.

How to solve the single-node data capacity problem

A: Client-based solutions

1. Split data by business module

From the business point of view, each module's data goes to a different Redis node according to an agreed rule.

For example, one Redis node stores reviews, another stores product information, and a third stores shopping carts.
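A minimal sketch of this split on the client side, assuming keys follow a `module:id` naming convention. The module names, the `MODULE_NODES` table, and the node addresses are hypothetical placeholders, not taken from the article.

```python
# Hypothetical module-to-node table; the addresses are placeholders.
MODULE_NODES = {
    "review": "10.0.0.1:6379",
    "product": "10.0.0.2:6379",
    "cart": "10.0.0.3:6379",
}


def node_for_key(key: str) -> str:
    """Route a key such as 'cart:42' to the node that owns its business module.

    Assumes the 'module:id' key naming convention described above.
    """
    module, _, _ = key.partition(":")
    try:
        return MODULE_NODES[module]
    except KeyError:
        raise ValueError(f"no Redis node configured for module {module!r}") from None


print(node_for_key("review:1001"))
print(node_for_key("cart:42"))
```

The routing rule is fixed by the business, so it never needs rehashing, but a single busy module can still outgrow its node.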

2. Route using a hashing algorithm

2.1 Modula: hash(id) % number of Redis nodes

Disadvantage: when the number of Redis nodes changes, the mapping changes for most keys, so the existing data allocation breaks and data has to be rehashed and migrated.
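A minimal sketch of Modula routing, using `zlib.crc32` as an assumed stand-in for the hash function and hypothetical `user:<n>` keys. It counts how many keys change node when the cluster grows from 3 to 4 nodes. For a uniform hash, a key stays put only when hash mod 3 equals hash mod 4, which happens only for hash mod 12 in {0, 1, 2}, so roughly three quarters of the keys move.

```python
import zlib


def modula_node(key: str, node_count: int) -> int:
    """Modula routing: hash(key) % number of nodes.

    zlib.crc32 is an assumed stand-in for the hash; any stable hash behaves similarly.
    """
    return zlib.crc32(key.encode()) % node_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Share of keys whose node changes when the node count goes from old_count to new_count."""
    moved = sum(1 for k in keys if modula_node(k, old_count) != modula_node(k, new_count))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
print(f"3 -> 4 nodes: {moved_fraction(keys, 3, 4):.0%} of keys change node")
```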

2.2 Random: Random allocation

Usage scenario: Message queue.

Data lands on a random node. The client does not care which node the data lands on: consumers only need to know the queue's key and can read it from any node.
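A toy model of random routing for a message queue, assuming in-memory `deque`s stand in for the Redis list on each node. The class and method names are hypothetical. Producers push to a random node; consumers do not know where a message landed, so they poll the nodes until one has something.

```python
import random
from collections import deque


class RandomShardedQueue:
    """Hypothetical toy: each deque plays the role of one Redis node's list."""

    def __init__(self, node_count: int, seed=None):
        self.nodes = [deque() for _ in range(node_count)]
        self.rng = random.Random(seed)

    def push(self, message: str) -> int:
        # Producer: pick any node at random and push to the head of its list.
        index = self.rng.randrange(len(self.nodes))
        self.nodes[index].appendleft(message)
        return index

    def pop(self):
        # Consumer: check nodes in random order and take from the tail of the
        # first non-empty list, so each node behaves FIFO.
        order = list(range(len(self.nodes)))
        self.rng.shuffle(order)
        for index in order:
            if self.nodes[index]:
                return self.nodes[index].pop()
        return None


queue = RandomShardedQueue(node_count=3, seed=1)
for i in range(5):
    queue.push(f"msg-{i}")
print([queue.pop() for _ in range(5)])
```

Each node keeps FIFO order for its own messages, but there is no global order across nodes, which is the trade-off of this scheme.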

2.3 Ketama: consistent hashing distributes data across nodes

A virtual hash ring is laid out, and both nodes and data are placed on it by the same hash function; each key belongs to the first node found going clockwise from the key's position.

Advantages: a new node takes over part of the load of the existing nodes without a global reshuffle. The rest of the data stays on the physical nodes it was originally assigned to.
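A minimal consistent-hash ring, one point per physical node and no virtual nodes yet. It assumes MD5 (the hash libketama is built on) truncated to 32 bits for ring positions; the node names, including `redis-e` for the article's "node E", are hypothetical. The example checks the property claimed above: after adding a node, every key that moved went to the new node.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    # First 4 bytes of MD5 as an unsigned 32-bit position on the ring.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    """Minimal consistent-hash ring: one point per physical node (collisions ignored)."""

    def __init__(self, nodes=()):
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        point = ring_hash(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping past the end of the ring.
        index = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[index]]


keys = [f"item:{i}" for i in range(1000)]
ring = HashRing(["redis-a", "redis-b", "redis-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("redis-e")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Only keys on the arc just before redis-e move, and all of them move to redis-e.
assert all(ring.node_for(k) == "redis-e" for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved to redis-e")
```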

Disadvantages of consistent Hash:

Cache breakdown risk: a key that used to live on node A may, after a new node is added, be looked up on the new node E. E does not have the data yet, so the lookup misses and the request falls through to the database. Many such misses at once can cause a cache breakdown and overload the database.

Solution: when the node computed for a key does not have the data, try the two physical nodes nearest to the key's position on the ring.
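A sketch of that fallback, under the assumption that "the two nearest physical nodes" means the next two distinct nodes going clockwise. Dicts stand in for each node's data, and `candidates`, `get_with_fallback`, and the node names are hypothetical. A key that moved to the new, empty node E is still served from its old node, because that old owner is the next node after E on the ring.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []
        self._owner = {}
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        point = ring_hash(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def candidates(self, key: str, count: int):
        """Up to `count` distinct physical nodes, walking clockwise from the key's position."""
        start = bisect.bisect_right(self._points, ring_hash(key))
        found = []
        for step in range(len(self._points)):
            node = self._owner[self._points[(start + step) % len(self._points)]]
            if node not in found:
                found.append(node)
                if len(found) == count:
                    break
        return found


def get_with_fallback(ring, stores, key, extra_nodes=2):
    """Read from the computed node; on a miss, try the next `extra_nodes` physical nodes."""
    for node in ring.candidates(key, 1 + extra_nodes):
        value = stores[node].get(key)
        if value is not None:
            return value, node
    return None, None  # a real miss: only now would the caller query the database


ring = HashRing(["redis-a", "redis-b", "redis-c"])
stores = {name: {} for name in ("redis-a", "redis-b", "redis-c")}
keys = [f"item:{i}" for i in range(1000)]
for k in keys:
    stores[ring.candidates(k, 1)[0]][k] = f"value-{k}"

ring.add_node("redis-e")  # new node, still empty
stores["redis-e"] = {}
print(get_with_fallback(ring, stores, "item:1"))
```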

Data skew -> cache avalanche

Hypothesis: when Redis first goes live there are only two nodes, and the keys may grow from some common base value. The data can then lean heavily toward one node; in the extreme, every request hits that single node, bringing it down and triggering a cache avalanche.

Solution: the root cause is that there are too few physical nodes, so every key falls into one of only two arcs of the ring, all or nothing. Can we add logical nodes? We have only two physical nodes, but they can expose different ports. So append a suffix (a port or a random number) to each IP and hash each combination as its own logical node on the ring.
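A minimal sketch of logical (virtual) nodes, assuming each physical node gets `replicas` ring points named `address#index`. The article suggests a random suffix; a deterministic index is used here instead so that every client computes the same ring without sharing the random values. The addresses and the default of 160 points per node are assumptions.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class VirtualNodeRing:
    """Consistent-hash ring where each physical node owns `replicas` logical points."""

    def __init__(self, nodes, replicas=160):
        self._points = []
        self._owner = {}  # ring position -> physical node
        for node in nodes:
            for i in range(replicas):
                # Logical node: physical address plus a suffix, e.g. "10.0.0.1:6379#17".
                point = ring_hash(f"{node}#{i}")
                bisect.insort(self._points, point)
                self._owner[point] = node

    def node_for(self, key: str) -> str:
        index = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[index]]


def share(ring, keys):
    """Fraction of keys that land on each physical node."""
    counts = Counter(ring.node_for(k) for k in keys)
    return {node: counts[node] / len(keys) for node in counts}


nodes = ["10.0.0.1:6379", "10.0.0.2:6379"]
keys = [f"user:{i}" for i in range(10_000)]
print("1 point per node:   ", share(VirtualNodeRing(nodes, replicas=1), keys))
print("160 points per node:", share(VirtualNodeRing(nodes, replicas=160), keys))
```

With many small arcs per physical node, each node's share of the keys stays close to one half instead of depending on where two single points happen to land.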

B: Let someone else do the routing

In the solutions above, the routing logic lives on the client, so every client has to connect directly to every Redis node. That puts a heavy connection load on the servers. The job is not done yet; more work is needed.

Figure: Client-based routing

If every tenant went straight to the landlord to view the house, how could the landlord cope? So bring in an agent.

Figure: Proxy routing

Now all we need to focus on is the performance of the proxy. Is there a ready-made proxy, so we don't have to reinvent the wheel?

Twemproxy: open-sourced by Twitter

Predixy: a high-performance, full-featured Redis proxy
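To make the proxy idea concrete, here is a toy sketch. It assumes each client opens one connection per node under client-side routing, while under a proxy each client opens one connection to the proxy and the proxy keeps a small pool per node. `ToyProxy` is hypothetical and only illustrates the role a proxy such as Twemproxy or Predixy plays; it is not their API, and the Modula rule is reused purely for illustration.

```python
import zlib


def direct_connections(clients: int, nodes: int) -> int:
    # Client-side routing: every client keeps a connection to every Redis node.
    return clients * nodes


def proxied_connections(clients: int, nodes: int, pool_per_node: int = 1) -> int:
    # Proxy routing: clients connect only to the proxy; the proxy keeps a pool per node.
    return clients + nodes * pool_per_node


class ToyProxy:
    """Hypothetical stand-in for a routing proxy; backends are plain dicts."""

    def __init__(self, backend_count: int):
        self.backends = [{} for _ in range(backend_count)]

    def _backend(self, key: str) -> dict:
        # Modula rule from section 2.1; clients never see this choice.
        return self.backends[zlib.crc32(key.encode()) % len(self.backends)]

    def set(self, key: str, value: str) -> None:
        self._backend(key)[key] = value

    def get(self, key: str):
        return self._backend(key).get(key)


print(direct_connections(1000, 4))   # 4000
print(proxied_connections(1000, 4))  # 1004
```

The routing rule now lives in one place, so changing it does not require updating every client; the cost is an extra network hop and a component whose performance matters.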

C: Redis Cluster

The weakness of the schemes so far is that adding or removing nodes forces data to be rehashed and then migrated according to the new result. Can we plan the data layout in advance instead? Suppose we start with only two nodes but divide the data into 10 slots up front, with each node owning some of the slots. Redis Cluster does the same with 16,384 slots. When a node is added, whole slots and their data are migrated to the new node; a key always maps to the same slot, so nothing is rehashed.
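A minimal sketch of pre-planned slots. Redis Cluster documents the key-to-slot rule as CRC16 (the XMODEM variant) of the key modulo 16384; hash tags are ignored here. The contiguous initial slot split, the node names, and the choice of which slots to hand to the new node are hypothetical.

```python
SLOT_COUNT = 16384


def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    # The key-to-slot mapping is fixed forever; only slot ownership changes.
    return crc16(key.encode()) % SLOT_COUNT


def assign_slots(nodes):
    """Hypothetical initial layout: contiguous slot ranges, one per node."""
    per_node = SLOT_COUNT // len(nodes)
    return {slot: nodes[min(slot // per_node, len(nodes) - 1)] for slot in range(SLOT_COUNT)}


def move_slots(owner, new_node, slots):
    """Hand whole slots (and, in a real cluster, their keys) to new_node."""
    for slot in slots:
        owner[slot] = new_node


owner = assign_slots(["redis1", "redis2"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: owner[key_slot(k)] for k in keys}
move_slots(owner, "redis3", range(12288, 16384))  # the last quarter of the slots
moved = [k for k in keys if owner[key_slot(k)] != before[k]]
print(key_slot("foo"))  # 12182
print(f"{len(moved)} of {len(keys)} keys now live on redis3")
```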

So when we only add nodes, is there still frequent rehashing? No. There is still a limit, though: if you want the intersection or union of two sets stored on different nodes, Redis cannot compute it, just as MySQL cannot inner join tables that live in two different physical databases.

So that it cannot be blamed for this, Redis offers hash tags: instead of always hashing the whole key, it leaves the choice to the user. If you want to run calculations or transactions over a group of keys, keep them on one physical node as far as possible: tag the keys yourself, and keys with the same tag hash to the same node.
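A sketch of the hash-tag rule as described in the Redis Cluster specification: if a key contains `{` followed later by `}` with at least one character between them, only the text between the first `{` and the first `}` after it is hashed. The CRC16 helper is repeated so the block runs on its own; the `{user1000}` keys are illustrative.

```python
SLOT_COUNT = 16384


def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc


def hash_tag(key: str) -> str:
    """Return the part of the key that is actually hashed."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # an empty tag "{}" does not count
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    return crc16(hash_tag(key).encode()) % SLOT_COUNT


# Both keys hash only "user1000", so they share a slot and can be used together.
print(key_slot("{user1000}.following"), key_slot("{user1000}.followers"))
```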

Each node keeps a copy of the slot-to-node mapping table. For example, redis1 stores the data for slots 0, 1, and 2, and redis2 stores slots 5, 6, and 7. A user requests key1, whose hash value (slot) is 3.

1. The request is routed to redis2 (or redis1), which does not store that data.
2. redis2 computes slot 3 for key1 and looks it up in its mapping table: the data is on redis3, so it tells the client to fetch it from redis3.
3. The client goes to redis3 to get the data.
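A toy simulation of those three steps, using the article's 10-slot example. `zlib.crc32(key) % 10` stands in for the real CRC16 slot rule, and the `Moved` exception only mirrors the idea of Redis Cluster's MOVED reply (a real node replies with the slot and an address, not a name). Slots 0 to 2 and 5 to 7 follow the article; giving redis3 slots 3, 4, 8, and 9 is an assumption.

```python
import zlib

SLOTS = 10  # the article's toy slot count; real Redis Cluster uses 16384


def toy_slot(key: str) -> int:
    # Hypothetical stand-in for CRC16(key) % 16384.
    return zlib.crc32(key.encode()) % SLOTS


class Moved(Exception):
    """The slot lives on another node; the client should retry there."""

    def __init__(self, slot: int, node_name: str):
        super().__init__(f"MOVED {slot} {node_name}")
        self.slot = slot
        self.node_name = node_name


class Node:
    def __init__(self, name: str, slot_table: dict):
        self.name = name
        self.slot_table = slot_table  # every node keeps a copy of slot -> node
        self.data = {}

    def get(self, key: str):
        slot = toy_slot(key)
        owner = self.slot_table[slot]
        if owner != self.name:
            raise Moved(slot, owner)  # step 2: "the data is on another node"
        return self.data.get(key)


def client_get(nodes: dict, first_node: str, key: str, max_redirects: int = 5):
    """Step 1: ask any node. Step 3: follow the redirect to the owning node."""
    node = nodes[first_node]
    for _ in range(max_redirects + 1):
        try:
            return node.get(key), node.name
        except Moved as moved:
            node = nodes[moved.node_name]
    raise RuntimeError("too many redirects")


table = {0: "redis1", 1: "redis1", 2: "redis1",
         5: "redis2", 6: "redis2", 7: "redis2",
         3: "redis3", 4: "redis3", 8: "redis3", 9: "redis3"}
nodes = {name: Node(name, table) for name in ("redis1", "redis2", "redis3")}
owner = table[toy_slot("key1")]
nodes[owner].data["key1"] = "hello"
wrong = next(name for name in nodes if name != owner)
print(client_get(nodes, wrong, "key1"))
```

A real cluster client would also cache the slot table and refresh it on a redirect, so most requests go straight to the right node.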


I'm Shanhai Ge. For the rest of our lives, let's keep grinding together.
