In the microservices world, caching with Redis is not an easy task.

Such as Sina, Twitter applications, a lot of hot data are stored in Redis this layer, hit the DB layer of the request is not much, can be said to rely on the cache. If the cache fails, all the traffic penetrates to the DB layer, which is bound to be unbearable and the whole system will be paralyzed, with very serious consequences. Due to the large amount of cached data, Redis is fast in its fast access based on memory, and computer memory resources are very limited, so distributed cache cluster is facing the requirements of scalability.

The existence of consistent Hash

Each instance in the Redis cluster does not know each other, so the routing method needs to be implemented on the client side to route keys to different Redis nodes.

The routing algorithm is the key. It must minimize the impact of the newly online cache server on the entire distributed cache cluster, so that the cached data in the entire cache server cluster can be accessed as much as possible after capacity expansion.

If the common hash algorithm for keys is used, the hit ratio after capacity expansion is very low. As shown in the following table, when a cluster is expanded from three nodes to four nodes, 75% of keys cannot be hit.

hash(key) hash(key)/3 hash(key)/4 Whether to hit
1 1 1 is
2 2 2 is
3 0 3 no
4 1 0 no
5 2 1 no
6 0 2 no
7 1 3 no
8 2 0 no
9 0 1 no
10 1 2 no
11 2 3 no
12 0 0 is

This is too bad, when the number of servers is 100, adding a new server will fail to hit 99%, which is the same effect as the entire cache service.

Consistent Hash is introduced to solve this problem, and the routing algorithm achieves the highest hit ratio possible by introducing a consistent Hash ring and further adding virtual node layers. Using this algorithm, when the node is expanded from N to N +1, the hit ratio can be kept around N /(n+1).

There have been some very thorough articles about the specific principles of this algorithm on the Internet, which will not be described here. The following mainly from the code implementation and operation of the way to show the effect of this algorithm.

Multiple Redis nodes are deployed locally

To validate a consistent Hash, you need to start with a Redis cluster. Here I simulate this pattern by deploying multiple Instances of Redis on the machine to point to different ports.

Create a project directory: $mkdir redis-conf Copy the redis configuration and make five copies. Name them as redis-6379.conf to redis-6383.conf.

You need to make some changes to its contents to start properly. Find the following two lines in the configuration file and modify the numbers accordingly.

port 6379
pidfile /var/run/redis_6379.pid
Copy the code

Redis-server./ redis-6379&can then be started separately

You can use redis-cli -p 6379 to specify the redis-server to connect to. Try setting key 1 and 2 at 6379 and getting nil at 6380 to get 1 means they are working independently and are ready to test.

Code implementation

Deploy 4 nodes from 6379 to 6382 using a consistent Hash algorithm to select key: A total of 100,000 keys from 0 to 99999 are set to the four servers respectively, and then a node 6383 is deployed. At this time, get is performed again from 0 to 99999, and the number of get times is counted to verify whether the hit ratio is about 80% (4/5) as expected.

The implementation of consistent Hash algorithm is heavily borrowed from this article. Red-black tree is used as data structure to achieve log(n) search time complexity, FNV1_32_HASH algorithm is used to distribute keys and nodes more evenly as much as possible, and virtual nodes are introduced to do load balancing.

Readers are advised to read this article in detail, which is very detailed and easy to understand.

Here is the code I rewrote:

package org.guerbai.io.jedistry;

import redis.clients.jedis.Jedis;
import java.util.*;

class JedisProxy {

   private static String[][] redisNodeList = {
           {"localhost"."6379"},
           {"localhost"."6380"},
           {"localhost"."6381"},
           {"localhost"."6382"}};private static Map<String, Jedis> serverConnectMap = new HashMap<>();

   private static SortedMap<Integer, String> virtualNodes = new TreeMap<>();

   private static final int VIRTUAL_NODES = 100;

   static
   {
       for (String[] str: redisNodeList)
       {
           addServer(str[0], str[1]);
       }
       System.out.println();
   }

   private static int getHash(String str)
   {
       final int p = 16777619;
       int hash = (int)2166136261L;
       for (int i = 0; i < str.length(); i++)
           hash = (hash ^ str.charAt(i)) * p;
       hash += hash << 13;
       hash ^= hash >> 7;
       hash += hash << 3;
       hash ^= hash >> 17;
       hash += hash << 5;

       // If the value is negative, take the absolute value
       if (hash < 0)
           hash = Math.abs(hash);
       return hash;
   }

   private static String getServer(String node)
   {
       // Get the Hash value of the routed node
       int hash = getHash(node);
       // Get all maps greater than the Hash value
       SortedMap<Integer, String> subMap =
               virtualNodes.tailMap(hash);
       // The first Key is the node closest to the node clockwise
       if (subMap.isEmpty()) {
           subMap = virtualNodes.tailMap(0);
       }
       Integer i = subMap.firstKey();
       // Returns the name of the corresponding virtual node with a slightly truncated string
       String virtualNode = subMap.get(i);
       return virtualNode.substring(0, virtualNode.indexOf("&"));
   }

   public static void addServer(String ip, String port) {
       for (int i = 0; i < VIRTUAL_NODES; i++)
       {
           String virtualNodeName = ip + ":" + port + "&&VN" + String.valueOf(i);
           int hash = getHash(virtualNodeName);
           System.out.println("Virtual node [" + virtualNodeName + "] added with hash value" + hash);
           virtualNodes.put(hash, virtualNodeName);
       }
       serverConnectMap.put(ip+":"+port, new Jedis(ip, Integer.parseInt(port)));
   }

   public String get(String key) {
       String server = getServer(key);
       Jedis serverConnector = serverConnectMap.get(server);
       if (serverConnector.get(key) == null) {
           System.out.println(key + "not in host: " + server);
       }
       return serverConnector.get(key);
   }

   public void set(String key, String value) {
       String server = getServer(key);
       Jedis serverConnector = serverConnectMap.get(server);
       serverConnector.set(key, value);
       System.out.println("set " + key + " into host: " + server);
   }

   public void flushdb(a) {
       for (String str: serverConnectMap.keySet()) {
           System.out.println("Empty host:"+ str); serverConnectMap.get(str).flushDB(); }}public float targetPercent(List<String> keyList) {
       int mingzhong = 0;
       for (String key: keyList) {
           String server = getServer(key);
           Jedis serverConnector = serverConnectMap.get(server);
           if(serverConnector.get(key) ! =null) { mingzhong++; }}return (float) mingzhong / keyList.size(); }}public class ConsistencyHashDemo {

   public static void main(String[] args) {
       JedisProxy jedis = new JedisProxy();
       jedis.flushdb();
       List<String> keyList = new ArrayList<>();
       for (int i=0; i<100000; i++) {
           keyList.add(Integer.toString(i));
           jedis.set(Integer.toString(i), "value");
       }
       System.out.println("target percent before add a server node: " + jedis.targetPercent(keyList));
       JedisProxy.addServer("localhost"."6383");
       System.out.println("target percent after add a server node: "+ jedis.targetPercent(keyList)); }}Copy the code

The above code makes some improvements to the reference article.

If the key is greater than the maximum hash value of the virtual node, the tailMap method will return null and an error will be reported if the node cannot be found. In fact, in this case, the virtual node with the smallest hash value should be searched. I added the processing and connected the ring.

The following getHash method is the FNV1_32_HASH algorithm, so don’t worry too much about it.

The value of VIRTUAL_NODES is important. When the number of virtual nodes is small, the greater the number of virtual nodes, the higher the hit ratio.

There is also a big difference in the program design, I wrote JedisProxy class, as a client to access the middle layer of Redis, in this class static block using server nodes to generate virtual nodes to construct a good red-black tree, getServer according to the tailMap method to take out the actual node address, Then get the Jedis object directly from the address of the actual node, provide simple GET and set methods, first get the specific Jedis object according to the key, and then carry out get and set operations.

AddServer static method gives it the ability to dynamically expand, as can be seen from the main method, by calling jedisproxy. addServer(“localhost”, “6383”) to directly add nodes, without stopping the application. The targetPercent method is used to count hit ratios.

When the virtual node is 5, the hit ratio is about 60%. When it is 100, the expected hit ratio of 80% can be achieved.

Ok, perfect.