Hash algorithm application scenarios
- Application scenarios of the Hash algorithm in distributed cluster architecture
The Hash algorithm is used in many distributed cluster products, such as Redis clusters, Hadoop, ElasticSearch, MySQL sharding (splitting databases and tables), and Nginx load balancing.
- The main application scenarios can be summarized in two categories:
- Load balancing of requests (such as Nginx's ip_hash policy)
With the ip_hash policy, as long as the client's IP address stays the same, Nginx routes that client's requests to the same target server, which achieves session stickiness and avoids the need for session sharing. How could session stickiness be achieved without ip_hash? One option is to maintain a mapping table between client IP (or session ID) and a specific target server, such as <ip, tomcat1>. What are the downsides? 1) When there are many clients, the mapping table becomes very large and wastes storage space. 2) Whenever a client or a target server goes online or offline, the mapping table has to be re-maintained, which is costly.
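For reference, a minimal ip_hash setup in nginx.conf might look like the sketch below; the upstream name and backend addresses are placeholders, not values from this article.

```nginx
# Sticky sessions via ip_hash: requests from the same client IP always
# reach the same backend (addresses below are placeholders).
upstream tomcat_cluster {
    ip_hash;
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
    server 192.168.1.103:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://tomcat_cluster;
    }
}
```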
- Distributed storage
Take a Redis cluster as an example. Suppose the cluster has three Redis servers: redis1, redis2, and redis3. How do we decide which Redis server a piece of data belongs on? When storing data <key1, value1>, we can compute hash(key1) % 3 = index and use the remainder index to pick the specific server node on which to store the data. When fetching the data, the same calculation hash(key1) % 3 = index locates the same server node from which to read it.
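As a rough illustration (not code from the article), a Java sketch of this modulo-based sharding could look like the following; the node labels redis1/redis2/redis3 are just the example names used above.

```java
import java.util.List;

public class ModuloSharding {
    // Example node labels from the text above; real code would hold actual Redis addresses.
    private static final List<String> NODES = List.of("redis1", "redis2", "redis3");

    // hash(key) % 3 = index: the remainder picks the server node for this key.
    static String nodeFor(String key) {
        int index = (key.hashCode() & 0x7fffffff) % NODES.size();
        return NODES.get(index);
    }

    public static void main(String[] args) {
        // The same key always yields the same index, so a read finds the data
        // on the same node where it was stored.
        System.out.println("key1 -> " + nodeFor("key1"));
    }
}
```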
Problems with common Hash algorithms
Common Hash algorithms have a problem. Take the ip_hash scenario above: suppose the cluster has three Tomcat servers, tomcat1, tomcat2, and tomcat3.
Assume a client's IP address stays fixed. If tomcat3 now fails and goes down, the number of servers drops from 3 to 2, and all the previous modulo results (hash % 3) have to be recalculated as hash % 2.
In a real production environment with many backend servers and a very large number of clients, the impact is severe:
a large number of client requests are routed to different servers, and the client sessions on the original servers are lost.
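A small, hypothetical Java experiment makes the scale of the impact concrete: it counts how many keys map to a different server when the modulus drops from 3 to 2 (with hash % 3 versus hash % 2, roughly two-thirds of keys move).

```java
public class RemapDemo {
    // Plain modulo routing, as in the common Hash approach above.
    static int node(String key, int serverCount) {
        return (key.hashCode() & 0x7fffffff) % serverCount;
    }

    public static void main(String[] args) {
        int moved = 0, total = 10_000;
        for (int i = 0; i < total; i++) {
            String key = "client-" + i;             // hypothetical client identifiers
            if (node(key, 3) != node(key, 2)) {     // tomcat3 goes down: 3 servers -> 2
                moved++;
            }
        }
        // Expect roughly two-thirds of the keys to land on a different server.
        System.out.println(moved + " of " + total + " keys changed server");
    }
}
```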
So how do we solve this problem?
The answer is the consistent Hash algorithm, which we'll cover next.
Consistent Hash algorithm
- Design idea of consistent Hash algorithm
First, imagine a line whose start and end points are defined as 0 and 2^31 - 1. Bend this line into a circle; the resulting loop is called the Hash ring.
For a server, compute the hash of its IP address or host name and map it to a position on the Hash ring;
for a client, hash its IP address in the same way, which also corresponds to a position on the ring;
to determine which server a client request is routed to, search clockwise from the client's position for the nearest server node on the Hash ring.
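The ring lookup described above can be sketched in Java with a sorted map. This is an illustrative outline, not an implementation from the article, and it assumes at least one server has been added before routing.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    // Position on the ring (hash value) -> server node placed at that position.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Hash the server's IP address or host name and place it on the ring.
    void addServer(String serverIp) {
        ring.put(hash(serverIp), serverIp);
    }

    void removeServer(String serverIp) {
        ring.remove(hash(serverIp));
    }

    // Hash the client's IP, then walk clockwise to the nearest server node.
    String route(String clientIp) {
        int h = hash(clientIp);
        SortedMap<Integer, String> clockwise = ring.tailMap(h);
        Integer nodeHash = clockwise.isEmpty() ? ring.firstKey()   // wrap around the ring
                                               : clockwise.firstKey();
        return ring.get(nodeHash);
    }

    // Simple non-negative hash for the sketch; a production version would use a
    // better-distributed function (e.g. FNV or MurmurHash).
    private int hash(String key) {
        return key.hashCode() & 0x7fffffff;
    }
}
```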
- Performance of the consistent Hash algorithm for server capacity reduction
If server 3 goes offline, the clients originally routed to server 3 search clockwise for the nearest server and are routed to server 4; other client requests are not affected.
The migration of requests is minimal, and such an algorithm is very suitable for distributed clusters, avoiding mass migration of requests.
- Performance of the consistent Hash algorithm for server capacity expansion
After server 5 is added, some clients originally routed to server 3 are routed to server 5, and other client requests are not affected.
The migration of requests is minimal, and such an algorithm is very suitable for distributed clusters, avoiding mass migration of requests.
- Advantages of the consistent Hash algorithm
In the consistent Hash algorithm, each server is responsible for one segment of the ring. When nodes are added or removed, only a small portion of the data in the ring space needs to be relocated, so the algorithm has good fault tolerance and scalability.
Problems and solutions of consistent Hash algorithm
So does the consistent Hash algorithm have any downsides? It does.
- Data skew problem
When there are too few service nodes, the consistent Hash algorithm easily causes data skew because the nodes are not evenly distributed on the ring.
For example, if the system has only two servers and they sit close together on the ring, node 2 is responsible for only a very small segment while the vast majority of client requests fall on node 1. This is the data skew problem.
- The solution
To solve the data skew problem, the consistent Hash algorithm introduces the virtual node mechanism: multiple hashes are computed for each service node, and a virtual node is placed on the ring at each computed position. This can be done by appending a number to the server's IP address or host name. For example, three virtual nodes can be computed for each server, so the hashes of "node 1 ip#1", "node 1 ip#2", "node 1 ip#3", "node 2 ip#1", "node 2 ip#2", and "node 2 ip#3" are computed, forming six virtual nodes. When a client is routed to a virtual node, it is actually routed to the real node that the virtual node corresponds to.
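A minimal extension of the earlier ring sketch shows the idea: each physical server is hashed several times (here with an "#i" suffix, as in the example above), and every virtual position stores the real server it belongs to. Again, this is an illustrative sketch rather than a reference implementation.

```java
import java.util.TreeMap;

public class VirtualNodeRing {
    private static final int VIRTUAL_NODES_PER_SERVER = 3;          // e.g. ip#1, ip#2, ip#3
    private final TreeMap<Integer, String> ring = new TreeMap<>();  // virtual position -> real server

    void addServer(String serverIp) {
        for (int i = 1; i <= VIRTUAL_NODES_PER_SERVER; i++) {
            // Hash "ip#i" to get the virtual node's position, but store the real server.
            ring.put(hash(serverIp + "#" + i), serverIp);
        }
    }

    // Routing to a virtual node resolves to the real node it corresponds to.
    String route(String clientIp) {
        Integer nodeHash = ring.ceilingKey(hash(clientIp));  // nearest virtual node clockwise
        if (nodeHash == null) {
            nodeHash = ring.firstKey();                      // wrap around the ring
        }
        return ring.get(nodeHash);
    }

    private int hash(String key) {
        return key.hashCode() & 0x7fffffff;
    }
}
```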
Nginx configures a consistent Hash load balancing policy
The ngx_http_upstream_consistent_hash module is a load balancer that uses an internal consistent hash algorithm to choose a suitable backend node.
The module can map requests evenly onto backend machines in different ways, depending on the configuration parameter:
- consistent_hash $remote_addr: maps by client IP address
- consistent_hash $request_uri: maps by the URI of the client request
- consistent_hash $args: maps by the query parameters carried by the client request
The first two are more commonly used.
The ngx_http_upstream_consistent_hash module is a third-party module and needs to be downloaded and installed separately.
- Download the Nginx consistent Hash load balancing module from GitHub
- Upload the downloaded package to the nginx server and decompress it
- Go to the nginx source directory and execute the following commands
./configure --add-module=/root/ngx_http_consistent_hash-master
make
make install
- Nginx is then ready to use; configure the policy in the nginx.conf file, for example as sketched below
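The snippet below is one possible shape for that configuration; the upstream name and backend addresses are placeholders, and the directive comes from the module's parameter list above.

```nginx
# Upstream using the third-party consistent hash module compiled in above.
upstream tomcat_cluster {
    consistent_hash $request_uri;    # map requests onto the ring by request URI
    server 192.168.1.101:8080;
    server 192.168.1.102:8080;
    server 192.168.1.103:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://tomcat_cluster;
    }
}
```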