preface

I remember sharing an article titled “Analysis of Consistent Hash Algorithm” a year ago. At that time, I only analyzed the implementation principle of this algorithm and what problems it solved.

But there is no actual implementation of such an algorithm, after all, to deepen the impression also have to masturbate again, so this time on the current needs of a routing to begin to achieve a.

background

Those of you who have read Build Yourself a Distributed IM(Instant Messaging) system should remember the logins.

Let me give you a brief introduction of what CIM does:

In one scenario, after the client logs in successfully, a service node needs to be selected from the available server list and returned to the client.

And this selection process is a load strategy process; The first version is relatively simple and only supports polling by default.

Useful, but not elegant 😏.

Therefore, MY plan is to build multiple routing policies for users to choose according to their own scenarios, and to provide simple API for users to customize their own routing policies.

Let’s take a look at some characteristics of the consistent Hash algorithm:

  • To construct a0 to 2 ^ 32-1The size of the ring.
  • The service node hashes itself into the index in the ring.
  • Clients are also positioned in the ring based on some hash of their own data.
  • Find the nearest node clockwise, that is, the service node of this route.
  • Considering the number of service nodes and the uneven distribution of data in the ring caused by the hash algorithm, virtual nodes are introduced.

User-defined ordered Map

Given these objective conditions it is easy to think of a custom ordered array to simulate the ring.

So our process is as follows:

  1. Initialize an array of length N.
  2. The positive integer obtained by the hash algorithm of the service node and the data of the node itself (hashcode, IP, port, etc.) are stored here.
  3. After the node is stored, the entire array is sorted (there are several sorting algorithms).
  4. When the client obtains the routing node, it hashes itself to obtain a positive integer.
  5. Iterating through the array until it finds a hash value greater than or equal to that of the current client, the current node is used as the node to be routed by that client.
  6. If no data larger than the client is found, the first node is returned (satisfying the ring’s property).

Let’s ignore the sorting time and look at the time complexity of this route:

  • It’s better to find it the first timeO(1).
  • The worst case is that it is found after traversing the array, and the time complexity isO(N).

After the theory, let’s look at the concrete practice.

I have a custom class: SortArrayMap

His methods and results are as follows:

If hashcode = 101 is passed in, hashcode = 1000 will be returned clockwise.


Let’s look at the implementation.

The member variables and constructors are as follows:

The core of this is a Node array that holds the hashcode and value values of the service Node.

The internal Node class structure is as follows:


Write data as follows:

The write logic is similar to ArrayList, which I’m sure you’ll remember from the source code.

  • Determine whether the data needs to be expanded before writing the data. If necessary, copy the original 1.5 times group to store the data.
  • And then I’m going to write an array, plus 1.

But the storage is in accordance with the order of writing storage, traversal of nature will not be orderly; So we provide a Sort method that sorts the data by key, which is hashcode.

Sorting is also relatively simple. The array tool is used for sorting, which actually uses a TimSort sorting algorithm with high efficiency.

Finally, the corresponding node needs to be searched clockwise according to the standard of consistent Hash:

The code is relatively simple and clear; If the array is found to be larger than the current key, the first key is returned.

This basically fulfills the requirements for consistent Hash.

Ps: This does not include specific hash methods and virtual nodes and other functions (see below for detailed implementation). This can be decided by users. SortArrayMap can be used as a low-level data structure to provide ordered Map capabilities, and the application scenarios are not limited to consistent hash algorithms.

TreeMap implementation

Although SortArrayMap implements the function of consistent hash, its efficiency is not high enough, mainly reflected in sort.

The following figure shows the time complexity of the current mainstream sorting algorithm:

The best I can do is order N.

There’s a whole other way of thinking about it, instead of sorting the data; Instead, they are sorted as they write, which makes writing less efficient.

Binary lookup trees, for example, are data structures that are already implemented in the JDK; TreeMap, for example, is implemented using a red-black tree, which naturally sorts keys by default.


Let’s see how we can achieve the same effect using TreeMap.

127.0.0.1000
Copy the code

The effect is the same as using SortArrayMap above.

Only some of TreeMap’s apis are used:

  • When data is written,TreeMapA natural ordering of keys is guaranteed.
  • tailMapA portion of data larger than the current key can be retrieved.
  • When this method returns data taking the first node is the first node in the clockwise.
  • If it doesn’t return you just take the whole thingMapThe first node also implements the ring structure.

Ps: There are also no hash methods and no virtual nodes (see the implementation below) because TreeMap, like SortArrayMap, is used as an underlying data structure.

The performance comparison

To make it easier for you to choose which data structure to use, I wrote a million pieces of data each with TreeMap and SortArrayMap for comparison.

First SortArrayMap:

It took 2237 milliseconds.

TreeMap:

It took 1316 milliseconds.

The result is nearly twice as fast, so TreeMap is recommended because it doesn’t require any additional sort wastage.

Practical application of CIM

Let’s take a look at how cim is used in this application, including the virtual nodes and hash algorithms mentioned above.

Template method

In the application process, considering that even the consistent hash algorithm has multiple implementations, I define an abstract class for the convenience of its users to extend their own consistent hash algorithm. There are some template methods defined so that you can complete your algorithm by simply implementing it differently in a subclass.

AbstractConsistentHash, the main methods of this abstract class are as follows:

  • addMethods naturally write data.
  • sortMethods are used for sorting, but subclasses don’t necessarily need to be overridden, for exampleTreeMapSo you don’t have to have your own sort container.
  • getFirstNodeValueGet the node.
  • processIs client-oriented, and ultimately only needs to call this method to return a node.

Let’s see how this is done using SortArrayMap and AbstractConsistentHash.

It implements several abstract methods. The logic is the same as above, but it is extracted into different methods.

We just added a few virtual nodes to the add method, as you can see.

The control of virtual nodes in subclasses rather than abstract classes is also for flexibility. Different implementations may have different requirements on the number of virtual nodes, so it is better to customize.

But hash methods are in abstract classes, so subclasses don’t have to be overridden; Because this is a basic function, all you need is a common algorithm to make sure it hashes evenly.

Therefore, AbstractConsistentHash defines a hash method.

The algorithm here is copied from xxl_job, and there are other implementations on the web, such as FNV1_32_HASH, etc. Different but the same purpose.


This makes it very simple for the user:

He simply builds a list of services and passes the current client information into the process method to get the return of a consistent hash algorithm.


The same goes for TreeMap:

He doesn’t need to override sort because it’s already sorted when it writes.

For the client, only one implementation class needs to be modified, and nothing else needs to be changed.

The effect of running is the same.

So if you want to customize your own algorithm, you just need to inherit AbstractConsistentHash and rewrite the methods, without changing the client code.

Routing algorithm scalability

But the real extensibility for CIM is for routing algorithms, such as polling, hash, consistent hash, random, LRU, etc.

However, there are multiple implementations of consistent hash, and their relationship looks like this:

The application also needs to support flexible routing policies of this type. For example, I want to customize a random policy.

Therefore, an interface is defined: RouteHandle

public interface RouteHandle {

    /** * Route to another batch of servers *@param values
     * @param key
     * @return* /
    String routeServer(List<String> values,String key) ;
}
Copy the code

There is only one method, the routing method; The input parameters are the service list and client information.

And for a consistent hash algorithm is only need to implement this interface, at the same time, choose to use in this interface SortArrayMapConsistentHash TreeMapConsistentHash still can.

There’s also a setHash method that takes AbstractConsistentHash; This is used to specify what kind of data structure the client needs to use.


The RouteHandle interface is also implemented for the existing polling strategy.

I’m just moving the code over here.

Let’s take a look at how the client is used and how to choose which algorithm to use.

To keep the client code almost motionless, I put this selection process into the configuration file.

  1. If you want to use the old polling policy, the configuration is implementedRouteHandleFully qualified name of the polling policy for the interface.
  2. If you want to use a consistent hash policy, you just need to configure the implementationRouteHandleFully qualified name of the consistent hash algorithm of the interface.
  3. Of course, there are multiple implementations of consistent hash, so once you have configured consistent hash, you need to add another configuration to decide to use itSortArrayMapConsistentHashorTreeMapConsistentHashOr other customized solutions.
  4. Again, you need to configure inheritanceAbstractConsistentHashFully qualified name of the.

No matter how the strategy changes here, it remains the same in use.

Just inject RouteHandle and call its routeServer method.

@Autowired
private RouteHandle routeHandle ;
String server = routeHandle.routeServer(serverCache.getAll(),String.valueOf(loginReqVO.getUserId()));

Copy the code

Since injection is used, this policy switch is actually done when the RouteHandle bean is created.

It is also relatively simple, requiring reading the previous configuration file to dynamically generate the concrete implementation class, mainly using reflection.

In this way, it is more flexible. For example, if you want to create a random routing policy, the same routine is used. All you need to do is change the configuration.

Interested friends can also submit PR to add more routing policies.

conclusion

Hope to see friends here can understand this algorithm, and some design patterns in the actual use can also be helpful.

I believe in the golden three silver four interview process or can let the interviewer in front of a bright, after all, according to my this period of time to see the interview process heard this noun in a few 😂 (may also be and candidates are in 1~3 years of this level).

All the above source code:

Github.com/crossoverJi…

Please do not hesitate to forward this article if it has helped you.