Hi, I'm Why.

Today I'll walk you through the time wheel. It's actually a very practical thing.

It shows up in all kinds of frameworks and occasionally in interviews. It looks a little hard to understand at first, but once you know the principle, it turns out to be quite approachable.

When most people talk about time wheels, they start with Netty.

I’m going to start with Dubbo, because it was in Dubbo that I first encountered the time wheel, and it blew me away.

Besides, Dubbo's time wheel is lifted from Netty's source code; the two are basically identical.

Dubbo uses the time wheel in several places, such as sending heartbeat packets, detecting request-call timeouts, and implementing cluster fault-tolerance policies.

I’ll start with this class in Dubbo:

org.apache.dubbo.rpc.cluster.support.FailbackClusterInvoker

Failback, one of the cluster fault-tolerant policies:

It doesn't matter if you don't know Dubbo; you only need to know how Failback is described on the official website:

What I want to highlight is the word “timed retransmission”.

Without looking at the source code, what comes to mind when you think about timed retransmissions?

Do you have a scheduled task in mind?

So how do we implement a scheduled task?

ScheduledExecutorService and Timer are the two JDK classes most people think of.

Let's set Timer aside: its performance isn't great and it's no longer recommended.

ScheduledExecutorService has three main types of methods:

A few words about scheduleAtFixedRate and scheduleWithFixedDelay.

scheduleAtFixedRate: each execution is scheduled at a fixed interval measured from the start of the previous task.

scheduleWithFixedDelay: each execution starts a fixed delay after the previous one finishes.

The former is anchored to the start time of the previous task, the latter to its end time.

scheduleAtFixedRate schedules tasks at a fixed interval; with scheduleWithFixedDelay the interval between starts is not fixed, because it depends on how long each execution takes.

So if we use ScheduledExecutorService to implement timed retransmission, scheduleWithFixedDelay is the better fit: the next retry only starts after the previous one has finished.

In other words, the whole retry process is serialized.
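Here is a minimal, runnable sketch of the difference (the class name and the 3-second / 2-second values are made up purely for illustration):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduleDemo {
    public static void main(String[] args) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);

        // Next run is scheduled 3s after the START of the previous run.
        // If a run takes longer than 3s, the next one fires right after it ends.
        pool.scheduleAtFixedRate(() -> log("fixedRate"), 0, 3, TimeUnit.SECONDS);

        // Next run starts 3s after the previous run FINISHES, so with a 2s task
        // you see roughly a 5s gap between starts - the runs are serialized.
        pool.scheduleWithFixedDelay(() -> {
            log("fixedDelay");
            sleep(2000);
        }, 0, 3, TimeUnit.SECONDS);

        // The demo keeps printing until you kill the process.
    }

    private static void log(String tag) {
        System.out.println(tag + " @ " + System.currentTimeMillis() / 1000);
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```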

So how does Dubbo implement this timed-retry requirement?

Let's read the source code and find out; there are no secrets under the source code.

Get ready to start.

Digging into the source

Some readers may be getting anxious here: weren't we going to talk about the time wheel? Why are we diving into source code?

Don't worry, I'm getting there step by step.

I'll walk you through Dubbo's source code first so you see what the problem is, and then we'll talk about the solution.

Besides, it wouldn't stick if I just threw the solution in your face.

I prefer a softer approach to teaching.

Well, first look at the source code below.

It doesn't matter if you can't follow every line; just focus on the logic in the catch block.

I've matched the code against the description on the official website.

When the call fails, addFailed is invoked.

What did addFailed do?

This is where the "timed retransmission" happens:

org.apache.dubbo.rpc.cluster.support.FailbackClusterInvoker#addFailed

This method answers the question we raised earlier: how does Dubbo's cluster fault tolerance implement timed retries?

First, a ScheduledExecutorService (scheduleWithFixedDelay) is used.

More specifically, when the cluster fault-tolerance policy is Failback, a retry is initiated every RETRY_FAILED_PERIOD seconds after a request fails, until a retry succeeds.

What is RETRY_FAILED_PERIOD?

Look at line 52. It’s 5 seconds.

Also, at the place labeled ③ in the addFailed method above, something is put into failed.

What is failed?

If you look at line 61, it’s a ConcurrentHashMap.

At the place labeled ③, the key put into failed is the request to be retried, and the value is the invoker that will process it.

When is the failed map used?

See the retryFailed method, labeled ②:

This method iterates over the failed map, takes every entry out, and invokes it again.

If the call succeeds, remove is called to drop the request from the map. If it fails, the exception is caught and logged, and the request waits to be retried next time.
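If you boil that down, the old approach amounts to something like the sketch below. It is not the Dubbo code itself, just the same idea with illustrative names (FailbackSketch, the String invocation id) so the moving parts are visible:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FailbackSketch {

    private static final long RETRY_FAILED_PERIOD = 5; // seconds, like Dubbo's constant

    // failed requests waiting to be retried
    private final Map<String, Runnable> failed = new ConcurrentHashMap<>();
    private final ScheduledExecutorService retryExecutor =
            Executors.newSingleThreadScheduledExecutor();

    public FailbackSketch() {
        // each retry round starts RETRY_FAILED_PERIOD seconds after the previous one finishes
        retryExecutor.scheduleWithFixedDelay(
                this::retryFailed, RETRY_FAILED_PERIOD, RETRY_FAILED_PERIOD, TimeUnit.SECONDS);
    }

    // called from the catch block when an invocation fails
    public void addFailed(String invocationId, Runnable retryAction) {
        failed.put(invocationId, retryAction);
    }

    // iterate over every failed invocation and try it again
    private void retryFailed() {
        for (Map.Entry<String, Runnable> entry : failed.entrySet()) {
            try {
                entry.getValue().run();
                failed.remove(entry.getKey());   // success: stop retrying this one
            } catch (Exception e) {
                // failure: log it, keep it in the map, try again next round
                System.err.println("retry failed, will try again: " + e);
            }
        }
    }
}
```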

Here we have unveiled the mystery of Dubbo’s FailbackClusterInvoker class.

Hidden under the veil are just a map and a ScheduledExecutorService.

It doesn't seem that difficult; it's a solution most of us could come up with.

So you slowly type a message on the screen:

But hold on tight, because there's a "but" coming. Here's the twist.

This approach actually has a problem. The most obvious one is the map: its size is unbounded, so in high-concurrency scenarios it can eventually exhaust memory.

Ok, so the question is, how do I keep from running out of memory?

It’s easy. First we can limit the size of the map, right?

For example, limit its capacity to 1000.

And what do we do when it's full?

We'd need an eviction strategy, say FIFO or LIFO.

And we can't retry forever either: once a request has been retried more than a certain number of times, it should be dropped.

I'm not making up this memory problem or its solution.

I have proof: you can see how FailbackClusterInvoker evolved from its commit history. The code in the earlier screenshots is the pre-optimization version, not the latest code:

That change is tied to an issue numbered 2425:

Github.com/apache/dubb…

The problem and solution described there are exactly what I laid out above.

Finally, the foreshadowing is complete, and the story of the time wheel begins in earnest.

Time wheel principle

Some of you are getting impatient again:

"Just show me the time wheel source code already."

Don't rush. If I dove straight into the source code, you'd only end up confused.

So I decided to draw some pictures first and show you how it works.

Once I've sketched out the basic shape of the time wheel and you understand how it works, the source-code analysis that follows will be much easier.

First of all, the basic structure of the time wheel is actually an array, such as the following array of length 8:

How does it become a wheel?

Join the two ends together:

If each element represents one second, then one lap around the array represents 8 seconds, like this:

Note that I said one lap is 8 seconds.

So 2 laps is 16 seconds, 3 laps is 24 seconds, and 100 laps is 800 seconds.

Does that make sense?

Let me give you another picture:

Even though the array is only 8 elements long, the laps can stack on top of one another, so it can represent much longer spans of time.

For example, the first three laps of the diagram above could be drawn like this:

I hope the picture is clear, but even if it isn't, the main thing to take away is that there's a concept of "laps".

Ok, so I’m going to beautify this array, visually turning it into a wheel as well.

Looks like a wheel, doesn't it?

"Wheel" is exactly what this array is called in the code, so we now have an array named wheel:

And then, I’m going to fill it in and it’s going to look something like this.

For the sake of illustration I only filled in the slots at indexes 0 and 3; the other slots work the same way:

Now, suppose I have a task that needs to run 800 seconds from now. Where should it go?

800 mod 8 = 0, so it hangs off slot 0:

What about another task that needs to be executed in 400 seconds?

Same calculation: 400 mod 8 is also 0, so it is appended to the list on slot 0 as well:

Don't assume the lap counts in a slot's linked list have to be in descending order; they don't.

Ok, now another task to execute in 403 seconds. Where should I hang it?

403 mod 8 = 3, so it hangs off slot 3:
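If you want to check the arithmetic yourself, here is a tiny snippet using the wheel size of 8 from the drawings:

```java
public class SlotAndLaps {
    public static void main(String[] args) {
        int wheelSize = 8;   // one slot per second, 8 seconds per lap
        for (int delaySeconds : new int[]{800, 400, 403}) {
            System.out.printf("delay=%ds -> slot %d, after %d full laps%n",
                    delaySeconds, delaySeconds % wheelSize, delaySeconds / wheelSize);
        }
        // delay=800s -> slot 0, after 100 full laps
        // delay=400s -> slot 0, after 50 full laps
        // delay=403s -> slot 3, after 50 full laps
    }
}
```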

Why am I bothering to show you how to calculate which slot a task hangs on?

Because I need to introduce one more thing: the queue of tasks waiting to be assigned.

When I drew the 800-second, 400-second, and 403-second tasks above, I left out another step.

It should look like this:

Tasks are not attached to the time wheel the moment they are submitted; they are first placed in a pending queue and only attached to the wheel at a specific point in time.

When exactly?

I'll show you when we get to the source code.

Besides the pending queue, there is also a queue for cancelled tasks.

Because tasks put into the time wheel can be cancelled.

For example, Dubbo also uses the time wheel to check whether a call has timed out.

Suppose an invocation has a 5-second timeout; after 5 seconds, a task should fire and throw a timeout exception.

But if the response comes back within 2 seconds, well before the timeout, that task needs to be cancelled.

Here's the corresponding source code. Don't worry about understanding it, just glance at it; I only want to prove I'm not making this up:

org.apache.dubbo.remoting.exchange.support.DefaultFuture#received
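In sketch form, the pattern DefaultFuture implements looks roughly like the code below. This is not the DefaultFuture source; the constructor arguments and thread name are illustrative, and only the schedule-then-cancel shape is the point:

```java
import java.util.concurrent.TimeUnit;
import org.apache.dubbo.common.timer.HashedWheelTimer;
import org.apache.dubbo.common.timer.Timeout;
import org.apache.dubbo.common.utils.NamedThreadFactory;

public class TimeoutCancelSketch {
    public static void main(String[] args) throws InterruptedException {
        HashedWheelTimer timer = new HashedWheelTimer(
                new NamedThreadFactory("timeout-timer", true), 30, TimeUnit.MILLISECONDS);

        // The call has a 5s timeout: if this task ever runs, the request is considered dead.
        Timeout timeoutCheck = timer.newTimeout(
                t -> System.out.println("no response within 5s, mark the future as timed out"),
                5, TimeUnit.SECONDS);

        // ... pretend the response comes back after 2 seconds ...
        Thread.sleep(2000);
        timeoutCheck.cancel();   // goes onto the cancelled queue; the worker will drop it
        System.out.println("response received, timeout task cancelled");
    }
}
```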

That's the schematic. Now I need one more diagram.

It maps the parts of the figure above to the field names used in the source code.

With the main objects mapped, reading the source code later won't be nearly as painful:

The correspondence looks like this:

Notice that the "scope of the worker's work" in the upper-left corner wraps the whole time wheel. When you read the source code later, you'll find there are no thread-safety issues in the core logic, because the worker, a single thread, does all of that work itself.

One last thing: in the FailbackClusterInvoker scenario above, what happens if the time wheel triggers the retry task and the retry fails again?

If you look at the source code, there's a method called rePut that handles exactly that:

org.apache.dubbo.rpc.cluster.support.FailbackClusterInvoker.RetryTimerTask#run

The idea is that if the retry fails and the specified retry limit hasn't been exceeded, the task is put back onto the time wheel.
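Here is that shape as a sketch. It is not the actual RetryTimerTask, just the same "fail, count, rePut" idea with illustrative names:

```java
import java.util.concurrent.TimeUnit;
import org.apache.dubbo.common.timer.Timeout;
import org.apache.dubbo.common.timer.Timer;
import org.apache.dubbo.common.timer.TimerTask;

// On failure the task throws itself back onto the wheel until the retry budget runs out.
class RetrySketchTask implements TimerTask {

    private final Runnable call;       // the invocation to retry
    private final int retries;         // maximum number of retries
    private final long tickSeconds;    // delay between retries
    private int retryTimes = 0;

    RetrySketchTask(Runnable call, int retries, long tickSeconds) {
        this.call = call;
        this.retries = retries;
        this.tickSeconds = tickSeconds;
    }

    @Override
    public void run(Timeout timeout) {
        try {
            call.run();                          // retry the invocation
        } catch (Exception e) {
            if (++retryTimes >= retries) {
                System.err.println("giving up after " + retryTimes + " retries");
            } else {
                rePut(timeout);                  // schedule the next attempt
            }
        }
    }

    private void rePut(Timeout timeout) {
        if (timeout == null || timeout.isCancelled()) {
            return;
        }
        Timer timer = timeout.timer();           // the wheel this task came from
        timer.newTimeout(this, tickSeconds, TimeUnit.SECONDS);
    }
}
```

You submit it once with timer.newTimeout(new RetrySketchTask(...), 5, TimeUnit.SECONDS), and it keeps rescheduling itself on failure.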

Wait, if I know the current retry count, what else can I do with it?

For example, if you've ever integrated WeChat Pay, its callback notification follows this retry schedule:

Since I know the current retry count, on the fifth retry I can set the delay to 10 minutes and throw the task back into the time wheel.

The time wheel can handle that kind of requirement.

Of course, an MQ delay queue would also work, but that's beyond the scope of this article.

There is one catch with using the time wheel for this, though: the tasks live in memory, so if the service goes down they are lost. That's worth keeping in mind.

Beyond FailbackClusterInvoker, I actually think the time wheel is an even better fit for heartbeats.

And sure enough, Dubbo's heartbeat is built on the time wheel.

org.apache.dubbo.remoting.exchange.support.header.HeartbeatTimerTask#doTask

As you can see in the figure above, doTask sends a heartbeat packet and then calls reput, which puts the heartbeat-sending task back onto the time wheel.

Okay, enough about application scenarios.

Next up is the source-code analysis. Keep up with the pace and don't panic; anyone can follow it.

Let's dive in!

Time wheel source code

With the principle in place, we can now look at the source code.

One heads-up first: to make the screenshots easier to read, I've rearranged some of the source code in them, so they may look slightly different from what you see in the actual source.

Let's go back to how the time wheel is used in Dubbo's FailbackClusterInvoker class.

First, the failTimer object is initialized with the familiar double-checked locking pattern:

The failTimer created here is a HashedWheelTimer, and the key logic is in the constructor being called.

So let's start with the constructor and take it apart.

First, the parameters:

  • threadFactory: a thread factory that lets you set the worker thread's name and whether it is a daemon thread.
  • tickDuration: the interval between two ticks.
  • unit: the time unit of tickDuration.
  • ticksPerWheel: the number of slots (ticks) in the wheel.
  • maxPendingTimeouts: the maximum number of pending tasks the timer will accept.

So the Dubbo time wheel above means:

Create a daemon thread named failback-cluster-timer that ticks once per second; the wheel has 32 slots; and the maximum number of pending tasks is failbackTasks, which is configurable and defaults to 100.
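Written out as code, that construction amounts to roughly the following. Treat it as a sketch assembled from the parameters above rather than a verbatim copy of the Dubbo source (NamedThreadFactory is the helper Dubbo commonly uses for named daemon threads):

```java
import java.util.concurrent.TimeUnit;
import org.apache.dubbo.common.timer.HashedWheelTimer;
import org.apache.dubbo.common.utils.NamedThreadFactory;

public class FailTimerSketch {
    public static void main(String[] args) {
        int failbackTasks = 100;   // configurable; 100 is the default mentioned above
        HashedWheelTimer failTimer = new HashedWheelTimer(
                new NamedThreadFactory("failback-cluster-timer", true), // daemon worker thread
                1, TimeUnit.SECONDS,   // tickDuration: one tick per second
                32,                    // ticksPerWheel: 32 slots
                failbackTasks);        // maxPendingTimeouts
        System.out.println("timer created: " + failTimer);
    }
}
```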

Many other call sites, such as Dubbo's check for call timeouts, don't pass maxPendingTimeouts at all:

org.apache.dubbo.remoting.exchange.support.DefaultFuture#TIME_OUT_TIMER

It doesn't even pass ticksPerWheel.

Both parameters have defaults: ticksPerWheel defaults to 512, and maxPendingTimeouts defaults to -1, which means there is no limit on the number of pending tasks:

Now let's look at the constructor as a whole. I've commented on what each line does:

There are a few points that I’ll single out for you.

Take the createWheel method: if it looks familiar, that's because its core is the same capacity-normalization code you find in HashMap.

As I noted in the source comments, the size of the wheel's array must be a power of two.

Why? You ask me why?

Don't ask. Fine, if you insist: it's so the code can do bit operations later. A slick trick, fast, and it looks cool.

I don't think I need to explain the snippet below; if it's unfamiliar, go back and review HashMap:

mask = wheel.length - 1

We already know wheel.length is a power of two.

So if a timed task has a delay of x, which slot should it land in?

We take x modulo the length: x % wheel.length.

But the modulo operation isn't particularly cheap.

So how do we make it faster?

That's where wheel.length - 1 comes in.

wheel.length is a power of two, so wheel.length - 1 has all of its low bits set to 1.

Which means x % wheel.length == x & (wheel.length - 1).

And in the source, mask = wheel.length - 1.
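A quick demonstration of both halves of the trick, the power-of-two rounding and the & standing in for %:

```java
public class MaskDemo {
    // Round the requested wheel size up to the next power of two (the same idea as
    // HashMap's capacity normalization), so that "& (length - 1)" can replace "%".
    static int normalize(int ticksPerWheel) {
        int n = 1;
        while (n < ticksPerWheel) {
            n <<= 1;
        }
        return n;
    }

    public static void main(String[] args) {
        int length = normalize(100);   // -> 128
        int mask = length - 1;         // -> 127: all the low bits are 1
        long x = 403;
        System.out.println(x % length);   // 19
        System.out.println(x & mask);     // 19 as well, but with a single bitwise AND
    }
}
```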

So where does mask come into play?

One such place is in the run method of the Worker class:

org.apache.dubbo.common.timer.HashedWheelTimer.Worker

The idx calculated here is the index of the slot to be processed on the current tick.

For now I'm only pointing out that mask does take part in an & operation; it's fine if you can't read this part of the code yet, because we haven't got there.

So don't panic if you're not fully following; let's move on.

We have a time wheel now, so how do we use it?

Call its newTimeout method:

This method takes three inputs:

The meaning is clear: the given task starts to fire after the given time (delay, unit).

Next, read the newTimeout method:

The key code is the start method, and I’ll show you what it does:

It has two parts.

The first part maintains and checks the current state of the HashedWheelTimer. From the source code we know the state has three values:

  • 0: init
  • 1: started
  • 2: shut down

If the state is still init, it is CASed to started and workerThread.start() is called to launch the worker thread.
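Before we unpack the confusing half, here is the whole start() handshake as a small runnable sketch. The names mirror the description above and below (workerState, startTimeInitialized, startTime), but this is a stand-alone toy, not Dubbo's class:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StartSketch {

    private static final int WORKER_STATE_INIT = 0;
    private static final int WORKER_STATE_STARTED = 1;

    private final AtomicInteger workerState = new AtomicInteger(WORKER_STATE_INIT);
    private final CountDownLatch startTimeInitialized = new CountDownLatch(1);
    private volatile long startTime;

    private final Thread workerThread = new Thread(() -> {
        // Worker#run: record when the wheel actually started ticking...
        startTime = System.nanoTime();   // (the real code also guards against a 0 return value)
        // ...and wake up whoever is blocked in start()
        startTimeInitialized.countDown();
        // (the real worker then enters its do-while tick loop)
    });

    public void start() {
        // CAS the state from init to started; only the first caller launches the worker
        if (workerState.compareAndSet(WORKER_STATE_INIT, WORKER_STATE_STARTED)) {
            workerThread.start();
        }
        // newTimeout() needs startTime to compute deadlines, so wait until it is published
        while (startTime == 0) {
            try {
                startTimeInitialized.await();
            } catch (InterruptedException ignore) {
                // keep waiting; it will be set very soon
            }
        }
    }

    public static void main(String[] args) {
        StartSketch s = new StartSketch();
        s.start();
        System.out.println("startTime published: " + s.startTime);
    }
}
```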

The next part is a little bit more confusing.

If startTime equals 0, meaning it hasn't been initialized yet, the code calls await on a CountDownLatch and waits.

And it's the calling (main) thread that awaits, waiting for startTime to be initialized. What's the logic there?

First, we need to find out where startTime is initialized.

It's in the Worker's run method, which is kicked off by the workerThread.start() call above:

org.apache.dubbo.common.timer.HashedWheelTimer.Worker

Notice that after initializing startTime, the code also checks whether it equals 0. In other words, System.nanoTime() can in principle return 0, a small detail that's interesting to dig into, but I won't expand on it here.

As soon as startTime is initialized, startTimeInitialized.countDown() is called.

Doesn't that echo the await we saw back there?

The main thread can then continue almost immediately.

So the question is: what's the point of making the main thread wait until startTime has been initialized?

There is a point, of course. Go back to the newTimeout method and keep reading:

Let's analyze the expression above.

First, System.nanoTime() is the current time at the moment this line of code executes.

Since delay is a fixed value, unit.toNanos(delay) is fixed as well.

So System.nanoTime() + unit.toNanos(delay) is the nanosecond timestamp at which the task should fire.

Let me give you an example.

Suppose System.nanoTime() = 1000 and unit.toNanos(delay) = 100.

Then the task should fire at 1000 + 100 = 1100.

Still with me?

So why subtract startTime?

startTime was just initialized from System.nanoTime(), and it's a fixed value.

So isn't System.nanoTime() - startTime pretty much zero?

Then what does System.nanoTime() + unit.toNanos(delay) - startTime actually mean?

Yes, that was one of my own questions when I read the source.

Then I realized that System.nanoTime() is the only variable in the whole expression.

System.nanoTime() - startTime is indeed close to zero the first time, but when a second task comes in later and its deadline is calculated, System.nanoTime() is already much larger than the fixed startTime.

So a task's execution time is the current time, plus the specified delay, minus the worker thread's start time; in other words, deadlines are expressed relative to startTime. And so on for every task after that.

At this point the main thread's part of the time-wheel logic is done.

What should I analyze next?

It must be time for the worker thread of the time wheel to take over.

The worker thread logic is all in the run method.

The core logic is in a do-while:

The loop exits when the time wheel is no longer in the started state.

In other words, the worker thread keeps running as long as nobody invokes the time wheel's stop logic.
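To keep a map in your head as we go, here is the shape of that do-while, assembled from the exact lines we are about to walk through. It is a structural sketch, not the verbatim Dubbo source, and the condition at the bottom stands in for the real started-state check:

```java
do {
    // 1. sleep until the next tick boundary
    final long deadline = waitForNextTick();
    if (deadline > 0) {
        // 2. which slot is due on this tick?
        int idx = (int) (tick & mask);
        // 3. drop tasks whose cancel() was called since the last tick
        processCancelledTasks();
        // 4. move newly submitted tasks from the pending queue onto the wheel
        HashedWheelBucket bucket = wheel[idx];
        transferTimeoutsToBuckets();
        // 5. fire every task in this slot whose laps are used up
        bucket.expireTimeouts(deadline);
        // 6. on to the next tick
        tick++;
    }
} while (started);   // placeholder for "the timer is still in the STARTED state"
```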

Now let's go through the loop line by line; this is the core logic of the time wheel.

final long deadline = waitForNextTick();

The method name alone tells you what it does.

It waits inside until the next tick.

So the first line of the method works out the nanosecond timestamp of the next tick.

Inside the for loop, the only part that's immediately obvious is the one marked ③, which just puts the current thread to sleep for a computed amount of time.

So everything before it is there to figure out how long that sleep should be.

How do we do that?

The first half I can still make sense of:

deadline - currentTime is how long we have until the next tick.

The rest had me scratching my head at first.

The 1,000,000 is easy: we're working in nanoseconds, and 1,000,000 ns is one millisecond.

But what on earth is the 999999 for?

It turns out the 999999 just rounds the sleep time up to the next whole millisecond.

For example, if deadline - currentTime is 1000123 nanoseconds, then 1000123 / 1000000 = 1 ms.

But (1000123 + 999999) / 1000000 = 2 ms.

In other words, the sleep at ③ is rounded up, sleeping as much as 1 ms extra.

Why is that?

I didn't know at first, so I parked the question and kept going; it's not a blocker.

Then comes the part marked ②, which looks like special handling for the Windows operating system: sleepTimeMs is rounded down to a multiple of 10.

Why?

Here I have to call Dubbo out a little: it took Netty's implementation but dropped the key comment, which isn't great.

The Netty source code for this spot looks like this:

It clearly points to where the trick comes from:

Github.com/netty/netty…

Follow that trail all the way down and you end up at a page like this:

www.javamex.com/tutorials/t…

An unexpected bonus.

When a thread calls Thread.sleep, the JVM makes a special call that sets the interrupt period to 1 ms.

Thread.sleep relies on the operating system's timer interrupt: on each interrupt the OS checks whether any thread needs to be woken up and given CPU time. I think that explains the extra millisecond of sleep.

The question we parked a moment ago gets answered this quickly. Satisfying.

The second underlined line says that on Windows the interrupt period may be 10 ms or 15 ms, depending on the hardware.

That's why, on Windows, the sleep time needs to be a multiple of 10.

There's a piece of trivia for you.

With that sorted out, the waitForNextTick method is in place: all it does is wait, one tick length at a time.
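If you want to see the sleep arithmetic run, here is a stand-alone sketch with the surrounding fields (tickDuration, startTime, the Windows check) replaced by local stand-ins:

```java
import java.util.Locale;

public class WaitForNextTickSketch {

    public static void main(String[] args) throws InterruptedException {
        long tickDuration = 100_000_000L;      // 100 ms per tick, in nanoseconds
        long startTime = System.nanoTime();    // when the "worker" started

        for (long tick = 0; tick < 5; tick++) {
            // nanosecond offset (relative to startTime) of the NEXT tick boundary
            long deadline = tickDuration * (tick + 1);

            for (;;) {
                long currentTime = System.nanoTime() - startTime;
                // the "+ 999999" rounds the division up, i.e. never sleep too little
                long sleepTimeMs = (deadline - currentTime + 999_999) / 1_000_000;

                if (sleepTimeMs <= 0) {
                    break;                      // the tick boundary has been reached
                }
                if (isWindows()) {
                    sleepTimeMs = sleepTimeMs / 10 * 10;   // align to the ~10 ms Windows timer
                    if (sleepTimeMs == 0) {
                        sleepTimeMs = 1;        // avoid a zero-length busy sleep
                    }
                }
                Thread.sleep(sleepTimeMs);
            }
            System.out.println("tick " + tick + " fired");
        }
    }

    private static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.US).contains("win");
    }
}
```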

And once the tick arrives?

Then we hit this line of code: int idx = (int) (tick & mask);

We analyzed this earlier: it computes the slot index for the current tick with a bitwise AND. Slick, fast, looks cool; nothing more to add.

The code then moves on to processCancelledTasks().

This processes the queue of cancelled tasks:

The logic is straightforward: clear the cancelledTimeouts queue.

So that's the removal side, the cleanup.

Then where do tasks get added to that queue?

It is in the following method:

org.apache.dubbo.common.timer.HashedWheelTimer.HashedWheelTimeout#cancel

When HashedWheelTimeout's cancel method is called, the task is cancelled.

This method already appeared in the earlier drawing, and its logic is clear, so I won't go over it again.

But notice where I underlined it: MpscLinkedQueue.

What is this?

It's a very slick lock-free queue.

But the cancelledTimeouts queue in Dubbo isn't actually an MpscLinkedQueue.

What’s going on?

Because the comment was copied over from Netty, and it is Netty that uses MpscLinkedQueue.

Let me show you the difference between Netty and Dubbo here:

So the comment here is misleading. If you have time, you could send Dubbo a PR to fix it.

One more little detail.

Okay, scrolling down, the next line is HashedWheelBucket bucket = wheel[idx].

Nothing much to say at a glance.

It fetches the bucket at the computed index from the wheel.

The line to focus on is the one below it: transferTimeoutsToBuckets().

I’ll add a comment on each line:

The core of this method is dispatching the tasks waiting in the pending queue onto their buckets.

That answers the question I left open while drawing the diagrams: when do tasks move from the pending queue onto the time wheel?

Right here.
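Here is a runnable, boiled-down version of that dispatch step. Task stands in for HashedWheelTimeout, the 800/400/403 numbers are the ones from the drawings, and unlike the real method there is no per-tick cap on how many tasks get drained:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class TransferSketch {

    static class Task {
        final long deadlineTick;    // absolute tick at which the task should fire
        long remainingRounds;       // full laps still to wait once it is on the wheel
        Task(long deadlineTick) { this.deadlineTick = deadlineTick; }
    }

    public static void main(String[] args) {
        int wheelSize = 8;                       // must be a power of two
        int mask = wheelSize - 1;
        long currentTick = 0;                    // the worker's tick counter

        List<List<Task>> wheel = new ArrayList<>();
        for (int i = 0; i < wheelSize; i++) wheel.add(new ArrayList<>());

        Queue<Task> pending = new ArrayDeque<>();
        pending.add(new Task(800));              // fire at tick 800
        pending.add(new Task(400));
        pending.add(new Task(403));

        Task t;
        while ((t = pending.poll()) != null) {
            long calculated = t.deadlineTick;
            t.remainingRounds = (calculated - currentTick) / wheelSize;
            long ticks = Math.max(calculated, currentTick);   // never schedule into the past
            int stopIndex = (int) (ticks & mask);             // slot on the wheel
            wheel.get(stopIndex).add(t);                      // hang it on that slot's list
            System.out.printf("deadline tick %d -> slot %d, rounds %d%n",
                    calculated, stopIndex, t.remainingRounds);
        }
    }
}
```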

Next, let's look at bucket.expireTimeouts(deadline).

This method is called on a bucket, which means it's about to process the tasks in that bucket's list:
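And here is a boiled-down, runnable sketch of what the bucket does with its list on each tick (the real version also skips cancelled tasks and double-checks each deadline before firing):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ExpireSketch {

    static class Task {
        long remainingRounds;     // laps left before this task is due
        final Runnable job;
        Task(long remainingRounds, Runnable job) {
            this.remainingRounds = remainingRounds;
            this.job = job;
        }
    }

    static void expireTimeouts(List<Task> bucket) {
        Iterator<Task> it = bucket.iterator();
        while (it.hasNext()) {
            Task task = it.next();
            if (task.remainingRounds <= 0) {
                it.remove();              // its time is up on this lap
                task.job.run();           // fire the task
            } else {
                task.remainingRounds--;   // not this lap; one lap closer now
            }
        }
    }

    public static void main(String[] args) {
        List<Task> bucket = new ArrayList<>();
        bucket.add(new Task(0, () -> System.out.println("fires now")));
        bucket.add(new Task(2, () -> System.out.println("fires two laps later")));
        expireTimeouts(bucket);   // prints "fires now"; the other task now has 1 lap left
    }
}
```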

Finally, there's one more line of code: tick++.

It marks the current tick as processed and moves the wheel on to the next one.

That's the critical code covered.

If anything is unclear, read it again; better yet, follow along in the source code yourself and it will click quickly.

Next time an interviewer asks about the time wheel, you'll be able to go a round with them.

Why only a round?

Because after you've answered the time-wheel question, the interviewer will typically follow up with:

"Nice, well answered. Now, can you tell me about hierarchical time wheels?"

And you’re like, what, what the hell is a hierarchical time wheel?

Yeah, that one's on me; I haven't written it up yet. Next time, next time.

But I can point you in a direction: go look at Kafka's take on the time wheel. You'll want to applaud when you see it.

Several related issues

Finally, there's a discussion about the Dubbo time wheel in the project's issues:

Github.com/apache/dubb…

If you are interested, you can go and have a look.

It raises an interesting question:

Netty made heavy use of HashedWheelTimer in 3.x, but in 4.1 you can see that Netty keeps HashedWheelTimer yet no longer uses it in its own source, choosing ScheduledThreadPoolExecutor instead. I don't know the reasoning.

This question was answered by the Netty maintainer himself:

Github.com/netty/netty…

What he meant was: there's nothing wrong with the time wheel; we stopped using it because we wanted scheduled tasks to run on the same thread as the channel's EventLoop.

Over in Netty, someone noticed the time wheel was no longer being used and tried to remove it:

I figure it's a utility class; just keep it, it's bound to come in handy.

In addition, the previous issue also mentioned another problem:

Github.com/apache/dubb…

That one was also optimized after Dubbo introduced the time wheel.

Take a look: the top is the optimized version, the bottom is the old way of writing it:

The old way starts a background thread and loops over the collection again and again:

That approach gets the job done too, but next to the time-wheel version it looks clumsy.

Slick trick, fast, looks cool.

One last word

Okay, if you've read this far: a share, a look, or a like, any of them works for me, whichever you feel like. Writing is tiring and needs a bit of positive feedback.

Here’s one for readers:

This article has also been posted on my personal blog; feel free to stop by.

www.whywhy.vip/