Start with a refactoring
It all started with a refactoring optimization.
Recently I was refactoring a routing function. Given the complexity of the routing and the changing requirements, I wanted to rebuild it with the chain of responsibility pattern, and around the same time I happened to see a relevant implementation in the Sentinel-Go source code.
The biggest benefit of the chain of responsibility pattern is that routing capabilities can be flexibly plugged in and removed for each request.
That implementation news up the entire responsibility chain for every incoming request, so you would expect objects to be created and destroyed very frequently.
Object pooling is generally discouraged in Java: unless an object is genuinely expensive to create, like a connection object, the lock contention between threads costs more than simply allocating memory.
Go, however, ships sync.Pool, and thanks to the cooperative scheduling model (GMP) with its per-P local caches, the pool largely sidesteps lock contention.
So on paper, Go's object pool looks great. The exact mechanics aren't the focus of this article, and can't be explained in a sentence or two anyway; I may write a dedicated post about it some day.
But theory is just theory. As the saying goes, whether it's a mule or a horse, you have to walk it out to see.
Benchmark timeout!
A benchmark is definitely the best way to check this, so I wrote two cases comparing newing objects directly against reusing them via sync.Pool.
```go
func BenchmarkPooledObject(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			object := pool.Get().(*MyObject)
			Consume(object)
			// Reset it and put it back into the object pool
			object.Reset()
			pool.Put(object)
		}
	})
}

func BenchmarkNewObject(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			object := &MyObject{
				Name: "hello",
				Age:  2,
			}
			Consume(object)
		}
	})
}
```
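For completeness, the benchmarks above assume supporting definitions along these lines. The names MyObject, Consume, and pool match the benchmarks, but the exact fields and the Consume body are my own minimal sketch:

```go
package main

import "sync"

// MyObject stands in for a link in the responsibility chain.
type MyObject struct {
	Name string
	Age  int
}

// Reset clears the object before it goes back to the pool,
// so the next Get never observes stale state.
func (o *MyObject) Reset() {
	o.Name = ""
	o.Age = 0
}

// pool hands out *MyObject values, creating a fresh one when empty.
var pool = sync.Pool{
	New: func() interface{} { return &MyObject{} },
}

// Consume stands in for whatever work the request pipeline does.
func Consume(o *MyObject) {
	o.Name = "hello"
	o.Age = 2
}

func main() {
	o := pool.Get().(*MyObject)
	Consume(o)
	o.Reset()
	pool.Put(o)
}
```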
I ran them with these test parameters:

```shell
go test -bench=. -cpu=4 -count=2 -benchtime=10s
```
It turned out that newing the object directly was still faster:

```
BenchmarkPooledObject-4    1000000000    6.25 ns/op
BenchmarkNewObject-4       1000000000    0.374 ns/op
```
So I thought, is there something wrong with the way I tested it?
Pooling reduces the cost of object creation and destruction, in large part by reducing the number of GC runs.
So I looked up when Go would trigger GC and got the following answer:
- Actively: calling runtime.GC() triggers a collection manually
- Passively, in two cases:
  - if no GC has run for more than 2 minutes, one is forced
  - when the heap grows by a certain percentage, e.g. a heap initialized at 4MB triggers a GC once it grows 25%, to 5MB
Clearly the manual trigger wasn't what I wanted, and the growth-ratio trigger is hard to pin down, so the only lever left was the forced GC every 2 minutes. I therefore lengthened the benchmark, changing to -benchtime=150s.
After kicking it off I made a cup of tea, went to the bathroom… and after a long wait the run finally ended, with this result:
```
*** Test killed with quit: ran too long (11m0s).
```
The run failed, after lasting 11 minutes.
Go unit tests and benchmarks have a timeout, 10 minutes by default, which can be changed with -timeout.
But that's not the point. The point is: why did I set 150s and it ran for 11 minutes?
There are no secrets under the source code
My gut told me something was off: either I was wrong or Go was. Fortunately Go is open source, and there are no secrets under the source code.
After some debugging and code reading, this function was the first thing I found:
```go
func (b *B) runN(n int) {
	benchmarkLock.Lock()
	defer benchmarkLock.Unlock()
	defer b.runCleanup(normalPanic)
	// See here: the framework GCs for us
	runtime.GC()
	b.raceErrors = -race.Errors()
	b.N = n
	b.parallelism = 1
	// Reset the timer
	b.ResetTimer()
	// Start the timer
	b.StartTimer()
	// Execute the benchmark function we defined
	b.benchFunc(b)
	// Stop the timer
	b.StopTimer()
	b.previousN = n
	b.previousDuration = b.duration
	b.raceErrors += race.Errors()
	if b.raceErrors > 0 {
		b.Errorf("race detected during execution of benchmark")
	}
}
```
This code runs the benchmark function we defined; n ends up as the N field of the *testing.B passed into our function. The timing is also reasonable: only the execution of our function is counted, i.e. -benchtime limits the time spent inside the function, not the time spent in the Benchmark framework.
Even more reasonably, the framework triggers a GC before running the function, so only the garbage our function generates during execution counts against our benchmark time.
But none of that explains our failed run.
For one thing, the total Benchmark time should be at least the time set by -benchtime.
Is that really so? Two experiments broke that rule:
```shell
go test -bench=. -cpu=4 -count=1 -benchtime=5s
BenchmarkPooledObject-4    793896368     7.65 ns/op
BenchmarkNewObject-4       1000000000    0.378 ns/op
PASS
ok      all-in-one/go-in-one/samples/object_pool    7.890s
```

```shell
go test -bench=. -cpu=4 -count=1 -benchtime=10s
BenchmarkPooledObject-4    1000000000    7.16 ns/op
BenchmarkNewObject-4       1000000000    0.376 ns/op
PASS
ok      all-in-one/go-in-one/samples/object_pool    8.508s
```
The second run was set to execute for 10 seconds, yet the whole test took only 8.508 seconds. Stranger still, the iteration column of both results reads exactly 1,000,000,000.
With that doubt in mind, I turned to Benchmark's core code:
```go
func (b *B) launch() {
	// ...
	// mark ①
	if b.benchTime.n > 0 {
		// We already ran a single iteration in run1.
		// If -benchtime=1x was requested, use that result.
		if b.benchTime.n > 1 {
			b.runN(b.benchTime.n)
		}
	} else {
		d := b.benchTime.d
		// mark ②
		for n := int64(1); !b.failed && b.duration < d && n < 1e9; {
			last := n
			goalns := d.Nanoseconds()
			prevIters := int64(b.N)
			prevns := b.duration.Nanoseconds()
			if prevns <= 0 {
				prevns = 1
			}
			// mark ③
			n = goalns * prevIters / prevns
			// Run more iterations than we think we'll need (1.2x).
			// mark ④
			n += n / 5
			// Don't grow too fast in case we had timing errors previously.
			// mark ⑤
			n = min(n, 100*last)
			// Be sure to run at least one more than last time.
			// mark ⑥
			n = max(n, last+1)
			// Don't run more than 1e9 times. (This also keeps n in int range on 32 bit platforms.)
			// mark ⑦
			n = min(n, 1e9)
			// mark ⑧
			b.runN(int(n))
		}
	}
	b.result = BenchmarkResult{b.N, b.duration, b.bytes, b.netAllocs, b.netBytes, b.extra}
}
```
The key points are numbered; here is what they mean:
Mark ①: a Go benchmark's stop condition can be set two ways, an iteration count or a time limit. I used a time limit; -benchtime=1000x would instead request exactly 1000 iterations.
Mark ②: when a time limit is set, this loop condition decides whether to keep going. Besides the duration check there is also n < 1e9, i.e. the iteration count is capped at 1e9 = 1,000,000,000. That resolves the confusion above: the run can finish sooner than the configured benchtime because Go caps the iterations at 1e9; benchtime is a target, not a guarantee.
Marks ③ to ⑧: how does Go pick an n that fills the benchtime? The answer is trial and error!
It first runs once (n = 1), then estimates the next n from that run: n = goalns * prevIters / prevns, where goalns is the target duration in nanoseconds, prevIters is the iteration count of the previous run, and prevns is the duration of the previous run in nanoseconds.
In other words, it derives the needed iteration count from the target total time and the measured per-iteration cost, roughly:
target iterations = target duration / (previous duration / previous iterations)
which simplifies to:
target iterations = target duration * previous iterations / previous duration, which is exactly the formula above.
After computing the target n, the source applies a few more adjustments:
- Mark ④: actually run about 1.2x the estimated count. If the estimate is slightly off and we undershoot the target time, that would be a little embarrassing, so run a bit longer.
- Mark ⑤: don't let n grow too fast either: growth is capped at 100x per round. When n grows that quickly the method under test must be very short and the timing error may be large, so grow gradually to measure the true level.
- Mark ⑥: n can't stand still; it must increase by at least 1 each round.
- Mark ⑦: n is capped at 1e9, which also keeps it within int range on 32-bit systems.
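Putting marks ③ to ⑦ together, the estimation step can be condensed into a standalone helper. This is a sketch: predictN is my own name for it, and it uses Go 1.21's built-in min and max where the testing package defines private helpers:

```go
package main

import "fmt"

// predictN mirrors the iteration-count estimation in testing.(*B).launch:
// scale the last run's count to the time goal, pad by 1.2x, grow at most
// 100x per round, always advance by at least 1, and cap at 1e9.
func predictN(goalns, prevIters, prevns, last int64) int64 {
	if prevns <= 0 {
		prevns = 1
	}
	n := goalns * prevIters / prevns // mark ③ (this product can overflow!)
	n += n / 5                       // mark ④: run ~1.2x the estimate
	n = min(n, 100*last)             // mark ⑤: grow at most 100x
	n = max(n, last+1)               // mark ⑥: always make progress
	n = min(n, 1e9)                  // mark ⑦: hard cap at 1e9
	return n
}

func main() {
	// Last round: 100 iterations took 1000ns. A 10s goal estimates ~1.2e9
	// iterations, but mark ⑤ clamps the next round to 100*100 = 10000.
	fmt.Println(predictN(10_000_000_000, 100, 1000, 100))
}
```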
We’ve got a rough idea of how Go Benchmark works, but we haven’t found the answer yet.
Then I did breakpoint debugging with Benchmark.
First, -benchtime=10s:
The trial values of n grew as 1, 100, 10000, 1000000, 100000000, 1000000000, and it stopped with n at 1,000,000,000.
This shows our method under test is so fast that the iteration count hit the 1e9 cap.
Then -benchtime=150s, which starts out normal:
n grows 1, 100, 10000, 1000000, 100000000, but the next step goes wrong:
n became negative! Clearly this is an overflow.
In n = goalns * prevIters / prevns, when the target duration (goalns) is large and the method under test is fast (prevns is small), the product goalns * prevIters overflows int64!
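A quick check with numbers close to my run confirms it. The prevns value here is an assumption, roughly 0.374 ns/op times 1e8 iterations:

```go
package main

import "fmt"

func main() {
	goalns := int64(150_000_000_000) // -benchtime=150s in nanoseconds
	prevIters := int64(100_000_000)  // the last trial ran 1e8 iterations
	prevns := int64(37_400_000)      // assumed: ~0.374 ns/op * 1e8 iterations

	// goalns * prevIters = 1.5e19, beyond int64's max of ~9.22e18,
	// so the product wraps around to a negative value.
	n := goalns * prevIters / prevns
	fmt.Println(n < 0) // the estimated n is negative
}
```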
What are the consequences of overflow?
Once n is negative, n = min(n, 100*last) leaves it negative, but n = max(n, last+1) rescues it to last+1. So n still grows, just painfully slowly, by 1 per round; the subsequent trial sequence is 100000001, 100000002, 100000003…
At that rate n can hardly reach the 1e9 cap, and the total runtime blows far past the configured time, so the test keeps running until it hits the timeout!
So is this a bug?
The author of this Benchmark logic added the 1e9 cap on the iteration count precisely with overflow in mind, but didn't account for n overflowing during the intermediate computation.
I think it’s a Bug, but I’m not entirely sure.
I couldn't find any related bug report online, so I filed an issue with a proposed fix to the Go project. Given Go's long and involved development process, the team had not yet said whether it is a bug or something else by the time this article was published.
If there is an official reply or any other development, I'll follow up.
Search for and follow the WeChat public account "bug catching master" for back-end technology sharing: architecture design, performance optimization, source-code reading, troubleshooting, and hands-on practice.