[toc]

Go's concurrency library, sync, contains an interesting type: Pool. sync.Pool implements clever functionality with very little code.

At first glance, the name Pool brings pooling to mind, and element pooling is a common tool for performance optimization (the three axes of performance optimization: concurrency, preprocessing, caching). For example, you can create a pool of 100 elements up front and then fetch elements directly from the pool, skipping allocation and initialization and greatly improving performance. Releasing an element is just dropping it back into the pool, without the overhead of actually freeing it.

But a closer look at the sync.Pool implementation turns out to be even more interesting than I expected. Beyond the familiar idea of pooling for performance, sync.Pool's most important job is reducing GC pressure. It is typically used in scenarios where object instances are expensive to create. Note that Pool is safe for concurrent use by goroutines.

Usage

Initialize the Pool instance: New

The first step is to create a Pool instance. The key is to configure the New function, which declares how a Pool element is created.

```go
bufferPool := &sync.Pool{
	New: func() interface{} {
		println("Create new instance")
		return struct{}{}
	},
}
```

Get an object: Get

```go
buffer := bufferPool.Get()
```

The Get method first tries to return an object that already exists in the Pool. If there is none, it takes the slow path: it creates one by calling the New function configured at initialization.

Put an object back: Put

```go
bufferPool.Put(buffer)
```

After the object has been used, call the Put method to put it back into the pool. Note that this call only returns the object to the pool; when the objects in the pool are actually released is invisible to the outside world.
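
To watch both paths at work, here is a small self-contained sketch of mine (not part of the original snippets): the New function prints a line, so you can see when Get has to create a new instance versus when it reuses one that was Put back.

```go
package main

import "sync"

func main() {
	pool := &sync.Pool{
		New: func() interface{} {
			println("Create new instance")
			return struct{}{}
		},
	}

	e := pool.Get() // pool is empty: slow path, New runs and prints
	pool.Put(e)     // drop the element back into the pool
	_ = pool.Get()  // usually the fast path: the cached element is reused, nothing prints
}
```

Run as written, this typically prints the line once. "Typically", because, as we will see, the runtime may clear the pool at any time.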

As you can see, the Pool user interface is just three entry points (the New field plus the Get and Put methods). It is very simple, and it is a universal pooling pattern that works for any object type.
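
Because Get returns interface{}, the same three entry points work for any type. A common idiom (my own sketch, not from the example above) is pooling *bytes.Buffer, taking care to Reset a recycled buffer before reuse:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

func main() {
	// Get returns interface{}, so a type assertion is needed
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a recycled buffer may still hold old data: always reset before use
	buf.WriteString("hello sync.Pool")
	fmt.Println(buf.String())
	bufPool.Put(buf) // hand the buffer back for the next user
}
```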

Thinking

Why use a Pool instead of instantiating objects directly at run time?

The essential reason: Go memory is freed automatically by the runtime's GC.

Here’s an example:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Count the number of instances actually created
var numCalcsCreated int32

// Create instance function
func createBuffer() interface{} {
	// Important: the addition must be atomic, otherwise there will be concurrency problems
	atomic.AddInt32(&numCalcsCreated, 1)
	buffer := make([]byte, 1024)
	return &buffer
}

func main() {
	// Create a Pool instance
	bufferPool := &sync.Pool{
		New: createBuffer,
	}

	// Concurrent test with many goroutines
	numWorkers := 1024 * 1024
	var wg sync.WaitGroup
	wg.Add(numWorkers)

	for i := 0; i < numWorkers; i++ {
		go func() {
			defer wg.Done()
			// Request a buffer instance
			buffer := bufferPool.Get()
			_ = buffer.(*[]byte)
			// Release the buffer instance
			defer bufferPool.Put(buffer)
		}()
	}
	wg.Wait()
	fmt.Printf("%d buffer objects were created.\n", numCalcsCreated)
}
```

The above example can be copied directly and run to see the console output:

```
➜ pool go run test_pool.go
3 buffer objects were created.
➜ pool go run test_pool.go
4 buffer objects were created.
```

I ran the program twice with go run; one run printed 3 and the other printed 4. What causes this?

First of all, this is normal. You may have noticed that when creating the Pool instance, you only fill in the New function; you never declare or limit the size of the Pool. So remember: as a user, the programmer cannot make any assumptions about the number of elements in the Pool.
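
You can observe this lack of guarantees directly. The sketch below (my own addition) forces garbage collection between a Put and a Get; note that since Go 1.13 a pooled element survives one GC cycle in a victim cache, which is why it forces two:

```go
package main

import (
	"runtime"
	"sync"
)

func main() {
	pool := &sync.Pool{
		New: func() interface{} {
			println("Create new instance")
			b := make([]byte, 1024)
			return &b
		},
	}

	pool.Put(pool.Get()) // prints once; the element now sits in the pool
	runtime.GC()
	runtime.GC() // after two cycles the runtime has been free to drop it
	pool.Get()   // typically prints again: the cached element was released
}
```

This is exactly why the element count printed by the program above can differ from run to run.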

Next, suppose I request the instance directly instead of using the Pool. Only one line needs to change. Change this code:

```go
// Request a buffer instance
buffer := bufferPool.Get()
```

to this:

```go
// Request a buffer instance
buffer := createBuffer()
```

Now if we execute go run test_pool_1.go, what will we find?

```
➜  pool go run test_pool_1.go
1048576 buffer objects were created.
➜  pool go run test_pool_1.go
1048576 buffer objects were created.
```

Notice that there are two differences:

  1. This time, running it twice gives the same result.
  2. The number of objects created equals the number of concurrent workers: 1048576 (that is, 1024 * 1024).

The reason is simple: every worker now calls the createBuffer function directly to request a buffer, and there are 1048576 concurrent workers, so the result is 1048576.

There is actually another difference: during a single run, this version allocates far more memory. In Go, memory allocation is triggered by the program, but collection is performed asynchronously by the runtime's GC collector. Using memory this carelessly places a significant burden on the GC, which in turn hurts overall program performance.
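
To put a rough number on that burden, here is a benchmark sketch of mine (assume a file such as pool_bench_test.go) comparing allocations per operation with and without the pool; run it with go test -bench=.:

```go
package main

import (
	"sync"
	"testing"
)

var bufferPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 1024)
		return &b
	},
}

var sink *[]byte // global sink so the allocation cannot be optimized away

// Every iteration heap-allocates a fresh 1 KB buffer for the GC to reclaim.
func BenchmarkNoPool(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		buf := make([]byte, 1024)
		sink = &buf
	}
}

// Iterations recycle pooled buffers, so allocs/op should drop toward zero.
func BenchmarkWithPool(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		buf := bufferPool.Get().(*[]byte)
		bufferPool.Put(buf)
	}
}
```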

A real-world analogy

When a programmer drinks milk tea, he needs a straw, and when he is done he throws the straw away: that is plastic garbage. The cleaner, Lao Li (the GC collector), has to keep up with the mess. Now imagine 1,048,576 programmers drinking milk tea at the same time, each opening a brand-new straw and tossing it on the spot: instantly there are 1,048,576 plastic straws on the ground. Lao Li the janitor would probably be worked to death.

What if there were a recycling bin (that is sync.Pool) in a quiet corner, and after finishing his milk tea each programmer dropped the straw into the bin? The next programmer who wanted a straw would first reach into the bin to see if there was one. If there is, use it; if not, open a new one. That would greatly reduce the number of new straws used. How nice.

And in the extreme case, if everyone drinks fast enough that there is always at least one used straw in the bin, a single straw might be enough for all 1,048,576 programmers... (a little nauseating, admittedly).

Back to the point

This explains why the sync.Pool version creates only 3 or 4 objects. But go one step further: why does the sync.Pool version print different numbers on different runs?

Because the reuse rate differs from run to run. We cannot make any assumptions about the number of cached elements in the Pool. And again, if turnaround is fast enough, a single element really can serve all 1,048,576 concurrent goroutines.
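
To make "fast enough" tangible, here is a serialized variant of the earlier program (my own sketch): with no overlap between workers, every Put lands before the next Get, so a single element can usually serve every iteration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var created int32

func main() {
	pool := &sync.Pool{
		New: func() interface{} {
			atomic.AddInt32(&created, 1)
			b := make([]byte, 1024)
			return &b
		},
	}

	// The same amount of work as before, but one request at a time
	for i := 0; i < 1024*1024; i++ {
		buf := pool.Get()
		pool.Put(buf)
	}
	// Typically prints 1, barring a GC cycle clearing the pool mid-run
	fmt.Printf("%d buffer objects were created.\n", created)
}
```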

Is sync.Pool thread-safe?

sync.Pool is, of course, thread-safe. The official documentation states this explicitly:

A Pool is safe for use by multiple goroutines simultaneously.

But why am I mentioning it here?

Because only sync.Pool's own pool data structure is concurrency-safe; the Pool.New function you provide is not necessarily thread-safe. Pool.New may be called concurrently, and if its implementation is not concurrency-safe, there will be a problem.

If you look carefully, you will notice that in my createBuffer implementation above, the numCalcsCreated counter is incremented atomically: atomic.AddInt32(&numCalcsCreated, 1).

```go
func createBuffer() interface{} {
	// Important: the addition must be atomic, otherwise there will be concurrency problems
	atomic.AddInt32(&numCalcsCreated, 1)
	buffer := make([]byte, 1024)
	return &buffer
}
```

Since numCalcsCreated is a global variable, concurrent calls to Pool.New (createBuffer here) race on it, so only an atomic operation keeps the count correct.

If you change atomic.AddInt32(&numCalcsCreated, 1) to a plain numCalcsCreated++ and run go run -race test_pool.go, the race detector is guaranteed to report a warning similar to the following:

```
WARNING: DATA RACE
Read at 0x000001287538 by goroutine 10:

Previous write at 0x000001287538 by goroutine 7:

==================
==================
WARNING: DATA RACE
Read at 0x000001287538 by goroutine 9:
  main.createBuffer()
```

The essential reason: the Pool.New function may be called concurrently.

Why is sync.Pool not suitable for things like long-lived socket connections or database connection pools?

Because we cannot make any assumptions about the elements stored in sync.Pool, all of the following can happen:

  1. Elements in the Pool may be released at any time; the release policy is entirely managed by the runtime.
  2. The object returned by Get may be freshly created or may be a cached one that was previously used; the user cannot tell which.
  3. You cannot know the number of elements in the Pool.

Therefore, you can only use Pool correctly when your scenario tolerates all of the above. sync.Pool is essentially there to raise the reuse rate of temporary objects and reduce the GC burden. The key word is temporary. Stateful resources such as sockets are therefore not suitable for Pool.
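
If you do need to pool stateful resources, use a structure with deterministic ownership instead. A common technique (a sketch with my own naming, not anything from sync) is a fixed-size pool built on a buffered channel, which never releases anything behind your back:

```go
package main

import "net"

// connPool is a minimal fixed-size pool sketch for stateful resources.
// Unlike sync.Pool, nothing stored here is ever dropped by the runtime.
type connPool struct {
	conns chan net.Conn
}

func newConnPool(size int) *connPool {
	return &connPool{conns: make(chan net.Conn, size)}
}

// Get hands out a pooled connection, or dials a new one if the pool is empty.
func (p *connPool) Get(addr string) (net.Conn, error) {
	select {
	case c := <-p.conns:
		return c, nil
	default:
		return net.Dial("tcp", addr)
	}
}

// Put returns a connection to the pool, or closes it if the pool is full.
func (p *connPool) Put(c net.Conn) {
	select {
	case p.conns <- c:
	default:
		c.Close()
	}
}

func main() {
	pool := newConnPool(10)
	c, err := pool.Get("example.com:80") // placeholder address
	if err != nil {
		panic(err)
	}
	defer pool.Put(c)
	// ... use the connection ...
}
```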

Conclusion

  1. sync.Pool is designed to increase the reuse rate of temporary objects and reduce the GC burden.
  2. Make no assumptions about the object returned by Pool.Get: it may be brand new, or it may be old (previously used and then Put back).
  3. You also cannot make any assumptions about the number of elements in the Pool.
  4. sync.Pool's own Get and Put calls are concurrency-safe, but the New function it points to may be called concurrently; only you can ensure that its body is safe.
  5. When you are done with an instance fetched from the Pool, be sure to call Put, otherwise the Pool cannot reuse the instance. This is usually done with defer.

Today we went through the sync.Pool package from its usage and overall behavior. A deeper analysis of its implementation principles will follow. Please look forward to it.
