Origin
Go 1.15 was released recently, and I upgraded as soon as I could. I had some confidence in Go's stability, so I rolled it straight out to production at my company.
Within a few minutes of going live, the service hit OOM. I grabbed a heap profile with pprof and quickly rolled back. The profile showed that a chunk of memory that should have been freed at the end of each request was being retained and kept growing, as shown in the figure (linkBufferNode):
The only thing that had changed was the Go version, so I started testing locally and found that the problem reproduced 100% of the time.
The investigation
Looking through the Go 1.15 release notes, I found two highly suspicious items:
1. The binary size is reduced by 5% by removing some GC data.
2. A new memory allocation algorithm.
So I modified the runtime to turn off the new allocation algorithm and switch back to the old one, among other things. After a flurry of impressive-looking changes, the problem was still there.
Next I tried tracing allocations with GODEBUG="allocfreetrace=1" (the painful process is omitted).
In the end, my gut told me that the problem was probably related to the sync.Map change in Go 1.15 (don't ask me why; it was a gut feeling I can't explain).
Sample code
To make the explanation easier, I wrote a minimal reproduction:
package main

import (
	"sync"
)

var sm sync.Map

func insertKeys() {
	keys := make([]interface{}, 0, 10)
	// Store some keys
	for i := 0; i < 10; i++ {
		v := make([]int, 1000)
		keys = append(keys, &v)
		sm.Store(keys[i], struct{}{})
	}
	// delete some keys, but not all keys
	for i, k := range keys {
		if i%2 == 0 {
			continue
		}
		sm.Delete(k)
	}
}

func shutdown() {
	sm.Range(func(key, value interface{}) bool {
		// do something to key
		return true
	})
}

func main() {
	insertKeys()
	// do something ...
	shutdown()
}
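One way to check whether deleted keys are really released is to attach a finalizer to each key and count how many the garbage collector reclaims. The following is only a sketch: `bigKey` and `collectedAfterDelete` are hypothetical names that are not part of the original reproduction, and finalizer timing is not guaranteed, so the count is an indication rather than a precise measurement.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// bigKey is a hypothetical stand-in for a connection that owns a buffer.
type bigKey struct{ buf []byte }

// collectedAfterDelete stores n pointer keys in a sync.Map, deletes all of
// them without ever reading, forces a few GC cycles, and reports how many
// keys were finalized while the map itself was still alive.
func collectedAfterDelete(n int) int64 {
	var m sync.Map
	var collected int64

	keys := make([]*bigKey, 0, n)
	for i := 0; i < n; i++ {
		k := &bigKey{buf: make([]byte, 1<<10)}
		runtime.SetFinalizer(k, func(*bigKey) { atomic.AddInt64(&collected, 1) })
		m.Store(k, struct{}{})
		keys = append(keys, k)
	}
	for _, k := range keys {
		m.Delete(k)
	}
	keys = nil // drop our own references; only the map could still hold the keys

	for i := 0; i < 3; i++ {
		runtime.GC()
		time.Sleep(10 * time.Millisecond) // give the finalizer goroutine time to run
	}
	runtime.KeepAlive(&m) // the map must outlive the GC cycles above
	return atomic.LoadInt64(&collected)
}

func main() {
	const n = 100
	fmt.Printf("collected %d of %d deleted keys\n", collectedAfterDelete(n), n)
}
```

If the keys are still referenced by the map after Delete, as described in this article for Go 1.15, the count should stay low. If Delete really removes them, the count should approach n.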
sync.Map changes in Go 1.15
In Go 1.15, sync.Map gained a new method, LoadAndDelete. The corresponding issue is "sync: add new Map method LoadAndDelete", and the change itself is in the linked CL.
Why am I sure this change is the cause? Simple: I reverted it locally and the problem went away. Job done…
Of course, it's not that simple. I needed to know why, so I started looking for the piece of code that was responsible… (100,000 words omitted)
It turns out that the key code is this:
// LoadAndDelete deletes the value for a key, returning the previous value if any.
// The loaded result reports whether the key was present.
func (m *Map) LoadAndDelete(key interface{}) (value interface{}, loaded bool) {
	read, _ := m.read.Load().(readOnly)
	e, ok := read.m[key]
	if !ok && read.amended {
		m.mu.Lock()
		read, _ = m.read.Load().(readOnly)
		e, ok = read.m[key]
		if !ok && read.amended {
			e, ok = m.dirty[key]
			// Regardless of whether the entry was present, record a miss: this key
			// will take the slow path until the dirty map is promoted to the read
			// map.
			m.missLocked()
		}
		m.mu.Unlock()
	}
	if ok {
		return e.delete()
	}
	return nil, false
}

// Delete deletes the value for a key.
func (m *Map) Delete(key interface{}) {
	m.LoadAndDelete(key)
}

func (e *entry) delete() (value interface{}, ok bool) {
	for {
		p := atomic.LoadPointer(&e.p)
		if p == nil || p == expunged {
			return nil, false
		}
		if atomic.CompareAndSwapPointer(&e.p, p, nil) {
			return *(*interface{})(p), true
		}
	}
}
This code shows that Delete does not actually remove the key from the map. It only looks up the entry for the key and sets the entry's pointer to nil, so the map itself still references the key…
In our scenario we use a connection as the key, so the connection and the memory associated with it, such as its buffer, are never freed…
So why was there no problem in Go 1.14? Here is the code for Go 1.14:
// Delete deletes the value for a key.
func (m *Map) Delete(key interface{}) {
	read, _ := m.read.Load().(readOnly)
	e, ok := read.m[key]
	if !ok && read.amended {
		m.mu.Lock()
		read, _ = m.read.Load().(readOnly)
		e, ok = read.m[key]
		if !ok && read.amended {
			delete(m.dirty, key)
		}
		m.mu.Unlock()
	}
	if ok {
		e.delete()
	}
}
In Go 1.14, a key that lives only in dirty is really removed from the map. As it happens, we were "misusing" sync.Map: we never performed any read operations, so all the keys stayed in dirty, and calling Delete really deleted them.
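The difference between the two slow paths can be illustrated with a deliberately simplified model. This is not the real sync.Map: `dirtyOnly`, `deleteGo114`, and `deleteGo1150` are hypothetical names, there is no read map and no locking, and the model only mimics what happens to a key that lives in dirty.

```go
package main

import "fmt"

// entry is a simplified stand-in for sync.Map's internal entry:
// a nil pointer means the value has been deleted.
type entry struct{ p *interface{} }

// dirtyOnly is a hypothetical, single-goroutine model of just the dirty map.
type dirtyOnly struct{ dirty map[interface{}]*entry }

func newDirtyOnly() *dirtyOnly {
	return &dirtyOnly{dirty: make(map[interface{}]*entry)}
}

func (m *dirtyOnly) store(k, v interface{}) {
	m.dirty[k] = &entry{p: &v}
}

// deleteGo114 mimics the 1.14 slow path: the key leaves the dirty map.
func (m *dirtyOnly) deleteGo114(k interface{}) {
	delete(m.dirty, k)
}

// deleteGo1150 mimics the 1.15 slow path shown above: only the entry is
// cleared, so the map still references the key.
func (m *dirtyOnly) deleteGo1150(k interface{}) {
	if e, ok := m.dirty[k]; ok {
		e.p = nil
	}
}

// keysHeld reports how many keys the map still references.
func (m *dirtyOnly) keysHeld() int {
	return len(m.dirty)
}

func main() {
	a, b := newDirtyOnly(), newDirtyOnly()
	k := &struct{ buf []byte }{buf: make([]byte, 1<<20)}
	a.store(k, struct{}{})
	b.store(k, struct{}{})

	a.deleteGo114(k)
	b.deleteGo1150(k)

	fmt.Println("1.14-style keys held:", a.keysHeld()) // 0
	fmt.Println("1.15-style keys held:", b.keysHeld()) // 1
}
```

In the second case the 1 MiB buffer stays reachable through the key, which matches the growth seen in the heap profile.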
Note that in any version of Go, once a key has been promoted to read, Delete no longer removes it. The key only goes away when the misses reach the threshold at which dirty is promoted to read again. In other words, in extreme cases, keys can leak.
Conclusion
In Go <= 1.15, sync.Map does not remove keys in extreme cases, and this can leak memory if a key is a large object or has memory associated with it.
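The lifetime of a key that has reached read can be sketched with another toy model. Everything here is an assumption made for illustration: `toyMap` and its methods are hypothetical, keys are plain strings, values and locking are left out, and `storeNew` only handles keys that are not yet in the map. The promotion rule (promote once the misses reach the size of dirty) follows the behavior of missLocked in the standard library.

```go
package main

import "fmt"

// entry is a simplified stand-in for sync.Map's internal entry.
type entry struct{ deleted bool }

// toyMap is a hypothetical, single-goroutine model of the read and dirty maps.
type toyMap struct {
	read   map[string]*entry
	dirty  map[string]*entry // nil when there is nothing beyond read
	misses int
}

func newToyMap() *toyMap {
	return &toyMap{read: map[string]*entry{}}
}

// storeNew adds a key that is not yet in the map. If dirty is nil, it is
// first rebuilt from read, skipping entries that were deleted.
func (m *toyMap) storeNew(k string) {
	if m.dirty == nil {
		m.dirty = make(map[string]*entry)
		for rk, e := range m.read {
			if !e.deleted {
				m.dirty[rk] = e
			}
		}
	}
	m.dirty[k] = &entry{}
}

// del marks a key found in read as deleted but leaves the key in place.
// Keys found only in dirty are removed outright.
func (m *toyMap) del(k string) {
	if e, ok := m.read[k]; ok {
		e.deleted = true
		return
	}
	delete(m.dirty, k)
}

// load reports whether k is present. A lookup that has to consult dirty
// counts as a miss, and enough misses promote dirty to read.
func (m *toyMap) load(k string) bool {
	if e, ok := m.read[k]; ok {
		return !e.deleted
	}
	if m.dirty == nil {
		return false
	}
	e, ok := m.dirty[k]
	m.misses++
	if m.misses >= len(m.dirty) {
		m.read, m.dirty, m.misses = m.dirty, nil, 0
	}
	return ok && !e.deleted
}

// readHolds reports whether read still references the key k.
func (m *toyMap) readHolds(k string) bool {
	_, ok := m.read[k]
	return ok
}

func main() {
	m := newToyMap()
	m.storeNew("a")
	m.storeNew("b")
	m.load("a")
	m.load("b") // second miss: dirty {a, b} is promoted to read

	m.del("a")
	fmt.Println("after delete:", m.readHolds("a")) // true

	m.storeNew("c") // dirty is rebuilt as {b, c}; read is unchanged
	fmt.Println("after new store:", m.readHolds("a")) // true

	m.load("c")
	m.load("c") // second miss: dirty {b, c} replaces read
	fmt.Println("after promotion:", m.readHolds("a")) // false
}
```

In this model the deleted key stays referenced until a new key has been stored and enough misses have accumulated, which is the extreme case described above.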
I have filed an issue with the Go team about this. So far the behavior is considered a bug, because it violates the Go 1 compatibility promise and differs from the behavior in 1.14. @Changkun Ou has submitted the fix, and the backport is in 1.15.1.
The fact that a key in read is not deleted until dirty is promoted is currently a deliberate tradeoff. If it causes problems in a real-world program, file another issue and see whether it can be fixed.
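Until that tradeoff changes, one way to limit the damage is to avoid using a large object as the key at all. The sketch below assumes a hypothetical `registry` that keys connections by a small integer ID and stores the connection as the value; `conn`, `registry`, and their methods are illustrative names, not part of the original program. Since Delete sets the entry's pointer to nil, the value is released even if the key lingers, and a lingering int64 key costs very little.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// conn is a hypothetical connection type that owns a large buffer.
type conn struct{ buf []byte }

// registry maps small integer IDs to connections, so that a key retained
// by sync.Map does not keep a connection alive.
type registry struct {
	nextID int64
	conns  sync.Map // int64 -> *conn
}

// add registers c and returns the ID it was stored under.
func (r *registry) add(c *conn) int64 {
	id := atomic.AddInt64(&r.nextID, 1)
	r.conns.Store(id, c)
	return id
}

// remove drops the connection stored under id.
func (r *registry) remove(id int64) {
	r.conns.Delete(id)
}

// lookup returns the connection stored under id, if any.
func (r *registry) lookup(id int64) (*conn, bool) {
	v, ok := r.conns.Load(id)
	if !ok {
		return nil, false
	}
	return v.(*conn), true
}

func main() {
	var r registry
	id := r.add(&conn{buf: make([]byte, 1<<20)})

	_, ok := r.lookup(id)
	fmt.Println("found before remove:", ok) // true

	r.remove(id)
	_, ok = r.lookup(id)
	fmt.Println("found after remove:", ok) // false
}
```

The caller has to carry the ID around instead of the connection pointer, which is the price of keeping the keys cheap.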