This article was first published on my blog. If you find it useful, please like and bookmark it so more friends can see it.

Today, let’s talk a bit about how to prevent goroutine leaks in Go.

Overview

Go’s concurrency model differs from that of most other languages. While it makes developing concurrent programs easier, goroutine leaks are a common problem if you don’t know how to use it properly. Goroutines are lighter than threads and consume very few resources, but if they are never released while new ones keep being created, the program undoubtedly has a problem, and it may not be discovered until it has been running for days or longer.

I think we can tackle the problem described above from two angles:

The first is prevention. For that, we need to know what kinds of code cause leaks and how to write correct code.

The second is monitoring. Although prevention reduces the probability of leaks, nobody can guarantee they will never make a mistake, so we usually also need some monitoring to further ensure the robustness of the program.

I will cover these two angles in two articles. Today, I will talk about the first point.

How to monitor leaks

This article focuses on the first point, but for the sake of the demonstrations, I’ll start with the simplest monitoring technique: read the number of currently running goroutines from runtime.NumGoroutine() to confirm whether a leak has occurred.

A simple example

Language-level concurrency support is one of Go’s great strengths, but it is also easily abused. When first learning Go concurrency, we often hear people say that it is very simple: just put the go keyword in front of a function call to start a goroutine, the basic unit of concurrency. Many people hear only that sentence, and then code like the following appears:

```go
package main

import (
    "fmt"
    "runtime"
    "time"
)

func sayHello() {
    for {
        fmt.Println("Hello goroutine")
        time.Sleep(time.Second)
    }
}

func main() {
    defer func() {
        fmt.Println("the number of goroutines: ", runtime.NumGoroutine())
    }()

    go sayHello()
    fmt.Println("Hello main")
}
```

If you’re familiar with Go, you can easily spot the problem: sayHello runs an infinite loop with no exit mechanism, so the goroutine it runs on can never be released. The deferred function at the beginning of main prints the number of running goroutines when main exits. Unsurprisingly, the output looks like this:

```
the number of goroutines: 2
```

However, because the above program is not long-running, the leak is not a real problem; the system reclaims all runtime resources once the program exits. But if this code ran in a resident service, such as an HTTP server, with sayHello fired on every incoming request, then since the goroutines are never released, your service would drift closer and closer to collapse as time passes.

This example is relatively simple, and I’m sure anyone who knows a little about Go concurrency will not make this mistake.

Classification of leakage

The previous example leaked because an infinite loop was running in a goroutine. Next, I will analyze the various leak cases from the angle of concurrent data synchronization. They fall broadly into two categories:

  • Leakage caused by channel
  • Leakage caused by traditional synchronization mechanism

Traditional synchronization mechanisms here mainly mean shared-memory primitives such as exclusive and shared locks. Leaks in both categories are relatively common, but in Go, thanks to defer, the second kind is generally easier to avoid.

A leak caused by a channel

Let’s start with channels. If you’ve read the official article on Go concurrency patterns (or a translated version), you’ll have seen that channels can be leaked by accident. Let’s summarize in detail which situations can lead to a leak.

Sending without receiving

A sender is usually paired with a corresponding receiver. Ideally, the receiver always consumes everything that is sent, and there is no problem. In reality, though, once the receiver exits abnormally and stops consuming upstream data, the sender blocks. This situation was described in great detail in the article mentioned above.

Sample code:

```go
package main

import (
    "fmt"
    "runtime"
    "time"
)

func gen(nums ...int) <-chan int {
    out := make(chan int)
    go func() {
        for _, n := range nums {
            out <- n
        }
        close(out)
    }()
    return out
}

func main() {
    defer func() {
        fmt.Println("the number of goroutines: ", runtime.NumGoroutine())
    }()

    // Set up the pipeline.
    out := gen(2, 3)

    for n := range out {
        fmt.Println(n)              // 2
        time.Sleep(5 * time.Second) // work here may be interrupted
        if true {                   // e.g. if err != nil
            break
        }
    }
}
```

In the example, the sender pushes data downstream through the out channel and the main function receives it. A real receiver would usually do some specific processing on each value rather than just Sleep. If an exception occurs during that processing, the loop exits early, and the goroutine started inside gen never exits.

How to solve it?

The core problem is that when the receiver stops working, the sender doesn’t know it and keeps blindly sending data downstream. We therefore need a mechanism to notify the sender. Rather than building up to it step by step, I’ll give the answer directly: Go can broadcast a message to all receivers by closing a channel.

Modified code:

```go
package main

import (
    "fmt"
    "runtime"
    "time"
)

func gen(done chan struct{}, nums ...int) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for _, n := range nums {
            select {
            case out <- n:
            case <-done:
                return
            }
        }
    }()
    return out
}

func main() {
    defer func() {
        time.Sleep(time.Second)
        fmt.Println("the number of goroutines: ", runtime.NumGoroutine())
    }()

    // Set up the pipeline.
    done := make(chan struct{})
    defer close(done)

    out := gen(done, 2, 3)

    for n := range out {
        fmt.Println(n)              // 2
        time.Sleep(5 * time.Second) // work here may be interrupted
        if true {                   // e.g. if err != nil
            break
        }
    }
}
```

The gen function handles both channels at once through select. When an exception occurs and main returns, the deferred close(done) fires, the <-done branch is taken, and the goroutine exits. To demonstrate the effect and make sure the release has completed, we wait one second before printing the count on exit.

The following output is displayed:

```
the number of goroutines:  1
```

Only the main Goroutine now exists.

Receiving without sending

Just as sending without receiving blocks the sender, receiving without sending blocks the receiver. Let’s look at the sample code directly:

```go
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    defer func() {
        time.Sleep(time.Second)
        fmt.Println("the number of goroutines: ", runtime.NumGoroutine())
    }()

    ch := make(chan struct{})
    go func() {
        <-ch // blocks forever: nothing is ever sent
    }()
}
```

The running result shows:

```
the number of goroutines:  2
```

Of course, we do not normally encounter such an obviously silly situation. In real code, what happens more often is that sending has completed but the sender never closes the channel; the receiver has no way of knowing that sending is finished, so it blocks forever.

What’s the solution? Remember to close the channel once you have finished sending.

nil channel

Sending to or receiving from a nil channel blocks forever. This can happen when we declare a channel but forget to initialize it with make.

Sample code:

```go
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    defer func() {
        time.Sleep(time.Second)
        fmt.Println("the number of goroutines: ", runtime.NumGoroutine())
    }()

    var ch chan int // nil: never initialized with make
    go func() {
        <-ch
        // ch <- 1 // sending would block just the same
    }()
}
```

Both operations, <-ch (receive) and ch <- 1 (send), block forever on a nil channel. If blocking is actually what you want, for example to keep main from exiting, then a nil channel combined with a done channel may be exactly the way to go.

```go
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    defer func() {
        time.Sleep(time.Second)
        fmt.Println("the number of goroutines: ", runtime.NumGoroutine())
    }()

    done := make(chan struct{})

    var ch chan int // nil channel: receiving from it blocks forever
    go func() {
        defer close(done)
        // ... do the real work here ...
    }()

    select {
    case <-ch:
    case <-done:
        return
    }
}
```

When the goroutine finishes its work, the closed done channel is detected and main exits.

Real-world scenarios

A real scenario is rarely as simple as the cases above. It may involve cooperation between goroutines across multiple stages, where a single goroutine is both a receiver and a sender. But in the end, whatever the usage pattern, it all comes down to combining these basics.

Traditional synchronization

While passing data through channels is generally the recommended style in Go, there are scenarios where a traditional synchronization mechanism is clearly more appropriate. Go provides these mainly in the sync and sync/atomic packages. Next, I’ll focus on the two that can lead to goroutine leaks: locks and WaitGroup.

Mutex

As in other languages, Go has two kinds of locks, exclusive and shared; we won’t cover their general usage here. Let’s take the exclusive lock as an example.

The following is an example:

```go
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func main() {
    total := 0

    defer func() {
        time.Sleep(time.Second)
        fmt.Println("total: ", total)
        fmt.Println("the number of goroutines: ", runtime.NumGoroutine())
    }()

    var mutex sync.Mutex
    for i := 0; i < 2; i++ {
        go func() {
            mutex.Lock()
            total += 1 // the lock is never released
        }()
    }
}
```

The result is as follows:

```
total: 1
the number of goroutines: 2
```

This code starts two goroutines that each add to total. To prevent a data race, the computation is locked, but the lock is never released. As a result, the goroutine with i = 1 blocks forever waiting for the goroutine with i = 0 to release the lock. On exit there are still two goroutines, so one has leaked, and total is 1.

How to solve it? Since Go has defer, this problem is easy to avoid: just remember to defer the Unlock at the moment you Lock.

The following is an example:

```go
mutex.Lock()
defer mutex.Unlock()
```

All the other locks work the same way.

WaitGroup

WaitGroup differs from a lock; it is more like a semaphore in Linux and lets you wait for a group of goroutines to finish. If the task count is set incorrectly, it too can block forever and cause a leak.

As an example, suppose that while developing a backend API we need to query several tables. Since the results do not depend on each other, we can query them concurrently, as shown below:

```go
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func handle() {
    var wg sync.WaitGroup

    wg.Add(4)

    go func() {
        fmt.Println("Access table 1")
        wg.Done()
    }()

    go func() {
        fmt.Println("Access table 2")
        wg.Done()
    }()

    go func() {
        fmt.Println("Access table 3")
        wg.Done()
    }()

    wg.Wait()
}

func main() {
    defer func() {
        time.Sleep(time.Second)
        fmt.Println("the number of goroutines: ", runtime.NumGoroutine())
    }()

    go handle()
    time.Sleep(time.Second)
}
```

The result is as follows:

```
the number of goroutines: 2
```

There was a leak. The code first defines a variable wg of type sync.WaitGroup and sets the number of concurrent tasks to 4, but as you can see from the example, only 3 tasks are actually launched. The exit condition of wg.Wait() can therefore never be satisfied, and handle blocks forever.

How to prevent this from happening?

My personal advice is to avoid setting the total task count up front, even when the number seems perfectly clear, because blocking can also occur between launching the individual tasks. It is best to call wg.Add(1) at the moment each task is launched.

The following is an example:

```go
    ...
    wg.Add(1)
    go func() {
        fmt.Println("Access table 1")
        wg.Done()
    }()

    wg.Add(1)
    go func() {
        fmt.Println("Access table 2")
        wg.Done()
    }()

    wg.Add(1)
    go func() {
        fmt.Println("Access table 3")
        wg.Done()
    }()
    ...
```

Conclusion

I’ve now covered essentially all the scenarios that I believe can lead to a goroutine leak. To sum up, anything that causes blocking, whether an infinite loop, a channel operation, or a lock wait, can cause a leak. Preventing goroutine leaks therefore largely comes down to preventing blocking. To go a step further, some implementations add timeout handling so that goroutines which take too long are actively released.

This article has focused on preventing goroutine leaks by writing correct code. In the next installment, I’ll show how to implement better monitoring and detection to help us find the leaks that already exist in our code.

References

Concurrency In Go

Goroutine leak

Leaking-Goroutines

Go Concurrency Patterns: Context

Go Concurrency Patterns: Pipelines and cancellation

make goroutine stay running after returning from function

Never start a goroutine without knowing how it will stop