How to deal with panic and recover in Golang


Since its release, Go has been known for its high performance and high concurrency. Because the standard library provides HTTP packages, even novice programmers can easily write HTTP services.

However, every coin has two sides. A language with merits to be proud of also hides plenty of pitfalls, and beginners who do not know about them can easily fall in. This series of blog posts starts with panic and recover in the Go language and introduces the various pitfalls the author has run into and how to avoid them.

Getting to know panic and recover

panic

In English, the word panic means sudden fear. In Go, it stands for a very serious problem, the kind programmers fear most: once it occurs, the program ends and exits. The built-in panic function is mainly used to actively throw an exception, similar to the throw keyword in Java and other languages.

recover

In English, the word recover means to return to a normal state. In Go, it stands for restoring a program from a serious error to its normal state. The built-in recover function is mainly used to catch an exception and return the program to a normal state, similar to the catch part of try...catch in Java and other languages.

The author has 6 years of experience developing in C on Linux systems. C has no concept of exception catching: there is no try...catch, and no panic or recover. However, the difference between exception handling and the "if error then return" approach lies mainly in how many levels of the function call stack are crossed at once, as shown in the diagram below:

In normal logic, the function call stack unwinds one frame at a time, whereas exception catching can be understood as a long jump across the call stack. In C this is done with the setjmp and longjmp functions. For example:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

double divide(double to, double by)
{
    if(by == 0)
    {
        longjmp(env, 1);
    }
    return to / by;
}

void test_divide(void)
{
    divide(2.0, 0.0);
    printf("done\n");
}

int main(void)
{
    if (setjmp(env) == 0)
    {
        test_divide();
    }
    else
    {
        printf("Cannot / 0\n");
        return -1;
    }
    return 0;
}

Because of the long jump, the normal execution flow is interrupted and control jumps directly from divide to main. When compiled and run, the code above prints Cannot / 0 instead of done. Isn't that amazing?

Mechanisms such as try...catch, recover, and setjmp save the current state of the program (mainly the CPU's stack pointer register SP and program counter PC; Go's recover relies on defer to maintain SP and PC) in memory that is shared with throw, panic, and longjmp. When an exception occurs, the previously saved SP and PC values are read back from that memory, the function stack is moved directly back to the position SP points to, and execution continues at the instruction PC points to, which restores the program from the abnormal state to a normal one.
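For comparison, here is a minimal Go sketch of the same idea as the C example. It is an illustration rather than code from the original program: the function names divide, testDivide, and run are chosen here to mirror the C version, with the deferred function in run playing the role of the setjmp landing point.

```go
package main

import "fmt"

// divide panics on a zero divisor instead of returning an error,
// mirroring the longjmp call in the C example.
func divide(to, by float64) float64 {
	if by == 0 {
		panic("cannot divide by zero")
	}
	return to / by
}

// testDivide does no error handling of its own; a panic raised in
// divide skips the rest of this function.
func testDivide(by float64) string {
	divide(2.0, by)
	return "done"
}

// run plays the role of main in the C example: its deferred function
// is where control lands after the "long jump".
func run(by float64) (result string) {
	defer func() {
		if r := recover(); r != nil {
			result = fmt.Sprintf("recovered: %v", r)
		}
	}()
	return testDivide(by)
}

func main() {
	fmt.Println(run(1)) // done
	fmt.Println(run(0)) // recovered: cannot divide by zero
}
```

Note that testDivide never sees the failure: control passes straight from divide to the deferred function in run.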

A deeper look at panic and recover

The source code

The source code for panic and recover is in src/runtime/panic.go in the Go source tree, in the functions gopanic and gorecover.

// gopanic code, src/runtime/panic.go, line 454

// The implementation of the predefined function panic
func gopanic(e interface{}) {
	gp := getg()
	if gp.m.curg != gp {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic on system stack")
	}

	if gp.m.mallocing != 0 {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic during malloc")
	}
	if gp.m.preemptoff != "" {
		print("panic: ")
		printany(e)
		print("\n")
		print("preempt off reason: ")
		print(gp.m.preemptoff)
		print("\n")
		throw("panic during preemptoff")
	}
	if gp.m.locks != 0 {
		print("panic: ")
		printany(e)
		print("\n")
		throw("panic holding locks")
	}

	var p _panic
	p.arg = e
	p.link = gp._panic
	gp._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

	atomic.Xadd(&runningPanicDefers, 1)

	for {
		d := gp._defer
		if d == nil {
			break
		}

		// If the defer was started by an earlier panic or Goexit (and, since we are back here, it triggered a new panic), take the defer off the list. The earlier panic or Goexit will not continue running.
		if d.started {
			if d._panic != nil {
				d._panic.aborted = true
			}
			d._panic = nil
			d.fn = nil
			gp._defer = d.link
			freedefer(d)
			continue
		}

		// Mark the defer as started, but keep it on the list, so that traceback can find and update the defer's argument frame if stack growth or garbage collection happens before reflectcall starts executing d.fn.
		d.started = true

		// Record the panic that is running the defer. If a new panic is triggered during the deferred call, that panic will find d in the list and mark d._panic (this panic) as aborted.
		d._panic = (*_panic)(noescape(unsafe.Pointer(&p)))

		p.argp = unsafe.Pointer(getargp(0))
		reflectcall(nil, unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
		p.argp = nil

		// reflectcall did not panic. Remove d.
		if gp._defer != d {
			throw("bad defer entry in panic")
		}
		d._panic = nil
		d.fn = nil
		gp._defer = d.link

		// GC() is used here to trigger stack shrinkage to test stack copying. Because it is test code, it is commented out. See stack_test.go:TestStackPanic.
		//GC()

		pc := d.pc
		sp := unsafe.Pointer(d.sp) // must be a pointer so it gets adjusted during stack copy
		// The defer record is allocated dynamically and needs to be freed after execution. So if defers are created but never executed (for example, by calling defer repeatedly in an infinite loop), memory will leak.
		freedefer(d)
		if p.recovered {
			atomic.Xadd(&runningPanicDefers, -1)

			gp._panic = p.link
			// Aborted panics are marked but remain on the g.panic list. Remove them from the list.
			for gp._panic != nil && gp._panic.aborted {
				gp._panic = gp._panic.link
			}
			if gp._panic == nil { // must be done with signal
				gp.sig = 0
			}
			// Pass the recovering stack frame to recovery.
			gp.sigcode0 = uintptr(sp)
			gp.sigcode1 = pc
			mcall(recovery)
			throw("recovery failed") // mcall should not return
		}
	}

	// If all defers have been run, there was no recover (as noted above, mcall(recovery) does not return), so continue with the rest of the panic process, such as printing the call stack and error messages.
	// Because it is unsafe to call arbitrary user code after freezing the world, we call preprintpanics to invoke all the Error and String methods needed to prepare the panic strings before startpanic.
	preprintpanics(gp._panic)

	fatalpanic(gp._panic) // should not return
	*(*int)(nil) = 0      // fatalpanic should not return, so this is normally not reached; if it is, this line triggers a panic
}

// gorecover code, src/runtime/panic.go, line 585

// Implement the predefined function recover.
// Cannot split the stack because it needs to reliably find its caller's stack segment.
//
// TODO(rsc): Once we commit to CopyStackAlways,
// this doesn't need to be nosplit.
//go:nosplit
func gorecover(argp uintptr) interface{} {
	// Must be in a function running as part of a deferred call during the panic,
	// and must be called from the topmost function of that call (the function used in the defer statement).
	// p.argp is the argument pointer of that topmost deferred function call. It is compared with the argp passed by the caller; if they match, the caller is the one who can recover.
	gp := getg()
	p := gp._panic
	if p != nil && !p.recovered && argp == uintptr(p.argp) {
		p.recovered = true
		return p.arg
	}
	return nil
}
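The argp comparison in gorecover is why recover only takes effect when the deferred function calls it directly. The following minimal sketch illustrates this behavior; the names tryRecover and demo are hypothetical and exist only for this example.

```go
package main

import "fmt"

// tryRecover calls recover one level below the deferred function,
// so it is not the topmost function of the deferred call.
func tryRecover() interface{} {
	return recover()
}

// demo reports what an indirect and a direct call to recover return
// while the same panic is in flight.
func demo() (indirect, direct interface{}) {
	// Registered first, so it runs last: calls recover directly.
	defer func() { direct = recover() }()
	// Registered last, so it runs first: calls recover through a helper.
	defer func() { indirect = tryRecover() }()
	panic("boom")
}

func main() {
	indirect, direct := demo()
	fmt.Println("indirect:", indirect) // indirect: <nil>
	fmt.Println("direct:", direct)     // direct: boom
}
```

The helper returns nil and leaves the panic in place; only the direct call in the other deferred function stops it.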

From the function code, we can see that the main internal flow of panic looks like this:

1. Get the g, that is, the goroutine, that the current caller is running on.
2. Traverse and execute the defer functions in g.
3. If a defer function calls recover and a panic is found to have occurred, mark the panic as recovered.
4. While traversing the defers, if the panic is found to be marked as recovered, extract the defer's SP and PC and save them in two status code fields of g.
5. Call runtime.mcall to switch to m->g0 and jump to the recovery function, passing the previously obtained g to recovery as its argument.

The code of runtime.mcall is in src/runtime/asm_xxx.s in the Go source, where xxx is the platform type, such as amd64. The code is as follows:

// src/runtime/asm_amd64.s, line 274

// func mcall(fn func(*g))
// Switch to m->g0's stack, call fn(g).
// Fn must never return. It should gogo(&g->sched)
// to keep running g.
TEXT runtime·mcall(SB), NOSPLIT, $0-8
	MOVQ	fn+0(FP), DI

	get_tls(CX)
	MOVQ	g(CX), AX	// save state in g->sched
	MOVQ	0(SP), BX	// caller's PC
	MOVQ	BX, (g_sched+gobuf_pc)(AX)
	LEAQ	fn+0(FP), BX	// caller's SP
	MOVQ	BX, (g_sched+gobuf_sp)(AX)
	MOVQ	AX, (g_sched+gobuf_g)(AX)
	MOVQ	BP, (g_sched+gobuf_bp)(AX)

	// switch to m->g0 & its stack, call fn
	MOVQ	g(CX), BX
	MOVQ	g_m(BX), BX
	MOVQ	m_g0(BX), SI
	CMPQ	SI, AX	// if g == m->g0 call badmcall
	JNE	3(PC)
	MOVQ	$runtime·badmcall(SB), AX
	JMP	AX
	MOVQ	SI, g(CX)	// g = m->g0
	MOVQ	(g_sched+gobuf_sp)(SI), SP	// sp = m->g0->sched.sp
	PUSHQ	AX
	MOVQ	DI, DX
	MOVQ	0(DI), DI
	CALL	DI
	POPQ	AX
	MOVQ	$runtime·badmcall2(SB), AX
	JMP	AX
	RET

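mcall is needed because recovery does not run on the goroutine's own stack: mcall first saves the state of g in g->sched, then switches to m->g0 and its stack and calls recovery there.

In the recovery function, the two status codes stored in g are used to roll back the stack pointer SP and restore the program counter PC into g's scheduling state, and gogo is called to reschedule g. This resumes g in the function whose deferred call invoked recover, as though that function were returning normally, and the goroutine continues executing. The code is as follows: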

// recovery code, src/runtime/panic.go, line 637

// After a panic, when a deferred function calls recover, unwind the stack and continue running as though the caller of the deferred function returned normally.
func recovery(gp *g) {
	// Info about defer passed in G struct.
	sp := gp.sigcode0
	pc := gp.sigcode1

	// The arguments of the deferred function must already be on the stack.
	if sp != 0 && (sp < gp.stack.lo || gp.stack.hi < sp) {
		print("recover: ", hex(sp), " not in [", hex(gp.stack.lo), ",", hex(gp.stack.hi), "]\n")
		throw("bad recovery")
	}

	// Make the deferproc for this deferred function return again, this time returning 1. The calling function will jump to the standard return epilogue.
	gp.sched.sp = sp
	gp.sched.pc = pc
	gp.sched.lr = 0
	gp.sched.ret = 1
	gogo(&gp.sched)
}

// src/runtime/asm_amd64.s, line 274

// func gogo(buf *gobuf)
// Restore state from gobuf; longjmp
TEXT runtime·gogo(SB), NOSPLIT, $16-8
	MOVQ	buf+0(FP), BX		// gobuf
	MOVQ	gobuf_g(BX), DX
	MOVQ	0(DX), CX		// make sure g != nil
	get_tls(CX)
	MOVQ	DX, g(CX)
	MOVQ	gobuf_sp(BX), SP	// restore SP from gobuf to prepare for the jump below
	MOVQ	gobuf_ret(BX), AX
	MOVQ	gobuf_ctxt(BX), DX
	MOVQ	gobuf_bp(BX), BP
	MOVQ	$0, gobuf_sp(BX)	// clear gobuf here to help the garbage collector
	MOVQ	$0, gobuf_ret(BX)
	MOVQ	$0, gobuf_ctxt(BX)
	MOVQ	$0, gobuf_bp(BX)
	MOVQ	gobuf_pc(BX), BX	// restore PC from gobuf for the jump
	JMP	BX

That is Go's low-level exception handling flow. It can be simplified into three steps:

1. Call recover inside a defer function.
2. Trigger a panic, switch to the runtime environment, and obtain the SP and PC of the g in which the defer that calls recover was registered.
3. Return to the processing logic that follows recover in the defer function.

What are the pitfalls

As mentioned earlier, the panic function is mainly used to actively trigger an exception. When implementing business code, if a resource fails to initialize during program startup, we can call panic ourselves to end the program immediately. For beginners, this is fine and easy to do.
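As a minimal sketch of this startup pattern, assume a service that reads its listening port from configuration. The helper name mustParsePort and the port check are hypothetical and not taken from any real project.

```go
package main

import (
	"fmt"
	"strconv"
)

// mustParsePort is a hypothetical startup helper: it panics when the
// configured port is unusable, so the process stops before doing any work.
func mustParsePort(s string) int {
	port, err := strconv.Atoi(s)
	if err != nil || port < 1 || port > 65535 {
		panic(fmt.Sprintf("invalid port %q", s))
	}
	return port
}

func main() {
	fmt.Println(mustParsePort("8080")) // 8080
	// mustParsePort("http") would panic and end the program here.
}
```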

However, reality can be harsh: Go's runtime code also calls panic in many places, which lays a lot of traps for newcomers who do not know Go's underlying implementation. Without being familiar with these pitfalls, it is impossible to write robust Go code.

Next, the author goes through these pitfalls one by one.

Slice subscript out of bounds

This one is easy to understand. In a statically typed language, indexing an array out of bounds is a fatal error. The following code verifies this:

package main

import (
    "fmt"
)

func foo() {
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var bar = []int{1}
    fmt.Println(bar[1])
}

func main() {
    foo()
    fmt.Println("exit")
}

Output:

runtime error: index out of range
exit

Because recover is used in the code, the program is restored and exit is printed.

If you comment out the recover lines, the following log will be printed:

panic: runtime error: index out of range

goroutine 1 [running]:
main.foo()
    /home/letian/work/go/src/test/test.go:14 +0x3e
main.main()
    /home/letian/work/go/src/test/test.go:18 +0x22
exit status 2
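Panics raised by the runtime itself, such as the index error above, carry a value that implements the runtime.Error interface, which makes it possible to tell them apart from panics raised by application code. A minimal sketch, with classify as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"runtime"
)

// classify runs f and reports whether it panicked with a runtime.Error,
// together with the panic message.
func classify(f func()) (isRuntimeErr bool, msg string) {
	defer func() {
		if r := recover(); r != nil {
			if re, ok := r.(runtime.Error); ok {
				isRuntimeErr, msg = true, re.Error()
				return
			}
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return false, ""
}

func main() {
	bar := []int{1}
	i := 1
	// Out-of-range index: the runtime raises the panic.
	fmt.Println(classify(func() { _ = bar[i] }))
	// Application-level panic with a plain string value.
	fmt.Println(classify(func() { panic("custom") }))
}
```

The exact wording of the runtime message differs between Go versions, so the sketch does not depend on it.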

Access an uninitialized pointer or a nil pointer

This should make sense to anyone with C/C++ development experience, but it is the most common type of error for beginners who have never used pointers before. The following code verifies this:

package main

import (
    "fmt"
)

func foo() {
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var bar *int
    fmt.Println(*bar)
}

func main() {
    foo()
    fmt.Println("exit")
}

Output:

runtime error: invalid memory address or nil pointer dereference
exit

If you comment out the recover lines, it will print:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x4869ff]

goroutine 1 [running]:
main.foo()
    /home/letian/work/go/src/test/test.go:14 +0x3f
main.main()
    /home/letian/work/go/src/test/test.go:18 +0x22
exit status 2

Sending data to a chan that is already closed

This is a mistake often made by beginners who have just learned to use chan. The following code verifies this:

package main

import (
    "fmt"
)

func foo() {
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var bar = make(chan int, 1)
    close(bar)
    bar <- 1
}

func main() {
    foo()
    fmt.Println("exit")
}

Output:

send on closed channel
exit

If recover is commented out, it prints:

panic: send on closed channel

goroutine 1 [running]:
main.foo()
    /home/letian/work/go/src/test/test.go:15 +0x83
main.main()
    /home/letian/work/go/src/test/test.go:19 +0x22
exit status 2

The panic is raised in the chansend function in src/runtime/chan.go:

// src/runtime/chan.go, line 269

// If block is not nil, the protocol will not sleep but will return if it cannot complete.
// A sleep can wake up with g.param == nil when a channel involved in the sleep has been closed.
// It is easiest to loop and re-run the operation; we will then see that the channel is closed.
func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
    if c == nil {
        if !block {
            return false
        }
        gopark(nil, nil, waitReasonChanSendNilChan, traceEvGoStop, 2)
        throw("unreachable")
    }

    if debugChan {
        print("chansend: chan=", c, "\n")
    }

    if raceenabled {
        racereadpc(c.raceaddr(), callerpc, funcPC(chansend))
    }

    // Fast path: check for failed non-blocking operation without acquiring the lock.
    //
    // After observing that the channel is not closed, we observe that the channel is
    // not ready for sending. Each of these observations is a single word-sized read
    // (first c.closed and second c.recvq.first or c.qcount depending on kind of channel).
    // Because a closed channel cannot transition from 'ready for sending' to
    // 'not ready for sending', even if the channel is closed between the two observations,
    // they imply a moment between the two when the channel was both not yet closed
    // and not ready for sending. We behave as if we observed the channel at that moment,
    // and report that the send cannot proceed.
    //
    // It is okay if the reads are reordered here: if we observe that the channel is not
    // ready for sending and then observe that it is not closed, that implies that the
    // channel wasn't closed during the first observation.
    if !block && c.closed == 0 && ((c.dataqsiz == 0 && c.recvq.first == nil) ||
        (c.dataqsiz > 0 && c.qcount == c.dataqsiz)) {
        return false
    }

    var t0 int64
    if blockprofilerate > 0 {
        t0 = cputicks()
    }

    lock(&c.lock)

    if c.closed != 0 {
        unlock(&c.lock)
        panic(plainError("send on closed channel"))
    }

    if sg := c.recvq.dequeue(); sg != nil {
        // Found a waiting receiver. We pass the value we want to send
        // directly to the receiver, bypassing the channel buffer (if any).
        send(c, sg, ep, func() { unlock(&c.lock) }, 3)
        return true
    }

    if c.qcount < c.dataqsiz {
        // Space is available in the channel buffer. Enqueue the element to send.
        qp := chanbuf(c, c.sendx)
        if raceenabled {
            raceacquire(qp)
            racerelease(qp)
        }
        typedmemmove(c.elemtype, qp, ep)
        c.sendx++
        if c.sendx == c.dataqsiz {
            c.sendx = 0
        }
        c.qcount++
        unlock(&c.lock)
        return true
    }

    if !block {
        unlock(&c.lock)
        return false
    }

    // Block on the channel. Some receiver will complete our operation for us.
    gp := getg()
    mysg := acquireSudog()
    mysg.releasetime = 0
    if t0 != 0 {
        mysg.releasetime = -1
    }
    // No stack splits between assigning elem and enqueuing mysg
    // on gp.waiting where copystack can find it.
    mysg.elem = ep
    mysg.waitlink = nil
    mysg.g = gp
    mysg.isSelect = false
    mysg.c = c
    gp.waiting = mysg
    gp.param = nil
    c.sendq.enqueue(mysg)
    goparkunlock(&c.lock, waitReasonChanSend, traceEvGoBlockSend, 3)
    // Ensure the value being sent is kept alive until the
    // receiver copies it out. The sudog has a pointer to the
    // stack object, but sudogs aren't considered as roots of the
    // stack tracer.
    KeepAlive(ep)

    // someone woke us up.
    if mysg != gp.waiting {
        throw("G waiting list is corrupted")
    }
    gp.waiting = nil
    if gp.param == nil {
        if c.closed == 0 {
            throw("chansend: spurious wakeup")
        }
        panic(plainError("send on closed channel"))
    }
    gp.param = nil
    if mysg.releasetime > 0 {
        blockevent(mysg.releasetime-t0, 2)
    }
    mysg.c = nil
    releaseSudog(mysg)
    return true
}
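Because chansend reports the closed channel with panic rather than throw, the failure surfaces to the sending code as an ordinary panic that recover can intercept. The sketch below shows how it reaches the caller; safeSend is a hypothetical wrapper name used only for illustration.

```go
package main

import "fmt"

// safeSend is a hypothetical wrapper that turns the "send on closed
// channel" panic raised in chansend into an error value.
func safeSend(ch chan<- int, v int) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("send failed: %v", r)
		}
	}()
	ch <- v
	return nil
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(safeSend(ch, 1)) // <nil>
	close(ch)
	fmt.Println(safeSend(ch, 2)) // send failed: send on closed channel
}
```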

Read and write the same map concurrently

For those who have just learned concurrent programming, it is also easy to run into the problem of reading and writing a map concurrently. The following code verifies this:

  package main

  import (
      "fmt"
  )

  func foo() {
      defer func() {
          if err := recover(); err != nil {
              fmt.Println(err)
          }
      }()
      var bar = make(map[int]int)
      go func() {
          defer func() {
              if err := recover(); err != nil {
                  fmt.Println(err)
              }
          }()
          for {
              _ = bar[1]
          }
      }()
      for {
          bar[1] = 1
      }
  }

  func main() {
      foo()
      fmt.Println("exit")
  }

Output:

fatal error: concurrent map read and map write

goroutine 5 [running]:
runtime.throw(0x4bd8b0, 0x21)
    /home/letian/.gvm/gos/go1.12/src/runtime/panic.go:617 +0x72 fp=0xc00004c780 sp=0xc00004c750 pc=0x427f22
runtime.mapaccess1_fast64(0x49eaa0, 0xc000088180, 0x1, 0xc0000260d8)
    /home/letian/.gvm/gos/go1.12/src/runtime/map_fast64.go:21 +0x1a8 fp=0xc00004c7a8 sp=0xc00004c780 pc=0x40eb58
main.foo.func2(0xc000088180)
    /home/letian/work/go/src/test/test.go:21 +0x5c fp=0xc00004c7d8 sp=0xc00004c7a8 pc=0x48708c
runtime.goexit()
    /home/letian/.gvm/gos/go1.12/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00004c7e0 sp=0xc00004c7d8 pc=0x450e51
created by main.foo
    /home/letian/work/go/src/test/test.go:14 +0x68

goroutine 1 [runnable]:
main.foo()
    /home/letian/work/go/src/test/test.go:25 +0x8b
main.main()
    /home/letian/work/go/src/test/test.go:30 +0x22
exit status 2

If you look carefully, you will notice that the exit we print at the end of the program does not appear in the output; the call stack is printed directly instead. Look at the code in src/runtime/map.go and you will find these lines:

  if h.flags&hashWriting != 0 {
      throw("concurrent map read and map write")
  }

Unlike the previous cases, a failure raised by a call to the throw function in the runtime cannot be caught by recover in business code, which makes it the most fatal kind. So wherever a map is read and written concurrently, access to the map should be protected by a lock.
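One way to add such a lock is to wrap the map together with a sync.RWMutex. The following is a minimal sketch under that assumption; the counter type and its methods are hypothetical names, and other approaches are possible.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical wrapper that guards a map with a sync.RWMutex.
type counter struct {
	mu sync.RWMutex
	m  map[int]int
}

// get reads a key under the read lock.
func (c *counter) get(k int) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.m[k]
}

// inc increments a key under the write lock.
func (c *counter) inc(k int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k]++
}

// run starts one reader and one writer on the same key, each looping
// n times, and returns the final value stored under that key.
func run(n int) int {
	c := &counter{m: make(map[int]int)}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			_ = c.get(1)
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			c.inc(1)
		}
	}()
	wg.Wait()
	return c.get(1)
}

func main() {
	fmt.Println(run(1000)) // 1000
}
```

Unlike the original example, the loops here are bounded so that the program terminates.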

Type assertions

It is also easy to step into a pit when using a type assertion to convert an interface, and even people who have used interface for a while tend to overlook this problem. The following code verifies this:

package main

import (
    "fmt"
)

func foo() {
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    var i interface{} = "abc"
    _ = i.([]string)
}

func main() {
    foo()
    fmt.Println("exit")
}

Output:

interface conversion: interface {} is string, not []string
exit

The corresponding code is in src/runtime/iface.go:

// panicdottypeE is called when doing an e.(T) conversion and the conversion fails.
// have = the dynamic type we have.
// want = the static type we're trying to convert to.
// iface = the static type we're converting from.
func panicdottypeE(have, want, iface *_type) {
    panic(&TypeAssertionError{iface, have, want, ""})
}

// panicdottypeI is called when doing an i.(T) conversion and the conversion fails.
// Same args as panicdottypeE, but "have" is the dynamic itab we have.
func panicdottypeI(have *itab, want, iface *_type) {
    var t *_type
    if have != nil {
        t = have._type
    }
    panicdottypeE(t, want, iface)
}
Copy the code

More panics

There are many more places in the Go library that call panic. You can search for panic in the source code to find them.

Due to limited space, this article does not go into techniques for avoiding these pitfalls. Thanks for reading!
