Product | technology drops
Author | Chun-hui Cao
Preface: System calls are the only way a language can interact with the operating system. Understanding syscall in Go shows how Go programs talk to the system, and how the underlying runtime optimizes around system calls, which leads to a deeper understanding of the language as a whole.
▎ Reading Index
- concept
- entrance
- system call management
- syscall in the runtime
- interaction with scheduling
- entersyscall
- exitsyscallfast
- exitsyscall
- entersyscallblock
- entersyscallblock_handoff
- entersyscall_sysmon
- entersyscall_gcwait
- conclusion
▎ concept
▎ entrance
The syscall package has the following entry points, implemented in syscall/asm_linux_amd64.s:
func Syscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err syscall.Errno)

func Syscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err syscall.Errno)

func RawSyscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err syscall.Errno)

func RawSyscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2 uintptr, err syscall.Errno)
These functions are implemented in assembly. Following the Linux syscall calling convention, the assembly only has to place the arguments in the right registers and execute the SYSCALL instruction to enter the kernel. When the system call finishes, its return value is in RAX.
The only difference between Syscall and Syscall6 is the number of arguments they take:
// func Syscall(trap int64, a1, a2, a3 uintptr) (r1, r2, err uintptr);
TEXT ·Syscall(SB),NOSPLIT,$0-56
	CALL	runtime·entersyscall(SB)
	MOVQ	a1+8(FP), DI
	MOVQ	a2+16(FP), SI
	MOVQ	a3+24(FP), DX
	MOVQ	$0, R10
	MOVQ	$0, R8
	MOVQ	$0, R9
	MOVQ	trap+0(FP), AX	// syscall entry
	SYSCALL
	// 0xfffffffffffff001 is Linux's MAX_ERRNO, negated and read as an unsigned value, see
	// http://lxr.free-electrons.com/source/include/linux/err.h#L17
	CMPQ	AX, $0xfffffffffffff001
	JLS	ok
	MOVQ	$-1, r1+32(FP)
	MOVQ	$0, r2+40(FP)
	NEGQ	AX
	MOVQ	AX, err+48(FP)
	CALL	runtime·exitsyscall(SB)
	RET
ok:
	MOVQ	AX, r1+32(FP)
	MOVQ	DX, r2+40(FP)
	MOVQ	$0, err+48(FP)
	CALL	runtime·exitsyscall(SB)
	RET

// func Syscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr)
TEXT ·Syscall6(SB),NOSPLIT,$0-80
	CALL	runtime·entersyscall(SB)
	MOVQ	a1+8(FP), DI
	MOVQ	a2+16(FP), SI
	MOVQ	a3+24(FP), DX
	MOVQ	a4+32(FP), R10
	MOVQ	a5+40(FP), R8
	MOVQ	a6+48(FP), R9
	MOVQ	trap+0(FP), AX	// syscall entry
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	ok6
	MOVQ	$-1, r1+56(FP)
	MOVQ	$0, r2+64(FP)
	NEGQ	AX
	MOVQ	AX, err+72(FP)
	CALL	runtime·exitsyscall(SB)
	RET
ok6:
	MOVQ	AX, r1+56(FP)
	MOVQ	DX, r2+64(FP)
	MOVQ	$0, err+72(FP)
	CALL	runtime·exitsyscall(SB)
	RET
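The error check in the assembly above (CMPQ against 0xfffffffffffff001, JLS, NEGQ) can be hard to read. The following is a minimal Go sketch of the same decision, assuming 64-bit registers modeled as uint64; decodeReturn and errnoBoundary are names invented for this example and do not exist in the Go source tree.

```go
package main

import "fmt"

// errnoBoundary mirrors the constant compared against AX in the assembly.
const errnoBoundary uint64 = 0xfffffffffffff001

// decodeReturn is a hypothetical Go rendering of the CMPQ/JLS/NEGQ sequence:
// ax and dx stand for the register values left behind by SYSCALL.
func decodeReturn(ax, dx uint64) (r1, r2, errno uint64) {
	if ax <= errnoBoundary { // JLS: unsigned "lower or same" jumps to ok
		return ax, dx, 0
	}
	// Error path: r1 = -1, r2 = 0, err = -AX (NEGQ AX).
	return ^uint64(0), 0, -ax
}

func main() {
	enoent := uint64(2) // ENOENT on Linux

	// A successful call returns its result unchanged.
	fmt.Println(decodeReturn(42, 0))

	// A failing call leaves -errno in AX; unsigned negation wraps around.
	fmt.Println(decodeReturn(-enoent, 0))
}
```

The sketch only restates the branch logic; the real code operates on registers and stores the results into the caller's stack frame.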
The two functions barely differ, so why not keep just one? My personal guess: Go passes function arguments on the stack, so the three-argument version probably exists to save a little stack space. A normal Syscall notifies the runtime before the system call by calling runtime·entersyscall, and calls runtime·exitsyscall when it returns.
// func RawSyscall(trap, a1, a2, a3 uintptr) (r1, r2, err uintptr)
TEXT ·RawSyscall(SB),NOSPLIT,$0-56
	MOVQ	a1+8(FP), DI
	MOVQ	a2+16(FP), SI
	MOVQ	a3+24(FP), DX
	MOVQ	$0, R10
	MOVQ	$0, R8
	MOVQ	$0, R9
	MOVQ	trap+0(FP), AX	// syscall entry
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	ok1
	MOVQ	$-1, r1+32(FP)
	MOVQ	$0, r2+40(FP)
	NEGQ	AX
	MOVQ	AX, err+48(FP)
	RET
ok1:
	MOVQ	AX, r1+32(FP)
	MOVQ	DX, r2+40(FP)
	MOVQ	$0, err+48(FP)
	RET

// func RawSyscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr)
TEXT ·RawSyscall6(SB),NOSPLIT,$0-80
	MOVQ	a1+8(FP), DI
	MOVQ	a2+16(FP), SI
	MOVQ	a3+24(FP), DX
	MOVQ	a4+32(FP), R10
	MOVQ	a5+40(FP), R8
	MOVQ	a6+48(FP), R9
	MOVQ	trap+0(FP), AX	// syscall entry
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	ok2
	MOVQ	$-1, r1+56(FP)
	MOVQ	$0, r2+64(FP)
	NEGQ	AX
	MOVQ	AX, err+72(FP)
	RET
ok2:
	MOVQ	AX, r1+56(FP)
	MOVQ	DX, r2+64(FP)
	MOVQ	$0, err+72(FP)
	RET
The difference between RawSyscall and Syscall is subtle: RawSyscall does not notify the runtime when it enters and leaves the system call, so the runtime has no chance to reschedule the P held by the M that g is running on. If user code makes a blocking system call through RawSyscall, it can therefore block other g's.
Yes, if you call RawSyscall you may block other goroutines from running. The system monitor may start them up after a while, but I think there are cases where it won’t. I would say that Go programs should always call Syscall. RawSyscall exists to make it slightly more efficient to call system calls that never block, such as getpid. But it’s really an internal mechanism.
// func gettimeofday(tv *Timeval) (err uintptr)
TEXT ·gettimeofday(SB),NOSPLIT,$0-16
	MOVQ	tv+0(FP), DI
	MOVQ	$0, SI
	MOVQ	runtime·__vdso_gettimeofday_sym(SB), AX
	CALL	AX

	CMPQ	AX, $0xfffffffffffff001
	JLS	ok7
	NEGQ	AX
	MOVQ	AX, err+8(FP)
	RET
ok7:
	MOVQ	$0, err+8(FP)
	RET
▎ system call management
First, the system call definition file:
/syscall/syscall_linux.go
System calls can be divided into three categories:
- Blocking system calls
- Non-blocking system calls
- Wrapped system calls
Blocking system calls are defined as follows:
//sys Madvise(b []byte, advice int) (err error)
Non-blocking system calls:
//sysnb EpollCreate(size int) (fd int, err error)
Based on these comments, the mksyscall.pl script generates the platform-specific implementations. mksyscall.pl is a Perl script; interested readers can look at it for themselves.
Here is the generated code for a blocking and a non-blocking system call:
func Madvise(b []byte, advice int) (err error) {
	var _p0 unsafe.Pointer
	if len(b) > 0 {
		_p0 = unsafe.Pointer(&b[0])
	} else {
		_p0 = unsafe.Pointer(&_zero)
	}
	_, _, e1 := Syscall(SYS_MADVISE, uintptr(_p0), uintptr(len(b)), uintptr(advice))
	if e1 != 0 {
		err = errnoErr(e1)
	}
	return
}

func EpollCreate(size int) (fd int, err error) {
	r0, _, e1 := RawSyscall(SYS_EPOLL_CREATE, uintptr(size), 0, 0)
	fd = int(r0)
	if e1 != 0 {
		err = errnoErr(e1)
	}
	return
}
Clearly, system calls marked sys use Syscall or Syscall6, while those marked sysnb use RawSyscall or RawSyscall6.
What about wrapped system calls?
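The mapping from annotation to entry point can be sketched in a few lines of Go. This is not how mksyscall.pl is written; pickEntry is a hypothetical helper that assumes every parameter is spelled "name type", counts a slice as two machine words (pointer and length, as in the generated Madvise), and only knows the three-argument and six-argument variants.

```go
package main

import (
	"fmt"
	"strings"
)

// pickEntry is a hypothetical, much simplified stand-in for the generator's
// choice of entry point: //sys selects Syscall*, //sysnb selects RawSyscall*.
func pickEntry(directive string) (string, error) {
	var base, proto string
	switch {
	case strings.HasPrefix(directive, "//sysnb "):
		base, proto = "RawSyscall", strings.TrimPrefix(directive, "//sysnb ")
	case strings.HasPrefix(directive, "//sys "):
		base, proto = "Syscall", strings.TrimPrefix(directive, "//sys ")
	default:
		return "", fmt.Errorf("not a //sys or //sysnb line: %q", directive)
	}
	open, end := strings.Index(proto, "("), strings.Index(proto, ")")
	if open < 0 || end < open {
		return "", fmt.Errorf("malformed prototype: %q", proto)
	}
	words := 0
	for _, param := range strings.Split(proto[open+1:end], ",") {
		fields := strings.Fields(param)
		if len(fields) == 0 {
			continue // empty parameter list
		}
		if len(fields) != 2 {
			return "", fmt.Errorf("expected \"name type\", got %q", param)
		}
		if strings.HasPrefix(fields[1], "[]") {
			words += 2 // assumption: a slice becomes pointer + length
		} else {
			words++
		}
	}
	switch {
	case words <= 3:
		return base, nil
	case words <= 6:
		return base + "6", nil
	}
	return "", fmt.Errorf("%d argument words do not fit %s6", words, base)
}

func main() {
	fmt.Println(pickEntry("//sys Madvise(b []byte, advice int) (err error)"))
	fmt.Println(pickEntry("//sysnb EpollCreate(size int) (fd int, err error)"))
}
```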
func Rename(oldpath string, newpath string) (err error) {
	return Renameat(_AT_FDCWD, oldpath, _AT_FDCWD, newpath)
}
Perhaps the name of the underlying system call is awkward, or it takes too many arguments, so it is simply wrapped. Nothing special.
▎ syscall in the runtime
In addition to the blocking, non-blocking, and wrapped syscalls described above, the runtime defines some low-level syscalls of its own that are not exposed to users.
When the syscall library provided to users is used, the goroutine and its P enter the Gsyscall and Psyscall states, respectively. The syscalls wrapped by the runtime itself, however, never call entersyscall or exitsyscall, whether they block or not. They are "low-level" syscalls, after all.
In essence they are the same as the syscalls exposed to users. The code lives in runtime/sys_linux_amd64.s; here is a concrete example:
TEXT runtime·write(SB),NOSPLIT,$0-28
	MOVQ	fd+0(FP), DI
	MOVQ	p+8(FP), SI
	MOVL	n+16(FP), DX
	MOVL	$SYS_write, AX
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	2(PC)
	MOVL	$-1, AX
	MOVL	AX, ret+24(FP)
	RET

TEXT runtime·read(SB),NOSPLIT,$0-28
	MOVL	fd+0(FP), DI
	MOVQ	p+8(FP), SI
	MOVL	n+16(FP), DX
	MOVL	$SYS_read, AX
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	2(PC)
	MOVL	$-1, AX
	MOVL	AX, ret+24(FP)
	RET
Here is the full list of the additional syscalls defined by the runtime:
#define SYS_read		0
#define SYS_write		1
#define SYS_open		2
#define SYS_close		3
#define SYS_mmap		9
#define SYS_munmap		11
#define SYS_brk			12
#define SYS_rt_sigaction	13
#define SYS_rt_sigprocmask	14
#define SYS_rt_sigreturn	15
#define SYS_access		21
#define SYS_sched_yield		24
#define SYS_mincore		27
#define SYS_madvise		28
#define SYS_setittimer		38
#define SYS_getpid		39
#define SYS_socket		41
#define SYS_connect		42
#define SYS_clone		56
#define SYS_exit		60
#define SYS_kill		62
#define SYS_fcntl		72
#define SYS_getrlimit		97
#define SYS_sigaltstack		131
#define SYS_arch_prctl		158
#define SYS_gettid		186
#define SYS_tkill		200
#define SYS_futex		202
#define SYS_sched_getaffinity	204
#define SYS_epoll_create	213
#define SYS_exit_group		231
#define SYS_epoll_wait		232
#define SYS_epoll_ctl		233
#define SYS_pselect6		270
#define SYS_epoll_create1	291
In theory, the scheduler does not take the P away while these syscalls execute, so the goroutine simply continues once the call completes. A user goroutine is different: if its P is taken away during a syscall, it has to queue for a P again afterwards.
▎ interaction with scheduling
Since the code has to cooperate with the scheduler, it politely tells the scheduler "I am about to enter a syscall" (entersyscall) and "I am done" (exitsyscall).
So the interaction here means user code cooperating with the scheduler when it uses the syscall library. The syscalls inside the runtime do not follow this process.
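The effect of this cooperation can be observed from ordinary Go code. The Linux-only sketch below limits the program to a single P, parks one goroutine in a blocking read(2) issued through the syscall library, and shows that another goroutine still gets to run and eventually wakes the reader. The function name blockedReaderDemo and the 10 ms pause are choices made for this example, not part of the runtime.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// blockedReaderDemo is a hypothetical demonstration: with one P, a goroutine
// blocked in read(2) must not stop the rest of the program from running.
func blockedReaderDemo() (string, error) {
	prev := runtime.GOMAXPROCS(1) // a single P makes the hand-off visible
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return "", err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	type result struct {
		msg string
		err error
	}
	done := make(chan result, 1)
	go func() {
		buf := make([]byte, 16)
		// syscall.Read goes through Syscall, so the runtime is notified
		// before this goroutine blocks in the kernel.
		n, err := syscall.Read(fds[0], buf)
		if err != nil {
			done <- result{err: err}
			return
		}
		done <- result{msg: string(buf[:n])}
	}()

	// The reader is stuck in read(2), yet this goroutine keeps running.
	time.Sleep(10 * time.Millisecond)
	if _, err := syscall.Write(fds[1], []byte("wake")); err != nil {
		return "", err
	}
	r := <-done
	return r.msg, r.err
}

func main() {
	fmt.Println(blockedReaderDemo())
}
```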
▎ entersyscall
// Standard entry point for the syscall library and cgo calls.
//go:nosplit
func entersyscall() {
	reentersyscall(getcallerpc(), getcallersp())
}

//go:nosplit
func reentersyscall(pc, sp uintptr) {
	_g_ := getg()

	// Preemption of g must be disabled.
	_g_.m.locks++

	_g_.stackguard0 = stackPreempt
	// Set throwsplit: if newstack finds throwsplit set to true,
	// it crashes immediately. The relevant code in newstack is:
	// if thisg.m.curg.throwsplit {
	// 	throw("runtime: stack split at bad time")
	// }
	_g_.throwsplit = true

	// Leave SP around for GC and traceback.
	// Save the current context.
	save(pc, sp)
	_g_.syscallsp = sp
	_g_.syscallpc = pc
	casgstatus(_g_, _Grunning, _Gsyscall)
	if _g_.syscallsp < _g_.stack.lo || _g_.stack.hi < _g_.syscallsp {
		systemstack(func() {
			print("entersyscall inconsistent ", hex(_g_.syscallsp), " [", hex(_g_.stack.lo), ",", hex(_g_.stack.hi), "]\n")
			throw("entersyscall")
		})
	}

	if atomic.Load(&sched.sysmonwait) != 0 {
		systemstack(entersyscall_sysmon)
		save(pc, sp)
	}

	if _g_.m.p.ptr().runSafePointFn != 0 {
		// runSafePointFn may stack split if run on this stack
		systemstack(runSafePointFn)
		save(pc, sp)
	}

	_g_.m.syscalltick = _g_.m.p.ptr().syscalltick
	_g_.sysblocktraced = true
	_g_.m.mcache = nil
	_g_.m.p.ptr().m = 0
	atomic.Store(&_g_.m.p.ptr().status, _Psyscall)
	if sched.gcwaiting != 0 {
		systemstack(entersyscall_gcwait)
		save(pc, sp)
	}

	_g_.m.locks--
}
As you can see, a G that enters a syscall is guaranteed not to be preempted.
▎ exitsyscall
// g has exited its syscall.
// Arrange for g to run on a CPU again.
// This function is only called from the syscall library, not from the
// low-level syscalls used by the runtime.
// Write barriers are not allowed.
//go:nosplit
//go:nowritebarrierrec
func exitsyscall(dummy int32) {
	_g_ := getg()

	_g_.m.locks++ // see comment in entersyscall
	if getcallersp(unsafe.Pointer(&dummy)) > _g_.syscallsp {
		// throw calls print which may try to grow the stack,
		// but throwsplit == true so the stack can not be grown;
		// use systemstack to avoid that possible problem.
		systemstack(func() {
			throw("exitsyscall: syscall frame is no longer valid")
		})
	}

	_g_.waitsince = 0
	oldp := _g_.m.p.ptr() // used by the tracing code omitted here
	if exitsyscallfast() {
		if _g_.m.mcache == nil {
			systemstack(func() {
				throw("lost mcache")
			})
		}
		// There is a P for us, so we can run.
		_g_.m.p.ptr().syscalltick++
		// Change the g status back to running.
		casgstatus(_g_, _Gsyscall, _Grunning)

		// The garbage collector is not running (because we are),
		// so it is safe to clear syscallsp.
		_g_.syscallsp = 0
		_g_.m.locks--
		if _g_.preempt {
			// Restore the preemption request in case newstack cleared it.
			_g_.stackguard0 = stackPreempt
		} else {
			// Otherwise restore the real _StackGuard that was
			// overwritten in entersyscall/entersyscallblock.
			_g_.stackguard0 = _g_.stack.lo + _StackGuard
		}
		_g_.throwsplit = false
		return
	}

	_g_.sysexitticks = 0
	_g_.m.locks--

	// Call the scheduler.
	mcall(exitsyscall0)

	if _g_.m.mcache == nil {
		systemstack(func() {
			throw("lost mcache")
		})
	}

	// The scheduler returned, so we are allowed to run now and can clear
	// the syscallsp information left for the garbage collector during the
	// syscall. This has to wait until now: before gosched returns we
	// cannot be sure that the garbage collector is not running.
	_g_.syscallsp = 0
	_g_.m.p.ptr().syscalltick++
	_g_.throwsplit = false
}
exitsyscallfast and exitsyscall0 are both called from here.
▎ exitsyscallfast
//go:nosplit
func exitsyscallfast() bool {
	_g_ := getg()

	// Freezetheworld sets stopwait but does not retake P's.
	if sched.stopwait == freezeStopWait {
		_g_.m.mcache = nil
		_g_.m.p = 0
		return false
	}

	// Try to re-acquire the last P.
	if _g_.m.p != 0 && _g_.m.p.ptr().status == _Psyscall && atomic.Cas(&_g_.m.p.ptr().status, _Psyscall, _Prunning) {
		// There's a cpu for us, so we can run.
		exitsyscallfast_reacquired()
		return true
	}

	// Try to get any other idle P.
	oldp := _g_.m.p.ptr()
	_g_.m.mcache = nil
	_g_.m.p = 0
	if sched.pidle != 0 {
		var ok bool
		systemstack(func() {
			ok = exitsyscallfast_pidle()
		})
		if ok {
			return true
		}
	}
	return false
}
In short, it tries to get a P on which to run the logic that follows the syscall. If no P is available anywhere, exitsyscall falls back to exitsyscall0:
mcall(exitsyscall0)
The call to exitsyscall0 switches to the g0 stack.
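The first attempt in exitsyscallfast hinges on a compare-and-swap of the P's status, and the system monitor uses a compare-and-swap on the same field when it takes a P away. The toy model below illustrates that race with sync/atomic; proc, enterSyscall, retake and exitSyscallFast are simplified stand-ins written for this example, and the model ignores the later "grab any idle P" step.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Toy P states; the names echo _Pidle, _Prunning and _Psyscall.
const (
	pIdle uint32 = iota
	pRunning
	pSyscall
)

// proc is a hypothetical stand-in for the runtime's p struct.
type proc struct{ status uint32 }

// enterSyscall marks the P as being inside a system call.
func enterSyscall(p *proc) { atomic.StoreUint32(&p.status, pSyscall) }

// retake models the monitor taking the P away while it is in a syscall.
func retake(p *proc) bool {
	return atomic.CompareAndSwapUint32(&p.status, pSyscall, pIdle)
}

// exitSyscallFast models the first attempt of exitsyscallfast:
// it succeeds only if nobody took the P away in the meantime.
func exitSyscallFast(p *proc) bool {
	return atomic.CompareAndSwapUint32(&p.status, pSyscall, pRunning)
}

func main() {
	quick := &proc{status: pRunning}
	enterSyscall(quick)
	fmt.Println("fast path:", exitSyscallFast(quick))

	slow := &proc{status: pRunning}
	enterSyscall(slow)
	retake(slow) // the syscall took too long and the P was taken away
	fmt.Println("fast path after retake:", exitSyscallFast(slow))
}
```

Because both sides use compare-and-swap from the same starting state, exactly one of them wins, which is why a returning goroutine either keeps its old P or has to look for another one.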
▎ exitsyscall0
// Set g's state to runnable.
//go:nowritebarrierrec
func exitsyscall0(gp *g) {
	_g_ := getg()

	casgstatus(gp, _Gsyscall, _Grunnable)
	dropg()
	lock(&sched.lock)
	_p_ := pidleget()
	if _p_ == nil {
		globrunqput(gp)
	} else if atomic.Load(&sched.sysmonwait) != 0 {
		atomic.Store(&sched.sysmonwait, 0)
		notewakeup(&sched.sysmonnote)
	}
	unlock(&sched.lock)
	if _p_ != nil {
		acquirep(_p_)
		execute(gp, false) // Never returns.
	}
	if _g_.m.lockedg != 0 {
		stoplockedm()
		execute(gp, false) // Never returns.
	}
	stopm()
	schedule() // Never returns.
}
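Stripped of locking and thread management, the decision made in exitsyscall0 is small: take an idle P if there is one and keep running, otherwise put the goroutine on the global run queue. The sketch below models just that decision; toySched and exitSyscallSlow are names invented for this example, and the single-threaded slices stand in for structures the runtime protects with sched.lock.

```go
package main

import "fmt"

// toySched is a hypothetical, single-threaded stand-in for scheduler state.
type toySched struct {
	idlePs     []int    // IDs of idle Ps (what pidleget would pop from)
	globalRunq []string // runnable goroutines (what globrunqput appends to)
}

// exitSyscallSlow mirrors the decision in exitsyscall0: run on an idle P if
// one exists, otherwise queue the goroutine globally and give up the thread.
func (s *toySched) exitSyscallSlow(g string) (p int, running bool) {
	if n := len(s.idlePs); n > 0 {
		p = s.idlePs[n-1]
		s.idlePs = s.idlePs[:n-1]
		return p, true // corresponds to acquirep + execute
	}
	s.globalRunq = append(s.globalRunq, g) // corresponds to globrunqput
	return -1, false                       // corresponds to stopm + schedule
}

func main() {
	s := &toySched{idlePs: []int{0}}
	fmt.Println(s.exitSyscallSlow("g1")) // an idle P is available
	fmt.Println(s.exitSyscallSlow("g2")) // no P left: g2 is queued
	fmt.Println(s.globalRunq)
}
```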
▎ entersyscallblock
Here the caller knows the syscall will block, so it hands over its P right away.
// Same as entersyscall, except that it hands over its P right away.
//go:nosplit
func entersyscallblock(dummy int32) {
	_g_ := getg()

	_g_.m.locks++ // see comment in entersyscall
	_g_.throwsplit = true
	_g_.stackguard0 = stackPreempt // see comment in entersyscall
	_g_.m.syscalltick = _g_.m.p.ptr().syscalltick
	_g_.sysblocktraced = true
	_g_.m.p.ptr().syscalltick++

	// Leave SP around for GC and traceback.
	pc := getcallerpc()
	sp := getcallersp(unsafe.Pointer(&dummy))
	save(pc, sp)
	_g_.syscallsp = _g_.sched.sp
	_g_.syscallpc = _g_.sched.pc
	if _g_.syscallsp < _g_.stack.lo || _g_.stack.hi < _g_.syscallsp {
		sp1 := sp
		sp2 := _g_.sched.sp
		sp3 := _g_.syscallsp
		systemstack(func() {
			print("entersyscallblock inconsistent ", hex(sp1), " ", hex(sp2), " ", hex(sp3), " [", hex(_g_.stack.lo), ",", hex(_g_.stack.hi), "]\n")
			throw("entersyscallblock")
		})
	}
	casgstatus(_g_, _Grunning, _Gsyscall)
	if _g_.syscallsp < _g_.stack.lo || _g_.stack.hi < _g_.syscallsp {
		systemstack(func() {
			print("entersyscallblock inconsistent ", hex(sp), " ", hex(_g_.sched.sp), " ", hex(_g_.syscallsp), " [", hex(_g_.stack.lo), ",", hex(_g_.stack.hi), "]\n")
			throw("entersyscallblock")
		})
	}

	systemstack(entersyscallblock_handoff)

	// Resave for traceback during blocked call.
	save(getcallerpc(), getcallersp(unsafe.Pointer(&dummy)))

	_g_.m.locks--
}
This function has only one caller, notetsleepg, which is not covered again here.
▎ entersyscallblock_handoff
func entersyscallblock_handoff() {
	handoffp(releasep())
}
Very simple.
▎ entersyscall_sysmon
func entersyscall_sysmon() {
	lock(&sched.lock)
	if atomic.Load(&sched.sysmonwait) != 0 {
		atomic.Store(&sched.sysmonwait, 0)
		notewakeup(&sched.sysmonnote)
	}
	unlock(&sched.lock)
}
▎ entersyscall_gcwait
func entersyscall_gcwait() {
	_g_ := getg()
	_p_ := _g_.m.p.ptr()

	lock(&sched.lock)
	if sched.stopwait > 0 && atomic.Cas(&_p_.status, _Psyscall, _Pgcstop) {
		_p_.syscalltick++
		if sched.stopwait--; sched.stopwait == 0 {
			notewakeup(&sched.stopnote)
		}
	}
	unlock(&sched.lock)
}
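The stopwait bookkeeping in entersyscall_gcwait is a countdown: every P that stops decrements the counter under the scheduler lock, and whoever brings it to zero wakes the waiter. The sketch below models that pattern with a mutex and a channel; stopTheWorld, newStopTheWorld and pStopped are names made up for this example, and the closed channel plays the role of notewakeup(&sched.stopnote).

```go
package main

import (
	"fmt"
	"sync"
)

// stopTheWorld is a hypothetical model of the stopwait countdown.
type stopTheWorld struct {
	mu       sync.Mutex
	stopwait int           // Ps that still have to stop
	stopped  chan struct{} // closed when stopwait reaches zero
}

func newStopTheWorld(nprocs int) *stopTheWorld {
	return &stopTheWorld{stopwait: nprocs, stopped: make(chan struct{})}
}

// pStopped is called once for every P that parks itself; the last caller
// wakes whoever is waiting for the world to stop.
func (s *stopTheWorld) pStopped() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.stopwait > 0 {
		s.stopwait--
		if s.stopwait == 0 {
			close(s.stopped)
		}
	}
}

func main() {
	stw := newStopTheWorld(3)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			stw.pStopped()
		}()
	}
	<-stw.stopped // returns only after all three Ps have checked in
	wg.Wait()
	fmt.Println("all Ps stopped")
}
```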
▎ summary
System calls provided to users notify the runtime by calling entersyscall and exitsyscall (the RawSyscall variants excepted). If a syscall blocks, the runtime decides whether to take the P away and give it to another M. "Taking away" here means breaking the binding between M and P; if the binding has been broken, g is put on a run queue (runq) when the syscall returns.
The runtime, meanwhile, reserves a privilege for itself: its P is not taken away while it runs its own logic, which guarantees that the syscalls used at Go's "low level" continue processing as soon as they return.
So even for the same epoll_wait, the runtime's own call cannot be interrupted by anyone else, while the syscall.EpollWait used by your code clearly has no such privilege.
▎ END
References are as follows
z.didi.cn/1HecgP
Xargin, open source enthusiast. Active on GitHub and in various tech communities. Enjoys technical debates. Author of the open source book Advanced Programming for Go.