The book continues to analyze the process of closing the Watch process on the Client side. First, let’s review the key roles and design concepts of Watch:
role | instructions |
---|---|
watcher | External interface Watcher implementation, focusing on the Watch() method |
watchGrpcStream | Bridges, managing internal GRPC connections, managing internal virtual streams, message reception, and distribution |
watchClient | The underlying GRPC connects to the client |
watcherStream | A virtual Stream that handles communication between the server and the client |
To analyze the shutdown process, let’s first look at what conditions might cause a shutdown:
1. xx.ctx.Done()
First, all roles have an inner loop, and all roles contain a CTX context. Therefore, CTX is the universal turn-off method. Generally speaking, the usage routines of CTX are as follows:
func main(a){
ctx, cancel = context.WithTimeout(ctx, client.cfg.DialTimeout)
go resource.Run(ctx)
defer func(a){
ifcancel ! =nil{
cancel()
}
}()
for{}}func(r *Resource) Run(ctx context.Context) {
for {
select {
case <-ctx.Done():
return
default:
// .. do somethings }}Copy the code
This way, when CTX expires by cancel() or timeout, the running child Goruntine will exit automatically. Its internal communication is also through Channel, I will not introduce too much here. See how Context is used in the Watch process.
The internal loop of each role in the Watch process basically maintains a similar structure to the above example code, which checks ctx.done () through select to see if it is not blocked, thus executing the logic of closing the loop.
2. <-w.errc
In watchGrpcStream, there is a case Err := < -w.rc path, where err is forwarded by the serveWatchClient loop when it receives an error from the ETCD server, but does not necessarily cause the Watch process to close. Depending on the error, it can be shut down or restarted automatically.
3. No more virtual streams in watchGrpcStream
When watchGrpcStream finds that all virtual streams have been shut down, it closes itself, ending the Watch process.
4. Call Close() on the user side
When the user actively calls Close(), the entire Watch process is of course also closed. This is also done by calling the global CTX cancel() method, so the logic is the same as in point 1.
Detailed analysis of closing process
Now that we’ve found all the shutdown scenarios, let’s take a closer look at how each character is turned off:
1. The closure of Watcher
Func (w *watcher) Close() (err error) {w.mlock () // Close all watchgrpcStreams for _, wgs := range streams { if werr := wgs.close(); werr ! Canceled Canceled = nil {err = werr}} cancel() Canceled = nil {err = werr}} cancel() Canceled = nil {err = werr}} cancel() Canceled = nil Canceled {err == context.Canceled {err = nil} return err}Copy the code
Closing watcher is easy because it doesn’t have a loop of its own. All it needs to do is notify watchGrpcStream that it needs to close, and it basically doesn’t have to do anything.
2. Close the watchClient
func (w *watchGrpcStream) serveWatchClient(wc pb.Watch_WatchClient) { for { resp, err := wc.Recv() if err ! = nil { select { case w.errc <- err: case <-w.donec: Return} select {case w.rec < -resp: case < -w.donc: WatchGrpcStream is donec (closed).Copy the code
The watchClient shutdown is also easy to understand. If an error is received, the loop will exit. Or find that watchGrpcStream is already done and exit yourself. Err := wc.recv () if the watchGrpcStream is stuck on the resp line, err := WC.recv () if the watchGrpcStream is done, how does it exit the loop?
WatchClient, err = w.emote. Watch(w.tx, w.allopts…) ; Resp, err := wc.recv () also returns resp = nil and err = Context.canceled as soon as the watchGrpcStream’s CTX is passed in.
3. Closure of watcherStream
func (w *watchGrpcStream) serveSubstream(ws *watcherStream, resumec chan struct{}) { ... // Close the process defer func() {if! Resuming {/ / tag into the closing process (closed) / / need watchGrpcStream can truly close watcherStream / / because the need to recycle it ws watchGrpcStream closing = true} / / // Close (ws. Donec) if (ws. Donec) if (ws. Output {// send watchGrpcStream to handle its own shutdown w.closingc < -ws} // Quantity of watcherStream -1 W.W.D. One ()}() emptyWr := &WatchResponse{} for { ... select { case outc <- *curWr: ... case wr, ok := <-ws.recvc: if ! Ok {// close(ws.recvc) // Also can let watchStream into the closing process return}... case <-w.ctx.Done(): return case <-ws.initReq.ctx.Done(): return case <-resumec: // The watchStream inner loop exits, but after the watchStream is restarted, the inner loop starts again.Copy the code
WatchStream is a watchStream that sends itself to watchGrpcStream to recycle resources. Restart is to close the internal loop, and when the restart is complete, it will restart again, the resources are unchanged.
WatchStream: watchStream: State flow
With watchRequest, we will create a watchStream, which was introduced in the previous article. The watchStream created is in the initialize state. The watchStream will be substream only after the ETCD server resP is received. If watchGrpcStream restarts the underlying GRPC connection, all substream watchstreams will be restored to the descendant state. Third, both the delta and the substream states can be closed,ctx.Done
I won’t introduce you,close(recvc)
WatchGrpcStream closes itself and passesclose(recvc)
Close all watchstreams inside it. 2. The ETCD server returns the flagCanceled
Response is also calledclose(recvc)
Close the watchStream
WatchGrpcStream recyles the watchStream resource:
// This code is captured from the run() method of watchGrpcStream
case ws := <-w.closingc:
// Analyze later
w.closeSubstream(ws)
When the ETCD server returns a Canceled response, it adds WS to the closing set
// Then close its recvc, in this case removing the WS from the closing.
delete(closing, ws)
// watchGrpcStream can exit if substreams and filesize are not available
// If watchGrpcStream exits, the ETCD server is not greeted (the underlying connection is gone).
if len(w.substreams)+len(w.resuming) == 0 {
return
}
// If watchGrpcStream itself does not exit, please be polite to the ETCD server
// The ETCD server is not closed.
ifws.id ! =- 1 {
// Add to unique set to prevent multiple exit messages.
cancelSet[ws.id] = struct{}{}
cr := &pb.WatchRequest_CancelRequest{
CancelRequest: &pb.WatchCancelRequest{
WatchId: ws.id,
},
}
req := &pb.WatchRequest{RequestUnion: cr}
iferr := wc.Send(req); err ! =nil{... }}}Copy the code
There are two data structures called closing and cancelSet, both of which are designed to prevent repeated closures when WS is being closed elsewhere. These details are no longer developed. The key is the watchStream state transition.
Reclaiming watchStream resources does three things
func (w *watchGrpcStream) closeSubstream(ws *watcherStream){...// 1. Send a close message to the user side to close the Channel
ifcloseErr := w.closeErr; closeErr ! =nil && ws.initReq.ctx.Err() == nil {
go w.sendCloseSubstream(ws, &WatchResponse{Canceled: true, closeErr: w.closeErr})
} else ifws.outc ! =nil {
close(ws.outc)
}
// 2. Remove ws from substreams
ifws.id ! =- 1 {
delete(w.substreams, ws.id)
return
}
// 3. Remove ws from eq
for i := range w.resuming {
if w.resuming[i] == ws {
w.resuming[i] = nil
return}}}Copy the code
3. Close watchGrpcStream
And finally, watchGrpcStream, which has the most complicated shutdown process. Because it also involves an automatic restart. Let’s first analyze the automatic restart process:
case err := <-w.errc:
// Automatic restart is only possible if an error is received from the ETCD server
if isHaltErr(w.ctx, err) || toErr(w.ctx, err) == v3rpc.ErrNoLeader {
// Cannot restart, serious error occurred
// Serious error generally refers to:
// 1. It is not an Unavailable error
// 2. This is not an Internal error
closeErr = err
return
}
// Reconnect a WatchClient, the original one has exited due to err
ifwc, closeErr = w.newWatchClient(); closeErr ! =nil {
return
}
// Since all ws becomes a scalar, find the ws at the head of the team and send out its request
// Restart the ws state flow
ifws := w.nextResume(); ws ! =nil {
iferr := wc.Send(ws.initReq.toPB()); err ! =nil {
w.lg.Debug("error when sending request", zap.Error(err))
}
}
// cancelSet is reset because all ws becomes a scalar
cancelSet = make(map[int64]struct{})
Copy the code
Let’s look at the internal logic of newWatchClient:
func (w *watchGrpcStream) newWatchClient(a) (pb.Watch_WatchClient, error) {
// Remember when we analyzed the watchStream inner loop
// Resumec? You can go back and look at it
close(w.resumec)
w.resumec = make(chan struct{})
// All WS internal loops exit after resumec is reset
// Not close, but wait for the loop to exit
w.joinSubstreams()
// change ws from substrems to scalar
for _, ws := range w.substreams {
ws.id = - 1
w.resuming = append(w.resuming, ws)
}
...
// There is a ws that cannot be converted to a scalar state
// The internal CTX expires (ctx.done)
// This is where the ws is removed
stopc := make(chan struct{})
donec := w.waitCancelSubstreams(stopc)
wc, err := w.openWatchClient()
close(stopc)
<-donec
// Now we can start the loop again for the rest of the ws
// The restart is complete
for _, ws := range w.resuming {
if ws.closing {
continue
}
ws.donec = make(chan struct{})
w.wg.Add(1)
go w.serveSubstream(ws, w.resumec)
}
iferr ! =nil {
return nil, v3rpc.Error(err)
}
// Re-listen on the underlying GRPC connection
go w.serveWatchClient(wc)
return wc, nil
}
Copy the code
Careful readers may notice that newWatchClient() does so much for a restart that it is not affected when it first runs. If you look closely, the first time it was run, before WS, a lot of the code was running on empty with no real logic.
Before we look at the watchGrpcStream shutdown process, we can pause to think about what we would do if we implemented the shutdown process based on all the information in this article.
I think it should be divided into the following steps:
- If there are errors, collect them and send them to the user via the watchStream channel
- Close WS inside the substream
- Shut down WS from within the CLAN
- Close the context so that objects that depend on it can be closed
Ok, let’s look at the actual core code for watchGrpcStream:
defer func(a) {
// Collect errors, which are carried to the user side when closeSubstream is used.
w.closeErr = closeErr
// Close ws within SubStreams
for _, ws := range w.substreams {
if_, ok := closing[ws]; ! ok {close(ws.recvc)
closing[ws] = struct{} {}}}// Close ws inside the scalar
for _, ws := range w.resuming {
if_, ok := closing[ws]; ws ! =nil && !ok {
close(ws.recvc)
closing[ws] = struct{} {}}}// Wait for the WS to shut down successfully
w.joinSubstreams()
// Recycle the resources of ws
for range closing {
w.closeSubstream(<-w.closingc)
}
w.wg.Wait()
// Close yourself
w.owner.closeStream(w)
}()
Copy the code
The core closing process is in defer, inside closeStream:
func (w *watcher) closeStream(wgs *watchGrpcStream) {
w.mu.Lock()
// Send donEC signal to tell yourself to shut down
close(wgs.donec)
/ / CTX shut down
wgs.cancel()
// Remove yourself from Watcher streams
ifw.streams ! =nil {
delete(w.streams, wgs.ctxKey)
}
w.mu.Unlock()
}
Copy the code
When comparing the actual shutdown process with the shutdown process I analyzed, I omitted that watchGrpcStream is cached for sharing and should be removed from the cache if it is shut down.
Watch Process Summary
OK, this is the end of the Watch process analysis. Before the end, let’s review the whole process and see what the essential design ideas can be used for us:
- For those who look at the Watch flow code for the first time, they are bound to be confused (as I was). I don’t know why there are so many roles. The data is forwarded to and fro within the company. After analysis, you should have a clear understanding. The second is to break down complex logic in order to clear division of labor. And we are in the process of analysis, also seized the right idea, that is to dismantle. Whether it is the breakdown analysis of watcher, watchGrpcStream, watchStream, etc., or the breakdown analysis of the start, listen, close, restart process. During analysis, focus only on information that is of interest to the analysis topic and ignore irrelevant code. This idea is feasible for source code analysis in any scenario.
- CTX ETCD code is the use of CTX to the essence, close without it, communication without it. In this sense, CTX is overkill if it is only used to pass context variables.
- Donec /resumc/closingc: Status \ isClosing int This representation is certainly possible, but in a multithreaded environment, concurrency problems are likely. We are looking at the code of ETCD. The way it represents state makes heavy use of Channel. This Channel does not send any data, it is just a signal, close(DONec) will be fired, thus indicating the state flow. And it’s concurrency safe.
There are other summaries that readers can post to the comments section for discussion.