I recently added IOCP-based I/O handling to the tbox coroutine library and stepped into quite a few pitfalls along the way. I am writing them down here briefly so that I do not forget them later and end up unable to understand my own code (= =)

Pitfall 1

When WSARecv/WSASend is called with both the byte-count output parameter (lpNumberOfBytesRecvd for WSARecv) and lpOverlapped set, and the I/O operation succeeds immediately, the byte-count parameter already holds the actual number of bytes processed. However, the I/O event is still posted to the completion queue, and you still have to call GetQueuedCompletionStatus to retrieve it.

This one tripped me up for a long time. I had assumed that if WSARecv completed immediately, there was no need to wait in GetQueuedCompletionStatus: I could return right away and process the result directly, avoiding unnecessary coroutine waits and switches.

However, testing showed that this is not how it actually works. Even if I return from this recv immediately without waiting, when the next recv goes pending, the wait may return the event of the previous, already processed I/O, which is painful.

To get a better idea of how it works, I looked at the official documentation, which explains lpNumberOfBytesRecvd:

lpNumberOfBytesRecvd

A pointer to the number, in bytes, of data received by this call if the receive operation completes immediately.

Use NULL for this parameter if the lpOverlapped parameter is not NULL to avoid potentially erroneous results. This parameter can be NULL only if the lpOverlapped parameter is not NULL.

In other words, setting lpNumberOfBytesRecvd and lpOverlapped at the same time may cause problems (probably the pitfall described above), so the documentation recommends passing NULL for lpNumberOfBytesRecvd whenever lpOverlapped is used.

But that is not what I want. I still want a fast path for the case where WSARecv returns successfully right away, instead of switching coroutines to wait for an I/O event every time.

One approach is to call WSARecv twice: first try to read without passing lpOverlapped, then issue the call again with lpOverlapped so that it is queued to the completion port, and wait for the event to complete.

Another approach is to ignore, each time GetQueuedCompletionStatus returns, any events that belong to objects already handled successfully.

I tried both, and neither was ideal: the handling is more complex and the efficiency is poor. Wondering whether there was a better way, I read the IOCP handling in the golang source code and finally found a solution. (Sure enough, golang delivers.)

poll/fd_windows.go contains the following code and comments:

// This package uses the SetFileCompletionNotificationModes Windows
// API to skip calling GetQueuedCompletionStatus if an IO operation
// completes synchronously. There is a known bug where
// SetFileCompletionNotificationModes crashes on some systems (see
// https://support.microsoft.com/kb/2568167 for details).

var useSetFileCompletionNotificationModes bool // determines is SetFileCompletionNotificationModes is present and safe to use

It turns out golang uses the SetFileCompletionNotificationModes API to configure the handle so that I/O operations which succeed immediately are skipped internally and never show up in GetQueuedCompletionStatus. That is, if WSARecv returns successfully right away, no I/O event is queued for it.

This is great. It’s just what I want, and it’s easy to use.

SetFileCompletionNotificationModes(socket, FILE_SKIP_COMPLETION_PORT_ON_SUCCESS);
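As a rough illustration of the fast path this enables, here is a minimal C sketch. The names recv_action_t, recv_next_action and demo_post_recv are hypothetical and are not part of tbox or golang. The Windows-only part assumes Winsock has been initialized, the socket is already associated with a completion port, and the program links against ws2_32. It reads the byte count with WSAGetOverlappedResult instead of lpNumberOfBytesRecvd, in line with the documentation quoted above.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
#else
/* Stand-in so the decision logic also compiles off Windows.
 * On Windows, WSA_IO_PENDING is ERROR_IO_PENDING (997). */
#define WSA_IO_PENDING 997L
#endif

typedef enum {
    RECV_DONE_NOW,  /* completed synchronously; no completion packet will follow */
    RECV_WAIT_IOCP, /* the result arrives through the completion port */
    RECV_FAILED     /* a real error */
} recv_action_t;

/* Decide what to do after WSARecv returned.
 * rc:              return value of WSARecv
 * err:             WSAGetLastError() when rc != 0
 * skip_on_success: FILE_SKIP_COMPLETION_PORT_ON_SUCCESS was set on the socket */
static recv_action_t recv_next_action(int rc, long err, bool skip_on_success)
{
    if (rc == 0)
        return skip_on_success ? RECV_DONE_NOW : RECV_WAIT_IOCP;
    return err == WSA_IO_PENDING ? RECV_WAIT_IOCP : RECV_FAILED;
}

#ifdef _WIN32
/* Hypothetical helper: post one overlapped recv and take the fast path when possible. */
static recv_action_t demo_post_recv(SOCKET sock, WSABUF* buf, WSAOVERLAPPED* ov,
                                    bool skip_on_success, DWORD* real)
{
    assert(buf && ov && real);

    DWORD flags = 0;
    int   rc    = WSARecv(sock, buf, 1, NULL, &flags, ov, NULL);
    long  err   = rc != 0 ? (long)WSAGetLastError() : 0;

    recv_action_t action = recv_next_action(rc, err, skip_on_success);
    if (action == RECV_DONE_NOW) {
        /* The operation has already finished, so this call does not block. */
        DWORD done_flags = 0;
        if (!WSAGetOverlappedResult(sock, ov, real, FALSE, &done_flags))
            return RECV_FAILED;
    }
    return action;
}
#endif
```

Only when the result is RECV_WAIT_IOCP does the coroutine need to suspend; without the skip flag, an immediate success still has to be collected from the completion queue, which is exactly the pitfall described above.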

However, not every Windows version supports this interface; it cannot be used on XP, for example. Fortunately, few systems still run XP nowadays. golang also has code that checks whether the interface is supported, which you can refer to:

// checkSetFileCompletionNotificationModes verifies that
// SetFileCompletionNotificationModes Windows API is present
// on the system and is safe to use.
// See https://support.microsoft.com/kb/2568167 for details.
func checkSetFileCompletionNotificationModes() {
    err := syscall.LoadSetFileCompletionNotificationModes()
    if err != nil {
        return
    }
    protos := [2]int32{syscall.IPPROTO_TCP, 0}
    var buf [32]syscall.WSAProtocolInfo
    len := uint32(unsafe.Sizeof(buf))
    n, err := syscall.WSAEnumProtocols(&protos[0], &buf[0], &len)
    if err != nil {
        return
    }
    for i := int32(0); i < n; i++ {
        if buf[i].ServiceFlags1&syscall.XP1_IFS_HANDLES == 0 {
            return
        }
    }
    useSetFileCompletionNotificationModes = true
}
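For a C code base, the provider check in the second half of that function could look roughly like the sketch below. It is a port made under assumptions, not tbox's actual code: all_providers_have_ifs_handles and demo_skip_mode_is_safe are hypothetical names, Winsock must already be initialized with WSAStartup, the program must link against ws2_32, and the separate step of resolving SetFileCompletionNotificationModes at runtime is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
#else
/* Stand-in so the flag test also compiles off Windows (value from winsock2.h). */
#define XP1_IFS_HANDLES 0x00020000
#endif

/* Returns true only if every provider's service flags contain XP1_IFS_HANDLES,
 * mirroring the loop in the Go code above. */
static bool all_providers_have_ifs_handles(const unsigned long* service_flags1, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((service_flags1[i] & XP1_IFS_HANDLES) == 0)
            return false;
    }
    return true;
}

#ifdef _WIN32
/* Hypothetical helper: enumerate the TCP providers and apply the check. */
static bool demo_skip_mode_is_safe(void)
{
    int               protos[2] = { IPPROTO_TCP, 0 };
    WSAPROTOCOL_INFOA infos[32];
    DWORD             size = (DWORD)sizeof(infos);

    /* Returns the number of entries written, or SOCKET_ERROR on failure. */
    int n = WSAEnumProtocolsA(protos, infos, &size);
    if (n == SOCKET_ERROR)
        return false;

    unsigned long flags[32];
    for (int i = 0; i < n; i++)
        flags[i] = infos[i].dwServiceFlags1;
    return all_providers_have_ifs_handles(flags, (size_t)n);
}
#endif
```

Treating an enumeration failure as "not safe" keeps the fallback conservative: the program simply continues to collect every completion from the queue.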

I won't go through the Go code in detail here; it is easy enough to follow. However, skipping I/O events through SetFileCompletionNotificationModes reportedly has other problems with UDP. I have not verified this myself, but since golang does not use it for UDP, I only enable it for TCP for now, as you can see in the following comments:

if pollable && useSetFileCompletionNotificationModes {
    // We do not use events, so we can skip them always.
    flags := uint8(syscall.FILE_SKIP_SET_EVENT_ON_HANDLE)
    // It's not safe to skip completion notifications for UDP:
    // https://blogs.technet.com/b/winserverperformance/archive/2008/06/26/designing-applications-for-high-performance-part-iii.aspx
    if net == "tcp" {
        flags |= syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS
    }
    err := syscall.SetFileCompletionNotificationModes(fd.Sysfd, flags)
    if err == nil && flags&syscall.FILE_SKIP_COMPLETION_PORT_ON_SUCCESS != 0 {
        fd.skipSyncNotif = true
    }
}

Pitfall 2

When each wait returns only one I/O event, GetQueuedCompletionStatusEx is nearly twice as slow as GetQueuedCompletionStatus.

GetQueuedCompletionStatusEx can wait for n I/O events at once, so when many events are ready it greatly reduces the number of calls and handles multiple event objects quickly.

When several events complete at the same time, this call is much more efficient than GetQueuedCompletionStatus. But in my local stress test, when only one I/O event completed per call, GetQueuedCompletionStatusEx performed really poorly, worse than GetQueuedCompletionStatus.

To deal with this, I made a small optimization in tbox's IOCP handling:

/* we can use GetQueuedCompletionStatusEx() to increase performance, perhaps, 
 * but we may end up lowering perf if you max out only one I/O thread.
 */
tb_long_t wait = -1;
if (poller->lastwait_count > 1 && poller->func.GetQueuedCompletionStatusEx)
    wait = tb_poller_iocp_event_wait_ex(poller, func, timeout);
else wait = tb_poller_iocp_event_wait(poller, func, timeout);

// save the last wait count
poller->lastwait_count = wait;

I record the number of events returned by the most recent wait. If the last wait returned only one I/O event, I switch to GetQueuedCompletionStatus; if it returned several, I switch to GetQueuedCompletionStatusEx.
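The selection rule itself can be isolated as a tiny pure function, sketched below in C. wait_mode_t and choose_wait_mode are hypothetical names, not tbox APIs. One consequence worth noting: the rule can only switch back to the batch call if the single-event path is able to report a count greater than one, for example by draining further ready events without blocking. That is an assumption about the surrounding wait function, not something shown here.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    WAIT_SINGLE, /* use GetQueuedCompletionStatus */
    WAIT_BATCH   /* use GetQueuedCompletionStatusEx */
} wait_mode_t;

/* Pick the wait call from the previous wait's event count.
 * last_count:    events returned by the previous wait (-1 if none yet or it failed)
 * has_batch_api: GetQueuedCompletionStatusEx was found at runtime */
static wait_mode_t choose_wait_mode(long last_count, bool has_batch_api)
{
    return (last_count > 1 && has_batch_api) ? WAIT_BATCH : WAIT_SINGLE;
}
```

For example, choose_wait_mode(1, true) selects the single-event call, while choose_wait_mode(2, true) selects the batch call.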

This is only a simple treatment for now, but it tested well. When I have time later, I will tune the strategy according to real-world results.

Pitfall 3

CancelIo can only cancel I/O operations issued by the calling thread. To cancel I/O issued by another thread, you need CancelIoEx.
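A minimal C sketch of how that choice might be made at runtime follows. The names cancel_choice_t, choose_cancel_call and demo_cancel are hypothetical and not taken from tbox. The sketch assumes CancelIoEx may be missing on older systems such as XP, so it is resolved with GetProcAddress instead of being linked directly.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
#endif

typedef enum {
    CANCEL_WITH_IO_EX, /* CancelIoEx: works for I/O issued by any thread */
    CANCEL_WITH_IO,    /* CancelIo: only I/O issued by the calling thread */
    CANCEL_UNAVAILABLE /* issued by another thread and CancelIoEx is missing */
} cancel_choice_t;

static cancel_choice_t choose_cancel_call(bool issued_by_this_thread, bool has_cancel_io_ex)
{
    if (has_cancel_io_ex)
        return CANCEL_WITH_IO_EX;
    return issued_by_this_thread ? CANCEL_WITH_IO : CANCEL_UNAVAILABLE;
}

#ifdef _WIN32
typedef BOOL (WINAPI *cancel_io_ex_t)(HANDLE, LPOVERLAPPED);

/* Hypothetical helper: cancel one pending overlapped operation on a socket. */
static bool demo_cancel(SOCKET sock, OVERLAPPED* ov, bool issued_by_this_thread)
{
    HMODULE        kernel32     = GetModuleHandleA("kernel32.dll");
    cancel_io_ex_t cancel_io_ex = kernel32
        ? (cancel_io_ex_t)GetProcAddress(kernel32, "CancelIoEx")
        : NULL;

    switch (choose_cancel_call(issued_by_this_thread, cancel_io_ex != NULL)) {
    case CANCEL_WITH_IO_EX:
        /* ERROR_NOT_FOUND means nothing was pending, which is fine here. */
        return cancel_io_ex((HANDLE)sock, ov) || GetLastError() == ERROR_NOT_FOUND;
    case CANCEL_WITH_IO:
        return CancelIo((HANDLE)sock) != 0;
    default:
        return false;
    }
}
#endif
```

Note that a cancelled operation still completes through the completion port, with the error ERROR_OPERATION_ABORTED, so the overlapped structure and buffer must stay valid until that completion has been collected.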

Pitfall 4

Writing a good IOCP program is still not easy; there are a great many details to watch out for. I won't list them all here and will add more when I have time. You are also welcome to share in the comments the pitfalls you often run into when developing IOCP programs.