1. Basic description

1.1. TAP-Win32 virtual network adapter

The TAP-Win32 virtual network adapter has no actual hardware behind it; it is purely a driver. The driver embeds a simplified DHCP server that answers DHCP clients with OFFER/ACK/NAK packets, and the parameters of this server are configurable. The driver has three parts. First, it acts as a NIC driver: it interfaces with NDIS and registers a set of callback functions. The second part is the DHCP server, whose functionality is simplified and which does not interface with NDIS itself. The two parts meet in the NDIS callback for sending data on the adapter: that callback singles out DHCP packets and answers them directly. Third, TAP-Win32 implements a device file that supports read/write/control operations and is exported to user-mode programs such as OpenVPN as their interface.

1.2. Configure a virtual NIC using DHCP

On Windows, the TCP/IP properties of a NIC include an "Obtain an IP address automatically" option, and the DHCP operation behind it is bound to the Windows DHCP Client service. (Linux is different: any open-source or closed-source DHCP client that complies with the DHCP specification can be run just to obtain an address, which is then configured on the adapter with ifconfig.) To make configuring the virtual adapter convenient and less error-prone, the virtual adapter of the OpenVPN client is configured automatically in most cases.

The process for configuring a virtual NIC in DHCP mode is as follows:

  1. The DHCP server is configured on the OpenVPN client side. The server is implemented inside the TAP-Win32 driver but does not interface with NDIS. OpenVPN sets the DHCP server's address.

  2. OpenVPN configures the IP address that the DHCP server in TAP-Win32 will assign to the virtual NIC of the OpenVPN client.

  3. If the DHCP parameters are configured, OpenVPN starts another OpenVPN process with dhcp-internal, and that process calls Win32 APIs such as IpReleaseAddress and IpRenewAddress. If the DHCP parameters are not configured, no extra process is forked; the following steps are carried out by Windows on its own, without OpenVPN taking part.

  4. Make sure the Windows DHCP Client service is running. Acting as the DHCP client, it initiates a DISCOVER, and the exchange ends with the DHCP server's ACK.

  5. After receiving a proper ACK from the DHCP server, the Windows DHCP Client service internally calls an API to add the IP address to the virtual NIC.

1.3. Configure the OpenVPN server

The standard configuration is used, with ip-win32 left at its default, adaptive. That is, server 172.16.0.0 255.255.0.0 defines the IP address pool, and addresses are handed out in dynamic (DHCP) mode.

1.4. Client Configuration

The standard configuration, using DHCP, assumes that there is only one client, and the client will get 172.16.0.2.

1.5. Common faults

1.5.1. Symptom 1: DHCP is faulty

A large number of DHCP errors are recorded in the system log. They are of two kinds: one reports that the virtual network adapter lost its IP address, that is, the lease renewal at the halfway point failed; the other is error 10049.

1.5.2. Symptom 2: Routing is faulty

When connecting from the intranet to the OpenVPN server (192.168.81.28:1194), the connection drops and is re-established every minute. Even when DHCP is not used, the virtual NIC address of the OpenVPN server (172.16.0.1) cannot be pinged. Connecting to the OpenVPN server over the external network (128.42.53.17:1194) works properly, apart from the DHCP problem.

2. Fault recurrence and analysis

2.1. Symptom 1

2.1.1. Fault recurrence

Add --ip-win32 dynamic 0 30 to the client configuration. After the client restarts, wait 30 seconds while capturing network packets, then run ipconfig /all to check the network configuration:

        Connection-specific DNS Suffix  . :
        Description . . . . . . . . . . . : VPN-Win32 Adapter Versio
        Physical Address. . . . . . . . . : 00-FF-40-58-BC-FE
        Dhcp Enabled. . . . . . . . . . . : Yes
        Autoconfiguration Enabled . . . . : Yes
        IP Address. . . . . . . . . . . . : 172.16.0.2
        Subnet Mask . . . . . . . . . . . : 255.255.0.0
        Default Gateway . . . . . . . . . :
        DHCP Server . . . . . . . . . . . : 172.16.0.0
        NetBIOS over Tcpip. . . . . . . . : Disabled
        Lease Obtained. . . . . . . . . . : 20 April 2011 17:24:49
        Lease Expires . . . . . . . . . . : 20 April 2011 17:25:19

We found that the DHCP server address was a network address: 172.16.0.0. Is this address accessible? See analysis for details.

Within the first 30 seconds the IP address is obtained successfully, but after about 90 seconds it can no longer be obtained, so the fault is easy to reproduce.

2.1.2. Fault analysis

2.1.2.1. On the phenomenon itself

The packet capture shows that the first DISCOVER initiated by the DHCP client correctly ends with an ACK, and that the client starts a new DISCOVER 30 seconds later. In between, a REQUEST is sent directly to the broadcast address, which is not what one would expect: in the DHCP protocol, when a lease is renewed at half of the lease time, the REQUEST is sent to the address of the DHCP server discovered earlier. So try pinging that address:

ping 172.16.0.0
Destination specified is invalid.

This is error code 10049. According to Microsoft, it is caused by an invalid destination address, for example a network address, that is, a subnet number whose host bits (all bits after the mask) are 0.
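The notion of a network address used here can be written down in a few lines of C. This is only a sketch of the arithmetic definition (all host bits zero); the function names ipv4 and is_network_address are made up for illustration, and nothing here claims to reproduce how Windows or Winsock actually decides to return 10049.

```c
#include <stdint.h>

/* Pack four octets into a host-order 32-bit IPv4 address. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Returns 1 when every host bit (the bits not covered by the mask) is 0,
   i.e. the address is the subnet's network address; 0 otherwise. */
int is_network_address(uint32_t addr, uint32_t netmask)
{
    return (addr & ~netmask) == 0;
}
```

With the pool from section 1.3, is_network_address(ipv4(172, 16, 0, 0), ipv4(255, 255, 0, 0)) is true for the DHCP server address, while the client address 172.16.0.2 is an ordinary host address.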

Yet the address 172.16.0.2 on the client really was assigned by the server at 172.16.0.0.

The initial DHCP (Dynamic Host Configuration Protocol) exchange has roughly four stages, DISCOVER/OFFER/REQUEST/ACK, with the following source/destination addresses:

  1. DISCOVER: 0.0.0.0 / 255.255.255.255

  2. OFFER: 172.16.0.0 / 255.255.255.255

  3. REQUEST: 0.0.0.0 / 255.255.255.255

  4. ACK: 172.16.0.0 as the source

None of these stages uses 172.16.0.0 as a destination address, so the Windows DHCP Client service obtains the IP address without error. When the lease is renewed some time later, however, the client accesses 172.16.0.0 directly (see the DHCP specification for details), and the DHCP Client service gets error 10049. Each time the “Repair” button in the virtual adapter's status dialog is clicked, the renewal gets stuck, and a warning event appears in the system Event Viewer with the message “The requested address is not valid in its context”, that is, 10049: when renewing the IP address of the virtual adapter, the DHCP Client service tried to access the 0.0 address directly and failed. The client then tries to find the DHCP server again, that is, it sends REQUEST and then DISCOVER with the broadcast address as the destination. When the DHCP server in TAP-Win32 receives such REQUEST messages, it tolerates only three in which the client address differs from the address configured for the virtual NIC. For details, see the TAP-Win32 driver code (the ProcessDHCP function in tapdrvr.c of OpenVPN):

  if (msg_type == DHCPREQUEST
      && ((dhcp->ciaddr && dhcp->ciaddr != p_Adapter->m_dhcp_addr)
          || !p_Adapter->m_dhcp_received_discover
          || p_Adapter->m_dhcp_bad_requests >= 3)) // the 3 is a macro definition in the source code
    SendDHCPMsg (p_Adapter, DHCPNAK, eth, ip, udp, dhcp);
  else
    SendDHCPMsg (p_Adapter,
         (msg_type == DHCPDISCOVER ? DHCPOFFER : DHCPACK),
         eth, ip, udp, dhcp);
  if (msg_type == DHCPDISCOVER)
    p_Adapter->m_dhcp_received_discover = TRUE;
  if (msg_type == DHCPREQUEST && dhcp->ciaddr != p_Adapter->m_dhcp_addr)
    ++p_Adapter->m_dhcp_bad_requests;
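To see how quickly the counter runs out, the decision in the excerpt above can be modelled outside the driver. The following is a sketch, not the driver code: struct dhcp_state and dhcp_reply are hypothetical stand-ins for the adapter fields and for the SendDHCPMsg calls, and the threshold of 3 is taken from the excerpt. The message type numbers are the standard DHCP option 53 values.

```c
#include <stdint.h>

/* Standard DHCP message types (option 53). */
enum { DHCPDISCOVER = 1, DHCPOFFER = 2, DHCPREQUEST = 3, DHCPACK = 5, DHCPNAK = 6 };

#define BAD_DHCPREQUEST_NAK_THRESHOLD 3

/* Hypothetical stand-in for the adapter fields used by ProcessDHCP. */
struct dhcp_state {
    uint32_t dhcp_addr;         /* address the server is configured to hand out */
    int      received_discover; /* set once a DISCOVER has been seen */
    unsigned bad_requests;      /* REQUESTs whose ciaddr differed from dhcp_addr */
};

/* Returns the reply type the excerpt would send and updates the state
   in the same order as the excerpt does. */
int dhcp_reply(struct dhcp_state *s, int msg_type, uint32_t ciaddr)
{
    int reply;

    if (msg_type == DHCPREQUEST
        && ((ciaddr && ciaddr != s->dhcp_addr)
            || !s->received_discover
            || s->bad_requests >= BAD_DHCPREQUEST_NAK_THRESHOLD))
        reply = DHCPNAK;
    else
        reply = (msg_type == DHCPDISCOVER) ? DHCPOFFER : DHCPACK;

    if (msg_type == DHCPDISCOVER)
        s->received_discover = 1;
    /* A REQUEST with ciaddr 0 also counts as "bad", even though it is ACKed. */
    if (msg_type == DHCPREQUEST && ciaddr != s->dhcp_addr)
        s->bad_requests++;

    return reply;
}
```

In this model, a client that always sends REQUEST with ciaddr 0 is ACKed three times and NAKed from the fourth REQUEST on; after that, even a REQUEST carrying the correct ciaddr is NAKed, because the counter is never reset.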

The DHCP client in Windows also loses all information about the DHCP server it discovered the first time: ipconfig /all now shows the DHCP server address as 0.0.0.0. Since the DHCP server in TAP-Win32 no longer accepts requests, the OpenVPN client has no way to get an IP address other than being restarted.

2.1.2.2. On the returned error 10049

The fault arises because 172.16.0.0 is configured as the DHCP server address, yet this address cannot be accessed as a destination IP address, probably because it is recognized as a network address rather than a host address. Pinging a network address on Windows does not always return error 10049, however. My machine runs Windows XP SP2, and testing gave the following results:

  1. Add 172.16.0.3/16 to the physical NIC and ping 172.16.0.0: error 10049.

  2. Add 17.16.0.3/16 instead and ping 17.16.0.0: error 10049. After the OpenVPN client is started, pinging 17.16.0.0 again times out instead, so at least the address is accepted.

  3. Add 17.110.0.3/16 and ping 17.110.0.0: error 10049. After the OpenVPN client is started, pinging 17.110.0.0 again still gives error 10049.

No regular pattern emerges. At first I thought Windows treated the private 172.16 segment specially, but the non-private segments show no pattern either. Windows sockets can have LSP/BSP/NSP components from many vendors (Microsoft itself or antivirus vendors, for example) plugged in. Winsock is SPI-based rather than a system-call interface straight into the kernel, so a request can pass through a lot of filtering in user mode, which makes it hard to debug what is going on.

These delicate situations are described and analysed in more detail below:

Case 1:

  1. Add 17.16.0.3/16 to the physical NIC, ping 17.16.0.0 (error 10049), then delete the IP address.

  2. Start the OpenVPN client with 17.16.0.0 as the DHCP server address; the virtual adapter is allocated 17.16.0.2. 172.16.0.0, however, behaves differently; see case 3.

  3. Stop the OpenVPN client, add 17.16.0.3/16 again, and ping 17.16.0.0 again: the ping times out.

Case 2:

  1. Add 17.16.0.3/16 to the physical NIC, ping 17.16.0.0 (error 10049), then delete the IP address.

  2. Start the OpenVPN client with 17.16.0.254 as the DHCP server address; the virtual adapter is first allocated 17.16.0.2 (the adapter is not disconnected even when the lease is renewed).

  3. Stop the OpenVPN client, add 17.16.0.3/16 again, and ping 17.16.0.0 again: error 10049 is still returned.

Analysis:

In case 1, the DHCP client came to know the DHCP server address 17.16.0.0 (the first DHCP exchange allocated 17.16.0.2), and Windows somehow remembered that address (in the Winsock catalog). Windows therefore no longer treats it as an “inaccessible” address when pinging 17.16.0.0. In case 2, the address 17.16.0.0 was never seen in any way, so Winsock treats it as “inaccessible” and returns 10049. To verify this guess, after case 1, stop the DHCP client, disable the virtual adapter, run netsh winsock reset, and restart the machine; pinging 17.16.0.0 then returns error 10049 again.

Case 3:

  1. Add 172.16.0.3/16 to the physical NIC, ping 172.16.0.0 (error 10049), then delete the IP address.

  2. Start the OpenVPN client with 172.16.0.0 as the DHCP server address; the virtual adapter is first allocated 172.16.0.2 (after a while the virtual adapter is disconnected because of the renewal; the only difference from case 1 is the IP address). Even before the fault appears, pinging 172.16.0.0 returns 10049.

  3. Stop the OpenVPN client, add 172.16.0.3/16 again, and ping 172.16.0.0 again: still 10049.

Case 4:

The only difference from case 3 is that the office computer is replaced by a home computer, which also runs a pirated Windows XP SP2. Pinging the network address 172.16.0.0 no longer gives 10049 but times out, and the OpenVPN client can renew its lease with 172.16.0.0 as the DHCP server address.

Analysis 2: Microsoft does not disclose the details of its Winsock implementation, so it is hard to know how it filters IP addresses. With OpenVPN started, accessing 172.16.0.0 does not give 10049 on the home PC but does give 10049 on the office PC. On the office PC, 17.16.0.0 does not give 10049, while 17.110.0.0 and 17.161.0.0 do. There is no point in spending more time on this question.

2.1.3. Analysis and summary

The OpenVPN server does not push an IP configuration mode to the Windows client, and the client does not configure one explicitly, so dynamic mode is used to obtain the IP address. The TAP-Win32 driver implements a DHCP server internally, whose address must be set explicitly by OpenVPN; by default, the OpenVPN client sets it to the address ending in 0.0. When the virtual adapter is configured to obtain its IP address automatically, the Windows DHCP Client service acts as the DHCP client for it, obtains an address, and configures it. The DISCOVER/OFFER/REQUEST/ACK exchange works without problems: the DISCOVER is answered by the DHCP server inside the TAP-Win32 driver with an OFFER, and so on.

However, when the lease has to be renewed, either at half of the lease time or for some other reason (a manual renewal, or sleep/wake), accessing the DHCP server with certain addresses ending in 0.0 as the destination IP address returns error 10049. Tests show that which addresses are affected differs between Windows installations: on my development machine, 172.16.0.0, 172.161.0.0, and 17.110.0.0 fail while 172.17.0.0 and 17.16.0.0 work, and the home machine gives different results again. The Windows DHCP Client service therefore can never contact the server ending in 0.0 that originally leased it the address. After some time x (which depends on the DHCP implementation and the configured lease), the service drops the virtual adapter's IP address and starts the DISCOVER process again. The TAP-Win32 driver, however, allows only three DHCP REQUEST packets in which the client address differs from the address configured by OpenVPN; in other words, the number of DISCOVER rounds is limited.

Once that number is used up, the TAP-Win32 logic prevents the virtual adapter from ever obtaining an IP address from the DHCP server at 0.0 again, and an error is recorded in the Windows system log. At that point the only remedy is to restart the OpenVPN client.

2.2. Symptom 2

2.2.1. Fault recurrence

On the intranet, some machines show the problem while others do not; machines on wireless connections, for example, are unaffected.

2.2.2. Fault analysis

  1. The virtual adapter address of the OpenVPN server cannot be pinged, so check whether the server's physical adapter can be pinged. In this case, 192.168.81.28 cannot be pinged either.

  2. The OpenVPN server and client logs show that the server keeps sending OpenVPN PING messages to the client. After the keepalive time expires, the server sends the client a restart signal, and the client reconnects.

  3. After starting the OpenVPN client, run route print to check the routing table of the client machine. There are two routes for the intranet segment 192.168.1.0/24, one going out through the physical NIC and the other through the virtual NIC. Reaching 192.168.81.28 requires going through 192.168.1.254, and 192.168.1.254 must be reached through the physical NIC.

  4. Checking the OpenVPN server configuration shows that the route for network segment 40 is pushed to clients. As a result, an OpenVPN client on 192.168.1.0/24 can no longer communicate with hosts outside 192.168.1.0/24 once it has connected to the OpenVPN server.

2.2.3. Analysis and summary

  1. This problem is subtle for two reasons. First, the OpenVPN server and client communicate over UDP, which needs no acknowledgment, so packets sent by the OpenVPN server at 192.168.81.28 can be received by the OpenVPN client on segment 40, but not the other way around. With TCP the connection would break quickly. Second, during testing the client was simply left idle with no data being transferred; had data been transferred, it would have been obvious immediately that nothing got through.

  2. The routes pushed by the OpenVPN server must not conflict with the client's local routes. When analyzing logs, therefore, watch for warnings such as: WARNING: potential route subnet conflict between local LAN [192.168.1.0/255.255.255.0] and remote VPN [192.168.1.0/255.255.255.0]
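The kind of check behind such a warning can be sketched as a simple mask comparison. This is an illustration only, assuming contiguous netmasks; the names ipv4 and subnets_overlap are hypothetical, and the code is not taken from OpenVPN.

```c
#include <stdint.h>

/* Pack four octets into a host-order 32-bit IPv4 address. */
uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Two subnets overlap when they agree on every bit that both masks cover.
   For contiguous masks this means one subnet contains the other. */
int subnets_overlap(uint32_t net_a, uint32_t mask_a,
                    uint32_t net_b, uint32_t mask_b)
{
    uint32_t common = mask_a & mask_b;
    return (net_a & common) == (net_b & common);
}
```

Applied to the warning above, the local LAN 192.168.1.0/255.255.255.0 and the pushed route 192.168.1.0/255.255.255.0 overlap, whereas 192.168.1.0/24 and 192.168.81.0/24 would not.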

3. Problem solving

3.1. Solution of routing problems

Make sure there are no routing table conflicts. In addition to the routing table, check the route cache. On Linux the route cache can be viewed with route -C; on Windows it cannot be viewed.

3.2. OpenVPN Configuration Parameters

OpenVPN has a parameter for Windows clients: ip-win32. We ignored it for a long time, and that oversight caused a problem that took a long time to solve. The parameter is usually left unset, in which case OpenVPN uses adaptive mode. Adaptive mode tries dynamic first, but in adaptive mode dynamic cannot be given any parameters. Dynamic mode in fact accepts two parameters:

dynamic  [offset]  [lease-time]

The first parameter, offset, affects the IP address of the DHCP server. In OpenVPN server mode, assuming the address pool is x.y.0.0/16, the DHCP server IP address on the client is:

  x.y.0.0               if offset == 0
  x.y.0.offset          if offset > 0
  x.y.255.255 + offset  if offset < 0
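The three cases can be folded into one small function. The sketch below follows the rule as stated in this section; the name dhcp_server_addr is hypothetical and the code is not copied from OpenVPN. Addresses are host-order 32-bit values.

```c
#include <stdint.h>

/* DHCP server address derived from the pool and the dynamic offset:
   non-negative offsets count up from the network address,
   negative offsets count down from the broadcast address. */
uint32_t dhcp_server_addr(uint32_t network, uint32_t netmask, int offset)
{
    if (offset < 0)
        return (network | ~netmask) + (uint32_t)offset; /* unsigned wrap-around subtracts */
    return (network & netmask) + (uint32_t)offset;
}
```

For the pool 172.16.0.0/16 (0xAC100000 with mask 0xFFFF0000), offset 0 gives 172.16.0.0, offset 2 gives 172.16.0.2, and offset -1 gives 172.16.255.254.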

3.3. Dynamic solution

OpenVPN does not assume that Windows treats a network address as inaccessible, nor does it assume anything about the *SP filtering behavior of Winsock. Using 0.0 as the DHCP server address is therefore reasonable from OpenVPN's point of view, but on Windows machines it can lead to the strange problems described above. If offset is not 0, make sure the DHCP server address differs from the address assigned to the virtual adapter of the OpenVPN client. For example, with offset 2 the first client to connect gets x.y.0.2, and the DHCP server gets the same address, which causes a conflict when the adapter's IP address is initialized. An offset of -1 is recommended, which makes the DHCP server address x.y.255.254; under normal circumstances this address is unlikely to be in use unless it is assigned deliberately.

3.4. Solve the problem in non-dynamic mode

Configure the IP address in netsh, ipapi, or manual mode. The netsh and ipapi modes may conflict with the DHCP Client service: for example, the virtual NIC may show the status "Obtaining an IP address" although its address has already been set. This has not been tested further, because it depends on the Windows DHCP Client service, and the netsh/ipapi and DHCP quirks also vary with the Windows version. The cleanest way to set the address is manual mode, but then the address has to be entered by hand based on the output of the OpenVPN client, so this method is not recommended.

3.5. Change the tap-Win32 driver mode

Modify the logic of the ProcessDHCP function and raise the limit on bad DHCP requests to a very large value:

#define BAD_DHCPREQUEST_NAK_THRESHOLD 0xffffffff

3.6. Precautions

Note that if the TAP-Win32 driver is not modified, ipconfig /release and ipconfig /renew cannot be used on the virtual NIC even when an offset is given to dynamic. Each such operation starts a full release/discover/offer/request/ack sequence, and in the request stage the client IP is 0, which differs from the address held by the TAP-Win32 DHCP server. TAP-Win32 allows only three such requests, after which it replies with NAK. Renewing the IP address with the “Repair” button in the adapter status dialog does work, however, because it does not start a DISCOVER sequence.

3.7. Conclusion

If the OpenVPN client keeps disconnecting, first make sure there is no problem with the routes. When testing, transfer data rather than leaving the OpenVPN client idle. For DHCP problems, dynamic with an offset is recommended.

4. OpenVPN and virtual NIC problems

4.1. Keepalive of the OpenVPN Client and server

The keepalive packet is an OpenVPN PING packet (not an ICMP Echo Request). It is sent and received over the physical link and has nothing to do with the virtual network adapter. OpenVPN's keepalive therefore only guarantees the connectivity of the physical link between the OpenVPN peers, not the connectivity of the virtual VPN link. For example, if the virtual adapter is brought down manually (on Linux, ifconfig tapX down; or the equivalent on Windows), OpenVPN does not notice that the VPN link is broken as long as PING packets can still be sent and received. The status of the virtual NIC therefore has to be monitored, either inside or outside OpenVPN.

4.1.1. Internal monitoring of virtual nics

Internal monitoring of the virtual adapter is easy to synchronize with OpenVPN's keepalive mechanism (PING), but its real-time behavior is poor. Before every PING, or every few PINGs, OpenVPN checks the status of the virtual adapter (whether the adapter is enabled and whether its IP address has changed). If the check fails, OpenVPN stops sending PINGs. After a while, the OpenVPN server restarts the client, which reinitializes the virtual adapter, reassigns its IP address, and so on.

4.1.1.1. Linux Internal monitoring

Before sending a PING, check the status of tapX with ifconfig and obtain its IP address, then compare it with the saved address; if they differ, no PING is sent. Monitor the push of ifconfig in add_option, update the saved virtual adapter address information, and disable the sending of PINGs. A pushed ifconfig address from the server indicates that a new connection has just started.

4.1.1.2. Windows Internal monitoring

Before sending a PING, use an IP Helper API function such as GetAdaptersAddresses to check the status of the adapter's connection and obtain its IP address, then compare it with the saved address; if they differ, no PING is sent. Monitor the push of ifconfig in add_option, update the saved virtual adapter address information, and disable the sending of PINGs. A pushed ifconfig address from the server indicates that a new connection has just started.

4.1.2. External Monitoring of virtual nics

Ideally, the OpenVPN process would be notified to restart whenever the status or the IP address of the virtual adapter changes, which gives better real-time behavior. However, OpenVPN has no internal call point where such logic could be inserted. A separate external monitoring process, or a thread inside the OpenVPN process, is therefore needed to do the monitoring, and a mechanism for notifying the OpenVPN client has to be defined.

4.1.2.1. Linux External monitoring

Monitoring can be done with ip monitor from iproute2, which is relatively simple because Linux uses Netlink for notifications. ip monitor waits on the Netlink groups it is interested in. Whenever the status or IP address of the virtual adapter changes, the kernel's notifier chain notifies every node on the chain; the callback of one of those nodes sends a Netlink message, which the user-space ip monitor receives.
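A monitoring thread or process could also subscribe to the same notifications directly instead of running ip monitor. The following is a minimal Linux-only sketch using an rtnetlink socket; the function names event_name, open_monitor_socket, and wait_for_events are made up for illustration, the sketch does not filter by interface, and how OpenVPN would be notified afterwards is left open.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Map an rtnetlink message type to a short label; NULL for types we ignore. */
const char *event_name(unsigned short nlmsg_type)
{
    switch (nlmsg_type) {
    case RTM_NEWLINK: return "link changed";
    case RTM_DELLINK: return "link removed";
    case RTM_NEWADDR: return "address added";
    case RTM_DELADDR: return "address removed";
    default:          return NULL;
    }
}

/* Open a netlink socket subscribed to link and IPv4 address notifications.
   Returns the descriptor, or -1 on failure. */
int open_monitor_socket(void)
{
    struct sockaddr_nl sa;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until one datagram arrives and call cb for every message of interest.
   Returns the number of callbacks made, or -1 if recv fails. */
int wait_for_events(int fd, void (*cb)(const char *what))
{
    struct nlmsghdr buf[512]; /* array of headers keeps the buffer aligned */
    int count = 0;
    int len = (int)recv(fd, buf, sizeof buf, 0);

    if (len < 0)
        return -1;
    for (struct nlmsghdr *nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
        const char *what = event_name(nh->nlmsg_type);
        if (what != NULL) {
            cb(what);
            count++;
        }
    }
    return count;
}
```

A caller would open the socket once and call wait_for_events in a loop; on each callback it would re-read the state of tapX and compare it with the saved state, as described for internal monitoring.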

4.1.2.2. Windows External monitoring

Windows has no Netlink mechanism, but it has two mechanisms of its own for signaling changes in adapter status. The first is registry monitoring. Windows stores devices in the registry with GUIDs as their keys: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces holds the GUIDs of all adapters. Find the key of the virtual adapter there and call RegNotifyChangeKeyValue on it. Whenever the status or IP address of the adapter changes, values under that key are rewritten and RegNotifyChangeKeyValue returns; this can serve as an interrupt signal, after which the current state of the adapter is queried and compared. The second is a function such as NotifyAddrChange, which is independent of the registry. When it returns, call an API such as GetAdaptersAddresses to query the state and compare.




