This article is from the OPPO Internet Technology team; please credit the author when reposting. You are also welcome to follow our official account, OPPO_tech, where we share OPPO's cutting-edge Internet technology and activities.
In modern network communication, fast and secure network access has long been a common goal of Internet companies. For traditional Internet services secured with TCP + TLS, the leading Internet companies have responded vigorously: vendors such as Google have pushed a variety of upgrades and patches, such as TCP Fast Open and TLS 1.3. However, the TCP-based stack has been running for decades and has formed an entrenched, even ossified, network infrastructure, which makes it difficult to roll out patches or incorporate new solutions. To solve this problem in a more systematic and natural way, Google took a different path and built QUIC (Quick UDP Internet Connection) on top of UDP, with better performance and stronger security. The IETF has also adopted QUIC as the standard transport for HTTP/3.
1. Inefficient transmission based on TCP+TLS
In the Web world, take a simple HTTPS request to a site such as www.XXX.com as an example. Capturing the exchange with tcpdump and Wireshark, from sending the request to receiving the data, makes the communication process and the data flow along the whole path easy to see. It also shows clearly how much preparation is needed before any actual user data is sent.
First comes domain name resolution, which is usually provided by the network operator's DNS service. The operator's DNS service normally has a cache, which can shorten the resolution time.
Second, a TCP three-way handshake is needed to establish the connection, which consumes 1 RTT (the final ACK can be sent together with data).
Next is the TLS handshake. For TLS 1.2, the most widely used version, the four handshake flights consume two full RTTs.
Only after all this preparation does the actual user data transfer begin. Assuming the user's data is small enough to fit in a single packet (about 1 KB, say), it needs only 1 RTT. The transmission efficiency is therefore very low: the whole process takes the DNS time plus 4 RTTs, of which only one RTT carries user data, so the efficiency is only 25%. The data transmission sequence diagram below shows the interaction at each stage.
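The arithmetic above can be written down as a small Python sketch. The function names and the 100 ms RTT / 20 ms DNS figures are assumptions made for illustration; only the RTT counts (1 for TCP, 2 for TLS 1.2, 1 for the data) come from the text.

```python
def total_time_ms(rtt_ms: float, dns_ms: float = 0.0,
                  tcp_rtts: int = 1, tls_rtts: int = 2, data_rtts: int = 1) -> float:
    """Time until the response arrives: DNS time plus every round trip."""
    return dns_ms + (tcp_rtts + tls_rtts + data_rtts) * rtt_ms


def transfer_efficiency(tcp_rtts: int = 1, tls_rtts: int = 2, data_rtts: int = 1) -> float:
    """Share of round trips that actually carry user data (DNS excluded)."""
    return data_rtts / (tcp_rtts + tls_rtts + data_rtts)


# Hypothetical link: 100 ms RTT and a 20 ms DNS lookup (illustrative values only).
print(total_time_ms(rtt_ms=100.0, dns_ms=20.0))  # 420.0
print(f"{transfer_efficiency():.0%}")             # 25%
```

On a high-latency link the three handshake round trips dominate the total time, which is why the later sections focus on removing them.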
2. Performance improvement of QUIC
The analysis above shows that the extra overhead of TCP-based HTTPS transmission is very large. Reducing this 75% overhead has become a long-standing goal of network transmission optimization. Simply removing TLS is not an option, because security could then not be guaranteed at all: plaintext transmission, man-in-the-middle attacks and similar problems pose serious threats to data sent over the Internet.
Traditional TCP performance improvement practices
As security becomes an ever more necessary part of people's lives, TLS can only be strengthened, not removed. How, then, can transmission efficiency be improved? At the TLS level, several optimizations emerged, such as TLS session resumption: using a session ID or session ticket reduces the number of exchanges in the TLS handshake, so a handshake that originally needed two RTTs needs only one in some cases (when a session can be resumed, that is, an earlier handshake succeeded and is still within its validity period). Saving one RTT raises the efficiency from 25% to 33%. The work is not limited to TLS. At the TCP level, some of the world's top technology companies, such as Google, have been optimizing TCP performance and have proposed new features such as TFO (TCP Fast Open) to improve TCP's transmission efficiency, but these require newer kernel versions. Meanwhile, TCP-layer infrastructure across the network has ossified worldwide, which makes it very difficult to roll out patches or integrate new solutions.
QUIC uses a 1-RTT handshake over UDP to improve efficiency
The pursuit of ultimate performance does not stop. With TCP essentially ossified, Google took the lead in experimenting with rewriting the whole transport on top of UDP, which gradually took shape as a new transport scheme, gQUIC. Google then submitted QUIC to the IETF as an experimental transport-layer protocol proposal, and the Internet Engineering Task Force (IETF) held the first QUIC working group meeting in November 2016, which attracted wide attention in the industry. This marked the start of QUIC's standardization as a next-generation transport-layer protocol, resulting in the latest iQUIC. The original gQUIC used a TLS-like transport encryption protocol designed by Google itself. After QUIC entered the IETF, and with TLS 1.3 appearing during the standardization process and bringing large improvements in performance and security, the IETF dropped Google's encryption protocol and adopted standard TLS 1.3 for QUIC.
When no connection has been established before, QUIC uses a 1-RTT handshake that sets up the connection and secures it at the same time. QUIC's 1-RTT handshake works as follows:
- The server holds a 0-RTT public/private key pair, generates an SCFG (server configuration object), and puts the public key in the SCFG.
- On its initial request, the client must obtain the 0-RTT public key from the server. This costs one RTT, which is the "1-RTT" of QUIC.
- After receiving the 0-RTT public key, the client caches it and generates its own temporary public/private key pair. Using its temporary private key and the server's 0-RTT public key, the client derives an encryption key K1 with the DH algorithm, encrypts the user data with K1, and sends the encrypted data to the server together with its own temporary public key. At this point user data is already on the wire.
- After the server receives the K1-encrypted user data and the client's temporary public key, it does the following:
  - It uses the 0-RTT private key and the client's temporary public key to derive K1 with the DH algorithm, decrypts the user data, and hands it to the application.
  - It generates a server temporary public/private key pair and derives K2 from the server temporary private key and the client's temporary public key; K2 encrypts the data the server needs to send.
  - It sends the server's temporary public key and the K2-encrypted application data to the client.
- After receiving the server's temporary public key and the K2-encrypted application data, the client uses the DH algorithm with the server's temporary public key and its own original temporary private key to derive K2 again and decrypt the data. From then on, K2 is used for encrypting and decrypting data.
Remark:
Why does the server generate another temporary public/private key pair and then derive the encryption key K2 with the DH algorithm? The core consideration is security. Without the server-side temporary key pair and K2, the K1 used during communication would not be secure, because the 0-RTT key pair behind the SCFG is shared by all clients and is kept for a long time until it expires, and the expiration time is generally quite long. Once the server's 0-RTT private key is compromised, no client's communication has forward secrecy: an attacker who has captured the packets only needs the 0-RTT private key to decrypt all of the communication data.
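The two key derivations, and the reason K2 exists, can be illustrated with a toy Diffie-Hellman exchange. This is only a sketch under loud assumptions: the group (a 127-bit Mersenne prime with base 3) is far too small to be secure, the helper names are hypothetical, and real gQUIC or TLS 1.3 stacks use elliptic-curve groups and a proper key derivation function.

```python
import hashlib
import secrets

# Toy DH group: NOT secure, chosen only to keep the example short (assumption).
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair():
    """Return (private, public) for the toy group."""
    priv = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return priv, pow(G, priv, P)


def shared_key(own_priv: int, peer_pub: int) -> bytes:
    """DH shared secret, hashed into a 32-byte symmetric key."""
    secret = pow(peer_pub, own_priv, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


# Server: long-lived 0-RTT key pair; the public half is published in the SCFG.
server_static_priv, server_static_pub = keypair()

# Client: temporary key pair; K1 protects the first flight of user data.
client_tmp_priv, client_tmp_pub = keypair()
k1_client = shared_key(client_tmp_priv, server_static_pub)
k1_server = shared_key(server_static_priv, client_tmp_pub)

# Server: temporary key pair; K2 protects everything after the first flight.
server_tmp_priv, server_tmp_pub = keypair()
k2_server = shared_key(server_tmp_priv, client_tmp_pub)
k2_client = shared_key(client_tmp_priv, server_tmp_pub)

# An attacker who recorded the traffic and later steals the long-lived private
# key can recompute K1 from the client's public key seen on the wire. K2 would
# require one of the temporary private keys, which are discarded after use.
k1_attacker = shared_key(server_static_priv, client_tmp_pub)
print(k1_attacker == k1_client, k2_client == k2_server, k1_client != k2_client)
```

Both sides arrive at the same K1 and the same K2 without ever sending a private key, and only K1 depends on the long-lived server key.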
QUIC's 0-RTT handshake greatly improves efficiency
0-RTT is a key feature of QUIC: user data can be carried in the first datagram of a connection. However, if the client and server have never communicated before, 0-RTT is not available, and a full RTT is required before user data can be carried.
That was QUIC's 1-RTT process, so how does 0-RTT work? The client saves the 0-RTT handshake public key and the related information and reuses the saved data directly when it next establishes a connection. As long as the data has not expired, the server accepts it. The RTT spent fetching the public key is therefore avoided, and the client can derive K1 and send K1-encrypted user data immediately.
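The client-side decision can be sketched as a small cache with expiry. The class and method names below are hypothetical and the stored "public key" is an opaque byte string; the sketch only shows when a client would fall back to the 1-RTT handshake.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedServerConfig:
    public_key: bytes   # the server's 0-RTT public key (opaque here)
    expires_at: float   # absolute expiry time in seconds


class ServerConfigCache:
    """Hypothetical client-side cache of server configs, keyed by host."""

    def __init__(self):
        self._entries = {}

    def store(self, host, public_key, lifetime_s, now=None):
        now = time.time() if now is None else now
        self._entries[host] = CachedServerConfig(public_key, now + lifetime_s)

    def handshake_mode(self, host, now=None):
        """Return "0-RTT" if a fresh config is cached, otherwise "1-RTT"."""
        now = time.time() if now is None else now
        entry = self._entries.get(host)
        if entry is None or entry.expires_at <= now:
            self._entries.pop(host, None)  # drop missing or expired entries
            return "1-RTT"
        return "0-RTT"


cache = ServerConfigCache()
print(cache.handshake_mode("example.com", now=0))    # 1-RTT (nothing cached yet)
cache.store("example.com", b"server-public-key", lifetime_s=60, now=0)
print(cache.handshake_mode("example.com", now=30))   # 0-RTT
print(cache.handshake_mode("example.com", now=61))   # 1-RTT (config expired)
```

The server still has the final say: it can reject a cached config it no longer accepts, in which case the client repeats the 1-RTT exchange.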
The process above is the gQUIC process. iQUIC uses TLS 1.3, so the packet details of the handshake phase differ: for example, the first request obtains information such as the certificate and the PSK, and the 0-RTT phase uses session resumption tickets.
The analysis above shows that QUIC greatly improves performance in the handshake phase. The handshake adds at most one RTT of delay, so performance matches plaintext HTTP while security matches HTTPS. With the 0-RTT feature, efficiency improves even further, but security is slightly reduced, because 0-RTT inevitably exposes the connection to replay attacks. Overall, the transmission efficiency of user data rises substantially, from the previous best of 33% to 50%, and even to 100% with 0-RTT, where user data is transmitted from the very first packet.
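The efficiency figures quoted in this article can be reproduced side by side with a short Python sketch. The scenario labels and the helper name are illustrative; the RTT counts are the ones used in the text, with DNS time left out.

```python
# Round trips spent on connection setup before user data can flow.
SETUP_RTTS = {
    "TCP + TLS 1.2 (full handshake)": 1 + 2,
    "TCP + TLS 1.2 (session resumption)": 1 + 1,
    "QUIC 1-RTT": 1,
    "QUIC 0-RTT": 0,
}


def efficiency(setup_rtts: int, data_rtts: int = 1) -> float:
    """Share of round trips that carry user data."""
    return data_rtts / (setup_rtts + data_rtts)


for name, rtts in SETUP_RTTS.items():
    print(f"{name:<36} {efficiency(rtts):.0%}")
```

The loop prints 25%, 33%, 50% and 100% for the four scenarios, matching the progression described above.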
Both gQUIC and iQUIC integrate the management and security of the connection into one, allowing the transport protocol to have native security attributes.
3. Security analysis of QUIC
Does such a large performance gain mean that we can move all traffic onto QUIC without a second thought? Let's look at how secure QUIC's new features are, which security concerns security engineers and QUIC users should pay attention to, which kinds of business benefit most from QUIC at this stage, and which businesses are unsuitable for QUIC or for some of its features.
Protocol immaturity and product stability
As a new-generation protocol, QUIC is not yet particularly mature, whether in its congestion control algorithms or in its other security policies, and it does not yet offer generality and compatibility in production environments. Each vendor implements part of QUIC's features according to its own situation: some features are not fully implemented, some implementation mechanisms are questionable, and some sacrifice certain security properties, so various problems appear in use. Under these conditions, a small or medium-sized vendor that adopts an open-source QUIC service faces many problems with security, stability, reliability, resource consumption and so on, while a full in-house implementation takes a great deal of manpower and resources. It is therefore not recommended that small and medium-sized vendors go all in on QUIC; they are better off waiting until the protocol and the products have matured before adopting it.
Security issues in SCFG signature computation
The earlier analysis shows that the SCFG is critical. In the 0-RTT scenario, the 0-RTT handshake public key comes entirely from this data, and the data has to travel between the client and the server, so its security and reliability are very important. How does QUIC protect it against man-in-the-middle attacks, and does that protection bring other security risks?
QUIC adds a signature to this data and sets an expiration time to keep it secure. The signature is made with the private key of the server's public certificate, and the client must validate the certificate, which ensures that a man-in-the-middle attack cannot succeed. The SCFG expiration time greatly mitigates malicious collection of SCFGs. Signatures use asymmetric algorithms, and signing with RSA costs the server a lot of computing resources, so an attacker can exploit this to mount a computational DoS attack on the server. In production, a hardware acceleration card is needed to offload the signing and mitigate this effectively. Signing with an ECC certificate also improves computing performance.
Security problems caused by 0-RTT key leakage
The 0-RTT public key can be stored on the client, while the server keeps the key pair almost permanently and shares it across all clients. In 0-RTT communication, the first packet also carries user data (encrypted with K1, as described in the performance analysis). The security problem is obvious: forward secrecy is lost. If packets have been captured, the data can be decrypted once the server's 0-RTT private key leaks.
The loss of forward secrecy with 0-RTT is shown in the figure below.
Replay attack problems caused by 0-RTT
No 0-RTT mechanism is immune to replay attacks (whether 0-RTT in TLS 1.3 or 0-RTT in gQUIC), and every 0-RTT mechanism significantly weakens security while improving performance:
- 0-RTT does not provide forward secrecy (PFS).
- The first 0-RTT packet has no source address validation.
- QUIC/TLS 1.3 provides a key exchange mechanism that ensures PFS after the 0-RTT phase or after the handshake.
As the earlier analysis showed, 0-RTT has no forward secrecy: data can be captured continuously and decrypted later if the handshake private key leaks. QUIC's non-0-RTT packets can provide source address validation, so the server can challenge the source address in risky scenarios. QUIC provides a source address token (STK) mechanism for this. When the client sends its first packet, the server generates an STK from factors such as the packet's source address and the server timestamp and sends it to the client with the response packet. In subsequent data transmission the client must pass the STK back to the server so that the server can verify it. For performance reasons the server does not verify every packet; instead, it issues a validation challenge when the mapping between source address and connection ID changes or when the connection migrates. This validation, however, cannot be performed on the first 0-RTT packet.
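A source address token of this kind can be sketched with an HMAC over the source address and a timestamp. This is a simplified stand-in rather than the actual gQUIC format, which is not specified here; the secret, the 60-second lifetime and the function names are assumptions.

```python
import hashlib
import hmac
import secrets
import struct

SERVER_SECRET = secrets.token_bytes(32)  # hypothetical key known only to the server
TOKEN_LIFETIME_S = 60                    # assumed expiry window


def make_token(source_ip: str, now: int) -> bytes:
    """Bind the token to the packet's source address and the server timestamp."""
    ts = struct.pack("!Q", now)
    mac = hmac.new(SERVER_SECRET, source_ip.encode() + ts, hashlib.sha256).digest()
    return ts + mac


def verify_token(token: bytes, source_ip: str, now: int) -> bool:
    """Accept only an unexpired token issued to this same source address."""
    if len(token) != 8 + 32:
        return False
    ts, mac = token[:8], token[8:]
    expected = hmac.new(SERVER_SECRET, source_ip.encode() + ts, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return False
    issued = struct.unpack("!Q", ts)[0]
    return 0 <= now - issued <= TOKEN_LIFETIME_S


token = make_token("203.0.113.7", now=1000)
print(verify_token(token, "203.0.113.7", now=1030))   # True: same address, fresh
print(verify_token(token, "198.51.100.9", now=1030))  # False: different source address
print(verify_token(token, "203.0.113.7", now=1100))   # False: token expired
```

An attacker who spoofs a source address never sees the response that carries the token, so it cannot echo a valid one back.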
For operations with high security requirements, such as POST or PUT, 0-RTT is usually turned off. Facebook and Cloudflare, for example, also disable 0-RTT on some key operations, and only idempotent operations (GET, HEAD, etc.) use 0-RTT.
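A server-side gate for this policy might look like the sketch below. The function name and the allow-list are assumptions; the 425 (Too Early) status code is the one defined in RFC 8470 for asking a client to retry a request after the handshake has completed, and it is not mentioned in the original text.

```python
# Methods treated as safe to serve from 0-RTT data (assumed allow-list).
EARLY_DATA_METHODS = {"GET", "HEAD"}


def status_for_request(method: str, arrived_in_early_data: bool) -> int:
    """Return the HTTP status a server could answer with under this policy."""
    if arrived_in_early_data and method.upper() not in EARLY_DATA_METHODS:
        # 425 Too Early: the client should retry once the handshake is done,
        # at which point the request can no longer be a 0-RTT replay.
        return 425
    return 200


print(status_for_request("GET", arrived_in_early_data=True))    # 200
print(status_for_request("POST", arrived_in_early_data=True))   # 425
print(status_for_request("POST", arrived_in_early_data=False))  # 200
```

Replaying a GET simply returns the same resource again, while a replayed POST could repeat a state change, which is why only the former is allowed through.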
Replay attack:
Weak security of UDP compared to TCP
UDP is weak in several key areas, such as source address spoofing and UDP amplification attacks. To counter source address spoofing, QUIC includes the source address token (STK) mechanism. During communication, the server requires the client to prove its source address with a token. The server generates the STK from the packet's source address and the server timestamp and sends it to the client together with the response packet; in subsequent data transmission, the client must pass the STK back so that the server can verify it. When the server detects that the source address of a connection has changed, it sends a Retry packet to validate the source address. The client can also proactively send source address validation information. Source address validation protects against two types of attack, source address spoofing and UDP amplification.
- To verify at connection establishment that the client's address has not been forged by an attacker, the server generates a token and sends it to the client in a Retry packet. The client must include this token in its subsequent Initial packets so that the server can validate the address.
- The server can issue a token in advance over the current connection with a NEW_TOKEN frame, so that the client can use it for later new connections. This is an important feature for QUIC's 0-RTT.
- QUIC provides connection migration to avoid disconnection when the network path changes (such as switching from cellular to WiFi). QUIC verifies the reachability of the new network address through Path Validation, which prevents attackers from forging addresses during connection migration.
Because the handshake is asymmetric, it can also be abused for amplification attacks:
The QUIC protocol specification mitigates amplification attacks in the following ways:
- In the 0-RTT stage, source address validation is missing, but the packet must be padded to full size, which keeps the amplification factor below 1.
- A retry mechanism provides source address challenge validation.
- Non-0-RTT packets are validated against the source address token (STK), and the STK has an expiration mechanism.
- HTTP/3, with its rate-limiting capabilities and short-lived authentication tokens, can serve as a compensating control against DDoS attacks and partially mitigate them.
Security issues of source address spoofing and path spoofing caused by connection migration
When our network path changes (such as switching from cellular to WIFI or NAT rebinding), QUIC provides connection migration to avoid disconnection.
When the source address changes, QUIC does not immediately pause transmission to run an address validation challenge. For performance reasons it keeps the current transmission going and then performs a source address or path validation challenge. This leaves a window that can be exploited for address spoofing or even amplification attacks. One mitigation is to rate-limit the connection after the address changes until the new challenge completes successfully, and to fall back to the previous path if the challenge fails. Restoring the original valid connection after a failed validation also prevents connection reset attacks.
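The "limit until validated" idea can be sketched as follows. The class is hypothetical; the 8-byte challenge that the peer must echo back and the default factor of 3 for data sent to an unvalidated address follow the approach taken in RFC 9000, which this article does not quote, so treat both as assumptions.

```python
import hmac
import secrets


class MigratingPath:
    """Tracks one new client address until it has been validated."""

    def __init__(self, amplification_limit: int = 3):
        self.challenge = secrets.token_bytes(8)  # payload of the path challenge
        self.validated = False
        self.bytes_received = 0
        self.bytes_sent = 0
        self.limit = amplification_limit

    def on_receive(self, size: int) -> None:
        self.bytes_received += size

    def can_send(self, size: int) -> bool:
        """Before validation, send at most `limit` times what was received."""
        if self.validated:
            return True
        return self.bytes_sent + size <= self.limit * self.bytes_received

    def on_send(self, size: int) -> None:
        self.bytes_sent += size

    def on_path_response(self, data: bytes) -> bool:
        """The path is validated only if the peer echoes the exact challenge."""
        if hmac.compare_digest(data, self.challenge):
            self.validated = True
        return self.validated


path = MigratingPath()
path.on_receive(1200)                            # first packet from the new address
print(path.can_send(3600), path.can_send(3601))  # True False
print(path.on_path_response(b"wrong"))           # False: challenge not echoed
print(path.on_path_response(path.challenge))     # True: path validated
print(path.can_send(10**6))                      # True: limit lifted
```

If the response never arrives, the server keeps using the previously validated path, so a spoofed migration can neither redirect much traffic nor tear down the connection.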
Points to note in current usage scenarios
Because the QUIC protocol is not yet sufficiently mature or secure, there are many points to watch at the current stage. A brief analysis follows.
Data modification scenarios need to be restricted
The lack of forward secrecy and the replay risk of 0-RTT mean that 0-RTT is unsuitable for critical scenarios such as money transfers, because a replayed request can cause serious data tampering. Many top companies in the industry take this very seriously: they generally enable 0-RTT only for idempotent operations such as GET and disable it everywhere else. Some even restrict connection migration.
Use QUIC with caution in services with high security requirements
Caution is needed when using QUIC in high-security services such as account services. Source address spoofing may occur in the handshake phase or the connection migration phase and can cause account anomalies.
Clear benefits for video, games and other scenarios with high real-time requirements
Because QUIC performs much better on first connection and when the network type switches, it suits download and video services, and the gain is especially noticeable for fast video start (opening within seconds). At present the best approach for this kind of business is to split the traffic: signaling data stays on a traditional transport (such as TCP), while video data goes over QUIC, which balances security and transmission efficiency.
4. QUIC’s practice in OPPO
OPPO's overseas business is developing rapidly, and the number of users has grown exponentially to the scale of 100 million. Network coverage overseas is poor, especially in India and in Indonesia and other Southeast Asian regions: the success rate of HTTPS connections is low and download latency is high. Switching between WiFi and 4G is also very unstable, and in some places traffic is even throttled by QoS. Improving the network experience has become a major problem for us.
Security-enhanced architecture
To improve the user experience and solve these problems, and in keeping with its pursuit of technical excellence, OPPO actively follows advanced technology and has adopted the QUIC protocol in some scenarios where user experience matters most. OPPO is also a company that takes security very seriously. Before adopting QUIC it did a lot of research and analyzed in detail both the network performance gains and the security problems that QUIC brings.
Here are a few key security considerations for OPPO:
- Add WAF support on top of the QUIC base;
- Use 0-RTT with caution;
- Connect a security governance module to QUIC address validation.
To strengthen QUIC's security, we are building WAF capabilities on the QUIC base, integrating the WAF directly into the request processing of the QUIC service. In addition, the security governance module abstracted from QUIC can launch a source address validation challenge directly in many scenarios.
As for 0-RTT, we allow it only for idempotent operations (such as GET and HEAD). We treat 0-RTT as a feature to be used very cautiously, and did not support it at all in the early stage.
Performance test results
The test results from OPPO's weak-network laboratory are as follows:
The laboratory data shows a very good improvement. The online environment, however, is variable and complicated, so we have been cautious online, and so far only some overseas businesses have been launched as a gray release. The online results show a good improvement in latency under weak network conditions, of about 11%. Only part of the traffic is in the gray release, and the online environment is relatively complex: some users' local networks may not support UDP well, and congestion control needs separate tuning for real-world conditions. Many features still need ongoing work, including congestion control for the online environment and low-level UDP packet processing through DPDK to improve processing performance, system utilization and throughput. QUIC's security governance module is still experimental and will need continued optimization and tuning to achieve better security and to balance security against performance.
The IETF has designated QUIC as the transport protocol for HTTP/3, and a Request for Comments (RFC) document is expected to be published in 2021. By then more websites and applications will run on QUIC. Leading companies in the industry (Google, Facebook, Tencent, Alibaba, Huawei, etc.) have gradually started to support QUIC, and we believe its security will improve through subsequent large-scale operation. For us, QUIC is a starting point, and we will keep improving the user experience on OPPO.