A lot happens between entering a URL in the browser and seeing the rendered page. This article summarizes that journey, which in short consists of the following steps:
- DNS Domain name Resolution
- Initiating a TCP Connection
- Sending an HTTP request
- The server processes the request and returns an HTTP response
- The browser parses and renders the page
- The connection is closed
URL Address Composition
DNS Domain name Resolution
What is domain name resolution
The Domain Name System (DNS) is a service of the Internet. It is a distributed database that maps domain names and IP addresses to each other, which makes the Internet easier for people to use.
Domain name Resolution Process
- The browser first checks its own cache for an IP address already resolved for the domain name. If one is found, resolution ends here. Entries in the browser cache are bounded by the record's expiration time and by the size of the cache.
- If the browser cache has no entry, the browser checks the operating system's cache and the local hosts file.
- The router may also have a cache.
- If there is still no match, the query is sent to the local DNS server configured in the operating system. The local DNS server checks its own cache and returns the result on a hit. On a miss, it resolves the name iteratively:
  1. The local DNS server queries a root name server. The root server does not hold records for individual domain names, but it knows which servers are responsible for each top-level domain, so it returns the address of the top-level DNS server for the com domain.
  2. The local DNS server queries the top-level DNS server for the com domain, which returns the address of the name server for baidu.com.
  3. The local DNS server queries the baidu.com name server and obtains the IP address of www.baidu.com.
- The local DNS server returns the IP address to the operating system and caches it. The operating system returns the IP address to the browser and caches it as well.
- At this point, the browser has obtained the IP address corresponding to the domain name.
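The iterative lookup described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the server names (`tld-com`, `ns-baidu`), the `LocalResolver` class, and the IP address (taken from a range reserved for documentation) are all hypothetical and are not real DNS data.

```python
# Toy model of iterative resolution. All names and addresses are made up.
ROOT = {"com": "tld-com"}                               # root: TLD -> TLD server
TLD_SERVERS = {"tld-com": {"baidu.com": "ns-baidu"}}    # TLD server: domain -> name server
AUTHORITATIVE = {"ns-baidu": {"www.baidu.com": "203.0.113.10"}}  # hypothetical address


class LocalResolver:
    """Stands in for the local DNS server: it caches answers and queries iteratively."""

    def __init__(self):
        self.cache = {}
        self.queries = 0  # number of upstream queries sent so far

    def resolve(self, name):
        if name in self.cache:            # cache hit: no upstream traffic
            return self.cache[name]
        labels = name.split(".")
        tld = labels[-1]
        domain = ".".join(labels[-2:])

        self.queries += 1                 # step 1: ask a root server
        tld_server = ROOT[tld]
        self.queries += 1                 # step 2: ask the TLD server
        name_server = TLD_SERVERS[tld_server][domain]
        self.queries += 1                 # step 3: ask the domain's name server
        ip = AUTHORITATIVE[name_server][name]

        self.cache[name] = ip             # cache before returning to the OS
        return ip


resolver = LocalResolver()
first = resolver.resolve("www.baidu.com")
second = resolver.resolve("www.baidu.com")  # answered from the cache
```

The second call does not increase `resolver.queries`, which mirrors how a cached answer ends the resolution early.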
Initiating a TCP Connection
The five-layer TCP/IP model
- Application layer: Determines how communication takes place when application services are provided to users. The TCP/IP protocol family includes many common application services, such as FTP, DNS, and HTTP.
- Transport layer: Serves the application layer above it by transferring data between two computers on a network. The transport layer has two different protocols: Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).
- Network layer: Handles the packets that flow over the network. A packet is the smallest unit of data transmitted over a network. This layer determines the path by which packets reach the other computer. When there are several computers or network devices between the two ends, the role of the network layer is to select one transmission route among the many options.
- Data link layer: Building on the bit-stream service provided by the physical layer, it establishes data links between adjacent nodes, uses error control to deliver data frames over the channel without errors, and coordinates the sequence of actions on each link. Its unit of data is the frame.
- Physical layer: Built directly on the physical communication medium. As the interface between the system and the medium, it provides transparent bit-stream transmission between data link entities. Only this layer performs real physical communication; the other layers communicate virtually.
TCP protocol
Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport layer protocol defined by IETF RFC 793. A stream is an uninterrupted sequence of data, which you can think of as water flowing through a pipe.
TCP Connection Process (Three-way handshake)
- 1. First handshake: The client sends a SYN packet (seq = x) to the server and enters the SYN_SENT state, waiting for the server to acknowledge it.
- 2. Second handshake: After receiving the SYN packet, the server must acknowledge the client's SYN (ack = x + 1) and send its own SYN (seq = y) in the same SYN+ACK packet. The server then enters the SYN_RECV state.
- 3. Third handshake: After receiving the SYN+ACK packet from the server, the client sends an ACK packet (ack = y + 1) to the server. Once this packet has been sent and received, both the client and the server are in the ESTABLISHED state and the three-way handshake is complete.
- The packets exchanged during the handshake carry no data. Data transmission starts only after the three-way handshake completes. Ideally, once a TCP connection is established, it stays open until either party actively closes it.
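The exchange above can be traced with a small simulation. This is a sketch rather than a TCP implementation: the `Segment` class and the `three_way_handshake` function are hypothetical names, and the initial sequence numbers x and y are simply passed in as arguments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: int = 0


def three_way_handshake(x, y):
    """Return the three segments and the final (client, server) states.

    x and y are the initial sequence numbers chosen by client and server.
    """
    client_state, server_state = "CLOSED", "LISTEN"

    # 1. Client -> server: SYN, seq = x
    syn = Segment(syn=True, seq=x)
    client_state = "SYN_SENT"

    # 2. Server -> client: SYN + ACK, seq = y, ack = x + 1
    syn_ack = Segment(syn=True, ack=True, seq=y, ack_num=syn.seq + 1)
    server_state = "SYN_RECV"

    # 3. Client -> server: ACK, ack = y + 1
    final_ack = Segment(ack=True, seq=x + 1, ack_num=syn_ack.seq + 1)
    client_state = "ESTABLISHED"
    server_state = "ESTABLISHED"  # reached once the server receives the ACK

    return [syn, syn_ack, final_ack], (client_state, server_state)


segments, states = three_way_handshake(x=100, y=300)
```

With x = 100 and y = 300, the server acknowledges 101 and the client acknowledges 301, matching the ack = x + 1 and ack = y + 1 rules in the steps above.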
Why use a three-way handshake? Would a two-way handshake work? What about four?
- Connections are established in client-server mode; assume host A is the client and host B is the server. The three-way handshake prevents a stale connection request segment from suddenly reaching host B and causing an error. A stale segment arises like this: host A sends a connection request, receives no acknowledgement from host B, and after a while sends the request again. The second request succeeds and data is transferred normally. Now consider the special case in which the first request was not lost but merely delayed at a network node. When it finally reaches host B, host B takes it for a new connection request from host A, accepts it, and sends back an acknowledgement. Host A has no pending request, so it ignores the acknowledgement, while host B keeps waiting for host A to send data. Resources on host B are wasted.
- A two-way handshake does not work because of exactly this stale-request case. In the three-way handshake, both the client and the server send a SYN and receive an ACK for it. Once each side has received the ACK for its own SYN, both know that communication is ready.
- Why not four handshakes? This relates to the well-known "Blue Army and Red Army" agreement problem in communication (known in English as the Two Generals' Problem), which shows that communication can never be made 100% reliable. The three handshakes already complete the preparation for communication. Additional handshakes would not noticeably improve reliability, so they are unnecessary.
TCP features
- Connection-oriented
Connection-oriented means that a connection must be established between both ends before data is sent. The connection is set up with the "three-way handshake", and it lays the foundation for reliable data transmission.
- Only unicast transmission is supported
Each TCP connection has exactly two endpoints and carries point-to-point data only. Multicast and broadcast are not supported.
- Byte-stream oriented
Unlike UDP, TCP transmits data as a byte stream and does not preserve message boundaries.
- Reliable transport
TCP achieves reliability through sequence numbers and acknowledgement numbers. The data it sends is numbered, which also lets the receiver deliver it to the application in order. The receiver sends back an acknowledgement (ACK) for the bytes it has received successfully. If the sender does not receive an acknowledgement within a reasonable time, based on the round-trip time (RTT), it assumes the data was lost and retransmits it.
- Provides congestion control
When the network is congested, TCP reduces the rate and amount of data it injects into the network to relieve the congestion.
- TCP provides full-duplex communication
TCP lets the applications on both sides send data at any time, because each end of the connection has buffers that temporarily hold data for both directions. TCP may send a segment immediately, or buffer data for a while so that it can send more at once (the maximum segment size is determined by the MSS).
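To make the byte-stream and reliability points concrete, here is a minimal sketch of how a receiver could use sequence numbers to put data back in order and acknowledge it. It is a simplification under stated assumptions: the `receive` function is hypothetical, sequence numbers start at 0, segments never partially overlap, and the ACK value is the next byte the receiver expects.

```python
def receive(segments):
    """Reassemble a byte stream from (seq, data) segments.

    Segments may arrive out of order or duplicated. Returns the stream and
    the cumulative ACK the receiver would send after each arrival.
    """
    buffered = {}          # out-of-order segments waiting for earlier bytes
    next_seq = 0           # next byte expected by the application
    stream = bytearray()
    acks = []
    for seq, data in segments:
        if seq >= next_seq:                 # ignore data already delivered
            buffered.setdefault(seq, data)
        while next_seq in buffered:         # deliver everything now contiguous
            chunk = buffered.pop(next_seq)
            stream += chunk
            next_seq += len(chunk)
        acks.append(next_seq)               # cumulative ACK: next expected byte
    return bytes(stream), acks


arrivals = [(0, b"GET "), (9, b"HTTP"), (4, b"/abc "), (4, b"/abc ")]
stream, acks = receive(arrivals)
# stream == b"GET /abc HTTP", acks == [4, 4, 13, 13]
```

The repeated ACK of 4 is what tells a sender that the segment starting at byte 4 has not arrived yet. The application only ever sees the bytes in order, with no trace of where one segment ended and the next began.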
UDP protocol
UDP (User Datagram Protocol) is a connectionless protocol that, like TCP, handles data packets on the network. In the OSI model it belongs to layer 4, the transport layer, one layer above IP. UDP does not provide packet segmentation, reassembly, or ordering. That is, once a packet is sent, there is no way to know whether it arrived safely and intact.
Features of UDP
- Connectionless
Unlike TCP, UDP does not need a three-way handshake to establish a connection before sending data; it can start sending right away. It is merely a carrier of data packets and neither splits nor joins them. Specifically:
- At the sending end, the application layer passes data to UDP at the transport layer. UDP only adds a UDP header to the data to mark it as UDP, and then passes it to the network layer.
- At the receiving end, the network layer passes the data up to the transport layer, and UDP only strips the header before handing the data to the application layer, without any reassembly.
- Unicast, multicast, broadcast functions
UDP not only supports one-to-one transmission, but also one-to-many, many-to-many, and many-to-one transmission modes. That is, UDP provides unicast, multicast, and broadcast functions.
- UDP is message-oriented
The sender's UDP adds a header to each message handed down by the application and passes it straight to the IP layer. UDP neither merges nor splits application messages; it preserves their boundaries. The application therefore has to choose an appropriate message size.
- Unreliable
The unreliability shows first in the absence of a connection: communication needs no connection setup and data is sent whenever the application wants, which cannot be reliable. Data is passed on as it is received, with no copy kept for retransmission and no concern for whether the other side received it correctly. In addition, network conditions fluctuate, but UDP has no congestion control and does not adjust its sending rate even when the network is in poor shape. The drawback is possible packet loss under poor network conditions, but the advantage is just as clear: in scenarios with strict real-time requirements (such as teleconferencing), UDP is used instead of TCP. UDP simply hands the packets to the other side, regardless of whether they arrive safely and intact.
- Low header overhead, which makes it very efficient for transmitting data packets
The UDP header contains the following fields:
- Two 16-bit port numbers: the source port (optional) and the destination port
- The length of the entire datagram
- A checksum over the entire datagram (optional in IPv4), used to detect errors in the header and the data
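The four header fields listed above are each 16 bits wide, so the whole header is 8 bytes. The sketch below packs and unpacks such a header with Python's `struct` module. The function names and the port numbers in the usage line are illustrative assumptions, and the checksum is left at 0 (meaning "not computed" in IPv4) rather than calculated.

```python
import struct

# Network byte order, four unsigned 16-bit fields:
# source port, destination port, length, checksum
UDP_HEADER = struct.Struct("!HHHH")


def build_datagram(src_port, dst_port, payload, checksum=0):
    """Prefix the payload with a UDP header. The length covers header + payload."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_datagram(datagram):
    """Split a datagram back into its header fields and payload."""
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src": src,
        "dst": dst,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }


datagram = build_datagram(5353, 53, b"hi")
fields = parse_datagram(datagram)
```

Here `datagram` is 10 bytes long: 8 bytes of header plus the 2-byte payload, which is the low overhead the list above refers to.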
QUIC protocol
Quick UDP Internet Connections (QUIC) is a low-latency Internet transport layer protocol based on UDP, developed by Google. In November 2016, the Internet Engineering Task Force (IETF) held the first QUIC working group meeting, which attracted wide attention from the industry. It marked the start of QUIC's standardization, with the ultimate goal of replacing TCP and TLS as the next-generation transport protocol for the Web.
QUIC addresses many of the requirements facing the transport and application layers today, including handling more connections, security, and low latency. It combines features of TCP, TLS, and HTTP/2, but is built on UDP. One of its main goals is to reduce connection latency. On a first connection, QUIC needs only 1 RTT (round-trip time) to establish a reliable, secure connection, compared with the 1-3 RTTs of TCP+TLS. The client can then cache the encrypted authentication information locally, so a later connection to the same server can be established with 0-RTT delay. QUIC also reuses the multiplexing of HTTP/2, but because it is based on UDP it avoids the head-of-line blocking that HTTP/2 suffers from. Finally, because QUIC is based on UDP and runs in user space rather than in the operating system kernel, it can be updated and deployed quickly, which avoids the difficulty of deploying and updating TCP.
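The latency figures quoted above translate into simple arithmetic: setup delay is the round-trip time multiplied by the number of round trips. The sketch below only restates those numbers; the 50 ms RTT is an assumed example value and the function names are hypothetical.

```python
# Round trips needed before application data can flow, as quoted in the text.
ROUND_TRIPS = {
    "TCP + TLS (upper bound)": 3,
    "QUIC, first connection": 1,
    "QUIC, resumed connection": 0,
}


def setup_delay_ms(rtt_ms, round_trips):
    """Connection setup delay is the RTT multiplied by the round trips."""
    return rtt_ms * round_trips


def compare(rtt_ms):
    """Setup delay in milliseconds for each scenario at the given RTT."""
    return {name: setup_delay_ms(rtt_ms, n) for name, n in ROUND_TRIPS.items()}


delays = compare(50)  # assumed RTT of 50 ms
```

At an RTT of 50 ms this gives 150 ms for the TCP+TLS upper bound, 50 ms for a first QUIC connection, and 0 ms for a resumed one.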
Comparison of TCP and UDP
Sending an HTTP request
HTTP
HTTP (Hypertext Transfer Protocol) is an application layer protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web. The HTTP standards were developed in a coordinated effort by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), which published a series of RFCs. RFC 2616, published in June 1999, defines HTTP 1.1, a version of the protocol that is widely used.
HTTP Access Procedure
HTTP is an application-layer protocol in the TCP/IP model. Before the browser and the server can communicate, a TCP connection must be established; only then can the server receive the browser's request. After receiving the request, the server returns the corresponding response. Finally, the browser receives the server's response and interprets the data.
Under HTTP 1.0, the browser had to establish a separate connection for each request, which wasted resources.
HTTP 1.1 later introduced pipelining, which lets a client issue several HTTP requests over a single connection so that they overlap.
- Note: Pipelining is largely theoretical; most desktop browsers keep HTTP pipelining turned off by default.
- As a result, applications that use HTTP 1.1 today may well open several TCP connections.
HTTP pipelining puts multiple HTTP requests on one TCP connection and sends them one after another without waiting for the server's responses. However, the client still receives the responses in the order in which it sent the requests.
- In HTTP 1.0, after sending a request you must wait for the server's response before you can send the next one.
- In HTTP 1.1 with pipelining, a request can be sent without waiting for the previous response, but the responses still have to arrive in request order.
- So both HTTP 1.0 and HTTP 1.1 with pipelining can still be blocked by a slow response. This condition is known as head-of-line blocking.
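Head-of-line blocking can be shown with a few numbers. The sketch below assumes, for illustration only, that the server works on all pipelined requests at the same time and that each response becomes ready after its own service time; `pipelined_delivery` is a hypothetical helper, not part of any HTTP library.

```python
def pipelined_delivery(service_times):
    """Time at which each response reaches the client on one pipelined connection.

    Responses must go out in request order, so a response cannot be delivered
    before every earlier response is ready.
    """
    delivered = []
    latest = 0
    for ready_at in service_times:
        latest = max(latest, ready_at)  # wait for all earlier responses
        delivered.append(latest)
    return delivered


# The first request is slow (5 time units), the next two are fast (1 unit each).
blocked = pipelined_delivery([5, 1, 1])   # [5, 5, 5]
# With the slow request last, the fast ones are not held up.
unblocked = pipelined_delivery([1, 1, 5]) # [1, 1, 5]
```

In the first case the two fast responses are ready at time 1 but are only delivered at time 5, because they queue behind the slow response at the head of the line.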
SPDY: optimization of HTTP1.x
In 2012, Google made a splash by proposing SPDY, which reduces the request latency of HTTP1.x and addresses its security weaknesses. In detail:
- To tackle HTTP's high latency, SPDY adopts multiplexing. Multiple request streams share a single TCP connection, which solves HOL blocking, reduces latency, and improves bandwidth utilization.
- Multiplexing introduces a new problem: on a shared connection, critical requests can be blocked by others. SPDY lets each request be given a priority so that important requests are answered first. For example, when the browser loads a home page, the HTML of the page should be delivered first and the static resource files and scripts afterwards, so that the user sees the page content as early as possible.
- HTTP1.x headers are often redundant. Choosing a suitable compression algorithm reduces the size and number of packets.
- Transmission over an HTTPS-based encrypted protocol greatly improves the security of the data in transit.
- Server push: For example, suppose a page requests style.css. While the client is receiving style.css, the server also pushes style.js to the client. When the client later needs style.js, it can take it directly from the cache without sending another request.
SPDY composition diagram:
HTTP2
HTTP/2 (Hypertext Transfer Protocol version 2, originally named HTTP 2.0) is the second major version of the HTTP protocol used on the World Wide Web. It is the first update of HTTP since the release of HTTP 1.1 in 1999, and it is based mainly on SPDY (a TCP-based application layer protocol developed by Google to minimize network latency, speed up the network, and improve the user's network experience).
Differences between HTTP2.0 and SPDY:
- HTTP2.0 supports plaintext HTTP transport, while SPDY enforces the use of HTTPS
- HTTP2.0 compresses message headers with HPACK (http2.github.io/http2-spec/…) rather than the DEFLATE algorithm used by SPDY (zh.wikipedia.org/wiki/DEFLAT…)
Comparison of HTTP1.0, HTTP1.1, and HTTP2.0
HTTP Protocol Features
- Simple, fast, and flexible: To send a request to the server, a client only needs to pass the request method and path. HTTP allows any type of data object to be transferred. Because the protocol is simple, HTTP server programs are small and communication is fast.
- Connectionless and stateless: HTTP originally limited each connection to a single request; the server disconnected once it had handled the request, which saved transmission time. HTTP also has no memory of earlier transactions: if a later request needs information from an earlier one, that data must be sent again.
- Pipelining and content encoding: Pipelining lets requests on a persistent connection be sent without waiting for earlier responses, which speeds them up, and when a message is too large HTTP can compress it to reduce transmission time.
- HTTP supports client/server mode
From HTTP to HTTPS
HTTP has long been used to transfer data between web servers and browsers because it is simple, fast, and light on resources. However, it has an obvious problem: HTTP is a plaintext protocol and data is not encrypted in any way. An attacker who captures the packets exchanged between the web server and the browser can read the transmitted information directly, leaking website and user data. HTTP is therefore unsuitable for sensitive information, which is why HTTPS (Hypertext Transfer Protocol Secure) was introduced.
HTTPS
HTTPS is a transport protocol for secure communication over computer networks. It adds an SSL/TLS layer beneath HTTP to protect the privacy and integrity of the exchanged data and to authenticate the website's server. Put simply, it is the secure version of HTTP.
HTTPS Access Procedure
Before any data is transferred, the web browser and the website server perform a handshake to agree on the keys used for encryption. The process is as follows:
- 1. The web browser sends the set of encryption rules it supports to the website server.
- 2. The website server picks an encryption algorithm and a hash algorithm from that set, and sends its identity to the web browser in the form of a certificate (containing the issuing CA, the validity period, the public key, the certificate owner, the signature, and so on).
- 3. On receiving the certificate, the web browser first verifies that it is valid. If the browser trusts the certificate, it shows a trusted indicator in the address bar; otherwise it shows an untrusted indicator. Once the certificate is trusted, the browser generates a random secret and encrypts it with the public key from the certificate. It then computes a hash of the handshake message with the agreed hash algorithm, encrypts the handshake message with the random secret, and sends all of this to the website.
- 4. The website server decrypts the information with its own private key to recover the secret, uses the secret to decrypt the handshake message sent by the browser, and checks that the hash matches the browser's. The server then encrypts a handshake message of its own with the secret and sends it to the browser.
- 5. Finally, the browser decrypts the server's handshake message and computes its hash. If the hash matches the one sent by the server, the handshake is complete. From then on, the server and the browser exchange data encrypted with a symmetric algorithm, using the random secret the browser generated earlier.
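In practice this handshake is carried out by a TLS library rather than by application code. The following sketch uses Python's standard `ssl` module to show where certificate verification and negotiation happen from the client's side. The function names are hypothetical, the host name in the usage note is only an example, and `fetch_tls_info` needs network access to run.

```python
import socket
import ssl


def make_client_context():
    """Create a client-side TLS context.

    create_default_context() loads the system's trusted CA certificates and
    turns on certificate and host name verification (handshake step 3).
    """
    return ssl.create_default_context()


def fetch_tls_info(host, port=443, timeout=5.0):
    """Perform a TLS handshake with `host` and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # wrap_socket runs the handshake; it raises ssl.SSLError
        # (for example SSLCertVerificationError) if the certificate is not trusted.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return {
                "version": tls.version(),   # negotiated protocol version
                "cipher": tls.cipher(),     # (name, protocol, secret bits)
                "subject": tls.getpeercert().get("subject"),
            }


context = make_client_context()
```

Calling `fetch_tls_info("www.example.com")` on a machine with Internet access would return the negotiated protocol version and cipher suite, which correspond to the algorithm choice made in step 2.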
HTTPS encryption algorithm
To protect data, HTTPS uses asymmetric encryption during the handshake: the encryption key differs from the decryption key, and the two are called the public key and the private key. The public key and the algorithm are public, while the private key is kept secret. Asymmetric algorithms are slow but offer strong security, and because of how they work, the length of data they can encrypt is limited. Examples of asymmetric algorithms are RSA, DSA, ECDSA, DH, and ECDHE.
From HTTPS to HSTS
But once a website moves from HTTP to HTTPS, is the data really secure?
When users visit a website, they usually type only the domain name into the browser, without http:// or https:// in front of it, and the browser fills in the scheme automatically. Browsers have traditionally filled in http:// by default, and website administrators typically use a 301/302 redirect to switch from HTTP to HTTPS. That first request still travels over plain HTTP, however, so it can easily be hijacked and attacked by a third party.
This is where HSTS (HTTP Strict Transport Security) comes in.
HSTS
HTTP Strict Transport Security (HSTS) is a web security policy mechanism standardized by the Internet Engineering Task Force (IETF). With HSTS, users do not need to type HTTPS in the address bar by hand when visiting a website. The browser automatically uses HTTPS for the site, so users always reach its encrypted links and data in transit is protected.
HSTS principle
HSTS controls the browser's behaviour mainly through a response header sent by the server:
1. Add the HSTS response header to the server's responses: Strict-Transport-Security: max-age=expireTime [; includeSubDomains] [; preload]. This header takes effect only when it is returned over HTTPS. The parameters in [ ] are optional.
2.
3. The next time the user accesses the site over HTTP, the client redirects internally, which shows up as a 307 Internal Redirect response code.
4. The website server becomes an origin server that is accessed over HTTPS.
With HSTS enabled, a website is effectively protected against man-in-the-middle attacks and saves the time spent on 301/302 redirects, which greatly improves both security and user experience.
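As a rough illustration of the mechanism, the sketch below parses a Strict-Transport-Security header value and then mimics the browser's internal redirect. The `parse_hsts` and `upgrade_url` functions are hypothetical helpers; a real browser also tracks when the policy expires and which hosts it applies to, which this sketch leaves out.

```python
def parse_hsts(value):
    """Parse a Strict-Transport-Security header value into a dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in value.split(";"):
        directive = directive.strip()
        if not directive:
            continue
        name, _, arg = directive.partition("=")
        name = name.strip().lower()          # directive names are case-insensitive
        if name == "max-age":
            policy["max_age"] = int(arg.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


def upgrade_url(url, policy):
    """Rewrite http:// to https:// while a policy with a positive max-age is known."""
    if policy["max_age"] and url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url


policy = parse_hsts("max-age=31536000; includeSubDomains; preload")
upgraded = upgrade_url("http://example.com/login", policy)
# upgraded == "https://example.com/login"
```

A max-age of 0 leaves the URL untouched here, which matches the header's meaning: the browser should stop treating the site as HSTS-only.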
HSTS Preload List
Although HSTS is a good defence against HTTPS downgrade attacks, the very first HTTP request, made before HSTS takes effect, can still be hijacked. To solve this, browser vendors introduced the HSTS Preload List: a list of domain names built into the browser and updated periodically. For domain names on the list, the browser uses HTTPS even if the user has never visited them before.
If you want to add your domain name to the preloaded list, the following conditions must be met:
- Provide a valid certificate.
- Redirect all HTTP traffic to HTTPS.
- Ensure HTTPS is enabled for all subdomains, especially the www subdomain.
- Output the HSTS response header:
  1. max-age must be at least 1 year (31536000 seconds).
  2. The includeSubDomains parameter must be specified.
  3. The preload parameter must be specified.
  4. If you serve an additional redirect from your HTTPS site, that redirect must itself carry the HSTS header (not only the page it redirects to).
Sending an HTTP request
Sending an HTTP request means constructing an HTTP request message and sending it over TCP to a specified port on the server. A request message consists of a request line, request headers, and a request body.
- Request line: The request line has three parts: the request method, the request address, and the protocol with its version. It ends with CRLF (\r\n). HTTP/1.1 (RFC 2616) defines eight request methods: GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE, and CONNECT; PATCH was added later by RFC 5789. GET and POST are the two most common, and both are used in RESTful interfaces.
- Request header
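The layout of a request message can be seen by assembling one by hand. This sketch builds the raw bytes for an HTTP/1.1 request; `build_request` is a hypothetical helper and the host name and paths are example values only. Real clients also handle header encoding and many edge cases that are omitted here.

```python
CRLF = "\r\n"


def build_request(method, path, headers, body=b"", version="HTTP/1.1"):
    """Assemble request line, headers, a blank line, and an optional body."""
    request_line = f"{method} {path} {version}"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    if body:
        # Tell the server how many body bytes follow the blank line.
        header_lines.append(f"Content-Length: {len(body)}")
    head = CRLF.join([request_line, *header_lines]) + CRLF + CRLF
    return head.encode("ascii") + body


get_request = build_request("GET", "/index.html", {"Host": "www.example.com"})
# b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"

post_request = build_request(
    "POST", "/login", {"Host": "www.example.com"}, body=b"user=a&pw=b"
)
```

The empty line (two consecutive CRLFs) is what separates the headers from the body, so a GET request without a body simply ends there.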
Principles of HTTP Proxy
Ordinary proxy
The HTTP client sends its request message to the proxy server. The proxy server handles the request and the connection-related headers correctly (for example, Connection: keep-alive), sends the request on to the server, and forwards the response it receives back to the client.
Suppose the client accesses website A through the proxy. Website A sees the proxy as its client and is completely unaware of the real client, which hides the client's IP address. The proxy can also modify the HTTP request headers and tell the server the real client IP through a custom header such as X-Forwarded-IP. However, the server cannot verify whether the custom header was really added by the proxy or forged by the client, so be careful when taking an IP address from an HTTP header field.
In the other direction, the address the client accesses may itself be a proxy. After receiving the request message, the proxy sends a request to the server that actually provides the service and forwards the response to the browser. This arrangement is usually called a reverse proxy and can be used to hide the server's IP address and port. To use a reverse proxy, the DNS records must be changed so that the domain name resolves to the IP address of the proxy server. The browser cannot detect the real server and needs no configuration changes. Reverse proxying is the most common deployment mode for web systems.
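One cautious way for a server to handle the forwarded-IP header is to honour it only when the request arrives directly from a proxy it already trusts. The sketch below illustrates that idea; the `TRUSTED_PROXIES` set, the `client_ip` function, and all addresses (taken from ranges reserved for documentation) are assumptions made for the example.

```python
import ipaddress

# Hypothetical address of a proxy operated by the site itself.
TRUSTED_PROXIES = {ipaddress.ip_address("192.0.2.10")}


def client_ip(peer_ip, headers, header_name="X-Forwarded-IP"):
    """Return the client IP to log or rate-limit on.

    The header is used only if the TCP peer is a trusted proxy and the value
    is a valid IP address. Otherwise fall back to the peer address.
    """
    peer = ipaddress.ip_address(peer_ip)
    forwarded = headers.get(header_name)
    if peer in TRUSTED_PROXIES and forwarded:
        try:
            return str(ipaddress.ip_address(forwarded.strip()))
        except ValueError:
            pass  # malformed header value: ignore it
    return str(peer)


via_proxy = client_ip("192.0.2.10", {"X-Forwarded-IP": "198.51.100.7"})
spoofed = client_ip("203.0.113.5", {"X-Forwarded-IP": "198.51.100.7"})
# via_proxy == "198.51.100.7", spoofed == "203.0.113.5"
```

In the second call the header is ignored because the request did not come through the trusted proxy, so a client cannot simply claim another address.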
Tunnel proxy
With the CONNECT method, the HTTP client asks the tunnel proxy to create a TCP connection to any destination server and port, and the proxy then blindly forwards all subsequent data between the client and the server.
Suppose the client accesses website A through the proxy. The browser first sends a CONNECT request asking the proxy to create a TCP connection to website A. Once the TCP connection is established, the proxy forwards all subsequent traffic without inspecting it. In theory, such a proxy therefore works with any application layer protocol based on TCP, including the TLS protocol used by HTTPS websites, which is why it is called a tunnel. For HTTPS, the client performs the TLS handshake and negotiates the keys directly with the server through the proxy, so the connection remains secure.
Unlocking the browser cache mechanism
- Strong cache: The resource requested by the user is taken directly from the client cache, without sending a request to the server or interacting with it.
- Negotiated cache: A request is sent to the server, and the server decides whether the resource can be taken from the cache.
- What they have in common: in both cases the data the client ends up using comes from the client cache.
- The difference: as the names suggest, a strong cache does not interact with the server, whereas a negotiated cache does.
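The two modes can be sketched as a pair of decisions, one on each side. The example assumes mechanisms the text above does not name: freshness is judged by a `max_age` lifetime (as with `Cache-Control: max-age`), and validation uses an ETag sent in an `If-None-Match` request header. The function names and the cache entry layout are hypothetical.

```python
def browser_decision(entry, now):
    """Decide how to satisfy a request from a cached entry.

    entry: {"stored_at": seconds, "max_age": seconds, "etag": str}
    """
    age = now - entry["stored_at"]
    if age < entry["max_age"]:
        # Strong cache: still fresh, so no request is sent at all.
        return ("strong-cache", None)
    # Negotiated cache: ask the server whether the cached copy is still valid.
    return ("revalidate", {"If-None-Match": entry["etag"]})


def server_response(request_headers, current_etag):
    """Server side of the negotiation: 304 means 'use your cached copy'."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304
    return 200  # resource changed: send the full response


entry = {"stored_at": 0, "max_age": 60, "etag": '"v1"'}
fresh = browser_decision(entry, now=30)   # ("strong-cache", None)
stale = browser_decision(entry, now=90)   # ("revalidate", {...})
status = server_response(stale[1], current_etag='"v1"')  # 304
```

In both the strong-cache case and the 304 case the browser ends up reading the body from its own cache. The only difference is whether a round trip to the server was needed to confirm it.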