Parts of this article are excerpted from Illustrated HTTP, so the excerpts are not individually marked as quotations.
The birth of HTTP
In March 1989, the Internet still belonged to a small minority. It was in this early period that HTTP was born.
Dr. Tim Berners-Lee of CERN, the European Organization for Nuclear Research, proposed an idea that would let researchers in distant locations share knowledge. The basic concept was to build the WWW (World Wide Web): a set of documents that can be reached from one another through hypertext, which is formed by cross-references among multiple documents.
Three technologies were proposed for building the WWW: HTML (HyperText Markup Language), an SGML (Standard Generalized Markup Language) based markup language for pages; HTTP, the document transfer protocol; and the URL (Uniform Resource Locator), which specifies the address of a document.
WWW was originally the name of the client application, the Web browser, used to browse hypertext. Today the name refers to this whole collection of technologies, or the Web for short.
TCP/IP protocol suite
TCP/IP is understood in two ways:
- TCP/IP refers to the two protocols TCP and IP
- TCP/IP refers to the whole family of protocols used in communication over IP
TCP/IP layering
From top to bottom:
- Application layer, such as File Transfer Protocol (FTP), Domain Name System (DNS), and HTTP
- Transport layer, such as Transmission Control Protocol (TCP) and User Datagram Protocol (UDP)
- Network layer (also known as the internet layer), such as IP
- Link layer (also known as the data link layer or network interface layer), which handles the hardware side of the network connection, including operating system device drivers, network adapters (NICs), and physical media such as optical fiber
IP
IP (Internet Protocol) belongs to the network layer. It is responsible for delivering data packets and is used by almost every system on the network. IP depends on several pieces of information to deliver data, the two most important of which are:
- The IP address: the address assigned to a node. On ordinary networks it can change, because addresses are reassigned from time to time.
- The MAC address: the fixed address that belongs to the network interface card (NIC).
ARP protocol
Communication between IP addresses ultimately relies on MAC addresses. ARP (Address Resolution Protocol) maps an IP address to a MAC address: given the IP address, it looks up the corresponding MAC address on the LAN, and the data is then sent to that MAC address.
TCP
TCP is a transport layer communication protocol that provides connection-oriented, reliable, byte stream services.
- Reliable means that data is delivered to the other party accurately and completely
- Byte stream service means that large data is split into packets, called segments, so that it is easier to transmit
Three-way handshake
To ensure reliable transmission, TCP first establishes a connection, using a three-way handshake. The process is as follows:
- First handshake
  - Sender: sends a packet with the SYN flag to the receiver
  - Receiver: receives the packet with the SYN flag
- Second handshake
  - Receiver: sends a packet with the SYN and ACK flags to the sender
  - Sender: receives the packet and verifies the SYN and ACK flags
- Third handshake
  - Sender: sends a packet with the ACK flag to the receiver
  - Receiver: receives the packet and verifies the ACK flag
If nothing goes wrong in these three steps, the connection is established.
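The handshake itself is carried out by the operating system's TCP stack, so application code never sets SYN or ACK flags directly. As a minimal sketch, assuming a loopback connection on a port chosen by the OS (the variable names are illustrative), the following shows where the handshake happens from the application's point of view: `connect()` returns only after the three steps have completed.

```python
import socket

# Listening socket on the loopback interface; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# connect() returns once the kernel has completed the three-way handshake
# (SYN sent, SYN+ACK received, ACK sent).
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))

# accept() hands the already established connection to the application.
conn, peer = server.accept()

# Data can only flow after the connection has been established.
client.sendall(b"hello")
received = conn.recv(1024)
print(received)

for s in (conn, client, server):
    s.close()
```

A packet capture on the loopback interface while this runs would show the SYN, SYN+ACK, and ACK packets before the payload.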
Why three handshakes? To establish a reliable connection, both the sender and the receiver must confirm that each side can send and receive:
- First handshake
  - The receiver gets the SYN and knows that the sender can send
- Second handshake
  - The sender gets the ACK and knows that the receiver can receive (its SYN arrived)
  - The sender gets the SYN and knows that the receiver can send
- Third handshake
  - The receiver gets the ACK and knows that the sender can receive
Three is therefore the minimum number of handshakes needed for both sides to confirm this, which is why TCP uses a three-way handshake.
Four-way wave
TCP closes a connection with a four-way wave:
- First wave
  - Sender: sends a packet with the FIN flag to the receiver, indicating that it wants to close the connection
  - Receiver: receives the packet and notes the FIN
- Second wave
  - Receiver: sends a packet with the ACK flag to the sender, indicating that the close request has been received but the connection will not be closed immediately
  - Sender: receives the packet and notes the ACK
- Third wave
  - Receiver: sends a packet with the FIN flag to the sender, indicating that it is now ready to close
  - Sender: receives the packet and verifies the FIN
- Fourth wave
  - Sender: sends a packet with the ACK flag to the receiver, indicating that it may close, then enters the TIME_WAIT phase and waits for two MSLs (Maximum Segment Lifetime)
  - Receiver: receives the packet, verifies the ACK, then closes the connection and releases the port
Note: the Maximum Segment Lifetime (MSL) is the longest time a packet may survive in the network, and its value depends on the platform. Commonly cited values are 120 s for Windows and 60 s for Linux.
Two states in the four-way wave deserve explanation:
- CLOSE_WAIT: why does the receiver go through CLOSE_WAIT instead of replying with its own FIN right away? It may still have responses that have not been sent, so it needs time to send them all before closing the connection.
- TIME_WAIT: there are two reasons why the sender needs the TIME_WAIT phase:
  - The final ACK may fail to reach the receiver, for example because of network jitter. The receiver then times out and repeats the third wave (resends its FIN). The sender therefore cannot close the connection as soon as it first receives the FIN: if it did, the retransmitted FIN would go unanswered, the receiver could not close normally, and its resources would be wasted.
  - Packets from the receiver may be delayed in the network and arrive after its FIN. Waiting for a period after sending the ACK gives such packets time to arrive or expire, so that the data flow ends cleanly.
You may also wonder why the wait is exactly two MSLs. Like a cache timeout setting, it is a configured value: waiting one MSL or two does not change the principle of the four-way wave. The usual rationale is that one MSL covers the final ACK travelling to the receiver and a second MSL covers a retransmitted FIN travelling back.
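The teardown can be pictured as two small state machines, one for each side. The following is a minimal sketch, assuming the conventional TCP state names (FIN_WAIT_1, FIN_WAIT_2, and LAST_ACK are not mentioned above); the event labels and the `run` helper are hypothetical and exist only for illustration.

```python
# (current state, event) -> next state, for both endpoints during teardown.
TRANSITIONS = {
    # Side that sends the first FIN (the "sender" above).
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    # Side that receives the first FIN (the "receiver" above).
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(events, state="ESTABLISHED"):
    """Apply the events in order and return every state that was visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


active = run(["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timeout"])
passive = run(["recv FIN, send ACK", "send FIN", "recv ACK"])
print(" -> ".join(active))
print(" -> ".join(passive))
```

The side that closes first passes through TIME_WAIT before reaching CLOSED, while the other side passes through CLOSE_WAIT, which matches the two states discussed above.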
Differences between TCP and UDP
The User Datagram Protocol (UDP) is a connectionless, unreliable datagram service. The practical differences are:
- TCP is connection-oriented; UDP is connectionless
- UDP consumes fewer resources
- TCP is reliable; UDP is unreliable
- UDP headers are smaller (8 bytes, versus at least 20 bytes for TCP)
- UDP does not guarantee ordering
- UDP packets may be lost, and lost packets are not retransmitted
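To make the "connectionless" point concrete, here is a minimal sketch, assuming a loopback address and an OS-chosen port, in which a UDP datagram is sent without any connection setup. The variable names are illustrative.

```python
import socket

# Receiver: a datagram socket bound to a free loopback port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so do not wait forever
address = receiver.getsockname()

# Sender: no connect() and no handshake, the datagram is simply sent.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", address)

# Each recvfrom() returns one whole datagram together with the source address.
data, source = receiver.recvfrom(1024)
print(data, source)

sender.close()
receiver.close()
```

On a real network the datagram could be lost, duplicated, or reordered, and the application would have to handle that itself; the timeout above is the only safeguard in this sketch.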
DNS
The Domain Name System (DNS), like HTTP, is an application-layer protocol. It provides resolution between domain names and IP addresses.
The DNS service is needed because IP addresses, which are long strings of numbers, are hard to remember, whereas domain names made up of letters (such as google.com) are much easier. When we visit a domain name, DNS resolves it to the destination IP address used for communication.
From entering a domain name in the browser to obtaining an IP address
- Parse the URL
- Run the DNS query. The sources are consulted in the following order, and the query stops at the first match:
  - Query the DNS cache of the browser (in Chrome: chrome://net-internals/#dns)
  - Query the local hosts file (/etc/hosts)
  - Query the local DNS server
  - Query the upper-layer DNS server
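The lookup order can be sketched as a chain of sources that is walked until one of them returns an answer. This is only an illustration: the dictionaries stand in for the real caches and servers, and the names, the `.test` domains, and the 192.0.2.x addresses (a range reserved for documentation) are all assumptions.

```python
def resolve(name, sources):
    """Return (source label, address) from the first source that knows the name."""
    for label, lookup in sources:
        address = lookup(name)
        if address is not None:
            return label, address  # stop at the first match
    return None


# Hypothetical stand-ins for the real lookup sources.
browser_cache = {}
hosts_file = {"localhost": "127.0.0.1"}
local_dns = {"example.test": "192.0.2.10"}
upstream_dns = {"example.test": "192.0.2.10", "other.test": "192.0.2.20"}

sources = [
    ("browser cache", browser_cache.get),
    ("hosts file", hosts_file.get),
    ("local DNS server", local_dns.get),
    ("upper-layer DNS server", upstream_dns.get),
]

print(resolve("localhost", sources))
print(resolve("example.test", sources))
print(resolve("other.test", sources))
```

A real resolver also stores answers in the faster layers after a successful lookup and honours record lifetimes; both are left out here.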
HTTP
HTTP, like many other protocols in the TCP/IP protocol family, is used for communication between clients and servers. The end that requests access to resources such as text or images is called the client, and the end that provides resource response is called the server.
HTTP itself does not keep state: it is a stateless protocol. Mechanisms such as cookies and sessions are often used to maintain state on top of HTTP, but they are separate solutions rather than part of the protocol itself.
Message format
The request message
An HTTP request message consists of the request method, request URI, protocol version, optional request header fields, and entity body.
The response message
An HTTP response message consists of the protocol version, status code, reason phrase explaining the status code, optional response header fields, and entity body.
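The two message layouts can be illustrated with a pair of small helpers. This is a simplified sketch for HTTP/1.1 text messages; `build_request` and `parse_response` are hypothetical names, and details such as repeated header fields or chunked bodies are ignored.

```python
def build_request(method, uri, headers, body=b""):
    """Assemble: request line, header fields, blank line, optional entity body."""
    lines = [f"{method} {uri} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_response(raw):
    """Split a response into version, status code, reason phrase, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


request = build_request("GET", "/index.html", {"Host": "example.com"})
response = parse_response(
    b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nhi"
)
print(request)
print(response)
```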
keep-alive
In the original version of HTTP, the TCP connection was closed after every HTTP exchange, which caused considerable overhead when requests were frequent. HTTP/1.1 therefore uses persistent connections (also known as HTTP keep-alive or HTTP connection reuse), which keep the TCP connection open as long as neither end explicitly asks to close it.
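Connection reuse can be observed with Python's standard library. The sketch below assumes a throwaway local server (the `Handler` class and the `/a` and `/b` paths are made up for the demonstration) and sends two requests over one `HTTPConnection`; seeing the same local port for both requests indicates that the same TCP connection carried both.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # A known body length lets the client detect the end of the response
        # without the connection being closed.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
local_ports = []
for path in ("/a", "/b"):
    conn.request("GET", path)
    response = conn.getresponse()
    response.read()  # the body must be consumed before the next request
    local_ports.append(conn.sock.getsockname()[1])

conn.close()
server.shutdown()
server.server_close()
print(local_ports)
```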
Status code
The status code describes the result of a request that the client sent to the server. It tells the client whether the server handled the request normally or an error occurred.
- 2XX Success
- 3XX Redirection
  - 301 Moved Permanently: a permanent redirect. The requested resource has been assigned a new URI, which should be used for future requests
  - 302 Found: a temporary redirect. The resource has been temporarily assigned a new URI
  - 303 See Other: a temporary redirect, but unlike 302, 303 expects the client to use GET when requesting the new URI
  - 304 Not Modified: the resource has not been modified, so the client can use its cached copy directly
- 4XX Client error
  - 400 Bad Request: the request message contains a syntax error
  - 401 Unauthorized: the request requires HTTP authentication (such as BASIC or DIGEST authentication). If the request already included credentials, it means that authentication failed
  - 403 Forbidden: the server refuses access to the requested resource
  - 404 Not Found: the requested resource cannot be found on the server
- 5XX Server error
  - 500 Internal Server Error: an error occurred while the server was handling the request
  - 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot process requests
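Python's standard library ships the registered status codes in `http.HTTPStatus`, which is a convenient way to look up the reason phrase for a code. The `CLASSES` table and the `describe` helper below are illustrative names, not part of any library.

```python
from http import HTTPStatus

# The first digit of the status code gives the class of the response.
CLASSES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}


def describe(code):
    """Return the code, its standard reason phrase, and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (301, 302, 303, 304, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```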
HTTP/2
HTTP/2 adds the following optimizations over HTTP/1.x:
- Header compression: HTTP header fields are compressed to reduce network usage
- Multiplexing, to avoid head-of-line blocking. HTTP/1.x transfers follow a serial request-response pattern, so a slow request at the head of the queue blocks those behind it. HTTP/2 messages are broken into independent frames, sent interleaved, and reassembled at the other end
- Request prioritization, so that high-priority requests are not blocked
- Server push, for more flexible data delivery
To achieve these optimizations, HTTP/2 adds a binary framing layer to the original HTTP structure. It splits the data stream into smaller messages and frames and encodes them in a binary format.
- Data stream: A bidirectional byte stream within an established connection that can carry one or more messages. A TCP connection can have multiple bidirectional data streams, each with a unique identifier and optional priority information
- Message: A request or response, and each message contains one or more frames
- Frame: The smallest unit of HTTP/2 communication. Each frame contains a frame header that at least identifies the data stream to which the current frame belongs
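A rough sketch of the framing idea follows. It uses the 9-byte frame header layout from the HTTP/2 specification (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream identifier) and interleaves DATA frames from two streams. The helper names are hypothetical, and everything else in the protocol (header compression, flow control, settings) is left out.

```python
import struct

DATA = 0x0        # frame type: DATA
END_STREAM = 0x1  # flag: last frame of the stream


def encode_frame(frame_type, flags, stream_id, payload):
    """Prefix the payload with a 9-byte HTTP/2 frame header."""
    header = len(payload).to_bytes(3, "big")  # 24-bit payload length
    header += struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def decode_frames(data):
    """Split a byte string into (stream_id, type, flags, payload) tuples."""
    frames, offset = [], 0
    while offset < len(data):
        length = int.from_bytes(data[offset:offset + 3], "big")
        frame_type, flags, stream_id = struct.unpack(
            ">BBI", data[offset + 3:offset + 9]
        )
        stream_id &= 0x7FFFFFFF  # drop the reserved bit
        payload = data[offset + 9:offset + 9 + length]
        frames.append((stream_id, frame_type, flags, payload))
        offset += 9 + length
    return frames


def reassemble(frames):
    """Concatenate DATA payloads per stream, in arrival order."""
    streams = {}
    for stream_id, frame_type, _flags, payload in frames:
        if frame_type == DATA:
            streams[stream_id] = streams.get(stream_id, b"") + payload
    return streams


# Frames of streams 1 and 3 are interleaved on the same connection.
wire = b"".join([
    encode_frame(DATA, 0, 1, b"hel"),
    encode_frame(DATA, 0, 3, b"wor"),
    encode_frame(DATA, END_STREAM, 1, b"lo"),
    encode_frame(DATA, END_STREAM, 3, b"ld"),
])
streams = reassemble(decode_frames(wire))
print(streams)
```

Because every frame names its stream, the receiving end can rebuild each message even though the frames arrive mixed together.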
HTTPS
HTTP has the following security risks:
- Communication is in plain text, so it can be eavesdropped
- The identity of the other party is not verified, so it may be impersonated
- The integrity of a message cannot be proved, so it may have been tampered with
To prevent eavesdropping, impersonation, and tampering, HTTP is layered on top of SSL; this combination is known as HTTPS. HTTPS is therefore not a new application-layer protocol: only the communication interface of HTTP is replaced by SSL (Secure Sockets Layer) or TLS (Transport Layer Security). Ordinarily, HTTP communicates directly with TCP. With SSL, HTTP communicates with SSL first, and SSL then communicates with TCP.
Encryption
HTTPS combines asymmetric and symmetric encryption:
- Asymmetric encryption is used to make sure that the client and the server end up with the same symmetric key
- The messages themselves are encrypted with this symmetric key
The detailed steps are as follows:
- The client sends an HTTPS request (to port 443 of the server)
- The server sends its certificate, which includes information such as the certificate authority, the expiration time, and the asymmetric public key, to the client (the server must have a digital certificate)
- The client receives the certificate information and then:
  - Verifies the certificate against the CA using the certificate information
  - Generates a symmetric key
  - Encrypts the symmetric key with the asymmetric public key
  - Sends the encrypted symmetric key to the server
- After receiving the encrypted symmetric key, the server decrypts it with the asymmetric private key (the client and the server now hold the same symmetric key)
- The actual request is then sent. The client encrypts the request with the symmetric key, and the server decrypts it with the symmetric key after receiving it
- The server encrypts the response data with the symmetric key, and the client decrypts the received data with the symmetric key to obtain the real data
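The key exchange described above can be illustrated with a deliberately tiny toy. Everything here is an assumption made for the demonstration: the RSA key pair uses textbook-sized primes (61 and 53), the symmetric cipher is a simple XOR keystream, and certificate verification is skipped. It is not secure, and real TLS stacks negotiate keys differently in detail; the point is only to show the asymmetric step protecting the symmetric key.

```python
import hashlib
import secrets

# Toy RSA key pair built from the primes 61 and 53. NOT secure, illustration only.
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent


def keystream_xor(key, data):
    """Toy symmetric cipher: XOR the data with a SHA-256-derived keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Client: generate a symmetric key and encrypt it with the server's public key.
session_key = secrets.randbelow(N - 2) + 2
encrypted_key = pow(session_key, E, N)

# Server: recover the symmetric key with its private key.
server_key = pow(encrypted_key, D, N)

# Both sides now hold the same symmetric key and use it for the actual traffic.
ciphertext = keystream_xor(session_key, b"GET /account HTTP/1.1")
plaintext = keystream_xor(server_key, ciphertext)
print(plaintext)
```

In practice this work is delegated to a TLS library, for example Python's `ssl` module, rather than written by hand.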