Hypertext Transfer Protocol (HTTP) is a protocol used for communication on the World Wide Web.
1. The birth of the HTTP protocol
Computer scientists came up with the idea of creating a hypertext-based project that would allow information to be shared between different computers. The goal was to make it easier for researchers to share and update information. This idea eventually became the basis for the World Wide Web (WWW), revolutionizing the way human society communicates.
2. HTTP basic information
Hypertext Transfer Protocol (HTTP) is an application-layer protocol and the core of the World Wide Web ecosystem. It sits at the top of the OSI seven-layer model and is not concerned with how packets are transmitted; it mainly defines the message format used between client and server. The default port is 80.
(1) Features
1. B/S architecture
The HTTP protocol uses the B/S architecture, that is, the browser/server architecture. The client sends HTTP requests to the server through the browser, and the server parses each request and sends back a response.
2. Simple and fast
When a client requests a service from a server, it only needs to send the request method and path. The commonly used request methods are GET, HEAD, and POST; each specifies a different kind of interaction between the client and the server. Because the protocol is simple, an HTTP server program can be small, and communication is fast.
3. Flexible
HTTP allows data objects of any type to be transferred. The type of the data being transferred is indicated by Content-Type, the header field in an HTTP message that identifies the content type.
4. Connectionless
Connectionless means that each connection handles only one request. The server closes the connection after it has processed the client’s request and the client has acknowledged the response. This saves transmission time.
5. Stateless
HTTP is a stateless protocol: it keeps no memory of earlier transactions. If later processing needs information from an earlier request, that information must be sent again, which can increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need the earlier information.
(2) HTTP transactions
An HTTP transaction consists of a request sent from the client to the server and a response sent back from the server to the client. This communication takes place through formatted blocks of data called HTTP messages.
The HTTP message
HTTP messages are made up of simple lines of text. In HTTP/1.x they are plain text, not binary, so people can easily read and write them.
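As a rough illustration of this text format, the sketch below builds a minimal HTTP/1.1 request by hand and splits a sample response into its status line, headers, and body. The host `example.com`, the path, the sample response, and the helper names `build_request` and `parse_response` are placeholders invented for this example.

```python
def build_request(method, path, host):
    """Return a minimal HTTP/1.1 request as bytes (lines end with CRLF)."""
    lines = [
        f"{method} {path} HTTP/1.1",   # request line: method, path, version
        f"Host: {host}",               # Host is mandatory in HTTP/1.1
        "Connection: close",
        "",                            # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP/1.x response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


request = build_request("GET", "/index.html", "example.com")
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n"
          b"hello")
status, headers, body = parse_response(sample)
print(request.decode("ascii"))
print(status, headers["content-type"], body)
```

The parser is deliberately naive (no chunked encoding, no repeated header fields); it is only meant to show that the whole message can be read as lines of text.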
Development of HTTP
1. HTTP/0.9, the earliest version
The 1991 prototype version of HTTP is known as HTTP/0.9. It has a number of serious design flaws and should only be used to interact with legacy clients. HTTP/0.9 supports only the GET method. It does not support MIME types for multimedia content, HTTP headers, or version numbers. It was originally defined to fetch simple HTML objects (it could not return other formats) and was soon superseded by HTTP/1.0.
2. HTTP/1.0
HTTP/1.0 added the POST and HEAD methods, enriching the ways browsers and servers can interact. This version can send content in any format, including text, images, video, and files. The message format changed as well: in addition to the data section, every request and response must include headers (HTTP headers) that carry metadata. Status codes, multiple character sets, multipart types, authorization, caching, content encoding, and more were also added.
Disadvantages: each TCP connection can carry only one request. Once the data has been sent the connection is closed, and requesting another resource requires a new connection. With many requests, the repeated TCP setup and teardown (three-way handshake and four-way close) costs the server significant resources.
3. HTTP/1.1
The biggest change in HTTP/1.1 is the addition of persistent connections to the HTTP standard, meaning that TCP connections are not closed by default and can be reused by multiple requests. In addition, HTTP/1.1 standardized more methods, such as PUT, DELETE, and OPTIONS (HEAD already existed in HTTP/1.0, and PATCH was added later as an extension). The improved HTTP/1.1 version is still in use today.
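To see connection reuse in action, here is a minimal sketch using only Python's standard library. It starts a throwaway server on the loopback interface and sends two requests over one `http.client.HTTPConnection`; the handler reports the client's TCP port, so equal ports mean both requests travelled over the same TCP connection. The names `EchoHandler` and `fetch_twice` and the paths `/a` and `/b` are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"      # lets http.server keep the connection open

    def do_GET(self):
        # Report which client-side TCP port this request arrived from.
        body = f"{self.path} port {self.client_address[1]}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # silence per-request logging
        pass


def fetch_twice():
    """Send two GET requests on one connection; return the client port seen by each."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    ports = []
    try:
        for path in ("/a", "/b"):
            conn.request("GET", path)
            text = conn.getresponse().read().decode()
            ports.append(text.rsplit(" ", 1)[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


if __name__ == "__main__":
    print(fetch_twice())   # two identical port numbers: the connection was reused
```

Under HTTP/1.0 behaviour, each request would have needed its own connection and the two ports would normally differ.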
4. HTTPS
HTTP was created mainly to solve the problem of transferring and sharing information, and security was not a consideration. The protocol has no mechanisms for data encryption or identity verification, so data sent over HTTP crosses the network in clear text. Any third party at a node along the path can hijack the traffic, tamper with the data, or steal information. HTTP therefore cannot guarantee confidentiality, integrity, or authenticity, and it no longer meets the security requirements of modern Internet applications.
SSL/TLS: Secure Sockets Layer (SSL), or its successor Transport Layer Security (TLS), is a security protocol that sits between TCP and HTTP. Application-layer data is handed to the SSL/TLS layer instead of directly to the transport layer. The SSL/TLS layer encrypts the data it receives from the application layer and uses data encryption, identity authentication, and message integrity checks to protect data in transit.
5. HTTP/2
Features: HTTP/2 greatly improves web performance without changing HTTP semantics, methods, status codes, URIs, or header fields.
1. Binary transmission
HTTP/1.x transmits text; HTTP/2 transmits binary frames, which are convenient to parse and robust to implement.
2. Multiplexing frames and streams
Multiplexing means that one TCP connection carries multiple streams, so multiple requests can be in flight at the same time, and the peer can tell which request a frame belongs to from the stream identifier in the frame.
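The sketch below shows how binary framing and multiplexing fit together. It packs payloads behind the 9-byte HTTP/2 frame header (24-bit length, 8-bit type, 8-bit flags, one reserved bit plus a 31-bit stream identifier) and then regroups interleaved DATA frames by stream. The helper names and payloads are invented for illustration, and a real HTTP/2 exchange also involves HEADERS, SETTINGS, and other frames that are left out here.

```python
import struct
from collections import defaultdict

FRAME_DATA = 0x0   # type code of a DATA frame in HTTP/2


def pack_frame(frame_type, flags, stream_id, payload):
    """Prefix payload with the 9-byte HTTP/2 frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,   # 24-bit payload length, split as 8 + 16 bits
        frame_type,
        flags,
        stream_id & 0x7FFFFFFF,          # the top bit is reserved
    )
    return header + payload


def iter_frames(data):
    """Yield (frame_type, flags, stream_id, payload) for each frame in data."""
    pos = 0
    while pos + 9 <= len(data):
        hi, lo, frame_type, flags, stream_id = struct.unpack_from(">BHBBI", data, pos)
        length = (hi << 16) | lo
        payload = data[pos + 9: pos + 9 + length]
        yield frame_type, flags, stream_id & 0x7FFFFFFF, payload
        pos += 9 + length


def demultiplex(data):
    """Group DATA payloads by stream ID, as a receiver would."""
    streams = defaultdict(bytes)
    for frame_type, _flags, stream_id, payload in iter_frames(data):
        if frame_type == FRAME_DATA:
            streams[stream_id] += payload
    return dict(streams)


# Frames of stream 1 and stream 3 interleaved on one connection.
wire = b"".join([
    pack_frame(FRAME_DATA, 0, 1, b"<html>"),
    pack_frame(FRAME_DATA, 0, 3, b"body{"),
    pack_frame(FRAME_DATA, 0, 1, b"</html>"),
    pack_frame(FRAME_DATA, 0, 3, b"}"),
])
print(demultiplex(wire))
```

Because every frame names its stream, the receiver can reassemble each response independently of the order in which frames from different streams arrive.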
3. Header compression
In HTTP/1.x, headers are transmitted as text, and a header that carries cookies can mean hundreds to thousands of bytes repeated on every request, which is a considerable overhead. HTTP/2 encodes headers with HPACK (the HTTP/2 header compression format) to reduce their size. Both ends maintain an index table that records the headers already seen. Once a header has been recorded, later messages can send just its index, and the peer looks up the corresponding name and value in its own table.
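The index-table idea can be sketched in a few lines. This is a toy model only, not the real HPACK wire format (HPACK also has a predefined static table, Huffman coding, and table size limits); the class name `HeaderTable` and the sample headers are invented for the example.

```python
class HeaderTable:
    """Toy header index table. NOT the real HPACK encoding."""

    def __init__(self):
        self._entries = []   # index -> (name, value)
        self._index = {}     # (name, value) -> index

    def _remember(self, pair):
        self._index[pair] = len(self._entries)
        self._entries.append(pair)

    def encode(self, headers):
        """Replace headers already seen with their index."""
        out = []
        for pair in headers:
            if pair in self._index:
                out.append(self._index[pair])   # send only a small integer
            else:
                self._remember(pair)
                out.append(pair)                # first time: send it literally
        return out

    def decode(self, encoded):
        """Rebuild the header list, updating this side's table as it goes."""
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self._entries[item])
            else:
                self._remember(item)
                headers.append(item)
        return headers


sender, receiver = HeaderTable(), HeaderTable()
request_headers = [(":method", "GET"), ("cookie", "session=abc123")]

first = sender.encode(request_headers)    # literals: nothing is in the table yet
second = sender.encode(request_headers)   # indexes only
print(first)
print(second)
print(receiver.decode(first) == receiver.decode(second) == request_headers)
```

The second request carries two small integers instead of the full cookie text, which is where the saving comes from.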
4. Server Push
In HTTP/2, a server can proactively push additional resources after receiving a request from a client, which reduces latency. Prefetch can also be used if the browser supports it.
5. More secure
HTTP/2 uses the TLS extension ALPN to negotiate the protocol upgrade. HTTP/2 also raises the bar for TLS: when it runs over TLS, it requires TLS 1.2 or later.
Question to consider: how is HTTP/2 enabled on a connection? (ALPN)
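As a hint for that question, here is a small sketch of ALPN from the client side using Python's `ssl` module. The client lists the protocols it is willing to speak in its TLS handshake and the server picks one. `make_client_context` and `select_protocol` are hypothetical helper names, and `select_protocol` is a simplified model of the server's choice, not code taken from any real server.

```python
import ssl


def make_client_context():
    """TLS client context that offers HTTP/2 first, then HTTP/1.1, via ALPN."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(["h2", "http/1.1"])   # registered ALPN protocol IDs
    return ctx


def select_protocol(server_preference, client_offer):
    """Simplified server-side choice: the first protocol the server prefers
    that the client also offered, or None if there is no overlap."""
    for proto in server_preference:
        if proto in client_offer:
            return proto
    return None


ctx = make_client_context()
print(select_protocol(["h2", "http/1.1"], ["h2", "http/1.1"]))   # both support HTTP/2
print(select_protocol(["h2", "http/1.1"], ["http/1.1"]))         # older client
```

After a real handshake, calling `selected_alpn_protocol()` on the wrapped socket reports which protocol was agreed, so the application knows whether to speak HTTP/2 or fall back to HTTP/1.1.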
6. HTTP/3 (QUIC protocol)
HTTP/2 introduced frames and streams: each TCP connection carries multiple bidirectional streams, each with a unique identifier and a priority, and each stream is made up of binary frames. The header of a frame identifies the stream it belongs to, so frames from different streams can be interleaved and then reassembled into complete messages at the receiving end. This removes blocking between requests at the HTTP layer and makes better use of the network.
But HTTP/2 still runs on TCP, which delivers data in strict order. When a packet is lost, TCP has to wait until that packet has been retransmitted before it can hand over any of the data behind it. So although HTTP/2 lets multiple streams share one TCP connection and logically transfer content in parallel, and the streams are unrelated to one another, their frames are still processed one after another: if an earlier stream 2 frame has not arrived, the stream 1 frames behind it are blocked as well.
Therefore the QUIC protocol, proposed by Google, switches from TCP to UDP. It has the following characteristics:
1. Connections over UDP
A TCP connection is identified by a four-tuple: source IP address, source port, destination IP address, and destination port. If any element changes, the TCP connection is broken and must be re-established, and the three-way handshake needed to establish it adds delay.
QUIC runs over UDP, which is connectionless, and identifies a connection by a random 64-bit ID instead of a four-tuple. Therefore, when the IP address or port number changes, the connection does not need to be re-established as long as the ID stays the same.
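The difference between the two ways of identifying a connection can be modelled with a dictionary lookup. This is a toy model, not a network implementation; the `Server` class, the addresses (taken from documentation-only IP ranges), and the port numbers are all invented for the example.

```python
import secrets


class Server:
    """Toy server that looks up session state by a configurable key."""

    def __init__(self, key_by):
        self.key_by = key_by   # "four_tuple" (TCP-like) or "connection_id" (QUIC-like)
        self.sessions = {}     # key -> number of packets seen

    def receive(self, four_tuple, connection_id):
        key = four_tuple if self.key_by == "four_tuple" else connection_id
        is_new = key not in self.sessions
        self.sessions[key] = self.sessions.get(key, 0) + 1
        return "new session" if is_new else "existing session"


conn_id = secrets.randbits(64)   # random 64-bit connection ID
# (source IP, source port, destination IP, destination port)
wifi = ("192.0.2.10", 50000, "203.0.113.5", 443)
cellular = ("198.51.100.7", 41000, "203.0.113.5", 443)   # client switched networks

for server in (Server("four_tuple"), Server("connection_id")):
    results = [server.receive(wifi, conn_id), server.receive(cellular, conn_id)]
    print(server.key_by, results)
```

With the four-tuple as the key, the packet from the new address looks like a brand-new session; with the connection ID as the key, it is matched to the existing one.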
2. Custom retransmission mechanism
QUIC packets carry a packet number that only ever increases: each number is used once, and a retransmission is sent under a new, larger number. Like TCP, QUIC is connection-oriented and carries data as streams, so the data sent in a stream also has an offset. The offset shows where the data belongs in the stream, so even when data arrives in a retransmitted packet with a different packet number, the receiver can still reassemble the stream using the offset.
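A minimal sketch of offset-based reassembly follows. It assumes each packet is a `(packet_number, offset, data)` tuple and that chunks never partially overlap, both of which are simplifications made for this example.

```python
def reassemble(packets):
    """Rebuild the contiguous prefix of a stream from (packet_number, offset, data)."""
    chunks = {}
    for _packet_number, offset, data in packets:
        chunks.setdefault(offset, data)   # a duplicate of the same offset adds nothing
    stream = bytearray()
    for offset in sorted(chunks):
        if offset != len(stream):         # gap: the bytes in between have not arrived
            break
        stream += chunks[offset]
    return bytes(stream)


received = [
    (1, 0, b"GET "),
    (3, 10, b" HTTP"),    # arrives before the data for offset 4
    (4, 4, b"/index"),    # packet 2 was lost; its data was resent as packet 4
]
print(reassemble(received))
```

The packet number tells the sender which transmission was acknowledged, while the offset alone decides where the bytes go, so the retransmitted data slots into place even though it arrived under a new packet number.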
3. Non-blocking multiplexing
As with HTTP/2, multiple streams can be created on the same QUIC connection to send multiple HTTP requests. However, QUIC is UDP-based, and there are no dependencies between the streams on a connection. Thus, if stream 2 loses a UDP packet, stream 3’s packets can be delivered to the application without waiting, even though stream 2’s packet has to be retransmitted.
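The effect of a single lost packet under the two delivery models can be compared with a small simulation. Packets are modelled as `(sequence, stream_id, data)` tuples; the function names and the sample data are invented for the example, and retransmission itself is not simulated.

```python
def deliver_in_order(packets, lost):
    """TCP-like: one ordered byte stream, so nothing behind a lost packet is delivered."""
    delivered = []
    for seq, stream_id, data in packets:
        if seq in lost:
            break
        delivered.append((stream_id, data))
    return delivered


def deliver_per_stream(packets, lost):
    """QUIC-like: a loss stalls only the stream the lost packet belongs to."""
    stalled = set()
    delivered = []
    for seq, stream_id, data in packets:
        if seq in lost:
            stalled.add(stream_id)
        elif stream_id not in stalled:
            delivered.append((stream_id, data))
    return delivered


packets = [(1, 1, "a1"), (2, 2, "b1"), (3, 3, "c1"), (4, 2, "b2"), (5, 3, "c2")]
lost = {2}   # the first packet of stream 2 is lost

print(deliver_in_order(packets, lost))
print(deliver_per_stream(packets, lost))
```

In the first model everything after the lost packet waits, including stream 3's data; in the second, only stream 2 has to wait for the retransmission.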
4. Custom flow control
TCP implements flow control with a sliding window. QUIC’s flow control also uses WINDOW_UPDATE to tell the peer how many bytes it can accept. But QUIC’s windows are adapted to its own multiplexing mechanism: there is a window not only for the connection as a whole but also for each stream within the connection. QUIC’s ACKs are based on offsets: once the data at a given offset has arrived in the receive buffer it can be acknowledged, and acknowledged data is not resent. ... is the size of the real window.
Is HTTPS secure?
The basic idea of the SSL/TLS protocol is to use public-key cryptography, also known as asymmetric cryptography. The client first requests the public key from the server and then encrypts information with it. After the server receives the ciphertext, it decrypts it with its own private key.
But there are two issues involved:
- How do I ensure that the public key is not tampered with?
- Public-key encryption and decryption are slow (computationally expensive)
Solutions:
- The public key is placed in a digital certificate; as long as the certificate is trusted, the public key is trusted
- For each session, the client and server generate a session key, which is used to encrypt the traffic. Because the session key is used with symmetric encryption, the operation is very fast; the server’s public key is used only to encrypt the session key itself, which reduces the time spent on encryption.
Therefore, the basic process of SSL/TLS protocol is as follows:
- The client requests the server’s public key and validates it (with the help of the certificate authority)
- Both parties negotiate a session key
- Both parties use the session key to encrypt their communication
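The three steps above can be sketched with toy cryptography. Everything here is for illustration only: the RSA key is built from two small, publicly known Mersenne primes, and the symmetric cipher is a home-made XOR keystream, so none of it is secure, and real TLS negotiates and derives its keys differently in detail. The names `keystream_xor`, `wrapped`, and `unwrapped` are invented for the example.

```python
import hashlib
import secrets

# Toy RSA key pair built from two known Mersenne primes. Far too small to be secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def keystream_xor(key, data):
    """Toy symmetric cipher: XOR data with a keystream derived from SHA-256.
    Applying it twice with the same key returns the original data."""
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + start.to_bytes(8, "big")).digest()
        out += bytes(a ^ b for a, b in zip(data[start:start + 32], pad))
    return bytes(out)


# Step 1: assume the client already holds the server's validated public key (N, E).

# Step 2: the client picks a session key and sends it encrypted with the public key;
# the server recovers it with its private key.
session_key = secrets.token_bytes(16)
wrapped = pow(int.from_bytes(session_key, "big"), E, N)
unwrapped = pow(wrapped, D, N).to_bytes(16, "big")

# Step 3: both sides use the fast symmetric cipher for the actual traffic.
ciphertext = keystream_xor(session_key, b"GET /account HTTP/1.1")
plaintext = keystream_xor(unwrapped, ciphertext)
print(plaintext)
```

The slow public-key operation runs once, on 16 bytes; everything after that uses the symmetric key, which is the trade-off the second solution above describes.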
Questions to consider:
1. Why is symmetric encryption used for data transmission?
2. Why do I need a CA to issue a certificate?
HTTPS mainly addresses the security of data in transit. Suppose there were no certificate authority and anyone could create a certificate. This would open the door to the classic “man-in-the-middle” attack.
Without certificate verification, a client that initiates an HTTPS request has no way of knowing that its traffic has been intercepted, and the transmitted content can be stolen by the middleman.
3. How does the browser ensure the validity of the CA certificate?
(1) What information does the certificate contain?
The certificate contains the following information:
- Information about the issuing authority
- The public key
- The certificate holder’s organization information
- The domain name
- The period of validity
- The fingerprint
- ...
If a certificate is issued by a trusted authority, we consider it legitimate.
(2) How does the browser verify the validity of the certificate?
- Verify that the domain name and validity period are correct. Certificates contain this information, making it easier to verify.
- Determine whether the certificate source is valid. Following the certificate chain, each certificate can be traced back to its root certificate. The operating system and the browser store root certificates locally and use them to verify the source of certificates issued by the corresponding authority.
- Determine whether the certificate has been tampered with. This is verified with the CA (by checking the signature).
- Determine whether the certificate has been revoked. This is implemented through Certificate Revocation List (CRL) and Online Certificate Status Protocol (OCSP).
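Browsers are not the only clients that perform these checks. As a small illustration, the sketch below inspects the default TLS client context created by Python's `ssl` module, which loads the system's trusted root certificates and enforces chain and host name verification. The helper name `default_tls_checks` is made up for this example, and nothing is sent over the network.

```python
import ssl


def default_tls_checks():
    """Report which certificate checks Python's default client context enforces."""
    ctx = ssl.create_default_context()   # also loads the system's trusted root CAs
    return {
        # The server must present a certificate that chains to a trusted root.
        "certificate_required": ctx.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must match the host name the client asked for.
        "hostname_checked": ctx.check_hostname,
    }


print(default_tls_checks())
```

A connection made with this context fails during the handshake if the certificate is expired, issued for a different domain, or not traceable to a locally stored root certificate.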
OCSP can be used in step 3 to reduce interaction with the CA server and improve verification efficiency. Each certificate has an issuer (the issuer of a root certificate is the root certificate authority itself). The issuer uses its own private key to generate a unique signature over the certificate with a specific algorithm, and the signature is part of the certificate. Anyone who has the issuer’s public key can therefore recompute the certificate’s digest with the same algorithm and check it against the signature to tell whether the certificate was forged by a third party. Although the middleman can obtain the certificate, it cannot obtain the private key, so it cannot produce a valid signature.
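The signature check can be sketched with the same kind of toy RSA arithmetic. This is only an illustration under loud assumptions: the key is built from two small, publicly known Mersenne primes and is not secure, the "certificate" is a made-up byte string rather than a real X.509 structure, and real signature schemes add padding that is omitted here.

```python
import hashlib

# Toy issuer key pair. Far too small to be secure.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                  # issuer's public key
D = pow(E, -1, (P - 1) * (Q - 1))    # issuer's private key (needs Python 3.8+)


def digest(cert_body):
    """Hash the certificate contents and reduce the hash into the toy key's range."""
    return int.from_bytes(hashlib.sha256(cert_body).digest(), "big") % N


def sign(cert_body):
    """Issuer side: requires the private key D."""
    return pow(digest(cert_body), D, N)


def verify(cert_body, signature):
    """Client side: needs only the issuer's public key (N, E)."""
    return pow(signature, E, N) == digest(cert_body)


cert = b"CN=example.com;valid-until=2030-01-01"
signature = sign(cert)

print(verify(cert, signature))                                    # untouched certificate
print(verify(cert.replace(b"example.com", b"evil.example"), signature))   # altered copy
```

Changing any byte of the certificate changes its digest, and producing a matching signature for the new digest would require the private key, which the middleman does not have.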