Introduction: I have been working on the front end for a year, and have been too busy to organize my knowledge into a system. This series of articles is a review and summary of front-end fundamentals.
First, prerequisite knowledge
As we know, HTTP is based on the TCP/IP protocol suite. HTTP is a subset of TCP/IP, which is the collective name for the family of protocols associated with Internet communication. Understanding the TCP/IP protocol family helps us understand HTTP better.
1. Hierarchical management
The TCP/IP protocol family is divided into:
- Application layer: provides application services to users (HTTP, FTP, DNS, and so on) and defines the activities of that communication.
- Transport layer: provides data transfer between the two computers in a network connection (TCP, UDP).
- Network layer: handles the packets that flow across the network, choosing the routes by which they reach the other side (IP).
- Data link layer: handles the hardware side of connecting to the network (network adapters, device drivers, and so on).
2. The TCP/IP transmission flow
(Image borrowed from Illustrated HTTP.)
For example, a client issues an HTTP request at the application layer. The transport layer (TCP) segments the data and adds sequence numbers, the network layer adds the addressing information needed to route the packets to the target server, and finally the data link layer puts the request on the wire. The server receives the data at its link layer and passes it up layer by layer; only when the data has been handed all the way up to the application layer has the client's HTTP request truly been received.
3. The URL and URI
- URI: Uniform Resource Identifier
- URL: Uniform Resource Locator
A URI is a string that identifies an Internet resource by name. Its most common form is the Uniform Resource Locator (URL), the familiar web address. A rarer form is the Uniform Resource Name (URN), which identifies a resource by name within a particular namespace and complements URLs. In other words, URLs and URNs are both subsets of URIs: the URI is the abstract concept, and the URL is its most common concrete representation.
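As a quick illustration, the WHATWG URL API (available in both browsers and Node.js) splits a URL into the parts that together identify the resource; the address below is made up:

```typescript
// A minimal sketch using the standard URL class to decompose a made-up URL.
const url = new URL("https://www.example.com:8080/docs/http?page=2#intro");

console.log(url.protocol);                  // "https:"  — the scheme
console.log(url.host);                      // "www.example.com:8080"
console.log(url.pathname);                  // "/docs/http"
console.log(url.searchParams.get("page"));  // "2"
console.log(url.hash);                      // "#intro"
```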
4. Concepts of proxy, gateway, and tunnel
A. Proxy
Introduction:
A proxy is a forwarding application that acts as a “middleman” between the client and the server: it receives requests from the client and forwards them to the server, then receives the responses from the server and forwards them back to the client. Each time a request or response passes through a proxy server, an entry is appended to the Via header.
Types of proxies:
- Transparent proxy: forwards the request without modifying it; a proxy that does modify the message is a non-transparent proxy.
- Caching proxy: caches the requested resource on the proxy server so that later requests can be answered from the cache.
Proxy deployment modes (a minimal reverse-proxy sketch follows this list):
- Forward proxy: put simply, a forward proxy is transparent to the server; it proxies requests from different users to a server, and the server does not know which user a request really came from.
- Reverse proxy: a reverse proxy is transparent to users; it forwards each user request to one of the servers in a cluster, and the user does not know which server actually handled it.
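Here is a minimal reverse-proxy sketch using Node's built-in http module, assuming a single backend at 127.0.0.1:8081 (both addresses are made up). It forwards every request to that backend and appends a Via entry, so the user never sees which server actually handled the request:

```typescript
// Minimal reverse-proxy sketch: listen on 8080, forward everything to one backend.
import http from "node:http";

const UPSTREAM = { host: "127.0.0.1", port: 8081 }; // assumed backend address

http.createServer((clientReq, clientRes) => {
  const proxyReq = http.request(
    {
      ...UPSTREAM,
      path: clientReq.url,
      method: clientReq.method,
      headers: { ...clientReq.headers, via: "1.1 my-reverse-proxy" }, // appended Via entry
    },
    (upstreamRes) => {
      // Relay the upstream status, headers, and body back to the client.
      clientRes.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
      upstreamRes.pipe(clientRes);
    }
  );
  clientReq.pipe(proxyReq); // forward the request body, if any
}).listen(8080);
```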
Benefits of using proxies:
- Caching can reduce network bandwidth consumption.
- Access to specific websites can be controlled inside an organization, and access logs can be collected.
B. Gateway
A gateway works much like a proxy, but it can use non-HTTP protocols to talk to the servers and databases behind it. For example, after receiving an HTTP request from a client, a gateway can connect directly to a database and query it with SQL.
C. Tunnel
A tunnel's job is to let the client communicate securely with the server without parsing the HTTP requests: it forwards them as-is, and the tunnel is torn down when the communication ends. Tunnels are typically used to establish secure communication with a distant server.
Second, HTTP protocol
One. What is HTTP?
1. Introduction
HTTP (HyperText Transfer Protocol) is an application-layer communication protocol that allows hypertext documents, such as HTML, to be sent from a web server to a client's browser.
2. Packet structure
An HTTP request packet consists of a request line, request headers, a blank line, and an optional request body; a response packet has the same shape, starting with a status line. (A raw example written over a TCP socket follows the outline below.)
Request:
- Request line: contains the request method, the requested resource, and the HTTP version
- Request headers: cache headers, client headers, cookie/login headers, entity headers, miscellaneous headers, transport headers
- Blank line
- Request body

Response:
- Status line: contains the HTTP version, the status code, and the reason phrase
- Response headers: cache headers, cookie/login headers, entity headers, miscellaneous headers, transport headers, location header
- Blank line
- Response body
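To make the structure concrete, the sketch below writes an HTTP/1.1 request by hand over a raw TCP socket (Node.js; the host and path are only illustrative): a request line, header lines, and a blank line. The response printed back has the same shape: status line, headers, blank line, body.

```typescript
// Hand-written HTTP/1.1 request over a raw TCP socket.
import net from "node:net";

const socket = net.connect(80, "example.com", () => {
  socket.write(
    "GET / HTTP/1.1\r\n" +       // request line: method, resource, version
    "Host: example.com\r\n" +    // request headers
    "Connection: close\r\n" +
    "\r\n"                       // blank line ends the header section (no body here)
  );
});

// The response arrives as: status line, headers, blank line, body.
socket.on("data", (chunk) => process.stdout.write(chunk));
```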
3. Request process
A complete HTTP request is as follows:
- The user enters the URL in the browser
- Domain name resolution (DNS addressing)
- TCP three-way handshake
- After the handshake succeeds, a TCP channel is established and an HTTP request is sent
- The server responds to the HTTP request and returns the corresponding response packet
- The client parses the response and renders the page
4. Status codes
- 1XX Informational: the request is being processed
- 2XX Success: the request was received and processed successfully
- 3XX Redirection: additional action is required to complete the request
- 4XX Client error: the request is invalid or cannot be fulfilled, so the server cannot process it
- 5XX Server error: the server failed while processing the request
Common status codes:
- 200 OK
- 301 Moved Permanently: permanent redirection
- 302 Found: temporary redirection
- 304 Not Modified: the cached copy can be used
- 400 Bad Request: the client request contains a syntax error
- 403 Forbidden: the server refuses to serve the request
- 404 Not Found: the requested resource does not exist
- 500 Internal Server Error: an unexpected error occurred on the server
- 503 Service Unavailable: the server cannot handle the request right now and may recover after a while
5. Header fields
Header fields generally fall into four categories (a conditional-request example using a few of them follows the list):
- General header fields (Connection, Via, Cache-Control, Date, etc.)
- Request header fields (Accept-Charset, Accept-Encoding, If-Modified-Since, Referer, User-Agent, etc.)
- Response header fields (Age, ETag, Server, Location, etc.)
- Entity header fields (Allow, Content-Type, Expires, Last-Modified, etc.)
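As one small example of these fields in action, the sketch below sends a conditional GET carrying the If-None-Match request header; a 304 Not Modified response means the previously cached copy can be reused. The URL and the ETag value are assumptions for illustration.

```typescript
// Conditional GET: the request header field If-None-Match carries the ETag
// remembered from an earlier response. (Run inside an ES module or async function.)
const cachedETag = '"abc123"'; // made-up ETag from a previous response

const res = await fetch("https://example.com/data.json", {
  headers: { "If-None-Match": cachedETag },
});

if (res.status === 304) {
  console.log("Not modified, reuse the cached copy");
} else {
  // Entity header fields describe the body we just received.
  console.log("Fresh copy, Last-Modified:", res.headers.get("Last-Modified"));
}
```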
Two. Characteristics
HTTP was designed to be simple and flexible. Its main characteristics: it is simple, flexible, stateless, and connectionless, and it supports the browser/server (B/S) model.
1. Stateless HTTP
A. Introduction
- Stateless means the protocol itself keeps no state: HTTP does not remember the requests and responses it has already handled, so a later request knows nothing about earlier ones.
- Advantages: because no state has to be kept, CPU and memory consumption is lower, the protocol can process a large number of transactions quickly, and it scales better.
- The problem: as the Web grew, applications became much more complex, and this design became a serious obstacle for some of them. On a shopping site, for example, every page after login needs to keep the current login state rather than forcing the user to log in again on every request. Statelessness by itself cannot preserve that state, so cookie and session techniques were invented to solve the problem.
B. Cookies
- Concept: cookie technology keeps client-side state by writing cookie information into request and response packets.
- How it works: the server's response carries a Set-Cookie header field that tells the client to store the cookie. On every subsequent request to that server, the client automatically attaches the cookie value to the request packet. When the server finds the cookie sent by the client, it can tell which client the request came from and look up its own records to recover the earlier state. (See the sketch after this list.)
- Disadvantages: cookies are limited in number and size; each domain is limited to roughly 20 cookies, and each cookie to about 4 KB. From a security point of view, cookies are also relatively easy to steal.
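A minimal sketch of that Set-Cookie / Cookie round trip with Node's http module; the cookie name and value are made up:

```typescript
// First visit: the server sets a cookie. Later visits: the browser sends it back
// automatically, and the server can recognise the client.
import http from "node:http";

http.createServer((req, res) => {
  const cookies = req.headers.cookie ?? ""; // whatever the client sent back

  if (cookies.includes("sid=")) {
    res.end("Welcome back, your cookie identified you");
  } else {
    res.setHeader("Set-Cookie", "sid=user-42; HttpOnly; Max-Age=3600");
    res.end("Hello, a cookie has been set");
  }
}).listen(3000);
```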
C. Sessions
- Concept: a session is a data structure stored on the server to track user state. It can live in memory, a cluster, a database, or a file.
- How it works: sessions are kept by the server. After a login, the server writes the account information and other state into a session, and uses Set-Cookie in the response to give the client a session ID. On later requests the client sends the session ID back, the server looks up the matching session, and the state is preserved. This also means that if the client disables cookies, this session mechanism stops working as well. (A minimal in-memory sketch follows this list.)
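A minimal in-memory sketch of the idea (a real deployment would keep the sessions in Redis, a database, or files). The server stores the state; the client only carries an opaque session ID in its cookie:

```typescript
// Sessions live on the server; the cookie only carries the session ID.
import http from "node:http";
import { randomUUID } from "node:crypto";

const sessions = new Map<string, { user: string; loggedInAt: number }>();

http.createServer((req, res) => {
  const match = /sessionId=([\w-]+)/.exec(req.headers.cookie ?? "");
  const existing = match ? sessions.get(match[1]) : undefined;

  if (existing) {
    res.end(`Hello again, ${existing.user}`);       // state recovered on the server
  } else {
    const id = randomUUID();
    sessions.set(id, { user: "demo-user", loggedInAt: Date.now() });
    res.setHeader("Set-Cookie", `sessionId=${id}; HttpOnly`);
    res.end("Logged in, session created");
  }
}).listen(3000);
```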
D. Other authentication technologies – Tokens
- The general flow of token authentication: the client and server agree on a signing or encryption method in advance. The client takes some field (for example, a value from the cookie), transforms it with that method into an encoded string, and sends it along with the request as a parameter. On receiving the request, the server applies the same method to the same field and compares the result with the parameter it received; if they match, the identity is accepted. Because the method is agreed privately and the token is tied to the request, a stolen cookie alone is not enough, which improves security. (A small HMAC-based sketch of this idea follows.)
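One common way to realize the scheme described above is an HMAC signature. The sketch below is only an illustration of the idea; the field name and the shared secret are made up:

```typescript
// Both sides share a secret; the client signs a field, the server recomputes
// the signature with the same method and compares.
import { createHmac, timingSafeEqual } from "node:crypto";

const SHARED_SECRET = "agreed-upon-secret"; // made up for illustration

function sign(value: string): string {
  return createHmac("sha256", SHARED_SECRET).update(value).digest("hex");
}

// Client side: attach the token to the request parameters.
const userId = "42";
const token = sign(userId);

// Server side: recompute and compare (constant-time comparison).
function verify(value: string, received: string): boolean {
  const expected = Buffer.from(sign(value));
  const actual = Buffer.from(received);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

console.log(verify(userId, token)); // true → identity accepted
```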
2. Connectionless HTTP
A. Introduction:
Connectionless means the TCP connection is torn down after each request. Today every page needs more and more resources, and re-establishing a TCP connection for every single request adds a great deal of pointless communication overhead. Keep-Alive was therefore proposed so that a TCP connection can be reused.
B. Keep-Alive:
a. Introduction:
Keep-Alive is enabled with the header Connection: keep-alive, which marks the current connection as persistent; how long it stays open is controlled by the server, and the TCP connection is not torn down until a close signal is received. This avoids the wasted time of repeatedly setting up TCP connections. (A small client-side sketch follows.)
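On the client side, a Node.js sketch of opting into persistent connections looks roughly like this: an http.Agent created with keepAlive reuses the same TCP socket for consecutive requests instead of handshaking again each time (host and path are illustrative):

```typescript
// Reuse one TCP connection for several requests via a keep-alive agent.
import http from "node:http";

const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });

for (let i = 0; i < 3; i++) {
  http.get({ host: "example.com", path: "/", agent }, (res) => {
    res.resume(); // drain the body so the socket can be reused
    console.log(`request ${i}: HTTP ${res.statusCode}`);
  });
}
```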
b. The problem is not fully solved:
Keep-Alive helps a lot on the PC web, but in mobile apps requests are scattered and spread over long time spans, so setting a very large keep-alive timeout is unreasonable. Other long-connection (or pseudo-long-connection) schemes are therefore usually used instead; more on this later.
3. Remaining defects:
HTTP is an excellent protocol, but it still has some drawbacks:
- Data is transmitted in plaintext and is vulnerable to eavesdropping
- The identity of the communicating party is not verified, so it is possible to run into impersonation
- The integrity of the message cannot be proved, so it may be tampered with undetected
But programmers' ingenuity is boundless: if it is not secure, make it secure, and so HTTPS was born.
Third, HTTPS
1. What is HTTPS?
HTTPS is HTTP layered on top of SSL/TLS, and it adds encryption, authentication, and integrity protection to the communication. It is not a new protocol; rather, HTTP's communication interface is replaced by SSL/TLS.
2. How does HTTPS ensure communication security (principle)?
A. Principle
Let’s take a look at two commonly used encryption methods:
- Symmetric encryption: the same key is used for both encryption and decryption.
- Asymmetric encryption: encryption and decryption use two different keys (a public key and a private key).
HTTPS uses a hybrid encryption mechanism: asymmetric encryption is used to exchange the key, and the subsequent communication is encrypted symmetrically with that key. But how does the client make sure the public key it obtains in the first step is genuine? This is where digital certificates come in.
The certificate is issued by a trusted third party (a certificate authority). The server first applies to the CA and receives a public-key certificate that carries the server's public key together with the CA's digital signature. During the asymmetric key exchange, the server sends this certificate, which contains its public key, to the client. The client verifies the signature on the certificate with the CA's public key; once the verification passes, it knows the server's public key is correct.
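A toy sketch of the hybrid mechanism using Node's crypto module: the symmetric session key is exchanged under RSA (asymmetric), and the payload then travels under AES (symmetric). The algorithms and key sizes here are only illustrative; they are not what TLS actually negotiates:

```typescript
// Hybrid encryption in miniature: RSA wraps the session key, AES carries the data.
import {
  generateKeyPairSync, publicEncrypt, privateDecrypt,
  randomBytes, createCipheriv, createDecipheriv,
} from "node:crypto";

// Server's key pair; the public key is what the certificate would carry.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// Client: create a random symmetric key and send it encrypted with the public key.
const sessionKey = randomBytes(32);
const wrappedKey = publicEncrypt(publicKey, sessionKey);

// Server: unwrap the session key with its private key.
const unwrappedKey = privateDecrypt(privateKey, wrappedKey);

// Both sides now share a key; bulk data uses fast symmetric encryption.
const iv = randomBytes(16);
const cipher = createCipheriv("aes-256-cbc", unwrappedKey, iv);
const encrypted = Buffer.concat([cipher.update("hello over a secure channel", "utf8"), cipher.final()]);

const decipher = createDecipheriv("aes-256-cbc", sessionKey, iv);
const decrypted = Buffer.concat([decipher.update(encrypted), decipher.final()]);
console.log(decrypted.toString("utf8")); // "hello over a secure channel"
```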
B. Request process
Let's look at the process of establishing secure communication (a small sketch of opening such a channel follows the list):
- The client sends a Client Hello packet carrying the SSL version supported by the client and encryption-related conventions (such as the key length and encryption algorithm).
- The server responds to the Server Hello packet, indicating that SSL communication is enabled and encryption conventions are carried.
- The server sends a certificate message carrying the public key.
- The server sends Server Hello Done, indicating that the initial SSL negotiation is complete.
- The client sends a Client Key Exchange packet carrying a random string (the pre-master secret) encrypted with the server's public key obtained in the previous phase; this becomes the basis of the symmetric key.
- The client sends a Change Cipher Spec packet, indicating that from now on it will encrypt communication with the key derived from the pre-master secret.
- The client sends a Finished packet containing a checksum of all the handshake messages exchanged so far; the negotiation succeeds only if the server can decrypt and verify this packet correctly.
- The server likewise sends a Change Cipher Spec packet, switching to the negotiated key for secure communication.
- The server sends its own Finished packet; the handshake is complete and the secure channel is established.
- Application layer protocol communication, that is, sending HTTP responses.
- Finally, the client sends the close_notify packet to disconnect the connection.
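In practice, Node's tls module performs this whole handshake for you. The sketch below opens such a channel and prints what was negotiated (the host is illustrative):

```typescript
// Open a TLS connection and inspect the negotiated parameters.
import tls from "node:tls";

const socket = tls.connect({ host: "example.com", port: 443, servername: "example.com" }, () => {
  console.log("protocol:", socket.getProtocol());              // e.g. "TLSv1.3"
  console.log("cipher:  ", socket.getCipher().name);            // negotiated cipher suite
  console.log("issuer:  ", socket.getPeerCertificate().issuer); // who signed the certificate
  // From here on, plain HTTP could be written over the encrypted channel.
  socket.end();
});
```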
3. Defects of HTTPS
- SSL encryption results in slower speeds and higher server load.
- The cost of applying for a certificate
Fourth, HTTP2 protocol
1. Historical pain points
A. The two biggest problems of the HTTP1.0 era were:
- Connections cannot be reused: a TCP channel, including the three-way handshake, has to be re-established for every request.
- Head-of-line blocking: the request channel is like a single-log bridge; even if several requests are sent, the second one can only be handled after the first has returned.
B. Solutions at the time:
1. Solving connection reuse:
TCP-based long connections:
Apps often build their own long-connection communication protocol directly on top of TCP. The bar is high, but once it is done the payoff is large: information can be pushed and updated promptly, and at request hot spots it reduces server pressure compared with repeatedly re-establishing connections over traditional HTTP. Mature industry building blocks exist for this, such as Google's Protocol Buffers (Protobuf) for the message format.
HTTP long-polling:
With long-polling, the client issues a polling request when it initializes, and the server holds it open until there is updated data to return; as soon as data comes back, the client immediately issues another polling request and keeps listening. Polling has its drawbacks: connections held open for a long time increase server pressure, and complex business scenarios have to consider how to keep the request channel healthy. It also has a more fundamental flaw: the data flow is one-way, the initiative lies with the server, and the client can only passively receive data, so a new request from the client side cannot be delivered promptly. (A minimal client loop is sketched below.)
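A minimal client-side long-polling loop might look like this; the /poll endpoint and the response shape are assumptions for illustration:

```typescript
// Each request is held open by the server until there is news (or a timeout),
// then the client immediately polls again.
async function longPoll(): Promise<void> {
  while (true) {
    try {
      const res = await fetch("/poll", { cache: "no-store" });
      if (res.status === 200) {
        console.log("server pushed:", await res.json());
      }
      // Other statuses (e.g. 204 on timeout): nothing new, just poll again.
    } catch {
      await new Promise((r) => setTimeout(r, 3000)); // back off on network errors
    }
  }
}

longPoll();
```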
HTTP streaming:
Unlike long-polling, HTTP streaming opens a single uninterrupted request at initialization and keeps listening on it; whenever the server has new data, it writes the data back through this same response channel. Like long-polling, this approach is one-way. The server signals that more data is coming by responding with the header Transfer-Encoding: chunked. Streaming has its own drawbacks: the business data is not divided along request boundaries, so the client must parse the byte stream itself, which in practice means defining its own protocol. (A server-side sketch follows.)
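A sketch of the server side of HTTP streaming with Node's http module: the response is never ended, and each write goes out as its own chunk (Transfer-Encoding: chunked) for the client to parse as it arrives:

```typescript
// The response stays open; every write is one chunk pushed to the client.
import http from "node:http";

http.createServer((req, res) => {
  res.writeHead(200, { "Content-Type": "text/plain" }); // no Content-Length → chunked
  let n = 0;
  const timer = setInterval(() => {
    res.write(`update ${n++}\n`); // one chunk per update; the connection stays open
  }, 1000);
  req.on("close", () => clearInterval(timer)); // stop when the client goes away
}).listen(3000);
```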
WebSocket:
WebSocket is similar to a traditional TCP socket connection: it is also built on TCP and provides a bidirectional data channel. Its advantage is that it adds the concept of messages, which is simpler to use than a raw byte-stream TCP socket, while also providing the long-connection capability that traditional HTTP lacks. WebSocket is typically used where data must be updated in real time. (A minimal client sketch follows.)
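A minimal browser-side WebSocket sketch; the URL and the message shapes are made up:

```typescript
// One handshake upgrades the HTTP connection; after that, messages flow both ways.
const ws = new WebSocket("wss://example.com/live");

ws.onopen = () => ws.send(JSON.stringify({ type: "subscribe", topic: "prices" }));
ws.onmessage = (event) => console.log("server says:", event.data);
ws.onclose = () => console.log("connection closed");
```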
2. Solving head-of-line blocking:
HTTP pipelining
Pipelining requires a persistent connection: multiple requests reuse the same keep-alive TCP connection and are sent without waiting for the previous response, so requests that used to be serial become parallel on the sending side. But pipelining is not the savior; it has its own flaws:
- It only works with HTTP1.1 and requires server-side support
- Head-of-line blocking is not fundamentally solved, because the server responds on a first-in, first-out basis: it can only answer the second request after the first response has been fully sent.
2. New changes (HTTP2)
HTTP1.0 and 1.1 are so widely deployed that HTTP2 has to solve these problems without changing how the protocol is used, which means HTTP2 cannot make a clean break with the past the way Angular 2 did. So HTTP2 is used exactly the same way as before, yet underneath it changes quite a lot. The points that affect us most are these:
A. Binary framing
HTTP2.0 parses messages in a binary format, which is simpler to implement and more robust. Every frame carries these common fields: Length, Type, Flags, Stream Identifier, and Frame Payload. Length marks where the frame ends, Type defines the kind of frame, Flags uses individual bits to carry important parameters, the Stream Identifier associates the frame with a stream (and is used for flow control), and the rest, the payload, is the body of the request. (A tiny header decoder is sketched below.)
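As a small illustration, the fixed 9-byte HTTP2 frame header can be decoded with a handful of reads; this is only a sketch, not a full frame parser:

```typescript
// Decode the 9-byte HTTP2 frame header: 24-bit length, 8-bit type, 8-bit flags,
// then a 31-bit stream identifier (the top bit is reserved).
function parseFrameHeader(buf: Buffer) {
  return {
    length: buf.readUIntBE(0, 3),                 // payload length in bytes
    type: buf.readUInt8(3),                       // e.g. 0x0 DATA, 0x1 HEADERS, 0x3 RST_STREAM
    flags: buf.readUInt8(4),                      // bit flags such as END_STREAM, END_HEADERS
    streamId: buf.readUInt32BE(5) & 0x7fffffff,   // which stream the frame belongs to
  };
}

// Example: a HEADERS frame header with a 16-byte payload, END_HEADERS flag, stream 1.
const header = Buffer.from([0x00, 0x00, 0x10, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01]);
console.log(parseFrameHeader(header));
// → { length: 16, type: 1, flags: 4, streamId: 1 }
```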
B. Multiplexing
Multiplexing is one of the big problems HTTP2.0 solves. Each request corresponds to a stream and is assigned an ID, so a single connection can carry many streams at once and frames from different streams can be freely interleaved. The receiver reassembles them, assigning each frame to the right request by its stream ID. (A small sketch with Node's http2 client follows.)
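A small sketch of multiplexing with Node's built-in http2 client: several requests share one session (one TCP connection), and each becomes its own stream with its own ID (the host and paths are illustrative):

```typescript
// Three requests, one connection: each session.request() opens a new stream.
import http2 from "node:http2";

const session = http2.connect("https://example.com");

for (const path of ["/", "/style.css", "/app.js"]) {
  const stream = session.request({ ":path": path }); // one stream per request
  stream.on("response", (headers) => console.log(path, "→", headers[":status"]));
  stream.resume();                                    // discard the body in this sketch
  stream.on("end", () => stream.close());
}
// session.close() once all streams are done (omitted for brevity).
```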
C. Header compression
Because HTTP is stateless, every request must carry the parameters the server needs, and much of this header information is essentially fixed from request to request. HTTP2 compresses this repeated information to shrink the packet size.
D. Resetting a request
One drawback of HTTP1.1 is that once an HTTP message with an exact Content-Length value is being sent, it is very hard to interrupt it. You can usually break the whole TCP connection (but not always), at the cost of re-establishing a new one with another three-way handshake. A better solution is to terminate only the message currently being transmitted and send a new one. In HTTP2 this is done by sending an RST_STREAM frame, which avoids wasting bandwidth without breaking the existing connection.
E. Dependencies and priorities
Each stream carries a priority that tells the peer which streams matter more, so that resources can be allocated where they are most needed.
F. Server push
When a client requests resource X and the server knows that it probably also needs resource Z, the server can proactively push resource Z to the client before the client sends the request. This feature helps clients cache Z for future use.
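A hedged sketch of server push with Node's http2 module: when the client asks for the page, the server pushes a script it knows the page will reference. The certificate files, paths, and payloads are all made up for illustration:

```typescript
// When /index.html is requested, push /app.js before the client asks for it.
import http2 from "node:http2";
import fs from "node:fs";

const server = http2.createSecureServer({
  key: fs.readFileSync("server-key.pem"),   // assumed certificate files
  cert: fs.readFileSync("server-cert.pem"),
});

server.on("stream", (stream, headers) => {
  if (headers[":path"] === "/index.html") {
    stream.pushStream({ ":path": "/app.js" }, (err, pushStream) => {
      if (err) return;
      pushStream.respond({ ":status": 200, "content-type": "text/javascript" });
      pushStream.end("console.log('pushed by the server');");
    });
    stream.respond({ ":status": 200, "content-type": "text/html" });
    stream.end("<script src='/app.js'></script>");
  }
});

server.listen(8443);
```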
G. Flow control
Each HTTP2 stream has its own flow-control window (alongside a connection-level one) that limits how much data the other end may send.
Fifth, summary:
HTTP has gone through only a handful of versions, but each one brought significant changes, breakthroughs, and experiments. HTTP also has competitors: before HTTP2 was proposed, Google designed and implemented the SPDY protocol, whose strength was solving HTTP1.0's inability to multiplex and greatly speeding up resource requests; it is still in wide use, and HTTP2 borrows many of its features. Another example is the QUIC protocol, which claims to be even faster than HTTP2. That will be covered in the next article ~ ha! Mainly because this one is already a little long.