preface

If you are a mid-level or senior front-end engineer, there is a high probability that HTTP questions will come up in interviews, and we run into plenty of related questions in real projects too.

A lazy reader's guide:

If you just want quick answers to the most frequently asked questions below, skip the formal section (the HTTP overview, which draws heavily on *Illustrated HTTP* and other web sources) and go straight to the summary section.

Question:

If an interviewer asked you the following questions, how many could you answer? These are common interview questions, so give them a try:

  • What is a three-way handshake?

  • Why is there a three-way handshake when you connect and a four-way handshake when you close?

  • The difference between TCP and UDP?

  • What happens from the time you enter the URL to the time the page loads?

  • What do you know about HTTP response codes? What do they mean?

  • HTTP protocol workflow?

  • What are the existing problems with HTTP/1.0 and 1.1

  • HTTP is different from HTTPS

  • What is a long link and why is it needed?

  • Why does HTTP/2 channel multiplexing improve performance?

  • HTTP caching mechanism

  • What are the defenses against XSS and CSRF attacks?

  • How do you make efficient use of the cache while still shipping fresh front-end code?

    1. If the cache lifetime is too long, the client keeps using stale files after a release, which causes bugs

    2. If the cache lifetime is too short, files are re-downloaded too often, which wastes bandwidth

In fact, some of the questions above also come up when interviewers ask about front-end performance optimization (optimizing how files are fetched).

Formal section

In a Web application, the server sends the Web page to the browser: essentially, it sends the page's HTML to the browser for display. The transport protocol between browser and server is HTTP, a protocol for transferring hypertext (such as HTML) over a network so that browsers and servers can communicate. HTTP is an application-layer protocol built on the transport-layer protocol TCP. The client establishes a TCP connection with the server and uses TCP, via the socket interface, to send HTTP requests and receive HTTP responses. HTTP itself is connectionless: there are only requests and responses, both of which are data packets.

Network basics and the Web

The Web in brief

The Web is accessed through a client (a Web browser).

Web pages do not appear out of thin air. Based on the URL typed into its address bar, the Web browser fetches resources and other information from a Web server and renders the page. The Web uses HTTP (Hypertext Transfer Protocol) as the specification governing the whole sequence of operations between client and server; in other words, the Web is built on communication over HTTP.

Protocol: an agreement that specifies rules. For computers to communicate, they must agree on communication rules, and those rules are protocols: data encapsulation formats plus transmission procedures. There are many kinds of protocols.

Client: the side that sends a request for a resource, such as a Web browser, is called the client.

TCP/IP network basics

To understand HTTP, we must first understand the TCP/IP protocol family, which is the foundation on which commonly used networks, including the Internet, operate. HTTP is one member of that family.

TCP/IP

The protocols associated with the Internet are collectively called TCP/IP.

In a narrow sense, TCP/IP refers to the TCP and IP protocols specifically.

In a broader sense, TCP/IP is the collective name for the whole family of protocols used in IP-based communication.

Why stratification?

  • Decompose a complex process into several sub-processes with relatively single functions
  • The whole process is clearer and complex problems are simplified
  • It is easier to identify problems and address them

OSI seven-layer network model

| Layer | Function | Role |
| --- | --- | --- |
| Application layer | Interface between the network and the end user | Provides interface services between the network and user applications, hiding the details of network transport |
| Presentation layer | Data representation, security, compression | Provides data formatting and transformation services such as encryption and compression |
| Session layer | Establishes, manages, and terminates sessions | Provides mechanisms for establishing and maintaining communication between applications, including access authentication and session management |
| Transport layer | Defines protocol port numbers for transmitting data, plus flow control and error checking | 1. Establishes, maintains, and tears down transport connections and is responsible for reliable data transmission (host); 2. Provides reliable end-to-end service to users; 3. Shields the higher layers from the details of lower-level data communication |
| Network layer | Logical addressing and path selection between different networks | Handles routing between networks and timely delivery of data (router). The packet is the smallest unit of data transmitted over the network; this layer defines the path along which packets travel to the other computer |
| Data link layer | Establishes logical connections, hardware (MAC) addressing, error checking | Handles the hardware portion of the connected network; responsible for error-free transmission, frame acknowledgment, and error retransmission (switch) |
| Physical layer | Establishes, maintains, and tears down physical connections | Defines how physical devices transfer data; provides mechanical, electrical, functional, and procedural characteristics (network card, network cable, twisted pair, coaxial cable, repeater) |

TCP/IP reference model

The network has a five-layer structure

  • TCP/IP is short for Transmission Control Protocol/Internet Protocol
  • The early TCP/IP model had a four-layer structure; from the bottom up: the network interface layer, the Internet layer, the transport layer, and the application layer
  • Later, by reference to the OSI seven-layer model, the network interface layer was split into the physical layer and the data link layer, forming a five-layer structure
| Layer | Protocols |
| --- | --- |
| Application layer | HTTP (Hypertext Transfer Protocol), FTP (File Transfer Protocol), TFTP (Trivial File Transfer Protocol), SMTP (Simple Mail Transfer Protocol), SNMP, DNS (Domain Name System)… |
| Transport layer | TCP (Transmission Control Protocol), UDP (User Datagram Protocol)… |
| Network layer | ICMP (Internet Control Message Protocol), IGMP (Internet Group Management Protocol), IP (Internet Protocol), which addresses and routes packets between hosts and networks; ARP, which obtains the MAC addresses of hosts on the same physical network |
| Data link layer | Protocols defined by the underlying network |
| Physical layer | Protocols defined by the underlying network |

Packet encapsulation

How is upper-layer protocol data converted to lower-layer protocol data?

This is done through encapsulation. Application data is passed down the protocol stack before being sent onto the physical network. Each layer's protocol adds its own header (and, at the link layer, also a trailer) to the data handed down by the layer above, carrying the information that layer needs to do its job. On the sending side, data travels from the upper layers to the lower layers, and each layer prepends its own header. On the receiving side, data travels from the lower layers to the upper layers, and each layer strips its own header before passing the data up.
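The layered wrapping and unwrapping described above can be sketched as a toy model. The layer names and header strings here are purely illustrative, not real wire formats:

```python
# Toy sketch of protocol encapsulation: each layer wraps the payload
# with its own header on the way down, and strips it on the way up.
LAYERS = ["TCP", "IP", "Ethernet"]  # transport -> network -> link


def encapsulate(app_data: str) -> str:
    packet = app_data
    for layer in LAYERS:  # sender: top of the stack down
        packet = f"[{layer} hdr]{packet}"
    return packet


def decapsulate(packet: str) -> str:
    for layer in reversed(LAYERS):  # receiver: bottom of the stack up
        header = f"[{layer} hdr]"
        assert packet.startswith(header)
        packet = packet[len(header):]
    return packet


wire = encapsulate("GET /index.html")
print(wire)               # [Ethernet hdr][IP hdr][TCP hdr]GET /index.html
print(decapsulate(wire))  # GET /index.html
```

Note how the application data comes out unchanged on the other side: that is the transparency the next paragraph talks about.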

Why encapsulate it this way?

Because a lower-layer protocol's header is of no use to the upper-layer protocol, that header is removed when the lower layer hands data up, so the encapsulation process is completely transparent to the upper layer. The advantage is that the application layer only needs to care about implementing application services, not about how the layers below work.

Protocols closely related to HTTP: IP, TCP, and DNS

1. IP

By layer, IP belongs to the network layer. In the TCP/IP family, IP means the Internet Protocol; do not confuse the IP protocol with an IP address.

IP address: specifies the address to which a node is assigned.

MAC address: indicates the fixed address of the NIC.

An IP address can be paired with a MAC address. The IP address can be changed, but the MAC address is basically unchanged. The communication between IP addresses depends on MAC addresses.

ARP (the Address Resolution Protocol) is used to find a MAC address from an IP address: within the same LAN it resolves the other party's MAC address directly, and when the two sides are not on the same LAN, each hop resolves the MAC address of the next router along the way.

2. TCP

For the client and server to exchange information, a TCP connection must first be created. TCP (Transmission Control Protocol) sits at the transport layer. It provides a reliable byte-stream service: large chunks of data are divided into segments for easier transmission and packet management, and reliability means data is delivered to the other side accurately and dependably.

TCP slow start

TCP connections “tune” themselves over time, limiting the maximum speed of the connection at first and increasing the speed of the transfer over time if the data is successfully transferred. This tuning is called TCP slow start.
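The ramp-up described above can be illustrated with a minimal simulation. The numbers are illustrative: the congestion window (cwnd) doubles each round trip until it reaches a slow-start threshold, after which it grows linearly (congestion avoidance):

```python
# Minimal simulation of TCP slow start; units are segments, not bytes.
def slow_start(rounds: int, ssthresh: int = 16) -> list:
    cwnd, history = 1, []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2   # slow start: exponential growth per RTT
        else:
            cwnd += 1   # congestion avoidance: +1 per RTT
    return history


print(slow_start(8))  # [1, 2, 4, 8, 16, 17, 18, 19]
```

Real TCP also halves or resets the window on packet loss; this sketch shows only the growth phases.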

2.1 TCP Functions:

  • Connection establishment: the TCP three-way handshake that sets up a connection
  • Segmented transmission: governed mainly by the MTU (maximum transmission unit) of the network; data must be split into segments before transmission
  • Sequence numbering: after data is split into segments, sequence numbers keep the segments ordered and guarantee data consistency
  • Retransmission: segments lost in transit are retransmitted, and duplicates are discarded
  • Flow control: implemented with a sliding window
  • Congestion avoidance: congestion is controlled by combining the slow-start and congestion-avoidance algorithms

2.2 TCP status

Common TCP states are: CLOSED, LISTEN, SYN_SENT, SYN_RECV, ESTABLISHED, FIN_WAIT1, CLOSE_WAIT, FIN_WAIT2, LAST_ACK, and TIME_WAIT. TCP uses these states to mark the current phase of the communication.

2.3 TCP and UDP

  • TCP (Transmission Control Protocol)
    • Transmission control protocol
    • A reliable, connection-oriented protocol
    • Lower transmission efficiency
  • UDP (User Datagram Protocol)
    • User datagram protocol
    • An unreliable, connectionless service
    • Higher transmission efficiency
  • In short: TCP provides a connection-oriented, reliable byte-stream service, while UDP provides a connectionless, best-effort datagram service.
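The two services map directly onto the two socket types. A quick sketch with Python's standard `socket` module: a stream socket (`SOCK_STREAM`) is the TCP kind, a datagram socket (`SOCK_DGRAM`) is the UDP kind:

```python
import socket

# TCP uses a stream socket: connection-oriented, ordered, reliable byte stream.
# UDP uses a datagram socket: connectionless, each send is one independent packet.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

tcp_is_stream = tcp_sock.type == socket.SOCK_STREAM
udp_is_datagram = udp_sock.type == socket.SOCK_DGRAM
print(tcp_is_stream, udp_is_datagram)  # True True

tcp_sock.close()
udp_sock.close()
```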

2.4 UDP applications

  • QQ (instant messaging)
  • Video and streaming software
  • TFTP (Trivial File Transfer Protocol)

2.5 Three-way handshake, data transfer, four-way close

The handshake exists, among other reasons, to prevent the server from opening useless connections.

2.5.1 Three-way handshake

TCP uses the three-way handshake so that data can be delivered to the target without error.

Note:

  • TCP is a connection-oriented protocol that establishes a virtual rather than a physical connection between source and destination
  • Before data communication, the sender and receiver must establish a connection, and then disconnect the connection after data transmission
  • Each side of a TCP connection consists of an IP address and a port
  • HTTP has no connection, only request and response, which are packets

The specific process is as follows:

**First handshake:** The client sends a segment with the SYN flag set to the server. Through this segment the client tells the server that it wants to establish a connection and announces its initial sequence number. The client then enters the SYN_SENT state and waits for the server's confirmation.

**Second handshake:** After receiving the client's SYN segment, the server must acknowledge it with an ACK while also sending a SYN of its own: the ACK tells the client that its segment was received, and the SYN tells the client which sequence number the server will start from. The server puts both into a single segment (the SYN+ACK segment), setting the ACK number to the client's SEQ value plus 1, and enters the SYN_RECV state.

**Third handshake:** After receiving the server's SYN+ACK segment, the client sends an ACK segment back. Once it is sent, both the client and the server enter the ESTABLISHED state, completing the TCP three-way handshake.

After the three-way handshake completes, TCP maintains the connection state for both sides. To guarantee successful transmission, the receiver must send an ACK for every data packet it receives. If the sender does not receive the ACK within a set time (the retransmission timeout), it resends the data.
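The three exchanges above can be modeled as a small table of flags and sequence numbers. Real initial sequence numbers (ISNs) are randomized; the fixed values here are for illustration only:

```python
# Toy model of the three-way handshake, tracking only flags, seq, and ack.
def three_way_handshake(client_isn: int, server_isn: int):
    log = []
    # 1. client -> server: SYN, seq = client ISN
    log.append(("SYN", client_isn, None))
    # 2. server -> client: SYN+ACK, seq = server ISN, ack = client ISN + 1
    log.append(("SYN+ACK", server_isn, client_isn + 1))
    # 3. client -> server: ACK, seq = client ISN + 1, ack = server ISN + 1
    log.append(("ACK", client_isn + 1, server_isn + 1))
    return log


for flags, seq, ack in three_way_handshake(1000, 5000):
    print(flags, seq, ack)
```

The "+1" on each ACK is the key detail: an ACK number names the next byte (or, for SYN, the next sequence number) the sender expects to receive.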

2.5.2 Data transmission

  • The client first sends the server a datagram carrying 159 bytes of data.
  • After receiving it, the server acknowledges it (ACK) and returns the data the client requested: the response carries 111 bytes, with SEQ set to 1 and ACK set to 160 (1 + 159).
  • On receiving the data returned by the server, the client acknowledges it, setting SEQ to 160 and ACK to 112 (1 + 111).
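The arithmetic in the example above follows one rule: the ACK number equals the peer's sequence number plus the number of bytes received, i.e. the next byte expected. A two-line check, using the 159-byte request / 111-byte response numbers from the example:

```python
# The ACK number acknowledges the next byte expected.
def next_ack(peer_seq: int, bytes_received: int) -> int:
    return peer_seq + bytes_received


# server acknowledges the client's 159-byte request sent with seq=1
print(next_ack(1, 159))  # 160
# client acknowledges the server's 111-byte response sent with seq=1
print(next_ack(1, 111))  # 112
```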

2.5.3 Four-way close

  • The client sends a disconnection request with the FIN control bit set
  • The server responds, acknowledging the disconnection request
  • The server then requests a shutdown in the opposite direction
  • The client acknowledges the server's close request

3. DNS

DNS is short for Domain Name System. It sits at the application layer, and DNS servers translate between domain names and their corresponding IP addresses.

Usually we visit a website using a host name or domain name. That’s because domain names are easier to remember than IP addresses (a set of pure numbers). But TCP/IP uses IP addresses for access, so there must be a mechanism or service to convert domain names to IP addresses. The DNS service is designed to solve this problem. It provides domain name to IP address resolution service. That is, the DNS provides the service of searching for an IP address by domain name or reverse-searching for a domain name from an IP address.

DNS domain name resolution process:

So when a user types a URL into the browser's address bar and presses Enter, how is the IP address for that URL found?

The process works as follows: the browser first searches its own cache for a record. Finding none, it checks the operating system's cache, including the hosts file. If that also fails, the query is sent to the local DNS server (typically run by the service provider). If the local DNS server cannot answer from its own records, it performs the lookup on the client's behalf: it first asks a root name server, which returns the IP address of the appropriate top-level-domain name server; it then asks the top-level server, which returns the IP address of the second-level-domain name server; the second-level server returns the third-level server's IP address, and so on down the hierarchy until the corresponding IP address is found and returned to the browser.
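In application code, all of this is hidden behind one resolver call. A minimal sketch using Python's standard library; `localhost` is used here so the example works without network access, but any hostname goes through the same cache-then-query chain:

```python
import socket

# Resolve a hostname through the system resolver, which performs the
# cache lookup / query chain described above on our behalf.
infos = socket.getaddrinfo("localhost", 80, proto=socket.IPPROTO_TCP)
for family, _type, _proto, _canon, sockaddr in infos:
    print(family.name, sockaddr[0])  # e.g. AF_INET 127.0.0.1
```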

DNS load balancing:

DNS load balancing is also known as DNS redirection. A Content Delivery Network (CDN) uses DNS redirection: the DNS server returns the IP address of the node closest to the user, and that CDN server then responds to the user's request with the required content.

Does DNS return the same IP address every time? If it did, wouldn't every request land on the same machine? How much computing power and storage would that one machine need to handle billions of requests?

In fact, behind the real Internet world there are thousands of servers, large websites and even more. But from the user’s point of view, all it needs to do is process his request, and it doesn’t matter which machine does it. DNS can return the IP address of a suitable machine to the user, for example, according to the load of each machine, the distance between the machine and the user’s geographical location, etc. This process is DNS load balancing.

4. Relationship between TCP/IP family and HTTP protocols

Socket communication mechanism

A socket is a form of IPC (inter-process communication) between processes on the same host or on different hosts. Socket communication takes place within a domain, which determines how a socket is identified (the socket address format).

Sockets are a set of programming interfaces (APIs) over TCP/UDP: by programming against the socket API, you can communicate over either TCP or UDP.

1. Common domains:

  • UNIX domain (AF_UNIX / AF_LOCAL): communication between processes on the same host, based on the socket mechanism; the address is a pathname (a file)
  • IPv4 domain (AF_INET): communication between processes on different hosts (or the same host) over IPv4, based on the socket mechanism; the address is a 32-bit IPv4 address plus a 16-bit port number
  • IPv6 domain (AF_INET6): the address is a 128-bit IPv6 address plus a 16-bit port number

2. Socket types:

  • TCP: stream socket, SOCK_STREAM, which provides a reliable, bidirectional byte stream
  • UDP: datagram socket, SOCK_DGRAM

3. Related system calls:

  • socket(): creates a new socket
  • bind(): binds the socket to an address and port
  • listen(): marks the socket as listening for connections
  • accept(): accepts an incoming connection request
  • connect(): initiates a connection request
  • close(): closes the connection
  • read(): reads data from the socket into a buffer
  • write(): writes data from a buffer to the socket
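The calls above fit together in a fixed order. A minimal loopback echo exchange in Python (whose `socket` module wraps these same system calls): `socket()` → `bind()` → `listen()` → `accept()` on the server side, `socket()` → `connect()` → send/receive → `close()` on the client side:

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]


def serve_once():
    conn, _addr = server.accept()  # blocks until the client connects
    conn.sendall(conn.recv(1024))  # echo the request back
    conn.close()


t = threading.Thread(target=serve_once)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"ping")
reply = client.recv(1024)
print(reply)                       # b'ping'
client.close()
t.join()
server.close()
```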

URL, URI

A URI identifies an Internet resource as a string, and a URL represents the resource’s location (its location on the Internet). A URL is a subset of a URI.

1. URI

A Uniform Resource Identifier (URI) uniquely identifies a resource according to certain rules, much like a person's ID number.

  • Uniform: many different types of resources can be handled without having to recognize each resource's specific access method from context
  • Resource: anything that can be identified
  • Identifier: something that identifies such an object

2. URL

A Uniform Resource Locator (URL) indicates where a resource is located. The URL is the Web address you type into a browser to access a Web page.

  • Uniform: no need to recognize a resource's specific access method from context
  • Resource: anything that can be identified
  • Locator: something that pins down the location

2.1 Format of a URL

  • Protocol type (scheme): commonly http, https, or file. The scheme name specifies the protocol used to access the resource; schemes such as data: or javascript: designate inline data or script instead. It is case insensitive, ends with a colon, and is usually followed by //.
  • Login information: optional user name and password used as credentials when fetching the resource from the server. Very insecure; not recommended and rarely used.
  • Server address: the host name or IP address of the server
  • Server port number: the port on which the server listens
  • Hierarchical file path: the request path, marking the location of the resource on the server
  • Query string: the query parameters, in key=value form, with multiple key-value pairs separated by &
  • Fragment identifier: an anchor within the resource located by the URI, to which the browser can jump
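Python's standard `urllib.parse` can pull these components apart; the URL below is made up for illustration:

```python
from urllib.parse import urlsplit, parse_qs

url = "http://user:pass@example.com:8080/dir/index.html?uid=1&name=a#ch1"
parts = urlsplit(url)

print(parts.scheme)                    # http            (protocol / scheme)
print(parts.username, parts.password)  # user pass       (login info, rarely used)
print(parts.hostname)                  # example.com     (server address)
print(parts.port)                      # 8080            (server port number)
print(parts.path)                      # /dir/index.html (hierarchical file path)
print(parse_qs(parts.query))           # {'uid': ['1'], 'name': ['a']}
print(parts.fragment)                  # ch1             (fragment identifier)
```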

The HTTP protocol

HTTP profile

  • HTTP stands for Hyper Text Transfer Protocol.
  • The HTTP protocol clearly distinguishes between the client and the server. The requesting party is called the client and the responding party is called the server.
  • Communication is achieved through the exchange of requests and responses
  • The HTTP protocol is used to transfer hypertext from WWW servers to local browsers. It can make browsers more efficient and reduce network traffic. It not only ensures that the computer transfers the hypertext document correctly and quickly, but also determines which parts of the document to transfer and which parts of the content to display first (e.g. text before graphics).
  • HTTP is an application-layer protocol that transfers data (HTML files, image files, query results, and so on) over the TCP/IP communication protocol.
  • HTTP is a stateless protocol.
  • The default HTTP port number is 80 and HTTPS port number is 443.
  • HTTP is usually carried on top of TCP; when it is carried on top of TLS or SSL instead, it becomes HTTPS.

HTTP features

1. Connectionless: each connection is limited to one request. The server disconnects from the client after handling the request and receiving the client's acknowledgment, which saves transmission time. The original motivation was that early pages needed few resources and speed was the priority; later, Connection: keep-alive was introduced to implement persistent connections.

2. Stateless: HTTP is a stateless protocol, meaning it has no memory of past transactions. If earlier information is needed for later processing, it must be retransmitted. This design makes it faster to process large numbers of transactions and keeps the protocol scalable; the server also responds faster when it does not need earlier information. The protocol does not persist requests or the responses to them. HTTP/1.1, while still stateless, added cookie technology: with cookies on top of HTTP communication, state can be managed. We will return to this later.
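A sketch of how cookies bolt state onto stateless HTTP: the server issues a Set-Cookie header, the client echoes it back on later requests, and the server maps it to stored session data. The session store here is a plain dict and the session id is fixed, both standing in for what a real backend would do:

```python
from http.cookies import SimpleCookie

sessions = {}


def server_login(user: str) -> str:
    """First response: create a session and emit a Set-Cookie header value."""
    sid = "abc123"                 # a real server would generate this randomly
    sessions[sid] = {"user": user}
    return f"sid={sid}"


def server_handle(cookie_header: str) -> str:
    """Later request: recover state from the Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return sessions[cookie["sid"].value]["user"]


set_cookie = server_login("alice")
print(server_handle(set_cookie))   # alice
```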

3. Simple and fast: to request a service, a client only needs to transmit the request method and path. Common request methods are GET, HEAD, and POST, each specifying a different kind of interaction between client and server. Because the HTTP protocol is simple, HTTP server programs can be small, so communication is very fast.

4. Flexible: HTTP allows data objects of any type to be transferred; the type being transferred is marked by Content-Type. Flexibility shows up in two ways: semantic freedom (only basic formats are mandated, such as spaces separating words and newlines separating fields, with no strict grammatical restrictions elsewhere) and variety of payloads (not just text, but images, videos, and other arbitrary data).

5. Client/server model: HTTP supports the client/server mode of operation.

6. Media independent: any type of data can be sent over HTTP as long as the client and server know how to handle the content. The client and server specify the appropriate MIME type via Content-Type.

Note!

HTTP itself is stateless, and it runs over connection-oriented TCP. Stateless does not mean HTTP cannot keep a TCP connection open, nor does it mean HTTP uses UDP (a connectionless protocol). With keep-alive, the TCP connection that carries HTTP data is not closed; if the client visits another page on the same server, it continues to use the established connection. How long the connection is kept open can be configured in server software such as Apache.

The HTTP version

HTTP was developed through a collaboration between the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF), which eventually published a series of RFCs. RFC 1945 defines HTTP/1.0; the most famous is RFC 2616, which defines the version still in common use today, HTTP/1.1.

HTTP/0.9

HTTP first appeared in 1990 and was never established as a formal standard. The name HTTP/0.9 was assigned retroactively, to distinguish this early version from HTTP/1.0.

The version is extremely simple, with only one command, GET.

```
GET /index.html
```

After the TCP connection is established, the client sends a request to the server for webpage index.html.

The protocol specified that the server could only respond with a string in HTML format; it could not send content in other formats, nor any header information.

```html
<html>
<body>Hello World</body>
</html>
```

The server closes the TCP connection after sending the packet.
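The whole HTTP/0.9 exchange can be replayed over a local socket pair: a one-line GET with no headers, answered by a bare HTML body, after which the connection closes. The request and response strings follow the examples above:

```python
import socket

# Toy HTTP/0.9 exchange: one-line request, HTML-only response, then close.
client, server = socket.socketpair()

client.sendall(b"GET /index.html\r\n")
request = server.recv(1024)                 # the whole HTTP/0.9 request
server.sendall(b"<html>\n<body>Hello World</body>\n</html>\n")
server.close()                              # 0.9: close after one response

response = client.recv(4096)
print(request.decode().strip())             # GET /index.html
print(response.decode())
client.close()
```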

HTTP/1.0

In May 1996, HTTP/1.0 was released with far more content: it added new methods, status codes, headers, multi-character-set support, multi-part sending, authorization, caching, and more. RFC 1945 describes the HTTP/1.0 specification.

What’s new in HTTP/1.0:

1. New methods: in addition to GET, the POST and HEAD methods were introduced;

2. Content of any format can be sent, allowing the Internet to transmit not only text but also images, videos, and binary files;

3. The format of requests and responses changed: besides the data section, every message must now include headers (HTTP headers) describing metadata;

4. Other new features include status codes, multi-character-set support, multi-part types, authorization, caching, and content encoding.

HTTP/1.0 shortcomings:

Disadvantage: each TCP connection can carry only one request. Once the data is sent, the connection is closed; requesting additional resources requires creating a new connection.

Workaround: use Connection: keep-alive.

HTTP/1.1:

In January 1997, HTTP/1.1 was released, only half a year after version 1.0. It further refined the HTTP protocol and, more than 20 years later, is still the most widely deployed version.

On top of HTTP/1.0 it added support for persistent connections, pipelining, the Host header, and more.

RFC 2616 describes the HTTP 1.1 specification.

HTTP/1.1 new features

1. Persistent connections were introduced: TCP connections are no longer closed by default and can be reused by multiple requests, without declaring Connection: keep-alive.

The client and server can close the connection if the other side has been inactive for a while, but the standard practice is for the client to send Connection: close with its last request, explicitly asking the server to close the TCP connection.

```
Connection: close
```

Currently, most browsers allow up to six persistent connections to the same domain name.
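Persistent connections are easy to see in action with the standard library: below, a throwaway local server speaks HTTP/1.1 with keep-alive, and two requests travel over the same client connection. The handler and paths are invented for the demo:

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"          # HTTP/1.1: keep-alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):          # silence request logging
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_port)
statuses = []
for path in ("/a", "/b"):                  # both requests reuse one TCP socket
    conn.request("GET", path)
    resp = conn.getresponse()
    statuses.append(resp.status)
    resp.read()                            # drain the body before reusing the connection
print(statuses)                            # [200, 200]
conn.close()
server.shutdown()
```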

2. HTTP pipelining

Pipelining was introduced, allowing a client to send multiple requests over the same TCP connection without waiting for each response. This further improves the efficiency of the HTTP protocol.

3. The Content-Length field, unlike in HTTP/1.0, became essential;

Because one TCP connection can now carry multiple requests, multiple responses arrive back-to-back, and there must be a mechanism to tell which bytes belong to which response. The Content-Length field declares the length of each response's body.

In HTTP/1.0 this was unnecessary: when the browser saw the server close the TCP connection, it knew all the data had been received.
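A sketch of why the field matters: given one byte stream carrying several responses, a parser uses Content-Length to find where each body ends. The parser below is deliberately minimal (no chunked encoding, no error handling):

```python
# Split a byte stream of back-to-back responses using Content-Length.
def split_responses(stream: bytes) -> list:
    bodies = []
    while stream:
        head, _, rest = stream.partition(b"\r\n\r\n")  # blank line ends the head
        length = 0
        for line in head.split(b"\r\n"):
            if line.lower().startswith(b"content-length:"):
                length = int(line.split(b":")[1])
        bodies.append(rest[:length])       # exactly Content-Length body bytes
        stream = rest[length:]             # the next response starts right after
    return bodies


stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
          b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nworld")
print(split_responses(stream))  # [b'hello', b'world']
```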

4. Compared with HTTP/1.0, 1.1 also added many verb methods: PUT, PATCH, OPTIONS, and DELETE.

In addition, the Host field has been added to the client request header to specify the domain name of the server.

HTTP/1.1 shortcomings:

1. Head-of-line blocking: version 1.1 allows TCP connections to be reused, but all data on one TCP connection travels in order, and the server does not process the next response until it has finished the current one. If the response at the front is particularly slow, a queue of requests piles up behind it.

Solutions:

  • Spread a page's resources across different domain names to raise the connection limit. Chrome, for example, allows at most six TCP persistent connections per domain by default; with persistent connections, only one request at a time can be in flight per pipe, and all other requests block until the current one completes. If 10 requests are issued at once for the same domain, four of them queue until an in-flight request finishes.
  • Spriting: combine many small images into one large image, then "slice" the individual images back out with JavaScript or CSS.
  • Inlining: another technique for avoiding too many requests, it embeds the raw data of images as data URLs inside the CSS file, reducing the number of network requests.
  • Concatenation: package multiple small JavaScript files into one larger file with tools such as webpack; the downside is that a change to any one file forces the whole bundle to be redownloaded.

2. HTTP headers are huge

3. Plaintext transmission, which is insecure

4. The server cannot push content proactively

**SPDY**

As mentioned above, because of HTTP/1.1's shortcomings, performance had to be improved by combining scripts and style sheets, spriting images, embedding images in CSS, domain sharding, and so on. These optimizations worked around the protocol rather than fixing it. In 2009, Google unveiled its own SPDY protocol to address HTTP/1.1's inefficiencies; with SPDY, Google effectively began reinventing HTTP itself. SPDY proved that techniques such as reducing latency and compressing headers were effective, and it eventually led to the birth of HTTP/2.

HTTP/2:

Brief introduction:

In 2015, HTTP/2 was released. It is not called HTTP/2.0 because the standards committee does not plan to release sub-versions; the next new version will be HTTP/3. In HTTP/2, all data is transferred in binary, multiple requests within one connection no longer need to be sent in order, headers are compressed, and the server can push resources on its own initiative, all of which improves efficiency.

  • HTTP/2 is a replacement for the current HTTP protocol (HTTP/1.x), but it is not a rewrite;
  • HTTP methods/status codes/semantics are the same as HTTP/1.x.
  • HTTP/2 is based on SPDY and is focused on performance, with one of the biggest goals being to use only one connection between the user and the site.
  • Many of the top sites have already deployed HTTP/2; adopting it can bring a 20% to 60% efficiency improvement.

HTTP/2 consists of two specifications:

  1. Hypertext Transfer Protocol version 2 – RFC7540
  2. HPACK – Header Compression for HTTP/2 – RFC7541

New features in HTTP/2:

1. Binary protocol

HTTP/2 is based on SPDY and is a binary protocol, while HTTP/1.x is a text (hypertext) protocol. An HTTP/1.1 header is always text (ASCII encoded), and the body can be text or binary; HTTP/2 is binary throughout. Headers and bodies alike are carried in binary units collectively called "frames": HEADERS frames and DATA frames.

HTTP/2 splits the request and response data into smaller frames, and they are binary encoded.
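Every HTTP/2 frame starts with the same 9-byte binary header defined in RFC 7540 §4.1: a 24-bit payload length, an 8-bit type, 8 bits of flags, and a 31-bit stream identifier. A sketch that builds one (frame type 0x1 is HEADERS; flag 0x4 is END_HEADERS):

```python
import struct

# Build the 9-byte HTTP/2 frame header (RFC 7540 §4.1).
def frame_header(length: int, ftype: int, flags: int, stream_id: int) -> bytes:
    # 24-bit length: pack as 32-bit big-endian and drop the first byte
    return (struct.pack("!I", length)[1:]
            + struct.pack("!BBI", ftype, flags, stream_id & 0x7FFFFFFF))


hdr = frame_header(length=16, ftype=0x1, flags=0x4, stream_id=1)
print(hdr.hex())  # 000010010400000001
print(len(hdr))   # 9
```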

2. Multiplexing

HTTP/2 avoids HTTP-level head-of-line blocking because it multiplexes the TCP connection: both the client and the server can send multiple requests or responses at the same time over a single connection, without preserving a strict order. This bidirectional, real-time communication is called multiplexing.


3. Header compression

HTTP/2 introduces header compression (HPACK), shrinking the bulky, repetitive headers of HTTP/1.x.

4. Server push

HTTP/2 allows a server to send resources to a client before they are requested. This is called server push.

The HTTP message

The information exchanged in an HTTP interaction is called an HTTP message. Messages sent by the requesting side (the client) are request messages; those sent by the responding side (the server) are response messages. Every message must contain an HTTP header.

An HTTP message consists of a header section, a blank line, and a message body; the body is not always required.
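Parsing a raw message makes the three parts concrete: start line, header fields, and, after the blank line, the body. The example request below is invented for illustration:

```python
# Split a raw HTTP request into start line, headers, and body.
raw = (
    "POST /form/entry HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Length: 9\r\n"
    "\r\n"
    "name=test"
)

head, _, body = raw.partition("\r\n\r\n")   # blank line separates head and body
lines = head.split("\r\n")
request_line, header_lines = lines[0], lines[1:]
headers = dict(line.split(": ", 1) for line in header_lines)

print(request_line)   # POST /form/entry HTTP/1.1
print(headers)        # {'Host': 'example.com', 'Content-Length': '9'}
print(body)           # name=test
```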

Composition of request messages and response messages

As can be seen from the figure above, a request message consists of a request line, various header fields, and a blank line; a response message consists of a status line, various header fields, and a message body.

There are generally four types of headers: general header, request header, response header and entity header.

Request message composition:

Request line: contains the request method, the request URI, and the HTTP version. The figure below shows the makeup of a request message: method POST, request URI /form/entry, and protocol version HTTP/1.1

Request header fields: Various headers containing the various conditions and attributes of the request.

Response message:

Status line: contains the status code, reason phrase, and HTTP version indicating the response result;

Response header fields: various headers representing various conditions and attributes of the response;

Response body: The specific data, as shown in the HTML returned below

Ps: Precautions

1. In the start line (the request line or status line), the parts are separated by single spaces, the line ends with a line break, and the format strictly follows the ABNF grammar;

2. Header fields:

  • Field names are case-insensitive
  • A field name cannot contain spaces or underscores
  • A field name must be immediately followed by a colon (:)

3. The blank line

The blank line separates the headers from the entity.

Q: What happens if you deliberately insert a blank line in the middle of the headers?

Everything after that blank line is treated as the entity.

Use a browser to view messages on a web page

We use Network in Chrome’s developer tools to see the communication between the browser and the server.

Step1: enter www.baidu.com in the browser address bar, then press F12 (or choose View → Developer → Developer Tools from the menu) to open the developer tools;

Step2: click Network and make sure the little red record button is on (the capture switch); Chrome will then record all communication between the browser and the server:

Step3: locate the first record in the Network and click, Request Headers will be displayed on the right, click view source on the right, we can see the Request sent by the browser to baidu server:

Step4: you can also see Response Headers. Click view source to display the original Response data returned by the server

Content-Type indicates the type of the response content, in this case text/html, i.e. an HTML page. The browser relies on Content-Type to decide whether the content of the response is a web page, an image, a video, or music; it does not rely on the URL. So even if the URL is http://example.com/abc.jpg, the response is not necessarily an image.

Step5: click Response. The content of the Response body is the HTML source code.

When the browser has read the Baidu home page's HTML source, it parses the HTML and renders the page; then, following the various links inside the HTML, it sends further HTTP requests to the server for the corresponding images, videos, Flash, JavaScript scripts, CSS, and other resources, finally displaying a complete page. That is why we see many additional HTTP requests under Network.

Using the curl tool to view messages

Step1: enter the following command in the command line tool

curl -v www.baidu.com

Of course, you can also use other packet-capture tools; we will not list them all here.

HTTP request methods:

Note that method names are case-sensitive and written in uppercase.

methods instructions Support version
GET Get resources. Requests the specified page information and returns the entity body. HTTP/1.0 and later are supported
POST Sends data to the server, transferring the entity body. Asks the server to accept the enclosed document as a new subordinate of the identified URI. HTTP/1.0 and later are supported
PUT Transfer files. HTTP/1.1 PUT has no authentication mechanism, so the average Web site doesn’t use this method either. HTTP/1.0 and later are supported
HEAD Only the header of the page is requested. HTTP/1.0 and later are supported
DELETE Asks the server to delete the specified page. The opposite of PUT. HTTP/1.1 DELETE also has no validation mechanism, so the average Web site would not use this method either. HTTP/1.0 and later are supported
OPTIONS Ask for supported methods. This method is used to query the methods supported for the resource specified by the request URI. (This may be used across domains, or for complex requests) HTTP/1.1 and above are supported
TRACE Track the path. This method lets the Web server return the previous request traffic to the client’s method. This method is not commonly used and is prone to XST (cross-site tracking) attacks. HTTP/1.1 and above are supported
CONNECT A tunnel protocol is required to connect the agent. SSL(Secure Sockets Layer) and TLS (Transport layer security) protocols are used to encrypt communication content and then transmit it through network tunnels. HTTP/1.1 and above are supported
LINK Request the server to establish a link relationship. HTTP/1.0 support,HTTP/1.1 deprecated
UNLINK Disconnect the link. HTTP/1.0 support, HTTP/1.1 deprecated

GET and POST

The specific differences between them:

  • From caching: GET requests are actively cached by the browser and leave history entries, while POST requests are not cached by default.
  • From encoding: GET can only be URL-encoded and can only accept ASCII characters, while POST has no such limit.
  • From parameters: GET parameters are generally concatenated onto the end of the URL, so they are exposed; POST parameters go in the request body, which is better suited to transmitting sensitive information.
  • From security: POST is more secure than GET.
  • From length limits: GET request URLs are subject to length limits (commonly around 2 KB, depending on the browser and server), while POST in theory has none, though browsers impose their own limits.
  • From idempotence: GET is idempotent while POST is not. (Idempotent means performing the same operation repeatedly yields the same result.)
  • From TCP: GET and POST both travel over TCP connections, so at that level there is no real difference. However, because of HTTP and browser conventions, they differ in practice: GET produces one TCP packet and POST produces two. For a GET request, the browser sends the HTTP headers and data together, and the server responds 200 (with the data). For a POST, the browser first sends the headers, the server responds 100 Continue, the browser then sends the data, and the server responds 200 OK. (Firefox is an exception: its POST request sends only one TCP packet.)
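The encoding point above can be made concrete: a GET request must carry its parameters URL-encoded in the query string (ASCII only on the wire), while a POST can carry the same pairs in the body. The sketch below is illustrative; example.com and the parameter names are placeholders.

```javascript
// GET: parameters ride in the URL and must be URL-encoded, so non-ASCII
// characters become percent-encoded UTF-8 bytes.
// POST: the same pairs travel in the request body instead, here in the
// application/x-www-form-urlencoded format.
const params = { q: 'HTTP 缓存', page: '1' };

const query = new URLSearchParams(params).toString();
const getUrl = `https://example.com/search?${query}`;

// For a POST, the identical encoding would be sent as the body,
// with Content-Type: application/x-www-form-urlencoded.
const postBody = query;

console.log(getUrl);
console.log(postBody); // q=HTTP+%E7%BC%93%E5%AD%98&page=1
```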

HTTP status code and its meaning

The status code is responsible for describing the returned request results when the client sends a request to the server. The status code lets the user know if the server is processing the request normally or if an error has occurred.

The response code Meaning and Application
1xx Represents a temporary (informational) response. Clients should be prepared to receive one or more 1xx responses before the regular response.
100 The requester should continue to make the request. The server returns this code to indicate that it has received the first part of the request and is waiting for the rest.
101 Switching Protocols. The requester has asked the server to switch protocols, and the server has confirmed and is ready to switch, as indicated by the Upgrade request header. Examples: upgrading to WebSocket, or to HTTP/2
2xx The server successfully received the client request
200 Success. The client request was successful. Typically, this means that the server has provided the requested web page.
201 Has been created. The request succeeds and the server creates a new resource. Usually used in POST or PUT requests to indicate that the request has been successful and that a new resource has been created. And returns the path in the response body.
202 Accepted. The request has been received but not yet acted upon, and HTTP provides no way to send an asynchronous result later. This status code suits requests handed off to another process or batch job.
203 Non-Authoritative Information. The server successfully processed the request, but the returned information may come from another source; used for mirrors and backups of other resources. In all other cases, 200 is preferred.
204 No Content. The server processed the request successfully and returns no body, but the headers may be useful; the user agent (browser) updates its cached headers for the resource. (User agent: software acting on behalf of a user, such as a web browser or mail reader.)
205 Reset the content. Tells the user agent (browser) to reset the document that sent the request.
206 Partial Content. Indicates a partial download of a file: it lets you resume an interrupted download, or split a download into multiple concurrent streams. The server successfully handled part of a GET request; this status code is returned when the client sends a Range request header, e.g. curl -v --header "Range: bytes=0-3" against a URL yields HTTP/1.1 206 Partial Content
207 Multistate (WebDAV). This message should be preceded by an XML message that may contain several separate response codes, depending on how many sub-requests are made.
3xx Redirection. For example, the browser may have to request a different page on the server, or repeat the request through a proxy server.
301 Moved Permanently. This request and all future requests should be directed to the specified URI. Some clients will change the request method to GET, so this status code is recommended for the GET and HEAD methods. Search engines update their recorded URL for the resource (in SEO terms, link equity is passed to the new URL).
302 Found. The object has moved temporarily and may change again in the future. The requested resource temporarily resides at a different URI; since the redirection may change, clients should continue to use the original request URI for future requests. This response is cacheable only if indicated by the Cache-Control or Expires header fields. Search engines do not change their URL for the resource. Typical application: load balancing.
304 Unmodified. The document requested by the client is already in its cache and has not been modified since it was cached. The client uses a cached copy of the document instead of downloading the document from the server. If you want to use a 200 status code to achieve the same 304 effect, you need to enforce caching, requiring additional headers: cache-control, Expires, Vary
305 Use a proxy.
307 Temporary Redirect. Basically the same as 302, except that this status code strictly forbids the browser from changing the request method or request body when requesting the new URL: if you used POST, you use POST again. If you want to PUT a resource that does not exist on the server, or want to change a POST into a GET, use 303 instead.
308 Permanent redirect. Basically the same as 301. However, it is strictly forbidden to modify the request mode and request body.
4xx Client error. For example, the client requested a page that does not exist, or did not provide valid authentication information.
400 The request syntax is wrong and the server cannot recognize it. For example, there is no host header field, or more than one host header field is set.
401 Unauthorized. The request lacks valid authentication credentials; typically the user is not logged in, and logging in usually resolves it.
401.1 Login failed because the user name or password is invalid.
401.2 Login failed due to server configuration.
401.3 Not authorized due to ACL resource restrictions. Indicates that the NTFS permission problem exists. This error can occur even if you have appropriate permissions on the file you are trying to access. For example, if the IUSR account does not have access to the C:WinntSystem32Inetsrv directory, you will see this error.
401.4 Filter authorization failed.
401.5 ISAPI/CGI application authorization failed.
401.7 Access is denied by URL authentication policies on the Web server. This error code is specific to IIS 6.0.
402 Reserved for future use
403 The server refused to respond. Insufficient permissions.
404 The URL is invalid or the URL is valid but has no resources.
405 Method Not Allowed. The request method is not allowed for this resource. However, the GET and HEAD methods are mandatory and must never return this status code.
406 Don’t accept it. The resource type does not meet server requirements.
407 Agency authorization is required. Proxy authentication is required.
408 The request timed out. The server timed out while waiting for a request.
409 A server conflict occurred while completing a request. The server must include information about the conflict in the response.
410 Has been deleted. The server returns this response if the requested resource has been permanently deleted. 410 differs from 404 in that if a resource previously had a 410 code that is now permanently deleted, the site designer can specify a new location for the resource through the 301 code.
411 Length Required. The server refuses requests that do not include a Content-Length header field with a valid content length.
412 Prerequisites are not met. A prerequisite error occurred when the client requested information.
413 Payload Too Large. The request entity is larger than the server can or will process, so the request is rejected; the server may close the connection to stop the client from continuing to send. If the condition is temporary, the response includes a Retry-After header.
414 The requested URI is too long. The requested URI (usually a web address) is too long for the server to process.
415 Unsupported media types. The server could not process the media format attached to the request
416 The scope requested by the client is invalid.
417 The server cannot satisfy Expect’s request headers.
5xx Server error.
500 Internal server error, e.g. an uncaught exception.
501 The server does not have the capability to complete the request. For example, the server may return this code if it does not recognize the request method.
502 Bad Gateway. The web server, acting as a gateway or proxy, received an invalid response from the upstream server. This class of error generally relates to the server itself (independent of the request) or to load balancing.
503 Service Unavailable. The server cannot handle the request because it is overloaded or down for maintenance; this is usually a temporary state, typically accompanied by a Retry-After response header estimating when service will be restored.
504 Gateway Timeout. Acting as a gateway or proxy, the server could not get a response from the upstream server in time to return it to the client.
505 HTTP Version Not Supported. The server does not support the HTTP version used in the request; for example, this is returned when a request is sent over HTTP/2 to a server that does not support HTTP/2.

Few people can remember all of these, but the common status codes below are closely tied to front-end work and are worth memorizing.
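One memory aid: the category of any status code is carried entirely by its hundreds digit. A tiny helper (hypothetical, for illustration) makes that mapping explicit:

```javascript
// Map a status code to its class; the hundreds digit alone decides the category.
function statusClass(code) {
  const classes = {
    1: 'informational',
    2: 'success',
    3: 'redirection',
    4: 'client error',
    5: 'server error',
  };
  return classes[Math.floor(code / 100)] || 'unknown';
}

console.log(statusClass(304)); // redirection
console.log(statusClass(502)); // server error
```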

What do you know about HTTP response codes? What do they mean?

HTTP header field

HTTP header fields consist of header field name and header field value, separated by a colon:

There are generally four types of headers: general header, request header, response header and entity header.

Header field:

  • Field names are case-insensitive

  • A field name cannot contain spaces or underscores

  • A field name must be immediately followed by a colon (:)

  • A single HTTP header field can carry multiple values

    Keep-Alive: timeout=15,max=100

The following header fields are defined by the HTTP/1.1 specification

1. General header field

Header field name instructions
Cache-Control Controlling cache behavior
Connection Link management
Date Indicates the date and time when the HTTP packet is created
Pragma Packet instructions
Trailer The header of the packet tail
Transfer-Encoding Specifies the transfer encoding of the message body
Upgrade Upgrade to another protocol

The header field Upgrade is used to check whether communication can switch to a different (possibly higher-version) protocol; its value can name a completely different communication protocol.
Via Proxy server information

The header field Via is used to trace the transmission path of request and response messages between the client and the server.
Warning Error notification

2. Request header fields

Header field name instructions
Accept The type of media that the user agent can handle
Accept-Charset Preferred character set
Accept-Encoding Preferential encoding
Accept-Language Preferred language
Authorization Web Authentication Information
Expect Expect specific behavior from the server
From Email address of the user
Host The server where the resource is requested
If-Match Compare entity tags
If-Modified-Since Compares the update times of resources
If-None-Match Compare entity tags
If-Range Byte-range request honored only if the resource is unchanged
If-Unmodified-Since Compares resource update times (the opposite of If-Modified-Since)
Max-Forwards Maximum transmission hops
Proxy-Authorization Proxy servers require client authentication
Range Entity byte range request
Referer The original acquirer of the URI in the request
TE Priority of transmission encoding
User-Agent HTTP client program information

3. Response header fields

Header field name instructions
Accept-Ranges Whether to accept a byte range
Age How long since the response was generated by the origin server (in seconds)
ETag Matching information of resources
Location The client redirects to the specified URI
Proxy-Authenticate The proxy server authenticates the client
Retry-After When to send the request again
Server Server information
Vary Proxy server cache management information
WWW-Authenticate The server authenticates the client

4. Entity header field

Header field name instructions
Allow HTTP methods supported by the resource
Content-Encoding How an entity is encoded
Content-Language The natural language of entities
Content-Length Content size of the entity in bytes
Content-Location Replace the URI of the corresponding resource
Content-MD5 The packet digest of the entity
Content-Range The location range of the entity
Content-Type The media type of the entity body
Expires Entity expiration time
Last-Modified When the resource was last modified

5. Non-HTTP/1.1 header fields

Cookie, Set-Cookie, Content-Disposition, etc.

6. End-to-end and hop-by-hop headers

HTTP header fields are divided into two types according to how caching and non-caching proxies handle them.

End-to-end headers: headers in this category are forwarded to the final recipient of the request or response, must be stored in any response generated from a cache, and must always be forwarded.

Hop-by-hop headers: headers in this category are valid only for a single hop and are not forwarded once they pass through a cache or proxy. In HTTP/1.1 and later versions, a hop-by-hop header must be listed in the Connection header field.

The following are the hop-by-hop header fields in HTTP/1.1. Apart from these eight, all other fields are end-to-end headers.

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • Trailer
  • TE
  • Transfer-Encoding
  • Upgrade
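A cache or proxy applying this rule strips exactly those eight fields, plus any field named by the Connection header itself, before forwarding; end-to-end headers pass through. A minimal sketch (the `stripHopByHop` helper is ours, not a library API):

```javascript
// The eight fixed hop-by-hop fields from HTTP/1.1; a proxy must drop them
// (plus anything the Connection header names) before forwarding.
const HOP_BY_HOP = new Set([
  'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
  'trailer', 'te', 'transfer-encoding', 'upgrade',
]);

function stripHopByHop(headers) {
  const extra = (headers['connection'] || '')
    .split(',').map(s => s.trim().toLowerCase()).filter(Boolean);
  const drop = new Set([...HOP_BY_HOP, ...extra]);
  return Object.fromEntries(
    Object.entries(headers).filter(([name]) => !drop.has(name.toLowerCase()))
  );
}

const forwarded = stripHopByHop({
  'connection': 'keep-alive',
  'keep-alive': 'timeout=15,max=100',
  'content-type': 'text/html',
});
console.log(forwarded); // { 'content-type': 'text/html' }
```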

Common header attributes

field instructions The sample
Accept The type of response content that can be received Accept:text/plain (text type)
Accept-Charset Acceptable character set Accept-Charset: utf-8
Accept-Encoding The encoding of acceptable response content Accept-Encoding: gzip, deflate
Accept-Language List of acceptable response content languages Accept-Language: en-US
Accept-Datetime An acceptable time-dependent version of the response content Accept-Datetime: Sat, 26 Dec 2015 17:30:00 GMT
Authorization Authentication information about resources to be authenticated in HTTP Authorization: Basic OSdjJGRpbjpvcGVuIANlc2SdDE==
Cache-Control Whether caching is used in the request/reply Cache-Control: no-cache
Connection The type of connection the client wants to use preferentially Connection: keep-alive Connection: Upgrade
Content-Length The length of the request body in octets (8-bit bytes) Content-Length: 348
Content-Type The MIME type of the request body Content-Type: application/x-www-form-urlencoded
Date The date and time the message was sent Date: Sat, 26 Dec 2015 17:30:00 GMT
Expect Indicates that the client is asking the server to perform a specific behavior Expect: 100-continue
From The email address of the user who initiated this request From: [email protected]
Host The server domain name and port number. The default port number can be omitted Host: www.a.com:80 or www.a.com
If-Match Mainly used for PUT, entity matching can only be operated If-Match: “9jd00cdj34pss9ejqiw39d82f20d0ikd”
If-Modified-Since Returns 304 Not Modified if the resource has not been modified If-Modified-Since: Sat, 26 Dec 2015 17:30:00 GMT
User-Agent The browser identity string User-Agent: Mozilla/
Upgrade Requires that the server be upgraded to a higher version protocol Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/ X11
Via Tell the server which agent made the request Via: 1.0 fred, 1.1 a.com.com (Apache/1.1)
Referer The page from which the current request was initiated Referer: a.com/nodejs
Origin Initiate a request for cross-domain resource sharing Origin: www.a.com

Connection

The Connection header field does two things.

1. Mark header fields that must not be forwarded to proxies

2. Manage persistent connections

Connection: close

HTTP/1.1 connections are persistent by default, so the client can send requests continuously over one connection. When the server wants to close the connection explicitly, it sets the Connection header field to close.

Pragma

Pragma is a legacy field predating HTTP/1.1, defined only for backward compatibility with HTTP/1.0. The specification defines only one form, shown below.

Pragma: no-cache

This header field is a generic header field, but is used only in requests sent by the client. The client will require all intermediate servers not to return cached resources.

If all intermediate servers can use HTTP/1.1 as a benchmark, then cache-control: no-cache is ideal. However, it is not practical to know the HTTP protocol version used by all intermediate servers. Therefore, the request will be sent with the following two header fields.

Cache-Control: no-cache
Pragma: no-cache

Cache-Control

You can control how caches behave by specifying directives in the Cache-Control header field.

Syntax format:

Multiple directives are separated by commas, and some take optional arguments.

Cache-Control:private,max-age=0,no-cache
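Parsing such a value is just splitting on commas and then on the equals sign. The `parseCacheControl` helper below is an illustrative sketch, not a library API:

```javascript
// Parse a Cache-Control value into a directive map. Directives are
// comma-separated; some carry an "=argument", others stand alone.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const [name, arg] = part.trim().split('=');
    directives[name.toLowerCase()] = arg === undefined ? true : arg;
  }
  return directives;
}

const cc = parseCacheControl('private,max-age=0,no-cache');
console.log(cc); // { private: true, 'max-age': '0', 'no-cache': true }
```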

Cache request instruction

instruction parameter instructions
no-cache None Force revalidation with the origin server
no-store None Do not cache any part of the request or response
max-age=[seconds] Required Maximum acceptable age of the response
max-stale(=[seconds]) Optional Accept a response that has expired
min-fresh=[seconds] Required The response must remain fresh for at least the specified time
no-transform None Proxies must not change the media type
cache-extension - New directive token

Cache response instruction

instruction parameter instructions
public None The response may be cached by any cache
private Optional The response is intended for a single user only
no-cache Optional The cached response must be revalidated before use
no-store None Do not cache any part of the request or response
no-transform None Proxies must not change the media type
must-revalidate None Cacheable, but must be revalidated with the origin server
proxy-revalidate None Requires intermediate cache servers to revalidate the cached response
max-age=[seconds] Required Maximum age of the response
s-maxage=[seconds] Required Maximum age of the response for shared (public) cache servers; similar to max-age, but s-maxage is the cache lifetime for proxy servers
cache-extension - New directive token

Content-Type

Content-Type defines the type of a web document and its character encoding, and determines the format and encoding with which the browser reads the file.

The Content-type header tells the client the Content Type of the Content actually returned.

In terms of character encoding, version 1.0 stipulated that the header information must be ASCII and the following data can be in any format. So when the server responds, it has to tell the client what format the data is in, and that’s what the Content-Type field is for.

Syntax format:

Content-Type: text/html; charset=utf-8
Content-Type: multipart/form-data; boundary=something
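As the two examples show, a Content-Type value is a media type followed by semicolon-separated parameters such as charset or boundary. A minimal parser sketch (the helper name is ours):

```javascript
// Split a Content-Type value into the media type and its parameters
// (e.g. charset, or boundary for multipart bodies).
function parseContentType(value) {
  const [mediaType, ...paramParts] = value.split(';').map(s => s.trim());
  const params = {};
  for (const p of paramParts) {
    const i = p.indexOf('=');
    if (i !== -1) params[p.slice(0, i).toLowerCase()] = p.slice(i + 1);
  }
  return { mediaType: mediaType.toLowerCase(), params };
}

console.log(parseContentType('text/html; charset=utf-8'));
// { mediaType: 'text/html', params: { charset: 'utf-8' } }
```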

Common media format types are as follows:

Media format type instructions
text/html HTML format
text/plain Plain text format
text/xml XML format
image/gif GIF image format
image/jpeg JPG image format
image/png PNG image format

Media format types beginning with Application:

The media format type that begins with Application instructions
application/xhtml+xml XHTML
application/xml XML data format
application/atom+xml Atom XML aggregation format
application/json JSON data format
application/pdf PDF format
application/msword Word Document Format
application/x-www-form-urlencoded The most common way of submitting data via POST, and the browser's native form default: if a form's enctype attribute is not set, the data is ultimately submitted as application/x-www-form-urlencoded, with the form data encoded as key/value pairs and sent to the server (the default form submission format).
application/octet-stream Binary streaming data (such as common file downloads)

Another common media format is used when uploading files:

Media format type instructions
multipart/form-data This format is used when you need to upload files in a form

Cookie

Cookies, which manage the state between server and client, are widely used in Web sites, although they are not included in RFC2616, which standardizes HTTP/1.1. The working mechanism of Cookie is user identification and state management. In order to manage users’ status, Web sites temporarily write some data to users’ computers through Web browsers. Then when the user visits the Web site, the Cookie issued before can be retrieved through communication.

Header fields used by the Cookie mechanism

Header field name instructions The first type
Set-Cookie Cookie information used to start state management Response header field
Cookie Cookie information received by the server Request header field

Cookie processing process:

1. When the client accesses the server for the first time, the server sends a Cookie to the client in a response header, with attributes separated by semicolons and spaces

2. The client saves the Cookie locally after receiving it

3. The Cookie will be sent to the server when the client requests the server in the future

Set-Cookie

Syntax format:

Set-Cookie: status=enable; expires=Tue, 05 Jul 2011 07:26:31 GMT; path=/; domain=.a.com;
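In a Set-Cookie value, the first name=value pair is the Cookie itself and the remaining semicolon-separated parts are its attributes. A minimal sketch of parsing the example above (the `parseSetCookie` helper is ours for illustration):

```javascript
// Break a Set-Cookie value into the cookie pair and its attributes.
// The first "name=value" is the cookie itself; the rest are attributes,
// some of which (Secure, HttpOnly) are bare flags with no value.
function parseSetCookie(value) {
  const [pair, ...attrParts] = value.split(';').map(s => s.trim()).filter(Boolean);
  const eq = pair.indexOf('=');
  const attributes = {};
  for (const part of attrParts) {
    const i = part.indexOf('=');
    if (i === -1) attributes[part.toLowerCase()] = true;          // flag attribute
    else attributes[part.slice(0, i).toLowerCase()] = part.slice(i + 1);
  }
  return { name: pair.slice(0, eq), value: pair.slice(eq + 1), attributes };
}

const c = parseSetCookie(
  'status=enable; expires=Tue, 05 Jul 2011 07:26:31 GMT; path=/; domain=.a.com;'
);
console.log(c.name, c.value);   // status enable
console.log(c.attributes.path); // /
```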

When the server is ready to start managing the state of the client, various information is given in advance. The table below lists the field values for set-cookie.

Property of the set-cookie field

attribute instructions
NAME=VALUE The name and value assigned to the Cookie (required)
expires=DATE Cookie validity period (defaults to before browser closure if not explicitly specified)
max-age=[seconds] The number of seconds until the Cookie expires (if not explicitly specified, the default is until the browser closes)
path=PATH The directory on the server to which the Cookie applies (defaults to the directory of the document that set it)
domain=DOMAIN The domain to which the Cookie applies (defaults to the domain of the server that created the Cookie)
Secure Cookies are sent only for SECURE HTTPS communication
HttpOnly Cookies cannot be accessed by JavaScript scripts to prevent XSS attacks

Expires attribute

The Expires attribute of a Cookie specifies the date until which the browser may send the Cookie. When Expires is omitted, the Cookie is valid only for the browser session, usually until the browser is closed. Also, once a Cookie has been sent from the server to the client, the server has no way to explicitly delete it; in practice, a client-side Cookie is deleted by overwriting it with one that has already expired.

Cookie

Cookie: status=enable

The Cookie header field tells the server that the client wants HTTP state management support: the client includes the Cookies it received from the server in its request. When multiple Cookies have been received, multiple Cookies can be sent together.
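On the wire, multiple Cookies are packed into a single Cookie header as name=value pairs separated by "; ". A small sketch of turning that back into a map (the theme cookie is a made-up example):

```javascript
// Parse a Cookie request header ("name=value" pairs separated by "; ")
// back into a simple key/value map.
function parseCookieHeader(value) {
  const jar = {};
  for (const part of value.split(';')) {
    const i = part.indexOf('=');
    if (i === -1) continue;
    jar[part.slice(0, i).trim()] = part.slice(i + 1).trim();
  }
  return jar;
}

console.log(parseCookieHeader('status=enable; theme=dark'));
// { status: 'enable', theme: 'dark' }
```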

Precautions for using cookies:

  • It may be tampered with by the client, so validate it before use
  • Don’t store sensitive data, such as user passwords and account balances
  • Use httpOnly for security
  • Minimize the size of cookies
  • Set the domain and path correctly to reduce data transfer

session

A Session is another mechanism for recording client state; the difference is that Cookies are stored in the client browser while Sessions are stored on the server.

When a client browser accesses the server, the server records the client's information in some form on the server side; this record is called a Session. On a later visit, the client's state only needs to be looked up in the Session.

Differences between Cookies and Sessions

  1. Cookie data is stored on the client’s browser and session data is stored on the server.
  2. Cookies are not very secure: locally stored cookies can be inspected and used for cookie spoofing, so for security-sensitive state, Sessions should be used
  3. Sessions are kept on the server for a certain period; as the number of accesses grows they consume server resources, so to reduce server load, Cookies can be used instead
  4. A single cookie can hold no more than 4K of data, and many browsers limit the number of cookies a site can hold to 20

Important information such as login state can be stored in the Session; other information, if it needs to be kept, can go in Cookies

The cache

Getting content over the network is slow and expensive. Large responses require multiple round trips between the client and server, which delays when the browser can get and process the content and can add to your visitor's traffic bill. The ability to cache and reuse previously fetched resources is therefore a key aspect of performance optimization.

Cache role

  • Reduces redundant data transfer, saving network cost
  • Reduces server load, greatly improving site performance
  • Speeds up client page loading

Classification of cache

Caching in a broad sense can be divided into the following four categories:

    1. HTTP Cache
    2. Service Worker Cache:

      The Service Worker borrows the idea of the Web Worker: JS running off the main thread, which cannot access the DOM directly because it lives outside the browser window context. Even so, it provides many useful capabilities, such as offline caching, push notifications, and network proxying; the offline cache is the Service Worker Cache.

    3. Memory Cache:

      The in-memory cache, which is the fastest in terms of access but the shortest-lived: once the rendering process ends, the memory cache is gone.

    4. Push Cache (server push in HTTP/2)

HTTP cache

HTTP caching has a variety of rules. Based on whether a request must be re-sent to the server, they fall into two categories: the strong (mandatory) cache and the negotiated (comparison) cache.

  • When the strong cache is in effect, no interaction with the server is needed; the negotiated cache always interacts with the server, whether or not it turns out to be valid.
  • Both kinds of rules can exist at the same time, and the strong cache has higher priority than the negotiated cache. That is, when the strong cache rule is executed and the cache is valid, the cache is used directly and the negotiated cache rules are not executed.

1. Strong cache

There are two types of caching in the browser, one is to send HTTP requests, the other is not to send HTTP requests.

The first step is to check for strong caching, which does not require sending an HTTP request.

So how does the browser check?

We know that when the browser requests data from the server without caching, the server returns the data along with the cache rule information contained in the response header.

Note:

In HTTP/1.0 and HTTP/1.1, this field is different. In the early days, HTTP/1.0, Expires was used, whereas HTTP/1.1 used cache-control. When both Expires and cache-control are present, cache-control takes precedence.

Expires

Expires: A response header returned by the server that tells the browser to retrieve data from the cache before the expiration date without having to request it again.

Like this:

Expires: Wed, 22 Apr 2020 08:41:00 GMT

Indicates that the resource will expire at 08:41 on April 22, 2020. If the resource expires, a request must be sent to the server.

This seems fine and reasonable, but there is a potential pitfall: the server’s time and the browser’s time may not be the same, and the expiration date returned by the server may be inaccurate. This approach was quickly abandoned in later HTTP/1.1 versions.

Cache-Control

Cache-Control: in HTTP/1.1, a request/response header that precisely controls the caching policy.

Unlike Expires, it doesn’t use a specific expiration date; instead it controls the cache with an elapsed duration, and the corresponding field is max-age. Take this example:

Cache-Control:max-age=3600

This means that the cache is available within 3600 seconds, or one hour, after the response is returned. It can actually combine a large number of instructions and perform more scenarios of cache judgment, listing some key attributes as follows:

Public: Both client and proxy servers can cache. Because a request may pass through different proxy servers before reaching the target server, the result is that not only the browser can cache the data, but any proxy node in between can cache the data.

Private: In this case, only the browser can cache, the proxy server in the middle can not cache.

No-cache: skips the current strong cache and sends HTTP requests, that is, directly enters the negotiation cache phase.

No-store: very rude and does not cache in any form.

S-maxage: This is similar to max-age, but the difference is that s-maxage is the cache time for the proxy server.

must-revalidate: once the cache expires, the client must revalidate with the origin server before the cached copy may be used again.

Code test:

server.js

const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
  console.log('request come', request.url)

  if (request.url === '/') {
    const html = fs.readFileSync('test.html', 'utf8')
    response.writeHead(200, {
      'Content-Type': 'text/html'
    })
    response.end(html)
  }

  if (request.url === '/script.js') {
    response.writeHead(200, {
      'Content-Type': 'text/javascript',
      'Cache-Control': 'max-age=20'
    })
    response.end('console.log("script loaded")')
  }
}).listen(8888)

console.log('server listening on 8888')

test.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Test cache</title>
</head>
<body>
  <script src="/script.js"></script>
</body>
</html>

To execute server.js, run the node server.js command and open http://localhost:8888 in the browser

After the 20 seconds we set with max-age=20 have passed, the page behaves as it did when we first opened it: the request is sent to the server again.

2. Negotiate cache

When the resource cache times out, that is, the strong cache is invalidated, the negotiated cache is entered.

  • Comparison caches, as the name suggests, require a comparison to determine whether caches can be used.
  • The first time the browser requests data, the server returns the cache id along with the data to the client, which backs up both to the cache database.
  • When requesting data again, the client sends the backup cache ID to the server. The server checks the backup cache ID. After the check succeeds, the server returns the 304 status code to inform the client that the backup data is available.

After strong cache invalidation, the browser sends a request to the server with the corresponding cache tag in the request header. The server decides whether to use the cache based on this tag. There are two types of cache tags: Last-Modified and ETag.

Last-Modified

Last-modified: Response header, that is, the Last modification time. After the browser first sends a request to the server, the server adds this field to the response header.

If the browser receives it, subsequent requests carry the If-Modified-Since field in the request header, whose value is the Last-Modified value sent by the server.

If-Modified-Since: request header, the last time the resource was modified, told to the server by the browser.

If the server gets the If-Modified-Since field in the request header, it compares it with the last modification time of the resource on the server:

  • If the value in the request header is less than the last modification time, it is time to update. Return the new resource, just like the normal HTTP request response flow.
  • Otherwise, 304 is returned, telling the browser to use cache directly.
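That comparison can be sketched server-side as follows (a simplified sketch; the function name and return shape are illustrative):

```javascript
// Decide between 304 and a full 200 response based on If-Modified-Since.
function handleConditionalGet(ifModifiedSince, resourceMtimeMs) {
  const clientTime = Date.parse(ifModifiedSince) // NaN if header absent/invalid
  // HTTP dates have one-second precision, so round the file mtime down.
  const serverTime = Math.floor(resourceMtimeMs / 1000) * 1000
  if (!isNaN(clientTime) && serverTime <= clientTime) {
    return { status: 304 } // not modified: the browser can use its cache
  }
  return {
    status: 200, // modified (or first request): send the full resource
    headers: { 'Last-Modified': new Date(serverTime).toUTCString() }
  }
}
```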

ETag

ETag: Response header, resource identifier, told by the server to the browser.

ETag is a unique identifier generated by the server based on the contents of the current file. This value changes whenever the contents of the file are changed. The server gives this value to the browser via the response header.

The browser receives the value of ETag and sends it to the server as If-None-Match in the request header on the next request.

If-None-Match: request header, cached resource identifier, told to the server by the browser.

If the server receives If-None-Match, it compares it with the ETag of the resource on the server:

  • If they are different, it’s time for an update. Return the new resource, just like the normal HTTP request response flow.
  • Otherwise, 304 is returned, telling the browser to use the cache directly.

Both comparisons

  1. In precision, ETag is better than Last-Modified. ETag identifies a resource by its content, so it accurately senses when the resource changes; Last-Modified fails to sense changes in some special cases, mainly two:
    • The resource file was edited but its content did not actually change, which still invalidates the cache.
    • Last-Modified has one-second precision; if a file changes several times within one second, Last-Modified cannot reflect the change.
  2. In performance, Last-Modified is better than ETag, which is easy to understand: Last-Modified is just a point in time, while an ETag requires generating a hash from the file’s contents.

In addition, if both approaches are supported, the server will prioritize ETag.

Problems with Last-Modified

  1. Some servers cannot get a file’s exact last modification time, so there is no way to tell from it whether the file was updated.
  2. Some files are modified very frequently, changing within a second or less; Last-Modified is only accurate to the second.
  3. Some files’ last modification time changes while the content does not; we don’t want the client to think the file changed.
  4. The same file on multiple CDN servers has identical content but different modification times.

The test code

We add negotiation cache:

server.js

const http = require('http')
const fs = require('fs')

http.createServer(function (request, response) {
  console.log('request come', request.url)

  if (request.url === '/') {
    const html = fs.readFileSync('test.html', 'utf8')
    response.writeHead(200, {
      'Content-Type': 'text/html'
    })
    response.end(html)
  }

  if (request.url === '/script.js') {
    response.writeHead(200, {
      'Content-Type': 'text/javascript',
      'Cache-Control': 'max-age=20000000,no-cache',
      'Last-Modified': '123',
      'Etag': '666'
    })
    response.end('console.log("script loaded")')
  }
}).listen(8888)

console.log('server listening on 8888')

Restart the node server.js service and open the browser http://localhost:8888/;

We find that even though max-age=20000000, refreshing the page still sends the request, because no-cache is also set, which forces entry into the negotiation cache. Since we set Etag and Last-Modified, on the second refresh the request header carries If-None-Match and If-Modified-Since with the corresponding Etag and Last-Modified values.

The problem of updated static resources still being served from cache

In HTTP/1.x, the browser has no way to know on its own whether a static resource on the server has changed. Expires and Cache-Control only control whether the cache has expired; until it expires, the browser does not send a request, so it cannot learn that the resource changed. If we publish updated static resources during that window, the browser keeps using the stale cached copies. How do we solve this?

The solution is to name the static resource file differently each time it goes online.

My general approach is:

Step 1: Do not cache the HTML file, requesting it from the server on every visit, so the browser always gets the latest HTML.
Step 2: On each release, append a version number, timestamp, or fingerprint to the original file name (without creating a new file, e.g. `<script src="/script.js?_h=1.6wee1">`), or add a content hash to the name. The advantage is that each build records which static resources changed this time, and if an error requires rolling back to a previous version, you can trace back quickly.

Simple case part code:

The first time: script.js that we use in HTML

<script src="http://www.localhost:8888/script.js?version=1.0.1"></script>

Ps: This downloads the script.js of version 1.0.1. If the browser accesses the HTML again and the referenced script.js is still version 1.0.1, the local cache is used.

One day we need to change script.js, and our HTML file changes accordingly:

<script src="http://www.localhost:8888/script.js?version=1.0.2"></script>

Ps: By not caching the HTML and changing a resource’s path whenever the referenced resource’s content changes, we solve the problem of not learning about resource updates in time.

It still needs optimizing. Why? Suppose the page references three stylesheets: a.css, b.css, and c.css. If a release updates all three links, the caches of b.css and c.css are needlessly invalidated, which is wasteful. So what to do? The idea is to tie the file name (URL) to the content: use a digest algorithm to derive summary information from the file, so the digest corresponds one-to-one with the content. That gives a basis for cache control precise to the granularity of a single file.

But even this falls short in large projects. To further improve performance, static resources and dynamic pages are deployed on separate clusters, with static resources pushed to CDN nodes (such as Qiniu or the company’s own CDN). Now static resources and the HTML live on different servers, which raises the question: which do you deploy first? On a low-traffic project, the helpless developers may wait until the middle of the night to quietly deploy the static resources first and then the pages, and the problem seems smaller.

Static resource optimization in large companies basically achieves the following things:

  • Configure extremely long local caching – saves bandwidth and improves performance
  • Use content summaries as the basis for cache updates – precise cache control
  • Static resource CDN deployment — Optimizing network requests
  • More resource publishing path for non-overwriting publishing – smooth upgrade

In addition, when packaging with webpack, plugins make it very easy to handle this using hashes.

Cache judgment order

  1. If Cache-Control is present and its max-age has not elapsed, return 200 (from cache).
  2. If there is no Cache-Control but Expires is present and not yet passed, return 200 (from cache).
  3. If Cache-Control is no-cache or the cache has expired, the browser sends a request to the server.
  4. The server compares ETag and Last-Modified: if both match, it returns 304; if either differs, it returns 200 with the new resource.
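The order above can be condensed into a pure function over the headers of the previously cached response (a simplified sketch that only considers no-store, no-cache, max-age, and Expires; the function name is illustrative):

```javascript
// ageSeconds: how long ago the cached response was received.
function cacheDecision(headers, ageSeconds) {
  const cc = headers['cache-control'] || ''
  if (cc.includes('no-store')) return 'request'      // never cached at all
  if (cc.includes('no-cache')) return 'revalidate'   // go straight to negotiation
  const m = cc.match(/max-age=(\d+)/)
  if (m) {                                           // Cache-Control wins over Expires
    return ageSeconds < Number(m[1]) ? '200 from cache' : 'revalidate'
  }
  if (headers['expires']) {
    return Date.parse(headers['expires']) > Date.now() ? '200 from cache' : 'revalidate'
  }
  return 'request'                                   // nothing cacheable known
}
```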

HTTP data negotiation (Content negotiation)

In HTTP, content negotiation is a mechanism that allows the user agent to choose the best match (for example, the natural language of the document, the format of the image, the format of the file, JSON, forms, or the encoding of the content) by providing different representations of the resource pointed to by the same URL.

When a resource is accessed, the selection of specific presentation forms is determined through content negotiation mechanisms, and there are multiple negotiation methods between client and server.

The request declares what it accepts:

Accept: the data formats the client can handle
Accept-Encoding: how the data may be encoded (compression)
Accept-Language: determines the language of the returned message
User-Agent: information about the client browser

Corresponding to this is the server Content:

Content-Type: corresponds to Accept; the client may accept several data formats, and the server picks one of them, declaring the chosen format when the data is returned
Content-Encoding: corresponds to Accept-Encoding; declares how the returned data is compressed
Content-Language: corresponds to Accept-Language; whether the response uses the requested language

Code test:

server.js

const http = require('http')
const fs = require('fs')
const zlib = require('zlib') // import the compression module

http.createServer(function (request, response) {
  console.log('request come', request.url)

  const html = fs.readFileSync('test.html') // Buffer, ready for gzip
  response.writeHead(200, {
    'Content-Type': 'text/html',
    // 'X-Content-Options': 'nosniff',
    'Content-Encoding': 'gzip'
  })
  response.end(zlib.gzipSync(html)) // gzip the body
}).listen(8888)

console.log('server listening on 8888')

test.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Document</title>
</head>
<body>
  <form action="/form" id="form" enctype="application/x-www-form-urlencoded">
    <input type="text" name="name">
    <input type="password" name="password">
    <input type="submit">
  </form>
</body>
</html>

To execute server.js, run the node server.js command and open http://localhost:8888 in the browser

Change the form submission mode in test.html to POST

  <form action="/form" method="POST" id="form" enctype="application/x-www-form-urlencoded">

The server parses the body data according to the Content-Type, which here is application/x-www-form-urlencoded.

After refreshing the browser, fill in the form content and submit

Next, add file upload. When uploading a file through a form, the file part must be carried separately: a file cannot be transmitted as a string, it has to be sent as binary data, so the form must use multipart/form-data.

test.html changes:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Document</title>
</head>
<body>
  <form action="/form" method="POST" id="form" enctype="multipart/form-data">
    <input type="text" name="name">
    <input type="password" name="password">
    <input type="file" name="file">
    <input type="submit">
  </form>
  <script>
    var form = document.getElementById('form')
    form.addEventListener('submit', function (e) {
      e.preventDefault()
      var formData = new FormData(form)
      fetch('/form', {
        method: 'POST',
        body: formData
      })
    })
  </script>
</body>
</html>

Our form sets enctype="multipart/form-data".

Capturing the request in the browser, the Content-Type request header is:

Content-Type: multipart/form-data; boundary=----WebKitFormBoundarydHvHzymplSP4CAMk

The request body is:

------WebKitFormBoundarydHvHzymplSP4CAMk
Content-Disposition: form-data; name="name"

111
------WebKitFormBoundarydHvHzymplSP4CAMk
Content-Disposition: form-data; name="password"

222
------WebKitFormBoundarydHvHzymplSP4CAMk
Content-Disposition: form-data; name="file"; filename="16ba0ae535359e94.jpg"
Content-Type: image/jpeg


------WebKitFormBoundarydHvHzymplSP4CAMk--

The boundary value is used to divide the submitted form data into its separate parts.

The server receives the form data and splits it apart on that boundary string.
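As an illustration of that splitting (a string-based sketch only — real parsers work on raw bytes, since file parts are binary):

```javascript
// Split a multipart/form-data body into its named parts.
function parseMultipart(body, boundary) {
  return body
    .split('--' + boundary)                 // cut at every boundary marker
    .map(part => part.trim())
    .filter(part => part && part !== '--')  // drop the preamble and final '--'
    .map(part => {
      // Headers and value are separated by a blank line (CRLF CRLF).
      const [rawHeaders, ...rest] = part.split('\r\n\r\n')
      const name = (rawHeaders.match(/name="([^"]+)"/) || [])[1]
      return { name, value: rest.join('\r\n\r\n') }
    })
}
```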

The compression

The zlib module can be used to compress and decompress data, which reduces size, speeds up transmission, and saves bandwidth.

Accept-Encoding: gzip // the client declares it accepts gzip

The idea behind Gzip compression is to find recurring strings in a text file and temporarily replace them to make the entire text smaller. According to this principle, the higher the rate of code repetition in a file, the more efficient the compression and the greater the benefits of using Gzip. And vice versa.

Basically, Gzip is all server work, like Nginx

Compression object

The compression and decompression object is a readable and writable stream

  • zlib.createGzip: returns a Gzip stream that compresses data with the gzip algorithm
  • zlib.createGunzip: returns a Gunzip stream that decompresses gzip-compressed data
  • zlib.createDeflate: returns a Deflate stream that compresses data with the deflate algorithm
  • zlib.createInflate: returns an Inflate stream that decompresses deflate-compressed data

Code examples:

var zlib = require('zlib');
var fs = require('fs');
var http = require('http');

var request = http.get({
    host: 'localhost',
    path: '/index.html',
    port: 9090,
    headers: {
        'accept-encoding': 'gzip,deflate'
    }
});

request.on('response', function (response) {
    var output = fs.createWriteStream('test.txt');
    // Pick the decompression stream that matches the server's encoding.
    switch (response.headers['content-encoding']) {
        case 'gzip':
            response.pipe(zlib.createGunzip()).pipe(output);
            break;
        case 'deflate':
            response.pipe(zlib.createInflate()).pipe(output);
            break;
        default:
            response.pipe(output);
            break;
    }
});
request.end();

Href vs. SRC

Href (Hypertext Reference) specifies the location of a network resource to define a link or relationship between the current element or document and the desired anchor or resource defined by the current attribute. (The purpose is not to reference the resource, but to establish a connection so that the current tag can link to the target address.)

src (short for source) specifies the location of an external resource that will be embedded where the current tag sits in the document.

The differences between href and src:

1. They request different kinds of resources: href points to the location of a network resource and establishes a relationship between it and the current element (anchor) or document (link). src downloads the resource it points to and applies it to the document, e.g. JavaScript scripts and img images.

2. href establishes a link between the current document and the referenced resource; src replaces the element’s content with the referenced resource.

3. Browsers parse them differently: when the browser encounters src, it pauses the downloading and processing of other resources until the referenced resource is loaded, compiled, and executed (the same applies to images and frames), as if the pointed-to resource were applied in place of the current content. This is why it is recommended to put JS scripts at the bottom of the page rather than in the head.

HTTP Redirect

A redirect adds a Location header to the response, and the response code is 3xx.

Principle:

In the HTTP protocol, redirects are triggered by the servers by sending special responses (redirects). The status code of the HTTP redirection response is 3XX. When the browser receives a redirect response, it takes the new URL provided by the response and loads it immediately. Most of the time, the redirection operation is invisible to the user, except for a small performance penalty.

The different types of redirection mappings can be divided into three categories:

  • Permanent redirection: 301, 308. This indicates that the original URL should no longer be used and the new URL should be preferred. When the search engine robot encounters the status code, it triggers an update operation to modify the URL associated with the resource in its index library. Mostly used for site reconstruction.
  • Temporary redirects: 302, 303, 307. Sometimes the requested resource cannot be accessed from its standard address, but can be accessed from another location. Temporary redirection can be used in this case. The search engine does not log this new, temporary link. Temporary redirects can also be used to display temporary progress pages when creating, updating, or deleting resources.
  • Special redirections: 300 and 304. 304 (Not Modified) sends the page to its locally cached copy (whose strong cache had expired), and 300 (Multiple Choice) is a manual redirection: the message body, rendered by the browser as a web page, lists the possible redirection links, and the user chooses one.

Priority:

  1. The HTTP protocol’s redirection mechanism is always the first to trigger, even when no pages are delivered — and therefore no pages are read.
  2. The HTML redirection mechanism (`<meta http-equiv="refresh">`) is triggered if the HTTP protocol redirection mechanism is not set.
  3. JavaScript’s redirection mechanism is always used as a last resort and only works if the client has JavaScript turned on.

Whenever possible, use HTTP redirection rather than the `<meta>` tag. If a developer changes the HTTP redirects and forgets to change the HTML page redirects, the two become inconsistent, resulting in an infinite loop or some other nightmare.

Setting method:

1. HTML redirects. This mechanism breaks the browser’s back button: you can return to the page containing this header, but it immediately jumps forward again.

<head> 
  <meta http-equiv="refresh" content="0; URL=http://www.a.com/" />
</head>

2. JavaScript redirects

window.location = "http://www.a.com/";

3. Respond in the server

Suppose the client makes a request to the server:

const http = require('http')

http.createServer(function (request, response) {
  console.log('request come', request.url)

  if (request.url === '/') {
    response.writeHead(302, {  
      'Location': '/new' 
    })
    response.end()
  }
  if (request.url === '/new') {
    response.writeHead(200, {
      'Content-Type': 'text/html',
    })
    response.end('<div>this is content</div>')
  }
}).listen(8888)

console.log('server listening on 8888')


Ps: Be careful with permanent redirection. Once it is used, if the server later changes its routing, users who cached the redirect will keep being redirected until they clear their browser cache.

HTTP workflow

General communication process: First, the client sends a request to the server. After receiving the request, the server generates a response and returns it to the client.

An HTTP operation is called a transaction, and it works in four steps:

1) First, the client and server need to establish a connection. Just click on a hyperlink and the HTTP work begins.

2) After the connection is established, the client sends a request to the server, in the format: uniform resource identifier (URL), protocol version number, followed by MIME-style information including request modifiers, client information, and possible body content.

3) After receiving the request, the server will give the corresponding response information in the form of a status line, including the protocol version number of the information, a successful or error code, followed by MIME information including server information, entity information and possible content.

4) The client receives the information returned by the server and displays it on the user’s display screen through the browser. Then the client disconnects from the server.

If an error occurs in any of the preceding steps, the error message is returned to the client with display output. For the user, this process is done by HTTP itself. The user just clicks the mouse and waits for the information to appear.

HTTPS

HTTPS (Hypertext Transfer Protocol over Secure Socket Layer) is HTTP over a secure channel; in short, it is the secure version of HTTP, i.e. HTTP with an SSL/TLS layer added. The port used is 443.

HTTPS = HTTP+TLS/SSL

Advantages of HTTPS communication:

  • 1) The key generated by the client can only be obtained by the client and the server;
  • 2) Only the client and server can get plaintext for encrypted data;
  • 3) The communication between the client and the server is secure.

The main differences between HTTPS and HTTP are as follows:

  1. HTTPS requires applying for a certificate from a certificate authority (CA). Most certificates cost money, though a few free ones exist.
  2. HTTP runs directly on top of TCP and everything is transmitted in plain text; HTTPS runs on top of SSL/TLS, which in turn runs on top of TCP, and everything transmitted is encrypted.
  3. HTTP and HTTPS use completely different connections and different ports: 80 for the former and 443 for the latter.
  4. HTTP connections are simple and stateless; HTTPS (HTTP + SSL) encrypts transmission and authenticates identity, effectively preventing carrier hijacking, and is more secure than HTTP.

The HTTPS protocol basically relies on TLS/SSL, and TLS/SSL relies on three basic algorithms:

  • Hash functions: verify the integrity of the information

  • Symmetric encryption: The symmetric encryption algorithm uses a negotiated key to encrypt data. There is only one key, and the encryption and decryption methods are the same password with high encryption and decryption speed. Typical symmetric encryption algorithms include DES, AES, RC5, and 3DES.

    The main problem with symmetric encryption is sharing the key: unless the client already knows the server’s secret key, it cannot encrypt and decrypt the traffic. Asymmetric keys solve this problem.

  • Asymmetric encryption: implements identity authentication and key negotiation using a key pair: a public key and a private key. The private key is kept secret by one party (usually the server); the public key can be obtained by anyone.

    Such keys come in pairs (the private key cannot be deduced from the public key, nor the public key from the private key). Encryption and decryption use different keys (data encrypted with the public key is decrypted with the private key, and vice versa). Compared with symmetric encryption, it is slow.

Code examples:

1, symmetric encryption:

const crypto = require('crypto');

function encrypt(data, key, iv) {
    let cipher = crypto.createCipheriv('aes-128-cbc', key, iv);
    // Concatenate update() and final() so no ciphertext is discarded.
    return cipher.update(data, 'utf8', 'hex') + cipher.final('hex');
}

function decrypt(data, key, iv) {
    let decipher = crypto.createDecipheriv('aes-128-cbc', key, iv);
    return decipher.update(data, 'hex', 'utf8') + decipher.final('utf8');
}

let key = '1234567890123456'; // 16 bytes for aes-128
let iv = '1234567890123456';  // 16-byte initialization vector
let data = "hello";
let encrypted = encrypt(data, key, iv);
console.log("After data encryption :", encrypted);
let decrypted = decrypt(encrypted, key, iv);
console.log("After data decryption :", decrypted);

2, asymmetric encryption:

let { generateKeyPairSync, privateEncrypt, publicDecrypt } = require('crypto');

let rsa = generateKeyPairSync('rsa', {
    modulusLength: 1024,
    publicKeyEncoding: {
        type: 'spki',
        format: 'pem'
    },
    privateKeyEncoding: {
        type: 'pkcs8',
        format: 'pem',
        cipher: 'aes-256-cbc',
        passphrase: 'server_passphrase'
    }
});

let message = 'hello';
let enc_by_prv = privateEncrypt({
    key: rsa.privateKey, passphrase: 'server_passphrase'
}, Buffer.from(message, 'utf8'));
console.log('encrypted by private key: ' + enc_by_prv.toString('hex'));

let dec_by_pub = publicDecrypt(rsa.publicKey, enc_by_prv);
console.log('decrypted by public key: ' + dec_by_pub.toString('utf8'));

Using MD5 hashing:

var crypto = require('crypto');
var content = '123456';
var result = crypto.createHash('md5').update(content).digest("hex")
console.log(result); // 32 hex characters = 128 bits

Sha256 encryption:

const crypto = require('crypto');
const content = '123456';
const salt = '123456';
const sha256 = str => crypto.createHmac('sha256', salt)
    .update(str, 'utf8')
    .digest('hex');

let ret = sha256(content);
console.log(ret); // 64 hex characters = 256 bits

The HTTPS process is as follows:

1) After the SSL client establishes a connection with the server through TCP (port 443), it requests the certificate during the general TCP connection negotiation (handshake).

The client sends the server a message listing the algorithms it supports and other required information. The server responds with a packet that fixes the algorithms to use for this communication and returns its certificate to the client. The certificate contains server information: the domain name, the organization that applied for the certificate, and the public key.

2) After receiving the certificate returned by the server, the Client determines the public issuing authority that issued the certificate and uses the public key of this institution to confirm whether the signature is valid. The Client also ensures that the domain name listed in the certificate is the domain name it is connecting to.

3) If the certificate is confirmed to be valid, a symmetric key is generated and encrypted using the server’s public key. It is then sent to the server, which decrypts it using its private key so that the two computers can begin communicating with symmetric encryption.

ps:

SSL: Secure Sockets Layer, a secure transport protocol originally designed by Netscape and widely used on the Web. Certificate authentication ensures that communication data between the client and the web server is encrypted and secure.

TLS: Transport Layer Security, the standardized successor to SSL.

Cross domain

When a resource requests a resource from a different domain, protocol, or port than the server on which the resource itself resides, the resource makes a cross-domain HTTP request. Cross-domain resource sharing (CORS) is a mechanism that uses additional HTTP headers to tell browsers to allow Web applications running on one Origin (domain) to access specified resources from different source servers.

See: juejin.cn/post/684490…

HTTP CSP (Content Security Policy)

Content Security Policy (CSP) is an additional layer of security that helps detect and mitigate certain types of attacks, including cross-site scripting (XSS) and data injection attacks. These attacks are used for everything from data theft to site defacement and malware distribution.

Syntax format:

Content-Security-Policy: default-src 'self'; img-src 'self' data:; media-src mediastream:

Supported policy directives:

1. default-src

The default-src directive defines the default security policy for resource types not covered by more precise directives. It acts as a fallback for the following directives:

  • child-src
  • connect-src
  • font-src
  • img-src
  • media-src
  • object-src
  • script-src
  • style-src

Content source:

There are three types of content sources: source lists, keywords, and data

Keywords:

'none' matches nothing; no URL is allowed. The single quotes are required.

'self' means the same origin as the document, including the same URL scheme and port number. The single quotes are required.

'unsafe-inline' allows inline resources, such as inline <script> elements, inline event handlers, and inline styles. The single quotes are required.

'unsafe-eval' allows methods such as eval() that create code from strings. The single quotes are required.

Code examples

Webmasters expect all content to come from the site itself (excluding subdomains).

Content-Security-Policy: default-src 'self'

The webmaster wants to allow content from a trusted domain and all of its subdomains (the trusted domain does not have to be the one the CSP is set on).

Content-Security-Policy: default-src 'self' *.a.com

Webmasters want to allow users of Web applications to include images from any source in their content, but limit audio or video media to trusted providers, and limit all scripts to a specific server hosting trusted code.

Content-Security-Policy: default-src 'self'; img-src *; media-src media1.com media2.com; script-src userscripts.example.com

Here, by default, only content from a document source is allowed, except when:

  • Images can be loaded from anywhere (note the “*” wildcard).

  • Media is only allowed from media1.com and media2.com (not from the subdomains of those sites).

  • Executable scripts are only allowed from userscripts.example.com.
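To serve such a policy, the header value is just a semicolon-separated list of directives. A small sketch in Node.js (the directive names come from the examples above; the builder function itself is illustrative):

```javascript
// Sketch: building a CSP header value from a directives object before
// attaching it to responses. The buildCSP helper is illustrative.
function buildCSP(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => `${name} ${sources.join(' ')}`)
    .join('; ');
}

const policy = buildCSP({
  'default-src': ["'self'"],
  'img-src': ['*'],
  'media-src': ['media1.com', 'media2.com'],
  'script-src': ['userscripts.example.com'],
});

console.log(policy);
// default-src 'self'; img-src *; media-src media1.com media2.com; script-src userscripts.example.com

// In a Node server you would then do:
//   res.setHeader('Content-Security-Policy', policy);
```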

For more information: developer.mozilla.org/zh-CN/docs/…

Web attack

The best defenses against these attacks live on the back end; the front end can do relatively little. And because front-end source code is exposed, it is easy for attackers to bypass front-end defenses. That does not make this knowledge meaningless, though; many of the methods in this article are also useful elsewhere.

Why is HTTP not secure? What are the disadvantages?

Communications use plain text (no encryption), so content can be eavesdropped

The communicating parties are not authenticated, so impersonation is possible

Message integrity cannot be proved, so messages may have been tampered with

1. Communications can be eavesdropped

  • HTTP itself does not have the function of encryption, so it cannot encrypt the entire communication (the content of requests and responses communicated using HTTP). HTTP packets are sent in plaintext.
  • Because the Internet is composed of network equipment connecting all parts of the world, data sent and received can be intercepted at any of those devices (for example, with the well-known packet capture tool Wireshark). Encrypted packets can still be captured, but they are difficult or impossible to read.

2. Authentication issues

  • You cannot confirm that the server you are sending to is the real target server (it may be an impostor)
  • You cannot confirm that the client receiving the response is the intended one (it may be an impostor)
  • You cannot determine whether the other party has access permission: a Web server will accept even meaningless requests for information that should only be served to specific users, and massive-request denial-of-service (DoS) attacks cannot be prevented

3. It may be tampered with

A man-in-the-middle attack (MITM) in which an attacker intercepts and modifies a request or response in transit.

HTTPS solves these three problems

HTTPS is the HTTP protocol run over SSL or its successor TLS, which encrypts data, verifies the identity of the peer, and protects data integrity. Its features are:

  • Content encryption: hybrid encryption is used, so intermediaries cannot directly view the plain-text content
  • Authentication: the certificate lets the client verify it is talking to the genuine server
  • Data integrity: prevents transmitted content from being impersonated or tampered with by middlemen

HTTP Content-Length limitation vulnerability leading to denial of service

When using the POST method, the client can set Content-Length to declare the length of the data to be sent, for example Content-Length: 999999999. The server's memory is not released until the transfer completes, so such oversized declarations tie up resources, and this method of attack leaves little or no trace.

Some ideas of denial-of-service attack using the characteristics of HTTP protocol

The server becomes busy processing forged TCP connection requests from the attacker and has no capacity left for normal client requests (which make up a tiny fraction of the traffic by comparison). From a normal client's perspective, the server simply stops responding. This is known as a SYN Flood attack on the server.

Smurf and TearDrop attacks use ICMP packet floods and IP fragmentation, respectively. Here we consider denial-of-service attacks generated through "normal connections".

Port 19 was used for Chargen attacks in the early days (Chargen_Denial_of_Service), but the method then was to create a UDP connection between two Chargen servers so that each flooded the other with output until the servers went down. To kill a WEB server this way, two conditions must hold: 1. the Chargen service exists; 2. the HTTP service exists.

Method: the attacker sends connection requests (Connect) with a forged source IP to N Chargen servers. Each Chargen server accepts the connection and returns a character stream of about 72 bytes per second (in practice faster, depending on network conditions) to the victim server.

HTTP header injection

An attacker inserts line breaks (CRLF) into HTTP header values, allowing arbitrary headers to be injected or the response to be split.

CSRF attacks

CSRF stands for cross-site request forgery. Suppose http://a.com has a GET interface for following a user, where the id parameter is the ID of the user to follow, as follows:

http://a.com?id=12

Then I just need to write an IMG tag in one of my pages:

<img src="http://a.com?id=12" />

Then, as long as a user who is logged in to http://a.com opens my page, they will automatically follow me. Even if it is a POST request, it can be submitted automatically via a form on the page. The CSRF attack exploits the Web's implicit authentication mechanism: the Web's authentication can guarantee that a request comes from a user's browser, but it cannot guarantee that the request was approved by the user. CSRF attacks are generally mitigated on the server. You can observe the following rules to defend against them:

  1. Get requests are not used to modify data
  2. Set the HttpOnly attribute on cookies
  3. Restrict cross-site requests (for example, by setting the SameSite attribute on cookies)
  4. The request is accompanied by authentication information, such as a captcha or Token

CSRF attack defense

1. Verification code. Verification code is regarded as the most concise and effective defense against CSRF attacks.

2. Referer Check. According to the HTTP protocol, there is a field in the HTTP header called Referer, which records the source address of the HTTP request. With the Referer Check, you can Check whether the request is from a legitimate “source.” Referer Check is not only used to protect against CSRF attacks, but also for “preventing image theft”.

We can create a whitelist of legitimate source pages. When a request arrives whose Referer is not in the whitelist, it is treated as a CSRF attack and rejected as an illegal request.

Server code:

if (req.headers.referer !== 'http://www.c.com:8002/') {
    res.write('CSRF attack detected');
    return;
}

3. Add token authentication

The key to defending against CSRF is to put information in the request that an attacker cannot forge and that does not exist in a Cookie. A randomly generated token can be added to the HTTP request as a parameter, and an interceptor can be established on the server side to verify the token. If there is no token in the request or the token content is incorrect, the request may be rejected as a CSRF attack.

XSS attacks

XSS Cross-site scripting refers to the use of vulnerabilities to inject malicious code into Web pages. The injected code is executed when users browse the page to achieve the specific purpose of the attack.

XSS attacks can be divided into three categories: reflective (non-persistent), storage (persistent), and DOM based.

Common injection methods

<a href="javascript:alert(1)"></a>
<iframe src="javascript:alert(1)" />
<img src='x' onerror="alert(1)" />
<video src='x' onerror="alert(1)" ></video>
<div onclick="alert(1)" onmouseover="alert(2)" ></div>

XSS attack defense

Mainstream browsers now have built-in safeguards against XSS, such as CSP. But it is also important for developers to find reliable solutions to prevent XSS attacks.

1. HttpOnly prevents cookie hijacking:

response.addHeader("Set-Cookie", "uid=112; Path=/; HttpOnly")

2. Input checking. Content entered by a user may be an injected script; if it is saved to the server as-is, it will be executed when it is later rendered into a page.

Solution: Check, filter, and escape any input from the user. Creates a trusted whitelist of characters and HTML tags, and filters or encodes characters or tags that are not in the whitelist.

In XSS defense, input check is generally used to check whether special characters such as <, > are contained in the data entered by users. If so, special characters are filtered or encoded. This method is also called XSS Filter.

const decodingMap = {
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&amp;': '&',
  '&#10;': '\n'
}

3. Output checking. Server output can also be problematic. In general, except for rich-text output, encoding or escaping can be used to defend against XSS whenever a variable is output to an HTML page. For example, sanitize-html can be used to filter the output before writing it into the page.
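A common escaping helper for non-rich-text output might look like this (a sketch; for rich text, use a vetted library such as sanitize-html):

```javascript
// Escape the characters that are dangerous in an HTML context.
// '&' must be replaced first so later entities are not double-escaped.
function escapeHTML(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHTML('<img src=x onerror="alert(1)">'));
// &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```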

Preventing file upload attacks

Guard:

1. The file upload directory is set to unexecutable

2. Determine the file type. Set the file type whitelist by combining the MIME type with the file extension. For image files, you can use the image library function to further check if it is really an image.

3. Rename the file name.

4. The file server uses an independent domain name.

SQL injection

Protection: separate data from code, that is, use SQL prepared statements instead of string concatenation to build SQL statements (using ? placeholders for parameters).

XST processing

XST(Cross-site tracing) attack defense: Disable TRACE methods on Web servers.

Summary of answers to questions

Describes the TCP three-way handshake

  • First handshake: establishing the connection. The client sends a SYN packet with seq = x (some value), enters the SYN_SENT state, and waits for confirmation from the server.
  • Second handshake: the server receives the client's SYN packet. It acknowledges the SYN with ack = x+1 and also sends its own SYN with seq = y. The server sends both in one packet and enters the SYN_RECV state.
  • Third handshake: the client receives the server's SYN+ACK packet and acknowledges it with ack = y+1 and seq = x+1. After this ACK is sent, client and server enter the ESTABLISHED state, completing the TCP three-way handshake.

The reason for the third handshake is to prevent a stale connection request segment from suddenly arriving at the server and causing an error. Suppose the client's first connection request is delayed in the network, so the client sends another request and successfully establishes a connection; after the data transfer, the connection is released. If, some time later, the first delayed segment reaches the server and the server confirms it, then without the third handshake the server would sit waiting for the client to send data, wasting resources.

Why is there a three-way handshake when you connect and a four-way handshake when you close?

A: When the Server receives a SYN request packet from the Client, it can send a single SYN+ACK packet, because the ACK is the reply and the SYN carries the synchronization. But when the Server receives a FIN packet, its socket may not be ready to close immediately, so it first replies with an ACK to tell the Client "I received your FIN packet". Only after all remaining data on the Server side has been sent can the Server send its own FIN. The ACK and FIN therefore cannot be combined, which is why closing takes four steps.

What happens from the time you enter the URL to the time the page loads?

See: How Browsers Work

1. The user enters the URL.

2. The browser first queries the records in its own cache, the system cache, and the router cache. If there is no hit, the browser queries the local hosts file and then sends a domain name resolution request to the DNS server.

3. The browser sends a request to the DNS server to resolve the IP address corresponding to the domain name in the URL.

4. After resolving the IP address, the browser initiates a TCP connection request to the specified port of the server using the IP address and the default port 80 (or the port number in the URL, if one was entered). The request travels down through the transport, network, data link, and physical layers to the server, and the TCP connection is established after the three-way handshake;

5. After three handshakes, data can be transmitted. The content to be sent by the client is constructed into HTTP request packets and encapsulated in TCP packets, which are sent to the specified port of the server through TCP.

6. The server parses the HTTP request, encapsulates it into an HTTP object based on the packet format, and returns it to the browser.

7. After the data transfer completes, the connection between client and server is released through the four-way close.

8. The browser parses and renders the page according to the received response message

Ps: Browser parsing and rendering

The browser parses the response and renders the page. Before rendering, it needs to build the DOM tree and the CSSOM tree; parsing and rendering proceed in the following steps.

1. Build the DOM tree

An HTML document is parsed into a DOM tree with document as the root. If JavaScript is encountered during parsing, parsing pauses while the script is downloaded and executed, which causes blocking; therefore it is recommended to place JavaScript scripts at the end of the HTML file.

2. Build the CSSOM tree

Browsers parse CSS from external, internal, and inline styles to build the CSSOM tree.

3. Build render trees and layouts

Once the DOM and CSSOM trees are built, they merge into a render tree, and the browser confirms the location of each element on the page.

4. Page rendering and optimization

The browser draws the page based on the layout result and optimizes the page content to reduce CPU consumption.

After rendering, the entire page is in front of us.

How to implement communication between multiple tabs in the browser?

1. WebSocket: because WebSocket is full-duplex, it can be used to relay messages between multiple tabs (via the server);

3. localStorage: storage shared by all tabs of the same origin, so it can be used for communication between tabs;

Note: the storage event (window.onstorage) fires only in other tabs of the same origin when localStorage is modified; it does not fire in the tab that made the change.

window.onstorage = (e) => { console.log(e) }

or

window.addEventListener("storage", function (event) {
    $("#name").val(event.key + "=" + event.newValue);
});

Note a quirk: Safari throws a QuotaExceededError when setting localStorage in private browsing mode.

4. Use cookie+setInterval

This method uses a timer to poll constantly, which wastes resources; it works, but it is not elegant.

Ideas:

Set a setInterval timer on page A that keeps checking whether the cookie value has changed, and refresh when it does.

Since cookies are readable within the same domain, page B can change the cookie value, and page A will naturally pick it up.
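The polling idea can be sketched with a small change detector. The helper is written as a pure function so it runs anywhere; in a browser you would pass `() => document.cookie` and poll with setInterval:

```javascript
// Generic change detector behind the cookie-polling approach.
// makeChangeDetector is an illustrative helper, not a standard API.
function makeChangeDetector(getValue) {
  let last = getValue();
  return function check() {
    const current = getValue();
    const changed = current !== last;
    last = current;
    return changed;
  };
}

// In a browser:
//   const check = makeChangeDetector(() => document.cookie);
//   setInterval(() => { if (check()) location.reload(); }, 1000);

// Simulated here with a plain variable:
let fakeCookie = 'a=1';
const check = makeChangeDetector(() => fakeCookie);
console.log(check()); // false (unchanged)
fakeCookie = 'a=2';
console.log(check()); // true (changed)
```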

Differences between TCP and UDP

TCP provides a connection-oriented, reliable data transfer service; data is transmitted in segments.

UDP provides a connectionless, best-effort data transfer service; data is transmitted as user datagrams.

In short: compared with UDP, TCP is connection-oriented, byte-stream based, and reliable.

What do you know about HTTP response codes? What do they mean?

Common status code

200 OK: the request succeeded and the data sent by the client was processed normally.
301 Moved Permanently: the resource has been permanently redirected to another URL.
302 Found: the resource is temporarily redirected to another URL.
303 See Other: similar to 302, but the redirected request must use GET.
304 Not Modified: the resource has not changed since the conditional request (If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since).
307 Temporary Redirect: like 302, but the request method must not change.
400 Bad Request: a syntax error exists in the client request.
401 Unauthorized: the client request requires authentication.
403 Forbidden: the client request is refused by the server.
404 Not Found: the requested resource cannot be found on the server.
500 Internal Server Error: the server encountered an error while processing the request.
503 Service Unavailable: the server is overloaded or down for maintenance.
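The status codes group naturally by their leading digit; a tiny helper makes the classes explicit:

```javascript
// Classify a status code by its class (the leading digit).
function statusClass(code) {
  if (code >= 200 && code < 300) return 'success';
  if (code >= 300 && code < 400) return 'redirection';
  if (code >= 400 && code < 500) return 'client error';
  if (code >= 500 && code < 600) return 'server error';
  return 'informational or unknown';
}

console.log(statusClass(200)); // success
console.log(statusClass(304)); // redirection
console.log(statusClass(404)); // client error
console.log(statusClass(503)); // server error
```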

HTTP protocol workflow?

  • 1. Address resolution

  • 2. Encapsulate HTTP request packets

  • 3. Encapsulate TCP packets and establish TCP links (TCP three-way handshake)

  • 4. The client sends request commands

  • 5. The server responds

  • 6. The server closes the TCP connection

What is a long connection (persistent connection) and why is it needed?

HTTP/1.1 enables persistent connections by default. After a transfer completes, the TCP connection is kept open (not closed with the four-way handshake), waiting to carry further data for the same domain; multiple HTTP requests and responses can be transmitted on one TCP connection.

Why long connections?

Creating a TCP connection is expensive because of the three-way handshake and TCP slow start. The more external resources a web page loads, the more pronounced this problem becomes.

As a result, HTTP 1.0 has poor performance. Workaround: When sending the request, set the field:

Connection: keep-alive

This field requires that the server not close the TCP connection so that other requests can be reused. The server also responds to this field. A TCP connection is established that can be reused until the client or server actively closes the connection. However, this is not a standard field and may not behave consistently across implementations, so it is not a fundamental solution.

In HTTP/1.1, persistent connections are the default, so Connection: keep-alive does not need to be declared explicitly.

Why does HTTP/2 channel multiplexing improve performance?

In HTTP/2, there are four major features: header compression, binary streaming, server push, and multiplexing.

Multiplexing improves performance because:

  • The same domain name occupies only one TCP connection. Multiple requests and responses are sent in parallel using the same connection. In this way, the download process of the entire page resources requires only a slow start and avoids the problem caused by multiple TCP connections competing for bandwidth.
  • Multiple requests/responses are sent in parallel and interleaved, without affecting each other.
  • In HTTP/2, each request can have a priority value of 31 bits. 0 indicates the highest priority. A higher value indicates a lower priority. With this priority value, clients and servers can take different policies when dealing with different streams to optimally send streams, messages, and frames.

What optimizations for HTTP/1.1 don’t apply to HTTP/2?

  • JS file merging. A main optimization direction under HTTP/1.x is to reduce the number of HTTP requests: at release time we compress and merge all module code into one file, so no matter how many modules there are, only one file is requested. But this has a serious caching problem: with 100 modules merged, changing one module forces the browser to redownload the entire file, and the cache is useless. With HTTP/2, modules can be compressed and shipped independently without invalidating the cache of unmodified modules.
  • Multiple domains to improve download speed. Under HTTP/1.x we would put CSS files and JS files under two domain names so the browser could download both types in parallel, working around the browser's limit of about 6 connections per domain. The drawbacks are obvious: 1. DNS resolution takes longer; 2. server pressure increases. With HTTP/2 multiplexing, by the reasoning above, this is no longer necessary.

Some existing problems with HTTP1.0 and 1.1

  1. HTTP/1.x needs to establish a new connection for each transfer (without keep-alive), which adds significant latency, especially on mobile.
  2. HTTP/1.x transmits data in plain text, and client and server cannot verify each other's identity.
  3. HTTP/1.x headers are large and change little between requests, which adds transmission cost, especially on mobile.
  4. Although HTTP/1.x supports keep-alive to compensate for the latency of repeated connection setup, overusing keep-alive also puts significant pressure on the server, and for services where single files are requested sporadically (such as image hosting sites), keep-alive can noticeably hurt performance because it keeps the connection open unnecessarily after the file has been delivered.

The difference between HTTPS and HTTP

  1. HTTPS requires applying for a certificate from a certificate authority (CA); most certificates are paid, though a few free ones exist.
  2. HTTP runs on top of TCP and all transmitted content is plain text; HTTPS runs on top of SSL/TLS, which runs on top of TCP, and all transmitted content is encrypted.
  3. HTTP and HTTPS use different connections and different default ports: 80 for HTTP and 443 for HTTPS.
  4. HTTP connections are simple and stateless; HTTPS is HTTP over SSL/TLS, with encrypted transport and identity authentication, which effectively prevents carrier hijacking and is more secure than HTTP.

How do I change HTTP to HTTPS?

If a site is replacing HTTP with HTTPS for its entire site, the following concerns may be required:

  1. Install a CA certificate; most certificates require payment.
  2. After purchasing the certificate, configure your domain name on the site provided by the certificate vendor, download the certificate, configure your Web server, and modify the code.
  3. HTTPS slows down user access speed. For an SSL handshake, HTTPS reduces speed to a certain extent. However, the impact of HTTPS on speed is acceptable as long as it is properly optimized and deployed. In many cases, HTTPS is as fast as HTTP, and even faster if you use SPDY.
  4. HTTPS reduces the access speed, but it is more important to worry about the CPU pressure on the server side. A large number of key algorithm calculations in HTTPS consume a lot of CPU resources. Only with sufficient optimization, the machine cost of HTTPS will not increase significantly.

What is caching?

A cache is a repository of HTTP responses that keeps copies of commonly used pages closer to the client.

HTTP caching mechanism

HTTP caching is divided into strong (mandatory) caching and negotiated caching (also called comparison caching), according to whether a request must be re-sent to the server.

  • When the strong cache hits, no interaction with the server is needed; negotiated caching always requires interacting with the server, whether or not the cached copy is ultimately used
  • The two kinds of rules can exist at the same time, and the strong cache has higher priority: when the strong cache hits, the cached copy is used directly and the negotiated cache rule is not evaluated

How to make efficient use of cache and live front-end code?

1, the cache time is too long, released online, the client still uses the cache, there will be bugs

2. The cache time is too short, and too many files are repeatedly loaded, which wastes bandwidth

My general approach is:

Step 1: Do not allow HTML files to be cached; request the server every time the HTML is accessed. The browser therefore always gets the latest HTML, which in turn references the latest static resources (CSS, JS, images, audio, etc.).

Step 2: Append a per-build version number, or a timestamp/fingerprint, to the original file names of static resources (CSS, JS, images, audio, etc.). The advantage is that you can tell which build produced which resources and which resources changed; if something breaks you can roll back quickly. This is still not optimal;

The static resource optimization approach at large companies is to name each resource file after a digest of its content and include the digest in the publish path. A modified resource thus becomes a new file when published and never overwrites an existing resource file. During release, static resources are fully deployed first, and then the pages are rolled out gradually (grayscale deployment), which solves the problem quite cleanly.

Therefore, the static resource optimization program of large companies basically needs to achieve the following things:

  • Configure long local cache times: saves bandwidth and improves performance
  • Use content digests as the basis for cache updates: precise cache control
  • Deploy static resources on a CDN: optimizes network requests
  • Publish resources to non-overwriting paths: smooth upgrades

Step 3: Deploy static resources first, then deploy the HTML.

We can use webpack engineering tools to solve.

How does HTTP handle transfer of large files?

For large file transfer scenarios, HTTP can use range requests, transferring the resource in parts.

In general the options are: compression, chunked transfer, range requests, and multipart data.

What about scope requests?

The server uses the Accept-Ranges response header to tell the client that range requests are supported;

The client uses the Range request header to specify which part to request, in the form Range: bytes=x-y

The bytes=x-y range takes these forms:

  • 0-y: from the start of the file to byte y.
  • y-: from byte y to the end of the file.
  • -y: the last y bytes of the file.
  • x-y: bytes x through y of the file.

Upon receiving the request, the server first verifies that the range is valid and returns a 416 error code if it is out of bounds, otherwise it reads the fragment and returns a 206 status code.

At the same time, the server needs to add the Content-range field, which is formatted differently depending on the Range field in the request header.

For a single segment request, the following response is returned:

HTTP/1.1 206 Partial Content
Content-Length: 10
Accept-Ranges: bytes
Content-Range: bytes 0-9/100
...

Note the Content-Range field: 0-9 is the range returned for the request and 100 is the total size of the resource, which is easy to understand.
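Validating a Range header and building the 206 fields can be sketched like this (single-range only; the `parseRange` helper is illustrative):

```javascript
// Sketch: validating a single-range "Range: bytes=..." header and building
// the 206 response fields. The parseRange helper is illustrative.
function parseRange(rangeHeader, size) {
  const m = /^bytes=(\d*)-(\d*)$/.exec(rangeHeader);
  if (!m || (m[1] === '' && m[2] === '')) return null;
  let start, end;
  if (m[1] === '') {              // "-y": the last y bytes
    start = size - Number(m[2]);
    end = size - 1;
  } else {                        // "x-" or "x-y"
    start = Number(m[1]);
    end = m[2] === '' ? size - 1 : Number(m[2]);
  }
  if (start < 0 || end >= size || start > end) return null; // caller answers 416
  return {
    status: 206,
    contentRange: `bytes ${start}-${end}/${size}`,
    length: end - start + 1,
  };
}

console.log(parseRange('bytes=0-9', 100));
// { status: 206, contentRange: 'bytes 0-9/100', length: 10 }
console.log(parseRange('bytes=200-300', 100)); // null, so respond with 416
```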

Multipart data

Let’s look at multi-segment requests. The resulting response would look like this:

HTTP/1.1 206 Partial Content
Content-Type: multipart/byteranges; boundary=00000010101
Content-Length: 189
Connection: keep-alive
Accept-Ranges: bytes


--00000010101
Content-Type: text/plain
Content-Range: bytes 0-9/96


--00000010101
Content-Type: text/plain
Content-Range: bytes 20-29/96

eex jspy e
--00000010101--

Here's the key field: Content-Type: multipart/byteranges; boundary=00000010101, which conveys the following:

  • The request must be a multi-segment data request
  • The delimiter in the response body is 00000010101

Therefore, segments of data in the response body are separated by the delimiter specified here, and the final delimiter is terminated by –.

What is a proxy?

A proxy is an HTTP intermediary entity between a client and a server. Receives all HTTP requests from clients and forwards them to the server (with possible modifications).

What is a user agent?

A user agent is a client program that initiates HTTP requests on behalf of the user, such as a Web browser. There are also agents that send HTTP requests and retrieve content automatically, such as "Web spiders" or "Web robots".

How to understand HTTP proxies?

In the Web, an HTTP proxy is an entity between a client and a Web server. It can both initiate requests from the client and return responses like a Web server.

The primary difference between a proxy and a gateway is that a proxy can only forward one protocol, whereas a gateway can forward multiple protocols.

HTTP proxies exist in two forms, which are briefly described as follows:

The first is the generic proxy described in RFC 7230 - HTTP/1.1: Message Syntax and Routing (the revision of RFC 2616, the first part of the HTTP/1.1 specification). This proxy acts as a "man in the middle": toward the clients that connect to it, it is a server; toward the server it connects to, it is a client. It is responsible for relaying HTTP packets between the two ends.

The second is described in Tunneling TCP based protocols through Web proxy servers. It communicates through the body of the HTTP protocol and implements a proxy for any TCP-based application-layer protocol over HTTP. This kind of proxy uses HTTP's CONNECT method to establish connections. CONNECT was not originally part of RFC 2616 - HTTP/1.1; it was added in the 2014 HTTP/1.1 revision, where CONNECT and tunnel proxies are described in RFC 7231 - HTTP/1.1: Semantics and Content. In practice, this kind of proxy has long been widely implemented.

What is a gateway?

A gateway is a special type of server that acts as an intermediary entity for other servers. Usually used to convert HTTP traffic to other protocols.

What is a tunnel?

A tunnel is an HTTP application that blindly forwards raw data between two connections once it is established. A common use is to carry encrypted Secure Sockets Layer (SSL) traffic over HTTP connections so that SSL traffic can pass through a firewall that only allows Web traffic.

------------------

The analysis and answers in this article are for reference only; the content draws on the sources below.

References:

Illustrated HTTP

HTTP: The Definitive Guide

taligarsiel.com/Projects/ho…

cloud.tencent.com/document/pr…

httpwg.org/specs/rfc75…

www.runoob.com/http/http-c…

ye11ow.gitbooks.io/http2-expla…

github.com/creeperyang…

www.cnblogs.com/coco1s/p/57…

developer.mozilla.org/zh-CN/docs/…

developer.mozilla.org/zh-CN/docs/…