I recently compiled a set of high-frequency front-end interview questions and am sharing them here for study. If you find any mistakes, please correct me!

Note: Updated on 2021.5.21 to fix some errors in this article. A mind map has been added that roughly groups the questions by how often they come up, based on my own interview experience and interview reports on platforms such as Niuker.com, so you can review in a targeted way.

The following is a series of articles.

[1] “2021” High-Frequency Front-End Interview Questions Summary: HTML

[2] “2021” High-Frequency Front-End Interview Questions Summary: CSS

[3] 2021

[4] 2021

[5] “2021”

[6] “2021”

[7] “2021” High-Frequency Front-End Interview Questions Summary

[8] “2021” High-Frequency Front-End Interview Questions Summary

[9] “2021” High-Frequency Front-End Interview Questions Summary: Computer Networks

[10] “2021” High-Frequency Front-End Interview Questions Summary: Browser Principles

[11] “2021” High-Frequency Front-End Interview Questions Summary: Performance Optimization

[12] “2021” High-Frequency Front-End Interview Questions Summary: Handwritten Code

[13] “2021” High-Frequency Front-End Interview Questions Summary: Code Output Results

1. HTTP protocol

1. Differences between GET and POST requests

POST and GET are two HTTP request methods. They differ as follows:

  • Application scenario: GET is an idempotent request, usually used where server resources are not affected, such as requesting a web page. POST is not idempotent and is typically used where server resources are affected, such as registering a user.
  • Caching: Because the scenarios differ, browsers generally cache GET requests but rarely cache POST requests.
  • Message format: The entity body of a GET request is empty, while the entity body of a POST request usually carries the data sent to the server.
  • Security: A GET request puts its parameters in the URL, which is less secure than POST because the requested URL is kept in the browser history.
  • Request length: Browsers limit URL length, which in turn limits how much data a GET request can send. This limit is imposed by browsers, not by the RFC.
  • Parameter types: POST supports more data types for passing parameters.
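The difference in message format can be sketched with Python's standard library. This is only an illustration: the URLs and parameter names are made up, and the requests are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical parameters used only for this illustration.
params = {"user": "alice", "page": "2"}

# GET: the parameters travel in the URL query string and the body stays empty.
get_req = Request("http://example.com/search?" + urlencode(params), method="GET")

# POST: the parameters travel in the request body.
post_req = Request(
    "http://example.com/register",
    data=urlencode(params).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The GET parameters are visible in the URL itself, which is why they end up in the browser history, while the POST parameters appear only in the body.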

2. Differences between POST and PUT requests

  • A PUT request sends data to the server to modify existing content without creating additional resources, so no matter how many times the same PUT is performed, the result is the same. (It can be understood as updating data.)
  • A POST request sends data to the server and changes resources, for example by creating new content. (It can be understood as creating data.)
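A minimal sketch of this difference, using a toy in-memory store. The UserStore class and its methods are hypothetical and stand in for a real server.

```python
import itertools


class UserStore:
    """Toy in-memory resource collection (hypothetical, for illustration)."""

    def __init__(self):
        self.users = {}
        self._ids = itertools.count(1)

    def post(self, data):
        # POST creates a new resource every time it is called.
        new_id = next(self._ids)
        self.users[new_id] = dict(data)
        return new_id

    def put(self, user_id, data):
        # PUT replaces the resource at a known ID; repeating it changes nothing.
        self.users[user_id] = dict(data)
        return user_id


store = UserStore()
store.post({"name": "alice"})
store.post({"name": "alice"})  # two identical POSTs produce two resources
store.put(1, {"name": "bob"})
store.put(1, {"name": "bob"})  # two identical PUTs leave the same final state
print(store.users)
```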

3. Common HTTP request headers and response headers

Common HTTP request headers:

  • Accept: The content types the browser can handle
  • Accept-Charset: The character sets the browser can display
  • Accept-Encoding: The compression encodings the browser can handle
  • Accept-Language: The language currently set in the browser
  • Connection: The type of connection between the browser and the server
  • Cookie: Any cookies set for the current page
  • Host: The domain of the page making the request
  • Referer: The URL of the page making the request
  • User-Agent: The browser's user agent string

Common HTTP response headers:

  • Date: The time the message was sent. The time format is defined by RFC 822
  • Server: The server name
  • Connection: The type of connection between the browser and the server
  • Cache-Control: Controls HTTP caching
  • Content-Type: The MIME type of the document that follows

There are four common Content-Type values:

(1) application/x-www-form-urlencoded: The browser's native form. If the enctype attribute is not set, the form data is submitted as application/x-www-form-urlencoded. Data submitted this way is placed in the body, encoded as key1=val1&key2=val2, with both key and val URL-encoded.

(2) multipart/form-data: Another common POST submission method, usually used when a form uploads files.

(3) application/json: The message body is a serialized JSON string.

(4) text/xml: Mainly used to submit data in XML format.
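As a rough illustration, the same data can be encoded for the first and third content types with the Python standard library. The payload is invented for the example.

```python
import json
from urllib.parse import urlencode

# Hypothetical payload; the space and "&" show why URL encoding is needed.
payload = {"key1": "val 1", "key2": "a&b"}

# application/x-www-form-urlencoded: key=value pairs joined by "&", URL-encoded.
form_body = urlencode(payload)

# application/json: the body is a serialized JSON string.
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```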

4. Is HTTP status code 304 good or bad?

To speed up site access, the server applies a caching mechanism to pages that have been visited before. When the client requests such a page again, the server checks, based on the cache information sent with the request, whether the page is the same as before. If it is, the server returns 304 directly and the client uses its cached copy.

Status code 304 should not be treated as an error. It is the server's response to a client that already holds a valid cached copy.
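The check described above can be sketched as a server-side function. This is a simplified assumption of how a server might validate with ETag and If-None-Match; the function names and the hash-based ETag are illustrative, not a specific server's implementation.

```python
import hashlib


def make_etag(body):
    # Derive a validator from the content (one possible scheme, assumed here).
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, body) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


first = respond(b"<h1>hello</h1>")
second = respond(b"<h1>hello</h1>", if_none_match=first[1]["ETag"])
third = respond(b"<h1>changed</h1>", if_none_match=first[1]["ETag"])
print(first[0], second[0], third[0])
```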

Search engine spiders prefer sites whose content is updated. They adjust how often they crawl a site according to the status codes returned over a given period. If a site keeps returning 304 for some time, the spider may crawl it less often. Conversely, if the site changes frequently and every crawl finds new content, the crawl frequency will increase over time.

Causes of many 304 status codes:

  • The page update cycle is long, or the page is not updated at all
  • The pages are purely static, or static HTML generation is forced

Too many 304 status codes can cause the following problems:

  • The site snapshot stops updating;
  • Fewer pages are indexed;
  • The site's ranking weight goes down.

5. Common HTTP request methods

  • GET: Retrieves data from the server;
  • POST: Submits an entity to the specified resource, usually causing server resources to be modified;
  • PUT: Uploads files and updates data;
  • DELETE: Deletes an object on the server;
  • HEAD: Retrieves only the message header. Unlike GET, the message body is not returned;
  • OPTIONS: Asks which request methods are supported. It is used for cross-origin requests;
  • CONNECT: Asks to establish a tunnel when communicating with a proxy server, and uses the tunnel for TCP communication;
  • TRACE: Echoes back the request received by the server. It is used for testing or diagnosis.

6. OPTIONS Request method and application scenario

OPTIONS is one of the HTTP request methods, alongside GET and POST.

The OPTIONS method requests information about the communication options available on the request/response chain for the resource identified by the Request-URI. This lets the client decide what action a resource requires, or learn the capabilities of the server, before making a specific request for the resource. The response to this method cannot be cached.

The OPTIONS request method has two main uses:

  • Getting all HTTP request methods supported by the server;
  • Checking access permissions. For example, in CORS (cross-origin resource sharing), complex requests are preceded by an OPTIONS preflight request that determines whether access to the specified resource is permitted.
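Whether a cross-origin request counts as "complex" and triggers a preflight can be approximated with a small check. This is a simplified sketch of the CORS rules: it covers only the method, header-name, and Content-Type conditions and omits the other conditions browsers apply. The function name is hypothetical.

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}


def needs_preflight(method, headers):
    """Return True if the browser would first send an OPTIONS preflight (simplified)."""
    if method.upper() not in SIMPLE_METHODS:
        return True
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SAFELISTED_HEADERS:
            return True  # custom header such as X-Token
        if lname == "content-type":
            media_type = value.split(";")[0].strip().lower()
            if media_type not in SIMPLE_CONTENT_TYPES:
                return True  # e.g. application/json
    return False


print(needs_preflight("GET", {}))
print(needs_preflight("POST", {"Content-Type": "application/json"}))
```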

7. What are the differences between HTTP 1.0 and HTTP 1.1?

HTTP 1.0 and HTTP 1.1 differ in the following ways:

  • Connections: HTTP/1.0 uses non-persistent connections by default, while HTTP/1.1 uses persistent connections by default. With persistent connections, HTTP/1.1 reuses the same TCP connection for multiple HTTP requests, avoiding the delay of establishing a new connection for every request.
  • Resource requests: HTTP/1.0 wastes bandwidth in some cases. For example, the client may need only part of an object, but the server sends the whole object. HTTP/1.1 introduces the Range request header, which allows only part of a resource to be requested. The response code is 206 (Partial Content), which lets developers make fuller use of bandwidth and connections.
  • Caching: HTTP/1.0 mainly uses the If-Modified-Since and Expires headers as caching criteria. HTTP/1.1 introduces more cache control strategies, such as ETag, If-Unmodified-Since, If-Match, If-None-Match, and other optional cache headers.
  • The Host field: HTTP/1.1 adds the Host field to specify the server's domain name. HTTP/1.0 assumes each server is bound to a unique IP address, so the URL in the request message does not carry the hostname. With virtual hosting, however, one physical server can run multiple virtual hosts that share an IP address. The Host field makes it possible to send requests to different websites on the same server.
  • Request methods: HTTP/1.1 also standardizes additional request methods, such as PUT, DELETE, and OPTIONS (HTTP/1.0 formally defined only GET, HEAD, and POST).
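The Range mechanism can be sketched as follows. This is a minimal sketch that handles only the single-range forms "bytes=start-end" and "bytes=start-"; the serve_range function is hypothetical, and real servers support more forms.

```python
import re


def serve_range(resource, range_header=None):
    """Return (status, headers, body) for an optional single byte range."""
    if range_header is None:
        return 200, {}, resource
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if not match:
        return 200, {}, resource  # unsupported form: fall back to the full body
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else len(resource) - 1
    end = min(end, len(resource) - 1)
    if start > end:
        # The requested range lies outside the resource.
        return 416, {"Content-Range": f"bytes */{len(resource)}"}, b""
    headers = {"Content-Range": f"bytes {start}-{end}/{len(resource)}"}
    return 206, headers, resource[start:end + 1]


data = b"0123456789"
print(serve_range(data, "bytes=2-5"))
```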

8. Differences between HTTP 1.1 and HTTP 2.0

  • Binary protocol: HTTP/2 is a binary protocol. In HTTP/1.1, the message header must be text (ASCII-encoded), and the body can be either text or binary. In HTTP/2, both the header and the body are binary and are collectively called “frames”, divided into header frames and data frames. Frames are the basis of multiplexing.
  • Multiplexing: HTTP/2 still reuses a TCP connection, but within one connection both the client and the server can send multiple requests or responses at the same time without having to send them in order, which avoids “head-of-line blocking” [1].
  • Data streams: Because HTTP/2 packets are sent out of order, consecutive packets in the same connection may belong to different requests, so each packet must be marked with the request it belongs to. HTTP/2 calls all the packets of one request or response a data stream. Each stream has a unique ID, and every packet carries its stream ID so the receiver can tell which stream it belongs to.
  • Header compression: Because HTTP/1.1 is stateless, all information must be attached to every request. Many fields, such as Cookie and User-Agent, are therefore repeated with identical content on each request, which wastes bandwidth and slows things down. HTTP/2 addresses this with a header compression mechanism. On the one hand, header fields are compressed before being sent (HTTP/2 uses the HPACK algorithm rather than gzip or compress). On the other hand, the client and server both maintain a header table in which fields are stored and assigned an index number, so that later a repeated field is replaced by its index number.
  • Server push: HTTP/2 allows the server to proactively send resources to the client without being asked. This is called server push. It lets the server push required resources to the client in advance and so reduces latency. Note that HTTP/2 server push delivers static resources, which is different from pushing real-time data to the client with WebSocket or SSE.
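The frame and stream ideas above can be sketched as a toy demultiplexer. Frames are modeled as (stream ID, payload) tuples, which is an assumption made for illustration and not the real HTTP/2 frame layout.

```python
from collections import defaultdict

# Each toy frame is (stream_id, payload). Frames of two responses interleave
# on one connection; the stream ID says which response each frame belongs to.
frames = [
    (1, b"<html>"),
    (3, b"body{"),
    (1, b"</html>"),
    (3, b"}"),
]


def reassemble(frames):
    """Group interleaved frames back into one byte string per stream."""
    streams = defaultdict(bytes)
    for stream_id, payload in frames:
        streams[stream_id] += payload
    return dict(streams)


print(reassemble(frames))
```

Because every frame carries its stream ID, neither response has to wait for the other to finish.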

[1] Head-of-line blocking:

Head-of-line blocking is caused by HTTP's basic request-response model. HTTP requires messages to be exchanged as “one sent, one received”, which forms a first-in, first-out serial queue. Requests in the queue have no priority, only the order in which they were queued, and the earliest request is processed first. If the request at the head of the queue is processed too slowly, all the requests behind it have to wait, and they incur time costs they should not have to bear. This is head-of-line blocking.

9. The differences between HTTP and HTTPS

The differences between HTTP and HTTPS are as follows:

  • HTTPS requires a certificate from a CA, which can cost money (free certificates are also available). HTTP does not;
  • HTTP is the hypertext transfer protocol and transmits information in plaintext. HTTPS is a secure transfer protocol encrypted with SSL/TLS;
  • They use different connection methods and different default ports: 80 for HTTP and 443 for HTTPS;
  • HTTP connections are simple and stateless. HTTPS is built from SSL/TLS plus HTTP and provides encrypted transmission and identity authentication, so it is more secure than HTTP.

10. The reason for the URL length limit of the GET method

In fact, the HTTP specification does not limit the length of the URL in a GET request. The limit is imposed by specific browsers and servers. IE limits URL length to 2083 bytes (2K+35). Because IE allows the shortest URLs of any browser, a URL that stays within 2083 bytes during development will work in all browsers.

Maximum length of GET parameter values = URL length limit (2083) - (length of your domain + path) - 2 (the 2 is the length of the two characters ? and = in a GET request)

Here are the URL length limits that major browsers place on GET requests:

  • Microsoft Internet Explorer: The maximum URL length is 2083 characters. If the URL exceeds this limit, the submit button does not respond.
  • Firefox: The maximum URL length is 65,536 characters.
  • Safari: The maximum URL length is 80,000 characters.
  • Opera: The maximum URL length is 190,000 characters.
  • Google Chrome: The maximum URL length is 8182 characters.

Mainstream servers also limit the URL length of GET requests:

  • Apache: The maximum URL length is 8192 characters.
  • Microsoft Internet Information Server (IIS): The maximum URL length is 16,384 characters.

Based on the data above, keeping the URL of a GET request within 2083 characters should work across all of these browsers and servers.

11. What happens when you type Google.com into your browser and press Enter?

(1) URL parsing: First, the URL is parsed to determine the transport protocol to use and the path of the requested resource. If the protocol or host name in the URL is invalid, the text entered in the address bar is passed to the search engine. If there is no problem, the browser checks the URL for illegal characters and escapes any it finds before continuing.

(2) Cache check: The browser checks whether the requested resource is in the cache. If it is cached and still valid, the browser uses it directly. Otherwise, it sends a new request to the server.

(3) DNS resolution: The next step is to obtain the IP address of the domain name in the URL. The browser first checks whether the IP address for the domain is cached locally and uses it if so. If not, it sends a request to the local DNS server. The local DNS server also checks its cache first. If there is no entry, it sends a request to a root name server and obtains the address of the responsible top-level domain name server. It then queries the top-level domain name server and obtains the address of the responsible authoritative name server. Finally, it queries the authoritative name server and obtains the IP address of the domain, which the local DNS server returns to the requesting user. The request from the user to the local DNS server is a recursive request, and the requests from the local DNS server to the name servers at each level are iterative requests.
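The resolution order can be sketched with a toy local DNS server. Everything here is simulated: the zone tables, server names, and the IP address (taken from a documentation-only range) are made up, and no network queries are sent.

```python
# Hypothetical zone data, for illustration only.
ROOT = {"com": "tld-server-com"}
TLD = {"tld-server-com": {"example.com": "ns-example"}}
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.10"}}


class LocalDnsServer:
    """Toy local DNS server that resolves iteratively and caches results."""

    def __init__(self):
        self.cache = {}
        self.queries = []  # log of the servers asked, in order

    def resolve(self, name):
        if name in self.cache:
            return self.cache[name]  # answered from cache, no queries sent
        tld = name.rsplit(".", 1)[-1]
        domain = ".".join(name.split(".")[-2:])
        self.queries.append("root")
        tld_server = ROOT[tld]  # the root server points to the TLD server
        self.queries.append(tld_server)
        auth_server = TLD[tld_server][domain]  # the TLD server points to the authority
        self.queries.append(auth_server)
        ip = AUTHORITATIVE[auth_server][name]  # the authority returns the address
        self.cache[name] = ip
        return ip


local = LocalDnsServer()
print(local.resolve("www.example.com"), local.queries)
print(local.resolve("www.example.com"), len(local.queries))
```

The client makes one recursive request to the local server, while the local server makes the three iterative queries recorded in the log. The second lookup is served from the cache.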

(4) Obtaining the MAC address: After the browser has the IP address, data transmission also requires the MAC address of the destination host. The application layer hands the data to the transport layer, where TCP specifies the source and destination port numbers, and then the data goes to the network layer. The network layer uses the local IP address as the source address and the obtained IP address as the destination address. At the data link layer, the local host's MAC address is the source MAC address, and the destination MAC address depends on the case. By applying the host's subnet mask to the IP address, we can tell whether the destination is in the same subnet as the requesting host. If it is, the ARP protocol can be used to obtain the destination host's MAC address. If it is not, the request is sent to the gateway, which forwards it. The gateway's MAC address can also be obtained through ARP, and in this case the destination MAC address is the gateway's MAC address.
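The same-subnet decision can be sketched with Python's ipaddress module. The next_hop function and the addresses are illustrative assumptions; the function returns the IP address whose MAC address would be looked up through ARP.

```python
import ipaddress


def next_hop(local_ip, netmask, dest_ip, gateway_ip):
    """Return the IP whose MAC address the host should resolve via ARP."""
    network = ipaddress.ip_network(f"{local_ip}/{netmask}", strict=False)
    if ipaddress.ip_address(dest_ip) in network:
        return dest_ip  # same subnet: ARP for the destination host itself
    return gateway_ip  # different subnet: ARP for the gateway


print(next_hop("192.168.1.10", "255.255.255.0", "192.168.1.20", "192.168.1.1"))
print(next_hop("192.168.1.10", "255.255.255.0", "8.8.8.8", "192.168.1.1"))
```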

(5) TCP three-way handshake: A TCP connection is established with a three-way handshake. First, the client sends the server a SYN connection request segment along with a random sequence number. After receiving the request, the server sends the client a SYN ACK segment to confirm the connection request, along with its own random sequence number. After receiving the server's acknowledgement, the client enters the connection-established state and sends an ACK segment to the server. When the server receives this acknowledgement, it also enters the connection-established state, and the connection between the two sides is established.

(6) HTTPS handshake: If HTTPS is used, there is also a four-step TLS handshake before communication. First, the client sends the server the protocol version it uses, a random number, and the encryption methods it supports. The server confirms the encryption method and sends the client a random number and its own digital certificate. The client first checks whether the certificate is valid. If it is, the client generates another random number, encrypts it with the public key in the certificate, and sends it to the server, together with a hash of all the previous content for the server to verify. The server decrypts the data with its private key and sends the client a hash of all the previous content for the client to verify. At this point both sides hold three random numbers. Using the agreed encryption method, they derive a secret key from these three random numbers, and all subsequent communication is encrypted with this key before transmission.

(7) Returning data: When the page request reaches the server, the server returns an HTML file as the response. After the browser receives the response, it begins parsing the HTML file and starts the page rendering process.

(8) Page rendering: The browser first builds the DOM tree from the HTML file and the CSSOM tree from the parsed CSS files. When it encounters a script tag, it checks whether the tag has the defer or async attribute. Without either attribute, loading and executing the script blocks page rendering. Once the DOM tree and CSSOM tree are built, the browser builds the render tree from them and then performs layout based on the render tree. After layout, the page is painted through the browser's UI interface, and the whole page is displayed.

(9) TCP four-way wave: The last step is the four-way wave that closes the TCP connection. When the client considers its data fully sent, it sends a connection release request to the server. On receiving it, the server tells the application layer that the TCP connection is being released, then sends an ACK segment and enters the CLOSE_WAIT state. This means the client-to-server direction has been released and the server no longer receives data from the client. Because a TCP connection is bidirectional, however, the server can still send data to the client. If the server has data left to send, it continues sending. It then sends a connection release request to the client and enters the LAST_ACK state. After receiving the release request, the client sends an acknowledgement to the server and enters the TIME_WAIT state, which lasts for 2MSL. If the server does not retransmit its release request within that period, the client enters the CLOSED state. The server enters the CLOSED state when it receives the acknowledgement.

12. Keep-alive

In HTTP/1.0, by default the client and server create a new connection for each request/response and disconnect as soon as it completes. This is called a short connection. With keep-alive, the connection between the client and the server stays open, so a subsequent request to the server does not need to establish or re-establish a connection. This is called a persistent connection. It is used as follows:

  • HTTP/1.0 does not use keep-alive by default, so the Connection: keep-alive field must be sent explicitly to keep the connection open. To close a keep-alive connection, send the Connection: close field;
  • HTTP/1.1 keeps the TCP connection open by default so that later requests to the same domain can continue to use it. To close it, the client sends the Connection: close header field.

The process of establishing a keep-alive connection:

  • The client sends the server a request whose header contains the Connection field
  • The server receives the request and processes the Connection field
  • The server sends the Connection: keep-alive field back to the client
  • The client receives the Connection field
  • The keep-alive connection is established

The server disconnects automatically (that is, without keep-alive):

  • The client sends the server only a content message (without the Connection field)
  • The server receives the request and processes it
  • The server returns the resource the client requested and closes the connection
  • The client receives the resource, finds no Connection field, and disconnects

The client asks to disconnect:

  • The client sends the Connection: close field to the server
  • The server receives the request and processes the Connection field
  • The server sends back the response and disconnects
  • The client receives the resource and disconnects
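The defaults described above can be summarized in a small decision function. This is a simplified sketch: it treats the Connection header as a single token, and the is_persistent function is hypothetical.

```python
def is_persistent(http_version, headers):
    """Decide whether the connection stays open after this message (simplified)."""
    connection = ""
    for name, value in headers.items():
        if name.lower() == "connection":
            connection = value.strip().lower()
    if http_version == "HTTP/1.0":
        # HTTP/1.0 closes by default; keep-alive must be requested explicitly.
        return connection == "keep-alive"
    # HTTP/1.1 keeps the connection open unless told to close it.
    return connection != "close"


print(is_persistent("HTTP/1.0", {}))
print(is_persistent("HTTP/1.0", {"Connection": "keep-alive"}))
print(is_persistent("HTTP/1.1", {}))
print(is_persistent("HTTP/1.1", {"Connection": "close"}))
```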

Advantages of keep-alive:

  • Lower CPU and memory usage (because fewer connections are open at the same time);
  • Allows HTTP pipelining of requests and responses;
  • Less network congestion (because there are fewer TCP connections);
  • Lower latency for subsequent requests (no further handshakes);
  • Errors can be reported without closing the TCP connection.

Disadvantages of keep-alive:

  • A TCP connection held open for a long time ties up system resources.

13. How does HTTP behave when there are multiple images on the page?

  • In HTTP/1, browsers cap the number of TCP connections to a single domain at 6, so the images are requested in several rounds. Multi-domain deployment can be used to get around this: it raises the number of simultaneous requests and speeds up loading of the page's images.
  • In HTTP/2, many images can be loaded at once, because HTTP/2 supports multiplexing and multiple HTTP requests can be sent within one TCP connection.

14. What is HTTP2’s header compression algorithm?

HTTP/2's header compression uses the HPACK algorithm. A “dictionary” is maintained on both the client and the server, index numbers represent repeated strings, and integers and strings are compressed with Huffman coding, which can achieve a compression rate of 50% to 90%.

Specifically:

  • The client and server use a “header table” to track and store the key-value pairs sent previously. Identical data is no longer resent with every request and response.
  • The header table exists for the lifetime of the HTTP/2 connection and is updated incrementally by both the client and the server.
  • Each new header key-value pair is either appended to the end of the current table or replaces a previous value in the table.

For example, across two consecutive requests, the first request sends all header fields and the second sends only the fields that differ, which reduces redundant data and overhead.
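The header table idea can be sketched with a toy encoder. This is not real HPACK: there is no static table, no Huffman coding, and no eviction, and the index numbering is invented. The ToyHeaderEncoder class only shows how a repeated field becomes an index.

```python
class ToyHeaderEncoder:
    """Toy header table: repeated (name, value) pairs are sent as an index."""

    def __init__(self):
        self.table = {}  # (name, value) -> index number

    def encode(self, headers):
        out = []
        for pair in headers:
            if pair in self.table:
                out.append(self.table[pair])  # already known: send only the index
            else:
                self.table[pair] = len(self.table) + 1
                out.append(pair)  # new: send the literal and remember it
        return out


encoder = ToyHeaderEncoder()
request_one = [(":method", "GET"), (":path", "/"), ("user-agent", "demo")]
request_two = [(":method", "GET"), (":path", "/new"), ("user-agent", "demo")]
print(encoder.encode(request_one))
print(encoder.encode(request_two))
```

The first request is sent entirely as literals, while the second sends two index numbers plus the one field that changed.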

15. What is the HTTP request packet?

A request message consists of four parts:

  • Request line
  • Request headers
  • Blank line
  • Request body

In detail:

(1) Request line: Contains the request method field, the URL field, and the HTTP protocol version field, separated by spaces. For example, GET /index.html HTTP/1.1.

(2) Request headers: Made up of key/value pairs, one pair per line, with the key and value separated by a colon (:). For example:

  • User-Agent: The type of browser that generated the request.
  • Accept: The list of content types the client can recognize.
  • Host: The requested host name. Multiple domain names may share one IP address, that is, virtual hosts.

(3) Request body: The data carried by requests such as POST and PUT.
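The four parts can be seen by splitting a raw request message. This is a minimal sketch that assumes a well-formed message; the parse_request function and the sample host are illustrative.

```python
def parse_request(raw):
    """Split a raw HTTP/1.1 request into its parts (assumes well-formed input)."""
    # The blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    # Request line: method, URL, and protocol version separated by spaces.
    method, target, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


raw = (
    b"POST /index.html HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"\r\n"
    b"key1=val1&key2=val2"
)
print(parse_request(raw))
```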

16. What is the HTTP response packet?

A response message consists of four parts:

  • Response line
  • Response headers
  • Blank line
  • Response body

In detail:

  • Response line: Consists of the protocol version, the status code, and the reason phrase for the status code, such as HTTP/1.1 200 OK.
  • Response headers: Consist of the response header fields.
  • Response body: The data the server sends in response.

17. The advantages and disadvantages of HTTP protocol

HTTP is a hypertext transfer protocol that defines the format and mode of exchanging packets between a client and a server. By default, port 80 is used. It uses TCP as the transport layer protocol to ensure the reliability of data transmission.

The HTTP protocol has the following advantages:

  • Supports the client/server model.
  • Simple and fast: When a client requests a service from the server, it only needs to send the request method and path. Because HTTP is simple, HTTP server programs are small, so communication is fast.
  • Connectionless: Connectionless means each connection handles only one request. After the server has processed the client's request and the client has received the response, the connection is closed, which saves transmission time.
  • Stateless: HTTP is a stateless protocol, where state means the context of the communication. Without state, if later processing needs earlier information, it must be retransmitted, which can increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need earlier information.
  • Flexible: HTTP allows any type of data object to be transferred. The type being transferred is marked by Content-Type.

The HTTP protocol has the following disadvantages:

  • Stateless: HTTP is a stateless protocol, and the HTTP server keeps no information about the client.
  • Plaintext transmission: Messages in the protocol are in text form, so they are exposed directly to the outside world and are not secure.
  • Insecure: (1) Communication is in plaintext (unencrypted), so content may be eavesdropped; (2) the identity of the communicating party is not verified, so impersonation is possible; (3) message integrity cannot be proven, so content may have been tampered with.

18. Talk about HTTP 3.0

HTTP/3 runs over UDP and implements TCP-like features such as multiplexed data streams and reliable transmission. This set of features is called the QUIC protocol.

  1. Flow control and reliable transmission: QUIC adds a layer on top of UDP to guarantee reliable data transmission. It provides packet retransmission, congestion control, and other features found in TCP.
  2. Integrated TLS encryption: QUIC currently uses TLS 1.3, which reduces the number of RTTs spent on the handshake.
  3. Multiplexing: One physical connection can carry multiple independent logical data streams, so streams are transmitted separately, which solves TCP's head-of-line blocking problem.
  4. Fast handshake: Because it is based on UDP, a connection can be established in 0 to 1 RTT.

19. How does the HTTP protocol perform?

The HTTP protocol is based on TCP/IP and uses a request-response communication model, so these two points are what determine its performance.

  • Persistent connections

HTTP has two connection modes: persistent and non-persistent. (1) With non-persistent connections, the server must establish and maintain a brand-new connection for every requested object. (2) With persistent connections, the TCP connection is not closed by default and can be reused by multiple requests. The benefit is avoiding the time spent on a three-way handshake every time a TCP connection is established.

Different versions use different connection modes:

  • In HTTP/1.0, every request creates a new TCP connection (three-way handshake), and requests are serial, so TCP connections are established and torn down needlessly, which increases communication overhead. This version uses non-persistent connections, but the client can ask the server not to close the TCP connection by adding Connection: keep-alive to the request.
  • HTTP/1.1 introduced persistent connections (also called long connections). They reduce the extra overhead of repeatedly establishing and closing TCP connections and lighten the load on the server. Persistent connections are the default in this and later versions. Most browsers currently support up to six simultaneous persistent connections to the same domain.

  • Pipelining

HTTP/1.1 uses persistent connections, which makes pipelined network transmission possible.

Pipelining means that within one TCP connection the client can issue multiple requests. Once the first request is sent, the client can send the second without waiting for the first response, which reduces the overall response time. The server still returns responses in the order of the requests, however. If the first response is very slow, many requests end up waiting behind it. This is called head-of-line blocking.
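The effect of in-order responses can be sketched with a small calculation. The model is an assumption made for illustration: the server finishes preparing each response at the given time, but may only send responses in request order.

```python
def delivery_times(finish_times):
    """Time at which each pipelined response can be sent, given in-order delivery."""
    delivered = []
    latest = 0
    for finish in finish_times:
        # A response cannot leave before every earlier response has left.
        latest = max(latest, finish)
        delivered.append(latest)
    return delivered


# The first response takes 5 time units; the next two are ready after 1 unit.
print(delivery_times([5, 1, 1]))
print(delivery_times([1, 2, 3]))
```

In the first case the two fast responses are held back until the slow first response has been sent.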

  • Head-of-line blocking

HTTP messages must be exchanged as “one sent, one received”, and the tasks are placed in a queue and executed serially. If the request at the head of the queue is processed too slowly, the requests behind it are blocked. This is HTTP's head-of-line blocking problem.

Solutions to head-of-line blocking: (1) Concurrent connections: Allowing several persistent connections per domain effectively adds more task queues, so one blocked queue does not hold up all other tasks. (2) Domain sharding: The domain is split into many second-level domains that all point to the same server. This increases the number of concurrent persistent connections and so eases head-of-line blocking.

20. What are the components of a URL

The following URL, for example: www.aspxfans.com:8080/news/index….

As you can see from the URL above, a complete URL consists of the following sections:

  • Protocol part: The protocol part of this URL is “http:”, which indicates that the web page uses the HTTP protocol. Many protocols can be used on the Internet, such as HTTP, FTP, and so on. The “//” after “http:” is the delimiter;
  • Domain name part: The domain name of this URL is www.aspxfans.com. An IP address can also be used in place of a domain name in a URL;
  • Port part: The port follows the domain name and is separated from it by a colon (:). In this example the port is 8080. The port is not a required part of a URL. If it is omitted, the default port is used (80 for HTTP and 443 for HTTPS);
  • Virtual directory part: The virtual directory runs from the first slash (/) after the domain name to the last slash (/). The virtual directory is also not a required part of a URL. In this example, the virtual directory is “/news/”;
  • File name part: The file name runs from the last “/” after the domain name up to the “?”. If there is no “?”, it runs from the last “/” up to the “#”. If there is neither “?” nor “#”, it runs from the last “/” to the end of the URL. In this example, the file name is “index.asp”. The file name is not a required part of a URL. If it is omitted, the default file name is used;
  • Anchor part: The anchor runs from the “#” to the end of the URL. The anchor in this example is “name”. The anchor is also not a required part of a URL;
  • Parameter part: The part between the “?” and the “#” is the parameter part, also known as the search part or the query part. In this example, the parameter part is “boardID=5&id=24618&page=1”. A URL can carry several parameters, separated by ampersands (&).
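
As a rough illustration, the same breakdown can be reproduced with Python's standard `urllib.parse` module. The full URL below is an assumption: it is assembled from the individual parts named in the list above, since the example URL itself is shown only in truncated form.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed full form of the example, assembled from the parts listed above.
url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&id=24618&page=1#name"

parts = urlsplit(url)

# Split the path at its last "/" into virtual directory and file name.
directory, _, file_name = parts.path.rpartition("/")

components = {
    "protocol": parts.scheme,                # "http"
    "domain": parts.hostname,                # "www.aspxfans.com"
    "port": parts.port,                      # 8080
    "virtual_directory": directory + "/",    # "/news/"
    "file_name": file_name,                  # "index.asp"
    "parameters": parse_qs(parts.query),     # each value is a list of strings
    "anchor": parts.fragment,                # "name"
}

for name, value in components.items():
    print(f"{name}: {value}")
```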

21. What are the HTTP headers associated with caching

Strong cache:

  • Expires
  • Cache-Control

Negotiation cache:

  • ETag, If-None-Match
  • Last-Modified, If-Modified-Since

2. HTTPS protocol

1. What is HTTPS?

Hypertext Transfer Protocol Secure (HTTPS) is a transport protocol for secure communication over computer networks. HTTPS communicates over HTTP, using SSL/TLS to encrypt the packets. The main purpose of HTTPS is to authenticate the website server and to protect the privacy and integrity of the exchanged data. HTTP transmits information in plaintext, so it is exposed to the risks of eavesdropping, tampering, and hijacking. TLS/SSL provides identity authentication, information encryption, and integrity checking, which avoids these problems.

The main responsibility of the security layer is to encrypt outgoing HTTP request data and decrypt received HTTP content.

2. Working principle of TLS/SSL

TLS/SSL is a Transport Layer Security protocol between TCP and HTTP. It does not affect the original TCP and HTTP protocols. Therefore, you do not need to modify the HTTP page to use HTTPS.

The implementation of TLS/SSL depends on three basic algorithms: hash, symmetric encryption, and asymmetric encryption. The functions of these three types of algorithms are as follows:

  • Verify the integrity of information based on the hash function
  • The symmetric encryption algorithm uses the negotiated secret key to encrypt data
  • Asymmetric encryption implements identity authentication and key negotiation

(1) Hash function

Common hash functions include MD5, SHA1, and SHA256. These functions are one-way (irreversible), highly sensitive to the input data, and produce output of a fixed length. Any modification of the data changes the hash value, so a hash can be used to detect tampering and verify the integrity of data.

Features: On its own, a hash function cannot make transmitted information tamper-proof. Because the message is sent in plaintext, a man in the middle can modify it and then recalculate the digest, so both the message and its digest need to be encrypted in transit.
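
The point above can be sketched with Python's standard `hashlib` and `hmac` modules. The messages and the key are hypothetical. Note that the second half uses a keyed digest (HMAC) rather than encrypting the message and digest as the text describes; it is swapped in here only because it shows, with the standard library alone, why a secret is needed.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Plain digest: anyone can compute it."""
    return hashlib.sha256(data).hexdigest()


def keyed_tag(key: bytes, data: bytes) -> str:
    """Keyed digest (HMAC-SHA256): only holders of the key can compute it."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


original = b"pay 100 to alice"
tampered = b"pay 900 to alice"

# Fixed-length output, and a small change in the input changes the digest.
assert len(sha256_hex(original)) == len(sha256_hex(b"")) == 64
assert sha256_hex(original) != sha256_hex(tampered)

# Plain digest in plaintext: the attacker replaces the message AND the digest,
# so the receiver's integrity check still passes.
received_message, received_digest = tampered, sha256_hex(tampered)
plain_check_passes = sha256_hex(received_message) == received_digest

# Keyed digest: without the key, the attacker cannot produce a matching tag.
key = b"negotiated-secret"  # hypothetical shared key
forged_tag = keyed_tag(b"attacker-guess", tampered)
keyed_check_passes = hmac.compare_digest(keyed_tag(key, received_message), forged_tag)

print(plain_check_passes, keyed_check_passes)  # True False
```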

(2) Symmetric encryption

In symmetric encryption, both parties use the same secret key to encrypt and decrypt data. The problem with symmetric encryption is how to transmit the secret key securely: the key still has to travel over the network, and once someone else obtains it, the whole encryption process is useless. That’s where asymmetric encryption comes in.

Common symmetric encryption algorithms include AES-CBC, DES, 3DES, and AES-GCM. The same secret key is used to both encrypt and decrypt a message. Only a holder of the key can read the information, which prevents eavesdropping, so communication is one-to-one.

Features: Symmetric encryption is one-to-one: each pair of parties shares the same key, and the security of that key is the basis of information security. A server that communicates with N clients must maintain N key records, and the scheme by itself offers no mechanism for changing a key.
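
The defining property, one shared key for both directions, can be illustrated with a toy cipher built only from the standard library. This is not AES or any of the algorithms named above, and it must not be used to protect real data; the function names `keystream` and `xor_cipher` and the key are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream. The same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = b"key known to both parties"   # hypothetical
message = b"symmetric encryption uses one key"

ciphertext = xor_cipher(shared_key, message)
recovered = xor_cipher(shared_key, ciphertext)        # same key decrypts
wrong = xor_cipher(b"some other key", ciphertext)     # wrong key does not

print(recovered == message, wrong == message)  # True False
```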

(3) Asymmetric encryption

Asymmetric encryption works with two keys, a public key and a private key. The public key is public; the private key is kept secret. Data encrypted with the private key can be decrypted only with the corresponding public key, and data encrypted with the public key can be decrypted only with the corresponding private key. We can publish the public key, and any client who wants to communicate with us can use it to encrypt data, which we then decrypt with the private key, so the data stays safe. One disadvantage of asymmetric encryption is that it is very slow, so using it for every message would lead to long waiting times.

Common asymmetric encryption algorithms include RSA, ECC, and DH. Keys come in pairs, usually called the public key and the private key. Information encrypted with the public key can be decrypted only with the private key, and information encrypted with the private key can be decrypted only with the public key, so clients that hold the same public key cannot decrypt each other’s messages. Communication from each client to the server is therefore encrypted, the server can communicate one-to-many, and a client can also use the public key to verify the identity of the server that holds the private key.

Characteristics: Asymmetric encryption supports one-to-many communication. The server only needs to maintain one private key to communicate with multiple clients. However, information sent by the server can be decrypted by every client, and the algorithms are computationally complex, so encryption is slow.
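
The public/private relationship can be made concrete with textbook RSA on very small numbers. This is only a sketch: the primes, the exponent, and the helper `apply_key` are chosen for illustration, and real RSA uses keys of 2048 bits or more together with padding.

```python
# Textbook RSA with tiny primes (illustration only, trivially breakable).
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def apply_key(value: int, key: tuple) -> int:
    """Raise `value` to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65                   # must be smaller than n

# Encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(message, public_key)
assert apply_key(ciphertext, private_key) == message

# "Encrypt" with the private key (sign), check with the public key (verify).
signature = apply_key(message, private_key)
assert apply_key(signature, public_key) == message
```

The second pair of calls is the basis of the identity check mentioned above: only the holder of the private key could have produced a value that the public key turns back into the message.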

Based on the characteristics of these algorithms, TLS/SSL works as follows: the client uses asymmetric encryption to communicate with the server, which provides identity authentication and negotiates the secret key for symmetric encryption. The symmetric algorithm then uses the negotiated key to encrypt the information and its digest. Different nodes use different symmetric keys, which ensures that the information can be read only by the two communicating parties. This addresses the weaknesses of both methods.

3. What are digital certificates?

The method described so far is not necessarily secure, because there is no way to be sure that the public key we receive is genuine. A man in the middle may intercept the public key sent to us and send us his own public key instead. When we encrypt a message with his public key, he can decrypt it with his private key. He then forwards the same message to the other party while pretending to be us, so our information is stolen without our knowing it. Digital certificates solve this problem.

A hash algorithm is first applied to the public key and other information to generate a message digest, which a trusted Certification Authority (CA) then encrypts with its private key to form a signature. The combination of the original information and the signature is called a digital certificate. On receiving a digital certificate, the receiver applies the same hash algorithm to the original information to generate a digest, and uses the CA’s public key to decrypt the signature in the certificate. Finally, the receiver compares the decrypted digest with the generated one to determine whether the information has been altered.

The key to this method is the reliability of the certification authority. Browsers generally ship with the certificates of some top-level certification authorities built in, which means that we trust them automatically. Only in this way can the security of the data be ensured.
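
The issue-and-verify flow described above can be sketched as follows. Everything here is a simplified assumption: the CA key pair is textbook RSA built from two known Mersenne primes, the certificate is a plain dictionary, and the names `issue_certificate` and `verify_certificate` are hypothetical. Real certificates use the X.509 format and padded signatures.

```python
import hashlib

# Hypothetical CA key pair (textbook RSA, no padding, illustration only).
P, Q = 2**61 - 1, 2**89 - 1          # two Mersenne primes
CA_N = P * Q
CA_E = 65537                          # CA public exponent
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))  # CA private exponent (Python 3.8+)


def digest_as_int(info: bytes) -> int:
    """Hash the certificate contents and map the digest into the RSA range."""
    return int.from_bytes(hashlib.sha256(info).digest(), "big") % CA_N


def issue_certificate(info: bytes) -> dict:
    """CA side: sign the digest of the contents with the CA private key."""
    signature = pow(digest_as_int(info), CA_D, CA_N)
    return {"info": info, "signature": signature}


def verify_certificate(cert: dict) -> bool:
    """Receiver side: recover the digest with the CA public key and compare."""
    recovered = pow(cert["signature"], CA_E, CA_N)
    return recovered == digest_as_int(cert["info"])


cert = issue_certificate(b"subject=www.example.com;public_key=SERVER-KEY")
forged = {"info": b"subject=www.example.com;public_key=ATTACKER-KEY",
          "signature": cert["signature"]}

print(verify_certificate(cert), verify_certificate(forged))  # True False
```

Replacing the public key inside the certificate changes the digest, so the old signature no longer matches, and the middleman cannot produce a new one without the CA's private key.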

4. HTTPS communication (handshake) process

The HTTPS communication process is as follows:

  1. The client sends a request to the server. The request contains the protocol version it uses, a random number generated by the client, and the encryption methods the client supports.
  2. After receiving the request, the server confirms the encryption method both parties will use and returns the server certificate together with a random number generated by the server.
  3. After confirming that the server certificate is valid, the client generates a new random number, encrypts it with the public key in the digital certificate, and sends it to the server. It also provides a hash of all the previous content for the server to verify.
  4. The server uses its own private key to decrypt the random number sent by the client, and provides a hash of all the previous content for the client to verify.
  5. Using the agreed encryption method, the client and server each derive a session key from the three random numbers. This key is used to encrypt the information during the rest of the conversation.
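
Step 5 can be sketched as below. The derivation function is an assumption made for illustration (a single HMAC-SHA256 call); real TLS uses its own, more elaborate key derivation. The names `derive_session_key`, `client_random`, `server_random`, and `pre_master` are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_session_key(client_random: bytes, server_random: bytes,
                       pre_master: bytes) -> bytes:
    """Mix the three random values into one 32-byte key (not the real TLS PRF)."""
    return hmac.new(pre_master,
                    b"session key" + client_random + server_random,
                    hashlib.sha256).digest()


client_random = secrets.token_bytes(32)  # step 1: sent in plaintext
server_random = secrets.token_bytes(32)  # step 2: sent in plaintext
pre_master = secrets.token_bytes(48)     # step 3: sent encrypted with the server's public key

# Both sides hold the same three values, so they derive the same key
# without the key itself ever crossing the network.
client_key = derive_session_key(client_random, server_random, pre_master)
server_key = derive_session_key(client_random, server_random, pre_master)
assert client_key == server_key

# An eavesdropper sees only the two plaintext randoms, not the third value.
eavesdropper_key = derive_session_key(client_random, server_random, b"\x00" * 48)
assert eavesdropper_key != client_key
```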

5. Features of HTTPS

HTTPS has the following advantages:

  • HTTPS authenticates users and servers to ensure that data is sent to the correct client and server.
  • HTTPS can be used for encrypted transmission and identity authentication. The communication is more secure, preventing data from being stolen or modified during transmission and ensuring data security.
  • HTTPS is the most secure solution under the current architecture. Although it is not absolutely secure, it greatly increases the cost of man-in-the-middle attack.

The disadvantages of HTTPS are as follows:

  • HTTPS requires encryption and decryption of both the server and the client, which consumes more server resources and is complicated.
  • The HTTPS handshake phase is time-consuming and increases the page loading time.
  • SSL certificates require a fee. The more powerful the certificate, the higher the fee.
  • HTTPS connection consumes much more resources on the server, and it costs more to support websites with more visitors.
  • Without the Server Name Indication (SNI) extension, an SSL certificate is effectively bound to an IP address, so one IP address cannot serve certificates for multiple domain names.

6. How does HTTPS ensure security?

Understand two concepts:

  • Symmetric encryption: Both communication parties use the same secret key for encryption and decryption. Although symmetric encryption is very simple and has good performance, it cannot solve the problem of sending the secret key to each other for the first time, and the secret key is easy to be intercepted by hackers.
  • Asymmetric encryption:
  1. Private key + public key = key pair
  2. Data that is encrypted with the private key can be decrypted only with the corresponding public key. Data that is encrypted with the public key can be decrypted only with the corresponding private key
  3. Each communicating party has its own key pair, and the two parties send their public keys to each other before communicating
  4. Each party then encrypts data with the other party’s public key before sending it, and the recipient decrypts the data with its own private key

Although asymmetric encryption is more secure, it is slow and affects performance.

Solution:

Combine the two encryption methods: the symmetric key is encrypted with the public key of the asymmetric key pair and then sent. The receiver decrypts it with the private key to obtain the symmetric key, after which the two parties communicate using symmetric encryption.

Then there is another problem, the middleman problem: If there is a middleman between the client and the server, the middleman only needs to replace the public key sent by the client and the server with his own public key. In this way, the middleman can easily decrypt all the data sent by the client and the server.

So at this point, a certificate issued by a trusted third party, a Certification Authority (CA), is needed to prove the server’s identity and prevent man-in-the-middle attacks. A certificate includes the issuer, the purpose of the certificate, the public key, the hash algorithm, and the expiration time.

But what if the middleman tampers with the certificate? Then the certificate would prove nothing. This calls for another technique: the digital signature.

A digital signature works as follows: the CA applies its hash algorithm to the contents of the certificate to obtain a digest, then encrypts the digest with the CA’s private key to form the digital signature. When someone sends me their certificate, I use the same hash algorithm to generate the digest again, then use the CA’s public key to decrypt the digital signature and recover the digest the CA created. Comparing the two tells me whether the certificate has been tampered with. This ensures communication security to the greatest extent possible.

3. HTTP status code

Category of status code:

Category  Type                               Description
1xx       Informational status code          The received request is being processed
2xx       Success status code                The request was processed normally
3xx       Redirection status code            Additional action is needed to complete the request
4xx       Client Error status code           The server could not process the request
5xx       Server Error status code           The server failed while processing the request
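
The first digit of a status code is enough to determine its category. A minimal sketch using Python's standard `http.HTTPStatus` enum for the reason phrases; the `describe` helper and the category table are illustrative assumptions, not part of any library.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return 'code phrase (category)' based on the first digit of the code."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:               # code is not a registered status
        phrase = "Non-standard"
    return f"{code} {phrase} ({category})"


for code in (200, 206, 301, 304, 404, 503):
    print(describe(code))
```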

1. 2XX (Success status code)

Status code 2XX indicates that the request was processed properly.

(1) 200 OK

200 OK indicates that the request from the client is processed properly by the server.

(2) 204 No Content

This status code indicates that the server has processed the request successfully but returns no content: the response contains no entity body. It is typically used when the client only needs to send information to the server and the server has nothing to send back.

(3) 206 Partial Content

This status code indicates that the client made a range request and the server successfully executed that partial GET request. The response contains the part of the entity specified by the Content-Range header.
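
A range request can be sketched as a small server-side function. This is a simplified assumption of the behaviour: `serve_range` is a hypothetical helper that handles only a single `bytes=start-end` or `bytes=start-` range and falls back to a full 200 response for anything else.

```python
def serve_range(body: bytes, range_header: str):
    """Return (status, headers, payload) for a single byte range."""
    unit, _, spec = range_header.partition("=")
    start_text, _, end_text = spec.partition("-")
    if unit != "bytes" or not start_text.isdigit():
        return 200, {}, body                  # unsupported form: send everything

    start = int(start_text)
    last = len(body) - 1
    end = min(int(end_text), last) if end_text.isdigit() else last
    if start > end:                           # range lies outside the resource
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""

    headers = {"Content-Range": f"bytes {start}-{end}/{len(body)}"}
    return 206, headers, body[start:end + 1]


resource = b"hello world"
print(serve_range(resource, "bytes=0-4"))
# (206, {'Content-Range': 'bytes 0-4/11'}, b'hello')
```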

2. 3XX (Redirection status code)

A 3XX response indicates that the browser needs to perform some special processing to complete the request correctly.

(1) 301 Moved Permanently

Permanent redirection. This status code indicates that the requested resource has been assigned a new URI and that the new URI should be used from now on. The new URI is given in the Location header field of the HTTP response. If the user has saved the original URI as a bookmark, the bookmark is updated to the new URI in Location. Likewise, when search engines crawl the new content, they replace the old URL with the redirected one.

Usage scenario:

  • When we change domain names and the old one is no longer used, users who access the old domain name are redirected to the new one with a 301. In effect, this tells search engines to index the new domain name in place of the old one.
  • When search results show the domain name without the www prefix but the www version is not indexed, a 301 redirect can tell search engines which domain name is our target.

(2) 302 Found

Temporary redirection. This status code indicates that the requested resource has temporarily been assigned a new URI and that the user is expected to access the resource using the new URI for now. It is similar to 301 Moved Permanently, but 302 represents a temporary redirect, not a permanent one: the URI of the moved resource may change again in the future. If the user has saved the URI as a bookmark, the bookmark is not updated as it would be with a 301; it keeps the URI of the page that returned the 302. Likewise, search engines crawl the new content but keep the old URL, because a 302 tells them that the new URL is only temporary.

Usage scenario:

  • During a promotional campaign, visits to the home page are automatically redirected to the campaign page.
  • Users who are not logged in and try to access the user center are redirected to the login page.
  • Visits to a 404 page are redirected to the home page.

(3) 303 See Other

This status code indicates that the requested resource has another URI and that the client should retrieve it from that URI using the GET method. 303 works like 302 Found, except that 303 explicitly states that the client should use GET to obtain the resource.

303 is usually returned as the result of a PUT or POST operation. It indicates that the redirect does not point to the newly uploaded resource but to another page, such as a confirmation page or an upload progress page. The redirected page is always requested with GET.

Note:

  • When a 301, 302, or 303 response is returned, almost all browsers change POST to GET, drop the body of the request, and then automatically send the request again.
  • The 301 and 302 standards forbid changing POST to GET, but in practice every browser does it.

(4) 304 Not Modified

Related to the browser cache. This status code indicates that the client sent a conditional request and the server would allow access to the resource, but the condition was not met. A 304 response contains no body. Although 304 is classified under 3XX, it has nothing to do with redirection.

Conditional request (HTTP conditional request): the request uses the GET method and contains any of the headers If-Match, If-None-Match, If-Modified-Since, If-Unmodified-Since, or If-Range.

Status code 304 is not an error. It tells the client that its cached copy is still valid and can be used directly. Only the headers are returned, with no content, which improves page performance to some extent.
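
A conditional request based on ETag and If-None-Match can be sketched like this. The helpers `make_etag` and `handle_get` and the way the tag is computed (a truncated SHA-256 of the body) are assumptions for illustration; servers are free to generate ETags in other ways.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, request_headers: dict):
    """Return (status, headers, payload), answering 304 when the cache is fresh."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""      # headers only, no body
    return 200, {"ETag": etag}, body


page = b"<h1>hello</h1>"

# First visit: no validator yet, so the full body is returned.
status, headers, payload = handle_get(page, {})

# Second visit: the browser sends the ETag back and gets 304 with no body.
revisit = handle_get(page, {"If-None-Match": headers["ETag"]})

# After the resource changes, the old ETag no longer matches.
changed = handle_get(b"<h1>updated</h1>", {"If-None-Match": headers["ETag"]})

print(status, revisit[0], changed[0])  # 200 304 200
```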

(5) 307 Temporary Redirect

307 indicates temporary redirection. This status code has the same meaning as 302 Found. Although the 302 standard prohibits changing POST to GET, browsers do not follow it in practice.

307 follows the standard and does not change POST to GET, although browsers may differ in how exactly they handle the request. The specification requires the browser to continue POSTing the content to the address given in Location.

3. 4XX (Client Error)

A 4XX response indicates that the client is the cause of the error.

(1) 400 Bad Request

The status code indicates that a syntax error exists in the request packet. When an error occurs, modify the content of the request and send the request again. In addition, the browser treats the status code as if it were 200 OK.

(2) 401 Unauthorized

This status code indicates that the request must carry HTTP authentication information, such as BASIC or DIGEST authentication. If the request has already been made once, 401 indicates that user authentication failed.

A 401 response must contain a WWW-Authenticate header applicable to the requested resource in order to challenge the user for credentials. When the browser receives a 401 response for the first time, it displays the authentication dialog.

IIS defines the following 401 sub-status codes:

  • 401.1 – Login failed.
  • 401.2 – Login failed due to server configuration.
  • 401.3 – Not authorized due to ACL restrictions on resources.
  • 401.4 – Filter authorization failed.
  • 401.5 – ISAPI/CGI application authorization failed.
  • 401.7 – Access denied by the URL authorization policy on the Web server. This error code is specific to IIS 6.0.

(3) 403 Forbidden

The status code indicates that access to the requested resource has been denied by the server. The server does not need to give a detailed reason, but it can be explained in the body of the response packet entity. After entering this state, you cannot continue the authentication. This access is permanently forbidden and is closely related to the application logic.

IIS defines a number of different 403 errors that indicate the more specific cause of the error:

  • 403.1 – Access is denied.
  • 403.2 – Read access forbidden.
  • 403.3 – Write access denied.
  • 403.4 – SSL is required.
  • 403.5 – Requires SSL 128.
  • 403.6 – The IP address was rejected.
  • 403.7 – Client certificates are required.
  • 403.8 – Site access denied.
  • 403.9 – Too many users.
  • 403.10 – Invalid configuration.
  • 403.11 – Password change.
  • 403.12 – Access to the mapping table is denied.
  • 403.13 – The client certificate is revoked.
  • 403.14 – Reject directory list.
  • 403.15 – Client access permission exceeded.
  • 403.16 – The client certificate is untrusted or invalid.
  • 403.17 – The client certificate has expired or has not taken effect
  • 403.18 – The requested URL could not be executed in the current application pool. This error code is specific to IIS 6.0.
  • 403.19 – CGI cannot be performed for clients in this application pool. This error code is specific to IIS 6.0.
  • 403.20 – Passport login failed. This error code is specific to IIS 6.0.

(4) 404 Not Found

The status code indicates that the requested resource could not be found on the server. In addition, it can also be used when the server side rejects a request and does not want to give a reason. 404 occurs in the following cases:

  • 404.0 – (None) – No file or directory was found.
  • 404.1 – Web site cannot be accessed on the requested port.
  • 404.2 – Web services extension locking policy blocks this request.
  • 404.3 – MIME mapping policy blocks this request.

(5) 405 Method Not Allowed

This status code indicates that the server recognizes the method requested by the client but does not allow it to be used. The server should always allow clients to use the GET and HEAD methods. A client can use the OPTIONS method (preflight) to see which access methods the server allows, as shown below:

Access-Control-Allow-Methods: GET,HEAD,PUT,PATCH,POST,DELETE

4. 5XX (Server Error)

A 5XX response indicates that an error occurred on the server itself.

(1) 500 Internal Server Error

This status code indicates that an error occurred while the server was executing the request. It could be a bug or some temporary glitch in the Web application.

(2) 502 Bad Gateway

This status code indicates that the server acting as a gateway or proxy received an invalid response from the upstream server. Note that 502 errors usually cannot be fixed by the client; they need to be fixed on the web server or proxy server that the request passes through. 502 occurs in the following cases:

  • 502.1 – The CGI (Common Gateway Interface) application timed out.
  • 502.2 – CGI (Common Gateway Interface) application error.

(3) 503 Service Unavailable

This status code indicates that the server is temporarily overloaded or down for maintenance and cannot process requests right now. If you know in advance how long it will take to resolve the situation, it is best to include a Retry-After header field in the response to the client.

Usage scenario:

  • When the server is down for maintenance, 503 is used to respond to requests.
  • Nginx returns 503 if the rate limit is exceeded.

(4) 504 Gateway Timeout

This status code indicates that the gateway or proxy server cannot get the desired response within the specified time. It is a new addition to HTTP 1.1.

Usage scenarios: Code execution times out, or an infinite loop occurs.

5. Summary

(1) 2XX success

  • 200 OK: the request from the client was processed correctly by the server
  • 204 No Content: the request succeeded, but the response message contains no entity body
  • 205 Reset Content: the request succeeded and the response message contains no entity body. Unlike 204, however, it requires the requester to reset the content
  • 206 Partial Content: the server fulfilled a range request

(2) 3XX redirection

  • 301 Moved Permanently: permanent redirection; the resource has been assigned a new URL
  • 302 Found: temporary redirection; the resource has temporarily been assigned a new URL
  • 303 See Other: the resource has another URL, which should be retrieved with the GET method
  • 304 Not Modified: the server allows access to the resource, but the request condition was not met
  • 307 Temporary Redirect: similar to 302, but the client is expected to keep the request method unchanged when sending the request to the new address

(3) 4XX client error

  • 400 Bad Request: the request message contains a syntax error
  • 401 Unauthorized: the request must carry HTTP authentication information
  • 403 Forbidden: the server denies access to the requested resource
  • 404 Not Found: the requested resource was not found on the server

(4) 5XX server error

  • 500 Internal Server Error: an error occurred while the server was executing the request
  • 501 Not Implemented: the server does not support a function required by the current request
  • 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot process requests

6. 302, 303, and 307 are all redirects. What is the difference between them?

302 is a status code from HTTP/1.0. HTTP/1.1 introduced 303 and 307 to refine it. 303 explicitly states that the client should use the GET method to obtain the resource, so a POST request is redirected as a GET request. 307 follows the standard and does not change POST to GET.
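
The difference can be condensed into a small decision function. This is a sketch of the behaviour described in this article for GET and POST requests, not a full implementation of any browser's redirect logic; the name `redirected_method` is hypothetical.

```python
def redirected_method(status: int, method: str) -> str:
    """Method a browser typically uses when following a redirect."""
    if status == 303:
        return "GET"                 # 303 always asks for GET
    if status in (301, 302):
        # The standard says to keep the method, but browsers
        # change POST to GET in practice.
        return "GET" if method == "POST" else method
    if status == 307:
        return method                # method and body are preserved
    raise ValueError(f"{status} is not a redirect handled in this sketch")


for status in (301, 302, 303, 307):
    print(status, "POST ->", redirected_method(status, "POST"))
```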

4. DNS protocol introduction

1. What is DNS protocol

Concept: DNS is short for Domain Name System. It provides a service that translates host names into IP addresses. It is a distributed database made up of a hierarchy of DNS servers, and an application-layer protocol that defines how hosts query that database. It makes the Internet easier to use, because people do not have to remember the numeric IP addresses that machines read directly.

Function: DNS resolves a domain name to an IP address. The client sends a domain name query request to the DNS server (which has its own IP address), and the DNS server tells the client the IP address of the web server.

2. Does DNS use both TCP and UDP?

DNS uses port 53 and uses both TCP and UDP.

(1) TCP is used for zone transfers

  • The secondary DNS server queries the primary DNS server periodically (usually every 3 hours) to check whether the data has changed. If it has, a zone transfer is performed to synchronize the data. Zone transfers use TCP instead of UDP because the amount of data to synchronize is much larger than the data in a single request and response.
  • TCP is a reliable connection, which ensures the accuracy of the data.

(2) UDP is used for domain name resolution

  • When a client queries a DNS server for a domain name, the response generally does not exceed 512 bytes and is sent over UDP. Skipping the three-way handshake reduces the load on the DNS server and makes responses faster. In theory, the client can ask to query the DNS server over TCP, but in practice many DNS servers are configured to support only UDP queries.
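
To see how small a typical query is, the packet for an A-record lookup can be built by hand with the standard `struct` module. This is a sketch: `build_dns_query` is a hypothetical helper, the query ID is arbitrary, and the packet is only built here, not sent.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


packet = build_dns_query("www.baidu.com")
print(len(packet))  # 31 bytes, far below the 512-byte UDP limit
```

Such a packet would normally be sent with a UDP socket to port 53 of a DNS server; that step is left out so the example runs without network access.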

3. Complete DNS query process

A domain name is resolved as follows:

  • First, the browser cache is searched for the corresponding IP address. If it is found, it is returned directly; if not, resolution continues to the next step
  • The request is sent to the local DNS server, which searches its own cache. If the record is found, the local DNS server returns it; if not, it continues to the next step
  • The local DNS server sends a request to the root DNS server, and the root DNS server returns the address of the top-level DNS server for the queried domain
  • The local DNS server sends a request to the top-level DNS server. The receiving server queries its own cache and returns the query result if there is a record, or the address of the relevant authoritative DNS server at the next level if there is no record
  • The local DNS server sends a request to the authoritative DNS server, and the DNS server returns the corresponding result
  • The local DNS server saves the returned results in the cache for future use
  • The local DNS server returns the result to the browser

For example, to query the IP address of www.baidu.com, the browser first checks its own cache for the domain name. If there is no cached entry, the request is sent to the local DNS server, which checks whether it has the domain name cached. If not, the local DNS server sends a request to the root DNS server, which returns a list of IP addresses of the top-level DNS servers responsible for .com. The local DNS server then sends a request to one of the .com top-level DNS servers, which returns the IP address list of the authoritative DNS servers for baidu.com. Finally, the local DNS server sends a request to one of those authoritative DNS servers, which returns a list of IP addresses corresponding to the host name.

4. Iterative query and recursive query

In fact, DNS resolution is a process involving both iterative and recursive queries.

  • In a recursive query, once the request has been sent, the DNS server queries the lower-level DNS servers on the user’s behalf and returns the final result to the user. With a recursive query, the user only needs to issue one request.
  • In an iterative query, the DNS server returns only the result of a single lookup step. The user must then query the next level itself, so the user needs to make multiple requests.

The way we usually send a request to the local DNS server is a recursive query, because we only need to make the request once, and then the local DNS server returns the final result of the request. The process of the local DNS server to request other DNS servers is an iterative query process, because each DNS server returns only one query result, and the local DNS server performs the query at the lower level by itself.
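
The two query styles can be sketched with an in-memory model. All server names, the host name www.example.com, and the address 192.0.2.10 are made up for the example, and the `LocalResolver` class is a hypothetical stand-in for a local DNS server.

```python
# Hypothetical hierarchy: each server either refers to the next one or answers.
SERVERS = {
    "root": {"referral": "tld-com"},
    "tld-com": {"referral": "ns-example"},
    "ns-example": {"answer": {"www.example.com": "192.0.2.10"}},
}


def query(server: str, name: str):
    """One iterative step: the server returns an answer or a referral."""
    data = SERVERS[server]
    if "answer" in data:
        return "answer", data["answer"][name]
    return "referral", data["referral"]


class LocalResolver:
    """Local DNS server: recursive towards the client, iterative upstream."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        if name in self.cache:                 # answered from cache
            return self.cache[name]
        server = "root"
        while True:                            # follow referrals step by step
            self.upstream_queries += 1
            kind, value = query(server, name)
            if kind == "answer":
                self.cache[name] = value       # keep for later requests
                return value
            server = value


resolver = LocalResolver()
first = resolver.resolve("www.example.com")    # client makes ONE request
queries_after_first = resolver.upstream_queries
second = resolver.resolve("www.example.com")   # served from the cache
print(first, queries_after_first, resolver.upstream_queries)  # 192.0.2.10 3 3
```

The client calls `resolve` once (recursive), while the resolver itself walks root, top-level, and authoritative servers one query at a time (iterative).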

5. DNS records and packets

The DNS server stores information in the form of resource records. Generally, each DNS response packet contains multiple resource records. The format of a resource record is:

(Name, Value, Type, TTL)

TTL is the lifetime of the resource record: it defines how long the record can be cached by other DNS servers.

There are four common values of Type: A, NS, CNAME, and MX. Different values of Type indicate different meanings of resource records.

  • If Type = A, Name is the host name and Value is the IP address corresponding to the host name. An A record therefore provides the standard mapping from host name to IP address.
  • If Type = NS, Name is a domain name and Value is the host name of the DNS server responsible for that domain. This record is used during the DNS query chain to return information about the DNS server to be queried at the next level.
  • If Type = CNAME, Name is an alias and Value is the canonical host name of the host. This record returns the canonical host name corresponding to an alias, so that the querying host can then look up the IP address of that canonical name. The main purpose of a host alias is to provide a simple, memorable name for a complex host name.
  • If Type = MX, Name is the alias of a mail server and Value is the canonical host name of the mail server. It serves the same purpose as CNAME: canonical host names are hard to remember, so an alias is provided.
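As a rough illustration of how the four record types work together, the sketch below stores records as (Name, Value, Type, TTL) tuples and follows CNAME records until an A record is found. The record contents, the `lookup` and `resolve_a` helpers, and the `max_hops` limit are assumptions made for the example, not part of any DNS library.

```python
from collections import namedtuple

# Same field order as the (Name, Value, Type, TTL) format above.
Record = namedtuple("Record", "name value type ttl")

# Hypothetical records; names and addresses are placeholders.
RECORDS = [
    Record("example.com", "ns1.example.com", "NS", 86400),
    Record("www.example.com", "web.example.com", "CNAME", 300),
    Record("web.example.com", "192.0.2.20", "A", 60),
    Record("example.com", "mail.example.com", "MX", 3600),
    Record("mail.example.com", "192.0.2.30", "A", 60),
]

def lookup(name, rtype):
    """Return every record matching the given name and type."""
    return [r for r in RECORDS if r.name == name and r.type == rtype]

def resolve_a(name, max_hops=5):
    """Follow CNAME records until an A record is found (or give up)."""
    for _ in range(max_hops):
        a_records = lookup(name, "A")
        if a_records:
            return a_records[0].value
        cnames = lookup(name, "CNAME")
        if not cnames:
            return None
        name = cnames[0].value          # continue with the canonical name
    return None

web_ip = resolve_a("www.example.com")               # alias -> canonical -> IP
mail_host = lookup("example.com", "MX")[0].value    # mail server host name
mail_ip = resolve_a(mail_host)
print(web_ip, mail_host, mail_ip)
```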

5. Network model

1. OSI seven-layer model

To make network applications more widespread, ISO introduced the OSI reference model.

(1) Application layer

The layer closest to the user in the OSI reference model provides the application interface to the computer user as well as various network services directly to the user. Our common application layer network service protocols are: HTTP, HTTPS, FTP, POP3, SMTP and so on.

  • Clients and servers frequently need to request data from each other, and this is when the Hypertext Transfer Protocol (HTTP) or HTTPS is used. We often use this protocol when designing data interfaces on the back end.
  • FTP is the File Transfer Protocol. I have not worked with it personally during development, but I think some resource sites, such as Baidu Netdisk and Xunlei (Thunder), are probably based on this protocol.
  • SMTP is the Simple Mail Transfer Protocol. In one project, this protocol was used to let users log in with email verification codes.

(2) Presentation layer

The presentation layer provides various encoding and conversion functions for application layer data to ensure that the data sent by the application layer of one system can be recognized by the application layer of another system. If necessary, this layer provides a standard representation for converting multiple data formats within a computer into a standard representation for communication. Data compression and encryption are also among the transformation capabilities provided by the presentation layer.

In project development, data can be encoded and decoded with Base64 to make it easier to transmit. If classified by function, Base64 works at the presentation layer.

(3) Session layer

The session layer is responsible for establishing, managing, and terminating communication sessions between presentation layer entities. Communication in this layer consists of service requests and responses between applications in different devices.

(4) Transport layer

The transport layer establishes end-to-end connections between hosts. It provides end-to-end, reliable, and transparent data transmission services for upper-layer protocols, including error control and flow control. This layer hides the details of lower-layer data communication from the higher layers, so that higher-layer users see only a host-to-host, user-controllable and configurable, reliable data path between the two transmission entities. This is where TCP and UDP belong. The port number is the “end” here.

(5) Network layer

This layer establishes the connection between two nodes through IP addressing, selects appropriate routing and switching nodes for the packets sent by the transport layer at the source, and delivers them to the transport layer at the destination according to the correct address. It is commonly referred to as the IP layer, and the IP protocol is the foundation of the Internet. We can understand it this way: the network layer determines the route a packet takes, while the transport layer determines how it is transmitted.

(6) Data link layer

Bits are combined into bytes, bytes are combined into frames, media is accessed using link-layer addresses (Ethernet uses MAC addresses), and error detection is performed. Comparing the network layer with the data link layer, we can understand that the network layer plans the transmission route of packets, and the data link layer is the transmission route. However, error control has been added to the data link layer.

(7) Physical layer

The actual transmission of the final signal is achieved through the physical layer, which transmits bit streams over physical media. It specifies levels, speeds, and cable pins. Common examples include hubs, repeaters, modems, network cables, twisted pair cables, and coaxial cables. These are all physical-layer devices and transmission media.

The communication feature of the OSI seven-layer model is peer-to-peer communication: in order to transfer data packets from the source to the destination, each layer of the source's OSI model must communicate with the peer layer of the destination. During this communication, each layer uses its own protocol.
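Peer-to-peer communication is easier to picture as encapsulation: on the way down each layer wraps the data with its own header, and on the way up each layer removes only the header written by its peer. The sketch below is a deliberately simplified model (text tags instead of real binary headers, and the physical layer only carries the result); the `LAYERS` list and both function names are assumptions made for the example.

```python
# Layers that add a header, from top to bottom. The physical layer is
# omitted because it only transmits the resulting bits.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data link"]

def send_down(payload):
    """Sender side: each layer wraps the data from the layer above."""
    for layer in LAYERS:
        payload = f"[{layer}]" + payload
    return payload

def receive_up(frame):
    """Receiver side: each layer strips only its peer layer's header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"{layer} header missing")
        frame = frame[len(header):]
    return frame

frame = send_down("hello")
print(frame)
print(receive_up(frame))
```

The outermost header belongs to the lowest layer, so the receiver necessarily processes the headers in the reverse of the order in which they were added.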

2. TCP/IP five-layer protocol

The layers of the TCP/IP five-layer protocol, and how they relate to the OSI seven-layer model, are as follows:

  • Application layer: Directly provides services for application processes. Application layer protocols define the rules for communication and interaction between application processes. Different applications have different application layer protocols, such as HTTP (World Wide Web service), FTP (file transfer), SMTP (email), and DNS (domain name query).
  • Transport layer: Responsible for providing communication services between processes on two hosts. This layer has the following two protocols:
    • Transmission Control Protocol (TCP): Provides connection-oriented, reliable data transmission. The basic unit of data transmission is the segment.
    • User Datagram Protocol (UDP): Provides connectionless, best-effort data transmission that does not guarantee reliability. The basic unit of data transmission is the user datagram.
  • Network layer (sometimes called the internet layer): Responsible for providing communication services between two hosts and delivering data to the target host by selecting appropriate routes.
  • Data Link Layer: Encapsulates the IP datagram handed over by the network layer into frames and transmits frames between two adjacent nodes of the link. Each frame contains data and necessary control information (such as synchronization information, address information, error control, etc.).
  • Physical Layer: Ensures that data can be transmitted over various physical media, providing a reliable environment for data transmission.

As can be seen from the figure above, the TCP/IP model is more concise than the OSI model: it merges the application, presentation, and session layers into a single application layer.

Different devices work at different layers. For example, switches work at the data link layer, and routers work at the network layer. The protocols implemented at each layer are also different, that is, each layer provides different services. The following figure shows the main transport protocols at each layer:

In the same way, communication in the TCP/IP five-layer protocol is also peer-to-peer:

TCP and UDP

1. Concepts and features of TCP and UDP

TCP and UDP are transport-layer protocols that belong to the TCP/IP family:

(1) UDP

UDP stands for User Datagram Protocol. Like TCP, it is a protocol for handling packets in networks, but it is connectionless. In the OSI model it sits at the transport layer, one layer above the IP protocol. UDP has the disadvantage of not providing packet grouping, assembly, or ordering, which means that once a packet is sent, there is no way to know whether it arrived safely and intact.

Its features are as follows:

1) Connectionless

First of all, UDP does not require a three-way handshake to establish a connection as TCP does. You can send data when you want. In addition, it is only a carrier of data packets and does not split or splice data packets.

Specifically:

  • At the sending end, the application layer passes the data to UDP at the transport layer, which only adds a UDP header to the data and then passes it to the network layer.
  • At the receiving end, the network layer passes the data to the transport layer. UDP only removes its header and passes the data to the application layer without any splicing.

2) Unicast, multicast and broadcast functions

UDP not only supports one-to-one transmission mode, but also supports one-to-many, many-to-many, and many-to-one modes. In other words, UDP provides unicast, multicast, and broadcast functions.

3) Message oriented

On the sending side, UDP adds a header to the message handed down by the application and delivers it to the IP layer. UDP neither merges nor splits the messages sent by the application layer; it preserves their boundaries. Therefore, the application must choose an appropriate message size.

4) Unreliability

First of all, unreliability is reflected in the lack of a connection: communication does not require establishing one, and data is sent whenever the sender wants, which is inherently unreliable.

The data is passed as it is received, without backing up the data, and without caring whether the data has been received correctly.

In addition, network conditions fluctuate, but because UDP has no congestion control, it keeps sending data at a constant rate. Even when the network is poor, the sending rate is not adjusted. The disadvantage is possible packet loss when network conditions are bad, but the advantage is also obvious: in some real-time scenarios (such as teleconferencing), UDP is needed instead of TCP.

5) The header overhead is small, and the transmission of data packets is very efficient.

The UDP header contains the following data:

  • Two 16-bit port numbers: the source port (optional field) and the destination port
  • The length of the entire datagram (header plus data)
  • The checksum of the entire datagram (optional in IPv4), which is used to detect errors in the header and data

Therefore, UDP's header overhead is small: only 8 bytes, far less than the TCP header of at least 20 bytes, which makes it very efficient for transmitting data packets.
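The 8-byte figure follows directly from the four 16-bit fields listed above. As a small sketch, the header can be packed and parsed with Python's `struct` module; the port numbers and payload are arbitrary example values, and the checksum is left at 0 (which IPv4 interprets as "not computed") rather than actually calculated.

```python
import struct

# Four unsigned 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")

def build_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend a UDP header; length covers the header plus the payload."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_datagram(datagram):
    """Split a datagram back into its header fields and payload."""
    src, dst, length, checksum = UDP_HEADER.unpack(datagram[:UDP_HEADER.size])
    return {
        "src": src,
        "dst": dst,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }

datagram = build_datagram(50000, 53, b"hello")
parsed = parse_datagram(datagram)
print(UDP_HEADER.size, len(datagram), parsed)
```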

(2) TCP

Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol (a stream is an uninterrupted data structure).

It has the following characteristics:

1) Connection-oriented

Connection-oriented: a connection must be established between the two ends before data can be sent. The way to establish a connection is to use a “three-way handshake” to establish a reliable connection. Establishing a connection lays the foundation for the reliable transmission of data.

2) Only unicast transmission is supported

Each TCP transmission connection has only two endpoints and supports only point-to-point data transmission. Multicast and broadcast transmission modes are not supported.

3) Byte stream oriented

Unlike UDP, TCP does not transmit individual packets independently. Instead, TCP transmits packets as a byte stream without preserving packet boundaries.

4) Reliable transmission

For reliable transmission, TCP detects packet loss and bit errors using segment sequence numbers and acknowledgement numbers. To ensure reliable delivery, TCP assigns a sequence number to each segment, and the sequence numbers also ensure that data delivered to the receiving entity arrives in order. The receiving entity sends back an acknowledgement (ACK) for the bytes it has successfully received; if the sending entity does not receive an acknowledgement within a reasonable round-trip time (RTT), the corresponding data (assumed lost) is retransmitted.

5) Provide congestion control

When the network is congested, TCP can reduce the rate and amount of data injected into the network to ease congestion.

6) Provide full-duplex communication

TCP allows applications on both sides of the connection to send data at any time, because both sides of the TCP connection have caches that temporarily store data from two-way communication. Of course, TCP can send a data segment immediately, or it can cache for a period of time to send more data segments at once (the maximum data segment size depends on the MSS)

2. Differences between TCP and UDP

|  | UDP | TCP |
| --- | --- | --- |
| Connection | Connectionless | Connection-oriented |
| Reliability | Unreliable transmission; no flow control or congestion control | Reliable transmission (data order and correctness); uses flow control and congestion control |
| Number of connected parties | Supports one-to-one, one-to-many, many-to-one, and many-to-many communication | One-to-one communication only |
| Transmission mode | Message-oriented | Byte-stream-oriented |
| Header overhead | Small, only 8 bytes | Minimum 20 bytes, maximum 60 bytes |
| Applicable scenarios | Real-time applications such as video conferencing and live streaming | Applications that require reliable transfer, such as file transfer |

3. Use scenarios of TCP and UDP

  • TCP application scenarios: scenarios with relatively low efficiency requirements but high accuracy requirements. Because transmission involves acknowledging, retransmitting, and reordering data, TCP is less efficient than UDP. Examples: file transfer (accuracy matters a great deal, but speed can be relatively slow), receiving mail, and remote login.
  • UDP application scenarios: scenarios with relatively high efficiency requirements but relatively low accuracy requirements. Examples: QQ chat, online video, and VoIP (instant communication where speed matters, occasional glitches are not a big problem, and a retransmission mechanism is of no use), and broadcast communication (broadcast, multicast).

4. Why is UDP unreliable?

UDP does not require a connection to be established before data transmission. The transport layer of the remote host does not need to acknowledge the UDP packets it receives, so delivery is unreliable. In summary, there are four points:

  • No guarantee of message delivery: no acknowledgements, no retransmission, no timeouts
  • No guarantee of delivery order: no packet sequence numbers, no reordering, and no head-of-line blocking
  • No connection state tracking: no need to establish a connection or restart a state machine
  • No congestion control: no built-in client or network feedback mechanism

5. TCP retransmission mechanism

The layer below TCP (the network layer) may lose, duplicate, or reorder packets, so TCP has to provide reliable data transmission itself. To ensure correct delivery, TCP retransmits segments that it considers lost (including segments with bit errors). TCP uses two separate mechanisms for retransmission, one based on time and the other based on acknowledgement information.

After sending a packet of data, TCP starts a timer. If no ACK packet is received within this period, TCP retransmits the packet. If the packet is not successfully sent for a certain number of times, TCP gives up and sends a reset signal.
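The time-based mechanism can be sketched as a retry loop. This is a simplified model, not TCP's real timer logic: `channel` is a hypothetical callable that returns True when an ACK arrives before the timer fires, `lossy_channel` is a made-up helper that drops the first few transmissions, and the starting timeout of 1.0 and the limit of 3 retries are arbitrary. Doubling the timeout after each expiry is included as an assumption of this sketch.

```python
def send_with_retransmission(segment, channel, rto=1.0, max_retries=3):
    """Send one segment, retransmitting after every timeout.

    Returns the list of timeouts that expired before an ACK arrived.
    Raises ConnectionResetError once the retry budget is exhausted.
    """
    expired = []
    for _ in range(max_retries + 1):    # first attempt plus the retries
        if channel(segment):            # ACK arrived in time
            return expired
        expired.append(rto)             # timer fired without an ACK
        rto *= 2                        # back off before the next attempt
    raise ConnectionResetError("gave up after repeated timeouts")

def lossy_channel(drop_first_n):
    """Hypothetical channel that loses the first n transmissions."""
    state = {"sent": 0}
    def channel(segment):
        state["sent"] += 1
        return state["sent"] > drop_first_n
    return channel

expired = send_with_retransmission(b"data", lossy_channel(2))
print(expired)
```

If the channel never delivers, the function raises after the last retry, which corresponds to TCP giving up and resetting the connection.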

6. TCP congestion control mechanism

The TCP congestion control mechanisms are as follows:

  • Slow start
  • Congestion avoidance
  • Fast retransmission
  • Fast recovery

(1) Slow start

  • Set cwnd = 1 at the start of sending (cwnd is the congestion window)
  • Idea: Instead of sending a lot of data at the beginning, probe the level of network congestion first and increase the size of the congestion window gradually from small to large.
  • To prevent cwnd from growing too large and causing network congestion, set a slow start threshold (the ssthresh state variable)
    • When cwnd < ssthresh, use the slow start algorithm
    • When cwnd = ssthresh, either the slow start algorithm or the congestion avoidance algorithm can be used
    • When cwnd > ssthresh, use the congestion avoidance algorithm

(2) Congestion avoidance

  • Congestion avoidance does not mean that congestion can be avoided completely. It means that during the congestion avoidance phase the congestion window grows linearly, so that the network is less prone to congestion.
  • The congestion window cwnd grows slowly: for every round-trip time (RTT), the sender's congestion window increases by one.
  • In both the slow start phase and the congestion avoidance phase, as soon as the sender determines that the network is congested, the slow start threshold is set to half the size of the send window at the moment congestion occurred. The congestion window is then set to 1 and the slow start algorithm is executed, as shown in the figure. The basis for judging network congestion is that no acknowledgement is received. The missing acknowledgement may be caused by packet loss for other reasons, but because the cause cannot be determined, it is treated as congestion.

(3) Fast retransmission

  • Fast retransmission requires the receiver to send a duplicate acknowledgement immediately after receiving an out-of-order segment (so that the sender learns as early as possible that a segment has not reached the receiver). As soon as the sender receives three consecutive duplicate acknowledgements, it immediately retransmits the segment the receiver is missing, without waiting for the retransmission timer to expire.
  • Because there is no need to wait for the retransmission timer to expire, unacknowledged segments can be retransmitted as early as possible, improving the throughput of the entire network.

(4) Fast recovery

  • When the sender receives three consecutive duplicate acknowledgements, it performs the “multiplicative decrease” algorithm and halves the ssthresh threshold. But the slow start algorithm is not executed next.
  • The sender now reasons that the network is probably not congested, since it would not be receiving several duplicate acknowledgements if it were. So instead of executing slow start, cwnd is set to the value of ssthresh and the congestion avoidance algorithm is executed.
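The four mechanisms can be combined into one small update rule per round trip. The sketch below is a textbook-style simplification measured in whole segments, not what a real TCP stack does: the function names, the event labels (`"ack"`, `"timeout"`, `"3dup"`), the initial threshold of 16, capping slow start at ssthresh, and the floor of 2 on ssthresh are all assumptions made for the example.

```python
def next_cwnd(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) after one round trip.

    event is "ack" (everything acknowledged), "timeout", or
    "3dup" (three duplicate acknowledgements received).
    """
    if event == "timeout":
        # Congestion assumed: halve the threshold, restart slow start.
        return 1, max(cwnd // 2, 2)
    if event == "3dup":
        # Fast recovery: halve the threshold and continue from there.
        ssthresh = max(cwnd // 2, 2)
        return ssthresh, ssthresh
    if cwnd < ssthresh:
        # Slow start: the window doubles every round trip.
        return min(cwnd * 2, ssthresh), ssthresh
    # Congestion avoidance: the window grows by one per round trip.
    return cwnd + 1, ssthresh

def simulate(events, cwnd=1, ssthresh=16):
    """Apply a sequence of events and record cwnd after each one."""
    history = [cwnd]
    for event in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event)
        history.append(cwnd)
    return history

history = simulate(["ack"] * 6 + ["3dup", "ack", "timeout"] + ["ack"] * 4)
print(history)
```

In the printed history the window doubles up to the threshold, grows by one afterwards, drops to half after the three duplicate acknowledgements, and falls back to 1 after the timeout.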

7. TCP flow control mechanism

Generally, flow control keeps the sender from sending data so fast that the receiver cannot keep up. TCP uses a variable-size sliding window for flow control. The unit of window size is the byte. The window size here is in effect the amount of data that can be transmitted at a time.

  • When a connection is established, each end of the connection allocates a buffer to hold incoming data and sends the size of the buffer to the other end.
  • When data arrives, the receiver sends an acknowledgement containing its remaining buffer size. The size of the remaining buffer space is called the window, and the notification of the window size is called a window advertisement. Every acknowledgement sent by the receiver includes a window advertisement.
  • If the receiving application can read data as fast as it arrives, the receiver sends a positive window advertisement with each acknowledgement.
  • If the sender operates faster than the receiver, the received data eventually fills the receiver's buffer, and the receiver advertises a zero window. When the sender receives a zero window advertisement, it must stop sending until the receiver advertises a positive window again.
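The interaction between the advertised window and the sender can be modelled with a small simulation. This is only an illustration under simplifying assumptions: there is no network delay, the `Receiver` class and `send_all` function are hypothetical names, and the application is assumed to read a fixed number of bytes each time the sender is blocked on a zero window.

```python
class Receiver:
    """Receiver with a fixed-size buffer that advertises its free space."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffered = 0

    def window(self):
        return self.buffer_size - self.buffered

    def receive(self, nbytes):
        """Store incoming bytes and return the new advertised window."""
        if nbytes > self.window():
            raise ValueError("sender exceeded the advertised window")
        self.buffered += nbytes
        return self.window()

    def app_read(self, nbytes):
        """The application consumes data, which reopens the window."""
        self.buffered -= min(nbytes, self.buffered)
        return self.window()

def send_all(total, receiver, app_read_size):
    """Send `total` bytes without ever exceeding the advertised window."""
    window = receiver.window()
    log = []
    while total > 0:
        if window == 0:
            # Zero window: stop sending until the receiver frees space.
            window = receiver.app_read(app_read_size)
            log.append(("blocked", 0, window))
            continue
        n = min(total, window)
        window = receiver.receive(n)
        total -= n
        log.append(("sent", n, window))
    return log

log = send_all(250, Receiver(100), app_read_size=60)
for entry in log:
    print(entry)
```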

8. Reliable transmission mechanism of TCP

The reliable transport mechanism of TCP is based on continuous ARQ protocol and sliding window protocol.

On the sender side, TCP maintains a send window. Segments before the window have been sent and acknowledged; segments inside the window have either been sent but not yet acknowledged or are allowed to be sent but have not been sent yet; segments after the window are buffered and not yet allowed to be sent. When the sender sends to the receiver, it sends all the segments in the window in sequence and sets a timer, which can be understood as belonging to the earliest segment that has been sent but not yet acknowledged. If an acknowledgement for a segment arrives before the timer expires, the window slides forward so that its head sits just after the acknowledged segment. If there are still segments that have been sent but not acknowledged, the timer is reset; if there are none, the timer is stopped. If the timer expires, all segments that have been sent but not acknowledged are resent and the timeout interval is doubled. When the sender receives three redundant acknowledgements from the receiver, the segment following the acknowledged one has probably been lost. In this case the sender performs a fast retransmission, that is, it resends all segments that have been sent but not acknowledged before the current timer expires.

The receiver uses a cumulative acknowledgement mechanism. For every segment that arrives in order, the receiver returns a positive acknowledgement for that segment. If a segment arrives out of order, the receiver discards it and acknowledges the most recent in-order segment again. Cumulative acknowledgement guarantees that all segments before the returned acknowledgement number have arrived in order, so the send window can be moved to the end of the acknowledged segment.

The size of the sending window varies, which is determined by the remaining size of the receiving window and the congestion degree on the network. TCP controls the sending rate of packet segments by controlling the length of the sending window.

But TCP is not exactly the same as the sliding window protocol, because many TCP implementations cache out-of-order segments and only retransmit one segment when a retransmission occurs, so the reliable transmission mechanism of TCP is more like a hybrid of the window sliding protocol and the selective retransmission protocol.
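Cumulative acknowledgement and the three-duplicate rule can be demonstrated on a short arrival sequence. The sketch numbers whole segments (0, 1, 2, ...) rather than bytes, and the receiver buffers out-of-order segments, as the last paragraph says many implementations do. Both function names are assumptions made for the example.

```python
def receiver_acks(arrivals):
    """Return the cumulative ACK sent after each arriving segment.

    Each ACK names the next in-order segment the receiver expects.
    Out-of-order segments are buffered instead of being discarded.
    """
    expected, buffered, acks = 0, set(), []
    for seq in arrivals:
        if seq == expected:
            expected += 1
            while expected in buffered:   # the gap is filled, catch up
                buffered.remove(expected)
                expected += 1
        elif seq > expected:
            buffered.add(seq)             # hold until the gap is filled
        acks.append(expected)
    return acks

def fast_retransmit_target(acks):
    """Return the segment to resend after three duplicate ACKs, else None."""
    last, duplicates = None, 0
    for ack in acks:
        if ack == last:
            duplicates += 1
            if duplicates == 3:
                return ack
        else:
            last, duplicates = ack, 0
    return None

# Segment 2 is lost at first and only arrives after being retransmitted.
acks = receiver_acks([0, 1, 3, 4, 5, 2])
print(acks, fast_retransmit_target(acks))
```

Because segments 3 to 5 were buffered, the acknowledgement jumps straight to 6 once segment 2 arrives, so only one segment had to be retransmitted.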

9. TCP three-way handshake and four-way wave

(1) Three-way handshake

The three-way handshake refers to the three packets exchanged between the client and the server when a TCP connection is established. Its main purpose is to confirm that the sending and receiving capabilities of both parties are working, and to agree on each side's initial sequence number in preparation for reliable transmission later. In essence, it connects to the specified port of the server, establishes a TCP connection, synchronizes the sequence numbers and acknowledgement numbers of both sides, and exchanges TCP window size information.

Initially, the client is in the CLOSED state and the server is in the LISTEN state.

  • First handshake: The client sends a SYN packet to the server, specifying the client's initial sequence number (ISN). The client is now in the SYN_SENT state.

The header has SYN=1 and initial sequence number seq=x. A segment with SYN=1 cannot carry data, but it consumes one sequence number.

  • Second handshake: After receiving the SYN packet from the client, the server responds with its own SYN packet and specifies its own initial sequence number (ISN). It also uses the client's ISN + 1 as the ACK value to indicate that it has received the client's SYN. The server is now in the SYN_RCVD state.

In the acknowledgement segment, SYN=1, ACK=1, acknowledgement number ack=x+1, and initial sequence number seq=y.

  • Third handshake: After receiving the SYN packet, the client sends an ACK packet, again using the server's ISN + 1 as the ACK value to indicate that it has received the server's SYN. The client is now in the ESTABLISHED state. After receiving the ACK packet, the server also enters the ESTABLISHED state, and the connection between the client and the server is established.

The acknowledgement segment has ACK=1, acknowledgement number ack=y+1, and sequence number seq=x+1 (the initial value was seq=x, so the second segment is +1). The ACK segment may carry data; if it carries none, it does not consume a sequence number.

So why three handshakes and not two?

  • To confirm that both receiving and sending capabilities are normal
  • If you use two handshakes, the following happens:

Suppose the client sends a connection request but receives no acknowledgement because the request segment is lost, so it retransmits the request. The acknowledgement then arrives, the connection is established, data is transferred, and the connection is released. In total the client has sent two connection request segments, the first of which was lost and the second of which reached the server. Now suppose instead that the first segment was not lost but merely delayed for a long time at some network node, and reaches the server some time after the connection has been released. The server mistakes it for a new connection request from the client and sends an acknowledgement agreeing to establish a connection. Without the three-way handshake, the new connection would be established as soon as the server sends that acknowledgement. The client, however, ignores the server's acknowledgement and sends no data, while the server keeps waiting for the client to send data, wasting resources.

In short, the three steps are:

  • First handshake: The client sends a connection request segment to the server. The segment contains the client's own initial sequence number. After sending the request, the client enters the SYN-SENT state.
  • Second handshake: After receiving the connection request and agreeing to connect, the server sends a reply containing its own initial sequence number. After sending the reply, the server enters the SYN-RECEIVED state.
  • Third handshake: When the client receives the reply agreeing to the connection, it sends an acknowledgement segment to the server. The client enters the ESTABLISHED state after sending this segment, and the server enters the ESTABLISHED state after receiving it. The connection is now established.

The TCP three-way handshake exists so that each side can confirm the other's initial sequence number and tell the other side which sequence number it is ready to receive. The third handshake is how the client confirms the server's initial sequence number. With only two handshakes, the server would have no way of knowing whether its sequence number had been confirmed. The third handshake also prevents errors caused by the server receiving an expired connection request segment.
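The sequence and acknowledgement numbers exchanged above can be traced with a short sketch. It is not a network implementation: the function only builds the three segments as dictionaries and tracks each side's state, and the initial sequence numbers passed in are arbitrary example values (real stacks choose them unpredictably).

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments and the final state of each side."""
    client, server = "CLOSED", "LISTEN"
    segments = []

    # 1) client -> server: SYN=1, seq=x
    segments.append({"flags": "SYN", "seq": client_isn})
    client = "SYN_SENT"

    # 2) server -> client: SYN=1, ACK=1, seq=y, ack=x+1
    syn = segments[0]
    segments.append({"flags": "SYN,ACK", "seq": server_isn,
                     "ack": syn["seq"] + 1})
    server = "SYN_RCVD"

    # 3) client -> server: ACK=1, seq=x+1, ack=y+1
    syn_ack = segments[1]
    if syn_ack["ack"] != client_isn + 1:
        raise ValueError("server did not acknowledge the client's ISN")
    segments.append({"flags": "ACK", "seq": client_isn + 1,
                     "ack": syn_ack["seq"] + 1})
    client = "ESTABLISHED"

    # The server only moves on once its own ISN has been acknowledged.
    if segments[2]["ack"] != server_isn + 1:
        raise ValueError("client did not acknowledge the server's ISN")
    server = "ESTABLISHED"
    return segments, client, server

segments, client_state, server_state = three_way_handshake(100, 300)
for segment in segments:
    print(segment)
print(client_state, server_state)
```

The last check is the step that would be missing with only two handshakes: without the third segment, the server has nothing that confirms its initial sequence number.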

(2) Four-way wave

Both sides start in the ESTABLISHED state. Suppose the client initiates the close first. The four waves proceed as follows:

  • First wave: The client sends a FIN packet containing a sequence number. The client is now in the FIN_WAIT1 state.

That is, the client sends a connection release segment (FIN=1, sequence number seq=u), stops sending data, actively closes the TCP connection, enters the FIN_WAIT1 state, and waits for the acknowledgement from the server.

  • Second wave: After receiving the FIN packet, the server sends an ACK packet whose acknowledgement number is the client's sequence number + 1, indicating that it has received the client's packet. The server is now in the CLOSE_WAIT state.

That is, after receiving the connection release segment, the server sends an acknowledgement segment (ACK=1, acknowledgement number ack=u+1, sequence number seq=v) and enters the CLOSE_WAIT state. TCP is now half-closed: the client-to-server direction of the connection has been released. After receiving the acknowledgement from the server, the client enters the FIN_WAIT2 state and waits for the connection release segment from the server.

  • Third wave: When the server also wants to close the connection, it sends a FIN packet with a sequence number, just as the client did in the first wave. The server is now in the LAST_ACK state.

That is, once the server has no more data to send to the client, it sends a connection release segment (FIN=1, ACK=1, sequence number seq=w, acknowledgement number ack=u+1), enters the LAST_ACK state (final acknowledgement), and waits for the acknowledgement from the client.

  • Fourth wave: After receiving the FIN packet, the client sends an ACK packet whose acknowledgement number is the server's sequence number + 1. The client is now in the TIME_WAIT state. After receiving the ACK packet, the server closes the connection and enters the CLOSED state.

That is, after receiving the connection release segment from the server, the client sends an acknowledgement segment (ACK=1, seq=u+1, ack=w+1) and enters the TIME_WAIT state. At this point the TCP connection is not yet released; the client enters the CLOSED state only after the 2MSL timer expires.

So why do we need four waves?

When establishing a connection, the server can send a SYN+ACK packet as soon as it receives the client's SYN request: the ACK is the response and the SYN is for synchronization. When closing a connection, however, the server may not be able to close the socket immediately after receiving a FIN packet. It therefore first replies with an ACK packet to tell the client that the FIN was received. The server can send its own FIN only after all of its remaining data has been sent, so the ACK and the FIN cannot be sent together. That is why four waves are needed.

In short, the four steps are:

  • First wave: When the client considers its data fully sent, it sends a connection release request to the server.
  • Second wave: After receiving the connection release request, the server tells the application layer to release the TCP connection. It then sends an ACK packet and enters the CLOSE_WAIT state, indicating that the client-to-server direction of the connection has been released and that it no longer accepts data from the client. But because TCP connections are bidirectional, the server can still send data to the client.
  • Third wave: If the server still has data to send, it continues sending. When it has finished, the server sends a connection release request to the client and enters the LAST-ACK state.
  • Fourth wave: After receiving the release request, the client sends an acknowledgement to the server and enters the TIME-WAIT state. This state lasts for 2MSL. If the server does not resend its release request within that period, the client enters the CLOSED state. When the server receives the acknowledgement, it also enters the CLOSED state.

TCP uses four waves because TCP connections are full-duplex, so each party has to release its own direction of the connection. When only one party releases the connection, it only means that this party can no longer send data to the other; the connection is half-closed.

In the last wave, the client waits for a period of time before closing. This guards against the acknowledgement segment sent to the server being lost or corrupted, which would leave the server unable to close properly.
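The state changes on both sides can be written down as a small transition table. The sketch below covers only the active-close path described above (the client closes first); the event labels such as `"send FIN"` and `"2MSL timeout"` are informal names chosen for the example.

```python
# (current state, event) -> next state, for the close sequence above.
TRANSITIONS = {
    # Active closer (the client in this walkthrough)
    ("ESTABLISHED", "send FIN"): "FIN_WAIT1",
    ("FIN_WAIT1", "recv ACK"): "FIN_WAIT2",
    ("FIN_WAIT2", "recv FIN"): "TIME_WAIT",    # replies with the final ACK
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    # Passive closer (the server in this walkthrough)
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",  # replies with an ACK
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}

def run(state, events):
    """Apply the events in order and return every state visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path

client_path = run("ESTABLISHED",
                  ["send FIN", "recv ACK", "recv FIN", "2MSL timeout"])
server_path = run("ESTABLISHED", ["recv FIN", "send FIN", "recv ACK"])
print(" -> ".join(client_path))
print(" -> ".join(server_path))
```

The client needs one more transition than the server, the 2MSL wait, so it is the last to reach CLOSED.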

10. What is TCP packet sticking? How to deal with it?

By default, TCP connections enable a delayed transfer algorithm (Nagle's algorithm), which buffers data before sending it. If multiple pieces of data are sent within a short period, they are buffered and sent together (for the buffer size, see socket.bufferSize), which reduces I/O overhead and improves performance.

When transferring a file, there is no need to deal with sticky packets at all: the packets are simply appended one after another as they arrive. But when sending multiple messages, or data for some other purpose, sticky packets need to be handled.

Suppose send is called twice in a row, for example send(data1) followed by send(data2). The receiver may then see one of the following:

  • A. data1 is received first, then data2.
  • B. Part of data1 is received first, then the rest of data1 together with all of data2.
  • C. All of data1 and part of data2 are received first, then the rest of data2.
  • D. All of data1 and data2 are received at once.

Cases B, C, and D are the common sticky packet cases. Common solutions to the sticky packet problem are as follows:

  • Wait before sending the next message: Wait for a period of time after each message before sending the next one. This suits scenarios with a low interaction frequency. The disadvantage is obvious: transmission efficiency is too low for busier scenarios. On the other hand, it requires almost no extra work.
  • Disable Nagle's algorithm: In Node.js you can disable it with the socket.setNoDelay() method, so that each send goes out immediately without buffering. This suits scenarios where each send carries a fairly large amount of data (but not as large as a file) and the frequency is not especially high. If each send is small and the frequency is very high, disabling Nagle's algorithm is self-defeating. This method also does not help when the network is poor. Nagle's algorithm merges packets on the sending side, but if the receiver's network is bad for a short time, or its application layer cannot recv the TCP data in time for some reason, multiple packets still pile up in the receiver's buffer and stick together. (If the communication takes place inside a stable machine room, the probability is small and can be ignored.)
  • Pack/unpack: This is the common solution in the industry. Before sending each packet, add some marker data before or after it, then split the received data into packets according to those markers.
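
One common way to implement pack/unpack is a length prefix. The sketch below assumes a 4-byte big-endian length header in front of each UTF-8 message. The names `encodeFrame`, `FrameDecoder`, and `HEADER_BYTES` are hypothetical helpers for this example, not part of any Node.js API.

```javascript
// Assumed frame layout: [4-byte big-endian payload length][payload bytes]
const HEADER_BYTES = 4;

// Pack: prepend the payload length to the message.
function encodeFrame(message) {
  const payload = Buffer.from(message, 'utf8');
  const header = Buffer.alloc(HEADER_BYTES);
  header.writeUInt32BE(payload.length, 0);
  return Buffer.concat([header, payload]);
}

// Unpack: accumulate received chunks and cut out complete messages.
class FrameDecoder {
  constructor() {
    this.buffer = Buffer.alloc(0); // bytes received but not yet consumed
  }

  // Feed one received chunk; returns every message completed by this chunk.
  push(chunk) {
    this.buffer = Buffer.concat([this.buffer, chunk]);
    const messages = [];
    while (this.buffer.length >= HEADER_BYTES) {
      const length = this.buffer.readUInt32BE(0);
      if (this.buffer.length < HEADER_BYTES + length) break; // wait for more bytes
      messages.push(this.buffer.toString('utf8', HEADER_BYTES, HEADER_BYTES + length));
      this.buffer = this.buffer.subarray(HEADER_BYTES + length);
    }
    return messages;
  }
}

// Case D from the text: both messages arrive in a single chunk.
const wire = Buffer.concat([encodeFrame('data1'), encodeFrame('data2')]);
console.log(new FrameDecoder().push(wire));
```

With a real socket, the decoder would typically be fed from the `data` event, for example `socket.on('data', (chunk) => decoder.push(chunk).forEach(handleMessage))`, where `handleMessage` is whatever the application does with a complete message.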

11. Why doesn't UDP have the sticky packet problem?

  • TCP is a stream-oriented protocol, while UDP is a message-oriented protocol. Each UDP datagram is one message, and the application must read data in units of whole messages rather than an arbitrary number of bytes at a time.
  • UDP preserves message boundaries. Every UDP packet carries a header (source port, destination port, and so on), which makes it easy for the receiver to tell packets apart. The protocol transmits each piece of data over the network as an independent message, so each receive call returns exactly one packet sent by the sender. If the receiver reads with a buffer smaller than the packet that was sent, the excess data is lost, and it cannot be recovered with a second receive call.

Seven, WebSocket

1. Understanding of WebSocket

WebSocket is a full-duplex communication technology between browser and server introduced with HTML5. It is an application layer protocol built on TCP, and it reuses HTTP for its handshake. A single handshake is enough to create a persistent connection between the browser and the server, over which data can flow in both directions.

WebSocket solves the half-duplex limitation of HTTP. Its most important characteristic is that the server can actively push messages to the client, and the client can likewise actively send messages to the server.

How WebSocket is typically used for messaging: a client notifies the WebSocket server of an event that carries the IDs of all recipients. On receiving the event, the server immediately notifies all active clients, and only the clients whose IDs appear in the recipient ID list process the event.

WebSocket features are as follows:

  • It supports two-way communication with better real-time behavior.
  • It can send either text or binary data.
  • It is built on the TCP protocol, so the server side is relatively easy to implement.
  • The data format is lightweight, with low overhead and efficient communication.
  • It is not subject to the same-origin restriction, so the client can communicate with any server.
  • The protocol identifier is ws (or wss if encrypted), and the server address is a URL.
  • It is compatible with the HTTP protocol. The default ports are also 80 and 443, and the handshake phase uses HTTP, so the handshake is not easily blocked and can pass through various HTTP proxy servers.
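
To make the HTTP-based handshake concrete: the client sends a `Sec-WebSocket-Key` header with its upgrade request, and the server proves it understood the request by returning a `Sec-WebSocket-Accept` value derived from that key. The sketch below shows that derivation as specified in RFC 6455. The function names are chosen for this example only.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + GUID))
function computeAcceptKey(secWebSocketKey) {
  return crypto.createHash('sha1').update(secWebSocketKey + WS_GUID).digest('base64');
}

// Build the server's handshake response as raw HTTP text.
function buildHandshakeResponse(secWebSocketKey) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    '',
    '',
  ].join('\r\n');
}

// Sample key taken from RFC 6455, section 1.3.
console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ==')); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After the 101 response, the connection stops speaking HTTP and both sides exchange WebSocket frames over the same TCP connection.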

WebSocket can be used as follows:

On the client side:

// Use WebSocket directly in index.html; the server is assumed to listen on port 9999
let ws = new WebSocket('ws://localhost:9999');
// Triggered once the connection between the client and the server is established
ws.onopen = function() {
    console.log("Connection open.");
    ws.send('hello');
};
// Triggered when the server sends a message to the client
ws.onmessage = function(res) {
    console.log(res);       // Prints a MessageEvent object
    console.log(res.data);  // Prints the received message
};
// Triggered once the connection between the client and the server is closed
ws.onclose = function(evt) {
    console.log("Connection closed.");
};

2. Instant messaging implementations: What's the difference between short polling, long polling, SSE and WebSocket?

The purpose of both short and long polling is to implement instant communication between the client and the server.

The basic idea of short polling is that the browser sends an HTTP request to the server at fixed intervals. On receiving the request, the server responds immediately, whether or not the data has changed. Instant communication achieved this way is still, in essence, the browser sending requests and the server answering them. By making the client request continuously, the client can simulate receiving server-side data changes in real time. The advantage of this approach is that it is simple and easy to understand. The disadvantage is that constantly establishing HTTP connections wastes resources on both the server and the client. As the number of users grows, the pressure on the server grows with it, which makes the approach hard to justify.

The basic idea of long polling is that the client first sends a request to the server. When the server receives it, the server does not respond right away. Instead, it suspends the request and checks whether the data on the server has been updated. If there is an update, it responds. If not, it holds the request until a time limit is reached and only then returns. After the client-side JavaScript response handler has processed the server's response, it sends another request to re-establish the connection. The advantage of long polling over short polling is that it significantly reduces the number of unnecessary HTTP requests and therefore saves resources. The disadvantage is that keeping connections suspended also wastes resources.
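
The server-side "suspend, then respond on update or timeout" behavior can be sketched with a small in-memory hub. This is an illustrative sketch only: `LongPollHub`, `wait`, and `publish` are hypothetical names, and the `respond` callback stands in for writing the actual HTTP response.

```javascript
// Hypothetical hub that holds long-poll requests until data arrives or time runs out.
class LongPollHub {
  constructor(timeoutMs = 30000) {
    this.timeoutMs = timeoutMs;
    this.pending = new Set(); // suspended requests
  }

  // Called when a poll request arrives: suspend it instead of answering immediately.
  wait(respond) {
    const entry = { respond, timer: null };
    entry.timer = setTimeout(() => {
      // Time limit reached with no update: answer with an empty result.
      this.pending.delete(entry);
      respond({ timedOut: true, data: null });
    }, this.timeoutMs);
    this.pending.add(entry);
  }

  // Called when the data changes: answer every suspended request at once.
  publish(data) {
    const waiting = [...this.pending];
    this.pending.clear();
    for (const entry of waiting) {
      clearTimeout(entry.timer);
      entry.respond({ timedOut: false, data });
    }
  }
}

const hub = new LongPollHub(1000);
hub.wait((result) => console.log('poll answered:', result));
hub.publish('new message'); // the suspended request is answered now
```

In either outcome the client is expected to issue a new request immediately after handling the response, which is what keeps the channel open.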

The basic idea of SSE (Server-Sent Events) is that the server pushes information to the client as a stream. Strictly speaking, the HTTP protocol does not let the server push information on its own initiative. There is a workaround, however: the server declares to the client that what follows is a stream. That is, instead of sending a single one-off packet, it sends a continuous stream of data. The client then keeps the connection open and waits for further data from the server, much as it does during video playback. SSE uses this mechanism to push information to the browser. It is based on the HTTP protocol and is supported by all major browsers except IE and the legacy (pre-Chromium) Edge. Compared with the previous two methods, it does not need to establish many HTTP requests, so it uses fewer resources.
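
On the wire, the stream is plain text in the `text/event-stream` format: each event is a group of `field: value` lines ending with a blank line. The helper below is a minimal sketch of how a server might serialize one event. `formatSseEvent` is a name chosen for this example.

```javascript
// Serialize one event in the text/event-stream format.
// Multi-line data is split into one "data:" line per line of text.
function formatSseEvent({ data, event, id }) {
  let out = '';
  if (id !== undefined) out += `id: ${id}\n`;
  if (event !== undefined) out += `event: ${event}\n`;
  for (const line of String(data).split('\n')) {
    out += `data: ${line}\n`;
  }
  return out + '\n'; // the blank line terminates the event
}

console.log(JSON.stringify(formatSseEvent({ data: 'hello' })));
console.log(JSON.stringify(formatSseEvent({ id: 1, event: 'tick', data: 'a\nb' })));
```

A server would send the response header `Content-Type: text/event-stream` and then write one such string per event. In the browser, the stream is consumed with `new EventSource(url)` and its `onmessage` handler.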

WebSocket is a new protocol defined alongside HTML5. Unlike traditional HTTP, it allows the server to actively push information to the client. The disadvantage of WebSocket is that the server-side configuration is more complex. WebSocket is a full-duplex protocol: the two parties are equal and can send messages to each other. SSE, by contrast, is one-way: only the server can push information to the client, and anything the client needs to send goes out as a separate HTTP request.

Of the four communication methods above, the first three are based on the HTTP protocol.

In terms of performance, the four methods rank as WebSocket > long connection (SSE) > long polling > short polling. In terms of browser compatibility, the order is reversed: short polling > long polling > long connection (SSE) > WebSocket.