One, Networks based on TCP/IP

The networks we use every day are built on the TCP/IP protocol suite, and HTTP is one member of that suite.

  • Protocol: for two parties to communicate, both must follow the same rules, agreed in advance: how to locate the communication target, which side initiates, which language is used, how the communication ends, and so on. Protocols are what allow different hardware, operating systems, etc. to talk to each other.
  • TCP/IP: the collective name for the family of protocols used when communicating over IP
  • TCP/IP layers:
    • Application layer: FTP, DNS, HTTP, WebSocket
    • Transport layer: carries data between two hosts, for example TCP (which provides a reliable byte stream) and UDP (which does not)
    • Network layer: selects a transmission route among the many possible paths, for example IP
    • Data link layer: handles the hardware side of the network, including operating-system control, hardware device drivers, network interface cards, and optical fiber

1. IP protocol

Responsible for delivering packets to the other party

  • IP address: the address assigned to a node
  • MAC address: the fixed address of the network interface card (NIC)
    • ARP (Address Resolution Protocol) resolves addresses: given the other party's IP address, it looks up the corresponding MAC address

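2. TCP

TCP uses a three-way handshake to make sure data reaches the target reliably: the sender sends a SYN packet, the receiver answers with SYN/ACK, and the sender completes the handshake with an ACK.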

  • SYN(synchronize)
  • ACK(acknowledgement)

3. UDP

  • UDP is connectionless
  • TCP ensures data correctness, while UDP may lose packets
  • TCP guarantees data order, UDP does not
  • TCP connections can only be point-to-point, while UDP supports one-to-one, one-to-many, many-to-one, and many-to-many interactions
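
The contrast can be tried out on the loopback interface. This is a minimal sketch, assuming Python's standard socket module and that 127.0.0.1 is usable; the payloads and variable names are made up for illustration.

```python
import socket

# TCP: a connection must be established (three-way handshake) before data flows.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # port 0 lets the OS pick a free port
server.listen(1)
tcp_port = server.getsockname()[1]

client = socket.create_connection(("127.0.0.1", tcp_port), timeout=2)
conn, _ = server.accept()                # the handshake has completed by now
client.sendall(b"hello over tcp")
tcp_data = conn.recv(1024)
for s in (client, conn, server):
    s.close()

# UDP: no connection, every sendto() is an independent datagram.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2)                   # a lost datagram would otherwise block forever
udp_port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over udp", ("127.0.0.1", udp_port))
udp_data, _ = receiver.recvfrom(1024)
sender.close()
receiver.close()
```

Note that the UDP sender never learns whether the datagram arrived; on a real network the receive call could simply time out.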

4. DNS

Resolves domain names to IP addresses, and can also look up the domain name that belongs to an IP address (reverse lookup)

5. URI and URL

URI (Uniform Resource Identifier): a string that identifies an Internet resource

URL (Uniform Resource Locator): a string that gives the location of a resource

A URL is a subset of a URI

http://user:[email protected]:80/dir/index.html?uid=1#ch1

Protocol scheme name + Login information (authentication) + server address + server port number + hierarchical file path + query string + fragment identifier
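
The parts listed above can be pulled out of a URL with Python's standard urllib.parse module. This is a small sketch; the URL is an illustrative stand-in with made-up credentials and host name.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, used only to show where each component sits.
url = "http://user:pass@www.example.com:80/dir/index.html?uid=1#ch1"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,          # protocol scheme name
    "username": parts.username,      # login information
    "password": parts.password,
    "host": parts.hostname,          # server address
    "port": parts.port,              # server port number
    "path": parts.path,              # hierarchical file path
    "query": parse_qs(parts.query),  # query string, decoded into a dict
    "fragment": parts.fragment,      # fragment identifier
}
```

The fragment is handled by the client only; it is not sent to the server as part of the request.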

Two, HTTP message format

1. HTTP request message format

<request line>          // Request line: the request method, the resource to access (request URI), and the HTTP version
<headers>               // Request header fields: additional information for the server
<blank line>            // An empty line (CRLF) that separates the headers from the body
[<request-body>]        // Optional request body
POST /index.html HTTP/1.1                          # request line
Host: hackr.jp                                     # request header fields
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 16
If-Modified-Since: Thu, 12 Jul 2012 07:30:00 GMT

name=ueno&age=37                                   # entity body

The If-Modified-Since field asks the server to return index.html only if it has been updated since 07:30 GMT on 12 July 2012. If the content has not been updated, the server returns 304 Not Modified.
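
The request layout above can be reproduced in a few lines of Python. This is only a sketch of the wire format: build_request and parse_request are hypothetical helper names, and the parser ignores edge cases such as repeated header fields.

```python
def build_request(method, target, headers, body=b""):
    """Serialize a request: request line, header lines, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"   # the blank line ends the header section
    return head.encode("ascii") + body


def parse_request(raw):
    """Split a raw request back into its four parts."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return method, target, version, headers, body


body = b"name=ueno&age=37"
raw = build_request(
    "POST",
    "/index.html",
    {
        "Host": "hackr.jp",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),
    },
    body,
)
method, target, version, headers, parsed_body = parse_request(raw)
```

The Content-Length value is computed from the body rather than typed by hand, which is how the 16 in the sample request comes about.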

2. HTTP response message format

<status line>          // Status line: the HTTP version, a status code, and a reason phrase describing the outcome of the request
<headers>              // Response header fields
<blank line>           // An empty line (CRLF) that separates the headers from the body
[<response-body>]      // Optional response body
HTTP/1.1 200 OK                      # protocol version + status code + reason phrase explaining the status code
Date: Tue, 10 Jul 2012 06:50:15 GMT # Optional response header field
Content-Length: 362
Content-Type: text/html

<html>                               # entity body of the resource


Three, HTTP methods

1. GET

Retrieves a resource. Data submitted with GET is carried in the URL and therefore shows up in the address bar. Particular browsers and servers limit URL length: Internet Explorer, for example, caps URLs at 2,083 characters (2K + 35), while other browsers such as Netscape and Firefox have no fixed limit in theory and depend on what the operating system supports. The amount of data a GET can carry is therefore bounded by the URL length limit.

2. POST

Transfers an entity body. Because the data is not sent in the URL, its size is unlimited in theory. In practice every web server limits the size of the POST data it will accept; Apache and IIS 6 each have their own configuration for this.

3. PUT

Transfers a file. The PUT method has no authentication mechanism of its own, which is a security problem. Websites that combine it with the web application's own authentication, or whose architecture follows REST (REpresentational State Transfer), may open up the PUT method.

4. HEAD

Identical to GET except that the response carries no message body. Used to check whether a URI is valid and when the resource was last updated.
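
The GET/HEAD difference can be observed with a throwaway local server. This is a minimal sketch using Python's standard http.server and http.client on the loopback interface; the Handler class, the PAGE content, and the fetch helper are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html>hello</html>"


class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)       # GET: headers followed by the body

    def do_HEAD(self):
        self._send_headers()         # HEAD: the same headers, no body

    def log_message(self, *args):    # keep the example quiet
        pass


server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: let the OS choose
threading.Thread(target=server.serve_forever, daemon=True).start()


def fetch(method):
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=2)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result


get_result = fetch("GET")
head_result = fetch("HEAD")
server.shutdown()
server.server_close()
```

Both responses report the same Content-Length, but only the GET response actually carries those bytes.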

5. DELETE

Deletes a file. Like PUT, it has no authentication mechanism of its own.

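6. OPTIONS

Queries which methods the resource identified by the request URI supports. The server lists them in the Allow header of its response:

Allow: GET, POST, HEAD, OPTIONS

If the request is addressed to the server itself rather than to a specific resource, an asterisk (*) can be used in place of the request URI:

OPTIONS * HTTP/1.1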

7. TRACE

Makes the server loop the request it received back to the client, so that the client can see how the request was processed or altered along the way. It is easily abused for XST (Cross-Site Tracing) attacks.

8. CONNECT

Asks a proxy to establish a tunnel. It is mainly used with SSL/TLS so that the encrypted communication can be carried through the tunnel across the network.

  • SSL: Secure Sockets Layer
  • TLS: Transport Layer Security
CONNECT <proxy server name>:<port number> <HTTP version>

Four, HTTP status codes

1. Status code classes

  • 1XX: Informational. The request has been received and is being processed
  • 2XX: Success. The request was processed normally
  • 3XX: Redirection. Additional action is needed to complete the request
  • 4XX: Client error. The server cannot process the request
  • 5XX: Server error. The server failed while processing the request
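
Because the class is just the first digit, it can be derived with integer division. This is a small sketch; STATUS_CLASSES and describe are hypothetical names, while HTTPStatus comes from Python's standard http module.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return (class, reason phrase) for a numeric status code."""
    status_class = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase   # standard reason phrase, if registered
    except ValueError:
        phrase = ""                        # unregistered code: class only
    return status_class, phrase
```

For example, describe(404) gives the class "client error" together with the phrase "Not Found", and an unregistered code such as 599 still falls into the "server error" class.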

2. Common status codes

When a 301, 302, or 303 response is returned, almost all browsers change POST to GET, drop the request body, and then automatically resend the request.

  • 200 OK: the request was processed normally
  • 204 No Content: the request was processed successfully, but there is no entity body to return. Used when the client only needs to send information to the server and nothing new needs to be sent back
  • 206 Partial Content: a range request was processed successfully
  • 301 Moved Permanently: permanent redirect; the requested resource has been assigned a new URI. This status code is also returned when the trailing ‘/’ is left off a directory path
  • 302 Found: temporary redirect; the requested resource temporarily lives at another URI. A 303 See Other response is similar, but states explicitly that the client should use GET to obtain the resource
  • 304 Not Modified: the client sent a conditional request (one whose headers include If-Match, If-Modified-Since, If-None-Match, If-Range, or If-Unmodified-Since); the server allows access to the resource, but the condition was not met. The response contains no body. Despite being a 3XX code, it has nothing to do with redirection
  • 400 Bad Request: the request message contains a syntax error; the request must be corrected and sent again
  • 401 Unauthorized: the request requires HTTP authentication (Basic or Digest). The response must include a WWW-Authenticate header that challenges the client for credentials applicable to the requested resource
  • 403 Forbidden: the server refuses access to the resource
  • 404 Not Found: the requested resource was not found on the server. It can also be used when the server rejects a request and does not want to give a reason
  • 500 Internal Server Error: an error occurred on the server while executing the request
  • 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot process requests. It is best to include a Retry-After header field in the response

Five, HTTP headers

If a header field appears more than once in a message, the result depends on the browser's internal processing logic: some browsers give priority to the first occurrence, others to the last.

HTTP has to make sure that what it carries is transported, identified, extracted, and processed correctly. Specifically, the content must:

  • Be correctly identified (the Content-Type header gives the media format and the Content-Language header gives the language) so that browsers and other clients can handle it properly
  • Be unpacked correctly (via the Content-Length and Content-Encoding headers)
  • Be up to date (via entity validators and cache expiration controls)
  • Meet the needs of the user (via content negotiation based on the Accept family of headers)
  • Move quickly and efficiently across the network (via range requests, delta encoding, and other data compression methods)
  • Arrive complete and untampered (via transfer encoding headers and the Content-MD5 checksum header)

If a message is the crate, the entity is the cargo inside it

1. General header fields

  • Cache-Control: controls caching behavior
  • Connection: manages hop-by-hop headers and the connection itself
  • Date: the date and time the message was created
  • Pragma: message directives
  • Trailer: lists the header fields that appear at the end of the message
  • Transfer-Encoding: the transfer coding applied to the message body
  • Upgrade: asks to switch to another protocol
  • Via: information about intermediate proxy servers
  • Warning: error notifications

2. Request header fields

  • Accept: media types the user agent can process
  • Accept-Charset: preferred character sets
  • Accept-Encoding: preferred content encodings
  • Accept-Language: preferred (natural) languages
  • Authorization: web authentication credentials
  • Expect: expects a specific behavior from the server
  • From: email address of the user
  • Host: the server that holds the requested resource
  • If-Match: compares entity tags (ETag)
  • If-None-Match: compares entity tags (the opposite of If-Match)
  • If-Modified-Since (IMS request): compares the update time of the resource
  • If-Unmodified-Since: compares the update time of the resource (the opposite of If-Modified-Since)
  • If-Range: sends a byte-range request for the entity if the resource has not been updated
  • Max-Forwards: maximum number of hops
  • Proxy-Authorization: credentials the client presents to a proxy server
  • Range: byte-range request for the entity
  • Referer: the URI of the resource from which the request URI was obtained
  • TE: preferred transfer codings
  • User-Agent: information about the HTTP client program

3. Response header fields

  • Accept-Ranges: whether byte-range requests are accepted
  • Age: estimated time elapsed since the resource was created
  • ETag: matching information (entity tag) for the resource
  • Location: redirects the client to the specified URI
  • Proxy-Authenticate: authentication challenge from a proxy server to the client
  • Retry-After: when the client should retry the request
  • Server: information about the HTTP server software
  • Vary: information proxy servers use to manage cached responses
  • WWW-Authenticate: authentication challenge from the server to the client
  • Last-Modified: the date the resource was last modified

4. Entity header fields

  • Allow: HTTP methods the resource supports
  • Content-Encoding: the encoding applied to the entity body
  • Content-Language: the natural language of the entity body
  • Content-Length: the size of the entity body
    • This matters especially to caching proxy servers: a cache that receives a truncated message without recognizing it may store the incomplete content and serve it many times. To reduce that risk, caching proxies typically do not cache HTTP bodies that lack an explicit Content-Length header.
    • Persistent connections: because the connection stays open, the client cannot rely on the connection closing to detect the end of a message. Content-Length tells it where one message ends and the next begins.
    • If the body is content-encoded, Content-Length is the byte length of the encoded body.
  • Content-Location: an alternate URI for the returned resource
  • Content-MD5: message digest of the entity body
    • The sender computes a checksum when the body is first generated, so that the receiver can verify it and catch any unintended modification of the entity
    • It can also be used as a hash key to locate documents quickly and avoid storing duplicate content
    • If the body is content-encoded, the digest covers the encoded document
  • Content-Range: the position of this body within the full entity
  • Content-Type: the media type (MIME type) of the entity body
    • Written as primary type/subtype
    • Describes the entity body before encoding
    • Optional parameters can further specify the type of content
  • Expires: the date and time at which the entity body expires
  • Last-Modified: the date and time the resource was last modified
  • ETag: a unique validator for a specific instance of this document
  • Cache-Control: how this document may be cached
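
The Content-Length point about persistent connections can be made concrete: when several messages share one byte stream, the header is what tells the reader where each body ends. This is a minimal sketch, assuming messages framed only by Content-Length (no chunked transfer coding); read_message is a hypothetical helper and io.BytesIO stands in for a socket.

```python
import io


def read_message(stream):
    """Read one message (start line, headers, body) or return None at end of stream."""
    start_line = stream.readline()
    if not start_line:
        return None
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):        # blank line ends the header section
            break
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    body = stream.read(length)                   # read exactly the announced byte count
    if len(body) < length:
        raise ValueError("truncated message")    # fewer bytes arrived than promised
    return start_line.strip().decode("ascii"), headers, body


# Two responses back to back on one "connection".
wire = io.BytesIO(
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye"
)
first = read_message(wire)
second = read_message(wire)
third = read_message(wire)   # nothing left on the stream
```

The same length check is what lets a cache notice a truncated body instead of storing it.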

5. RFC 4229 HTTP Header Field Registrations

  • Cookie: request header field carrying the cookie information sent to the server
  • Set-Cookie: response header field carrying the cookie information used to start state management
    • NAME=VALUE: the name and value assigned to the cookie
    • expires=DATE: the date until which the cookie is valid
    • path=PATH: the directory on the server to which the cookie applies (if not specified, defaults to the directory of the document that set the cookie)
    • domain=DOMAIN: the domain to which the cookie applies (if not specified, defaults to the domain of the server that created the cookie)
    • Secure: the cookie is sent only over secure HTTPS connections
    • HttpOnly: prevents JavaScript from accessing the cookie
      • JavaScript's document.cookie cannot read a cookie that carries this attribute
  • Content-Disposition
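
The Set-Cookie attributes above map directly onto Python's standard http.cookies module. This is a short sketch; the cookie name sid, its value, and the domain are invented for illustration.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
jar = SimpleCookie()
jar["sid"] = "abc123"
jar["sid"]["path"] = "/"
jar["sid"]["domain"] = "example.com"
jar["sid"]["secure"] = True      # only send over HTTPS
jar["sid"]["httponly"] = True    # hide from document.cookie
set_cookie_value = jar["sid"].OutputString()

# Server side, next request: parse the Cookie request header sent by the client.
incoming = SimpleCookie()
incoming.load("sid=abc123; theme=dark")
session_id = incoming["sid"].value
theme = incoming["theme"].value
```

The client sends back only the NAME=VALUE pairs; attributes such as Path and HttpOnly stay in the browser and govern when the cookie is sent and who may read it.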

6. Hop-by-hop Header

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • Trailer
  • TE
  • Transfer-Encoding
  • Upgrade

7. End-to-end headers

Every header field other than the eight hop-by-hop headers above is an end-to-end header
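
A proxy applies this distinction when it forwards a message: hop-by-hop fields are consumed at each hop and everything else is passed on. This is a sketch of that filtering step; end_to_end_headers is a hypothetical helper and the sample header values are made up. It also drops any field named in the Connection header, since Connection can declare further fields as hop-by-hop.

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "trailer", "te", "transfer-encoding", "upgrade",
}


def end_to_end_headers(headers):
    """Return only the header fields a proxy should forward to the next hop."""
    drop = set(HOP_BY_HOP)
    for name, value in headers.items():
        if name.lower() == "connection":
            # Fields listed in Connection are hop-by-hop for this hop as well.
            drop.update(token.strip().lower() for token in value.split(",") if token.strip())
    return {name: value for name, value in headers.items() if name.lower() not in drop}


incoming = {
    "Host": "example.com",
    "Accept": "*/*",
    "Connection": "keep-alive, X-Debug",
    "Keep-Alive": "timeout=5",
    "X-Debug": "1",
}
forwarded = end_to_end_headers(incoming)
```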

8. Other header fields: HTTP header fields can be extended freely

  • X-Frame-Options: defends against clickjacking attacks
    • DENY: framing is refused
    • SAMEORIGIN: framing is permitted only for pages from the same origin
  • X-XSS-Protection: a countermeasure against cross-site scripting; switches the browser's XSS protection mechanism on or off
    • 0: disables XSS filtering
    • 1: enables XSS filtering
  • DNT: Do Not Track; the user declines to have personal information collected

Six, Disadvantages of HTTP

  • Communication is in plaintext (unencrypted), so the content can be eavesdropped on
  • The identity of the other party is not verified, so impersonation is possible
  • The integrity of a message cannot be proven, so it may have been tampered with

Seven, HTTPS

HTTPS = HTTP + data encryption + website authentication + integrity protection

  • Verifies the authenticity of the website through certificates and similar information
  • Establishes an encrypted channel for messages
  • Protects the integrity of the data

Using a digital certificate issued by a certificate authority (CA), HTTPS encrypts the information sent over the Internet with a combination of symmetric and asymmetric encryption. Port number: 443

  • Symmetric encryption: the same key is used to encrypt and decrypt
  • Asymmetric encryption:
    • Public key (encrypts): available to everyone who visits the site
    • Private key (decrypts): held only by the website's server
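
How the two kinds of encryption work together can be shown with toy numbers. This is a deliberately insecure sketch: the RSA key uses tiny textbook primes, the symmetric cipher is a plain XOR, and every name and value is made up for illustration. It follows the classic picture in which the client wraps a session key with the server's public key; real TLS uses vetted libraries and may negotiate keys differently.

```python
# Toy RSA key pair (real keys are 2048 bits or more).
P, Q = 61, 53
N = P * Q        # 3233, the public modulus
E = 17           # public exponent, published to everyone
D = 2753         # private exponent, kept by the server: (E * D) % ((P - 1) * (Q - 1)) == 1


def rsa_encrypt(m):
    """Anyone holding the public key (E, N) can encrypt."""
    return pow(m, E, N)


def rsa_decrypt(c):
    """Only the holder of the private key D can decrypt."""
    return pow(c, D, N)


def xor_cipher(data, key):
    """Toy symmetric cipher: the same key both encrypts and decrypts."""
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(data))


session_key = 1234                       # chosen by the client, must be smaller than N
wrapped = rsa_encrypt(session_key)       # asymmetric step: protects the key in transit
unwrapped = rsa_decrypt(wrapped)         # server recovers the session key

ciphertext = xor_cipher(b"GET /secret", session_key)   # symmetric step: protects the data
plaintext = xor_cipher(ciphertext, unwrapped)
```

The asymmetric step is used once to share a key; the cheaper symmetric step then protects the actual traffic.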

Eight, Advantages of HTTP/2

  • Binary format: data is transferred in a binary format rather than the text format of HTTP/1.1, which makes the protocol easier to parse and leaves more room for optimization and extension.

  • Header compression: message headers are compressed with HPACK, saving the network traffic they would otherwise occupy. HTTP/1.1 carries a large amount of redundant header information in every message and wastes bandwidth; header compression solves this well.

  • Multiplexing: multiple requests are carried concurrently over a single TCP connection. HTTP/1.1 pipelining can also send requests concurrently, but the responses block one another, so pipelining was never widely used. HTTP/2 achieves real concurrency, and its streams also support priorities and flow control.

  • Server push: the server can proactively push resources such as JS and CSS files to the client, without waiting for the client to parse the HTML and send those requests. By the time the client needs them, they are already there.
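
The binary format and multiplexing points can be illustrated with the HTTP/2 frame layout: each frame starts with a 9-byte header (24-bit payload length, 8-bit type, 8-bit flags, 1 reserved bit plus a 31-bit stream identifier). This is a sketch of the framing only; pack_frame and unpack_frames are hypothetical helpers, and a real connection would also carry SETTINGS and HEADERS frames, which are left out here.

```python
import struct
from collections import defaultdict

DATA = 0x0          # frame type code for DATA frames
END_STREAM = 0x1    # flag marking the last frame of a stream


def pack_frame(frame_type, flags, stream_id, payload):
    length = len(payload).to_bytes(3, "big")                       # 24-bit length
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload


def unpack_frames(data):
    frames, offset = [], 0
    while offset < len(data):
        length = int.from_bytes(data[offset:offset + 3], "big")
        frame_type, flags, stream_id = struct.unpack(">BBI", data[offset + 3:offset + 9])
        payload = data[offset + 9:offset + 9 + length]
        frames.append((frame_type, flags, stream_id & 0x7FFFFFFF, payload))
        offset += 9 + length
    return frames


# Frames of two streams interleaved on one connection.
wire = (
    pack_frame(DATA, 0, 1, b"he")
    + pack_frame(DATA, 0, 3, b"wor")
    + pack_frame(DATA, END_STREAM, 1, b"llo")
    + pack_frame(DATA, END_STREAM, 3, b"ld")
)

bodies = defaultdict(bytes)
for _type, _flags, stream_id, payload in unpack_frames(wire):
    bodies[stream_id] += payload      # reassemble each stream independently
```

Because every frame names its stream, the receiver can rebuild both bodies even though their pieces arrived interleaved.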

Nine, WebSocket

WebSocket is a protocol introduced with HTML5 to make up for HTTP's lack of persistent connections.

The “long connection” that HTTP can offer is a pseudo long connection: the effect is only simulated with server-side techniques. In the process, connections are repeatedly opened and closed and the header information is sent again and again, which wastes bandwidth and lowers the efficiency of the exchange.

With WebSocket, the server actively pushes data to the client whenever data is available.

  • WS uses an HTTP request to establish the connection, but defines a set of new header fields that plain HTTP does not use.

  • A WS connection cannot be forwarded by an intermediary; it must be a direct connection.

  • Once the WS connection is established, either party can send data to the other at any time.

  • Once the WS connection is established, data is transferred in frames and request messages are no longer needed.

  • WS data frames are ordered
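
One of the new header fields can be computed by hand. In the opening handshake the client sends Sec-WebSocket-Key and the server must answer with Sec-WebSocket-Accept, the Base64-encoded SHA-1 of the key concatenated with a fixed GUID defined by RFC 6455. This is a minimal sketch; accept_value is a hypothetical helper name, and the sample key is the one used in the RFC's own example.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455; it is the same for every WebSocket handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept header value for a given client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


client_key = "dGhlIHNhbXBsZSBub25jZQ=="   # sample key from RFC 6455
server_accept = accept_value(client_key)
```

A client that receives a different value knows the other end did not understand the WebSocket handshake and should close the connection.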

Socket.IO is an open-source WebSocket library. It implements the server side on Node.js and also provides a client-side JS library. Socket.IO supports event-driven, bidirectional data communication.

Ten, Interview questions

The ETag value is a hash the server computes from the file's inode (index node), size, and last-modified time

If we already have Last-Modified, why do we need ETag?

  • Last-Modified has one-second resolution, so it is wrong when a file is changed twice within 1 s
  • Some servers cannot obtain the exact time a file was last modified
  • Some files are rewritten periodically without their content changing (only the modification time changes), and we do not want the client to think the file has been modified and GET it again
  • In other words, ETag can compensate for the cases that Last-Modified judges wrongly

If we have ETag, why do we still need Last-Modified?

  • Sometimes Last-Modified compensates for the weaknesses of ETag. For static files such as images, generating an ETag by scanning the content for every comparison is clearly much slower than comparing modification times directly.
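
The trade-off can be sketched in code. The example below assumes a content-hash ETag (a swapped-in scheme, not the inode/size/time hash mentioned above) so that an unchanged body keeps its tag even when the modification time moves. content_etag and conditional_get are hypothetical helpers, and the date check is simplified to an equality test.

```python
import hashlib


def content_etag(body):
    """ETag derived from the content itself: costs a full scan of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(request_headers, body, last_modified):
    """Return (status, headers, body) for a conditional GET."""
    etag = content_etag(body)
    headers = {"ETag": etag, "Last-Modified": last_modified}
    if "If-None-Match" in request_headers:
        # If-None-Match takes precedence over If-Modified-Since.
        fresh = request_headers["If-None-Match"] == etag
    else:
        # Simplification: a real server compares dates, not strings.
        fresh = request_headers.get("If-Modified-Since") == last_modified
    if fresh:
        return 304, headers, b""
    return 200, headers, body


body = b"<html>same content</html>"
old_time = "Thu, 12 Jul 2012 07:30:00 GMT"
new_time = "Fri, 13 Jul 2012 07:30:00 GMT"   # file was rewritten, content unchanged

first = conditional_get({}, body, old_time)
by_etag = conditional_get({"If-None-Match": first[1]["ETag"]}, body, new_time)
by_date = conditional_get({"If-Modified-Since": old_time}, body, new_time)
```

With the entity tag the rewritten but unchanged file yields 304, while the date-only check sends the whole body again; the price is hashing the content, which is exactly the cost the last bullet warns about.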