Preface

I took these notes a year ago while reading Illustrated HTTP, but then left them sitting in a folder. A while ago I ran into some status code problems while hacking on a practice project, so I dug the notes out and went over them again. Along the way I revised the original and added some material on HTTPS. This article is mainly a summary of the more important HTTP knowledge. If you want to learn HTTP well, you still need to read Illustrated HTTP or HTTP: The Definitive Guide to build a more complete picture. And yes, you guessed right, I am the fabled clickbait author, but I hope this article is useful to you, even if only a little, haha.

1. HTTP in brief (from MDN)

Hypertext Transfer Protocol (HTTP) is an application-layer protocol for transmitting hypermedia documents such as HTML. It was designed for communication between Web browsers and Web servers, but it can also be used for other purposes. HTTP follows the classic client-server model: a client opens a connection, makes a request, and then waits until it receives the server's response. HTTP is a stateless protocol, meaning that the server does not retain any data (state) between two requests. Although it usually runs over TCP/IP, it can be used over any reliable transport layer.

2. URL and URI

We often come across URLs (Uniform Resource Locators), the string addresses we use to access the Web. URIs (Uniform Resource Identifiers), on the other hand, are less familiar. Here are the specific differences:

  • URI: Uniform Resource Identifier. An abstract identifier for a resource; it can be relative or absolute.
  • URL: Uniform Resource Locator. It identifies a resource and also specifies how to locate it. Because it carries location information, it must be absolute.

(A so-called relative address is only ever relative to some other absolute address.)

2.1 URL

The basic format of the URL is as follows:

scheme://host[:port#]/path/.../[?query-string][#anchor]
Field         Meaning
scheme        Specifies the protocol to use (for example HTTP, HTTPS, FTP).
host          Specifies the IP address or domain name of the HTTP server.
port#         The default port of an HTTP server is 80, in which case the port number can be omitted. If another port is used, it must be specified.
path          Path of the resource to access.
query-string  Data sent to the HTTP server.
anchor        Anchor (fragment) that points to a position inside the resource.
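
To make the fields above concrete, here is a minimal sketch that splits a URL with Python's standard `urllib.parse` module. The URL, the `describe_url` helper, and the `default_ports` table are made up for illustration; the last line also shows that a relative address is always resolved against an absolute one.

```python
from urllib.parse import urlsplit, urljoin

def describe_url(url: str) -> dict:
    """Split a URL into the fields listed in the table above."""
    parts = urlsplit(url)
    default_ports = {"http": 80, "https": 443}
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # urlsplit reports None when the port is omitted, so fall back
        # to the default port of the scheme.
        "port": parts.port or default_ports.get(parts.scheme),
        "path": parts.path,
        "query-string": parts.query,
        "anchor": parts.fragment,
    }

info = describe_url("http://www.example.com:8080/docs/index.html?id=7&lang=en#intro")
print(info)

# A relative reference only makes sense against an absolute base URL.
resolved = urljoin("http://www.example.com/docs/index.html", "images/logo.png")
print(resolved)
```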

3. HTTP messages

3.1 Format of HTTP Messages

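An HTTP message consists of a header and a body, separated by a blank line.

The blank line marks the boundary between the message header and the message body and consists of a carriage return character followed by a line feed character (CRLF).

Both request messages and response messages must have a header, but some messages (for example, most GET requests) have no body. A request message consists of a request line (method, request URI, and HTTP version), followed by the header fields, a blank line, and an optional body.

A response message consists of a status line (HTTP version, status code, and reason phrase), followed by the header fields, a blank line, and an optional body.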
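
As a rough illustration of the request and response formats, the sketch below assembles a request message by hand and splits both kinds of message at the blank line. The helper names `build_request` and `split_message`, the host name, and the sample response are assumptions made up for this example, and real parsers handle many more cases.

```python
CRLF = "\r\n"

def build_request(method, target, headers, body=""):
    """Assemble a request message: request line, headers, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return CRLF.join(lines) + CRLF + CRLF + body

def split_message(raw):
    """Split a message into (start line, header dict, body) at the blank line."""
    head, _, body = raw.partition(CRLF + CRLF)
    start_line, *header_lines = head.split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body

request = build_request("GET", "/index.html", {"Host": "www.example.com"})
response = (
    "HTTP/1.1 200 OK" + CRLF
    + "Content-Type: text/plain" + CRLF
    + "Content-Length: 5" + CRLF
    + CRLF
    + "hello"
)
print(split_message(request))
print(split_message(response))
```

The GET request ends right after the blank line (no body), while the response carries a five-byte body after it.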

The following is the content of an HTTP exchange with Google. Request Headers lists the request header fields, and Response Headers lists the response header fields:

The most common fields are as follows:

  1. URL: the address being accessed
  2. Request Method: the request method of the message
  3. Status Code: the status code and its reason phrase
  4. Accept-Encoding: the content encodings the client accepts
  5. Connection: the connection mode
  6. Cookie: the cookie content sent with the request
  7. Host: the destination host
  8. User-Agent: information about the client browser
  9. Set-Cookie: the content the server wants saved in a cookie

Let’s talk about what these fields do.

3.2 HTTP Request Method: GET and POST

HTTP defines many request methods, but the most common are GET and POST.

  • GET: The GET method requests the resource identified by the URL. The server processes the specified resource and returns the result in the response. Simply put, if the requested resource is text, it is returned as is.

  • POST: The POST method is used to send an entity body to the server.

The differences between the two are as follows:

  1. Different purposes

Both GET and POST can return information, but GET is only a query and should not change anything on the server, so repeating the same GET returns the same content. POST is typically used to send data that modifies something.

  2. Different size limits

GET data is placed in the URL, and browsers limit the length of a URL, so GET is subject to that limit; the exact size depends on the browser. With POST, the data goes into the message body, so as long as the body size is not limited, the amount of data is not limited either.

  3. Different security

GET data is appended directly to the URL, so it is visible in the URL. POST data is stored inside the message body and is not directly visible to the user.

Generally speaking, GET is used to obtain content, and POST is used to submit data. In terms of usage scenarios, the information a user enters when registering is private, so POST should be used to keep it out of the URL, while GET is used when some content needs to be queried and a quick response is wanted.
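
The difference in where the data travels can be sketched with Python's standard `urllib`. Nothing is sent over the network here; the example only builds the two request objects, and the URL and parameters are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"q": "http status code", "page": "2"}

# GET: the parameters are appended to the URL as a query string.
get_request = Request("http://www.example.com/search?" + urlencode(params))

# POST: the same parameters are encoded into the message body instead.
post_request = Request(
    "http://www.example.com/search",
    data=urlencode(params).encode("ascii"),
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Note that keeping data out of the URL only hides it from the address bar; over plain HTTP the body is still transmitted unencrypted.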

3.3 Status Code

A status code is how the server tells the client the result of a request. Status codes are classified as follows:

Status code  Meaning
1xx          Informational: the server has received the request and the requester should continue
2xx          Success: the request was received and processed successfully
3xx          Redirection: further action is required to complete the request
4xx          Client error: the request contains a syntax error or cannot be completed
5xx          Server error: the server encountered an error while processing the request

Common status codes:

  • 200 OK

The request succeeded. For GET, the requested resource is returned in the response. For POST, the response contains the result of processing the request.

  • 204 No Content

The server has processed the request successfully, but the response contains no body. For example, if the client is a browser, the page it displays is not updated.

  • 206 Partial Content

The server successfully processed a GET request for part of the resource (a range request).

  • 301 Moved Permanently

The requested page has been permanently moved to a new location; this is a permanent redirect.

  • 302 Found

A temporary redirect: the resource is temporarily available at a different URI (for example, while the site is temporarily inaccessible).

  • 303 See Other

This status code indicates that another URI exists for the requested resource and that the client should retrieve it with the GET method. It works like 302, except that 303 explicitly states that GET must be used for the redirected request.

  • 304 Not Modified

Returned for a conditional request when the resource has not been modified since the client last accessed it, so the client can keep using the copy it already has.

  • 307 Temporary Redirect

A temporary redirect, similar to 302 and 303, except that the client must not change the request method when following the redirect.

  • 400 Bad Request

Indicates that a syntax error exists on the client, causing the server to fail to understand the request. The client needs to modify the content of the request and send the request again.

  • 401 Unauthorized

The request lacks the necessary credentials. This status code indicates that the current request requires user authentication.

  • 403 Forbidden

The server understands the request, but refuses to execute it.

  • 404 Not Found

The server could not find the requested page.

  • 500 Internal Server Error

The server encountered an error and could not complete the request.

  • 503 Service Unavailable

The server is currently unable to process requests due to temporary server maintenance or overload. This situation is temporary.
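
The classification by first digit can be sketched in a few lines. The standard library's `http.HTTPStatus` supplies the reason phrases; the `CLASSES` table and the `describe_status` helper are names made up for this example.

```python
from http import HTTPStatus

# First digit of the status code -> class, as in the table above.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def describe_status(code: int) -> str:
    """Return 'code phrase: class' for a status code."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the standard library does not know about.
        phrase = "(unregistered code)"
    return f"{code} {phrase}: {category}"

for code in (200, 204, 301, 304, 404, 503):
    print(describe_status(code))
```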

3.4 Content Encoding: Accept-Encoding

To reduce transmission time, HTTP can compress content. For example, in the message shown earlier, Accept-Encoding lists gzip as an accepted content encoding.

In general, there are several formats of content encoding:

  • gzip: the GNU zip compression format
  • compress: the standard UNIX compression format
  • deflate: a lossless compression format that uses both LZ77 and Huffman coding
  • identity: no compression
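
As a small illustration of why content encoding saves transfer time, the sketch below compresses a made-up response body with Python's standard `gzip` module and restores it, the way a server and a client would on either side of a `Content-Encoding: gzip` response. The body and the header dictionary are assumptions for this example.

```python
import gzip

# A repetitive body, like most HTML, compresses very well.
body = ("<p>Hello, HTTP!</p>\n" * 200).encode("utf-8")

# Server side: compress the body and label the encoding.
compressed = gzip.compress(body)
response_headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}

# Client side: undo the encoding named in Content-Encoding.
restored = gzip.decompress(compressed)

print(len(body), "bytes before,", len(compressed), "bytes on the wire")
```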

3.5 Persistent Connections: Connection

Before HTTP messages can be sent, a TCP connection has to be established:

If every HTTP message had to go through this process, a lot of time would be spent establishing and tearing down connections. HTTP therefore uses the Connection header to specify the connection mode. When it is set to keep-alive, a persistent connection is established, so a new connection does not have to be opened for every request:

(Persistent connections are enabled by default in HTTP/1.1.)
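
Connection reuse can be observed with a small local experiment. The sketch below starts a throwaway HTTP/1.1 server on the loopback interface using Python's standard `http.server`, sends two requests through one `http.client.HTTPConnection`, and records the client's local port each time; the same port both times means the same TCP connection was reused. The handler, the function name, and the response body are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    # HTTP/1.1 makes connections persistent unless "Connection: close" is sent.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Content-Length lets the client know where the response ends
        # without the server having to close the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def fetch_twice_on_one_connection():
    """Send two GET requests over one connection; return the local ports used."""
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    ports = []
    try:
        for _ in range(2):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()  # finish this response before the next request
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()  # close the client first so the server can stop
        server.shutdown()
        server.server_close()
    return ports

print(fetch_twice_on_one_connection())
```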

3.6 Stateless HTTP: Cookies

HTTP is a stateless protocol because a Web server has to handle concurrent access from a large number of browsers. To improve the server's ability to handle concurrent access, the protocol was designed so that the server does not keep any state about the requesting browser after it sends a response message and document. This reduces the load on the server, and statelessness also reduces the overhead of each HTTP request.

However, some scenarios need to remember the user's information across requests, and statelessness obviously cannot meet that requirement, so HTTP provides cookies to solve the problem. Cookie technology manages client state by writing cookie information into the request and response messages. The server's response carries a header field called Set-Cookie, which tells the client to save the cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request header.

On the first request, when no cookie is stored yet, the request carries no Cookie header and the server's response includes Set-Cookie.

On later requests, with the cookie stored, the client sends the saved value in the Cookie header and the server uses it to recognize the client.

To put it simply, a cookie is content that is decided by the server and saved in the client's browser. Instead of the user's information being added by hand each time, every request automatically carries the corresponding content from the cookie.
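
The round trip can be sketched with Python's standard `http.cookies` module. The cookie name `session_id`, its value, and both helper functions are assumptions made up for this example; a real browser also tracks attributes such as expiry and domain.

```python
from http.cookies import SimpleCookie

def make_set_cookie(session_id: str) -> str:
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    return cookie["session_id"].OutputString()

def make_cookie_header(set_cookie_value: str) -> str:
    """Client side: store the cookie and build the Cookie request header value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs go back to the server, not attributes like Path.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

set_cookie = make_set_cookie("abc123")
print("Set-Cookie:", set_cookie)                  # first response
print("Cookie:", make_cookie_header(set_cookie))  # every later request
```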

(For those interested in browser-side data storage, please read this article: A common browser-side data storage solution.)

3.7 Range Request

In some scenarios, loading a large image over HTTP is slow (on photography sites, for example), and you may notice that some images load piece by piece. This works because a request can specify a byte range, so the resource is loaded in chunks. The Range header in the request message specifies which bytes are wanted, and the Content-Range header in the response message states which bytes are being returned.
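
Here is a minimal sketch of how a server might answer a range request. It only understands a single `bytes=start-end` range and ignores everything else a real server has to handle; the function name `serve_range` and the sample data are made up for illustration.

```python
def serve_range(data: bytes, range_header: str):
    """Return (status, headers, body) for a single 'bytes=start-end' range."""
    unit, _, spec = range_header.partition("=")
    start_text, _, end_text = spec.partition("-")
    if unit != "bytes" or not start_text.isdigit():
        # Anything this sketch cannot parse gets the whole resource.
        return 200, {"Content-Length": str(len(data))}, data
    start = int(start_text)
    # An open-ended range such as "bytes=8-" runs to the last byte.
    end = int(end_text) if end_text.isdigit() else len(data) - 1
    end = min(end, len(data) - 1)
    if start > end:
        # The requested range lies outside the resource.
        return 416, {"Content-Range": f"bytes */{len(data)}"}, b""
    chunk = data[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{len(data)}",
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk

image = b"0123456789"
print(serve_range(image, "bytes=2-5"))
print(serve_range(image, "bytes=8-"))
```

A successful partial response uses status code 206 Partial Content, which is why that code appears in the list of common status codes.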

3.8 Message Header Summary

(figure from: http://www.cnblogs.com/xing901022/p/4311987.html)

4. HTTP methods

HTTP supports several different request commands, called HTTP methods. Every HTTP request message contains a method, which tells the server what to do (get a Web page, run a gateway program, delete a file, and so on). The following table lists the common HTTP methods:

Method  Description
GET     Sends a named resource from the server to the client
PUT     Stores data from the client in a named server resource
DELETE  Deletes the named resource from the server
POST    Sends client data to a server gateway application
HEAD    Sends only the HTTP headers of the response for the named resource

(GET and POST were discussed above, so they are not repeated here.)

4.1. PUT Transfers files

The PUT method is used to transfer files, much like uploading over FTP. The file content is placed in the body of the request message and saved to the location specified by the request URI. Because the PUT method itself has no authentication mechanism, anyone could upload files, so it is generally not used on ordinary websites.

4.2. DELETE Deletes a file

The DELETE method is used to delete a file and is the opposite of PUT. It deletes the resource specified by the request URL. Like PUT, the DELETE method itself has no authentication mechanism, so it should be used sparingly.

4.3. HEAD Obtains the packet header

The HEAD method is the same as the GET method, except that the response does not include a message body. It is usually used to check whether a URL is valid and when the resource was last updated.
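
The semantics of these methods can be sketched against an in-memory dictionary standing in for the server's resources. The `handle` function, the status codes it picks, and the sample path are assumptions for illustration only; a real server also deals with headers, authentication, and much more.

```python
def handle(method, path, store, body=None):
    """Apply one request to an in-memory store; return (status, body)."""
    if method == "PUT":
        created = path not in store
        store[path] = body
        # 201 Created for a new resource, 204 No Content for an overwrite.
        return (201 if created else 204), None
    if method == "DELETE":
        if path not in store:
            return 404, None
        del store[path]
        return 204, None
    if method in ("GET", "HEAD"):
        if path not in store:
            return 404, None
        # HEAD returns the same status as GET but never a body.
        return 200, (store[path] if method == "GET" else None)
    return 405, None  # method not supported by this sketch

store = {}
print(handle("PUT", "/notes.txt", store, b"hello"))
print(handle("GET", "/notes.txt", store))
print(handle("HEAD", "/notes.txt", store))
print(handle("DELETE", "/notes.txt", store))
print(handle("GET", "/notes.txt", store))
```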

5. HTTPS

5.1 What is HTTPS

HTTPS (Hypertext Transfer Protocol over Secure Socket Layer) is an HTTP channel designed for security. In short, it is the secure version of HTTP: an SSL layer is added beneath HTTP. SSL is the security foundation of HTTPS, so the details of the encryption are handled by SSL. It is now widely used, for example by GitHub, Alipay, and Juejin.

5.2 Why Is HTTPS Required

This is due to several disadvantages of HTTP:

  • Communication is in plaintext, so it can be intercepted and read by an attacker.
  • There is no authentication of the other party, so requests can be forged, which can cause trouble. JMeter is a typical example: it generates a large number of HTTP requests for stress testing, which amounts to a DoS attack.
  • Message integrity cannot be verified. If an HTTP message is intercepted and tampered with by an attacker, the server has no way to detect it.

5.3 Differences between HTTP and HTTPS

Due to these shortcomings, HTTPS has made the following changes:

  • HTTP transmits in plaintext; HTTPS is encrypted with SSL/TLS.
  • The HTTP port number is 80 and the HTTPS port number is 443.
  • HTTPS requires you to apply for a certificate from a CA. Generally, there are few free certificates and you need to pay a fee.
  • HTTP connections are simple and stateless. HTTPS is built from SSL plus HTTP, a network protocol that supports encrypted transmission and identity authentication, and it is more secure than HTTP.
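
Two of these differences, the default ports and the certificate check, can be seen in Python's standard library without touching the network. The sketch below only constructs the objects and does not connect; the host name is made up for illustration.

```python
import http.client
import ssl

# The default context checks the server's certificate against trusted CAs
# and checks that the certificate matches the host name.
context = ssl.create_default_context()

# Creating the connection objects does not open a socket yet.
plain = http.client.HTTPConnection("www.example.com")
secure = http.client.HTTPSConnection("www.example.com", context=context)

print("HTTP default port:", plain.port)
print("HTTPS default port:", secure.port)
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("host name checked:", context.check_hostname)
```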

5.4 HTTPS Defects

Compared with HTTP, you could say that HTTPS is a Saint who has put on the Gold Cloth, a hero who has transformed into Ultraman, a Kogoro Mouri who has fallen asleep: it raises not only the security but also the style. But HTTPS has some drawbacks:

  • Communication slows down, because the encrypted handshake requires several round trips.
  • The load on the user’s machine increases. (This may surprise you, but most of our school’s HTTPS websites are inaccessible at night.)

6. HTTP authentication

Some sites require users to log in before they can access personal information and perform further operations, so the site needs to know who the user is at all times. It obviously cannot ask the user to enter a password on every request, which would be very annoying, so HTTP has its own authentication mechanisms. The main authentication methods are as follows:

6.1 BASIC authentication

BASIC authentication is the simplest authentication, and the general process is as follows:

  1. The client requests a URL.
  2. The server returns status code 401, prompting the user to enter a user name and password.
  3. The user enters a user name and password, which the client encodes in Base64 and sends to the server.
  4. The server verifies the credentials and returns status code 200.

But it has the following disadvantages:

  1. Base64 is only an encoding, not encryption, so the credentials are effectively transmitted in plaintext and security is low.
  2. Some browsers do not support logging out.
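
The first disadvantage is easy to demonstrate: anyone who sees the header can decode it. In the sketch below the client builds the `Authorization` header value for BASIC authentication and an eavesdropper reverses it with one call. The user name, password, and helper names are made up for illustration.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Client side: encode 'username:password' in Base64."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def decode_basic_auth(header_value: str):
    """Anyone on the wire: Base64 is reversible without any key."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a BASIC authentication header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("alice", "secret")
print("Authorization:", header)
print(decode_basic_auth(header))
```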

6.2 DIGEST authentication

To make up for the weaknesses of BASIC, DIGEST authentication was introduced with HTTP/1.1. DIGEST authentication also uses a challenge/response scheme, but unlike BASIC it does not send the password in plaintext.
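
As a rough sketch of the challenge/response idea, the code below computes the response value in the way RFC 2617 describes for the simplest case (MD5, no `qop` option): the server sends a one-time `nonce` as the challenge, and the client answers with a hash that depends on the password without containing it. All of the sample values and the function names are made up for illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """Compute the DIGEST response for the basic RFC 2617 case (no qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # tied to the challenge

# The server issues a fresh nonce with each 401 challenge.
first = digest_response("alice", "example", "secret", "GET", "/private", "nonce-1")
second = digest_response("alice", "example", "secret", "GET", "/private", "nonce-2")
print(first)
print(second)
```

Because the nonce changes, a captured response cannot simply be replayed against a new challenge, and the password itself never appears on the wire.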

6.3 SSL Authentication (Common)

SSL client authentication uses an HTTPS client certificate to complete the authentication. With client certificate authentication, the server can confirm that the access comes from a known, logged-in client.

Steps for SSL client authentication:

  1. When it receives a request for a resource that requires authentication, the server sends a Certificate Request message asking the client to provide a client certificate.
  2. The client sends its certificate information to the server in a Client Certificate message.
  3. Once the server has verified the client certificate, it obtains the client's public key from the certificate and then starts HTTPS encrypted communication.
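
On the server side, asking for a client certificate is a single setting in Python's standard `ssl` module. The sketch below only configures a context; the file names in the comments are hypothetical, and a real server would also load its own certificate and the CA that signed the client certificates before accepting connections.

```python
import ssl

# A TLS context for the server side of the connection.
server_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# By default a server does not ask clients for a certificate.
default_mode = server_context.verify_mode

# CERT_REQUIRED makes the server request a client certificate during the
# handshake and reject clients that cannot present a valid one.
server_context.verify_mode = ssl.CERT_REQUIRED

# Hypothetical files a real deployment would load next:
# server_context.load_cert_chain("server.crt", "server.key")
# server_context.load_verify_locations("client-ca.crt")

print(default_mode == ssl.CERT_NONE, server_context.verify_mode == ssl.CERT_REQUIRED)
```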

Websites with high security requirements, such as Alipay and online banks, require you to download a digital certificate when logging in. This digital certificate is a form of SSL client authentication. Its disadvantage is also obvious: it has to be downloaded manually, which is very troublesome for increasingly lazy Internet users (including me).

6.4 Form Authentication (Most Commonly used)

The last authentication method is the most common one, and it is implemented with cookies and sessions.

Session management combined with cookies

As mentioned earlier, HTTP is a stateless protocol with no state management of its own, hence the cookie. We can use cookies to manage sessions and make up for the state management that HTTP lacks.

Authentication steps:

  1. The client puts the user’s ID and password into the entity body of the message and sends it to the server, usually as a POST request.
  2. The server authenticates the login information sent by the client, issues a Session ID that identifies the user, and records the user's authentication status together with the Session ID on the server.
  3. After receiving the Session ID, the client saves it locally as a cookie. On the next request, the browser automatically sends the cookie, so the Session ID reaches the server. The server checks the received Session ID to identify the user and the authentication status, and the user can then perform the permitted operations.
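
The three steps can be sketched with two dictionaries standing in for the server's user database and session store. The user, the cookie name `session_id`, and the `login` and `current_user` functions are assumptions made up for this example; a real site would store password hashes rather than plaintext and would expire sessions.

```python
import secrets

USERS = {"alice": "s3cret"}  # hypothetical user database (plaintext only for the sketch)
SESSIONS = {}                # Session ID -> user ID, kept on the server

def login(user_id: str, password: str):
    """Steps 1 and 2: check the posted credentials and issue a Session ID."""
    if USERS.get(user_id) != password:
        return None
    session_id = secrets.token_hex(16)  # hard-to-guess random identifier
    SESSIONS[session_id] = user_id
    return f"session_id={session_id}"   # value for the Set-Cookie header

def current_user(cookie_header: str):
    """Step 3: identify the user from the Cookie header of a later request."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "session_id":
            return SESSIONS.get(value)
    return None

cookie = login("alice", "s3cret")
print(current_user(cookie))                # the server recognizes alice
print(current_user("session_id=forged"))   # unknown Session ID: not logged in
```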

References:

  1. Illustrated HTTP
  2. https://juejin.cn/post/6844903504046211079
  3. http://www.cnblogs.com/xing901022/p/4309840.html