Chapter contents
- 1. How is HTTP defined?
- 2. How does HTTP work?
- 3. What is a message?
- 4. What is a URL?
- 5. What are the request methods?
- 6. Status codes
- 7. Headers
- 8. Caching
- 9. The history of HTTP
- 10. How to implement resumable downloads
Structure diagram of this chapter
1. How is HTTP defined?
1.1. Hypertext Transfer Protocol
HyperText Transfer Protocol (HTTP) is an application-layer protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication on the World Wide Web.
HTTP was originally designed as a way to publish and retrieve HTML pages. Resources requested over HTTP or HTTPS are identified by Uniform Resource Identifiers (URIs).
1.2. Where HTTP sits in the network stack
The original network model divides the stack into seven layers:
The OSI model:
In the TCP/IP model, the top three OSI layers (application, presentation, and session) are merged into a single application layer.
The TCP/IP model:
HTTP is a protocol in the application layer.
2. How does HTTP work?
HTTP works as an exchange between a client and a server.
Typically, the client initiates a request by opening a TCP connection to a specified port on the server (80 by default). The server listens on that port. When a request arrives, the server returns the corresponding content, such as HTML text or images, together with a status code that tells the client whether the request succeeded.
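The exchange above can be sketched with plain Java sockets. This is a minimal, illustrative example, not how production servers are written: it starts a one-shot server on a free loopback port (port 0 instead of 80, so it runs without special privileges), sends a raw GET request to it, and returns the status line of the reply. The class name `TinyHttpRoundTrip`, the path `/index.html`, and the page content are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class TinyHttpRoundTrip {

    /** Starts a one-shot server, sends it one GET request, and returns the response status line. */
    static String roundTrip() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; a real web server would listen on 80.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> serveOnce(server));
            serverThread.start();

            String statusLine;
            // The client opens a TCP connection to the server's port and writes its request.
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                String request = "GET /index.html HTTP/1.1\r\n"
                        + "Host: localhost\r\n"
                        + "Connection: close\r\n"
                        + "\r\n";
                OutputStream out = client.getOutputStream();
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();

                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
                statusLine = in.readLine(); // first line of the response, e.g. "HTTP/1.1 200 OK"
            }
            serverThread.join();
            return statusLine;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Server side: accept one connection, read the request head, reply with a small HTML page. */
    private static void serveOnce(ServerSocket server) {
        try (Socket conn = server.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // A real server would parse the request line and headers here.
            }
            byte[] body = "<h1>Hello</h1>".getBytes(StandardCharsets.UTF_8);
            String head = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/html; charset=utf-8\r\n"
                    + "Content-Length: " + body.length + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            OutputStream out = conn.getOutputStream();
            out.write(head.getBytes(StandardCharsets.US_ASCII));
            out.write(body);
            out.flush();
        } catch (IOException e) {
            e.printStackTrace(); // sketch only: report server-side failures and give up
        }
    }
}
```

Calling `TinyHttpRoundTrip.roundTrip()` returns `HTTP/1.1 200 OK`: the status line carrying the status code the server chose.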
3. What is a message?
3.1. What is an HTTP message?
HTTP messages are the blocks of data exchanged between HTTP applications.
Think of HTTP as a courier service: the HTTP message is the shipping label, with the recipient's name, address, and postal code. The label tells the courier where the parcel should go, and the sorting center reads the label to decide where to send it.
HTTP messages work the same way. The client sends a request that carries a message; after receiving it, the server parses the message to learn what the client wants.
When the server returns content, it also sends a message that describes what the content is, so the client can parse and process the data.
3.2 Format of the HTTP message
HTTP messages are classified into request messages and response messages.
- Request message: the message sent by the client to the server.
- Response message: the message sent by the server to the client.
Format of the request message:
It consists of four parts: the request line, the request headers, a blank line, and the request body.
For example, open Baidu in a browser, open the browser's developer tools, and look at the request message.
Format of the response message:
It consists of four parts: the status line, the response headers, a blank line, and the response body.
For example, view the response message for the Baidu page in the browser.
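To make the four-part layout concrete, here is a small hypothetical helper that assembles a request message from its parts and splits a response status line. Each header line and the blank line end in CRLF (`\r\n`), as HTTP/1.1 requires; the class and method names are illustrative only.

```java
import java.util.Map;

final class HttpMessages {

    /** Assembles a request message: request line, header lines, blank line, body. */
    static String buildRequest(String method, String target, Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(target).append(" HTTP/1.1\r\n"); // request line
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n"); // one header per line
        }
        sb.append("\r\n"); // blank line: end of the headers
        sb.append(body);   // request body (empty for a typical GET)
        return sb.toString();
    }

    /** Splits a status line such as "HTTP/1.1 404 Not Found" into version, code, and reason phrase. */
    static String[] parseStatusLine(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + statusLine);
        }
        return parts;
    }
}
```

Passing a `LinkedHashMap` as `headers` keeps the header lines in insertion order.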
4. What is a URL?
4.1. How is a URL defined?
URL stands for Uniform Resource Locator. It describes the exact address of a resource on the Internet.
Every page we visit has an address, and that address usually points to a resource on a server.
4.2. What is the URL format?
A URL is made up of the protocol type (HTTP, HTTPS, etc.), the server address (host), the port, and the path.
For example, Baidu's address: www.baidu.com/
The general format is http://host[:port][/path]
- Protocol type: the network protocol used for the request, such as HTTP or HTTPS.
- Server address: the host, given as a domain name or an IP address.
- Port number: defaults to 80 for HTTP (443 for HTTPS) if not specified.
- Path: the location of the requested resource on the server; it defaults to "/", which the browser usually adds for you.
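Java's standard `java.net.URI` class splits a URL into the parts listed above. Its `getPort()` returns -1 when no port is written, so the caller has to apply the default. The helper below (a hypothetical `UrlParts` class) does that, and also applies the browser-style "/" default when the path is empty.

```java
import java.net.URI;
import java.util.Locale;

final class UrlParts {

    /** Port the client would connect to: the explicit one, or the scheme's default. */
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort(); // explicit port, e.g. http://host:8080/
        }
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        switch (scheme) {
            case "http":
                return 80;  // default HTTP port
            case "https":
                return 443; // default HTTPS port
            default:
                throw new IllegalArgumentException("No default port known for: " + uri);
        }
    }

    /** Path of the resource, defaulting to "/" when the URL has none. */
    static String effectivePath(URI uri) {
        String path = uri.getPath();
        return (path == null || path.isEmpty()) ? "/" : path;
    }
}
```

For `URI.create("https://example.com:8443/a/b?q=1")`, `getHost()` is `example.com`, `effectivePort` is 8443 and `effectivePath` is `/a/b` (the query is available separately via `getQuery()`).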
5. What are the request methods?
5.1 Request methods
HTTP defines several request methods:
- GET: requests data from the server.
- HEAD: like GET, but the response has no body (only the status line and headers).
- POST: submits data to the server for processing, for example a form submission or the creation of a new resource.
- PUT: creates or replaces the target resource with the data in the request body; usually used to modify data on the server.
- DELETE: deletes the specified resource on the server.
- CONNECT: establishes a tunnel to the server identified by the target resource; used with proxy servers.
- OPTIONS: describes the communication options for the target resource; browsers use it for preflight requests in cross-origin (CORS) scenarios.
- TRACE: performs a message loop-back test along the path to the target resource, to trace the request.
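With the JDK's built-in `java.net.http` client (Java 11+), the method is chosen on the request builder. The sketch below only builds requests without sending them, and the URLs are placeholders. `GET()`, `POST()`, `PUT()` and `DELETE()` have dedicated builder methods; HEAD and OPTIONS go through the generic `method(name, body)` call.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

final class MethodExamples {
    // Placeholder resource used by the example requests.
    static final URI RESOURCE = URI.create("https://example.com/users/42");

    static HttpRequest get() {
        return HttpRequest.newBuilder(RESOURCE).GET().build();
    }

    static HttpRequest head() {
        // Same as GET, but the server answers with headers only.
        return HttpRequest.newBuilder(RESOURCE).method("HEAD", BodyPublishers.noBody()).build();
    }

    static HttpRequest post(String json) {
        // Submits data for the server to process, e.g. creating a new user.
        return HttpRequest.newBuilder(URI.create("https://example.com/users"))
                .header("Content-Type", "application/json")
                .POST(BodyPublishers.ofString(json))
                .build();
    }

    static HttpRequest put(String json) {
        // Replaces the resource at RESOURCE with the request body.
        return HttpRequest.newBuilder(RESOURCE)
                .header("Content-Type", "application/json")
                .PUT(BodyPublishers.ofString(json))
                .build();
    }

    static HttpRequest delete() {
        return HttpRequest.newBuilder(RESOURCE).DELETE().build();
    }

    static HttpRequest options() {
        return HttpRequest.newBuilder(RESOURCE).method("OPTIONS", BodyPublishers.noBody()).build();
    }
}
```

Each request would be sent with `HttpClient.send(request, BodyHandlers.ofString())`, which is omitted here so the sketch runs offline.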
5.2 Differences between GET and POST requests
GET and POST are by far the most common request methods, so let's look at how they differ.
When you type an address into the browser, the request is usually sent with GET. POST is generally used to submit content: the parameters being submitted are placed in the request body.
Here are the differences listed by W3School:
Common questions:
1. Security
GET parameters appear in the URL, so they are visible in the address bar, browser history, and server logs; GET is therefore not secure.
A POST request does not show its parameters in the browser, so is it secure?
No. Anyone who captures the network traffic can read the parameters in the body. Secure transmission requires encryption, that is, HTTPS.
2. Does the POST method send two TCP packets?
The answer is: not necessarily.
Whether a POST is split into two TCP packets depends on the client implementation: some browsers send the headers first and the body in a separate packet.
Articles online have tested this and reported that Chrome and Safari send two TCP packets while Firefox sends only one; I won't repeat the details here.
See: “99% of people get the difference between GET and POST in HTTP wrong”.
3. What problems do excessively long URLs cause?
Let's simulate this in Postman: build a very long URL, send the request, and see whether an error comes back.
Postman returns 414 Request-URI Too Long, meaning the URL is too long.
A browser request returns the same error:
So why is the URL "too long"? Is it the server that can't handle it, or Postman? Or does the HTTP protocol itself limit URL length?
The HTTP specification (RFC 2616, HTTP/1.1) contains the following passage:
The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).
In plain terms:
The HTTP protocol places no a priori limit on URI length. A server must be able to handle the URI of every resource it serves, and should be able to handle URIs of unbounded length if it provides GET-based forms that can generate them. If a URI is longer than the server can handle, the server should return 414 (Request-URI Too Long).
In other words, HTTP itself does not limit URL length. The limits come from servers and browsers, and they report an error when a URL exceeds what they can handle.
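Because the limit lives in the server, enforcing it takes only a few lines of server code. The sketch below is hypothetical: the 8192-character cap is an assumed value (real servers choose their own, usually configurable), and only the request line is inspected.

```java
final class UriLengthGuard {
    // Assumed limit for this sketch; HTTP itself defines none.
    static final int MAX_REQUEST_TARGET_LENGTH = 8192;

    /** Status a server might choose for a request line such as "GET /path HTTP/1.1" (200 = carry on). */
    static int statusFor(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3) {
            return 400; // malformed request line: Bad Request
        }
        String target = parts[1]; // the URI part of the request line
        return target.length() > MAX_REQUEST_TARGET_LENGTH ? 414 : 200; // 414 Request-URI Too Long
    }
}
```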
4. Can a GET request include a body?
The answer is yes!
Why can GET carry a body? Doesn't GET fetch the resource identified by the URL?
Let's look at how the RFC defines the GET method:
The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI.
That is, the GET method retrieves the information identified by the URL from the server.
Confused? The definition says nothing that forbids a body. Let's keep going.
Here is another RFC passage about GET requests with a body:
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
In other words, a GET request with a body may be rejected by the server, the browser, or a framework.
Conclusion: a GET request can carry a body, but the request may be rejected or cause other errors.
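For instance, the JDK's `java.net.http` builder does not forbid it: `GET()` takes no body, but the generic `method("GET", ...)` call accepts one. This sketch only builds the request; the endpoint is a placeholder, and whether a particular server or proxy accepts the body is exactly the open question the RFC warns about.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class GetWithBody {
    /** Builds a GET request that carries a JSON body (placeholder URL; not sent here). */
    static HttpRequest build(String json) {
        return HttpRequest.newBuilder(URI.create("https://example.com/search"))
                .header("Content-Type", "application/json")
                // GET() has no body parameter, so the generic method(...) form is used instead.
                .method("GET", HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```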
See: “Who says HTTP GET can't send data through Body?”
6. Status codes
6.1 What is a status code?
A status code is a number the server returns with its response to tell the client how the request went. For example, 200 means the request succeeded, and 404 is a client error meaning the requested resource was not found. Based on the status code, the client can decide what to do next: retry, show an error page, tell the user the operation was invalid, and so on.
6.2 What status codes are there?
Here is the classification from the Runoob tutorial:
As shown in the figure, HTTP status codes fall into five classes: 1xx, 2xx, 3xx, 4xx, and 5xx.
The first digit determines the class: 1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error. For example, 200 (success) and 404 (client error).
There are many individual status codes, which I won't cover here.
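Because the class is encoded in the first digit, a client can branch on `code / 100` before it looks at specific codes. A minimal sketch (the class and method names are illustrative):

```java
final class StatusClasses {
    /** Maps a status code to its class, which is determined by the first digit. */
    static String classify(int code) {
        if (code < 100 || code > 599) {
            throw new IllegalArgumentException("Not an HTTP status code: " + code);
        }
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            default: return "Server error"; // 5xx
        }
    }
}
```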
7. Headers
7.1 What are headers used for?
A header is a key-value pair of metadata. In a request it tells the server what data I want to fetch or what operation I want to perform; in a response it describes the returned content.
For example, Accept: text/plain says that the client can accept plain-text data in the response.
Besides the standard headers, we can define custom request headers. For example, user: Header sends the value "Header" under the custom name user, and the server handles it accordingly.
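Header names are case-insensitive and a name may appear more than once, so a plain `HashMap<String, String>` is not quite enough to hold them. A minimal sketch of a header store, using a hypothetical `HeaderFields` class and the custom `user` header from the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HeaderFields {
    // CASE_INSENSITIVE_ORDER makes "accept", "Accept" and "ACCEPT" the same key.
    private final Map<String, List<String>> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    HeaderFields add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value); // repeated names keep every value
        return this;
    }

    /** First value for the name, or null if the header is absent. */
    String first(String name) {
        List<String> values = fields.get(name);
        return values == null ? null : values.get(0);
    }

    /** All values for the name, in the order they were added. */
    List<String> all(String name) {
        return fields.getOrDefault(name, List.of());
    }
}
```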
7.2. What kinds of headers are there?
HTTP header fields can be divided into four types by usage: general headers, request headers, response headers, and entity headers.
- General headers: can be used by both client and server and carry generic information about the message, such as the Date header.
- Request headers: appear only in request messages and give the server additional information, such as what type of data the client expects to receive.
- Response headers: give the client information about the response, such as what server software it is talking to (the Server header).
- Entity headers: describe the entity body, for example its data type via the Content-Type header.
7.3. What are the common headers?
1. Host
Specifies the domain name (or IP address) and port number of the web server the client wants to reach.
2. Content-Type
Specifies the media type of the message body. Four kinds are most common:
(1) text/html: HTML text, the usual type of a web page;
(2) application/x-www-form-urlencoded: the default form type, used by web pages to submit plain-text form data to the server, such as a registration form;
(3) multipart/form-data: used for forms that include binary files, such as changing a user avatar or uploading pictures;
(4) application/json, image/jpeg, application/zip…: used to send a single piece of content, such as JSON, an image, or a ZIP archive.
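For `application/x-www-form-urlencoded`, the body is the form fields joined as `name=value` pairs with `&`, each part percent-encoded. Java's `URLEncoder` implements this encoding (spaces become `+`). A sketch with a hypothetical `FormEncoding` helper:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

final class FormEncoding {
    /** Encodes fields as an x-www-form-urlencoded body, in the map's iteration order. */
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

For example, a registration form with `name` = `Li Lei` and `email` = `li@example.com` (held in a `LinkedHashMap`) encodes to `name=Li+Lei&email=li%40example.com`, which is sent with `Content-Type: application/x-www-form-urlencoded`.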
3. Content-Length
Specifies the size of the message body in bytes, so the receiver knows where the body ends. A message sent with chunked transfer (Transfer-Encoding: chunked) omits it, because the total length is not known in advance.
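Because Content-Length counts bytes rather than characters, it must be computed from the encoded body. A tiny sketch (the helper name is hypothetical):

```java
import java.nio.charset.Charset;

final class ContentLength {
    /** Value for the Content-Length header: the body's size in bytes after encoding. */
    static int of(String body, Charset charset) {
        return body.getBytes(charset).length;
    }
}
```

For the body "héllo", `of(body, StandardCharsets.UTF_8)` is 6, not 5, because "é" takes two bytes in UTF-8.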
4. User-Agent
Identifies the client to the server, for example whether the request comes from a mobile app or from a browser.
5. Range
Range: bytes=start-end requests only part of a resource; used for multi-threaded downloads and for resuming interrupted downloads.
6. Accept
Tells the server what types of data the client can accept, such as text/html.
7. Accept-Charset
Tells the server which character sets the client can accept, such as UTF-8.
8. Accept-Encoding
Tells the server which compression methods (content codings) the client can accept, such as gzip.
9. Content-Encoding
Tells the client which compression method the server applied to the body, such as gzip or deflate.
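Accept-Encoding and Content-Encoding work as a pair: the client lists the codings it understands, and the server names the one it applied. A minimal sketch of the gzip case using `java.util.zip`; `acceptsGzip` is a deliberately simplified check (it ignores q-values such as `gzip;q=0`), and all names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipContentEncoding {

    /** Simplified check of an Accept-Encoding value such as "gzip, deflate, br" (q-values ignored). */
    static boolean acceptsGzip(String acceptEncoding) {
        for (String token : acceptEncoding.split(",")) {
            String coding = token.split(";")[0].trim();
            if (coding.equalsIgnoreCase("gzip")) {
                return true;
            }
        }
        return false;
    }

    /** Server side: compress the body before sending it with "Content-Encoding: gzip". */
    static byte[] encode(byte[] body) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray(); // complete once the gzip stream has been closed
    }

    /** Client side: undo the compression announced by "Content-Encoding: gzip". */
    static byte[] decode(byte[] encoded) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(encoded))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```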
There are too many headers to describe here; if you are interested, see “Android Series of Networks (2): HTTP Request Header and Response Header”.
8. Caching
8.1. Why does HTTP need caching?
1. Fetching content over the network is slow and costly: the client has to establish communication with the server and then exchange messages.
2. Reduce server load. If every request goes to the server, the server comes under heavy pressure and may crash, causing requests to fail.
3. Reduce bandwidth consumption. Too many requests congest the network and slow down responses.
4. Avoid meaningless repeated requests. For example, if a page's data changes only once a day but the page fetches it from the server on every visit, those requests only waste resources.
Caching greatly improves response speed, reduces server load, and reduces bandwidth usage, so HTTP caching is critical.
8.2. What is the caching mechanism of HTTP?
1. Validating cached responses with ETag
(1) What is an ETag?
An ETag is a response header that carries a validation token, generated by the server from a hash of the content or some other value derived from the file.
(2) What problem does the validation token solve?
Suppose the locally cached data has an expiry time. When we read from the cache and find that it has expired, we go back to the server for the data. If the server's data is identical to the local copy and only the cache time has run out, fetching the full data again is pointless and wastes the resources of the request.
Validation tokens are designed to solve exactly this problem.
(3) How does the validation token solve it?
The server sends the token in the ETag header, for example ETag: "XXXXXX". When the cached copy expires, the client sends the token back in the If-None-Match header. The server compares it with the current token; if nothing has changed, it returns 304 Not Modified without a body, telling the client that the cached copy is still valid. The client then renews the cache's expiry time and keeps using the cached data.
The official diagram of this flow:
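On the server side, the check comes down to comparing the client's If-None-Match value with the current ETag. A sketch under stated assumptions: the ETag is derived from a SHA-256 hash of the content, truncated to 16 hex digits for readability (real servers may use other inputs, such as modification time and size), and weak validators (`W/"..."`) are not handled. The class name is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ETagValidation {

    /** Derives a quoted ETag from the content bytes (first 8 bytes of a SHA-256 digest, in hex). */
    static String etagFor(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder tag = new StringBuilder("\"");
            for (int i = 0; i < 8; i++) {
                tag.append(String.format("%02x", digest[i]));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should always be available", e);
        }
    }

    /** 304 if If-None-Match matches the current ETag, otherwise 200 with the full body. */
    static int statusFor(String ifNoneMatch, byte[] currentContent) {
        if (ifNoneMatch == null) {
            return 200; // the client has no cached copy: send everything
        }
        String current = etagFor(currentContent);
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*") || tag.equals(current)) {
                return 304; // Not Modified: the cached copy is still valid
            }
        }
        return 200; // content changed: send the new version and its new ETag
    }
}
```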
2. Cache-Control
(1) What is Cache-Control used for?
Cache-Control defines the caching policy, for example how long a resource may be cached and by whom. It is a header.
Cache-Control is defined in the HTTP/1.1 specification.
(2) How does Cache-Control define a caching policy?
It carries one or more cache directives in the header value, for example Cache-Control: max-age=180.
(3) What are the cache directives?
Common directives:
- no-cache: a stored response may be kept, but it must be revalidated with the server (for example with the ETag token) before each use. If the data has changed, the new data returned by the server is used; otherwise the cached copy is used.
- no-store: nothing is cached, and the data is always fetched from the server. Typically used where security or real-time data matters.
- max-age: the number of seconds the response stays fresh. After that, the data must be fetched (or revalidated) from the server.
- public: the response may be cached by any cache, including intermediaries (proxy servers, CDNs, etc.) and the browser. public is usually not required, because other directives such as max-age already make a response cacheable.
- private: the response must not be cached by shared intermediaries (proxy servers, CDNs, etc.), but the browser may cache it.
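A client-side cache has to turn the header value into directives and then decide whether a stored response is still usable. A simplified sketch: the class is hypothetical, it assumes a private (browser-style) cache so `private` and `public` do not change the outcome, and it does not guard against malformed numbers.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CacheControlPolicy {

    /** Parses e.g. "public, max-age=180" into {public="", max-age="180"}. */
    static Map<String, String> parse(String header) {
        Map<String, String> directives = new HashMap<>();
        for (String part : header.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue;
            }
            int eq = p.indexOf('=');
            String name = (eq < 0 ? p : p.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String value = eq < 0 ? "" : p.substring(eq + 1).trim().replace("\"", ""); // drop optional quotes
            directives.put(name, value);
        }
        return directives;
    }

    /** True if a stored response of the given age (seconds) may be reused without asking the server. */
    static boolean canServeFromCache(String cacheControl, long ageSeconds) {
        Map<String, String> d = parse(cacheControl);
        if (d.containsKey("no-store") || d.containsKey("no-cache")) {
            return false; // no-store: never cached; no-cache: must revalidate first
        }
        String maxAge = d.get("max-age");
        return maxAge != null && ageSeconds < Long.parseLong(maxAge); // fresh while age < max-age
    }
}
```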
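3. The Expires header
The Expires header contains a date and time after which the response is considered stale. On the next request the client compares that time with the current time and uses the cache only if it has not expired. Because Expires is an absolute date, it is sensitive to clock differences between client and server; the Cache-Control directives introduced in HTTP 1.1 are more flexible, and max-age takes precedence over Expires when both are present.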
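The precedence rule can be sketched as follows. This hypothetical helper computes how long a response stays fresh. Following the HTTP caching specification, it measures Expires against the response's own Date header rather than the local clock, which sidesteps the clock-difference problem. A response with neither header is treated as stale here, which is a simplification: real caches may apply heuristics instead.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class FreshnessLifetime {

    /** Seconds a response stays fresh: max-age if present, otherwise Expires minus Date. */
    static long seconds(String cacheControl, String expires, String date) {
        if (cacheControl != null) {
            for (String part : cacheControl.split(",")) {
                String directive = part.trim().toLowerCase(Locale.ROOT);
                if (directive.startsWith("max-age=")) {
                    return Long.parseLong(directive.substring("max-age=".length())); // max-age wins
                }
            }
        }
        if (expires != null && date != null) {
            // Both timestamps come from the server, so the client's clock is not involved.
            ZonedDateTime expiresAt = ZonedDateTime.parse(expires, DateTimeFormatter.RFC_1123_DATE_TIME);
            ZonedDateTime servedAt = ZonedDateTime.parse(date, DateTimeFormatter.RFC_1123_DATE_TIME);
            return Math.max(0, Duration.between(servedAt, expiresAt).getSeconds());
        }
        return 0; // no explicit lifetime: this sketch treats the response as already stale
    }
}
```

For a response with `Date: Wed, 21 Oct 2015 07:28:00 GMT` and `Expires: Wed, 21 Oct 2015 08:28:00 GMT`, the lifetime is 3600 seconds; adding `Cache-Control: max-age=120` overrides that to 120.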
4. Cache-Control in OkHttp

    import java.util.concurrent.TimeUnit;
    import okhttp3.CacheControl;

    CacheControl.Builder builder = new CacheControl.Builder();
    builder.noCache();                    // revalidate with the server before using a cached response
    builder.noStore();                    // do not store the response in the cache
    builder.maxAge(10, TimeUnit.SECONDS); // accept cached responses up to 10 s old (OkHttp keeps whole seconds, so 10 ms would become 0)
    CacheControl cacheControl = builder.build();
    // Attach it to a request: new Request.Builder().url(url).cacheControl(cacheControl).build();
9. The history of HTTP
9.1 HTTP 0.9
HTTP 0.9, described by Tim Berners-Lee in 1991, supported only GET requests.
9.2 HTTP 1.0
Beyond the single GET method, HTTP 1.0 added POST (and HEAD) and supports sending content in any format, including text, video, audio, and binary files.
The formats of requests and responses also changed: HTTP 1.0 added headers, status codes, support for multiple character sets, multipart types, authorization, caching, and content encoding.
Disadvantages:
(1) Connections cannot be reused. The browser and server use a short-lived TCP connection that is closed when the request completes; another request needs a new TCP connection, which wastes resources and adds latency.
(2) Head-of-line blocking (HOLB). Requests are handled in order, and the next request is processed only after the server finishes the previous one; if the previous request is slow, the requests behind it are blocked.
9.3 HTTP 1.1
HTTP 1.1 is currently the most widely used version of HTTP.
Improvements:
(1) Caching: HTTP 1.0 mainly used the If-Modified-Since and Expires headers as caching criteria; HTTP 1.1 introduced more cache controls, such as entity tags (ETag), If-Unmodified-Since, If-None-Match, and Cache-Control.
(2) Bandwidth optimization: the new Range header allows requesting only part of a resource, which can greatly reduce bandwidth usage.
(3) 24 new error status codes were added.
(4) Persistent connections: with Connection: keep-alive (the default in HTTP 1.1), one TCP connection can carry multiple requests, reducing the cost of setting up connections.
(5) Pipelining: the client can send the next request before the current one has returned, reducing total request time.
Disadvantages:
(1) Persistent connections cannot be shared across domain names; each domain needs its own connection, which consumes resources and puts pressure on the server.
(2) Headers are large and repeated. Even when the requests are otherwise the same, the large headers are sent every time, wasting traffic.
(3) Pipelining was introduced to address head-of-line blocking: the browser can send several requests without waiting for each response. However, the server must still return the responses in the order the requests arrived, so a slow response blocks the ones behind it even if they are already processed.
So pipelining is not true parallel request handling, but it was a useful improvement.
9.4 HTTP 2.0
HTTP 2.0 grew out of SPDY, a protocol developed by Google. What does HTTP 2.0 improve over HTTP 1.1?
Improvements:
(1) Multiplexing: in HTTP 1.1, multiple requests on one connection are handled serially, so bandwidth is poorly utilized. In HTTP 2.0, multiple requests can be in flight in parallel over a single connection, greatly improving bandwidth utilization.
(2) Header compression: a header table on each side tracks previously sent key-value pairs, so identical headers do not have to be resent with every request and response.
(3) Priority: since many requests reach the server together, HTTP 2.0 lets the client assign priorities so the server can answer higher-priority requests first instead of strictly first-in, first-out.
(4) Server push: normally the client sends a request and the server responds. In HTTP 2.0 the server can proactively push related files to the client before the client asks for them.
10. How to implement resumable downloads?
10.1 Principle
The principle is simple: continue downloading from where the download stopped. For example, if 10 MB of a 20 MB file has been downloaded, the resumed download starts at the 10 MB mark.
10.2 Core mechanism
To get the first 1024 bytes of a file, send the header Range: bytes=0-1023 (byte positions are inclusive, so this covers bytes 0 through 1023). A server that supports ranges replies with 206 Partial Content and only that part of the file.
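The header values themselves are easy to compute. A hypothetical sketch: resuming after a partial download uses an open-ended range (`bytes=N-`, from byte N to the end), and a multi-threaded download splits the file into consecutive inclusive ranges, one per thread. The total length is assumed to be known already (for example from the Content-Length of a HEAD request).

```java
import java.util.ArrayList;
import java.util.List;

final class ByteRanges {

    /** Range header value to resume after `bytesOnDisk` bytes: open-ended, up to the end of the file. */
    static String resumeFrom(long bytesOnDisk) {
        return "bytes=" + bytesOnDisk + "-";
    }

    /** Splits totalLength bytes into at most `parts` consecutive inclusive ranges (fewer for tiny files). */
    static List<String> split(long totalLength, int parts) {
        if (totalLength <= 0 || parts <= 0) {
            throw new IllegalArgumentException("totalLength and parts must be positive");
        }
        List<String> ranges = new ArrayList<>();
        long chunk = (totalLength + parts - 1) / parts; // ceiling division
        for (long start = 0; start < totalLength; start += chunk) {
            long end = Math.min(start + chunk, totalLength) - 1; // end position is inclusive
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }
}
```

Resuming the 10 MB example (taking 1 MB as 1,048,576 bytes) sends `resumeFrom(10485760)`, i.e. `Range: bytes=10485760-`, and `split(2048, 2)` yields `bytes=0-1023` and `bytes=1024-2047`.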
10.3 Benefits
Resumable transfer is efficient. For example, if the network drops partway through a download, there is no need to start over; the download continues from where it stopped.
Range requests also enable multi-threaded downloads: a file can be split into pieces downloaded by 10 or 20 threads at once, which can greatly increase download speed. Tools such as Xunlei (Thunder) and Baidu Netdisk are largely based on this principle, though their implementations are more complex. How many threads are worthwhile depends on the client's performance.
10.4 Client implementation
Database: stores the position where each download stopped;
Multithreading: speeds up the download;
RandomAccessFile: a Java class that can read or write at any position in a file, so each thread can write its piece at the right offset; suited to files made of records of known size;
Here is a look at the implementation:
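A condensed sketch of the client side, under explicit assumptions: the remote file is simulated by an in-memory byte array (a real client would issue one `Range` request per segment and copy each 206 response body), and persisting progress in a database is left out. What it does show is the RandomAccessFile part: the file is pre-sized, and each thread opens its own `RandomAccessFile`, seeks to its segment's offset, and writes only its bytes. The class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SegmentedWriter {

    /** Writes `source` into `target` with up to `threads` workers, one byte range each. */
    static void download(byte[] source, File target, int threads) {
        if (threads <= 0) {
            throw new IllegalArgumentException("threads must be positive");
        }
        try (RandomAccessFile raf = new RandomAccessFile(target, "rw")) {
            raf.setLength(source.length); // pre-size the file so every offset exists
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        int chunk = (source.length + threads - 1) / threads; // ceiling division
        List<Thread> workers = new ArrayList<>();
        for (int start = 0; start < source.length; start += chunk) {
            final int from = start;
            final int to = Math.min(start + chunk, source.length); // exclusive end
            Thread worker = new Thread(() -> {
                // Stand-in for "Range: bytes=from-(to-1)": copy that slice of the simulated remote file.
                // Each thread has its own RandomAccessFile, so file pointers do not clash.
                try (RandomAccessFile out = new RandomAccessFile(target, "rw")) {
                    out.seek(from);
                    out.write(source, from, to - from);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }
}
```

A real downloader would also record each finished segment (the database role above), so that after a failure only the missing ranges are requested again.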
Reference & Thanks
- Hypertext Transfer Protocol
- Who says HTTP GET can't send data through Body?
- HTTP request headers and response headers
- HTTP cache
- Android OkHttp network request Cache-Control
- OkHttp parsing (5): Cache processing
- A thorough look at the history of HTTP
- How does HTTP2 solve Head of Line blocking (HOL) issue
- Java with RandomAccessFile to achieve multithreaded download