When asked what you know about HTTP in an interview

Preface

This article is the second in a series on HTTP. For more articles, see juejin.cn/post/684490…

TCP/IP network layered model

The original designers of TCP/IP introduced the concept of “layering”: complex network communication is divided into multiple layers, each with its own responsibilities, and each layer only needs to focus on its own work. This “divide and conquer” approach breaks one big problem into several small ones, which makes network communication manageable.

The following figure shows the TCP/IP stack hierarchy:

The TCP/IP stack has four layers. Like building blocks, each layer needs support from the lower layer and supports the upper layer. If any layer is removed, the whole stack may collapse. Its layers are numbered “from the bottom up”, so the first layer is the lowest layer.

1. Layer 1: Link Layer

This layer is responsible for sending raw packets over underlying networks such as Ethernet and WiFi. It works at the network interface card (NIC) level and uses MAC addresses to identify devices on the network, so it is sometimes called the MAC layer.

2. Layer 2: “Internet Layer” or “Internetworking Layer”

The IP protocol lives at this layer. Because IP defines the concept of the “IP address”, devices can be addressed by IP address instead of MAC address on top of the link layer, which connects many LANs and WANs into one huge virtual network. To find a device on this network, its IP address only needs to be “translated” into a MAC address.

3. Layer 3: Transport Layer

This layer is responsible for transferring data between two points identified by IP addresses. TCP and UDP both work at this layer, but only TCP guarantees reliable delivery.

Differences between TCP and UDP

Stateful or not: TCP is a stateful protocol. It must establish a connection with the peer before sending data, and it ensures that data is neither lost nor duplicated. UDP is simpler: it is stateless and can send data at any time without establishing a connection first, but it does not guarantee that the data reaches the other side.
Data format: TCP data is a continuous, ordered byte stream. UDP data is sent as separate small datagrams, which are sent in order but may arrive out of order.
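
A small loopback experiment can make this difference concrete. The sketch below is an illustration only, using Python’s standard socket module. The function names udp_demo and tcp_demo and the messages are made up for this example, and the loopback interface is far more forgiving than a real network, where UDP datagrams can be lost or reordered.

```python
import socket


def udp_demo() -> list:
    """Send two datagrams with no connection setup; each arrives as one unit."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    receiver.settimeout(2.0)          # UDP itself gives no delivery guarantee
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        address = receiver.getsockname()
        sender.sendto(b"hello", address)
        sender.sendto(b"world", address)
        # Each recvfrom() call returns exactly one datagram.
        return [receiver.recvfrom(1024)[0] for _ in range(2)]
    finally:
        sender.close()
        receiver.close()


def tcp_demo() -> bytes:
    """Connect first, then send; the receiver sees one continuous byte stream."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=2.0)
    server_side, _ = listener.accept()
    server_side.settimeout(2.0)
    try:
        client.sendall(b"hello")
        client.sendall(b"world")
        client.close()                # closing marks the end of the stream
        received = b""
        while True:
            chunk = server_side.recv(1024)
            if not chunk:             # empty read: the peer closed the stream
                break
            received += chunk
        return received
    finally:
        client.close()
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print(udp_demo())
    print(tcp_demo())
```

With UDP the two messages stay separate datagrams, while with TCP the boundaries between the two sendall() calls are not preserved: the receiver only sees the bytes in order.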

4. Layer 4: Application Layer

Because the three layers below lay such a solid foundation, this layer is home to a wide variety of application-specific protocols, such as SSH, HTTP, and FTP.

The transmission unit at the MAC layer is the frame, at the IP layer the packet, at the TCP layer the segment, and at the HTTP layer the message. In everyday usage the distinction is often blurred, and all of them may loosely be called packets.

OSI network layered model

OSI stands for Open Systems Interconnection; the full name of the model is the Open Systems Interconnection Reference Model.

Network models did not exist from the start. In the early days of networking, protocols were defined by individual vendors, such as the giants of the time: IBM, Microsoft, Apple, and Cisco. Every company had its own network protocols, and the protocols of different companies could not talk to each other. That may have seemed acceptable at the time, but for consumers it amounted to a technology monopoly: an Apple device could not communicate with a Microsoft device because their protocols differed, and there was no unified standard to regulate network protocols, which were all proprietary. In 1984 the International Organization for Standardization (ISO) published the Open Systems Interconnection (OSI) model, which is a standard, not an implementation. TCP/IP was not derived from this model; it was already in use by the time OSI was published.

The OSI model is divided into seven layers, some of which are similar to TCP/IP, as shown below:

1. Physical layer: the physical form of the network, such as cables, optical fibers, network adapters, and hubs

2. Data link layer: basically equivalent to the TCP/IP link layer

3. Network layer: equivalent to the Internet layer in TCP/IP

4. Transport layer: Equivalent to the transport layer in TCP/IP

5. Session layer: maintains connection state in the network, that is, keeps sessions and their synchronization

6. Presentation layer: translates data into appropriate, understandable syntax and semantics

7. Application layer: transmits data for specific applications

But the international standards bodies were well aware that protocols such as TCP/IP were already running on many networks and could not simply be thrown out. So when the OSI layered model was released, it was explicitly described as a “reference”, not a mandatory standard.

But the OSI model also has advantages. By contrast, TCP/IP is a stack of pure software, without the most basic physical devices such as cables and network cards that a network should have. OSI makes up for this lack by describing the network more completely in theory.

Mapping between the two layered models

We now have two network layering models, TCP/IP and OSI, which raises a new question: one has four layers and the other has seven, so how do the two map onto each other?

Fortunately, OSI was designed with TCP/IP and other existing protocols in mind, so the correspondence is relatively easy to draw, although it is not very precise.

Layer 1: physical layer, which has no counterpart in TCP/IP;

Layer 2: data link layer, corresponding to the TCP/IP link layer;

Layer 3: network layer, corresponding to the TCP/IP internet layer;

Layer 4: transport layer, corresponding to the TCP/IP transport layer;

Layers 5, 6, and 7: session, presentation, and application layers, which together correspond to the TCP/IP application layer.
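
The correspondence can be written down as a small lookup table. This is only a sketch of the mapping described above; the names OSI_TO_TCPIP and tcpip_layer_for are invented for the illustration.

```python
# OSI layer number -> (OSI layer name, corresponding TCP/IP layer name).
# None means the TCP/IP model has no corresponding layer.
OSI_TO_TCPIP = {
    1: ("physical", None),
    2: ("data link", "link"),
    3: ("network", "internet"),
    4: ("transport", "transport"),
    5: ("session", "application"),
    6: ("presentation", "application"),
    7: ("application", "application"),
}


def tcpip_layer_for(osi_layer: int):
    """Return the TCP/IP layer name for an OSI layer number, or None."""
    return OSI_TO_TCPIP[osi_layer][1]


if __name__ == "__main__":
    for number, (osi_name, tcpip_name) in OSI_TO_TCPIP.items():
        print(number, osi_name, "->", tcpip_name or "(no counterpart)")
```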

How the TCP/IP stack works

How does the TCP/IP stack work?

When an HTTP message is sent, it travels down the protocol stack layer by layer: application layer, transport layer, internet layer, and link layer. Each layer adds its own header, so the data is wrapped layer by layer before it is sent.

Receiving is the reverse operation. The data travels up from the link layer through the internet layer and transport layer to the application layer and is unwrapped layer by layer: each layer removes its own header and hands the remaining data to the layer above.

The transmission process of the lower layers is completely “transparent” to the upper layers, which do not need to care about how the lower layers are implemented. At the HTTP level, it does not matter whether the layer underneath is actually TCP/IP; HTTP sees only a reliable transmission link, and as long as it sends data with its own header, the other side receives it as is.
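
The wrapping and unwrapping can be sketched as a toy model. The bracketed labels below are placeholders invented for this illustration, not real TCP, IP, or MAC header formats; the point is only the order in which headers are added and removed.

```python
# Placeholder headers, listed from the innermost layer to the outermost one.
# Real headers are binary structures; these labels only mark the layers.
LAYER_HEADERS = [b"[TCP]", b"[IP]", b"[MAC]"]


def encapsulate(http_message: bytes) -> bytes:
    """Sending side: each lower layer puts its header in front of the data."""
    data = http_message
    for header in LAYER_HEADERS:
        data = header + data
    return data


def decapsulate(frame: bytes) -> bytes:
    """Receiving side: strip the headers in the reverse order."""
    data = frame
    for header in reversed(LAYER_HEADERS):
        if not data.startswith(header):
            raise ValueError(f"expected header {header!r}")
        data = data[len(header):]
    return data


if __name__ == "__main__":
    message = b"GET / HTTP/1.1\r\n\r\n"
    frame = encapsulate(message)
    print(frame)
    print(decapsulate(frame) == message)
```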

You can think of HTTP using the TCP/IP protocol stack to transfer data as being like sending a package.

Suppose you want to send a stuffed animal to a friend. The toy is the equivalent of the HTTP body, such as an HTML page. To protect the toy, you wrap it in a plastic bag, which is like adding the HTTP header.

You hand the toy to the courier, who adds another layer of packaging and sticks a label on it to protect the goods. This is like wrapping the data again at the TCP layer by adding the TCP header.

The courier then goes downstairs, puts the package on a tricycle, takes it to the distribution hub, and loads it onto a bigger truck. This is equivalent to adding the IP header and MAC header to the TCP segment at the IP layer and MAC layer.

After a long journey, the parcel arrives at its destination, where it is unloaded and put on another courier’s tricycle. This corresponds to unpacking at the IP layer and MAC layer after transmission.

The courier arrives at your friend’s door and rips off the label, which removes the TCP header. Your friend removes the plastic bag, the HTTP header, and finally gets the stuffed animal, the actual HTML page.

HTTP packets are the core of the HTTP protocol

The core of the HTTP protocol is the content of the packets it transmits.

The HTTP specification defines in detail the packet format, its components, the parsing rules, and the processing policies. As a result, HTTP can implement flexible and rich features on top of TCP/IP, such as connection control, cache management, data encoding, and content negotiation.

TCP packet structure

A TCP packet, for example, carries a header of at least 20 bytes in front of the actual data to be transmitted. The header stores additional information required by the TCP protocol, such as the sender’s port number, the receiver’s port number, and the sequence number.

With this additional TCP header, the packet can be delivered correctly. Once it reaches the destination and the header is removed, the real data is available.
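
To see what “a header in front of the data” means in practice, the following sketch unpacks the fixed 20-byte part of a TCP header with Python’s struct module. The field layout follows the standard TCP header; the sample segment is hand-built for the example (it is not captured traffic), and the function name parse_tcp_header is an assumption of this sketch.

```python
import struct

# Fixed part of the TCP header, in network byte order (20 bytes in total):
# source port, destination port, sequence number, acknowledgment number,
# data offset byte, flags, window size, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")


def parse_tcp_header(segment: bytes) -> dict:
    """Split a TCP segment into a few header fields and the payload."""
    (src_port, dst_port, seq, ack,
     offset_byte, flags, window, checksum, urgent) = TCP_HEADER.unpack_from(segment)
    # The upper 4 bits of the offset byte give the header length in 32-bit words.
    header_len = (offset_byte >> 4) * 4
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "payload": segment[header_len:],
    }


if __name__ == "__main__":
    # Hand-built sample: client port 50000 -> server port 80, no TCP options.
    header = TCP_HEADER.pack(50000, 80, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
    parsed = parse_tcp_header(header + b"GET / HTTP/1.1\r\n\r\n")
    print(parsed)
```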

HTTP packet structure

HTTP is similar to TCP in that it also attaches some header data in front of the actual data. Unlike TCP, however, it is a “plain text” protocol, so the header data is ASCII text that can be read with the naked eye and understood without special tools.

The structure of HTTP request packets and response packets is basically the same and consists of three parts:

Start line: describes basic information about the request or response
Header field set: describes the packet in more detail in key-value form
Message body: the actual data being transmitted, which is not necessarily plain text and can be binary data such as images and videos

The first two parts, the start line and the header fields, are often referred to together as the “request header” or “response header”, or simply the “header”; the message body is referred to as the “entity” or “body”.

According to the HTTP protocol, a packet must contain a header but may have no body, just as you can ship an empty box by express delivery. However, the header must be followed by a blank line, that is, a CRLF. Although the HTTP protocol does not limit the size of headers, web servers do not allow oversized headers, because they would consume a lot of server resources and hurt performance.

A complete HTTP message looks like this, with an empty line between header and body.
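
As a rough illustration of the three-part structure, the sketch below builds a small response by hand and splits it back into start line, header fields, and body. The message content and the function name parse_http_message are made up for this example, and the parser ignores many details that a real implementation has to handle.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into (start line, header dict, body)."""
    # The first blank line (CRLF CRLF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# A hand-written sample response.
BODY = b"<h1>Hello</h1>"
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: " + str(len(BODY)).encode("ascii") + b"\r\n"
    b"\r\n"
    + BODY
)

if __name__ == "__main__":
    start_line, headers, body = parse_http_message(RAW_RESPONSE)
    print(start_line)
    print(headers)
    print(body)
```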

Common header fields of HTTP packets

Classification by type

The HTTP protocol specifies a large number of header fields and implements a wide variety of functions, but they can be divided into four broad categories:

General fields: can appear in both request headers and response headers
Request fields: can only appear in request headers, where they provide further information about the request or additional conditions
Response fields: can only appear in response headers, where they provide information about the response packet
Entity fields: strictly speaking general fields, but ones that specifically describe additional information about the body

In practice, parsing and processing HTTP packets mostly means processing header fields, so understanding the header fields goes a long way toward understanding HTTP packets.

Common request fields (Request Headers)

Accept: tells the server which media formats (MIME types) the client expects in the response, such as text/html or image/webp. The wildcard */* means any type of data.
Accept-Encoding: tells the server which content encodings the client (browser) accepts. These are usually compression methods, such as gzip.
Accept-Language: tells the server which languages the client accepts, such as en-US and zh-CN.
Cache-Control: controls caching behavior, with directives such as private and no-cache.
Cookie: carries session-related information to the server so that the server can identify the user.
Referer: tells the server which page the request was linked from.
User-Agent: sends browser version, operating system, and application information to the server.
Connection: indicates how the connection should be handled. For example, keep-alive keeps the connection open and close closes it.

Common response fields (Response Headers)

Content-Length: the length of the HTTP entity (body).
Content-Type: sent by the server to tell the client the media type and character encoding of the content, such as text/plain; charset=utf-8.
Content-Language: the counterpart of Accept-Language. The server uses this field to tell the client which language the body is written in.
Date: if the server has no cache, Date is the time at which the response was generated. If the server has a cache, Date is the time when the response content was cached.
Expires: sets the expiration time of the response body. Requests made before that time are served from the cache.
Last-Modified: the time when the file was last modified on the server.
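
Because header fields are plain “name: value” text, they can be read with the same machinery used for e-mail headers; Python’s own http.client builds on the email package for this. The sketch below parses a hand-written header block (the values are invented for the example) and shows case-insensitive lookup and the split of Content-Type into media type and charset.

```python
from email.parser import Parser

# A hand-written block of response header fields, ending with a blank line.
HEADER_BLOCK = (
    "Content-Type: text/plain; charset=utf-8\r\n"
    "Content-Length: 5\r\n"
    "Content-Language: en-US\r\n"
    "\r\n"
)

# headersonly=True stops after the header fields and ignores any body.
headers = Parser().parsestr(HEADER_BLOCK, headersonly=True)

if __name__ == "__main__":
    # Field names are matched case-insensitively.
    print(headers["content-length"])
    print(headers["Content-Language"])
    # Content-Type carries both the media type and the character encoding.
    print(headers.get_content_type())
    print(headers.get_content_charset())
```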

Juejin has a very complete article on HTTP header fields; refer to it if needed: juejin.cn/post/684490…

Note: Attached is a chart of common media formats

HTTP request method

A request method is an operation on a resource that the client issues and asks the server to perform.

In practical terms, the client uses the request method to issue an “action command” to the server, asking it to perform that action on the resource located by the URI. Currently HTTP/1.1 specifies eight methods, all of which must be written in uppercase.

GET: retrieves a resource; can be understood as reading or downloading data.
HEAD: retrieves the meta information of a resource.
POST: writes or uploads data to a resource.
PUT: similar to POST.
DELETE: deletes a resource.
CONNECT: establishes a special connection tunnel.
OPTIONS: lists the methods that can be applied to a resource.
TRACE: traces the transmission path of the request and response.

The request method is an “instruction” to the server, which decides what to do with it.

Since the request method is an “instruction” from the client to the server, the client has no decision-making power; the server controls all resources and has the final say. After receiving an HTTP request packet, the server looks at the request method and may execute it or reject it, and it tells the client the outcome through the corresponding HTTP status code. For example:

If the file you request does not exist, the server returns 404 Not Found
If the file exists but access is not allowed, the server may return 403 Forbidden, and so on
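
The server’s side of this decision can be sketched as a small function. Everything here is hypothetical: the RESOURCES table stands in for files on a server, decide_status is an invented name, and the extra 405 Method Not Allowed case is added only to show a rejected method.

```python
from http import HTTPStatus

# Hypothetical resources: path -> (content, whether public access is allowed).
RESOURCES = {
    "/index.html": (b"<h1>Home</h1>", True),
    "/secret.txt": (b"top secret", False),
}


def decide_status(method: str, path: str) -> HTTPStatus:
    """Decide how to answer a request; the server has the final say."""
    if method not in ("GET", "HEAD"):
        # This toy server is read-only and rejects every other method.
        return HTTPStatus.METHOD_NOT_ALLOWED
    if path not in RESOURCES:
        return HTTPStatus.NOT_FOUND
    _, access_allowed = RESOURCES[path]
    if not access_allowed:
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK


if __name__ == "__main__":
    for method, path in [("GET", "/index.html"), ("GET", "/missing"),
                         ("GET", "/secret.txt"), ("DELETE", "/index.html")]:
        status = decide_status(method, path)
        print(method, path, "->", status.value, status.phrase)
```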

Four common request methods

The most common request methods are GET and POST, which fetch data and send data respectively.

GET/HEAD

The GET method is probably the best known and most used request method in the HTTP protocol. It requests a resource from the server, which can be text, a web page, an image, a video, and so on.
The HEAD method is similar to GET in that it also requests a resource from the server, but the server returns only the response header, not the entity data. HEAD can be considered a “simplified” version of GET: its response header is exactly the same as GET’s, so it fits scenarios that do not need the body and avoids wasting bandwidth on transferring it. For example, to check whether a file exists, it is enough to send a HEAD request; there is no need to fetch the entire file with GET.
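
The relationship between GET and HEAD can be tried out locally. The sketch below starts a throwaway server on the loopback interface with Python’s http.server and queries it with http.client; the handler, the body text, and the helper names fetch and run_demo are all invented for this demonstration and are not meant as a production setup.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello, http"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)      # GET: headers followed by the body

    def do_HEAD(self):
        self._send_headers()        # HEAD: the same headers, but no body

    def log_message(self, format, *args):
        pass                        # keep the demo output quiet


def fetch(method: str, port: int):
    """Send one request and return (status, Content-Length header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Length"), resp.read()
    finally:
        conn.close()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("GET", port), fetch("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    get_result, head_result = run_demo()
    print("GET :", get_result)
    print("HEAD:", head_result)
```

Both responses carry the same status and Content-Length, but only the GET response has a non-empty body.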

POST/PUT

The GET and HEAD methods fetch data from the server, while POST and PUT do the opposite: they submit data, carried in the body of the packet, to the resource specified by the URI.

POST is also used very often, probably second only to GET. It has many applications: whenever data is sent to the server, POST is used in most cases.

For example, if you write an article on Juejin and click publish, a POST request uploads your article to the Juejin server.

PUT works much like POST and can also submit data to the server, but there is a subtle difference: POST usually means “create”, while PUT means “modify” or “update”. Compare this to SQL: POST is like INSERT and PUT is like UPDATE. Multiple INSERTs add multiple records, while multiple UPDATEs operate on the same record and have the same effect as a single one. In practice, PUT is rarely used because it is so similar to POST in semantics and functionality.

Two important concepts of request methods: safety and idempotence

Safety

In the HTTP protocol, “safe” means that the request method does not “damage” the resources on the server, that is, it does not substantially modify them.

According to this definition, among the common methods above only GET and HEAD are “safe”, because they are “read-only” operations and the data on the server stays “safe” no matter how many times it is accessed. Operations such as POST, PUT, and DELETE modify resources on the server by adding, changing, or deleting data, so they are not safe.

Idempotence

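“Idempotent” is a mathematical term borrowed by the HTTP protocol. It means that performing the same operation many times has the same result as performing it once.

GET and HEAD are idempotent because they only read data. DELETE is idempotent because deleting a resource that is already gone leaves it gone, and PUT is idempotent because repeating the same update produces the same state. POST is not idempotent: each repeated request may create another resource.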
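
The difference is easy to see with a toy in-memory store. The ArticleStore class below is purely illustrative and does not speak HTTP; its method names simply mirror the request methods they stand for, following the INSERT and UPDATE comparison made earlier.

```python
class ArticleStore:
    """A toy stand-in for server-side resources."""

    def __init__(self):
        self.articles = {}
        self._next_id = 1

    def post(self, text: str) -> int:
        """Like POST / INSERT: every call creates a new record."""
        article_id = self._next_id
        self._next_id += 1
        self.articles[article_id] = text
        return article_id

    def put(self, article_id: int, text: str) -> None:
        """Like PUT / UPDATE: repeating the call leaves the same state."""
        self.articles[article_id] = text

    def delete(self, article_id: int) -> None:
        """Like DELETE: deleting twice is the same as deleting once."""
        self.articles.pop(article_id, None)


if __name__ == "__main__":
    store = ArticleStore()
    store.post("draft")
    store.post("draft")          # repeated POST: two records now exist
    print(store.articles)

    store.put(1, "final")
    store.put(1, "final")        # repeated PUT: still one updated record
    print(store.articles)

    store.delete(2)
    store.delete(2)              # repeated DELETE: no further change
    print(store.articles)
```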

HTTP response status code

As mentioned earlier, when the client sends a request, the server returns a status code that tells the client the result of processing the request. The status code defined in the RFC standards is a three-digit number. The RFCs divide status codes into five classes, with the first digit indicating the class. The five classes mean the following:

1xx: informational. Protocol processing is in an intermediate state and further action is required
2xx: success. The packet was received and processed correctly
3xx: redirection. The resource location has changed and the client needs to resend the request
4xx: client error. The request packet is incorrect and the server cannot process it
5xx: server error. An internal error occurred while the server was processing the request
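
Since the class is determined by the first digit, classifying a status code is a one-line integer division. The sketch below uses Python’s http.HTTPStatus for the standard reason phrases; the STATUS_CLASSES table and the status_class function are names chosen for this example.

```python
from http import HTTPStatus

# First digit of the status code -> class name.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


if __name__ == "__main__":
    for code in (200, 301, 403, 404, 500):
        print(code, HTTPStatus(code).phrase, "->", status_class(code))
```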