HTTP Protocol

Preface

HR ushered me into a small dark room and asked me to wait for the interviewer. After a while, a middle-aged man in slippers walked in. Looking at his brilliant hairline, I knew he must be a big shot.

As expected, the interviewer did not hold back: he sat down and immediately started firing questions.

Interviewer: I see on your resume that you are familiar with HTTP. What happens when your browser accesses a web address?

Me: (Damn. I deliberately avoided writing "proficient" on my resume to stay out of trouble, and I am getting grilled anyway. What happens? I racked my brain until my neck nearly snapped before I finally worked out what the interviewer might be asking.)

Me: Hello, handsome interviewer!

Take https://silently9527.cn as an example. First, the browser queries a DNS server to obtain the IP address that corresponds to the domain name. To go all the way down from there, we need to talk about TCP/IP layering, so let me draw a picture.

The server returns resources in a similar manner

Interviewer: You talked about TCP/IP layering. Can you elaborate on that?

Me: (Luckily my girlfriend from university never threw away the notes I took in class back then. I dug them out and went through them just last night, revisiting old knowledge! It was only the notes, don't get the wrong idea!)

Me: The TCP/IP protocol family is divided into four layers: the application layer, transport layer, network layer, and data link layer.

Application layer: the protocols applications use to communicate, such as FTP, DNS, and HTTP
Transport layer: provides the application layer with data transmission between two machines, using one of two protocols: TCP or UDP
Network layer: between two machines there are many possible routes through multiple routers; the network layer selects one of them
Data link layer: handles the hardware side of the network, such as network cards and device drivers

Interviewer: What is the process of packet encapsulation and unpacking in the TCP/IP hierarchy?

Me: In this layered model, every network request passes through the layers in order: the sender works down from the application layer, and the receiver works up from the data link layer. Take HTTP as an example:

The client issues an HTTP request at the application layer (HTTP protocol)
The transport layer (TCP) splits the HTTP request message received from the application layer into smaller segments and numbers them so they can be reassembled in order
The network layer (IP protocol) receives the segments and selects a path along which to send them
On the server side, the data is passed up layer by layer, in the reverse order, until the application layer receives it

On the sender side, each layer adds its own header information as the data passes through it. On the receiver side, each layer removes the corresponding header information.
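
The wrapping and unwrapping described above can be sketched roughly as follows. This is only an illustration: the bracketed text tags stand in for real binary headers, and the function names are made up for this example.

```python
# Illustrative only: real headers are binary structures, not text tags.
LAYERS = ["TCP", "IP", "Ethernet"]  # transport, network, data link


def encapsulate(http_message: bytes) -> bytes:
    """Sender side: each layer prepends its own header on the way down."""
    frame = http_message
    for layer in LAYERS:
        frame = f"[{layer} header]".encode() + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: each layer strips its own header on the way up."""
    for layer in reversed(LAYERS):
        header = f"[{layer} header]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


request = b"GET / HTTP/1.1\r\nHost: silently9527.cn\r\n\r\n"
frame = encapsulate(request)
print(frame)                          # outermost header comes first
print(decapsulate(frame) == request)  # the receiver recovers the original message
```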

Interviewer: How does TCP ensure that data gets to its destination reliably?

Me: TCP establishes the connection with a three-way handshake, so both sides confirm they can send and receive before any data flows:

First handshake: To establish a connection, the client sends a SYN packet (seq = J) to the server and enters the SYN_SENT state, waiting for confirmation from the server.
Second handshake: After receiving the SYN packet, the server must acknowledge the client's SYN (ack = J + 1) and also send its own SYN (seq = K), that is, a SYN+ACK packet. The server then enters the SYN_RECV state.
Third handshake: After receiving the SYN+ACK packet from the server, the client sends an ACK packet (ack = K + 1) to the server. Once this packet is sent, the client and the server enter the ESTABLISHED state and the three-way handshake is complete.
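
To make the numbering concrete, here is a small sketch of the three packets. It is a simulation with plain dictionaries, not real networking; J and K are the initial sequence numbers chosen by the client and the server, and the function name is invented for this example.

```python
def three_way_handshake(j: int, k: int):
    """Return the three handshake packets for initial sequence numbers j and k."""
    return [
        # 1. client -> server: "I want to connect, my sequence number starts at J"
        {"from": "client", "flags": "SYN", "seq": j},
        # 2. server -> client: acknowledges J and announces its own number K
        {"from": "server", "flags": "SYN+ACK", "seq": k, "ack": j + 1},
        # 3. client -> server: acknowledges K; both sides are now ESTABLISHED
        {"from": "client", "flags": "ACK", "seq": j + 1, "ack": k + 1},
    ]


for packet in three_way_handshake(j=100, k=300):
    print(packet)
```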

Interviewer: Why three handshakes instead of two or four?

Me: With only two handshakes, if the client fails while processing or the ACK returned by the server is lost, the server will still consider the connection established. The client, having failed, sends a new request to establish a connection, but the server is unaware of the failure and sets up another connection as usual. The earlier connection is now useless and wastes server resources.

As for four handshakes: if three are enough, a fourth is unnecessary. And with four, if the last ACK were lost, the same problem as with two handshakes would occur again.

Interviewer: You have covered the three-way handshake. What about the four-way wave that closes the connection?

The client sends a FIN request to the server to disconnect.
The server sends an ACK to the client indicating that it agrees to release the connection.
The server sends a FIN to the client indicating that it wants to disconnect.
The client returns an ACK to the server agreeing to release the connection.

Interviewer: Why does disconnecting take four steps instead of three?

Me: When the server receives the client's request to disconnect, it cannot disconnect immediately, because it may still have data left to send. So it can only reply with an ACK to say it has received the message. Once the server has finished sending its data, it sends a FIN to indicate that it wants to close the connection. After the client replies with an ACK, the connection can be closed safely.
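
The "server still has data to send" case can be observed on a loopback connection. The sketch below assumes the loopback interface is available and lets the operating system handle the actual FIN and ACK packets; `shutdown(SHUT_WR)` is what triggers the client's FIN. The function name is made up for this example.

```python
import socket


def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname(), timeout=5)
    server, _ = listener.accept()
    server.settimeout(5)

    client.shutdown(socket.SHUT_WR)      # client sends FIN: "nothing more to send"
    eof = server.recv(1024)              # server sees end-of-stream (b"")

    server.sendall(b"unfinished data")   # the server can still send what is left
    server.close()                       # only now does the server send its own FIN

    received = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:                    # b"" means the server's FIN has arrived
            break
        received += chunk

    client.close()
    listener.close()
    return eof, received


print(half_close_demo())
```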

Interviewer: Why is HTTP stateless? How do you work around its statelessness?

Me: The HTTP protocol itself does not keep state: it does not retain the communication state of previous requests and responses, so it is a stateless protocol. Because there are scenarios where we need to keep the user's login information, cookies were introduced to manage state. When the client requests the server for the first time, the server generates a cookie and adds it to the response header, and the client then carries this cookie with every subsequent request.

Interviewer: What’s the difference between cookies and sessions?

A cookie is generated by the server and written into the response header; the browser saves it. The server uses the Set-Cookie header field to set a cookie on the client. Its attributes are:

Name=Value: sets the name and value of the cookie
Expires: sets the expiration date of the cookie
Domain=domain name: specifies the domain for which the cookie takes effect
Secure: the cookie is sent only over HTTPS
HttpOnly: the cookie cannot be accessed by JavaScript
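
As a rough illustration, Python's standard `http.cookies` module can build such a Set-Cookie header and parse the Cookie header the browser sends back. The cookie name `user`, its value, and the expiry date are made up for this sketch.

```python
from http.cookies import SimpleCookie


def build_set_cookie(value):
    """Server side: produce a full 'Set-Cookie: ...' header line."""
    cookie = SimpleCookie()
    cookie["user"] = value                              # Name=Value
    morsel = cookie["user"]
    morsel["expires"] = "Tue, 01 Jan 2030 00:00:00 GMT"
    morsel["domain"] = "silently9527.cn"
    morsel["secure"] = True                             # HTTPS only
    morsel["httponly"] = True                           # hidden from JavaScript
    return cookie.output()


def read_cookie(cookie_header_value):
    """Server side: parse the Cookie header value sent back by the browser."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


print(build_set_cookie("alice"))
print(read_cookie("user=alice; theme=dark"))
```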

A session is also generated by the server and represents a block of memory on the server. Each client has its own session, and the sessions of different clients are independent of each other. Every request from the client carries a cookie, which typically contains a JSESSIONID. The server uses this JSESSIONID to find the session that belongs to the client, so a user's login information is usually stored in the session. This is another way of getting around the statelessness of HTTP.

Interviewer: What are the HTTP request methods? How do you choose which one to use?
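
A minimal sketch of that lookup, with an in-memory dictionary playing the role of the server's session storage. The function names are invented for illustration, and a real server would also handle expiry and use a proper cookie parser.

```python
import secrets

sessions = {}   # server memory: session id -> that client's session data


def login(username):
    """Create a session and return the cookie the server would set."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"user": username}
    return f"JSESSIONID={session_id}"


def current_user(cookie_header):
    """Find the session that matches the JSESSIONID in a Cookie header value."""
    for part in cookie_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name == "JSESSIONID":
            return sessions.get(value, {}).get("user")
    return None


alice_cookie = login("alice")
print(current_user(alice_cookie))            # the server recognises the client
print(current_user("JSESSIONID=unknown"))    # no such session
```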

GET: obtains a resource, so query operations usually use GET
POST: transmits an entity body; create and update operations usually use POST
PUT: transfers a file
HEAD: retrieves only the message headers. Use it when you want just the header information for a request
DELETE: deletes a resource, so it is used for delete operations
OPTIONS: asks the server which methods it supports. The response header returns, for example, Allow: GET, POST, HEAD
TRACE: traces the path of the request. The Max-Forwards request header field is set to a number that is decremented by one at each server the request passes through; when it reaches zero, the response is returned directly

Interviewer: How does HTTP implement persistent connections?

Me: (Come on, I am just a junior programmer interviewing for a Java position. Why do you keep grinding me on HTTP?! Never mind, I am thick-skinned, and I do know this one.) Me again: In the early days of HTTP, every HTTP exchange opened and then closed a TCP connection. That was acceptable when little content was being transferred, but today a web page usually contains many images, and setting up and tearing down a TCP connection for every request adds communication overhead.

To solve this problem, persistent connections (also called keep-alive) were introduced: the TCP connection stays open as long as neither end explicitly closes it. Persistent connections also make pipelining possible, where the client sends several requests in a row without waiting for each response in turn.
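
One way to observe connection reuse is to run a tiny local server and send several requests through a single `http.client.HTTPConnection`. This is a sketch that assumes the loopback interface is available; the handler and function names are made up. If the same TCP connection is reused, the client's local port stays the same for every request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"        # HTTP/1.1 connections are persistent by default

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Content-Length tells the client where the body ends without closing the socket
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):        # silence per-request logging
        pass


def fetch_three_times():
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    local_ports = []
    try:
        for _ in range(3):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()              # finish this response before the next request
            local_ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return local_ports


print(fetch_three_times())               # three identical port numbers
```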

Interviewer: How is resuming an interrupted download of a large file implemented?

Me: HTTP has a Range request header field. If the network is interrupted while a file is downloading, starting again from the beginning would waste time, so we pick up where the download left off. Specifically, the request carries:

Range: bytes=5001-10000

Or request all the data from byte 5001 onward:

Range: bytes=5001-

The response comes back with status code 206 (Partial Content).

Interviewer: You mentioned status codes just now. What are the common HTTP status codes?

Me: (I forgot to mention on my resume that I used to be a top student with a good memory who never failed a recitation.)
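
On the server side, handling such a request could look roughly like the sketch below. It only covers a single `bytes=start-end` range, takes the header value without the `Range:` name, and uses a made-up function name; byte positions are zero-based and inclusive.

```python
def serve_range(range_header, content):
    """Return (status, headers, body) for a single 'bytes=start-end' range."""
    unit, _, spec = range_header.partition("=")
    start_text, _, end_text = spec.partition("-")
    if unit.strip() != "bytes" or not start_text:
        return 200, {}, content          # this sketch falls back to the full content
    start = int(start_text)
    # "5001-" means "until the end of the file"
    end = int(end_text) if end_text else len(content) - 1
    end = min(end, len(content) - 1)
    if start > end:                      # nothing to send: Range Not Satisfiable
        return 416, {"Content-Range": f"bytes */{len(content)}"}, b""
    body = content[start:end + 1]
    headers = {"Content-Range": f"bytes {start}-{end}/{len(content)}"}
    return 206, headers, body


file_bytes = bytes(20000)                # stand-in for a 20,000-byte file
status, headers, body = serve_range("bytes=5001-10000", file_bytes)
print(status, headers, len(body))
```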

Me: HTTP status codes fall into four main categories (apart from the rarely seen informational 1xx codes):

2xx: success status codes, indicating that the request was processed successfully
3xx: redirection status codes, indicating that additional action is required to complete the request
4xx: client error status codes
5xx: server error status codes

Common status codes are: 200 (the request was processed normally), 204 (the request was processed successfully, but no resource is returned), 206 (the client made a Range request, and the response message contains a Content-Range), 301 (permanent redirect; the requested resource has been assigned a new address), 302 (temporary redirect; the user is expected to request the new address for now), 400 (the client's request message contains an error, usually a parameter error), 401 (the client is not authenticated), 403 (no permission to access the resource), 404 (the requested resource was not found), 405 (the request method is not supported, for example when the server supports only GET and the client sends a different method), 500 (server exception), 503 (the server cannot provide service)
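
For reference, Python's standard `http.HTTPStatus` enum carries the official reason phrase for each of these codes. The small helper below (its name is made up) prints the phrase together with the category derived from the first digit.

```python
from http import HTTPStatus

CLASSES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}


def describe(code):
    """Return 'code phrase (category)' for a standard HTTP status code."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES.get(code // 100, 'other')})"


for code in (200, 204, 206, 301, 302, 400, 401, 403, 404, 405, 500, 503):
    print(describe(code))
```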

Me: (I even remembered all of that. Doesn't it deserve a thumbs up?) (I have been dropping hints like crazy, folks: give it a like, don't just freeload.)

Interviewer: What are the components of an HTTP message?

Me: There are two types of messages: request messages and response messages.

A request message consists of three parts:

Request line: contains the request method, URI, and HTTP version
Request header fields
Request entity body

A response message consists of three parts:

Status line: contains the HTTP version, status code, and reason phrase
Response header fields
Response entity body
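
The three parts can be pulled out of a raw request with a few lines of string handling. The sketch below is deliberately simplified (no repeated headers, no chunked bodies), and the `/login` path and body are invented for the example.

```python
def parse_request(raw):
    """Split a raw HTTP request into request line, header fields, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")      # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    method, uri, version = lines[0].split(" ", 2)   # request line
    headers = {}
    for line in lines[1:]:                          # header fields, "Name: value"
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "version": version,
            "headers": headers, "body": body}


raw = (b"POST /login HTTP/1.1\r\n"
       b"Host: silently9527.cn\r\n"
       b"Content-Length: 9\r\n"
       b"\r\n"
       b"user=demo")
print(parse_request(raw))
```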

Interviewer: What are the problems with HTTP, and what is HTTPS?

Me: The problems with HTTP are:

Communication is in plaintext and not encrypted, so the content can be eavesdropped on
The identity of the other party is not verified, so it may be an impostor
The integrity of a message cannot be verified, so it may have been tampered with

HTTPS is HTTP plus encryption (usually over an SSL/TLS secure channel) + authentication + integrity protection

Interviewer: How does HTTPS keep data secure?

Me: First, there are two encryption mechanisms:

Symmetric encryption: the client and server use the same key to encrypt and decrypt, which is efficient
Asymmetric encryption: uses a key pair, a public key and a private key. The public key can be sent over the network, and content encrypted with it can only be decrypted with the private key. It is less efficient

Given the trade-offs of the two mechanisms, HTTPS uses hybrid encryption: asymmetric encryption while the key is being exchanged, and symmetric encryption for the messages exchanged once communication is established

For example, when visiting silently9527.cn:

The browser sends a request to the server, and the server returns its certificate, which contains its public key
The browser verifies the certificate against the third-party certificate authority (CA). If the certificate is invalid, a warning page is displayed, and the user can choose whether to continue to the site
If the certificate is valid, the browser generates a random string, encrypts it with the public key, and sends it to the server. The server decrypts the random string with its private key and returns content encrypted with that random string to the client
From then on, both client and server use the random string as the key for symmetric encryption
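
The hybrid idea can be shown with a toy example. Everything here is an assumption made for illustration and is NOT secure: the RSA key pair uses the tiny textbook primes 61 and 53, and the "symmetric cipher" is a simple XOR with a SHA-256-derived keystream. Real HTTPS uses TLS with far stronger algorithms and a different key exchange.

```python
import hashlib
import secrets

# Toy RSA key pair (p=61, q=53): public key (N, E), private key (N, D). Not secure.
N, E, D = 3233, 17, 2753


def rsa_encrypt(data):
    """Encrypt byte by byte with the public key (toy scheme)."""
    return [pow(byte, E, N) for byte in data]


def rsa_decrypt(blocks):
    """Decrypt with the private key."""
    return bytes(pow(block, D, N) for block in blocks)


def xor_stream(key, data):
    """Toy symmetric cipher: the same call both encrypts and decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def demo():
    random_string = secrets.token_bytes(16)      # generated by the browser
    wrapped = rsa_encrypt(random_string)         # sent under the server's public key
    server_key = rsa_decrypt(wrapped)            # recovered with the private key
    ciphertext = xor_stream(random_string, b"GET / HTTP/1.1")   # symmetric from here on
    return random_string, server_key, xor_stream(server_key, ciphertext)


browser_key, server_key, plaintext = demo()
print(browser_key == server_key, plaintext)
```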

Interviewer: Why do you need a certificate authority? Isn't HTTPS secure without one?

Me: Although HTTPS encrypts the traffic, the request can still be intercepted, so how can the client know that the public key it received comes from the real server and not from an attacker? This is why the certificate has to be validated, and why third-party certificate authorities are needed. Generally, an HTTPS certificate has to be purchased from a third-party organization. If we generate the certificate ourselves, the browser will show a warning when it validates it.

Interviewer: How does the browser ensure that the certificate verification process is secure?

Me: The certificate check also relies on asymmetric cryptography: the CA signs the certificate with its private key, and the browser verifies the signature with the CA's public key. Getting that public key to the client securely is very difficult, so browser vendors usually embed the public keys of the common CAs inside the browser.

Interviewer: Your grasp of the HTTP-related protocols is fine. Let's move on to Java...

As a rookie who made it this far, I can't help giving myself a thumbs up! (Hinting at a like once more.)

Final words (follow me so you don't get lost)
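
Other HTTPS clients work in a comparable way. As an analogy rather than a description of any browser, Python's standard `ssl` module builds a default context that trusts the CA certificates installed on the system and insists on a valid certificate and a matching host name:

```python
import ssl

context = ssl.create_default_context()

# The server's certificate must chain up to a trusted CA...
print(context.verify_mode == ssl.CERT_REQUIRED)
# ...and the name in the certificate must match the host being visited
print(context.check_hostname)
# CA certificates already loaded from the system trust store
# (this can be 0 on platforms that load them lazily from a directory)
print(len(context.get_ca_certs()))
```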

This interview story is purely fictional. Please don’t take it seriously.

The article may contain shortcomings and mistakes; suggestions and opinions are welcome in the comments.

Finally, freeloading is not nice and writing is not easy. I hope you will like, comment, and follow, because that is what keeps me sharing 🙏

Reference: Illustrated HTTP