Interview HTTP: 99% of interviewers ask these questions

Difference between HTTP and HTTPS

HTTP stands for Hypertext Transfer Protocol. It is a protocol and specification for transferring hypertext data, such as text, pictures, audio, and video, between two points in the computer world.

The main content of HTTP is divided into three parts: Hypertext, Transfer and Protocol.

Hypertext is more than just text. It can also include pictures, audio, and video, as well as hyperlinks that jump to another resource when you click a piece of text or an image.
Transfer: these kinds of content can collectively be referred to as data, and transfer is the process in which data moves from one end system to another through a series of physical media. We usually call the party that sends the data the requester, and the party that receives it the responder.
Protocol refers to the rules for transferring and managing information on networks (including the Internet). Just as people need to follow certain rules to communicate with each other, computers need to follow certain rules to communicate with each other. These rules are called protocols, or more precisely, network protocols.

Speaking of HTTP, the TCP/IP network model has to be mentioned. It is usually described as a five-layer model (application, transport, network, link, and physical layers), as shown in the figure below.

However, it can also be divided into four layers, in which the link layer and the physical layer are combined into the network interface layer.

Another model is the OSI seven-layer network model, which adds a presentation layer and a session layer to the five-layer model.

The full name of HTTPS is Hypertext Transfer Protocol Secure. From its name, we can see that HTTPS is more secure than HTTP. In fact, HTTPS is not a new application-layer protocol. It is a combination of HTTP + TLS/SSL, and the security is provided by TLS/SSL.

In other words, HTTPS is HTTP running on top of TLS/SSL.

So, what are the main differences between HTTP and HTTPS?

The simplest difference is in the address bar: an HTTP address begins with http://, while an HTTPS address begins with https://

http://www.cxuanblog.com/
https://www.cxuanblog.com/

HTTP has no encryption. Its traffic is easy for attackers to monitor, data is easy to steal, and the sender and receiver are easy to forge. HTTPS is a secure protocol that addresses these problems by combining a key exchange algorithm, a signature algorithm, a symmetric encryption algorithm, and a digest algorithm.

The default port for HTTP is 80 and the default port for HTTPS is 443.
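
As a small illustration of the scheme and default-port difference, here is a minimal Python sketch. The helper name effective_port and the DEFAULT_PORTS table are made up for this example; only urllib.parse.urlsplit comes from the standard library.

```python
from urllib.parse import urlsplit

# Default ports as stated above (hypothetical lookup table for this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.cxuanblog.com/"))
print(effective_port("https://www.cxuanblog.com/"))
print(effective_port("https://www.cxuanblog.com:8443/"))
```

An explicit port in the URL always wins; the default is only used when none is given.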

Difference between HTTP GET and POST

HTTP includes many methods. GET and POST are the two most commonly used, and they account for the vast majority of everyday HTTP requests, so it is worth understanding them in more depth.

The GET method is generally used to request a resource. For example, when you type www.cxuanblog.com into the browser address bar, a GET request is sent; its main feature is asking the server to return a resource. The POST method is generally used to submit data, for example from a <form>. GET is equivalent to a pull operation, and POST is equivalent to a push operation.
The GET method is less safe for sensitive data, because the request parameters are appended to the URL when the request is sent. URLs show up in the address bar, browser history, and server logs, which makes it easier for attackers to steal your information and then tamper with or forge requests.

/test/demo_form.asp?name1=value1&name2=value2

The POST method puts parameters in the request body, so they do not appear in the URL. Note that without HTTPS the body is still sent in plain text.

POST /test/demo_form.asp HTTP/1.1
Host: w3schools.com

name1=value1&name2=value2

The URL of a GET request is subject to length limits imposed by browsers and servers (the HTTP specification itself sets no limit), whereas a POST request places parameters and values in the message body, which has no comparable length limit.
GET requests are actively cached by browsers, whereas POST requests are not, unless caching is set up manually.
GET requests are harmless when the user goes back or forward in the browser, while going back to a POST resubmits the form request.
It is often claimed that a GET request generates one TCP packet while a POST generates two. This is not required by HTTP and depends on the client. For a GET request, the client sends the headers and the server responds with 200 (returning data). For a POST, some clients send the headers first with Expect: 100-continue, the server responds with 100 Continue, the client then sends the data, and the server responds with 200 OK. Many clients send the POST headers and body together.
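
To make the difference in where the parameters travel concrete, here is a minimal Python sketch that only builds the two request objects and sends nothing over the network. The host and path are taken from the example above; the variable names are just for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"name1": "value1", "name2": "value2"}
query = urlencode(params)  # "name1=value1&name2=value2"

# GET: the parameters are appended to the URL after "?".
get_req = Request("http://w3schools.com/test/demo_form.asp?" + query)

# POST: the same parameters travel in the request body instead.
post_req = Request(
    "http://w3schools.com/test/demo_form.asp",
    data=query.encode("ascii"),
    method="POST",
)

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The encoded parameters are identical in both cases; only their position in the request changes.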

What is a stateless protocol? HTTP is stateless, so how is that solved?

A stateless protocol means that the server keeps no memory of earlier transactions. For example, a client may close the browser after requesting a web page, and then start the browser again to log in to the site, but the server does not know that it is the same client or that the client closed the browser in between.

HTTP is a stateless protocol that has no memory of user actions. Most users probably do not believe that, because after entering a username and password to log in to a site once, they usually do not need to enter them again on the next visit. That is not something HTTP itself does. What does it is a mechanism called cookies, which gives the browser the ability to remember.

To check whether your browser allows cookies, in Chrome you can open chrome://settings/content/cookies

If cookies are allowed, your browser's memory is switched on. When you send a request to the server for the first time, the server creates a session space (a Session object), generates a sessionId, and sends a response asking the client to set a cookie through the response header Set-Cookie: JSESSIONID=XXXXXXX. After receiving the response, the client stores a cookie with JSESSIONID=XXXXXXX locally. By default, this cookie expires at the end of the browser session.

Next, when the client sends a request to the same website each time, the request header will carry the Cookie information (including the sessionId). Then, the server obtains the value named JSESSIONID by reading the Cookie information in the request header and obtains the sessionId of the request. In this way, your browser has the ability to remember.
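
The cookie and session round trip described above can be sketched roughly as follows. This is a simulation under simple assumptions, not a real server: SESSIONS and handle_request are hypothetical names, and the session store is just an in-memory dictionary.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: session id -> session data.
SESSIONS = {}


def handle_request(cookie_header, username=None):
    """Simulate one request; return (response_headers, session)."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("JSESSIONID")
    if morsel is not None and morsel.value in SESSIONS:
        # Known session id: the server "remembers" this client.
        return {}, SESSIONS[morsel.value]
    # First visit: create a session and ask the client to store its id.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user": username}
    return {"Set-Cookie": f"JSESSIONID={session_id}"}, SESSIONS[session_id]


# First request: no cookie yet, so the server issues one.
headers, session = handle_request(None, username="xiaoming")

# Later request: the browser sends the stored cookie back.
headers2, session2 = handle_request(headers["Set-Cookie"])
print(session2["user"])
```

The second call sets no new cookie and finds the same session object, which is the "memory" that HTTP itself lacks.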

Another way is to use the JWT (JSON Web Token) mechanism, which also gives your browser a memory. Unlike the session approach, in which the state is stored on the server, a JWT carries the information on the client. It is widely used in single sign-on scenarios. JWT has two characteristics:

JWT information is stored on the client instead of in server memory. The client sends the token to the server with each request, and the server can authenticate it by verifying the token's signature, without looking up any stored session. This saves server resources, and the same token can be verified multiple times.
JWT supports cross-domain authentication. Cookies are only effective within a single domain or its subdomains; if they are used against a third-party node, they are blocked. JWT solves this problem: a token can authenticate the user across multiple nodes, which is what we call cross-domain authentication.
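
To show why the server does not need to store anything for a JWT, here is a minimal sketch of signing and verifying an HS256 token with only the Python standard library. The function names and the secret are assumptions for this example, and checks such as expiry (exp) are deliberately left out; a real project would normally use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(segments)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_token(token: str, secret: bytes):
    """Return the payload if the signature is valid, otherwise None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        return None
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None
    return json.loads(b64url_decode(payload_b64))


secret = b"hypothetical-shared-secret"
token = sign_token({"sub": "xiaoming"}, secret)
print(verify_token(token, secret))
```

Any node that knows the secret can verify the token on its own, which is what makes it usable across multiple nodes.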

Differences between UDP and TCP

Both TCP and UDP reside in the transport layer of the computer network model and are responsible for transferring data generated by the application layer. Let's talk about the characteristics of TCP and UDP and the differences between them.

What is UDP

UDP stands for User Datagram Protocol. It speeds up communication by eliminating the need for a so-called handshake operation, allowing other hosts on the network to transfer data before the receiver agrees to communicate.

A datagram is a transport unit associated with a packet-switched network.

UDP has the following characteristics:

UDP can support bandwidth-intensive applications that tolerate packet loss.
UDP has low latency.
UDP can send a large number of packets.
UDP is used for DNS lookups; DNS is an application-layer protocol built on top of UDP.

What is TCP

TCP stands for Transmission Control Protocol. It provides reliable, ordered data transfer between two hosts on a network. A TCP connection is established through a three-way handshake, which is used to initiate and confirm the connection. Once the connection is established, data can be sent, and when the data transfer is complete, the connection is closed by tearing down the virtual circuit.

The main features of TCP are as follows:

TCP ensures that a connection is established and that packets are delivered.
TCP supports retransmission of lost or corrupted packets.
TCP supports congestion control and can delay transmission when the network is congested.
TCP provides checksums to detect corrupted packets.

Difference between TCP and UDP

The following table lists some differences between TCP and UDP for you to understand and remember.

TCP	UDP
TCP is a connection-oriented protocol	UDP is a connectionless protocol
TCP establishes a connection before sending data	UDP sends data directly without establishing a connection
TCP delivers data in order, rearranging packets as needed	UDP packets have no fixed order and are independent of each other
TCP transmission is relatively slow	UDP transmission is faster
The TCP header is at least 20 bytes	The UDP header is only 8 bytes
TCP is heavyweight, requiring a three-way handshake to establish a connection before sending any user data	UDP is lightweight, with no connection tracking, message ordering, etc.
TCP performs error checking and error recovery	UDP also checks for errors, but simply discards bad packets
TCP acknowledges received data	UDP has no acknowledgements
TCP uses a handshake with SYN, SYN-ACK, and ACK	UDP has no handshake
TCP is reliable because it ensures that data is delivered to the destination	UDP gives no guarantee that data will be delivered to the destination
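
The header sizes in the table can be checked with a short Python sketch that packs the fixed fields of each header using struct. The port numbers, sequence number, and window size are arbitrary example values, and the checksums are left as 0 placeholders rather than computed.

```python
import struct

# UDP header: source port, destination port, length, checksum (4 x 16 bits).
UDP_HEADER = struct.Struct("!HHHH")

# TCP header without options: source port, destination port, sequence number,
# acknowledgement number, data offset + flags, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")

payload = b"hello"

# The UDP length field covers the header plus the payload.
udp_header = UDP_HEADER.pack(50000, 53, UDP_HEADER.size + len(payload), 0)

# Data offset of 5 32-bit words (20 bytes) in the top 4 bits, SYN flag (0x002) set.
offset_and_flags = (5 << 12) | 0x002
tcp_header = TCP_HEADER.pack(50000, 80, 1000, 0, offset_and_flags, 65535, 0, 0)

print(len(udp_header), len(tcp_header))
```

The extra TCP fields (sequence and acknowledgement numbers, flags, window) are exactly what carries the ordering, acknowledgement, and flow-control features that UDP does without.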

TCP three-way handshake and four-way wave

The TCP three-way handshake and the four-way wave are also popular interview questions, which correspond to the TCP connection and release process, respectively. Here’s a quick look at these two processes

TCP three-way handshake

Before we look at the process, we need to understand a few concepts

Message type	Description
SYN	Used to initiate and establish a connection.
ACK	Acknowledges the SYN message received from the peer.
SYN-ACK	The local side's own SYN combined with an ACK of the SYN received earlier.
FIN	Used to close the connection.

SYN: short for Synchronize Sequence Numbers. It is the handshake signal TCP/IP uses to establish a connection, and it is the first segment sent when a client sets up a TCP connection with a server. When the client sends the SYN, it sets the segment's sequence number to a random value X.
SYN-ACK: After receiving the SYN, the server opens the connection for the client and replies with a SYN-ACK. The acknowledgement number is set to one more than the received sequence number, X + 1, and the server chooses another random sequence number, Y, for its own segment.
ACK: short for Acknowledgement, indicating that the data sent has been received correctly. Finally, the client sends an ACK to the server. Its sequence number is set to the received acknowledgement value, X + 1, and its acknowledgement number is set to one more than the received sequence number, Y + 1.
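
The sequence and acknowledgement numbers exchanged in the three steps can be traced with a small simulation. This is only a sketch of the arithmetic, with segments modelled as plain dictionaries; the function name three_way_handshake is made up for the example.

```python
import random

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(rng=random):
    """Return the three handshake segments as dictionaries."""
    x = rng.randrange(SEQ_SPACE)  # client's initial sequence number X
    syn = {"flags": "SYN", "seq": x}

    y = rng.randrange(SEQ_SPACE)  # server's initial sequence number Y
    syn_ack = {"flags": "SYN-ACK", "seq": y, "ack": (syn["seq"] + 1) % SEQ_SPACE}

    ack = {
        "flags": "ACK",
        "seq": syn_ack["ack"],                    # X + 1
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,  # Y + 1
    }
    return [syn, syn_ack, ack]


for segment in three_way_handshake():
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends confirm that they can send and receive.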

If you use real life as an example

Xiao Ming – Client and Xiao Hong – Server

Xiao Ming calls Xiao Hong. Once the call goes through, Xiao Ming says: hello, can you hear me? This is equivalent to requesting a connection.
Xiao Hong replies: I can hear you. Can you hear me? This is equivalent to acknowledging the request and asking for a response.
Xiao Ming hears Xiao Hong's reply and says: yes, I can. This confirms the connection. After that, Xiao Ming and Xiao Hong can talk and exchange messages.

TCP four-way wave

The connection termination phase uses a four-way wave, in which each end of the connection terminates independently. Let's describe the process.

First, the client application decides to terminate the connection (the server can also initiate the close). The client sends a FIN to the server and enters the FIN_WAIT_1 state. While in FIN_WAIT_1, the client waits for an ACK from the server.
Then, in step 2, when the server receives the FIN, it immediately sends an ACK to the client.
When the client receives the ACK from the server, it enters the FIN_WAIT_2 state and waits for the server's FIN.
After sending the ACK, and once it has no more data to send, the server sends its own FIN to tell the client that the server side is ready to close too.
When the client receives the FIN from the server, it replies with an ACK and moves from FIN_WAIT_2 to the TIME_WAIT state. A client in TIME_WAIT can resend the ACK if the first one is lost. How long a client stays in TIME_WAIT depends on the implementation; after that wait, the connection is closed and all resources on the client (including port numbers and buffered data) are released.
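
The client-side states named above can be written down as a small transition table. This sketch covers only the client's path through an active close as described here; the event names are informal labels chosen for the example, not identifiers from any real TCP stack.

```python
# (current state, event) -> next state, for the side that closes first.
CLIENT_TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timeout"): "CLOSED",
}


def run(events, state="ESTABLISHED"):
    """Apply the events in order and return every state visited."""
    history = [state]
    for event in events:
        state = CLIENT_TRANSITIONS[(state, event)]
        history.append(state)
    return history


print(run(["send FIN", "recv ACK", "recv FIN, send ACK", "timeout"]))
```

An event that is not valid in the current state raises KeyError, which reflects that the four steps must happen in order.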

Again, you can use the call example above to describe it

Xiao Ming says to Xiao Hong: I have said everything I wanted to say, so I am going to hang up.
“Received,” says Xiao Hong. “But I still have some things to say.”
A few seconds later, Xiao Hong says: I am done too, you can hang up now.
After Xiao Ming receives the message, he waits for a while and then hangs up the phone.

Briefly describe the differences between HTTP 1.0/1.1/2.0

HTTP 1.0

HTTP 1.0 was introduced in 1996, and since then its popularity has been phenomenal.

HTTP 1.0 provides only the most basic authentication, in which the user name and password are not encrypted and are therefore easy to snoop.
HTTP 1.0 was designed around short-lived connections: each transfer goes through TCP's three-way handshake and four-way wave, which is inefficient.
HTTP 1.0 uses only If-Modified-Since and Expires in headers as criteria for cache invalidation.
HTTP 1.0 does not support resuming interrupted transfers, which means the whole page or resource is sent every time.
HTTP 1.0 assumes that each server is bound to a single IP address and hosts a single site, so the request message does not carry the hostname.

HTTP 1.1

HTTP 1.1 came in 1999, three years after HTTP 1.0, with the following changes:

HTTP 1.1 adds digest authentication.
HTTP 1.1 uses persistent (long-lived) connections by default. A persistent connection is established once and can carry multiple requests, and it only has to be closed once, after the transfers are complete. How long the connection is kept open can be set with the keep-alive header.
HTTP 1.1 adds cache control headers such as ETag, If-Unmodified-Since, If-Match, and If-None-Match to control cache invalidation.
HTTP 1.1 supports resuming interrupted transfers by using the Range header.
HTTP 1.1 supports virtual hosting through the Host header: multiple virtual hosts (multi-homed web servers) can exist on a single physical server and share a single IP address.

HTTP 2.0

HTTP 2.0 is a standard published in 2015, with the following major changes:

Header compression: HTTP 1.1 requests repeatedly carry fields such as User-Agent, Cookie, Accept, Server, and Range, which can take up hundreds or even thousands of bytes, whereas the body is often only tens of bytes, so the headers outweigh the content. HTTP 2.0 compresses headers with the HPACK algorithm.
Binary format: HTTP 2.0 uses a binary format that is closer to TCP/IP and drops the ASCII text format to improve parsing efficiency.
Stronger security: since security has become a top priority, HTTP 2.0 generally runs over HTTPS.
Multiplexing: requests share a single connection. Each request corresponds to a stream ID, so multiple requests can be in flight on one connection at the same time.

Please describe the common HTTP headers

This is an open question because there are many HTTP headers. Here are just a few examples. For details, please refer to my other article

Mp.weixin.qq.com/s/XZZR0945I…

There are four types of HTTP headers: general headers, entity headers, request headers, and response headers. Let's introduce them one by one.

General header

The three most common general headers are Date, Cache-Control, and Connection.

Date

Date is a generic header that can appear in both request and response headers, and its basic representation is as follows

Date: Wed, 21 Oct 2015 07:28:00 GMT 

GMT is Greenwich Mean Time, which is eight hours behind Beijing Time.

Cache-Control

Cache-Control is a general header that can appear in both requests and responses. It accepts a variety of directives. Although it is a general header, some directives are specific to requests and some to responses. The main categories of directives are cacheability, expiration, revalidation and reloading, and others.

Connection

Connection determines whether the network connection is closed after the current transaction (one request and its response) completes. There are two kinds. One is a persistent connection, in which the network connection is not closed after a transaction completes:

Connection: keep-alive

The other is a non-persistent connection, in which the network connection is closed after a transaction is completed

Connection: close

Other common headers for HTTP1.1 are as follows

Entity header

Entity headers are HTTP headers that describe the content of the message body. They are used in both HTTP requests and responses. Content-Length, Content-Language, and Content-Encoding are entity headers.

Content-Length indicates the size of the entity body, in bytes, sent to the receiver.
Content-Language describes the natural language(s) intended for the audience of the entity body.
Content-Encoding is a trickier one. It is used to compress the media type, and it indicates which encoding has been applied to the entity body.

Common content encodings include gzip, compress, deflate, and identity. Content-Encoding can be used in both request and response messages.

Accept-Encoding: gzip, deflate // Request header
Content-Encoding: gzip // Response header
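
The negotiation between Accept-Encoding and Content-Encoding can be sketched in a few lines of Python. The helper encode_body is hypothetical, and it simplifies real negotiation by ignoring quality values such as q=0.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str):
    """Gzip the body if the client lists gzip; return (body, headers)."""
    # Simplification: strip any ";q=..." parameter and ignore its value.
    accepted = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in accepted:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


body = ("<html>" + "hello " * 200 + "</html>").encode("utf-8")

# Server side: pick an encoding the client said it accepts.
encoded, headers = encode_body(body, "gzip, deflate")

# Client side: undo the encoding named in Content-Encoding.
if headers.get("Content-Encoding") == "gzip":
    decoded = gzip.decompress(encoded)
else:
    decoded = encoded

print(len(body), len(encoded), decoded == body)
```

Repetitive text like this compresses well, which is why the encoding is worth negotiating for HTML, CSS, and JSON responses.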

Here are some entity header fields

Request headers

Host

The Host header specifies the domain name of the server (for virtual hosts) and, optionally, the TCP port number on which the server listens. If no port number is given, the default port for the requested service is automatically used (for example, 80 is automatically used for requesting an HTTP URL).

Host: developer.mozilla.org

Accept, Accept-Language, and Accept-Encoding are request headers used for content negotiation.

Referer

The HTTP Referer header is part of the request. When a browser sends a request to a web server, it usually includes Referer to tell the server which page the request was linked from, so that the server can use that information in its processing.

Referer: https://developer.mozilla.org/testpage.html

If-Modified-Since

If-Modified-Since is usually used together with If-None-Match to verify that a local copy of a resource held by a proxy or client is still valid. The date and time at which the resource was last updated is given by the Last-Modified header field.

The server responds with 200 and the resource if it has been updated since the time given in If-Modified-Since, and with 304 Not Modified if it has not.

If-Modified-Since: Mon, 18 Jul 2016 02:36:04 GMT

If-None-Match

The If-None-Match request header makes the request conditional. For the GET and HEAD methods, the server sends back the requested resource with status 200 only if none of the listed ETags matches the resource's current ETag; otherwise it responds with 304 Not Modified. For other methods, the request is processed only if the ETag of the existing resource does not match any of the listed values.

If-None-Match: "c561c68d0ba92bbeb8b0fff2a9199f722e3a621a"
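
A rough sketch of how a server might answer a conditional GET with If-None-Match is shown below. The functions make_etag and conditional_get are hypothetical, the ETag is simply a SHA-1 hash of the body, and weak validators (the W/ prefix) are not handled.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body (one possible scheme)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_get(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The client's cached copy is still current: send no body.
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


page = b"<html>version 1</html>"

# First request: no validator, so the full resource is returned.
status, headers, _ = conditional_get(page)

# Revalidation: the client sends back the ETag it received.
status_again, _, body_again = conditional_get(page, headers["ETag"])

# After the resource changes, the old ETag no longer matches.
status_changed, _, _ = conditional_get(b"<html>version 2</html>", headers["ETag"])

print(status, status_again, status_changed)
```

The 304 response carries no body, so the client saves the transfer and reuses its cached copy.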

Accept

The Accept request header tells the server which MIME types the client understands.

Accept-Charset

The Accept-Charset request header tells the server which character sets the client can handle.

Common character sets are UTF-8 (a Unicode character encoding) and ISO-8859-1 (a character encoding for the Latin alphabet).

Accept-Language

The Accept-Language header field tells the server which natural languages (Chinese, English, etc.) the user agent can handle, and their relative priority. Multiple natural languages can be specified at once.

That covers the request headers in general; a separate article will delve into all of them in detail. Here is a summary of response headers, based on HTTP 1.1

Response headers

Access-Control-Allow-Origin

A response may include the Access-Control-Allow-Origin header, which specifies an origin and tells the browser to allow that origin to access the resource.

Keep-Alive

Keep-Alive describes how long a persistent connection should be kept open. You can use it to specify the keep-alive time.

Server

The Server header contains information about the software used by the origin server to process the request.

Overly verbose and detailed Server values should be avoided because they may reveal internal implementation details, which could make it easier for attackers to discover and exploit known security vulnerabilities. For example, it can be written this way:

Server: Apache/2.4.1 (Unix)

Set-Cookie

Set-Cookie is used by the server to send a cookie, such as the session ID, to the client.

Transfer-Encoding

The Transfer-Encoding header field specifies the encoding used to transfer the message body.

In HTTP/1.1, transfer encoding is used mainly for chunked transfer encoding.

X-Frame-Options

HTTP header fields are extensible, so web servers and browsers use a variety of non-standard header fields.

The X-Frame-Options header field is an HTTP response header that controls whether the page may be displayed inside a frame on other websites. Its main purpose is to prevent clickjacking attacks.

Here is a summary of the response headers, based on HTTP 1.1

What happens when you enter the URL in the address bar

This is also a frequently asked interview question. So let's take a look at what happens from the time you type in the URL to the time the response comes back.

First, you enter the URL you want to visit in your browser.

Then, the browser checks whether the domain name in the URL is already in its DNS cache. Different browsers handle DNS caching differently. If the browser has cached the domain name, it uses the cached IP address directly. If not, the browser makes a system call to check whether the local hosts file has an entry for the host name. If so, that IP address is returned; if not, a DNS query is sent out to the network.
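
The lookup order just described (browser cache, then hosts file, then a DNS query) can be sketched as follows. Everything here is simulated: resolve and fake_dns_query are made-up names, the caches are plain dictionaries, and the IP addresses come from ranges reserved for documentation.

```python
def resolve(hostname, browser_cache, hosts_file, dns_query):
    """Return (ip, source), trying the cheapest lookup first."""
    if hostname in browser_cache:
        return browser_cache[hostname], "browser cache"
    if hostname in hosts_file:
        return hosts_file[hostname], "hosts file"
    ip = dns_query(hostname)
    browser_cache[hostname] = ip  # remember the answer for next time
    return ip, "DNS query"


def fake_dns_query(hostname):
    """Stand-in for a real network DNS query."""
    return "203.0.113.7"


browser_cache = {}
hosts_file = {"intranet.example": "192.0.2.5"}

print(resolve("www.cxuanblog.com", browser_cache, hosts_file, fake_dns_query))
print(resolve("www.cxuanblog.com", browser_cache, hosts_file, fake_dns_query))
print(resolve("intranet.example", browser_cache, hosts_file, fake_dns_query))
```

The first lookup goes out as a DNS query, the second is answered from the cache, and the third never leaves the machine because the hosts file has an entry.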

Let's start with what DNS is. There are two ways to identify hosts on the Internet: by host name and by IP address. We prefer to remember names, but routers prefer fixed-length, hierarchical IP addresses. So a service is needed to translate host names into IP addresses, and that service is DNS. The full name of DNS is Domain Name System. DNS is a distributed database implemented by a hierarchy of DNS servers. DNS runs mainly on UDP and uses port 53.

DNS is a hierarchical database. Its main levels are root DNS servers, top-level domain (TLD) DNS servers, and authoritative DNS servers.

In addition, there is another important kind of DNS server, the local DNS server. Strictly speaking, the local DNS server does not belong to the hierarchy above, but it is crucial. Each Internet Service Provider (ISP), such as a residential ISP or an institutional ISP, has a local DNS server. When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers. Users can usually find the DNS server's IP address in their network connection settings. When a host sends a DNS request, the request goes to the local DNS server, which acts as a proxy and forwards the request into the DNS server hierarchy.

If the local DNS server does not have the destination IP address, it sends a DNS query to a root DNS server.

Note: DNS involves two types of query, recursive queries and iterative queries. Surprisingly, "Computer Networking: A Top-Down Approach" does not spell out the difference between them, so the summary below is my understanding after looking it up online.

In a recursive query, the server that receives the query takes over the work: if it does not know the answer, it queries other servers on the requester's behalf and returns the final result. The query from the host to the local DNS server is typically recursive.

In an iterative query, the server that receives the query does not resolve the name itself. Instead it tells the requester which DNS server to ask next, for example the root DNS server tells the local DNS server which top-level DNS server to access. The queries from the local DNS server to the hierarchy are typically iterative.

After going through the root DNS server, the top-level DNS server, and the authoritative DNS server in turn, the local DNS server learns the destination IP address from the authoritative server and passes it back to the user.
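
The walk from root to top-level to authoritative server can be sketched as an iterative lookup performed by the local DNS server. The three tables below are invented toy data (server names, the domain www.example.com, and a documentation IP address), not real DNS records.

```python
# Toy data: each level only knows whom to ask next (all names hypothetical).
ROOT_SERVER = {"com": "tld-com-server"}
TLD_SERVERS = {"tld-com-server": {"example.com": "ns-example-server"}}
AUTHORITATIVE_SERVERS = {"ns-example-server": {"www.example.com": "192.0.2.10"}}


def iterative_resolve(hostname):
    """Resolve as a local DNS server would, following one referral at a time."""
    labels = hostname.split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])
    steps = []

    # 1. The root server refers us to the TLD server.
    tld_server = ROOT_SERVER[tld]
    steps.append(("root", tld_server))

    # 2. The TLD server refers us to the authoritative server.
    auth_server = TLD_SERVERS[tld_server][domain]
    steps.append((tld_server, auth_server))

    # 3. The authoritative server returns the actual IP address.
    ip = AUTHORITATIVE_SERVERS[auth_server][hostname]
    steps.append((auth_server, ip))
    return ip, steps


ip, steps = iterative_resolve("www.example.com")
for asked, answer in steps:
    print(asked, "->", answer)
```

Only the last step returns an address; the first two return referrals, which is the defining feature of an iterative query.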

Next, the browser needs to establish a TCP connection with the target server, which requires the three-way handshake. For details about the handshake, see the answer above.
After the connection is established, the browser sends an HTTP GET request, including the URL, to the target server. Since HTTP 1.1, persistent connections are used by default, so a single handshake allows data to be transferred multiple times.
If the target is a simple page, the server returns it directly. However, for some large sites, the server often does not return the page for the requested host name directly but redirects instead. The status code returned is a redirection code starting with 3, such as 301 or 302. After receiving the redirect response, the browser reads the new address from the Location field of the response and starts again from the first step with that address.
The browser then resends the request with the new URL, and the server returns status code 200 OK, indicating that it can serve the request, together with the response message.

How HTTPS works

We have described how HTTP works, and here is how HTTPS works. We know that HTTPS is not a new protocol, but rather a combination of HTTP and TLS/SSL.

So, when we talk about HTTPS handshake, it’s actually SSL/TLS handshake.

TLS is an encryption protocol designed to secure communication over the Internet. A TLS handshake is the process that starts a TLS-encrypted communication session. During the TLS handshake, the communicating parties exchange information with each other, agree on cipher suites, and exchange session keys.

A TLS handshake occurs every time a user navigates to a specific website over HTTPS and sends a request. In addition, TLS handshakes also occur whenever any other communication uses HTTPS, including API calls and DNS queries over HTTPS.

The TLS handshake process varies according to the key exchange algorithm used and the cipher suites supported by both parties. We discuss the process here in terms of RSA asymmetric encryption. The whole TLS communication flow chart is as follows

Before communication, the TCP three-way handshake is performed. After that handshake is complete, the TLS handshake begins.
ClientHello: The client sends a hello message to the server to initiate the handshake. This message contains the TLS versions the client supports (TLS 1.0, TLS 1.2, TLS 1.3), the cipher suites the client supports, and a client random number.
ServerHello: After the client sends its hello message, the server replies with a message containing the server's SSL certificate, the cipher suite selected by the server, and a random number generated by the server.
Authentication: The server sends a Certificate message containing its public key certificate, and the client verifies that the certificate was issued by a certificate authority it trusts. Finally, the server sends ServerHelloDone to conclude its response to the hello request. The first part of the handshake is over.
Encryption stage: After the first phase of the handshake is complete, the client sends ClientKeyExchange in response. This message contains a key string called the premaster secret, encrypted with the public key from the certificate above; the server decrypts it with its private key. The client then sends ChangeCipherSpec to tell the server that its following messages will be encrypted with the negotiated keys, and then sends Finished to tell the server that its part of the handshake is done.

The session key is a symmetric key that both sides derive from the premaster secret and the two random numbers.

Secure communication is established: The server then sends its own ChangeCipherSpec and Finished messages to tell the client that decryption succeeded and that its side of the handshake is complete. The key exchange used RSA asymmetric encryption, and the rest of the session is encrypted with the symmetric session key.
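
To illustrate the idea behind the RSA key exchange, here is a deliberately toy Python sketch. It uses tiny textbook primes (completely insecure), and the session_key function is a simplified stand-in that hashes the inputs with SHA-256 rather than the real TLS key derivation. All names are assumptions for this example, and pow with a negative exponent requires Python 3.8 or later.

```python
import hashlib
import secrets

# Toy RSA key pair for the "server" (tiny primes, for illustration only).
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent

# Random numbers exchanged in ClientHello and ServerHello.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)

# ClientKeyExchange: the client picks a premaster secret and encrypts it
# with the server's public key (E, N).
premaster = secrets.randbelow(N - 2) + 2
encrypted_premaster = pow(premaster, E, N)

# The server recovers the premaster secret with its private key D.
decrypted_premaster = pow(encrypted_premaster, D, N)


def session_key(premaster_secret, c_random, s_random):
    """Simplified stand-in for TLS key derivation (not the real PRF)."""
    material = premaster_secret.to_bytes(2, "big") + c_random + s_random
    return hashlib.sha256(material).digest()


client_key = session_key(premaster, client_random, server_random)
server_key = session_key(decrypted_premaster, client_random, server_random)
print(client_key == server_key)
```

Both sides end up with the same symmetric key without it ever crossing the network, and only the holder of the private key could have recovered the premaster secret.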
