Para 0 overview
This first part covers what happens when a URL is entered in the browser, the OSI model, the TCP/IP model, an analysis of the TCP and UDP protocols, and common HTTP status codes.
Para 1 Starting from typing in a web address
Have you ever been curious about how the browser finds the right server from a web address and returns its content to us? If you are a general reader, you can take this article as popular science. If you are a software practitioner, you need a complete, layered understanding of how network applications work. The article also touches on the technologies involved, such as the browser, HTTP, HTML, the web server, and request handling.
Para 1.1 Enter the url
Take Baidu as an example.
Para 1.2 Search for the corresponding IP address
This brings us to the first point of knowledge we need to know: DNS.
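DNS is a service on the Internet. It is a distributed database that maps domain names and IP addresses to each other, so people can reach sites without memorizing IP addresses. DNS queries normally use UDP port 53 (TCP port 53 is also used, for example when a response is too large for UDP).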
The DNS lookup process in the browser is as follows:
- Browser cache – The browser caches DNS records for a period of time. Interestingly, the operating system does not tell the browser how long to store DNS records, so different browsers store DNS records for a fixed amount of time (anywhere from 2 to 30 minutes).
- System cache – If the desired record is not found in the browser cache, the browser makes a system call (gethostbyname on Windows), which returns any record held in the system cache.
- Router cache – The previous query request is then sent to the router, which typically has its own DNS cache.
- ISP DNS cache – The next thing to check is the server where the ISP caches DNS. You can usually find the corresponding cached record here.
- Recursive search – Your ISP's DNS server performs a recursive search, starting at the root name server, then the .com top-level domain name server, and finally the name server of the target domain (Baidu's, in this example). The DNS server usually already has the .com name servers in its cache, so the query to the root server is usually not necessary.
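The lookup order above can be sketched as a chain of caches that are tried one after another. This is only an illustration under stated assumptions: the cache contents, the domain names, and the addresses (taken from the reserved documentation range 192.0.2.0/24) are hypothetical, and `recursive_search` is a stand-in for the real walk from root to authoritative servers.

```python
# Hypothetical cache contents, for illustration only.
browser_cache = {}
system_cache = {}
router_cache = {}
isp_cache = {"www.example.com": "192.0.2.10"}


def recursive_search(domain):
    """Stand-in for the ISP resolver walking root -> TLD -> authoritative servers."""
    authoritative = {"www.example.org": "192.0.2.20"}  # hypothetical record
    return authoritative.get(domain)


def resolve(domain):
    """Try each layer in order and return (ip, name_of_layer_that_answered)."""
    layers = [
        ("browser cache", browser_cache.get),
        ("system cache", system_cache.get),
        ("router cache", router_cache.get),
        ("ISP DNS cache", isp_cache.get),
        ("recursive search", recursive_search),
    ]
    for name, lookup in layers:
        ip = lookup(domain)
        if ip is not None:
            # Remember the answer so the next lookup stops at the browser cache.
            browser_cache[domain] = ip
            return ip, name
    return None, None


print(resolve("www.example.com"))  # answered by the ISP cache
print(resolve("www.example.com"))  # now answered by the browser cache
```

A real browser also honors expiry times on cached records, which this sketch leaves out.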
Para 1.3 The browser sends an HTTP request to the Web server
After finding the corresponding IP address through the DNS server, the browser sends an HTTP request to the Web server where Baidu is located.
GET / HTTP/1.1
Host: www.baidu.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
sec-ch-ua: "Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"
sec-ch-ua-mobile: ?0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
The GET request line names the resource to read: "/" on the host www.baidu.com (given in the Host header). The browser identifies itself (the User-Agent header) and states which kinds of responses it will accept (the Accept and Accept-Encoding headers). The Connection header asks the server not to close the TCP connection, so it can be reused for subsequent requests.
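To make the structure of such a request concrete, here is a minimal sketch of splitting a raw HTTP/1.x request into its request line and headers. The function name `parse_request` is my own, and the sample request is a shortened version of the one above; a production parser would handle many more edge cases.

```python
def parse_request(raw):
    """Split a raw HTTP/1.x request into method, target, version and headers."""
    head, _, _body = raw.partition("\r\n\r\n")   # a blank line ends the header section
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")     # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers


raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: www.baidu.com\r\n"
    "Connection: keep-alive\r\n"
    "Accept-Encoding: gzip, deflate, br\r\n"
    "\r\n"
)
method, target, version, headers = parse_request(raw_request)
print(method, target, version)
print(headers["host"], headers["connection"])
```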
If the request involves authentication, it also carries the corresponding session identifier or cookie. Cookies are stored as text on the client and sent to the server with each request.
This brings us to another topic: cookies and sessions.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Cookie Fundamentals
A Cookie is a small piece of text data no more than 4KB, consisting of a Name, a Value, and several other optional attributes used to control the validity, security, and scope of the Cookie.
(1) Name/Value: Set the Name and corresponding Value of the Cookie. For the authentication Cookie, the Value includes the access token provided by the Web server.
(2) Expires attribute: Sets the lifetime of the cookie. There are two kinds of cookies by storage: session cookies and persistent cookies. When the Expires attribute is omitted, the cookie is a session cookie.
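The difference between a session cookie and a persistent cookie can be sketched with Python's standard `http.cookies` module. The cookie names, values, and the expiry date below are hypothetical examples, not taken from any real site.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()

# Session cookie: no Expires attribute, so the browser drops it when it closes.
cookie["session_id"] = "abc123"            # hypothetical name and value
cookie["session_id"]["path"] = "/"

# Persistent cookie: Expires gives it a lifetime beyond the browser session.
cookie["remember_me"] = "1"                # hypothetical name and value
cookie["remember_me"]["expires"] = "Sat, 20 Mar 2027 16:18:56 GMT"  # arbitrary example date
cookie["remember_me"]["httponly"] = True   # a security attribute: hidden from page scripts

# The text that would follow "Set-Cookie: " in a response header.
session_header = cookie["session_id"].OutputString()
persistent_header = cookie["remember_me"].OutputString()
print(session_header)
print(persistent_header)
```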
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Session Fundamentals
A session is a data structure stored on the server to track user state. This data can be stored in clusters, databases, or files. Because the data stays on the server, it is generally safer than storing it in cookies, and the session typically ends when the user closes the browser (the session cookie that identifies it is discarded).
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Security threats
Cookie capture and replay: An attacker can sniff the user's network traffic to intercept request packets, or read the cookie information stored on the user's disk, and then replay that cookie to the backend in a forged request. The backend treats it as a legitimate request, and the intrusion succeeds.
CSRF attacks: Cross-Site Request Forgery (CSRF) means that an attacker uses malicious code in a web page to force the victim's browser to send a forged request to the target website. The request carries the victim's identity information, such as authentication cookies, so the attacker impersonates the victim and performs the chosen operation on the target site.
Para 1.4 The server returns a response to the client
After the server receives the request, it will return the corresponding result to the client. The following is the return result of accessing Baidu.
HTTP/1.1 200 OK
Bdpagetype: 2
Bdqid: 0x9989168a00155fC1
Cache-Control: private
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html; charset=utf-8
Date: Sat, 20 Mar 2021 16:18:56 GMT
Expires: Sat, 20 Mar 2021 16:18:56 GMT
Server: BWS/1.1
Set-Cookie: BDSVRTM=88; path=/
Set-Cookie: BD_HOME=1; path=/
Here the server returns 200 OK to indicate that the request succeeded, Connection: keep-alive indicates that the connection to the server is being kept open, and Content-Type defines the data format of the response body.
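As a rough sketch of how a client might read such a response, the snippet below splits the status line from the headers and keeps repeated headers (such as Set-Cookie) as lists. The function name `parse_response` is an assumption of mine, and the sample is a shortened version of the response above.

```python
from collections import defaultdict


def parse_response(raw):
    """Return (version, status_code, reason, headers, body) for a raw HTTP/1.x response."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = defaultdict(list)
    for line in header_lines:
        name, _, value = line.partition(":")
        # A header such as Set-Cookie may appear several times, so keep a list.
        headers[name.strip().lower()].append(value.strip())
    return version, int(code), reason, dict(headers), body


raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Connection: keep-alive\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Set-Cookie: BDSVRTM=88; path=/\r\n"
    "Set-Cookie: BD_HOME=1; path=/\r\n"
    "\r\n"
)
version, code, reason, headers, body = parse_response(raw_response)
print(code, reason, headers["set-cookie"])
```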
Para 1.5 Browser initiates a formal request
This is similar to the first request sent at the beginning, so it is not analyzed further here.
Para 1.6 The server parses requests
The server receives the GET request, processes it, and returns a response.
On the surface this looks like a straightforward task, but a lot of interesting things happen along the way, even for a simple site like the author's blog, let alone a heavily visited site like Baidu.
- Web server software: Web server software (such as IIS or Apache) receives the HTTP request and decides which request handler should process it. A request handler is a program (written in ASP.NET, PHP, Ruby, and so on) that reads a request and generates the HTML for the response.
Here we can see that Baidu's Server header is BWS, presumably Baidu Web Server, which is probably a server developed in-house or adapted from an open source one.
- Request handling: The request handler reads the request together with its parameters and cookies. It reads, and possibly updates, some data stored on the server, and then generates an HTML response.
Para 1.7 Server response result
Para 1.8 The browser parses pages
After the browser receives the response from the server, it begins to parse it: it parses the HTML first, then fetches the CSS, JS, and other resources the page references, and completes the rendering.
Para 1.9 Disconnecting a TCP connection
After the response has been returned, the keep-alive setting of the Connection header determines whether the TCP connection is closed. HTTP/1.1 keeps the connection open by default so that multiple requests can share one TCP connection, whereas HTTP/1.0 closes the connection after each request unless keep-alive is requested. Closing a TCP connection differs from opening one: one side closes actively and the other passively, and it takes a four-way handshake (four segments) rather than three. When all the data the browser needs has been loaded, the page is finished.
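The rule for keeping or closing the connection can be sketched as a small decision function. This is a simplification under stated assumptions: the function name is hypothetical, the Connection header is treated as a single token, and real clients and servers also apply timeouts and request limits.

```python
def connection_persists(version, headers):
    """Return True if the TCP connection should stay open after this exchange."""
    token = headers.get("connection", "").lower()
    if version == "HTTP/1.1":
        return token != "close"        # persistent unless one side asks to close
    if version == "HTTP/1.0":
        return token == "keep-alive"   # closes unless keep-alive is requested
    return False


print(connection_persists("HTTP/1.1", {}))                            # True
print(connection_persists("HTTP/1.1", {"connection": "close"}))       # False
print(connection_persists("HTTP/1.0", {"connection": "keep-alive"}))  # True
```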
Para 2 Models of the online world
In the network world, operations such as parsing, presenting, and transmitting data are organized into layered models. The two classic models are the OSI seven-layer model and the TCP/IP four-layer model.
Here is a basic comparison of the two models.
Para 2.1 OSI seven-layer model
- OSI: Open System Interconnection Reference model
- OSI and TCP/IP mappings and protocols
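As a compact illustration of how the two models line up, the sketch below records the usual textbook mapping from the seven OSI layers to the four TCP/IP layers, plus the layer of a few protocols discussed in this article. The dictionary and function names are my own, and the layer names follow one common convention ("Network Interface" is also called the link layer).

```python
# OSI layer (top to bottom) -> TCP/IP layer, following the usual textbook mapping.
OSI_TO_TCPIP = {
    "Application": "Application",
    "Presentation": "Application",
    "Session": "Application",
    "Transport": "Transport",
    "Network": "Internet",
    "Data Link": "Network Interface",
    "Physical": "Network Interface",
}

# OSI layer of some protocols mentioned in this article.
PROTOCOL_OSI_LAYER = {
    "HTTP": "Application",
    "DNS": "Application",
    "TCP": "Transport",
    "UDP": "Transport",
    "IP": "Network",
}


def tcpip_layer_of(protocol):
    """Look up which TCP/IP layer a protocol belongs to."""
    return OSI_TO_TCPIP[PROTOCOL_OSI_LAYER[protocol]]


for name in ("HTTP", "TCP", "IP"):
    print(name, "->", tcpip_layer_of(name))
```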
The basic functions of the OSI model layers are:
The key layer:
Para 3 Common protocols
Para 3.1 TCP
TCP packages user data into segments. After sending data it starts a timer, and the other end acknowledges what it receives, reorders out-of-order data, and discards duplicates. TCP provides a reliable, connection-oriented byte stream service. Connection-oriented means that two applications using TCP (for example a browser and a server) must establish a TCP connection before exchanging data, much like dialing a phone call. Exactly two parties communicate over a TCP connection. TCP's reliability comes from:
(1) The application data is divided into blocks of the size TCP considers most suitable for sending.
(2) After TCP sends a segment, it starts a timer and waits for the destination to acknowledge it. If the sender does not receive an acknowledgement in time, it resends the segment.
(3) When TCP receives data from the other end of the connection, it sends an acknowledgement, usually delayed by a fraction of a second.
(4) TCP maintains a checksum over its header and data. This is an end-to-end checksum that detects whether the data changed in transit. (If the checksum is wrong, the receiver discards the segment and does not acknowledge it, so the sender times out and resends it.)
(5) TCP segments are carried in IP datagrams, and IP datagrams can arrive out of order. TCP reorders the received data when necessary and delivers it to the application layer in the correct order.
(6) IP datagrams can be duplicated, so the receiving TCP discards duplicate data.
(7) TCP provides flow control. Each end of a TCP connection has a finite amount of buffer space, and the receiver only allows the other end to send as much data as its buffer can accept.
(8) TCP does not interpret the byte stream. Interpretation of the bytes is left to the application layer at both ends of the connection.
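Point (4) can be illustrated with the 16-bit one's-complement checksum described in RFC 1071, which is the arithmetic TCP uses. This is a simplified sketch: a real TCP checksum also covers a pseudo-header with the IP addresses, which is omitted here, and the function names are my own.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add each 16-bit big-endian word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back into the low 16 bits
    return ~total & 0xFFFF


def verify(data, checksum):
    """Receiver side: data plus its checksum sums to 0xFFFF, so the complement is 0."""
    if len(data) % 2:
        data += b"\x00"
    return internet_checksum(data + checksum.to_bytes(2, "big")) == 0


payload = bytes.fromhex("0001f203f4f5f6f7")        # sample bytes from RFC 1071
checksum = internet_checksum(payload)
print(hex(checksum))
print(verify(payload, checksum))                   # intact data passes
print(verify(payload[:-1] + b"\xf6", checksum))    # a changed byte is detected
```

When verification fails, the receiver simply stays silent, and the sender's retransmission timer from point (2) takes care of the rest.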
Para 3.2 UDP protocol
UDP is a transport layer protocol that adds only basic services on top of IP's datagram service: multiplexing and demultiplexing, and error detection. UDP provides an unreliable service, but it has advantages over TCP:
- UDP is connectionless, so there is no delay for establishing a connection. It also saves space: TCP has to maintain connection state in the end systems, which carries some overhead, while UDP does not.
- Packet header overhead is small: the TCP header is at least 20 bytes, the UDP header is 8 bytes.
- UDP has no congestion control, so the application layer has finer control over what data is sent and when, and network congestion does not reduce the sending rate of the host. Some real-time applications (such as real-time video and live broadcast) need a steady sending rate; they can tolerate losing some data but cannot tolerate large delays.
- UDP provides best-effort delivery and does not guarantee reliability. Any reliability has to be implemented by the application layer, because there are no TCP-style acknowledgements or retransmissions. If a packet fails to reach the peer because of network problems, UDP does not report an error to the application layer.
- UDP is message-oriented. A header is added to each message handed down by the application layer, and the result is passed straight to the IP layer without merging or splitting, so message boundaries are preserved. On the receiving side the header is removed and the user datagram is delivered intact to the application process. The message is indivisible and is the smallest unit UDP handles. Because of this, UDP gives less flexibility over how much data is read or written per call. For example, to send a 100-byte message we call sendto once with 100 bytes, and the other end must receive all 100 bytes with a single recvfrom call; it cannot loop and fetch 10 bytes at a time.
- UDP is commonly used by applications that transmit small amounts of data at a time, such as DNS and SNMP, for which creating, maintaining, and tearing down TCP connections would be expensive. UDP is also common in multimedia applications (IP telephony, live video conferencing, streaming media, and so on), where reliable delivery matters less than latency and the delays caused by TCP's congestion control are not acceptable.
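The message-oriented behavior described above can be observed with two UDP sockets on the loopback interface. This is a small sketch, assuming a local machine where loopback UDP traffic is allowed; the payload sizes are arbitrary.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
receiver.settimeout(2.0)               # do not block forever if a datagram is lost
address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"a" * 100, address)     # first datagram: 100 bytes, no connection setup
sender.sendto(b"b" * 20, address)      # second datagram: 20 bytes

# Each recvfrom call returns exactly one datagram, never a blend of the two,
# even though the receive buffer (4096 bytes) could hold both.
first, _ = receiver.recvfrom(4096)
second, _ = receiver.recvfrom(4096)
print(len(first), len(second))

sender.close()
receiver.close()
```

With a TCP stream socket the same two writes could arrive merged into one read or split across several, because TCP preserves byte order but not message boundaries.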
Para 3.3 HTTP
Para 3.3.1 HTTP 0.9
Versions 0.9 and 1.0 use the most traditional request-response model. HTTP 0.9 is extremely simple: requests have no headers, and only the GET method is supported. HTTP 1.0 extends version 0.9 with several major changes:
- The HTTP version number is added to the request, as follows:
GET /coolshell/index.html HTTP/1.0
- HTTP now has headers, in both requests and responses.
- HTTP status codes were added to indicate the outcome of a request.
- There is a Content-Type header, so files other than HTML can now be transferred.
Para 3.3.2 HTTP 1.0
HTTP 1.0 is where the protocol starts to look properly engineered, because:
- Whether a protocol has version management is a hallmark of engineering maturity.
- Headers let the protocol decouple metadata from business data; in other words, they separate control logic from business logic.
- Status codes give both parties to a request, as well as third-party monitoring or management programs, a shared understanding of the outcome. The key point is the separation of control errors from business errors.
Para 3.3.3 HTTP 1.1
HTTP/1.1 mainly addresses the network performance issues of HTTP 1.0, as well as adding some new things:
- You can set keepalive so that HTTP reuses TCP connections, which saves the considerable cost of a TCP three-way handshake over a WAN on every request. This is known as an HTTP persistent connection (also called an HTTP long connection).
- Pipelining is supported: once the first request has been sent, the second can be sent without waiting for the first response, which can reduce the overall response time. (Note: non-idempotent POST requests and requests that depend on each other cannot be pipelined.)
- Chunked responses are supported: the response does not need to specify Content-Length. The client keeps the connection open until it receives the end marker from the server (in chunked encoding, a zero-length chunk). This technique is also called the server-side push model, or a server-push HTTP persistent connection.
- A cache control mechanism has also been added.
- Headers for Language, Encoding, Type, and so on were added to allow more negotiation between the client and the server.
- A very important header, Host, was officially added, so the server knows which site you are requesting. Multiple domain names can resolve to the same IP address, so to tell them apart the request must carry the domain name itself rather than only the IP address that DNS translated it to.
- The OPTIONS method was officially added. It is mainly used in CORS (Cross-Origin Resource Sharing) applications.
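To show how a client can read a body without Content-Length, here is a minimal sketch of decoding the chunked transfer encoding: each chunk is a hexadecimal size line followed by that many bytes, and a zero-size chunk ends the body. The function name is my own, and chunk extensions and trailers are ignored for brevity.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 chunked body (chunk extensions and trailers are ignored)."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]   # drop any chunk extension
        size = int(size_field.decode("ascii"), 16)          # chunk size is hexadecimal
        if size == 0:
            break                                           # zero-length chunk ends the body
        start = line_end + 2
        decoded += body[start:start + size]
        pos = start + size + 2                              # skip the CRLF after the chunk data
    return bytes(decoded)


wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(wire))
```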
Para 3.3.4 HTTP 2.0
The main differences between HTTP/2.0 and HTTP/1.1 are:
- HTTP/2 is a binary protocol that increases the efficiency of data transfer.
- HTTP/2 allows multiple HTTP requests to be made concurrently in a single TCP connection, removing serial requests from HTTP/1.1.
- HTTP/2 will compress the headers, so if you make multiple requests at the same time and their headers are the same or similar, the protocol will help you eliminate duplicates. This is known as the HPACK algorithm.
- HTTP/2 allows the server to place resources in the client's cache before they are requested, which is called server push. For example, if you ask for X and the server knows that X depends on Y, it can send Y to the client along with the response for X, even though you never asked for Y.
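The binary framing and the multiplexing mentioned above both rest on the fixed 9-byte frame header that every HTTP/2 frame starts with: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream identifier. The sketch below packs and parses that header; the helper names are my own and only a few frame types are listed.

```python
# A few of the frame types defined by the HTTP/2 specification.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x5: "PUSH_PROMISE"}


def pack_frame_header(length, frame_type, flags, stream_id):
    """Build the 9-byte header: 24-bit length, type, flags, reserved bit + 31-bit stream id."""
    return (
        length.to_bytes(3, "big")
        + bytes([frame_type, flags])
        + (stream_id & 0x7FFFFFFF).to_bytes(4, "big")
    )


def parse_frame_header(header):
    """Return (payload_length, type_name, flags, stream_id) from a 9-byte header."""
    length = int.from_bytes(header[0:3], "big")
    frame_type, flags = header[3], header[4]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # mask the reserved bit
    return length, FRAME_TYPES.get(frame_type, "UNKNOWN"), flags, stream_id


# Frames from two requests (streams 1 and 3) interleaved on one connection.
frames = [
    pack_frame_header(16, 0x1, 0x4, 1),
    pack_frame_header(20, 0x1, 0x4, 3),
    pack_frame_header(512, 0x0, 0x0, 3),
    pack_frame_header(256, 0x0, 0x1, 1),
]
for header in frames:
    print(parse_frame_header(header))
```

Because every frame names its stream, the receiver can reassemble each response independently, which is what lets several requests share one TCP connection.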
Para 4 Common status code
Para 4.1 2 xx
200 OK: Indicates that the request sent from the client to the server was processed successfully and the result returned.
204 No Content: Indicates that the request sent from the client to the server was processed successfully, but the response contains no entity body (there is no resource to return).
206 Partial Content: Indicates that the client made a Range request and the server successfully executed this partial GET. The response contains the entity content in the range specified by Content-Range.
Para 4.2 3 xx
301 Moved Permanently: Permanently redirects the requested resource to a new URL, and the new URL should be used afterwards.
302 Found: Temporary redirect. Indicates that the requested resource has temporarily been assigned a new URL, and the client is expected to use the new URL for this request.
The difference between 301 and 302: the former is a permanent move, the latter is a temporary move (the URL may change later)
303 See Other: Indicates that a new URL is assigned to the requested resource. Use the GET method to obtain the requested resource.
Difference between 302 and 303: The latter explicitly states that clients should use GET to obtain resources
304 Not Modified: Returned when the client sends a conditional GET (for example with an If-Modified-Since or If-None-Match header) and the resource has not changed since the version the client already holds. The response has no body, and the client uses its cached copy.
307 Temporary Redirect: A temporary redirect like 302, except that the client must not change the request method, so a POST is not turned into a GET. (Handling of 302 may vary from browser to browser.)
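The practical difference between these redirect codes is which method the client uses for the follow-up request. Here is a rough sketch of that decision, with a hypothetical function name; the 301/302 branch reflects common browser behavior rather than a requirement.

```python
def method_after_redirect(status, method):
    """Pick the HTTP method for the follow-up request after a redirect."""
    if status == 303:
        # 303 explicitly tells the client to fetch the new URL with GET.
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302) and method == "POST":
        # Common browser behavior: a redirected POST becomes a GET.
        return "GET"
    # 307 (and every other case here) keeps the original method.
    return method


for status in (301, 302, 303, 307):
    print(status, "POST ->", method_after_redirect(status, "POST"))
```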
Para 4.3 4 xx
400 Bad Request: indicates that the Request packet contains syntax errors.
401 Unauthorized: The request lacks valid credentials; HTTP authentication is required.
403 Forbidden: The server understood the request but refuses it (the client does not have permission to access the resource).
404 Not Found: Indicates that the requested resource cannot be found on the server. This status code is also used when the server rejects the request but does not want to give a reason.
Para 4.4 5 xx
500 Internal Server Error: Indicates that an error occurred while the server was executing the request, which may be a bug in the web application or a temporary fault.
503 Service Unavailable: The server is temporarily overloaded or down for maintenance and cannot process requests.
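The grouping of status codes by their first digit can be sketched with Python's standard `http.HTTPStatus` enum, which also supplies the official reason phrases. The `CLASSES` table and the `describe` helper are my own naming.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return the code, its standard reason phrase, and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES.get(code // 100, 'other')})"


for code in (200, 206, 301, 304, 404, 500, 503):
    print(describe(code))
```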