The browser parses the URL

Standard URL syntax follows a generic nine-part format, which the browser parses from the URL string entered by the user:

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

The most important parts are:

  • Scheme: the protocol to use, such as http or https
  • Host: the host address (a domain name or an IP address)
  • Port: the port number
  • Path: the path of the resource on the server
  • Query: the query string
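As a rough illustration of this parsing step, the sketch below splits a URL into the parts listed above using Python's standard `urllib.parse` module. The URL itself is made up for the example, and a real browser uses its own parser rather than this library.

```python
from urllib.parse import urlparse

# Hypothetical URL, used only for illustration.
url = "https://alice:secret@www.example.com:8443/docs/index.html;v=1?lang=en&page=2#intro"

parts = urlparse(url)

print(parts.scheme)    # https
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # www.example.com
print(parts.port)      # 8443
print(parts.path)      # /docs/index.html
print(parts.params)    # v=1
print(parts.query)     # lang=en&page=2
print(parts.fragment)  # intro
```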

DNS domain name resolution

If the host part mentioned above is a domain name, it needs to be resolved by DNS.

Check the local hosts file

Users can map domain names to IP addresses in the local hosts file. The browser therefore looks for the domain name in the local hosts file first and, if an entry exists, sends the request to the corresponding IP address.
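A minimal sketch of such a lookup is shown below. It parses hosts-file formatted text (an IP address followed by one or more names, with `#` starting a comment). The sample entries are hypothetical, and a real system reads the actual hosts file through the operating system's resolver rather than code like this.

```python
def parse_hosts(text):
    """Return a {hostname: ip} mapping from hosts-file formatted text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), ip)  # the first entry for a name wins
    return mapping

# Hypothetical hosts-file content.
SAMPLE_HOSTS = """
127.0.0.1   localhost
# development override
192.0.2.10  www.foo.com foo.com
"""

hosts = parse_hosts(SAMPLE_HOSTS)
print(hosts.get("www.foo.com"))  # 192.0.2.10
print(hosts.get("www.bar.com"))  # None, so the lookup falls through to DNS
```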

Query DNS

  • The browser queries the local DNS server
  • The local DNS server sends a query to a root DNS server
  • The root DNS server returns the address of the top-level DNS server for the domain to the local DNS server
  • The local DNS server sends a query to the top-level DNS server
  • The top-level DNS server returns the address of the authoritative DNS server to the local DNS server
  • The local DNS server sends a query to the authoritative DNS server

Finally, the authoritative DNS server returns the IP address of the host, and the local DNS server passes it back to the browser.
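The referral chain above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the server names, the zone data, and the IP address are invented, and no real DNS packets are sent.

```python
# Hypothetical in-memory servers. Each maps a zone to a referral or an answer.
SERVERS = {
    "root":    {"com.": ("referral", "tld-com")},
    "tld-com": {"foo.com.": ("referral", "ns-foo")},
    "ns-foo":  {"www.foo.com.": ("answer", "192.0.2.10")},
}

def ask(server, name):
    """Return the record for the longest zone that contains `name`, or None."""
    zones = [z for z in SERVERS[server] if name == z or name.endswith("." + z)]
    return SERVERS[server][max(zones, key=len)] if zones else None

def resolve(name, max_hops=10):
    """Follow referrals from the root, as the local DNS server does."""
    server, visited = "root", []
    for _ in range(max_hops):
        visited.append(server)
        record = ask(server, name)
        if record is None:
            raise LookupError(f"{server} has no data for {name}")
        kind, value = record
        if kind == "answer":
            return value, visited
        server = value  # referral: ask the next server down the hierarchy
    raise LookupError("too many referrals")

ip, visited = resolve("www.foo.com.")
print(ip)       # 192.0.2.10
print(visited)  # ['root', 'tld-com', 'ns-foo']
```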

Establishing a TCP Connection

Three-way handshake

The three-way handshake is covered in detail in many other references, so only the general flow is listed here:

  1. The client sends a SYN packet to request a connection
  2. The server sends an ACK together with its own SYN, indicating that it accepts the client's request and asks to establish the connection in the other direction
  3. The client sends an ACK packet to acknowledge the server's SYN
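The handshake itself is carried out by the operating system, not by application code. The sketch below, which assumes only that a loopback interface is available, shows where it happens from the program's point of view: `connect()` returns once the SYN, SYN+ACK, and ACK have been exchanged.

```python
import socket

# Listen on an ephemeral loopback port so the example needs no external network.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server.settimeout(5)
host, port = server.getsockname()

# connect() makes the kernel send SYN, wait for SYN+ACK, and reply with ACK.
client = socket.create_connection((host, port), timeout=5)
client_addr = client.getsockname()

# accept() hands back a connection whose handshake has already completed.
conn, peer = server.accept()
print(peer == client_addr)  # True: both ends describe the same connection

conn.close()
client.close()
server.close()
```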

Use the ARP protocol to locate the target address

On Ethernet, when a host sends a data frame to another host on the same LAN, the device driver must know the destination's Ethernet (MAC) address in order to send the data, but we only know its IP address. ARP is used to map IP addresses to Ethernet addresses. When the first SYN packet is sent, the IP layer uses ARP to look up the MAC address of the target host. (TCP/IP - ARP)

The first SYN packet of the handshake reaches the IP layer through connect(). The IP layer consults the routing table, obtains the MAC address of the target host (or of the gateway, if the target is on another network) through ARP, and caches it. The packet and the MAC address are then passed to the network interface, which encapsulates the packet in a frame and sends it. PS: The article cited above explains how this process works in different situations in great detail, so I suggest you read it carefully.
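The lookup-then-cache behaviour can be sketched with a toy model. The LAN contents and MAC addresses below are invented for the example, and the "broadcast" is only a list append; real ARP runs inside the kernel and on the wire.

```python
# Hypothetical LAN: which MAC address answers for which IP address.
LAN_HOSTS = {
    "192.168.1.1": "aa:aa:aa:aa:aa:01",
    "192.168.1.20": "aa:aa:aa:aa:aa:14",
}

arp_cache = {}
broadcasts = []  # records every simulated "who has <ip>?" request

def arp_resolve(ip):
    """Return the MAC address for `ip`, broadcasting only on a cache miss."""
    if ip in arp_cache:
        return arp_cache[ip]
    broadcasts.append(ip)      # ARP request sent to every host on the LAN
    mac = LAN_HOSTS.get(ip)    # only the owner of the IP address replies
    if mac is None:
        raise LookupError(f"no ARP reply for {ip}")
    arp_cache[ip] = mac
    return mac

first = arp_resolve("192.168.1.20")   # cache miss: one broadcast
second = arp_resolve("192.168.1.20")  # cache hit: no further broadcast
print(first, len(broadcasts))         # aa:aa:aa:aa:aa:14 1
```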

Establishing an SSL Tunnel

If the request uses HTTPS, then once the TCP connection is established, an encrypted tunnel (SSL) must be set up on top of it through a four-way handshake.

Four-way handshake

  1. Client:
    • Sends the protocol version number
    • Sends the encryption methods it supports
    • Generates random number 1 (client random) and sends it to the server
  2. Server:
    • Selects the encryption method to use and sends it to the client
    • Sends its digital certificate
    • Generates random number 2 (server random) and sends it to the client
  3. Client:
    • Validates the digital certificate
    • Generates random number 3 (premaster secret), encrypts it with the public key from the certificate, and sends it to the server
    • Uses the client random, server random, and premaster secret to generate the session key
  4. Server:
    • Uses its private key to decrypt the encrypted string and obtain the premaster secret
    • Uses the client random, server random, and premaster secret to generate the session key

The four-way handshake uses asymmetric encryption so that the client and server end up holding the same session key, which they then use to encrypt the rest of the session symmetrically.
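To make the last step of each side concrete, here is a sketch of how both ends can derive the same secret from the three random values. It is modeled on the TLS 1.2 pseudorandom function (HMAC-SHA256 expansion with the label "master secret"); the article does not say which protocol version it describes, so treat that choice as an assumption. The certificate check and the public-key encryption of the premaster secret are left out, and the random values are generated locally just for the demonstration.

```python
import hashlib
import hmac
import os

def p_sha256(secret, seed, length):
    """HMAC-SHA256 expansion: A(0) = seed, A(i) = HMAC(secret, A(i-1))."""
    out = b""
    a = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]

def derive_master_secret(premaster, client_random, server_random):
    """Derive a 48-byte secret from the three values exchanged in the handshake."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(premaster, seed, 48)

# Illustrative values only; in a real handshake these travel over the network.
client_random = os.urandom(32)
server_random = os.urandom(32)
premaster = os.urandom(48)

client_side = derive_master_secret(premaster, client_random, server_random)
server_side = derive_master_secret(premaster, client_random, server_random)
print(client_side == server_side)  # True: both ends now share the same secret
```

In an actual session the symmetric encryption keys are expanded further from this shared secret rather than used directly.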

Send an HTTP(S) request

The browser constructs the request message using the information obtained in the first two steps and sends the HTTP request to the server over the TCP connection established in the third step. The basic format of the request message is as follows:

<method> <request-URL> <version>
<headers>

<entity-body>

These three parts are the start line, the headers, and the body. A request made by entering a URL in the browser is a GET request by default, so it has no body. Here is a hypothetical HTTP message whose first line is the start line and whose second and third lines are headers:

GET /index.html HTTP/1.1
Accept: text/html
Host: www.foo.com
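As a sketch of how such a message could be assembled from a parsed URL, the hypothetical helper `build_get_request` below produces the raw bytes of a minimal GET request. Real browsers send many more headers; only the ones from the example above are included.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Build the raw bytes of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    host = parts.hostname
    if parts.port:
        host += f":{parts.port}"
    lines = [
        f"GET {target} HTTP/1.1",  # start line
        f"Host: {host}",           # headers
        "Accept: text/html",
        "",                        # blank line ends the headers
        "",                        # a GET request has no body
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("http://www.foo.com/index.html")
print(request.decode("ascii"))
```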

The server proxy processes the request

Server-side proxies are server software such as Nginx and Apache. Based on their configuration files, they map each request to a specific file on the server and handle it according to the file type.

Returning static files (.html)

If the file is a static file such as .html, .txt, or .xml, its contents are returned directly to the client as the entity-body of the response.
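A minimal sketch of this behaviour follows. The function name `serve_static`, the document root, and the 404 fallback are assumptions for the example; it is not how Nginx is implemented, only an illustration of mapping a request path to a file and returning its bytes as the body.

```python
import mimetypes
import tempfile
from pathlib import Path

def serve_static(doc_root, url_path):
    """Map a request path to a file under doc_root; return (status, type, body)."""
    root = Path(doc_root).resolve()
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse paths that escape the document root or do not name a regular file.
    if root not in target.parents or not target.is_file():
        return 404, "text/plain", b"Not Found"
    content_type = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
    return 200, content_type, target.read_bytes()

# Demonstration with a temporary document root.
with tempfile.TemporaryDirectory() as doc_root:
    Path(doc_root, "index.html").write_text("<h1>Hello</h1>", encoding="utf-8")
    status, content_type, body = serve_static(doc_root, "/index.html")
    missing = serve_static(doc_root, "/nope.html")

print(status, content_type, body)
print(missing[0])
```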

Parsing dynamic files (.php)

If the file is a dynamic file such as .php, .jsp, or .asp, it has to be executed first. Only the handling of .php files is described here, using the Nginx server as an example.

  • Nginx receives a request that maps to a .php file
  • Nginx calls its own FastCGI module to construct a FastCGI request
  • Nginx sends the FastCGI request to PHP-FPM; at this point Nginx acts as a reverse proxy server
  • The PHP-FPM master process receives the request
  • The master process assigns the request to a specific worker process
  • The worker process uses its embedded PHP-CGI interpreter to execute the PHP file and returns the result to Nginx as a response
  • Nginx receives the response from PHP-FPM, which is now static content
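To give a feel for what the FastCGI request in the second step carries, the sketch below builds the kind of key/value parameters a web server passes to the PHP process. The parameter names are standard CGI/FastCGI variables; the helper name `fastcgi_params`, the document root, and the request are hypothetical, and the binary FastCGI record encoding is not shown.

```python
def fastcgi_params(method, request_uri, doc_root):
    """Build a simplified set of FastCGI parameters for one request."""
    path, _, query = request_uri.partition("?")
    return {
        "REQUEST_METHOD": method,
        "REQUEST_URI": request_uri,
        "SCRIPT_NAME": path,
        "SCRIPT_FILENAME": doc_root.rstrip("/") + path,  # file the worker executes
        "QUERY_STRING": query,
        "DOCUMENT_ROOT": doc_root,
    }

# Hypothetical request and document root.
params = fastcgi_params("GET", "/index.php?id=7", "/var/www/html")
for name, value in params.items():
    print(f"{name}={value}")
```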

The server proxy responds to the request

Nginx generates a response message and sends it back to the client. Only the start line syntax of the response message differs from that of the request message:

<version> <status> <reason-phrase>
<headers>

<entity-body>
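The following sketch splits a raw response into those parts. The sample response is hypothetical, and the parser is deliberately simplified: it ignores chunked transfer encoding and repeated headers, which a real client has to handle.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    fields = start_line.split(" ", 2)
    version, status = fields[0], int(fields[1])
    reason = fields[2] if len(fields) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body

# Hypothetical response used for illustration.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 14\r\n"
    b"\r\n"
    b"<h1>Hello</h1>"
)

version, status, reason, headers, body = parse_response(raw)
print(version, status, reason)  # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
print(body)                     # b'<h1>Hello</h1>'
```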

Closing the TCP connection

This process is also well documented elsewhere, so only the general flow is listed here:

  1. The client sends a FIN packet, indicating that it has finished sending its data
  2. The server returns an ACK packet, acknowledging the client's FIN
  3. The server sends a FIN packet, indicating that it has finished sending its data
  4. The client returns an ACK packet, acknowledging the server's FIN, and enters the TIME_WAIT state
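As with the handshake, the FIN and ACK packets are sent by the operating system. The sketch below, which assumes a loopback interface is available, shows how the two directions are closed separately from the application's point of view: an empty read is how a program learns that the peer's FIN has arrived.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server.settimeout(5)

client = socket.create_connection(server.getsockname(), timeout=5)
conn, _ = server.accept()
conn.settimeout(5)

client.sendall(b"last bytes")
client.shutdown(socket.SHUT_WR)  # the client sends FIN: it has nothing more to send

received = b""
while True:
    chunk = conn.recv(1024)
    if not chunk:                # empty read: the client's FIN has arrived
        break
    received += chunk

conn.sendall(b"bye")             # the server can still send after the client's FIN
conn.close()                     # the server sends its own FIN

reply = b""
while True:
    chunk = client.recv(1024)
    if not chunk:                # empty read: the server's FIN has arrived
        break
    reply += chunk

client.close()
server.close()
print(received, reply)
```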

The client parses the returned file

The browser will parse the returned HTML/CSS/JS files and eventually present the page to the user.