A complete HTTP service process

1. Enter the URL in the browser's address bar. The UI thread parses the input. If it is not a URL, it is handed to the search engine as a query. If it is a URL, the UI thread builds the request and passes it to the network process.

A valid URL contains the protocol, domain name, port number, path of the requested resource, query parameters, and hash (fragment).
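As a small illustration of those URL parts, the built-in `URL` class (available in browsers and in Node.js) can split a URL for you. The sample URL and the `describeUrl` helper are made up for this sketch.

```javascript
// Break a URL into the parts listed above using the built-in URL class.
// describeUrl is a hypothetical helper name, not a browser API.
function describeUrl(input) {
  const url = new URL(input); // throws a TypeError if input is not a valid URL
  return {
    protocol: url.protocol,
    hostname: url.hostname,
    port: url.port,
    path: url.pathname,
    params: Object.fromEntries(url.searchParams),
    hash: url.hash,
  };
}

const parts = describeUrl('https://www.example.com:8080/docs/page.html?id=42&lang=en#intro');
console.log(parts);
```

Because `new URL()` throws on invalid input, a try/catch around it is one simple way to tell a URL from a search query.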

5. Before sending the HTTP request, the browser first checks whether it holds a cached copy of the resource (generally only GET requests are cached). If it does, the browser then determines whether that copy is still valid.

If the HTML <meta> tag sets Cache-Control to no-cache, no-store, the HTML resource is not allowed to be cached in the browser.

There are two kinds of cache: the strong cache and the negotiated cache.

Strong cache. The relevant response header fields are:
- Cache-Control: max-age (a lifetime in seconds)
- Expires (an absolute GMT time)

If the cached copy has not expired according to these fields (max-age takes precedence when both are present), the browser uses the cached data directly and the status code shown is 200. If it has expired, the browser sends a request to the server to negotiate.

Negotiated cache. Negotiation needs a real request, so it happens after DNS resolution and after the TCP connection is established. The fields are:
- (response) ETag / (request) If-None-Match: a hash check value for the content
- (response) Last-Modified / (request) If-Modified-Since: the GMT time is compared with the modification time of the data on the server

If the data has not been modified, the server returns status code 304 and the browser keeps using the cache. If the data has been modified, the server returns the updated data with status code 200 and the cache fields are updated.
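The decision between the strong cache and the negotiated cache can be sketched as a small function. This is only a sketch: the shape of the `cached` object (`storedAt`, `maxAge`, `expires`, `etag`, `lastModified`) and the name `planRequest` are assumptions made for illustration, and a real browser cache handles many more directives. It relies on max-age taking precedence over Expires when both are present.

```javascript
// Freshness check for the strong cache: max-age wins over Expires when both exist.
function isFresh(cached, now) {
  const ageSeconds = (now - cached.storedAt) / 1000;
  if (cached.maxAge !== undefined) return ageSeconds < cached.maxAge;
  if (cached.expires !== undefined) return now < Date.parse(cached.expires);
  return false;
}

// Decide what to do for a GET request given a (hypothetical) cache entry.
function planRequest(cached, now) {
  if (!cached) return { action: 'fetch', headers: {} };
  if (isFresh(cached, now)) return { action: 'use-cache', headers: {} };
  // Strong cache expired: build a conditional request for the negotiated cache.
  const headers = {};
  if (cached.etag) headers['If-None-Match'] = cached.etag;
  if (cached.lastModified) headers['If-Modified-Since'] = cached.lastModified;
  return { action: 'revalidate', headers };
}

const storedAt = Date.parse('2021-02-18T00:00:00Z');
const entry = { storedAt, maxAge: 60, etag: '"abc"' };
console.log(planRequest(entry, storedAt + 30 * 1000));
console.log(planRequest(entry, storedAt + 120 * 1000));
```

If the server answers the conditional request with 304, the browser keeps the cached body; with 200 it replaces the entry.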

2. DNS resolution: convert the domain name into an IP address.

The lookup starts with the local caches, in this order: the browser's own DNS cache, the operating system's DNS cache (viewable with ipconfig /displaydns), and the hosts file (under C:\Windows\System32\drivers\etc). If none of them has the record, the query is sent recursively to the local DNS server. If the local DNS server has no answer either, it uses iterative queries to ask the root DNS server, the top-level domain DNS server, and the authoritative DNS server in turn.
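The lookup order can be pictured as a list of layers that are tried one after another. The sketch below is a simulation only: the layer contents and the address 192.0.2.10 (a documentation-only address) are invented, and `resolve` is a hypothetical helper, not a real DNS API.

```javascript
// Try each layer in order and report which one answered.
function resolve(domain, layers) {
  for (const { name, records } of layers) {
    if (records.has(domain)) {
      return { ip: records.get(domain), answeredBy: name };
    }
  }
  return { ip: null, answeredBy: null };
}

// Made-up sample data for each layer, from closest to farthest.
const layers = [
  { name: 'browser cache', records: new Map() },
  { name: 'OS cache', records: new Map() },
  { name: 'hosts file', records: new Map([['localhost', '127.0.0.1']]) },
  { name: 'local DNS server', records: new Map([['example.com', '192.0.2.10']]) },
];

console.log(resolve('localhost', layers));
console.log(resolve('example.com', layers));
```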

3. Establish the TCP connection

- The client sends a packet with SYN = 1 and sequence number seq = seqX, indicating that it wants to establish a connection.
- After receiving it, the server replies with a packet carrying SYN = 1, ACK = 1, ack = seqX+1, and its own sequence number seq = seqY.
- After receiving that packet, the client sends ACK = 1, ack = seqY+1, seq = seqX+1 (this packet may already carry data).

The three-way handshake is now complete and the TCP connection is established.

The third handshake prevents a stale connection request from reaching the server and causing it to open a connection by mistake. If a connection request from the client is delayed in the network, the client retransmits after a timeout and connects. The delayed request may still reach the server eventually; with only two handshakes, the server would then open a second connection. With the third handshake, the client ignores the server's confirmation of the stale request, so the server never completes that connection.

TCP compared with UDP:
- TCP is connection-oriented; UDP is connectionless, that is, no connection needs to be established before sending data.
- TCP provides reliable service: data sent over a TCP connection arrives without errors and without loss. UDP delivers on a best-effort basis and does not guarantee reliability.
- TCP is one-to-one only; UDP can be one-to-one, one-to-many, or many-to-one.
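To make the sequence and acknowledgement numbers of the handshake concrete, here is a minimal sketch that builds the three segments from two initial sequence numbers. It models only the header fields mentioned in the text; `threeWayHandshake` and the sample numbers are invented for illustration.

```javascript
// Build the three handshake segments from the two initial sequence numbers.
function threeWayHandshake(clientIsn, serverIsn) {
  const syn = { from: 'client', SYN: 1, ACK: 0, seq: clientIsn };
  const synAck = { from: 'server', SYN: 1, ACK: 1, seq: serverIsn, ack: syn.seq + 1 };
  const ack = { from: 'client', SYN: 0, ACK: 1, seq: syn.seq + 1, ack: synAck.seq + 1 };
  return [syn, synAck, ack];
}

const segments = threeWayHandshake(100, 300);
console.log(segments);
```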

Establish the TLS connection (four handshakes, HTTPS only)

TLS (Transport Layer Security) handshake with RSA key exchange:

1. Client Hello. The client asks the server for encrypted communication and sends the TLS versions it supports, a client random number, and the list of cipher suites it supports.
2. Server Hello. The server responds, confirming the TLS version and the cipher suite to use, and sends a server random number and its digital certificate.
3. Client response. The client receives the certificate and verifies it with the CA public key built into the browser or operating system. If the certificate is valid, the client extracts the server's public key from it and sends:
   a. A random number (the pre-master key), encrypted with the server's public key.
   b. A notification that subsequent messages will be encrypted (change cipher spec).
   c. A handshake-finished notification containing a digest of all previous messages, for the server to verify.
4. Each party uses the three random numbers and the agreed algorithm to generate the session key.
5. After calculating the session key, the server sends the client:
   a. A notification that subsequent messages will be encrypted (change cipher spec).
   b. A handshake-finished notification containing a digest of all previous messages.

At this point the four TLS handshakes are complete, and subsequent messages are encrypted with the session key.
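The key exchange in steps 3 and 4 can be sketched with Node.js's `crypto` module. This is a teaching sketch under stated assumptions: the key pair is generated on the spot instead of coming from a certificate, and `deriveSessionKey` uses a single HMAC as a stand-in for key derivation, which is not the real TLS pseudo-random function.

```javascript
const crypto = require('crypto');

// Server key pair; in real TLS the public key reaches the client inside the certificate.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// Steps 1 and 2: each side contributes a random number in its Hello message.
const clientRandom = crypto.randomBytes(32);
const serverRandom = crypto.randomBytes(32);

// Step 3: the client picks the pre-master key and encrypts it with the server's public key.
const preMaster = crypto.randomBytes(48);
const encryptedPreMaster = crypto.publicEncrypt(publicKey, preMaster);

// The server recovers the pre-master key with its private key.
const recoveredPreMaster = crypto.privateDecrypt(privateKey, encryptedPreMaster);

// Step 4: simplified stand-in for key derivation (assumption: NOT the real TLS PRF).
function deriveSessionKey(secret, randomA, randomB) {
  return crypto
    .createHmac('sha256', secret)
    .update(Buffer.concat([randomA, randomB]))
    .digest('hex');
}

const clientSessionKey = deriveSessionKey(preMaster, clientRandom, serverRandom);
const serverSessionKey = deriveSessionKey(recoveredPreMaster, clientRandom, serverRandom);
console.log(clientSessionKey === serverSessionKey); // true
```

Only the holder of the private key can recover the pre-master key, which is why a stolen private key compromises this style of key exchange.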

4. Send the HTTP request

An HTTP request message consists of three parts: the request line (request method, URL, protocol and version), the request headers, and the request body.

Common methods:
- GET: obtain a resource from the server
- POST: create a resource on the server
- PUT: modify a resource on the server
- DELETE: delete a resource from the server

GET compared with POST:
- GET requests a resource from the server (safe and idempotent); POST submits data to a specified URL (not idempotent).
- The parameters of a GET request are part of the URL and visible to anyone who sees the URL, so they are less secure. The parameters of a POST request are in the request body and not visible in the URL, so they are relatively more secure.
- Browsers proactively cache the resources fetched by GET requests, but not by POST.
- A GET request is harmless when the browser navigates back, while POST submits the request again.
- The parameters of a GET request are saved in the browser's history; those of POST are not.

6. Respond to the HTTP request

A response message consists of the response line (protocol, version, and status code), the response headers, and the response body.

## Status codes (in the response line)

- 1XX Informational: the request has been received and is being processed.
  - 100: Continue
  - 101: Switching Protocols (the server is switching to the protocol the client asked for)
- 2XX Success
  - 200: OK
  - 201: Created
  - 202: Accepted
  - 203: Non-Authoritative Information
  - 204: The request succeeded, but no content is returned
  - 205: Reset Content
  - 206: Partial Content. The client made a range request and the server successfully executed that part of the request, returning the content for the range given in the Content-Range header.
- 3XX Redirection
  - 302: Temporary redirection
  - 303: The resource has another URL, which should be fetched with the GET method
  - 304: Not Modified (returned when the negotiated cache hits)
  - 305: Use Proxy
  - 307: Temporary redirection
  - 308: Permanent redirection
- 4XX Client error
  - 400: The request message has a syntax error
  - 401: Authentication is required
  - 403: The request was refused by the server
  - 404: The resource does not exist on the server
  - 405: The server does not allow this method
- 5XX Server error
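Since the first digit of a status code decides its class, client code often branches on the class rather than on individual codes. A minimal sketch (the function name `statusClass` and the class labels are chosen for this example):

```javascript
// Map a status code to its class based on the first digit.
function statusClass(code) {
  if (!Number.isInteger(code) || code < 100 || code > 599) return 'unknown';
  const classes = ['informational', 'success', 'redirection', 'client error', 'server error'];
  return classes[Math.floor(code / 100) - 1];
}

console.log(statusClass(304));
console.log(statusClass(404));
```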

7. The browser parses the HTML and requests the resources it references

The network process passes the HTML to the renderer process through an IPC pipe. The main thread of the renderer process parses the HTML and builds the DOM. The parser first turns the HTML input into tokens through lexical analysis, then builds the DOM tree from the tokens. During construction it creates a Document object, which serves as the root of the DOM tree, and keeps adding elements to it.

HTML pulls in additional resources: images, CSS, and JS (here the Connection: keep-alive request header lets multiple resources be requested over a single connection). Downloading images and CSS does not block HTML parsing, because they do not affect the structure of the DOM tree. When the parser meets JS, however, it stops parsing the HTML and downloads and executes the script, because JS may change the structure of the DOM tree.

A normal <script> tag is loaded and executed synchronously and blocks DOM construction. This is one reason we often put <script> at the bottom of <body>: it avoids a long blank screen while resources load. Another reason is that JS may manipulate the DOM, so it should run after the DOM has been built.

Script also has two asynchronous loading modes. With async, the script is downloaded asynchronously but executed as soon as it is available, which still blocks HTML parsing while it executes, and the execution order is not guaranteed. With defer, the script is downloaded asynchronously and its execution is deferred until the DOM tree has been built.

8. Browser renders the page

After parsing the HTML, the browser has a DOM tree. The main thread then parses the CSS to determine the computed style of each DOM node. CSS has inheritance, which means that child nodes inherit styles from their parent node.

Next comes layout. The main thread walks the DOM tree and the computed styles to generate a layout tree, which records the coordinates and box size of each node. The layout tree and the DOM tree do not correspond one-to-one: DOM nodes with display: none do not appear in the layout tree, whereas ::before and ::after pseudo-elements that have content do.

The main thread then walks the layout tree and produces a paint record, a list of drawing steps in order. It also generates layers: elements that create a stacking context (positioned elements, elements with opacity, elements with CSS filters, clipped content, and so on) get their own layer, which produces the layer tree.

The main thread passes this information to the compositor thread. The compositor thread divides each layer into tiles and hands them to the raster threads for rasterization. When rasterization is complete, the compositor thread receives draw quads, which record where the rasterized content is stored in memory. From these it assembles a compositor frame and passes it through the browser process to the GPU, which displays it on the page.

Reflow and repaint

1. Changing the position or size of an element, or resizing the window, requires recalculating element positions, so rendering starts again from the layout step. This is reflow.
2. Changing only the color of an element does not trigger layout, but does trigger style calculation and painting. This is repaint.

Reflow, repaint, and JS all occupy the main thread, so if JS runs for too long, the page stutters.

Optimizations

1. Use requestAnimationFrame(). Its callback runs on every frame, so a long JS task can be split into smaller chunks spread over several frames. JS execution pauses before the time budget of each frame runs out and yields the main thread, so layout and painting for the next frame can run on time. React Fiber applies the same time-slicing idea.
2. Animate with the CSS transform property. Animations implemented with transform skip layout and painting and run directly on the compositor thread and raster threads, so they are not held up by JS running on the main thread.
   transform: translate(x, y)
   transform: scale(x, y)
   transform: rotate(deg)
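The requestAnimationFrame optimization can be sketched as a helper that processes a list in slices and yields between slices. This is a sketch, not a library API: `processInChunks`, the 8 ms budget, and the optional `schedule` parameter are assumptions for illustration, and the `setTimeout` fallback exists only so the code also runs outside a browser.

```javascript
// Process `items` in slices so that no slice runs longer than `budgetMs`.
// `schedule` decides when the next slice runs; in a browser it defaults to
// requestAnimationFrame, elsewhere it falls back to setTimeout.
function processInChunks(items, handle, budgetMs = 8, schedule) {
  const next = schedule ||
    (typeof requestAnimationFrame === 'function'
      ? (fn) => requestAnimationFrame(fn)
      : (fn) => setTimeout(fn, 16));
  let index = 0;
  return new Promise((resolve) => {
    function runSlice() {
      const start = Date.now();
      // Work until the list is done or this slice's time budget is used up.
      while (index < items.length && Date.now() - start < budgetMs) {
        handle(items[index]);
        index += 1;
      }
      if (index < items.length) {
        next(runSlice); // yield the main thread, continue on the next frame
      } else {
        resolve();
      }
    }
    next(runSlice);
  });
}
```

A call such as `processInChunks(rows, renderRow)` (both names hypothetical) would then render a long list without blocking layout and painting for more than one slice at a time.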
9. Close the connection

## TCP four waves

- After it has finished sending data, the client sends a packet with FIN = 1.
- The server receives it and replies with ACK = 1. The client's side of the connection is now closed, while the server can still send data.
- After its own data transmission is complete, the server sends a packet with FIN = 1.
- The client replies with an ACK = 1 packet, waits for 2MSL (twice the maximum segment lifetime), and closes the connection.

### Why wait for 2MSL

1. To make sure the last acknowledgement arrives. If B (the server) does not receive the final ACK from A (the client), B resends its connection release request. Because A is still waiting, it can receive the resent request and acknowledge it again.
2. To let all packets generated during the lifetime of the connection disappear from the network, so that old packets do not show up in the next connection.

The network structure

The classic TCP/IP five-layer model: application layer, transport layer, network layer, link layer, physical layer.

The OSI seven-layer model: application layer, presentation layer, session layer, transport layer, network layer, data link layer, physical layer.

  • Application layer: the application programs that provide services to users. Common protocols include DNS, FTP, and HTTP.
  • Transport layer: end-to-end transmission of data between processes. The related protocols are TCP (reliable) and UDP (best effort).
  • Network layer: responsible for encapsulating and forwarding packets between hosts and for finding routes. Typical device: router. The protocols include IP, ICMP, IGMP, and ARP.
  • Link layer: responsible for transmitting data frames between directly connected nodes. Typical device: bridge. Related protocols: CSMA/CD, CSMA/CA, etc.
  • Physical layer: responsible for transmitting the bit stream over the physical medium. Devices include hubs and repeaters.

CDN

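A CDN (content delivery network) places proxy servers between clients and the origin server. A proxy server responds to client requests on behalf of the origin, which also spreads requests across servers (load balancing).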

Advantages and disadvantages of the HTTP protocol

Hypertext Transfer Protocol: the convention and specification for transferring text, pictures, and other hypertext data between two computers.

Features:
1. Flexible and extensible: header key-value pairs can be customized, and many kinds of content can be transferred.
2. Request-response model.
3. Reliable transmission (built on TCP).
4. Stateless: the server keeps no per-client state, which helps reduce network overhead and improve throughput, for example in the live streaming industry.

Disadvantages:
1. Stateless: even if the same client sends two consecutive requests, the server cannot tell that they came from the same client and keeps no client information.
2. Plaintext transmission: messages (headers in particular) are sent as plain text, which exposes the information and makes things easy for attackers.
3. Head-of-line blocking: in HTTP1.1, requests share one persistent TCP connection and are handled in order, so when one request takes too long, the requests behind it are blocked.

HTTP message structure

Common HTTP request headers

POST /api/5632836/envelope/?sentry_key=39616798988c45a88e52d89282f7dcd1&sentry_version=7 HTTP/1.1
Host: (domain name of the server being requested)
Connection: keep-alive
Content-Length: 18861 (length of the data)
User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Mobile Safari/537.36
Content-Type: text/plain;charset=UTF-8 (type of the data being sent)
Accept: */* (acceptable response data types)
Origin: https://hejialianghe.gitee.io
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://hejialianghe.gitee.io/
Accept-Encoding: gzip, deflate, br (acceptable encodings)
Accept-Language: zh-CN,zh;q=0.9

Basic method

  • GET: Obtains resources from the server
  • HEAD: Learns the state of a resource without retrieving it (the server response contains only the headers)
  • POST: Creates resources on the server
  • PUT: Modifies resources on the server
  • DELETE: deletes resources from the server

Response headers

HTTP/1.1 200 OK
Server: nginx
Date: Thu, 18 Feb 2021 01:41:36 GMT
Content-Type: application/json
Content-Length: 41
Connection: keep-alive
access-control-allow-origin: https://hejialianghe.gitee.io (origin allowed cross-domain access)
access-control-expose-headers: x-sentry-error, x-sentry-rate-limits, retry-after
vary: Origin
x-envoy-upstream-service-time: 0
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

HTTP2.0

Compared with HTTP1.1

1. Header compression. If multiple requests are made at the same time and their headers contain the same or similar fields, the protocol eliminates the duplicated parts. (HPACK algorithm: the client and the server each maintain a header table. Fields are stored in the table and given an index number, and afterwards only the index number is sent.)
2. Binary framing. HTTP1.1 messages are plain text; HTTP2 is fully binary (header frames and data frames).
3. Streams. Packets in HTTP2 are not sent in order, so each packet is marked with the stream it belongs to, and clients can specify stream priorities.
4. Multiplexing. Multiple requests and responses can be in flight concurrently on one connection.
5. Server push. The server can proactively send resources to the client.

HTTPS

Symmetric encryption algorithms: AES, ChaCha20.

Asymmetric algorithms: used to agree on the session key, including RSA and ECDHE.

RSA is the traditional method. The server provides a public key, the client encrypts the session key material with the public key, and the server decrypts it with its private key. The weakness is that if the server's private key is stolen, the session keys it protected can be recovered.

ECDHE computes the keys using the properties of elliptic curves (ECC). The two sides first agree on which elliptic curve and which base point G to use. Each side then generates a private key d and computes the public key d·G, and the two sides exchange public keys. Because multiplication on an elliptic curve satisfies the associative and commutative laws, each side can compute the same shared key (the session key) from its own private key and the other side's public key.

Digest algorithms: hash functions such as MD5 and SHA-1 (both are now considered insecure; SHA-256 is a common choice today).
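The ECDHE exchange can be tried out with Node.js's built-in `crypto.createECDH`. In this sketch both parties run in one process and the curve `prime256v1` is picked only as an example; in TLS the public keys travel in handshake messages.

```javascript
const crypto = require('crypto');

// Both sides agree on the same curve (and therefore the same base point G).
const client = crypto.createECDH('prime256v1');
const server = crypto.createECDH('prime256v1');

// Each side generates a private key d and gets back the public key d*G.
const clientPublic = client.generateKeys();
const serverPublic = server.generateKeys();

// After exchanging public keys, each side combines its own private key
// with the other side's public key.
const clientSecret = client.computeSecret(serverPublic);
const serverSecret = server.computeSecret(clientPublic);

console.log(clientSecret.equals(serverSecret)); // true
```

The private keys never leave their owners, and a fresh pair can be generated for every session, which is what limits the damage of a later key theft.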

Browser storage

cookie

Reference: Wangdoc.com/javascript/…

An HTTP cookie is a small piece of data that the server sends to the user's browser, which stores it locally. The browser sends it back in the request the next time it makes a request to the same server.

Cookies enable the stateless HTTP protocol to record state information. For cookies the same-origin rule is looser than usual: two URLs can share cookies as long as they have the same domain name.

classification

Cookies are always stored on the client and can be divided into memory cookies and hard disk cookies.

In-memory cookies are stored in memory, maintained by the browser, and disappear after the browser closes, so they are also called non-persistent cookies.

Hard disk Cookies are stored on hard disks and disappear when the expiration time expires or the user manually clears them. They are also called persistent cookies.

The cookie is set on the server

Cookies are typically set by the server through the Set-Cookie response header:

Set-Cookie: foo=bar; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Max-Age=38000; Domain=example.com; Path=/blog; Secure; HttpOnly

foo is the name of the cookie, and bar is its value. The cookie is only sent when visiting example.com/blog. When both Expires and Max-Age are present, the latter (Max-Age) determines the expiration time. Secure means the cookie is only sent over HTTPS. HttpOnly means the cookie cannot be read by JavaScript and is only carried when the browser sends an HTTP request. Chrome has added a SameSite attribute to prevent CSRF attacks and user tracking.

(A CSRF attack is when a malicious website makes the user's browser send an HTTP request that carries the user's cookie for the target site.)

SameSite=Strict: completely prohibits third-party cookies. Cookies are sent only when the URL of the current web page matches the request target.

SameSite=Lax: slightly relaxed. A GET request that navigates to the target URL can carry the cookie (for example, if another page has a link to Juejin, clicking the link goes to Juejin with the cookie, so the user is still logged in).
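To see how the attributes of a Set-Cookie header fit together, here is a minimal serializer. It is a sketch for illustration: `serializeCookie` and its option names are invented here, and production code would normally rely on a maintained cookie library and validate names and values.

```javascript
// Build the value of a Set-Cookie header from a name, a value, and options.
function serializeCookie(name, value, options = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`];
  if (options.maxAge !== undefined) parts.push(`Max-Age=${options.maxAge}`);
  if (options.expires) parts.push(`Expires=${options.expires.toUTCString()}`);
  if (options.domain) parts.push(`Domain=${options.domain}`);
  if (options.path) parts.push(`Path=${options.path}`);
  if (options.secure) parts.push('Secure');
  if (options.httpOnly) parts.push('HttpOnly');
  if (options.sameSite) parts.push(`SameSite=${options.sameSite}`);
  return parts.join('; ');
}

const header = serializeCookie('foo', 'bar', {
  maxAge: 38000,
  domain: 'example.com',
  path: '/blog',
  secure: true,
  httpOnly: true,
  sameSite: 'Lax',
});
console.log(header);
```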

The client sends a cookie

When the browser makes a request to the server, it carries the cookie in the request header: Cookie: foo=bar

Read the current cookie

When HttpOnly is not set, all cookies of the current page can be read via document.cookie. Setting HttpOnly prevents this, which helps stop XSS attacks from stealing cookies.
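`document.cookie` returns all readable cookies as one string of `name=value` pairs separated by semicolons. A small parser turns that string into an object; `parseCookies` is a hypothetical helper name, and the sample string stands in for `document.cookie` so the sketch also runs outside a browser.

```javascript
// Parse a "name=value; name2=value2" cookie string into an object.
function parseCookies(cookieString) {
  const cookies = {};
  for (const pair of cookieString.split(';')) {
    const index = pair.indexOf('=');
    if (index === -1) continue; // skip empty or malformed pieces
    const name = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    if (name) cookies[name] = decodeURIComponent(value);
  }
  return cookies;
}

// In a browser this would be parseCookies(document.cookie).
const cookies = parseCookies('foo=bar; theme=dark%20mode');
console.log(cookies);
```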

Cookie shortcomings

  • Cookies are added to every HTTP request, so they waste bandwidth on each request.
  • The size of a cookie is limited to about 4 KB.
  • They can be intercepted in transit.

storage

Generally, the maximum storage is 5 MB (depending on the browser), and it also follows the same-origin policy.

Set: window.localStorage.setItem('foo', '3');
Get: window.localStorage.getItem('foo');
Remove: window.localStorage.removeItem('foo');

sessionStorage

Stores data for the current session. The data is kept locally and is not sent back and forth between the client and the server.

It is not shared across domains or across pages; once the page is closed, the data is erased.

localStorage

Persistent local storage. The data is kept locally and is not sent to the server.

Can only store strings.

It is also not shared across domains, but it is shared across pages of the same origin. It is not shared across browsers.

Closing the browser does not clear the data; it is only removed when the user clears it.

indexedDB

Large capacity, NoSQL, shared within the same origin.

1. IndexedDB, as the name implies, is a database built into the browser. It is non-relational, that is, you do not write SQL statements to operate on it, and the data is stored as JSON-like objects.

2. IndexedDB is a NoSQL database. It has no concept of tables; the equivalent in IndexedDB is called an object store.

3. Each operation of IndexedDB is a transaction, and each operation to the database is performed within the context of a transaction.

4. Each operation of IndexedDB requires opening the Object Store before performing the specified operation.

5. All operations on IndexedDB are asynchronous.

Specific use of IndexedDB
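A minimal browser-only sketch of working with IndexedDB, showing that every operation goes through an object store inside a transaction and completes asynchronously. The database name `demo-db`, the store name `notes`, and the three helper functions are made up for this example; the promise wrappers are a convenience and not part of the IndexedDB API.

```javascript
// Open (or create) the database. Browser-only: relies on the global indexedDB.
function openDatabase() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open('demo-db', 1);
    // Runs only when the database is created or its version increases.
    request.onupgradeneeded = () => {
      request.result.createObjectStore('notes', { keyPath: 'id' });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Write one record inside a read-write transaction.
function saveNote(db, note) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('notes', 'readwrite');
    tx.objectStore('notes').put(note);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

// Read one record by key inside a read-only transaction.
function readNote(db, id) {
  return new Promise((resolve, reject) => {
    const request = db.transaction('notes', 'readonly').objectStore('notes').get(id);
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}
```

In a page, these could be chained as `openDatabase().then((db) => saveNote(db, { id: 1, text: 'hello' }).then(() => readNote(db, 1)))`.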

What is the same origin policy

The same-origin policy (SOP) is the core security feature of the browser. Without it, the browser would be vulnerable to XSS and CSRF attacks.

The same-origin policy protects user information and prevents malicious websites from stealing data.

The same-origin policy restricts three types of behavior

  • Cookie, localStorage, and IndexedDB access
  • DOM nodes and JS objects
  • AJAX requests

What is cross-domain

For security reasons, the browser does not allow JavaScript to make cross-domain calls to objects on other pages. When the protocol, domain name, or port of a request URL differs from that of the current page URL, the request is cross-domain.

https://www.juejin.com:8080/abcdef

This is a URL. HTTPS is the protocol, www.juejin.com is the domain name, 8080 is the port number, and abcdef is the path of the resource.
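Whether two URLs count as the same origin can be checked by comparing the `origin` property of the built-in `URL` class, which combines protocol, domain name, and port. `isSameOrigin` is a hypothetical helper name, and the URLs are sample values.

```javascript
// Two URLs are same-origin when protocol, domain name, and port all match.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

const page = 'https://www.juejin.com:8080/abcdef';
console.log(isSameOrigin(page, 'https://www.juejin.com:8080/other')); // true
console.log(isSameOrigin(page, 'http://www.juejin.com:8080/abcdef')); // false
console.log(isSameOrigin(page, 'https://www.juejin.com:9090/abcdef')); // false
console.log(isSameOrigin(page, 'https://api.juejin.com:8080/abcdef')); // false
```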

Why cross-domain access is needed

Because there may be multiple different subdomains within a company, these subdomains need to call resources from each other.

Some external APIs also need to be called, which raises cross-domain issues.

Cases where cross-domain access is allowed

  • The script tag
  • The img tag, the link tag, and @font-face
  • Embedded resources for video and audio
  • iframe
  • websocket

JSONP

Using the cross-domain nature of script tags, a web page dynamically adds a script element and requests JSON data from the server. The server receives the request and returns the data wrapped in a call to the specified callback function.

Disadvantages: only GET requests are supported.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>test</title>
</head>
<body>
    <h1>The client</h1>
    <script>
        // Callback that the JSONP response will call with the data
        function f(str){
            console.log(str);
        }
    </script>
    <script src="http://localhost:91/?callback=f"></script>
</body>

</html>
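The server side (Node.js with Express):
const express = require('express');

// Static server for the client page, on port 90
const app = express();
app.use(express.static(__dirname));
app.listen(90);

// JSONP endpoint on port 91: wraps the data in a call to the requested callback
const app2 = express();
app2.get('/', function (req, res) {
    // A real service should validate the callback name before echoing it back
    const functionName = req.query.callback;

    res.send(functionName + "('hello')");
});
app2.listen(91);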
The script tag can also be added dynamically:
<script type="text/javascript">
function addScriptTag(src) {
    var script = document.createElement('script');
    script.setAttribute("type", "text/javascript");
    script.src = src;
    document.body.appendChild(script);
}
window.onload = function () {
    addScriptTag("http://ajax.googleapis.com/web?v=1.0&q=apple&callback=result");
};
// Custom callback named "result", called by the returned script
function result(data) {
    alert(data.responseData.results[0].unescapedUrl);
}
</script>

Intermediate server proxy

Browsers enforce cross-domain restrictions, but there are no cross-domain restrictions between servers, so an intermediate server can sit between the page and the target server. The page sends its requests to the intermediate server, which simply forwards the data.

Front-end deployment address: 127.0.0.1:8000
Intermediate server (proxy): 127.0.0.1:8000
Target server address: 127.0.0.1:8080

Nginx reverse proxy

CORS (commonly used)

Cross-origin resource sharing (CORS) is the fundamental solution for cross-domain AJAX requests. Because the cross-domain restriction is enforced inside the browser, the server can permit cross-domain access by adding CORS fields to its response headers. (CORS is implemented on the server.)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>test</title>
</head>
<body>
    <h1>The client</h1>
    <script>
        fetch("http://localhost:91/").then(
            res => res.text()
        ).then(
            data => { console.log(data) }
        )
    </script>

</body>

</html>
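The server side (Node.js with Express):
const express = require('express');

// Static server for the client page, on port 90
const app = express();
app.use(express.static(__dirname));
app.listen(90);

// API server on port 91: the response header allows any origin to read the response
const app2 = express();
app2.get('/', function (req, res) {
    res.header("Access-Control-Allow-Origin", "*");
    res.send('hello');
});
app2.listen(91);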

Detailed explanation

There are two types of CORS requests: simple and non-simple.

A request is simple when its method is HEAD, GET, or POST and its headers are limited to the following fields:
- Accept
- Accept-Language
- Content-Language
- Last-Event-ID
- Content-Type, restricted to application/x-www-form-urlencoded, multipart/form-data, or text/plain

Simple requests. The browser adds an Origin field to the request header.
- If the origin is not within the permitted range, the server returns a normal HTTP response. The browser finds no Access-Control-Allow-Origin in the response headers, throws an error, and blocks the response.
- If the origin is within the permitted range, the server adds these fields to the response headers:
  Access-Control-Allow-Origin: * or the requesting origin
  Access-Control-Allow-Credentials: true (cookies are allowed; the client must also set xhr.withCredentials = true, and * is then not accepted as the allowed origin)

Non-simple requests. Examples:
1. The method is PUT or DELETE.
2. Content-Type: application/json

When the browser detects a non-simple request, it first sends a preflight request with the OPTIONS method. Its request headers are:
- Origin
- Access-Control-Request-Method: the method of the non-simple request
- Access-Control-Request-Headers: the extra header fields the non-simple request will send

After receiving the preflight request, the server checks these fields. If the request is allowed, it responds with:
- Access-Control-Allow-Origin: the allowed origin
- Access-Control-Allow-Methods: all allowed methods
- Access-Control-Allow-Headers: all allowed headers

If the request is not allowed, the server responds without these headers, and the browser blocks the request and throws an error. Once the preflight check has passed, the client proceeds as in the simple request flow.
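The server-side check of a preflight request can be sketched as a pure function that maps the incoming headers to the CORS response headers. Everything here is illustrative: the `policy` object, its values, and `preflightHeaders` are assumptions, and header names are expected in lower case, as Node.js delivers them.

```javascript
// Hypothetical policy: which origins, methods, and headers the server accepts.
const policy = {
  origins: ['http://localhost:90'],
  methods: ['GET', 'POST', 'PUT', 'DELETE'],
  headers: ['content-type'],
};

// Return the CORS response headers for a preflight request,
// or an empty object when the request is not allowed.
function preflightHeaders(request, policy) {
  const origin = request.headers['origin'];
  const method = request.headers['access-control-request-method'];
  const requested = (request.headers['access-control-request-headers'] || '')
    .split(',')
    .map((h) => h.trim().toLowerCase())
    .filter(Boolean);

  const allowed =
    policy.origins.includes(origin) &&
    policy.methods.includes(method) &&
    requested.every((h) => policy.headers.includes(h));
  if (!allowed) return {};

  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Methods': policy.methods.join(', '),
    'Access-Control-Allow-Headers': policy.headers.join(', '),
  };
}

const ok = preflightHeaders({
  headers: {
    origin: 'http://localhost:90',
    'access-control-request-method': 'PUT',
    'access-control-request-headers': 'Content-Type',
  },
}, policy);
console.log(ok);
```

When the function returns an empty object, the response carries no CORS headers, and the browser blocks the actual request.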

websocket

WebSocket is a persistent-connection protocol introduced with HTML5 that implements full-duplex communication between browser and server. WebSocket and HTTP are both application layer protocols based on TCP. The protocol is not subject to the same-origin policy and can communicate across domains as long as the server supports it.

XSS and CSRF

CSRF cross-site request forgery

Without the user's permission, the attacker secretly uses the user's identity to send malicious requests. Cookies are usually what wins the server's trust.

CSRF (usually) originates from third-party domains. CSRF attackers (usually) cannot read information such as cookies; they simply make use of it.

Defenses:
- Same-origin detection: reject requests that do not come from the site's own origin.
- CSRF token: the server generates a token and places it on the page. The page submits its requests with this token. The server takes the token out of the session and compares it with the token in the request for verification.
- SameSite cookie attribute.
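The CSRF token flow can be sketched with Node.js's `crypto` module. The `session` object is a plain object standing in for whatever session store the server uses, and `issueCsrfToken` and `verifyCsrfToken` are hypothetical names for this example.

```javascript
const crypto = require('crypto');

// Generate a random token, remember it in the session, and return it
// so it can be embedded in the page (for example in a hidden form field).
function issueCsrfToken(session) {
  session.csrfToken = crypto.randomBytes(32).toString('hex');
  return session.csrfToken;
}

// Compare the submitted token with the one stored in the session.
function verifyCsrfToken(session, submitted) {
  if (typeof submitted !== 'string' || typeof session.csrfToken !== 'string') {
    return false;
  }
  const expected = Buffer.from(session.csrfToken);
  const actual = Buffer.from(submitted);
  // timingSafeEqual requires equal lengths, so check that first.
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const session = {};
const token = issueCsrfToken(session);
console.log(verifyCsrfToken(session, token)); // true
console.log(verifyCsrfToken(session, 'forged-token')); // false
```

A page on another site can make the browser send the cookie, but it cannot read the token from the victim's page, so its forged request fails the check.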

XSS cross-site scripting attacks

XSS is essentially HTML injection. Attackers inject malicious JS code into a website and tamper with the client page, in order to steal private data such as cookies and sessions or to redirect users to malicious websites.

Defenses:
- Don't trust user input: filter user input and escape special characters as entities.
- Don't trust the server completely: escape the server output.
- Use HttpOnly cookies: mark important cookies as HttpOnly so that they cannot be read by JS code.
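Entity escaping can be as small as a lookup table applied to the five characters that have special meaning in HTML. `escapeHtml` is a sketch for text inserted into HTML content or quoted attribute values; other contexts such as URLs or inline scripts need their own encoding.

```javascript
// Characters with special meaning in HTML and their entity replacements.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Replace every special character with its entity.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml('<script>alert("x")</script>'));
// &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```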