HTTP protocol

1. What is TCP/IP

Communication between different hardware and operating systems requires a set of rules, called a protocol. Each stage of network transmission has its own protocols, and the collection of these protocols is called the TCP/IP protocol suite. HTTP is one member of that suite.

2. TCP/IP layers and the role of each layer

It is divided into four layers: application layer, transport layer, network layer and link layer.

Application layer: This layer generates HTTP request packets for the target server based on HTTP, and the server parses the packets based on HTTP.

Transport layer: At this layer, TCP splits the HTTP request message into segments, and the server reassembles the segments, also based on TCP. A TCP connection is established with a three-way handshake and closed with four waves.

Network layer: The IP protocol belongs to this layer, whose role is to determine the route that data takes. IP locates the other party by its address and forwards the data hop by hop. The IP address is the address assigned to a node, and the MAC address is the fixed address of the network interface card (NIC). The IP address can change, but the MAC address generally stays the same. The whole process is like a parcel delivery: the user hands the data to the express station, and the express company sends it on to a large transfer station.

Link layer: The hardware part of the network transmission process.

Process: -> The application layer generates the HTTP request message. -> The transport layer establishes a TCP connection and splits the message into segments. -> The network layer addresses the data to the requested IP address, adds the MAC address, and hands it to the link layer. -> The link layer sends the data toward the requested IP address. -> The server at that IP address reassembles the data according to the IP, TCP, and HTTP protocols. -> The server receives the request.

3. What is DNS?

DNS is a protocol at the application layer like HTTP. It is used to resolve domain names. DNS provides the service of searching for IP addresses by domain names or reverse searching for domain names by IP addresses.

4. The difference between URIs and URLs

A URI is a Uniform Resource Identifier and a URL is a Uniform Resource Locator; URLs are a subset of URIs.

URI format: protocol name/scheme name + login information (optional) + server address (domain name or IP address) + port number (optional) + file path + parameters (optional) + fragment identifier (optional, hash)

The difference between the two is that a URL gives the location of a resource, while a URI uniquely identifies the resource but does not necessarily say where it is.
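As a small illustration of the URI format above, the sketch below splits a URL into its components with the standard URL class available in browsers and Node.js. The URL itself is hypothetical and chosen only so that every optional part is present.

```javascript
// Hypothetical URL, made up for illustration; it contains every optional part.
const uri = new URL("https://user:pass@www.example.com:8080/docs/index.html?id=1&lang=en#top");

const parts = {
  scheme: uri.protocol,                      // protocol name / scheme name
  login: `${uri.username}:${uri.password}`,  // login information (optional)
  host: uri.hostname,                        // server address
  port: uri.port,                            // port number (optional)
  path: uri.pathname,                        // file path
  query: uri.search,                         // parameters (optional)
  fragment: uri.hash,                        // fragment identifier (optional)
};

console.log(parts);
```

Note that the standard class returns the scheme with a trailing colon and keeps the leading ? and # on the query and fragment.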

5. What are persistent connections?

If a new TCP connection is established for every request, the process wastes time and resources and is inefficient. A persistent connection is therefore a TCP connection that, once established with the three-way handshake, is kept open and reused until it is closed with four waves.

6. What is pipelining?

Because of persistent connections, you do not have to wait for the response to one request before sending the next; multiple requests can be sent back to back on the same connection, which saves a lot of time.

7. The origin of cookies?

HTTP is stateless and does not retain information about previous requests. Therefore, if a website requires login, keeping the login state during subsequent visits becomes a problem. The solution is cookie technology.

8. Knowledge about cookies

Cookies are for user identification and status management. In order to manage user status, web sites will temporarily write some data into users’ computers. When users visit the site, they can retrieve the cookies stored before.

1) Cookies are not cross-domain. Cookies under each domain name are saved separately and are not mixed. A request sent from a page carries only the cookies stored under the domain name it is sent to.

2) Important attributes of cookies

Value: the value of the cookie.
Path: the path under which the cookie is valid.
Secure: the cookie is transmitted only over secure protocols such as HTTPS.
HttpOnly: when checked, the cookie cannot be read through JS, which protects it from XSS attacks.
Expires/Max-Age: the cookie validity period. When it is Session, the cookie is cleared when the browser (not the page) is closed. If it is a time value, the browser deletes the cookie automatically when that time is reached.

3) Once the server has stored a cookie on the client through Set-Cookie, it cannot delete the cookie directly; it can only overwrite it with one that has already expired.
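A minimal sketch of how a server might assemble a Set-Cookie header value from these attributes, and how "deleting by overwriting" looks in practice. The helper names buildSetCookie and buildDeleteCookie are hypothetical, and the sketch assumes the cookie is overwritten with the same name and path and a Max-Age of 0.

```javascript
// Hypothetical helper: assemble a Set-Cookie header value from cookie attributes.
function buildSetCookie(name, value, { path = "/", maxAge, secure = false, httpOnly = false } = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`, `Path=${path}`];
  if (maxAge !== undefined) parts.push(`Max-Age=${maxAge}`); // omitted => session cookie
  if (secure) parts.push("Secure");       // only sent over HTTPS
  if (httpOnly) parts.push("HttpOnly");   // not readable from JS
  return parts.join("; ");
}

// "Delete" a cookie by overwriting it with an empty value that expires immediately.
function buildDeleteCookie(name, path = "/") {
  return buildSetCookie(name, "", { path, maxAge: 0 });
}

console.log(buildSetCookie("sid", "abc123", { maxAge: 3600, secure: true, httpOnly: true }));
console.log(buildDeleteCookie("sid"));
```

The overwrite only removes the cookie when the name and path match the ones used when it was originally set.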

9. Status code

Status codes are classified into five types:

1XX: informational
2XX: success
3XX: redirection
4XX: client error
5XX: server error

Do not trust the status code completely, sometimes the returned status code is inconsistent with the actual situation!!

200: Handled normally
301: Permanent redirection; the browser automatically sends a new request
302: Temporary redirection; the browser automatically sends a new request
303: Similar to 302, but means that the client should use the GET method to fetch the new resource; the browser automatically sends a new request
304: The condition of a conditional request was not met; it has nothing to do with redirection. The server uses this status code to tell the client to use its local cache
403: The server rejected the request without giving a reason
404: The requested resource does not exist on the server
500: The server encountered an error during execution
503: The server is overloaded or down for maintenance
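The five classes can be derived from the first digit of the code. The function below is only an illustrative sketch; the name statusClass is hypothetical.

```javascript
// Map a status code to one of the five classes by its first digit.
function statusClass(code) {
  const classes = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
  };
  return classes[Math.floor(code / 100)] || "unknown";
}

console.log(statusClass(200), statusClass(304), statusClass(404), statusClass(503));
```

304 shows why the class alone is not the whole story: it falls in the 3XX range even though it does not redirect anything.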

10. One server hosts multiple domain names with the same IP address; how are they distinguished?

With virtual host technology, multiple domain names can be deployed on one physical server. DNS, however, resolves all of those domain names to the same IP address, so how does the server tell which site a request is for?

The Host field in the request header specifies the requested host name, so the server uses it to pick the site. For comparison: Host usually carries just the domain name, Referer carries the full URL of the originating page, and Origin is sent on cross-domain requests.
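A rough sketch of how a server could dispatch on the Host header. The domain names, directory paths, and the function name resolveSite are all hypothetical; real web servers do this through their own virtual host configuration.

```javascript
// Hypothetical virtual host table: several domain names served from one IP address.
const virtualHosts = new Map([
  ["www.a.example", "/var/www/site-a"],
  ["www.b.example", "/var/www/site-b"],
]);

// Pick the site root from the Host request header (an optional port is ignored).
function resolveSite(headers) {
  const host = (headers["host"] || "").split(":")[0].toLowerCase();
  return virtualHosts.has(host) ? virtualHosts.get(host) : null;
}

console.log(resolveSite({ host: "www.a.example" }));
console.log(resolveSite({ host: "www.b.example:8080" }));
```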

11. Proxy server

Proxy server: receives client requests and forwards them to other servers. Proxies can be classified as caching or non-caching, and as transparent or non-transparent (depending on whether they process the data).

12. Header fields of common packets

Cache-Control: controls the behavior of the cache, including the browser cache
Connection: manages the connection; keep-alive means a persistent connection
Cookie: sends stored cookies to the server
Set-Cookie: sets a cookie
Referer: the page from which the current request originated
Last-Modified: the last time the resource was updated
If-Modified-Since
ETag
If-None-Match

13. The difference between GET and POST

GET and POST are methods defined in the HTTP protocol to tell the server the intent of the request. GET is used to request the resource identified by the URI, and POST is used to transmit an entity body; however, a GET request can also carry a body, and a POST request can also put parameters in the URL. In essence both travel over a TCP connection. A commonly cited difference is that GET is sent in one packet while POST is sent in two (headers first, then the body), but this depends on the client implementation and is not part of the HTTP specification.

On the surface, the differences between GET and POST are as follows:

1. Parameter size: the HTTP protocol does not specify a size limit for parameters; the well-known limits are set by browsers and servers.

2. Security of parameters: the URL of a GET request is recorded in server logs and can also be found in the browser history. The parameters of a POST request are in the body, which is normally not written to server logs or browser history, so POST is relatively safer.

3. GET requests can be cached, while POST requests generally are not.

14. localStorage, sessionStorage, cookie

// sessionStorage exposes the same three methods as localStorage
window.localStorage.setItem("key", "value")
window.localStorage.getItem("key")
window.localStorage.removeItem("key")

15. The shortcomings of HTTP, and how HTTPS ensures security

1. Communication is not encrypted, so it may be eavesdropped

2. The identity of the other party is not verified, so it may be impersonated

3. The integrity of the message cannot be proved, so it may have been tampered with

SSL and TLS protocols address these shortcomings and make HTTP more secure

The combination of SSL/TLS and HTTP is HTTPS. HTTPS is really HTTP wrapped in SSL/TLS.

HTTPS does not use asymmetric encryption to transmit data. Instead, the client and server perform a handshake in which they verify the certificate and agree on a session key, and then use that key to encrypt the data symmetrically. Certificate verification ensures that the other party is legitimate, so a man in the middle cannot attack by forging a certificate. Compared with HTTP, HTTPS has lower performance, because SSL/TLS adds several handshake round trips plus encryption and decryption, although the encryption and decryption can be accelerated by dedicated hardware.

16. Three-way handshake and four waves

1. What is the three-way handshake

Establishing a TCP connection requires three packets between client and server: the client sends SYN, the server replies with SYN+ACK, and the client answers with ACK.
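The three packets can be written out with their flags and sequence numbers. This is only a model of the exchange, not real networking code; the function name and the initial sequence numbers 100 and 300 are arbitrary assumptions.

```javascript
// Model the three segments of a TCP handshake.
// clientIsn / serverIsn are the initial sequence numbers each side picks.
function threeWayHandshake(clientIsn, serverIsn) {
  return [
    // 1. Client asks to open a connection.
    { from: "client", flags: ["SYN"], seq: clientIsn },
    // 2. Server acknowledges the client's SYN and sends its own SYN.
    { from: "server", flags: ["SYN", "ACK"], seq: serverIsn, ack: clientIsn + 1 },
    // 3. Client acknowledges the server's SYN; the connection is established.
    { from: "client", flags: ["ACK"], seq: clientIsn + 1, ack: serverIsn + 1 },
  ];
}

const segments = threeWayHandshake(100, 300);
segments.forEach((s) => console.log(s));
```

Each acknowledgement number is the other side's sequence number plus one, which is how both sides confirm that the other can send and receive.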

2. Why three handshakes instead of two

Consider the following scenario: the first connection request sent by the client is delayed at some network node and reaches the server only after the connection has been released. It is already a stale message, but the server still treats it as a fresh connection request and replies to the client. With only two handshakes the connection would be established at this point, and the server would wait on a connection the client never uses, which wastes resources.

Why not four or more handshakes? Four or more would also work, but every handshake costs time and resources, so three is enough.

3. What is the four-way wave

Either the client or the server can initiate the closing of a TCP connection. The initiator sends a FIN packet, meaning "I will not send any more data, but I can still receive". The receiver may still have data to send, so it answers in two steps: first an ACK that acknowledges the FIN, and later, once its own data has been sent, a FIN of its own. The initiator acknowledges that FIN with a final ACK, for four packets in total.

Why not once, twice, three times, five times? The principle is similar to the three-way handshake.

17. What steps take place between entering a URL and displaying the web page?

1. The browser queries the IP address corresponding to the domain name.

2. After obtaining the IP address, the client sets up a TCP connection with the server. Whether the connection is HTTPS matters here: HTTP needs only the TCP three-way handshake, while HTTPS additionally performs the SSL/TLS handshake.

3. After the TCP connection is established, HTTP requests are sent.

4. After processing, the server returns an HTTP response.

5. When the HTTP response is complete, TCP is not disconnected immediately. In HTTP/1.1, Connection: keep-alive is enabled by default, indicating a persistent connection. In the reverse proxy software Nginx, the default persistent connection timeout is 75 seconds; if no new request arrives within 75 seconds, Nginx closes the connection to the client. In addition, the browser sends a TCP keep-alive probe to the server every 45 seconds to check the state of the TCP connection; if no ACK is received, the browser closes the connection.

6. The TCP connection is closed with four waves.

7. The browser renders the page

Second, the cache

1. Packet headers related to the cache

Expires: the expiration time returned by the server, a standard time in GMT format, for example Fri, 01 Jan 1990 00:00:00 GMT.
Cache-Control: has many attributes, each with a different meaning.
max-age=t: the cached content expires after t seconds
no-cache: the negotiated cache must be used to validate the cached data
no-store: nothing is cached
Cache-Control takes precedence over Expires.
Last-Modified: the server tells the browser when the resource was last modified.
If-Modified-Since: the browser asks the server whether the resource has been modified since the given time.
If-Unmodified-Since: the browser asks the server whether the resource has remained unmodified since the given time.
ETag: the resource identifier, told by the server to the browser.
If-None-Match: the identifier of the cached resource, told by the browser to the server.
ETag and If-None-Match work as a pair.

2. Functions and differences between Last-Modified and Etag

The If-None-Match request header carries the ETag value previously returned by the server. When the server receives a later request containing If-None-Match, it recalculates the ETag of the resource. If the two match, the resource has not changed, and the server responds with 304 so that the client reads the data from its cache.

Last-Modified came first, but a problem appeared in practice: sometimes the resource was updated but its last modified time did not change, so the client could not get the latest data. ETag was introduced to determine directly whether the file has changed.

3. The process of caching

1. The browser requests a.js.

2. The server returns a.js and tells the browser its absolute expiration time (Expires), its relative expiration time (Cache-Control: max-age=10), its last modified time (Last-Modified), and its ETag.

3. Cache-Control has a higher priority than Expires. When the browser requests a.js again within 10 seconds, it uses the local cache and does not contact the server.

4. After 11 seconds, the browser requests a.js again and this time asks the server, sending If-Modified-Since and If-None-Match.

5. The server receives If-Modified-Since and If-None-Match. Because If-None-Match is present, it compares If-None-Match with the ETag of a.js and ignores the If-Modified-Since comparison.

6. If the content of a.js has not changed, the ETag and If-None-Match are the same, and the server tells the browser to keep using the local cache (304). And so on.
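The two decisions in this process can be sketched as two small functions: the browser's freshness check and the server's revalidation. Everything here is a simplified assumption for illustration: the names isFresh and revalidate, the shape of the cache entry and resource objects, and the plain string comparison of ETags are not taken from any real browser or server.

```javascript
// Browser side: can the cached copy be used without asking the server?
// Cache-Control max-age takes precedence over Expires.
function isFresh(entry, nowMs) {
  if (entry.maxAge !== undefined) {
    return (nowMs - entry.storedAtMs) / 1000 < entry.maxAge;
  }
  if (entry.expiresMs !== undefined) {
    return nowMs < entry.expiresMs;
  }
  return false;
}

// Server side: answer a conditional request with 304 (use your cache) or 200 (new body).
// If-None-Match is checked first; If-Modified-Since is used only when it is absent.
function revalidate(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  if (ifNoneMatch !== undefined) {
    return ifNoneMatch === resource.etag ? 304 : 200;
  }
  const ifModifiedSince = requestHeaders["if-modified-since"];
  if (ifModifiedSince !== undefined) {
    return Date.parse(ifModifiedSince) >= resource.lastModifiedMs ? 304 : 200;
  }
  return 200;
}

// a.js cached at t=0 with max-age=10 and a (shorter) Expires that gets ignored.
const entry = { storedAtMs: 0, maxAge: 10, expiresMs: 5000 };
console.log(isFresh(entry, 9000), isFresh(entry, 11000));
```

In the example, the entry is still fresh at 9 seconds even though its Expires time has passed, because max-age wins.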

Third, cyber security

1. Common front-end attacks XSS and CSRF

The principle of XSS attacks is that attackers inject some code to perform some malicious operations.

For example, when the user enters some HTML code into the input box, the page itself is mixed with the HTML code entered by the user when the page is displayed, causing the browser to execute the malicious code entered by the user.

1. All user input is insecure

2. All display of user input is insecure

3. Do not use eval in JS

4. Do not use innerHTML
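One common way to apply the second principle is to escape user input before it is placed into HTML. The function below is a minimal sketch with a hypothetical name; it covers only text placed in element content or quoted attribute values, and production code would normally rely on the escaping built into its template engine or framework.

```javascript
// Escape the characters that let user input break out of text into markup.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, "&amp;")   // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The injected tag is rendered as harmless text instead of being executed.
console.log(escapeHtml('<img src=x onerror="alert(1)">'));
```

In the browser, assigning to textContent instead of innerHTML achieves the same effect without manual escaping.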

The principle of a CSRF attack is that the attacker constructs the request address of a back-end function of a website and induces users to click it, or loads it automatically by some special means. When the server receives this request from a logged-in user, it mistakes it for a legitimate operation by that user.

Fourth, cross-domain

1. What is cross-domain

Front-end cross-domain usually refers to the narrow cross-domain, which refers to a restricted access scenario caused by the same origin policy of the browser.

2. What is the same-origin policy

For security reasons (to prevent attacks such as XSS), browsers restrict cross-origin HTTP requests initiated from scripts. Cross-origin means that the protocol, the domain name (even a different subdomain), or the port differs.

Browsers do not reject all cross-domain requests. They generally allow cross-domain write operations and resource embedding, such as links, redirects, and img, CSS, and script tags.

Cross-domain operations initiated by scripts, such as Ajax or fetch requests, are not allowed. Browsers also restrict reading cookies and localStorage of a different origin, and DOM and JS objects of a different origin are unavailable as well.
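The definition of "same origin" can be expressed as a comparison of protocol, domain name, and port. The sketch below uses the standard URL class; the function name isSameOrigin and the example addresses are hypothetical.

```javascript
// Two URLs share an origin only if protocol, host name, and port all match.
function isSameOrigin(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  return (
    ua.protocol === ub.protocol &&
    ua.hostname === ub.hostname &&
    ua.port === ub.port // the URL class reports default ports as ""
  );
}

const page = "http://www.example.com/index.html";
console.log(isSameOrigin(page, "http://www.example.com/api/user"));  // same origin, path differs
console.log(isSameOrigin(page, "https://www.example.com/api/user")); // protocol differs
console.log(isSameOrigin(page, "http://api.example.com/user"));      // subdomain differs
console.log(isSameOrigin(page, "http://www.example.com:8080/user")); // port differs
```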

3. Why do we need cross-domain requirements

After a project is split into services, services with different responsibilities live in different projects, usually under different domain names. A single requirement may involve several services, so the interfaces of different services have to be called, which leads to cross-domain requests.

4. Common cross-domain solutions

1. JSONP

The core of JSONP is that script tags can request resources from a different origin. The server returns the resource in the form of JS code: a call to a function defined locally, with the data passed as the argument, so that the local function receives the data. JSONP supports only GET requests.

Simple implementation of JSONP, as in jQuery

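function jsonp({url,jsonp,data,success,error}){
    var _script = document.createElement('script')
    var head = document.getElementsByTagName('head')[0]
    // The server is expected to respond with a call to this global callback.
    window[jsonp]=function(arg){
        head.removeChild(_script)
        if(arg.isSuccess==true){
            success(arg)
        }else{
            error(arg)
        }
        window[jsonp]=null
    }
    // url is assumed to already end with '?' or '&' so the query string can be appended.
    _script.src=url+format(data)
    head.appendChild(_script)

    // Serialize the parameters as key=value pairs joined by '&'.
    function format(params){
        let arr = []
        for(let item in params){
            arr.push(`${item}=${encodeURIComponent(params[item])}`)
        }
        return arr.join('&')
    }
}

2. CORS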

Cross-origin Resource Sharing (CORS) is a cross-domain technology that uses additional HTTP headers to tell browsers to allow Web applications to access specified resources from different source servers.

So the same-origin policy is a restriction imposed by the browser, and CORS uses a set of HTTP header fields to tell the browser that a cross-domain request is allowed.

There are two types of cross-domain requests, and browsers handle them differently:

1) Simple request: meets all of the following conditions:

– The request method is one of GET, HEAD, and POST

– The Content-Type of the request header is one of text/plain, multipart/form-data, application/x-www-form-urlencoded

For cross-domain requests and POST requests, the request header contains the Origin field, which indicates the source page of the request. Unlike Referer, it has no path, only the protocol, domain name, and port. After receiving a simple request, the server checks the Origin field. If it decides the request is allowed, it adds the Access-Control-Allow-Origin field to the response header. When the value is * or equals the Origin, the resource may be accessed from the other domain and the browser exposes the response to the page; otherwise the browser reports an error.

2) Complex request: meets any of the following conditions:

– Uses any of the following HTTP methods: PUT, DELETE, CONNECT, OPTIONS, TRACE, PATCH

– The Content-Type value is not one of the following: application/x-www-form-urlencoded, multipart/form-data, text/plain

– A header field outside the CORS-safelisted set has been set manually. The set is: Accept, Accept-Language, Content-Language, Content-Type (note its extra restrictions), DPR, Downlink, Save-Data, Viewport-Width, Width
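The conditions for simple and complex requests can be folded into one check. This is a sketch based only on the lists in this section; the function name isSimpleRequest is hypothetical, and real browsers apply additional rules that are not modeled here.

```javascript
const SIMPLE_METHODS = new Set(["GET", "HEAD", "POST"]);
const SIMPLE_CONTENT_TYPES = new Set([
  "text/plain",
  "multipart/form-data",
  "application/x-www-form-urlencoded",
]);
const SAFE_HEADERS = new Set([
  "accept", "accept-language", "content-language", "content-type",
  "dpr", "downlink", "save-data", "viewport-width", "width",
]);

// Returns true for a simple request, false for a complex one that needs a preflight.
// `headers` holds only the header fields the script set manually.
function isSimpleRequest(method, headers = {}) {
  if (!SIMPLE_METHODS.has(method.toUpperCase())) return false;
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    if (!SAFE_HEADERS.has(key)) return false;
    if (key === "content-type") {
      // Ignore parameters such as "; charset=utf-8" when comparing the media type.
      const mediaType = value.split(";")[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.has(mediaType)) return false;
    }
  }
  return true;
}

console.log(isSimpleRequest("GET"));
console.log(isSimpleRequest("POST", { "Content-Type": "application/json" }));
```

A POST with a JSON body is the case most front-end code runs into: the method is allowed, but the Content-Type makes the request complex.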

When the browser detects that a cross-domain request is complex, it automatically sends a preflight request first. The request method is OPTIONS, and the request header contains the following fields:

Origin: http://foo.example
<!-- Tell the server which HTTP method the actual request will use -->
Access-Control-Request-Method: POST
<!-- Tell the server what custom fields will be carried in the headers of the actual request -->
Access-Control-Request-Headers: X-PINGOTHER, Content-Type

The server decides whether to allow the request. If the request is allowed, the following fields are returned in the response header:

// Allow access from http://foo.example
Access-Control-Allow-Origin: http://foo.example
<!-- Methods the server allows for the actual request -->
Access-Control-Allow-Methods: POST, GET, OPTIONS
<!-- Header fields the server allows in the actual request -->
Access-Control-Allow-Headers: X-PINGOTHER, Content-Type
<!-- The response is valid for 86400 seconds; within that time the browser does not need to send another preflight request for the same request -->
Access-Control-Max-Age: 86400
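On the server side, answering the preflight amounts to checking the three request fields against a policy. The sketch below assumes a hypothetical policy object and function name, and takes request headers as a plain object with lowercase keys; it is not the API of any particular framework.

```javascript
// Decide how to answer a preflight (OPTIONS) request.
// Returns the CORS response headers, or null when the request is not allowed.
function handlePreflight(requestHeaders, policy) {
  const origin = requestHeaders["origin"];
  const method = requestHeaders["access-control-request-method"];
  const requested = (requestHeaders["access-control-request-headers"] || "")
    .split(",")
    .map((h) => h.trim().toLowerCase())
    .filter(Boolean);
  const allowedHeaders = policy.allowedHeaders.map((h) => h.toLowerCase());

  const allowed =
    policy.allowedOrigins.includes(origin) &&
    policy.allowedMethods.includes(method) &&
    requested.every((h) => allowedHeaders.includes(h));
  if (!allowed) return null;

  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": policy.allowedMethods.join(", "),
    "Access-Control-Allow-Headers": policy.allowedHeaders.join(", "),
    "Access-Control-Max-Age": String(policy.maxAgeSeconds),
  };
}

// Hypothetical policy matching the example headers above.
const policy = {
  allowedOrigins: ["http://foo.example"],
  allowedMethods: ["POST", "GET", "OPTIONS"],
  allowedHeaders: ["X-PINGOTHER", "Content-Type"],
  maxAgeSeconds: 86400,
};

console.log(handlePreflight({
  "origin": "http://foo.example",
  "access-control-request-method": "POST",
  "access-control-request-headers": "X-PINGOTHER, Content-Type",
}, policy));
```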

3) About cookies when they cross domains

Set the withCredentials flag of XMLHttpRequest to true to send cookies to the server. The server's response header must then contain Access-Control-Allow-Credentials: true. In addition, if cookies are to be sent, Access-Control-Allow-Origin cannot be set to an asterisk; it must name the specific origin of the requesting web page.

Summary of HTTP fields across domains:

Request header field:

Origin: <origin>  // Indicates the source of the preflight request or actual request.
Access-Control-Request-Method: <method>  // Tells the server the HTTP method used by the actual request.
Access-Control-Request-Headers: <field-name>[, <field-name>]*  // Tells the server the header fields carried by the actual request.

Response header field:

Access-Control-Allow-Origin: <origin> | *
Access-Control-Expose-Headers: X-My-Custom-Header, X-Another-Custom-Header
Access-Control-Max-Age: <delta-seconds>
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: <method>[, <method>]*
Access-Control-Allow-Headers: <field-name>[, <field-name>]*

Fifth, the CDN

1. Fundamentals

1. When the user clicks a content URL on a web page, the local DNS system resolves the domain name and, following the CNAME record, finally hands resolution over to the CDN's dedicated DNS server.

2. The CDN's DNS server returns the IP address of the CDN's global load balancing device to the user.

3. The user sends the content URL request to the CDN's global load balancing device.

4. Based on the user's IP address and the requested content URL, the global load balancing device selects a regional load balancing device in the user's region and tells the user to send the request to it.

5. The regional load balancing device selects a suitable cache server to serve the user. The criteria are: which server is closest to the user, judged by the user's IP address; which server holds the requested content, judged by the content name in the URL; and which server still has spare capacity, judged by the current load of each server. After this analysis, the regional load balancing device returns the IP address of a cache server to the global load balancing device.

6. The global load balancing device returns that server's IP address to the user.

7. The user sends the request to the cache server, which responds and delivers the content to the user's terminal. If the cache server does not hold the content but the regional load balancing device assigned it anyway, the server requests the content from its upper-level cache server, going up level by level until it reaches the site's origin server, and pulls the content to the local cache.
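The selection in step 5 can be sketched as a filter followed by a nearest-server choice. Everything in this example is an assumption for illustration: the function name, the server fields (distanceKm, load, capacity, contents), and the order in which the three criteria are applied are not taken from any real CDN.

```javascript
// Pick a cache server: it must have spare capacity, should hold the content,
// and among the remaining candidates the one closest to the user wins.
function pickCacheServer(servers, contentName) {
  const available = servers.filter((s) => s.load < s.capacity);
  const withContent = available.filter((s) => s.contents.includes(contentName));
  // If no available server holds the content, one is assigned anyway and
  // will fetch the content from its upper-level cache (step 7).
  const candidates = withContent.length > 0 ? withContent : available;
  if (candidates.length === 0) return null;
  const nearest = candidates.reduce((best, s) => (s.distanceKm < best.distanceKm ? s : best));
  return nearest.ip;
}

// Hypothetical servers; the addresses come from a documentation-only IP range.
const servers = [
  { ip: "192.0.2.1", distanceKm: 10, load: 100, capacity: 100, contents: ["a.js"] }, // full
  { ip: "192.0.2.2", distanceKm: 50, load: 10, capacity: 100, contents: ["a.js"] },
  { ip: "192.0.2.3", distanceKm: 20, load: 10, capacity: 100, contents: [] },
];

console.log(pickCacheServer(servers, "a.js"));
console.log(pickCacheServer(servers, "b.css"));
```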

2. Summary

1. The essence of CDN is caching, and the Internet spirit at its core is sharing

2. A CDN caches media resources, animated and static images (Flash), HTML, CSS, JS and other content in an IDC closer to the user, so that users share the cached resources and the response time of the site is reduced. An online game accelerator, by comparison, works by building a high-bandwidth machine room and setting up multi-node servers to speed things up for users.

Sixth, other

1. Number of concurrent browser requests

Browsers limit the number of concurrent requests for the same domain name to about six. The reasons are as follows:

From the front end: when the browser opens multiple connections at the same time, it has to start several threads, and threads are not always a lightweight resource because context switches are expensive.

From the back end: even if the browser gave up protecting itself and sent all requests to the server at once, it would probably exceed the server's concurrency threshold and get banned.

2. Why do static resources use a different domain name from pages and data interfaces

From the perspective of cookies, if static resources shared a domain name with the interfaces, every request for a static resource would carry the cookies, which wastes performance. Therefore a separate domain name is generally used for static resources.