Definition: HTTP stands for HyperText Transfer Protocol. Hypertext is text marked up with links to other resources.
HTTP defines how a Web client requests a Web page from a Web server and how the server delivers the page to the client. It uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line (the protocol version plus a success or error code), followed by server information, response headers, and response data.
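The message layout described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper names `build_request` and `parse_response` are made up for this example, the host `example.com` is a placeholder, and the parser expects a well-formed response with a reason phrase.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Serialize a request: request line, headers, blank line, optional body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_response(raw):
    """Split a raw response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/index.html", "example.com"))
    print(parse_response(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>"))
```

The request line carries the method, URL, and protocol version; the status line carries the protocol version and the status code, which matches the order given in the text.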
HTTP is a stateless protocol: the protocol itself does not keep the communication state of a request or response, so neither party can tell the other's current identity or state from the protocol alone. This is one of the main reasons cookies exist. To manage state on the client side, the browser automatically stores cookies according to the Set-Cookie header field in the response sent by the server, and then carries those cookies in every HTTP request it sends. The server uses them to identify the client and its state.
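As a rough illustration of the cookie round trip, the sketch below uses Python's standard `http.cookies` module to turn Set-Cookie values into the Cookie header a client would send back. The function name and the cookie names (`session_id`, `theme`) are hypothetical, and attributes such as Path, Domain, and expiry are parsed but deliberately ignored, which a real browser would not do.

```python
from http.cookies import SimpleCookie


def cookie_header_from_response(set_cookie_values):
    """Collect Set-Cookie values and build the Cookie request header value."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parses name=value plus attributes like Path
    # Only name=value pairs go back to the server; attributes stay client-side.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


if __name__ == "__main__":
    received = ["session_id=abc123; Path=/; HttpOnly", "theme=dark; Path=/"]
    print("Cookie:", cookie_header_from_response(received))
```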
What happens between entering a URL and the page being displayed?
- DNS resolution: resolves the domain name into an IP address;
- TCP connection: the TCP three-way handshake;
- The browser sends the HTTP request;
- The server processes the request and returns an HTTP message;
- The browser parses the response and renders the page;
- Disconnect: the TCP four-way wave.
URI
HTTP uses URIs to locate resources on the Internet. URLs and URNs are both kinds of URI:
- URI (Uniform Resource Identifier)
- URL (Uniform Resource Locator)
- URN (Uniform Resource Name)
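To make the distinction concrete, here is a small sketch that uses the standard `urllib.parse.urlsplit` function to take a URI apart. The helper name `describe_uri` and the sample URIs are assumptions chosen for illustration only.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Break a URI into the components a client would use to locate it."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,   # None for a URN, which names but does not locate
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
    }


if __name__ == "__main__":
    print(describe_uri("https://example.com:8443/docs/index.html?lang=en"))
    print(describe_uri("urn:isbn:0451450523"))
```

A URL carries a host (and optionally a port) telling the client where to connect, while the URN only carries a name under the `urn` scheme.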
Common status codes:
1xx: The request has been received and is being processed (informational status code);
2xx: The request was processed successfully (success status code);
3xx: Additional action is required to complete the request (redirection status code);
4xx: The server cannot process the request (client error status code);
5xx: The server failed to process the request (server error status code).
304: The resource has not been modified since it was cached, so the browser can use its cached copy instead of downloading it again;
400: The request is invalid, typically found during parameter verification: a parameter is missing or has the wrong type;
401: Authentication is required or has failed;
403: Access to the resource is forbidden (this error occurs, for example, if your IP is blacklisted);
502: The gateway received an invalid response from the back-end service, usually because the service is down or under too much load.
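The status code classes follow directly from the first digit, which the following sketch shows. It relies on the standard `http.HTTPStatus` enum for reason phrases; the names `status_class` and `describe` are hypothetical helpers for this example.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a status code to its class using the hundreds digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def describe(code):
    """Return 'code phrase (class)', falling back when the code is unregistered."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"
    return f"{code} {phrase} ({status_class(code)})"


if __name__ == "__main__":
    for code in (304, 400, 401, 403, 502):
        print(describe(code))
```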
Request method:
GET: generally used to obtain server resources. Because browsers and Web servers limit the length of a URL, GET requests have a size limit; the maximum length varies from browser to browser and from Web server to Web server;
POST: generally used to transmit an entity body;
PUT: used to transfer files;
DELETE: used to delete files;
HEAD: used to obtain the message headers without returning the message body;
OPTIONS: used to ask which methods a URI resource supports.
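One way to picture how the methods differ is a toy dispatcher over an in-memory store. Everything here is an assumption made for illustration: the `handle` function, the dictionary-backed store, the chosen status codes, and in particular the POST behavior (appending to the stored body), which real applications define for themselves.

```python
ALLOWED = "GET, HEAD, POST, PUT, DELETE, OPTIONS"


def handle(store, method, path, body=b""):
    """Return (status, headers, body) for a request against an in-memory store."""
    if method == "OPTIONS":
        # Report which methods the resource supports.
        return 204, {"Allow": ALLOWED}, b""
    if method in ("GET", "HEAD"):
        if path not in store:
            return 404, {}, b""
        data = store[path]
        headers = {"Content-Length": str(len(data))}
        # HEAD returns the same headers as GET but no message body.
        return 200, headers, (data if method == "GET" else b"")
    if method == "PUT":
        created = path not in store
        store[path] = body
        return (201 if created else 204), {}, b""
    if method == "POST":
        # Hypothetical semantics: append the entity body to the resource.
        store[path] = store.get(path, b"") + body
        return 200, {}, b""
    if method == "DELETE":
        if path not in store:
            return 404, {}, b""
        del store[path]
        return 204, {}, b""
    return 405, {"Allow": ALLOWED}, b""


if __name__ == "__main__":
    files = {}
    print(handle(files, "PUT", "/a.txt", b"hello"))
    print(handle(files, "HEAD", "/a.txt"))
    print(handle(files, "GET", "/a.txt"))
```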
TCP three-way handshake
- The client sends a packet with the SYN flag to the server (first handshake).
- The server sends a packet with the SYN/ACK flags to the client (second handshake).
- The client sends a packet with the ACK flag to the server (third handshake).
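The three segments can be modeled as plain data to show how the sequence and acknowledgment numbers relate. This is a simulation, not real networking: the function name and the tuple layout are invented for this sketch, and the initial sequence numbers are passed in rather than chosen randomly as a real TCP stack would do.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    # 1. Client proposes its initial sequence number.
    syn = ("client", "SYN", client_isn, None)
    # 2. Server acknowledges client_isn + 1 and proposes its own number.
    syn_ack = ("server", "SYN/ACK", server_isn, (client_isn + 1) % SEQ_SPACE)
    # 3. Client acknowledges server_isn + 1; the connection is established.
    ack = ("client", "ACK", (client_isn + 1) % SEQ_SPACE, (server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends confirm that they can send and receive.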
TCP four-way wave
- The initiator sends the passive party a segment carrying FIN, ACK, and Seq, indicating that it has no more data to transmit, and enters the FIN_WAIT_1 state.
- The passive party sends a segment carrying ACK and Seq to acknowledge the close request. On receiving it, the initiator enters the FIN_WAIT_2 state.
- The passive party sends the initiator a segment carrying FIN, ACK, and Seq to request closing the connection, and enters the LAST_ACK state.
- The initiator sends the passive party a segment carrying ACK and Seq, and then enters the TIME_WAIT state. The passive party closes the connection after receiving this segment. If the initiator receives nothing further during the waiting period, it also closes normally.
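The state changes on each side can be written as a small transition table. The table below follows the standard TCP state diagram, so it includes two details the text does not name: the passive party's CLOSE_WAIT state and the 2MSL timer that ends TIME_WAIT. The event labels and the `run` helper are made up for this sketch.

```python
# (current state, event) -> next state, for a normal close only.
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}

INITIATOR_EVENTS = ["send FIN", "recv ACK", "recv FIN, send ACK", "2MSL timeout"]
PASSIVE_EVENTS = ["recv FIN, send ACK", "send FIN", "recv ACK"]


def run(events, state="ESTABLISHED"):
    """Apply events in order and return every state visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        visited.append(state)
    return visited


if __name__ == "__main__":
    print("initiator:", " -> ".join(run(INITIATOR_EVENTS)))
    print("passive:  ", " -> ".join(run(PASSIVE_EVENTS)))
```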
HTTP cache
Mandatory cache:
- If the requested data is already in the cache database, the client directly obtains the requested data from the cache database. If the requested data is not in the cache database, the client obtains the requested data from the server.
- The server indicates the caching policy through two response headers: Expires and Cache-Control.
Expires: the expiration time of the data, returned by the server. If a later request is made before the expiration time, the cached data is used directly. Pitfall: the server and client clocks may disagree, which leads to errors in cache hits. Expires is also a product of HTTP 1.0, so Cache-Control is now mostly used instead.
Cache-Control: has many directives, and different directives mean different things.
private: only the client can cache;
public: both the client and proxy servers can cache;
max-age=t: the cached content expires after t seconds;
no-cache: the cached data must be validated through the negotiation cache before it is used;
no-store: nothing is cached.
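The directive rules above can be sketched as a freshness check. The function names are hypothetical, only the directives listed in the text are interpreted, and the age of the cached entry is passed in by the caller rather than derived from response headers as a real cache would do.

```python
def parse_cache_control(value):
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or True
    return directives


def can_use_without_revalidation(cache_control, age_seconds):
    """True if a cached copy of this age may be served without asking the server."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if not isinstance(max_age, str) or not max_age.isdigit():
        return False  # no usable lifetime given, so be conservative
    return age_seconds < int(max_age)


if __name__ == "__main__":
    print(parse_cache_control("public, max-age=3600"))
    print(can_use_without_revalidation("public, max-age=3600", age_seconds=120))
    print(can_use_without_revalidation("no-cache", age_seconds=0))
```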
Negotiation cache:
- The client first takes the identifier of the cached data from the cache database, then asks the server whether that identifier is still valid. If it is, the server returns 304, and the client reads the requested data directly from the cache.
- The header fields used by the negotiation cache: Last-Modified and ETag.
Last-Modified: when the server responds to a request, it tells the browser the last modification time of the resource. The browser carries that time in the If-Modified-Since request header when it requests the resource again, and the server compares it with the current last modification time of the resource. If they match, the server returns 304 with the response headers, and the browser only needs to read the data from the cache. Defects: 1. The time has a resolution of one second, so if the resource is modified again within the same second, the cache is still hit. 2. The last modification time may be updated even when the content has not changed, causing a cache miss.
ETag: when the server responds to a request, this field tells the browser a unique identifier for the current resource, generated on the server from the file content. The identifier changes only when the file content changes. The browser sends it back in the If-None-Match request header so the server can compare. Defect: the ETag is computed by an algorithm, which takes up server computing resources. Server resources are precious, so ETag is used less often.
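A server-side validation step might look roughly like the following. This is a sketch under assumptions: `make_etag` derives the identifier from a truncated SHA-256 of the content, which is just one possible choice, `respond` is a hypothetical handler, and the ETag comparison is a plain string match. The If-None-Match header is checked first because it takes precedence over If-Modified-Since when both are present.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_etag(content):
    """Derive a quoted identifier from the content (one possible scheme)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content, mtime, request_headers):
    """Return (status, headers, body), answering 304 when the cache is valid."""
    etag = make_etag(content)
    headers = {"ETag": etag, "Last-Modified": formatdate(mtime, usegmt=True)}
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        not_modified = if_none_match == etag
    elif if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since).timestamp()
        # HTTP dates only have one-second resolution, hence int(mtime).
        not_modified = int(mtime) <= since
    else:
        not_modified = False
    if not_modified:
        return 304, headers, b""
    return 200, headers, content


if __name__ == "__main__":
    status, headers, _ = respond(b"hello", 1_700_000_000, {})
    print(status, headers)
    print(respond(b"hello", 1_700_000_000, {"If-None-Match": headers["ETag"]})[0])
```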
If both caching mechanisms are present, the mandatory (strong) cache takes precedence over the negotiation cache: if the mandatory cache hits, the data in the cache database is used directly and the negotiation cache is not consulted.
HTTP and HTTPS
- HTTPS requires applying for a certificate from a CA (certificate authority). Free certificates are limited, so obtaining one often involves a fee.
- HTTP runs on top of TCP, and all transmitted content is clear text. HTTPS runs on top of SSL/TLS, which in turn runs on top of TCP, and all transmitted content is encrypted.
- HTTP and HTTPS use completely different connection methods and different port numbers: 80 for the former, 443 for the latter.
- HTTP is simple and stateless. HTTPS is built from HTTP + SSL and provides encrypted transmission and identity authentication. It effectively prevents hijacking by network operators and is more secure than HTTP.
Web Security defense:
XSS: cross-site scripting exploits the fact that HTML can execute <script>alert('a')</script>, and tries to inject scripts into the page. There are two kinds of XSS attack:
- The script is injected into the page by modifying the browser URL. Chrome used to ship a built-in defense against this (the XSS Auditor, removed in later versions), so it now also needs manual defense.
- The script code is injected into the database through an input box. Manual defense is required. The whitelist filtering approach of the 'xss' library is recommended.
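As a minimal illustration of manual defense, the sketch below escapes user input before placing it in a page, using Python's standard `html.escape`. Note that this is output escaping, a simpler technique than the whitelist filtering mentioned above, and it is only appropriate when user input should be shown as text rather than as markup. The `render_comment` function is hypothetical.

```python
import html


def render_comment(user_input):
    """Embed user input in a paragraph with HTML special characters escaped."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return "<p>" + html.escape(user_input, quote=True) + "</p>"


if __name__ == "__main__":
    print(render_comment("<script>alert('a')</script>"))
```

After escaping, the browser displays the script text literally instead of executing it.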
CSRF: cross-site request forgery stems from the Web's implicit authentication mechanism. Implicit authentication guarantees that a request comes from a user's browser, but not that the user approved the request. CSRF attacks are usually handled on the server; the most common approach is to require the request to carry verification information, such as a verification code or a token.
- Clickjacking: a visual deception attack in which the attacker embeds the target site in their own web page through an iframe, sets the iframe to transparent, and then lures the user into operating on the page. Defense: this is handled on the back end with an HTTP response header, X-Frame-Options, which has three optional values:
deny: the page may not be displayed in a frame, not even in a page of the same domain;
sameorigin: the page may be displayed in a frame on a page of the same domain;
allow-from uri: the page may be displayed in a frame on a page from the specified origin.
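The two server-side defenses described above, a CSRF token and the X-Frame-Options header, can be sketched together. All names here are hypothetical, and the token scheme (an HMAC of the session identifier under a server secret) is only one of several possible designs. The header helper accepts only DENY and SAMEORIGIN, because current browsers no longer honor allow-from.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-server secret


def issue_csrf_token(session_id):
    """Derive a token bound to the session; embed it in forms sent to the user."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_csrf_token(session_id, token):
    """Accept a state-changing request only if it carries the expected token."""
    expected = issue_csrf_token(session_id)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


def frame_headers(policy="DENY"):
    """Build the anti-clickjacking response header."""
    policy = policy.upper()
    if policy not in ("DENY", "SAMEORIGIN"):
        raise ValueError("unsupported X-Frame-Options policy: " + policy)
    return {"X-Frame-Options": policy}


if __name__ == "__main__":
    token = issue_csrf_token("session-42")
    print(verify_csrf_token("session-42", token))
    print(verify_csrf_token("session-42", "forged"))
    print(frame_headers("sameorigin"))
```

A forged cross-site request is sent with the user's cookies but cannot include the token, so the server can reject it.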
- Man-in-the-middle attack: an attacker establishes a connection with both the server and the client and makes each side believe the connection is secure, while in fact the attacker controls the entire communication. The attacker can not only read the communication between the two sides but also modify it. At its core, a man-in-the-middle attack is a problem of authentication and trust between client and server. Defense: symmetric encryption, asymmetric encryption, and hybrid encryption alone do not effectively prevent man-in-the-middle attacks, because the attacker can intercept the first transfer of the secret key without the client or server knowing. HTTPS is the ultimate means of preventing man-in-the-middle attacks: its certificate mechanism solves the trust problem between client and server and thus effectively prevents them.
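On the client side, the certificate check is what makes the difference, so it is worth confirming that it is switched on. The sketch below uses Python's standard `ssl.create_default_context`, which enables certificate verification and hostname checking by default; the two helper functions are hypothetical names for this example, and no connection is opened.

```python
import ssl


def verified_client_context():
    """Create a TLS context that trusts the system CAs and verifies the server."""
    return ssl.create_default_context()


def summarize(context):
    """Report the two settings that protect against a man in the middle."""
    return {
        "verifies_certificate": context.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": context.check_hostname,
    }


if __name__ == "__main__":
    print(summarize(verified_client_context()))
```

If either setting were disabled, an attacker could present their own certificate and the client would accept it, which is exactly the interception scenario described above.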