I recently read Illustrated HTTP, a good introduction to the protocol. It helped me turn the scattered HTTP knowledge I already had into a more systematic understanding. I recommend the book; if you don't have time for it, this article summarizes it. Here is the summary:
First, note that HTTP is one of the protocols in the TCP/IP protocol family.
TCP/IP protocol family
Layers of the TCP/IP protocol family
Since HTTP belongs to the TCP/IP protocol family, let's look at the family first. TCP/IP is the name for the collection of protocols the Internet is built on. It is divided into layers, from top to bottom:
- Application layer: determines how communication works when application services are provided to users. FTP, DNS, and HTTP all live in this layer.
- Transport layer: handles the transfer of data between two computers connected over the network.
- Network layer: handles packets (the smallest unit of data sent over the network) and decides the path they take to reach the other party.
- Data link layer: the hardware side of the connection, such as network cables, network adapters, and optical fiber.
Layering decouples the parts of network communication, so each layer can focus on its own job.
When the TCP/IP protocol family is used for network communication, data passes through the layers in order: the sender passes it down from the application layer, and the receiver passes it back up to the application layer. For example, Xiao Ming types a URL into the browser and presses Enter. At the application layer, the client creates an HTTP request for the web page. To make transmission easier, the transport layer (TCP) then splits the data received from the application layer (the HTTP request message) into segments, marks each one with a sequence number and port number, and passes them to the network layer. At the network layer (IP), a header carrying the source and destination IP addresses is added, and the packet is handed to the data link layer, which addresses it to the MAC address of the next hop (a complete path usually passes through several relay devices, each with its own MAC address). The server at the receiving end takes the data from the data link layer and passes it up layer by layer until it reaches the application layer; at that point the server has received the request.
As data moves down through each layer, the sender adds that layer's header. The receiver, in turn, strips the corresponding header as the data moves up through each layer.
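The wrapping and unwrapping of headers can be sketched as a toy model with plain strings. The bracketed labels such as "[TCP header]" are illustrative placeholders, not real header formats, and "Ethernet" is assumed here as the data link layer.

```python
# Toy model: each layer's "header" is just a labelled string.
LAYERS = ["HTTP", "TCP", "IP", "Ethernet"]  # application, transport, network, data link


def encapsulate(payload: str) -> str:
    """Sender: go down the stack, prepending one header per layer."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer} header]" + frame
    return frame


def decapsulate(frame: str) -> str:
    """Receiver: go up the stack, stripping the outermost header first."""
    for layer in reversed(LAYERS):
        header = f"[{layer} header]"
        if not frame.startswith(header):
            raise ValueError(f"expected {header}")
        frame = frame[len(header):]
    return frame
```

The outermost header belongs to the lowest layer, which is why the receiver removes headers in the reverse order the sender added them.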
Protocols in network transport
IP: responsible for delivering packets
The Internet Protocol (IP) is responsible for delivering packets to the other party at the network layer. Two things are essential for that delivery: the IP address and the MAC (Media Access Control) address. The IP address is the address assigned to a node, while the MAC address is the fixed hardware address of the network interface card (NIC).
Communication by IP address relies on MAC addresses. On the Internet it is rare for both parties to sit on the same LAN; usually the data passes through several computers and network devices. At each hop, the MAC address of the next relay device is used to decide where to send the data next. This is where the Address Resolution Protocol (ARP) comes in: ARP resolves addresses, finding the MAC address that corresponds to a given IP address.
In this way, the network layer determines the path that packets take.
TCP: ensuring reliability
TCP sits in the transport layer and provides a reliable byte-stream service. To deliver data accurately, TCP first sets up a connection with the three-way handshake (SYN, SYN/ACK, ACK). After sending a packet, TCP does not simply forget about it; it must confirm that the packet was successfully delivered to the other party.
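The three-way handshake can be modelled as three segments whose sequence and acknowledgement numbers line up. This is a toy simulation only (in practice the operating system kernel does this), and the `Segment` class and flag strings are hypothetical names chosen for the sketch.

```python
import random
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str    # "SYN", "SYN/ACK" or "ACK" in this toy model
    seq: int      # sequence number of this segment
    ack: int = 0  # next sequence number expected from the peer


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three segments exchanged when a TCP connection is opened."""
    client_isn = random.randrange(MOD) if client_isn is None else client_isn
    server_isn = random.randrange(MOD) if server_isn is None else server_isn
    # 1. Client -> server: "I want to connect, my numbering starts at client_isn"
    syn = Segment("SYN", seq=client_isn)
    # 2. Server -> client: acknowledges the SYN (a SYN consumes one sequence number)
    syn_ack = Segment("SYN/ACK", seq=server_isn, ack=(syn.seq + 1) % MOD)
    # 3. Client -> server: acknowledges the server's SYN; the connection is established
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD)
    return [syn, syn_ack, ack]
```

For example, `three_way_handshake(100, 300)` yields a SYN with seq 100, a SYN/ACK acknowledging 101, and an ACK with seq 101 acknowledging 301.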
DNS protocol responsible for domain name resolution
The Domain Name System (DNS) is, like HTTP, an application-layer protocol. It resolves domain names to IP addresses. Users usually reach other computers by host name or domain name rather than by IP address, because names are easier for people to remember.
How IP, TCP, and DNS work together with HTTP
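The three protocols come together in a single request: name resolution turns the host name into an IP address, TCP (over IP) provides the connection, and HTTP is the text sent over it. A minimal sketch with Python's standard `socket` module, assuming a plain HTTP server on the given host and port; `fetch_status_line` is a hypothetical helper name.

```python
import socket


def fetch_status_line(host: str, port: int = 80, path: str = "/"):
    """Resolve host, open a TCP connection, send one HTTP GET, return (ip, status line)."""
    # 1. Name resolution (DNS for real domain names): host name -> IP address
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    # 2. TCP over IP: the OS performs the three-way handshake inside connect()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)
        sock.connect(addr)
        # 3. HTTP: a plain-text request written to the TCP byte stream
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n"
                   "\r\n")
        sock.sendall(request.encode("ascii"))
        # Read until the server closes the connection
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    status_line = b"".join(chunks).split(b"\r\n", 1)[0]
    return addr[0], status_line.decode("ascii")
```

Calling `fetch_status_line("example.com")` would need network access; it returns the resolved IP address together with the first line of the server's response.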
The URI and URL
A Uniform Resource Identifier (URI) identifies a resource on the Internet with a string. A Uniform Resource Locator (URL) indicates where a resource is located on the Internet. URLs are a subset of URIs.
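Python's standard `urllib.parse.urlsplit` can break a URI into its parts, which makes the URI/URL distinction concrete. In this sketch the example URIs are illustrative; the URN identifies a book by ISBN but says nothing about where to fetch it, so it is a URI that is not a URL. The `describe` helper is a hypothetical name.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Split a URI into its components using the standard library."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,    # None when the URI names no network location
        "port": parts.port,        # None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# A URL: scheme, credentials, host, port, path, query and fragment
url_parts = describe("http://user:pass@www.example.jp:80/dir/index.htm?uid=1#ch1")
# A URN: still a URI, but it identifies rather than locates (host is None)
urn_parts = describe("urn:isbn:0451450523")
```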
HTTP: a simple protocol
HTTP is used for communication between the client and the server
The end that requests a resource such as text or an image is called the client, and the end that responds with the resource is called the server. When two computers communicate over HTTP, one end of the line must be the client and the other the server. Under HTTP, the client always sends the request first, and the server returns a response to it.
HTTP does not save state
HTTP is a stateless protocol: it does not keep any record of the communication between earlier requests and responses.
Locating resources with request URIs
HTTP uses URIs to locate resources, so a resource anywhere on the Internet can be accessed.
Telling the server the intent of a request with a method (GET, POST, and others)
- The GET method is used to request access to a resource identified by a URI.
- The POST method is used to transfer an entity body. It is suitable for sending information to the server.
- The PUT method is used to transfer a file. Because the PUT method in HTTP/1.1 has no authentication mechanism of its own, anyone could upload files, which is a security risk, so ordinary websites generally do not use it.
- The HEAD method is like GET, except that the response does not include the message body. It is used to check whether a URI is valid and when a resource was last updated.
Persistent connections
With a persistent connection, the TCP connection stays open as long as neither end explicitly closes it. This removes the overhead of repeatedly setting up and tearing down TCP connections and reduces the load on the server. The time saved also lets HTTP requests and responses finish sooner, so web pages display faster.
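A persistent connection can be observed with Python's standard library: a tiny local HTTP/1.1 server records the client's port for every request, and two requests are sent through one `http.client.HTTPConnection`. If both requests arrive from the same port, they shared one TCP connection. The handler class and the `/a`, `/b` paths are hypothetical, chosen for the sketch.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_client_ports = []  # client port observed by the server for each request


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = b"hello"
        self.send_response(200)
        # Content-Length tells the client where this response ends,
        # so the same connection can carry the next request
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo_persistent_connection():
    seen_client_ports.clear()
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        for path in ("/a", "/b"):
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # the body must be read fully before the connection is reused
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return list(seen_client_ports)
```

Running `demo_persistent_connection()` should report the same client port twice, meaning only one TCP handshake was needed for both requests.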
Using cookies to manage state
HTTP is stateless and does not keep track of earlier requests and responses, so a request cannot be handled based on what happened before. Cookie technology works around this by writing cookie information into request and response messages to manage the client's state.
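The cookie round trip can be sketched with Python's standard `http.cookies.SimpleCookie`: the server puts a value in a Set-Cookie response header, and the client sends it back in the Cookie header of later requests. The cookie name `sid` (a session ID) and its value are hypothetical.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name: str, value: str) -> str:
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True  # page scripts cannot read this cookie
    return cookie[name].OutputString()


def cookie_header_from(set_cookie_value: str) -> str:
    """Client side: build the Cookie request header from a stored Set-Cookie value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs go back to the server; attributes such as Path stay on the client
    return "; ".join(f"{m.key}={m.value}" for m in jar.values())
```

The server would send `make_set_cookie("sid", "abc123")` in a Set-Cookie header; on the next request the client sends `Cookie: sid=abc123`, and the server can use that ID to look up what it knows about this client, even though HTTP itself kept no state.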
HTTP messages
The information exchanged over HTTP is called an HTTP message. A message sent by the requesting end (the client) is a request message, and one sent by the responding end (the server) is a response message. The message itself is text made up of multiple lines separated by CR+LF.
The request message
A request message consists of: 1. the request method, 2. the request URI, 3. the protocol version, 4. optional request header fields (general headers shared by requests and responses, request headers, and entity headers), and 5. the entity body.
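The layout of a request message can be sketched by assembling one by hand: a request line (method, URI, version), header fields, an empty line, then the body. The host `hackr.jp`, the `/form/entry` path and the form body are hypothetical example values, and `build_request` is a hypothetical helper.

```python
def build_request(method: str, uri: str, headers: dict,
                  body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Assemble a raw HTTP request message."""
    lines = [f"{method} {uri} {version}"]  # request line: method, URI, protocol version
    headers = dict(headers)
    if body:
        headers.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # An empty line (CRLF CRLF) separates the header fields from the entity body
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


form_post = build_request(
    "POST", "/form/entry",
    {"Host": "hackr.jp", "Content-Type": "application/x-www-form-urlencoded"},
    b"name=ueno&age=37",
)
```

A GET request usually has no body, so it ends right after the blank line.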
The response message
A response message consists of: 1. the protocol version, 2. the status code (a numeric code indicating whether the request succeeded or failed), 3. the reason phrase explaining the status code, 4. optional response header fields, and 5. the entity body.
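Going the other way, a response message can be split back into its five parts. This is a simplified sketch (a hypothetical `parse_response`) that assumes the whole message is already in memory and ignores chunked transfer encoding and repeated header fields.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the header section
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)           # e.g. "HTTP/1.1", "200", "OK"
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase may be empty
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return version, code, reason, headers, body
```

For example, parsing `b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"` gives the version `HTTP/1.1`, code `200`, reason `OK`, one header and the body `b"hello"`.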
HTTP status codes
2xx indicates the request was processed successfully
- 200 OK: the server successfully processed the client's request.
- 204 No Content: the server successfully processed the request, but the response contains no entity body.
- 206 Partial Content: the client made a range request, and the server successfully fulfilled that part of the GET request. The response contains the portion of the entity specified by Content-Range.
3xx indicates redirection
- 301 Moved Permanently: the requested resource has been assigned a new URI, and future requests should use the new URI.
- 302 Found (temporary redirect): the requested resource has temporarily been assigned a new URI, and the user is expected to use that URI for this request. The original specifications did not allow a POST to be changed to a GET when following a 302, but in practice browsers usually do so.
- 303 See Other: a temporary redirect; the resource should be retrieved from the new URI with GET.
- 304 Not Modified: although it is in the 3xx range, it has nothing to do with redirection. The server returns it for a conditional GET (for example one with If-None-Match or If-Modified-Since) when the resource has not changed, telling the client to use its cached copy.
- 307 Temporary Redirect: like 302, but the client must not change POST to GET.
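Two behaviours from the 3xx list can be sketched in a few lines: which method a client uses when it follows a redirect, and how a server decides to answer a conditional GET with 304. Both functions are hypothetical helpers; the redirect rule mirrors common browser behaviour, and the 304 logic assumes the resource is versioned with an ETag (a header not discussed above), using plain string comparison and ignoring weak `W/` tags.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Method a client uses when following a redirect (common browser behaviour)."""
    if status == 303 and method != "HEAD":
        return "GET"   # 303: fetch the new URI with GET
    if status in (301, 302) and method == "POST":
        return "GET"   # what browsers usually do in practice
    return method      # 307 (and others): keep the original method


def conditional_get(request_headers: dict, current_etag: str, body: bytes):
    """Server side of a conditional GET; returns (status, headers, body)."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in client_tags or current_etag in client_tags:
            # The client's cached copy is still current: send no body
            return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag, "Content-Length": str(len(body))}, body
```

A 304 response saves bandwidth because the unchanged body is never re-sent; the client reuses the copy it already has.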
4xx indicates a client error
- 400 Bad Request: the request message contains a syntax error.
- 401 Unauthorized: the request requires HTTP authentication, or authentication failed.
- 403 Forbidden: the server refuses access to the resource, for example because the client lacks permission.
- 404 Not Found: the requested resource could not be found on the server.
5xx indicates a server error
- 500 Internal Server Error: an error (such as a bug) occurred on the server while processing the request.
- 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot handle requests.
Status codes do not always match the actual outcome
Many responses carry the wrong status code without the user noticing. For example, it is not uncommon for a web application to return 200 OK even after an internal error has occurred.
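Python's standard `http.HTTPStatus` enum already knows the registered codes and their reason phrases, so a small helper can label any code with its class. The `describe_status` helper and class labels are hypothetical names for this sketch.

```python
from http import HTTPStatus

CLASSES = {2: "Success", 3: "Redirection", 4: "Client error", 5: "Server error"}


def describe_status(code: int) -> str:
    """Return 'code phrase (class)'; raises ValueError for unregistered codes."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES.get(code // 100, 'Other')})"
```

For example, `describe_status(204)` returns `"204 No Content (Success)"` and `describe_status(503)` returns `"503 Service Unavailable (Server error)"`. As noted above, the code a server sends still has to match what actually happened for such labels to be meaningful.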
Web servers that work with HTTP
One server can host multiple domain names (virtual hosting)
A single server can host multiple domain names through virtual hosting. DNS resolves each domain name to an IP address, and what the client actually connects to is that address, so all the domain names hosted on one server share the same IP address.
Because sites with different host names and domain names can sit behind the same IP address, an HTTP request must include the full host name or domain name in the Host header field so the server knows which site is meant.
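On the server side, virtual hosting boils down to choosing a site by the Host header. A minimal sketch; the host names and document-root paths in `SITES` are hypothetical, the header lookup is case-sensitive for simplicity, and IPv6 literal hosts are not handled.

```python
# Hypothetical host names served from one IP address, each with its own document root
SITES = {
    "www.example.com": "/srv/example",
    "www.example.org": "/srv/example-org",
}


def route(request_headers: dict) -> str:
    """Pick the document root for a request based on its Host header."""
    host = request_headers.get("Host", "")
    host = host.split(":", 1)[0].lower()  # drop an optional :port; host names are case-insensitive
    try:
        return SITES[host]
    except KeyError:
        raise LookupError(f"no site configured for Host {host!r}") from None
```

Without the Host header, both requests would look identical to the server, since they arrive at the same IP address and port.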
Programs that forward communication data
Proxy
A proxy is a forwarding application that acts as a “middleman” between the server and the client, receiving requests sent by the client and forwarding them to the server, and receiving responses returned by the server and forwarding them to the client.
A proxy forwards the request toward the origin server that holds the resource without changing the request URI. Multiple proxy servers can be chained, and each one that forwards a message adds a Via header field recording the host it passed through. Reasons to use a proxy server include:
- Using a caching proxy server to reduce network bandwidth by cutting down on requests to the origin. When a caching proxy forwards a response, it stores a copy of the resource on the proxy server. When the proxy later receives a request for the same resource, it can return the cached copy as the response instead of fetching the resource from the origin server again. Cached copies have an expiration time; once it passes, the resource is fetched from the origin server again.
- Controlling access to specific URIs by setting up a proxy server within an organization.
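The caching behaviour described above can be sketched as a small class: a cache hit is served locally until it expires, and a miss or an expired entry goes to the origin server. `CachingProxy` is a hypothetical name, the origin server is modelled as a plain function, and the single fixed time-to-live is a simplifying assumption (real proxies derive expiry from response headers).

```python
import time


class CachingProxy:
    """Toy caching proxy: serves a stored copy until it expires."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable(uri) -> response body
        self.ttl = ttl_seconds
        self.clock = clock                          # injectable for testing
        self.cache = {}                             # uri -> (expires_at, body)

    def get(self, uri):
        now = self.clock()
        entry = self.cache.get(uri)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit: the origin server is not contacted
        body = self.fetch_from_origin(uri)  # miss or expired: ask the origin
        self.cache[uri] = (now + self.ttl, body)
        return body
```

With a 60-second lifetime, two requests for the same URI within a minute reach the origin only once; a request after the copy expires triggers a fresh fetch.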
Gateway
A gateway is a server that relays communication data for other servers. When it receives a request from a client, it handles the request as if it were the origin server holding the resource, and the client may not even realize it is talking to a gateway. A gateway lets a server on the line translate HTTP requests into other protocols; for example, a gateway can connect to a database and query it with SQL. Gateways can also improve security, because the line between the client and the gateway can be encrypted.
HTTP + encryption + authentication + integrity protection = HTTPS
- HTTPS is often used for web logins and shopping-cart checkout.
- When HTTPS is used, the browser shows a small lock icon next to the URL (the exact indicator varies by browser).
- Normally, HTTP talks directly to TCP. With SSL, HTTP talks to SSL first, and SSL talks to TCP. In short, HTTPS is HTTP wrapped in an SSL shell. By using SSL, HTTP gains the encryption, certificate-based authentication, and integrity protection that make up HTTPS.
- HTTPS is slower than HTTP (the book puts it at 2 to 100 times slower), for two reasons: the communication itself is slower because of the extra SSL traffic, and encryption consumes CPU and memory, which slows down processing.
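The "HTTP in an SSL shell" layering shows up directly in code: first a TCP connection, then TLS (the successor to SSL) wrapped around it, then the ordinary HTTP text sent inside. A minimal sketch with Python's standard `socket` and `ssl` modules; `https_status_line` is a hypothetical helper and needs network access to a real HTTPS server.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    # The default context verifies the server certificate chain and host name
    return ssl.create_default_context()


def https_status_line(host: str, port: int = 443) -> str:
    """Send one GET over TLS and return the response's status line."""
    ctx = make_tls_context()
    with socket.create_connection((host, port), timeout=5) as tcp:  # 1. plain TCP
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:    # 2. TLS on top of TCP
            request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))                   # 3. HTTP inside TLS
            with tls.makefile("rb") as response:
                return response.readline().decode("ascii").rstrip("\r\n")
```

The HTTP request text is the same as in plain HTTP; only the transport underneath it changes, which is where both the extra security and the extra cost come from.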
Why not use HTTPS all the time
If HTTPS is so secure, why don't all websites use it all the time? One reason is that encrypted communication consumes more CPU and memory than plaintext communication. Encrypting every exchange uses a considerable amount of resources and reduces the number of requests a single machine can handle.
Web attacks
Active attacks
An active attack is one in which the attacker directly accesses a web application and sends attack code, so the attacker must be able to reach the web server's resources. Examples include SQL injection and OS command injection.
SQL injection attacks the database behind a web application by making the application run malicious SQL. This vulnerability can be very serious and can directly lead to leaks of personal or confidential information.
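The difference between a vulnerable query and a safe one can be shown with Python's built-in `sqlite3` and an in-memory database. The `users` table and its rows are made-up sample data. The safe version uses a parameterized query, a standard countermeasure in which the input is passed as data and never parsed as SQL.

```python
import sqlite3


def setup():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [("alice", "a-secret"), ("bob", "b-secret")])
    return db


def find_unsafe(db, name):
    # VULNERABLE: user input is pasted into the SQL text
    return db.execute(f"SELECT secret FROM users WHERE name = '{name}'").fetchall()


def find_safe(db, name):
    # Parameterized query: the driver passes name as a value, not as SQL
    return db.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe query becomes `... WHERE name = 'x' OR '1'='1'` and returns every user's secret, while the safe query simply finds no user with that odd name.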
OS command injection executes unauthorized operating-system commands through a web application. Anywhere the application calls shell functions, this attack is possible.
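One common defence is to avoid the shell entirely and pass arguments as a list, so shell metacharacters in user input are never interpreted. A minimal sketch with Python's `subprocess`; to stay portable it uses the Python interpreter itself as a hypothetical stand-in for a command like `echo`, and `run_without_shell` is a hypothetical helper.

```python
import subprocess
import sys

# Stand-in child program: prints its first argument back unchanged
ECHO_ARG = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def run_without_shell(user_input: str) -> str:
    # No shell is started, so ";", "&&" or "|" in user_input stay plain text
    # instead of starting additional commands.
    result = subprocess.run(ECHO_ARG + [user_input],
                            capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")

# Vulnerable pattern, shown only for contrast (do not do this):
#   subprocess.run(f"echo {user_input}", shell=True)
# With user_input = "x; cat /etc/passwd", the shell would also run the second command.
```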
Passive attacks
A passive attack does not target the web application directly; instead, it uses a prepared trap to trick the user into triggering the attack. Examples include cross-site scripting (XSS) and cross-site request forgery (CSRF). Passive attacks can even reach corporate networks that are not directly accessible from the Internet: once a user falls into the trap, anything on the intranet that the user can access is within the attacker's reach.
Cross-site scripting (XSS)
XSS is a passive attack triggered through a trap that the attacker sets up in advance. Such traps can be used, for example:
- To trick users into entering personal information into a fake input form
- To steal users' cookies
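A basic XSS countermeasure is to escape user-supplied text before putting it into HTML, so the browser displays it instead of running it. A minimal sketch with Python's standard `html.escape`; `render_comment` is a hypothetical helper. Marking session cookies HttpOnly, which keeps them out of reach of `document.cookie`, additionally limits cookie theft.

```python
import html


def render_comment(user_comment: str) -> str:
    # Escape &, <, > and quotes so injected markup is shown as text
    return f"<p>{html.escape(user_comment, quote=True)}</p>"
```

For example, `render_comment("<script>alert(document.cookie)</script>")` returns `<p>&lt;script&gt;alert(document.cookie)&lt;/script&gt;</p>`, which the browser displays literally rather than executing.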
Cross-site request forgery (CSRF)
A CSRF attack is a passive attack in which the attacker uses a trap to force an already authenticated user to make unintended state changes, such as updating personal information or settings.
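A widely used countermeasure (not described in the text above) is a per-session anti-CSRF token: the legitimate form carries a random secret that a forged request from another site cannot know. A minimal sketch using Python's `secrets` and `hmac` modules; the `session` dict is a hypothetical stand-in for server-side session storage, and the helper names are hypothetical.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Create a random token, remember it in the user's session, return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token  # e.g. placed in a hidden <input> of the legitimate form


def is_valid_csrf(session: dict, submitted) -> bool:
    """Accept a state-changing request only if it echoes this session's token."""
    expected = session.get("csrf_token")
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(expected.encode(), str(submitted).encode())
```

The browser attaches the victim's cookies to a forged request automatically, but the attacker's page cannot read the token, so the forged request fails the check.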