This is my first blog post since joining the industry, and it is mainly about HTTP. I write to record how my thinking and problem solving grow over time: when you look back at your old writing much later, you notice the small changes and improvements. Blogging is like planting a seed. As long as you remember to water, fertilize, weed, and loosen the soil, there will be a harvest in the fall, so keep updating 😋

1. Understand Web and network basics

1.1 TCP/IP Layering

Layering is a central feature of the TCP/IP protocol family, which is divided into four layers: the application layer, transport layer, network layer, and data link layer. Commonly used networks, including the Internet, operate on the basis of the TCP/IP protocol family, and HTTP is one member of that family.

  1. Application layer: The TCP/IP protocol family provides various common application services at this layer, such as FTP and DNS. HTTP also belongs to this layer.
  2. Transport layer: Serves the application layer above it by providing data transfer between two computers on a network connection. It has two protocols with different properties: TCP and UDP.
  3. Network layer (internetwork layer): Handles the packets that flow over the network. A packet is the smallest unit of data transmitted over a network. This layer decides the route by which a packet reaches the other party's computer. When the two computers communicate through several intermediate computers or network devices, the network layer's job is to choose one route among the many available.
  4. Link layer (network interface layer): Handles the hardware that connects to the network.

When the TCP/IP protocol family is used for network communication, the two sides communicate by passing through the layers in order. The sender works down from the application layer, and the receiver works up toward the application layer.

Taking HTTP as an example, a complete communication process:

  1. First, the client, as the sender, issues an HTTP request at the application layer (HTTP) to view a Web page.
  2. To make transmission easier, the transport layer (TCP) splits the data received from the application layer (the HTTP request message) into segments, marks each one with a sequence number and port numbers, and forwards them to the network layer.
  3. The network layer (IP) adds the destination IP address and, once the MAC address of the next hop has been determined, passes the packet to the link layer. The request is now ready to be sent over the network.
  4. The server, as the receiver, receives the data at the link layer and passes it up through the layers in order until it reaches the application layer. Only when the data reaches the application layer has the client's HTTP request truly been received.

When data passes from layer to layer on the sending side, each layer prepends its own header. On the receiving side, each layer strips the corresponding header as the data passes up.

This practice of wrapping data with header information is called encapsulation.
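
To make the wrapping and unwrapping concrete, here is a toy sketch in Python. The bracketed header strings are placeholders I made up for illustration, not real wire formats; the point is only the order in which headers are added and removed.

```python
# Toy model of encapsulation: each layer prepends its own header on the
# way down and strips it again on the way up. The header strings are
# placeholders, not real wire formats.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, link


def encapsulate(http_message: str) -> str:
    data = http_message
    for layer in LAYERS:
        data = f"[{layer}]" + data
    return data


def decapsulate(frame: str) -> str:
    data = frame
    for layer in reversed(LAYERS):
        prefix = f"[{layer}]"
        if not data.startswith(prefix):
            raise ValueError(f"missing {layer} header")
        data = data[len(prefix):]
    return data


frame = encapsulate("GET / HTTP/1.1")
print(frame)               # [ETH][IP][TCP]GET / HTTP/1.1
print(decapsulate(frame))  # GET / HTTP/1.1
```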

1.2 Three-way handshake

TCP uses a three-way handshake to deliver data reliably to the destination. After sending a packet, TCP does not simply forget about it; it confirms whether the packet was successfully delivered to the other party. The handshake uses the TCP flags SYN (synchronize) and ACK (acknowledgement).

Note: If the handshake is interrupted at some stage, TCP resends the same packets in the same order.

  1. The sender first sends a packet with the SYN flag to the peer.
  2. After receiving it, the receiver sends back a packet with the SYN/ACK flags to confirm receipt.
  3. Finally, the sender sends back a packet with the ACK flag, which completes the handshake.
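
The three steps can be walked through with a small simulation. This is only a sketch: the sequence numbers are not discussed above, and the initial values 100 and 300 are arbitrary choices for illustration. It relies on the fact that each side acknowledges the other's initial sequence number plus one.

```python
# Toy walk-through of the three-way handshake. The initial sequence
# numbers are arbitrary illustration values, not anything TCP mandates.
def three_way_handshake(client_isn: int, server_isn: int):
    steps = []
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    steps.append(("client", "SYN", client_isn, None))
    # 2. Server -> client: SYN/ACK, acknowledging client_isn + 1.
    steps.append(("server", "SYN/ACK", server_isn, client_isn + 1))
    # 3. Client -> server: ACK, acknowledging server_isn + 1.
    steps.append(("client", "ACK", client_isn + 1, server_isn + 1))
    return steps


for sender, flags, seq, ack in three_way_handshake(100, 300):
    print(f"{sender:6} {flags:7} seq={seq} ack={ack}")
```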

1.3 DNS Service for Domain name Resolution

Like HTTP, the Domain Name System (DNS) is an application-layer protocol. It resolves domain names to IP addresses.

1.4 Relationship between various protocols and HTTP

The following figure shows the roles of IP, TCP, and DNS in HTTP communication.

1.5 URI and URL

Absolute URI format:

A scheme such as http: or https: names the protocol used to access the resource. The scheme is case insensitive and is followed by a colon (:). Schemes such as data: and javascript: can also be used to specify data or a script.

  1. Login information (authentication): Specifies the user name and password needed to obtain the resource from the server. This item is optional.
  2. Server address: An absolute URI must specify the address of the server to be accessed. The address can be a DNS-resolvable name such as hackr.jp, an IPv4 address such as 192.168.1.1, or an IPv6 address enclosed in square brackets such as [0:0:0:0:0:0:0:1].
  3. Server port number: Specifies the network port on the server to connect to. This item is optional; if omitted, the default port for the scheme is used.
  4. Hierarchical file path: Specifies the file path on the server that locates the resource. This is similar to the directory structure on UNIX systems.
  5. Query string: Passes arbitrary parameters to the resource at the given file path. This item is optional.
  6. Fragment identifier: Usually marks a sub-resource within the retrieved resource (a location inside a document). The RFC does not specify exactly how it is to be used. This item is also optional.
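
As a quick check of these components, Python's standard urllib.parse module can split a URI into the same pieces. The URI below is a made-up example (user, pass, www.example.com, and the path are all hypothetical) chosen so that every optional part is present.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URI that uses every optional component.
uri = "http://user:pass@www.example.com:8080/dir/index.htm?uid=1#ch1"
parts = urlsplit(uri)

print(parts.scheme)            # http
print(parts.username)          # user
print(parts.password)          # pass
print(parts.hostname)          # www.example.com
print(parts.port)              # 8080
print(parts.path)              # /dir/index.htm
print(parse_qs(parts.query))   # {'uid': ['1']}
print(parts.fragment)          # ch1
```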

2. Simple HTTP protocol

2.1 Communication is achieved through the exchange of requests and responses

According to the HTTP protocol, the client sends a request and the server returns a response to it. In other words, communication always starts with the client; the server never sends a response before it has received a request. Consider a request for the /index.htm page resource on an HTTP server, whose first line reads GET /index.htm HTTP/1.1.

The GET at the start of the request line indicates the type of request made to the server and is called the method. The string /index.htm that follows identifies the requested resource and is known as the request URI (request-URI). The final HTTP/1.1 is the HTTP version number, which tells the server which HTTP features the client uses.

A request message consists of the request method, request URI, protocol version, optional request header fields, and an entity body.

A response message consists of the protocol version, the status code, a reason phrase that explains the status code, optional response header fields, and the entity body.
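
The structure of a request message can be illustrated with a minimal parser. This is a sketch that assumes a well-formed message and ignores edge cases such as repeated header fields; the path /form/entry and the form body are hypothetical values chosen for the example.

```python
# Minimal parser for an HTTP/1.1 request message. It assumes a
# well-formed message and ignores edge cases such as repeated headers.
def parse_request(raw: str):
    head, _, body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, request_uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, request_uri, version, headers, body


# Hypothetical request: the path and the form fields are made up.
raw = (
    "POST /form/entry HTTP/1.1\r\n"
    "Host: hackr.jp\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 16\r\n"
    "\r\n"
    "name=ueno&age=37"
)

method, request_uri, version, headers, body = parse_request(raw)
print(method, request_uri, version)  # POST /form/entry HTTP/1.1
print(headers["Host"])               # hackr.jp
print(body)                          # name=ueno&age=37
```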

2.2 HTTP is a protocol that does not save state

HTTP is a stateless protocol: the protocol itself does not keep the state of communication between requests and responses. That is, at the HTTP level, nothing persists about requests or responses that have already been sent. Although HTTP/1.1 is stateless, cookies were introduced to provide state retention where it is wanted. With cookies on top of HTTP, state can be managed.

2.3 HTTP methods that tell the server the intent

  1. GET: Obtain a resource. The GET method requests the resource identified by the request URI. The server processes the specified resource and returns the result as the response. If the requested resource is text, it is returned as is.

  2. POST: Transfer an entity body. Although GET can also carry an entity body, it is generally not used for that; POST is used instead. POST works much like GET, but its main purpose is not to obtain the response body.

  3. PUT: Transfer a file. As with uploading a file over FTP, the file content is placed in the body of the request message and saved at the location given by the request URI.

  4. HEAD: Obtain the message header. The HEAD method is the same as GET except that the response carries no message body. It is used to check that a URI is valid and to find out when a resource was last updated.

  5. DELETE: Delete a file. DELETE is the opposite of PUT: it deletes the resource specified by the request URI.

  6. OPTIONS: Ask which methods are supported. The OPTIONS method queries the methods supported for the resource specified by the request URI.

  7. TRACE: Trace the path. The TRACE method makes the Web server loop the request it received back to the client. When sending the request, the client puts a number in the Max-Forwards header field. Each server the request passes through decrements the number by one, and the server at which it reaches zero stops forwarding and responds. This lets the client see how its outgoing request was modified or tampered with along the way: a request bound for the origin server may be relayed through proxies, and TRACE is used to confirm what happened during that relay.

  8. CONNECT: Ask a proxy for a tunnel. The CONNECT method asks a proxy server to establish a tunnel and then carries TCP communication through it. It is mainly used with SSL (Secure Sockets Layer) and TLS (Transport Layer Security), which encrypt the communication before it travels through the tunnel.

2.4 Using the methods

Methods supported by HTTP/1.0 and HTTP/1.1. Tip: method names are case sensitive and must be written in uppercase.

2.5 Persistent connections save traffic

Keep-alive

Feature: As long as neither end explicitly disconnects, the TCP connection stays open.

Advantages: This reduces the overhead of repeatedly establishing and tearing down TCP connections and lightens the load on the server. The time saved also lets HTTP requests and responses finish earlier, so Web pages display faster.

Pipelining

Feature: Persistent connections make it possible to pipeline requests. Previously, after sending a request the client had to wait for its response before sending the next request.

Advantages: With pipelining, the next request can be sent without waiting for the previous response, so several requests can be sent back to back instead of one response at a time.

For example, when requesting an HTML page that contains 10 images, a persistent connection finishes sooner than opening a new connection for each request, and pipelining finishes sooner than a plain persistent connection. The more requests there are, the larger the time difference becomes.

2.6 Using cookies for state management

HTTP is a stateless protocol: it does not manage the state of earlier requests and responses, so a request cannot be processed on the basis of what happened before.

Without state, a Web page that requires login could not remember that you are logged in. You would have to log in again on every page, or add parameters to every request to carry the login state.

Cookies work as follows. The server sends a Set-Cookie header field in its response, telling the client to save the cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request.
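
The round trip can be sketched with Python's standard http.cookies module. The cookie name sid and its value are hypothetical, and a real browser also applies rules about domain, path, and expiry that this sketch leaves out.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header (name and value are examples).
jar = SimpleCookie()
jar["sid"] = "1342077140226724"
jar["sid"]["path"] = "/"
set_cookie_header = jar.output()  # a "Set-Cookie: ..." header line

# Client side: remember the cookie, then echo it back on the next request.
received = SimpleCookie()
received.load(set_cookie_header.split(":", 1)[1].strip())
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in received.items()
)

print(set_cookie_header)
print(cookie_header)  # Cookie: sid=1342077140226724
```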

3. HTTP information in HTTP messages

3.1 Encoding improves the transmission rate

HTTP can transmit data as it is, but it can also encode the data in transit to improve the transmission rate.

3.2 Content encoding for compressed transmission

When attaching a file to an email, we often ZIP the file first to reduce the size of the message. The HTTP feature called content encoding does something similar.

Content encoding specifies the encoding applied to the entity content, and the entity is transmitted in that compressed form. The client receives the encoded entity and decodes it.

Commonly used content encodings are:

  • gzip (GNU zip)
  • compress (the standard UNIX compression)
  • deflate (zlib)
  • identity (no encoding)
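
Content encoding with gzip can be tried out with Python's standard gzip module. This is a sketch of both ends in one script: the body is a made-up repetitive HTML snippet (repetitive data compresses well), and the headers dictionary stands in for a real response.

```python
import gzip

# Made-up entity body; repetitive content compresses well.
body = b"<html>" + b"hello " * 200 + b"</html>"

# Server side: compress the entity body and label it.
encoded = gzip.compress(body)
response_headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(encoded)),
}

# Client side: undo the encoding named in Content-Encoding.
if response_headers["Content-Encoding"] == "gzip":
    decoded = gzip.decompress(encoded)
else:
    decoded = encoded

print(len(body), "->", len(encoded))
```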

3.3 Chunked transfer encoding for sending in pieces

Chunked transfer encoding: During HTTP communication, the browser cannot display the requested page until the entire encoded entity has been transferred. When a large amount of data is transferred, splitting it into chunks lets the browser display the page progressively.
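
On the wire, each chunk is sent as its size in hexadecimal, a CRLF, the chunk data, and another CRLF, and a chunk of size 0 marks the end. The sketch below encodes and decodes that basic format; it deliberately does not handle chunk extensions or trailers.

```python
def encode_chunked(pieces):
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for piece in pieces:
        if piece:  # a zero-length chunk would terminate the body early
            out += f"{len(piece):X}".encode("ascii") + b"\r\n" + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # last chunk (size 0) and final blank line


def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked body; chunk extensions and trailers are not handled."""
    body, pos = b"", 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end], 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


wire = encode_chunked([b"Hello, ", b"chunked ", b"world"])
print(wire)
print(decode_chunked(wire))  # b'Hello, chunked world'
```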

3.4 Multipart object collections for sending several kinds of data

When a multipart object collection is used in an HTTP message, the Content-Type header field must be set accordingly.

multipart/form-data: Used when uploading files from a Web form. It is sent with the POST method; what differs from an ordinary POST is the request header and the layout of the request body.

multipart/byteranges: Used when a response with status code 206 (Partial Content) contains several ranges of content.

3.5 Range requests for partial content

In the past, users did not have today's high-speed bandwidth, and downloading a large image or file was a struggle. If the network was interrupted during the download, you had to start all over again. To solve this, a resume mechanism is needed: the ability to continue a download from the point where it was interrupted. Resuming relies on range requests, which ask for a specified part of the entity.

Bytes 5001 to 10000: Range: bytes=5001-10000

Everything from byte 5001 onward: Range: bytes=5001-

The last 3000 bytes plus bytes 5000 to 7000 (multiple ranges): Range: bytes=-3000, 5000-7000

A range request is answered with a 206 Partial Content response. For a multi-range request, the response carries Content-Type: multipart/byteranges.

If the server cannot handle the range request, it returns status code 200 OK with the complete entity content.
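
How a server might interpret the three forms of the Range header can be sketched as follows. The function name parse_range and the 20000-byte resource length are assumptions for the example. Note that a suffix such as -3000 means the final 3000 bytes of the resource, and byte positions are zero-based and inclusive.

```python
def parse_range(header: str, length: int):
    """Turn a Range header value into (start, end) pairs, inclusive.

    Sketch only: unsatisfiable ranges are silently dropped, whereas a real
    server would answer 416 when none of the ranges can be satisfied.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only byte ranges are supported")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":    # "-3000": the final 3000 bytes
            start, end = max(length - int(last), 0), length - 1
        elif last == "":   # "5001-": from byte 5001 to the end
            start, end = int(first), length - 1
        else:              # "5001-10000"
            start, end = int(first), min(int(last), length - 1)
        if start <= end:
            ranges.append((start, end))
    return ranges


# Assume a resource that is 20000 bytes long.
print(parse_range("bytes=5001-10000", 20000))        # [(5001, 10000)]
print(parse_range("bytes=5001-", 20000))             # [(5001, 19999)]
print(parse_range("bytes=-3000, 5000-7000", 20000))  # [(17000, 19999), (5000, 7000)]
```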

3.6 Content negotiation returns the most appropriate content

Content negotiation: When a browser whose default language is English and a browser whose default language is Chinese access a Web page with the same URI, the English or the Chinese version is displayed accordingly. Content negotiation judges the responding resource by criteria such as language, character set, and encoding.

There are three types:

  1. Server-driven negotiation: The server performs the negotiation. It uses the request header fields as its reference and handles the choice automatically. For the user, however, a choice based only on what the browser sends is not always the best one.
  2. Agent-driven negotiation: The client performs the negotiation. The user picks manually from a list of options shown in the browser. The choice can also be made automatically in the Web page with JavaScript, for example switching between the PC version and the mobile version of a page according to the OS or browser type.
  3. Transparent negotiation: A combination of server-driven and agent-driven negotiation, in which the server and the client each perform part of the negotiation.
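
Server-driven negotiation on language can be sketched by reading the q values of an Accept-Language request header. The function name pick_language and the set of available languages are assumptions for the example, and matching is by exact tag only, which is simpler than what real servers do.

```python
def pick_language(accept_language: str, available, default="en"):
    """Choose the available language with the highest q value."""
    prefs = []
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0  # a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            q = float(params[2:])
        prefs.append((q, tag.strip().lower()))
    # Highest q first; sorted() is stable, so ties keep header order.
    for q, tag in sorted(prefs, key=lambda p: p[0], reverse=True):
        if q > 0 and tag in available:
            return tag
    return default


header = "zh-cn,zh;q=0.7,en-us,en;q=0.3"
print(pick_language(header, {"en", "zh"}))  # zh
print(pick_language("fr", {"en", "zh"}))    # en
```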

4. HTTP status codes that return the result

4.1 Categories of status codes

4.1.1 2XX Success

A 2XX response indicates that the request was processed normally.

200 OK: The request from the client was processed normally by the server.

204 No Content: The server received and successfully processed the request, but the response contains no entity body. It is typically used when the client only needs to send information to the server and nothing new needs to be sent back.

206 Partial Content: The client made a range request and the server successfully carried out that partial GET. The response contains the part of the entity specified by Content-Range.

4.1.2 3XX Redirection

A 3XX response indicates that the browser must take some additional action to complete the request.

301 Moved Permanently: Permanent redirection. The requested resource has been assigned a new URI, and that new URI should be used from now on. If the old URI has been bookmarked, the bookmark should be updated to the URI given in the Location header field.

302 Found: Temporary redirection. The requested resource has been assigned a new URI for the time being, and the user is expected to access it through the new URI this time. Unlike with 301, a bookmark is not updated; it keeps the URI of the page that returned the 302.

303 See Other: The resource corresponding to the request is available at another URI and should be fetched with GET. 303 works like 302 Found, except that it explicitly tells the client to use the GET method to obtain the resource.

304 Not Modified: The client sent a conditional request and the server would allow access to the resource, but the condition was not met (for example, the resource has not changed since the client cached it). A 304 response contains no response body. Although 304 is grouped under 3XX, it has nothing to do with redirection.

307 Temporary Redirect: Temporary redirection with the same meaning as 302 Found, except that the client must not change the request method when following it (a POST stays a POST).

4.1.3 4XX Client Error

A 4XX response indicates that the error was caused by the client.

400 Bad Request: The request message contains a syntax error. The content of the request must be corrected before it is sent again. In addition, the browser treats this status code as if it were 200 OK.

401 Unauthorized: The request requires HTTP authentication. If a request with credentials has already been made once, this status code means that user authentication failed.

403 Forbidden: The server refuses access to the requested resource. The server is not required to explain why, but if it wishes to, it can describe the reason in the entity body so that the user can see it.

404 Not Found: The requested resource could not be found on the server. It can also be used when the server rejects the request and does not want to give a reason.

4.1.4 5XX Server Error

A 5XX response indicates that an error occurred on the server itself.

500 Internal Server Error: An error occurred while the server was carrying out the request. The cause may be a bug in the Web application or a temporary fault.

503 Service Unavailable: The server is temporarily overloaded or down for maintenance and cannot process requests right now.
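
The categories above can be looked up programmatically. Python's standard library ships the status codes and reason phrases in http.HTTPStatus; the CLASSES table and the describe helper below are my own naming for this sketch.

```python
from http import HTTPStatus

# First digit of the status code -> category, as described in this chapter.
CLASSES = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}


def describe(code: int) -> str:
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    category = CLASSES.get(code // 100, "other")
    return f"{status.value} {status.phrase} ({category})"


for code in (200, 204, 206, 301, 304, 404, 503):
    print(describe(code))
```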

5. Web servers that collaborate with HTTP

5.1 Hosting multiple domain names on one server with virtual hosts

On the Internet, the DNS service maps a domain name to an IP address (domain name resolution) before the target website is accessed. In other words, by the time a request reaches the server, it is addressed by IP address. A server with a single IP address can use virtual hosts to host several Web sites with different host names and domain names, so every HTTP request must specify the host name or domain name in the Host header field.
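
A sketch of how one server might choose among its virtual hosts using the Host header field. The SITES table, its directory paths, and the select_site helper are hypothetical; real servers configure this mapping in their own configuration files.

```python
# Hypothetical table mapping host names to site directories on one server.
SITES = {
    "hackr.jp": "/srv/www/hackr.jp",
    "www.example.com": "/srv/www/example.com",
}


def select_site(headers: dict):
    """Pick the site for a request from its Host header (None if unknown).

    Sketch only: IPv6 literals such as [::1] are not handled.
    """
    host = headers.get("Host", "")
    name = host.rsplit(":", 1)[0] if ":" in host else host  # drop optional port
    return SITES.get(name.lower())


print(select_site({"Host": "hackr.jp"}))              # /srv/www/hackr.jp
print(select_site({"Host": "www.example.com:8080"}))  # /srv/www/example.com
print(select_site({"Host": "unknown.test"}))          # None
```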

5.2 Programs that forward communication data

In HTTP communication, besides clients and servers there are applications that forward communication data, such as proxies, gateways, and tunnels. They work together with the server.

These applications forward a request to the next server on the communication path, and relay the response from that server back to the client.

5.2.1 Proxy

The basic behavior of a proxy server is to receive a request from a client and forward it to another server. The proxy does not change the request URI; it passes the request straight on to the server that holds the resource. The server that holds the resource is called the origin server. The response returned by the origin server travels back through the proxy to the client.

1. Caching proxy

When relaying a response, a caching proxy stores a copy of the resource (a cache) on the proxy server. When the proxy receives another request for the same resource, it does not need to fetch the resource from the origin server and can return the cached copy as the response.

2. Transparent proxy

A transparent proxy forwards requests and responses without modifying the messages. Conversely, a proxy that does modify message content is called a non-transparent proxy.

5.2.2 Gateway

A gateway works much like a proxy, but it lets the server on the communication path provide non-HTTP services. Using a gateway can improve security, because the line between the client and the gateway can be encrypted. For example, a gateway can connect to a database and query it with SQL statements. Likewise, when a shopping site takes credit card payments, the gateway can link to the credit card settlement system.

5.2.3 Tunnel

A tunnel establishes, on request, a communication line to another server and can use encryption such as SSL over it. Its purpose is to let the client communicate securely with the server. The tunnel itself does not parse HTTP requests; requests are relayed to the server unchanged. The tunnel ends when the communication ends.

5.3 Saving the Cache of Resources

Cache: A copy of a resource stored on the local disk of a proxy server or client. Using a cache reduces accesses to the origin server, which saves traffic and communication time.

Cache server: A kind of proxy server, classified as a caching proxy. When the proxy relays a response returned from the server, it keeps a copy of the resource. The benefit is that the same resource does not have to be forwarded from the origin server again and again: clients can fetch it from the nearest cache server, and the origin server does not have to process the same request repeatedly.

Client-side cache: A cache can live not only on a cache server but also in the client's browser. If the browser's cache is valid, the browser does not need to request the resource from the server and can read it directly from the local disk. As with a cache server, when the cached copy is judged to have expired, the browser checks with the origin server whether the resource is still valid. If the cached copy is no longer valid, the browser requests the resource again.

6. The HTTP header

6.1 HTTP Header

An HTTP request message consists of the method, URI, HTTP version, and HTTP header fields.

An HTTP response message consists of the HTTP version, status code, and HTTP header fields.

6.2 HTTP header Fields

6.2.1 HTTP header Field Structure

An HTTP header field consists of a field name and a field value separated by a colon (:).

A single HTTP header field can carry multiple values, for example Keep-Alive: timeout=15, max=100.

6.2.2 Four HTTP header Field Types

General header fields: Headers used by both request and response messages.

Request header fields: Headers used in request messages sent from the client to the server. They supply additional information about the request, information about the client, and priorities for the response content.

Response header fields: Headers used in response messages returned from the server to the client. They supply additional information about the response and can also ask the client to provide additional information.

Entity header fields: Headers used for the entity part of request and response messages. They supply entity-related information such as when the resource content was updated.

6.2.3 HTTP/1.1 Header Fields overview

General header fields

Request header fields

Response header fields

Entity header fields

6.2.4 End-to-end and Hop-by-hop Headers

HTTP header fields are divided into two types according to how caching and non-caching proxies treat them.

End-to-end headers: Headers in this category are forwarded to the final recipient of the request or response, must be stored in responses generated by a cache, and must always be forwarded.

Hop-by-hop headers: Headers in this category are valid for a single hop only and are not forwarded past a cache or proxy. In HTTP/1.1 and later, any hop-by-hop header that is used must be named in the Connection header field.

The following are examples of hop-by-hop header fields in HTTP/1.1. Except for these eight header fields, all other fields are end-to-end headers.

  • Connection
  • Keep-Alive
  • Proxy-Authenticate
  • Proxy-Authorization
  • Trailer
  • TE
  • Transfer-Encoding
  • Upgrade

6.3 HTTP/1.1 General Header Fields

General header fields: Headers used by both request and response messages.

Cache-Control

Directive arguments are optional, and multiple directives are separated by commas. Cache-Control directives can be used in both requests and responses.

1. no-cache: Prevents stale resources from being returned from a cache. If a client request contains no-cache, the client will not accept a cached response that has not been checked, so the cache server must forward the request to the origin server. If a server response contains no-cache, a cache may keep the response but must revalidate it with the origin server before every reuse.

2. no-store: The cache must not store any part of the request or response locally.

It is easy to read no-cache as "do not cache", but it really means "do not serve from cache without validation": the cache uses the resource only after the origin server has confirmed that it is still valid. The directive that actually forbids caching is no-store.
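
The difference between the two directives can be expressed as two separate questions a cache asks about a response. The helper names below are my own, and the parser is a sketch that does not handle quoted arguments containing commas.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(directives: dict) -> bool:
    # no-store forbids keeping any copy at all.
    return "no-store" not in directives


def must_revalidate_before_reuse(directives: dict) -> bool:
    # no-cache allows storing, but every reuse needs validation first.
    return "no-cache" in directives


d = parse_cache_control("private, max-age=0, no-cache")
print(d)  # {'private': None, 'max-age': '0', 'no-cache': None}
print(may_store(d), must_revalidate_before_reuse(d))  # True True
```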

Connection

1. Control header fields that are no longer forwarded by proxies: The Connection header field names the header fields (hop-by-hop headers) that a proxy must not forward, in both client requests and server responses.

2. Manage persistent connections: In HTTP/1.1, connections are persistent by default, so the client keeps sending requests over the same connection. When the server wants to disconnect, it sets the Connection header field to close. Before HTTP/1.1, connections were non-persistent by default, so to keep a connection open with an older version of HTTP, set the Connection header field to Keep-Alive.

Date

Indicates the date and time when the HTTP message was created.

Upgrade

Used to check whether HTTP or another protocol can communicate using a newer version; the value can also name a completely different communication protocol.

6.4 Request Header Field

Request header fields are used in request messages sent from the client to the server. They supply additional information about the request, information about the client, and priorities for the response content.

Accept

Tells the server which media types the user agent can handle and their relative priority. Several media types can be listed at once in type/subtype form.

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Host

Tells the server the Internet host name and port number of the requested resource. Host is the only header field that the HTTP/1.1 specification requires in every request.

By the time a request reaches the server, the host name has already been replaced by an IP address. If several domain names are deployed on the same IP address, the server cannot tell which domain the request is for, so the Host header field states the host name explicitly. If the server has no host name, send the Host header with an empty value.

If-Match

Header fields of the form If-xxx are called conditional requests. On receiving one, the server executes the request only if it judges the stated condition to be true.

If-Match is one of these. It tells the server the entity tag (ETag) value that the resource must match. Weak ETag values cannot be used here. The server compares the If-Match value with the resource's ETag and executes the request only when the two agree.

An asterisk (*) can be given as the value, in which case the server ignores the ETag value and processes the request as long as the resource exists.

If-None-Match

The opposite of If-Match: it tells the server to process the request only when the ETag given in If-None-Match does not match the ETag of the requested resource.

If-Modified-Since

If the resource has been updated after the date and time given in If-Modified-Since, the server accepts the request.

That is, the requested resource is returned with status code 200 only if it has been modified after the given date and time; if it has not been updated since then, the server returns status code 304 Not Modified.
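
The decision a server makes for a conditional GET can be sketched as below. The function name conditional_status and the sample ETag and dates are assumptions for the example. The ETag comparison is a plain string match, which simplifies the comparison rules of the specification, and If-None-Match is checked before If-Modified-Since.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified) -> int:
    """Return 304 if the client's copy is still current, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if "*" in tags or etag in tags else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        return 304 if last_modified <= parsedate_to_datetime(since) else 200
    return 200


# Hypothetical resource state on the server.
etag = '"abc"'
modified = datetime(2004, 4, 14, tzinfo=timezone.utc)

print(conditional_status({"If-None-Match": '"abc"'}, etag, modified))  # 304
print(conditional_status(
    {"If-Modified-Since": "Tue, 13 Apr 2004 00:00:00 GMT"}, etag, modified
))  # 200
```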

If-Unmodified-Since

If-Unmodified-Since is the opposite of If-Modified-Since. It tells the server to process the request only if the requested resource has not been updated after the given date and time. If it has been updated after that time, the server responds with status code 412 Precondition Failed.

If-Range

If the If-Range value (an ETag or a date and time) matches the resource, the request is treated as a range request. Otherwise, the entire resource is returned.

6.5 Response header Field

Response header fields are used in response messages sent from the server to the client. They supply additional information about the response, information about the server, and extra requirements placed on the client.

ETag

The entity tag, a string that uniquely identifies a version of a resource. The server assigns an ETag value to each resource and updates the value whenever the resource is updated.

When a download is interrupted and then resumed, the ETag value is used to identify the resource.

6.6 Entity header Field

Entity header fields are used for the entity part of request and response messages. They supply entity-related information such as when the content was last updated.

Allow

Tells the client all the HTTP methods supported for the resource. When the server receives an unsupported HTTP method, it returns status code 405 Method Not Allowed and lists all supported methods in the Allow header field.

Content-Encoding

Tells the client which content encoding the server applied to the entity body. Content encoding here means compression that loses no entity information. The main options are gzip, compress, deflate, and identity.

Content-Length

Indicates the size of the entity body in bytes. When a transfer coding such as chunked is applied to the entity body, the Content-Length header field must not be used.

Content-Type

Specifies the media type of the object in the entity body, assigned by type/subtype.

Content-Type: text/html; charset=UTF-8

Expires

Expires tells the client the date and time at which the resource expires. When a cache server receives a response that contains Expires, it answers later requests from its cached copy and keeps that copy until the time given in the Expires field value. Once that time has passed, the cache server forwards the next request to the source server.

When the source server does not want cache servers to cache a resource, it is best to write the same time value as the Date header field in Expires. However, when max-age is specified in the Cache-Control header field, the max-age directive takes precedence over Expires.
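The precedence rule can be illustrated with a simplified freshness calculation. This is a sketch under assumptions: the function name `freshUntil` and the lower-cased header object are invented for the example, and real caches also consider things such as s-maxage and the Age header, which are ignored here.

```javascript
// Returns the time (milliseconds since the epoch) until which a cached response is fresh.
function freshUntil(headers) {
  const date = Date.parse(headers['date']);
  const cacheControl = headers['cache-control'] || '';
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  if (match) {
    // max-age wins over Expires: fresh for N seconds after Date.
    return date + Number(match[1]) * 1000;
  }
  if (headers['expires']) {
    const expires = Date.parse(headers['expires']);
    // An unparsable Expires value is treated as already expired.
    return Number.isNaN(expires) ? date : expires;
  }
  // Neither field present: treat the response as not fresh in this sketch.
  return date;
}

const headers = {
  'date': 'Tue, 05 Jul 2011 07:26:31 GMT',
  'expires': 'Tue, 05 Jul 2011 08:26:31 GMT',
  'cache-control': 'public, max-age=60',
};
console.log(new Date(freshUntil(headers)).toUTCString());
```

With both fields present, the result is 60 seconds after Date rather than the one hour that Expires alone would give.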

Last-Modified

Indicates the date and time at which the resource was last modified.

6.7 Header Fields for Cookies

Set-Cookie

Set-Cookie: status=enable; expires=Tue, 05 Jul 2011 07:26:31 GMT; path=/; domain=.hackr.jp;

Attributes of the Set-Cookie field

Once a cookie has been sent from the server to the client, the server has no way to delete it explicitly. However, the server can effectively delete a client-side cookie by overwriting it with one that has already expired.
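As a rough sketch of that overwrite trick, the helper below builds a Set-Cookie line whose expiry date lies in the past. The function name `expireCookie` is hypothetical. The path and domain must match the ones used when the cookie was set, otherwise the browser treats it as a different cookie.

```javascript
// Builds a Set-Cookie header line that overwrites a cookie with an expired one.
function expireCookie(name, path, domain) {
  // The Unix epoch is safely in the past: Thu, 01 Jan 1970 00:00:00 GMT.
  const past = new Date(0).toUTCString();
  return `Set-Cookie: ${name}=; expires=${past}; path=${path}; domain=${domain}`;
}

console.log(expireCookie('status', '/', '.hackr.jp'));
```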

Cookie

Cookie: status=enable

When the client wants HTTP state management, it includes the cookie it received from the server in its requests. When it has received several cookies, it can likewise send several cookies in a request.

6.8 Other Header Fields

X-Frame-Options

X-Frame-Options: DENY

The X-Frame-Options header field can take the following values:

● DENY: displaying the page in a frame is refused.

● SAMEORIGIN: displaying the page in a frame is permitted only when the top-level page has the same origin.

X-XSS-Protection

The X-XSS-Protection header field is an HTTP response header. It is a countermeasure against cross-site scripting (XSS) attacks and switches the browser’s XSS protection mechanism on or off.

The X-XSS-Protection header field can take the following values:

● 0: disables XSS filtering

● 1: enables XSS filtering

DNT

The DNT header field is an HTTP request header. DNT is short for Do Not Track: it expresses a refusal to have personal information collected and is a way to opt out of tracking by targeted advertising.

The DNT header field can take the following values:

● 0: consents to being tracked

● 1: refuses to be tracked

7. Ensure Web security through HTTPS

7.1 Disadvantages of HTTP

● Communication is in plain text (not encrypted), so the content may be eavesdropped on

● The identity of the communicating party is not verified, so it may be impersonated

● The integrity of a message cannot be proven, so it may have been tampered with

7.2 HTTP + Encryption + Authentication + Integrity Protection = HTTPS

HTTP with encryption and authentication mechanisms added is called HTTPS (HTTP Secure). HTTPS is not a new application-layer protocol. Only the communication interface of HTTP is replaced by the SSL (Secure Socket Layer) and TLS (Transport Layer Security) protocols.

SSL is independent of HTTP. Therefore, SSL can be used with other protocols, such as SMTP and Telnet, that run on the application layer.

Shared-key encryption: encryption and decryption use the same key. This is also called symmetric-key encryption. Public-key encryption, by contrast, uses a pair of different keys: a public key that anyone may use to encrypt and a private key that only its owner uses to decrypt.

HTTPS uses a hybrid of shared-key and public-key encryption. Public-key encryption alone would remove the need to exchange a secret key, but it is slower than shared-key encryption. HTTPS therefore uses public-key encryption for the key exchange and shared-key encryption for the messages exchanged once communication is established.
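The hybrid idea can be sketched with Node.js's built-in crypto module. This is not the actual SSL/TLS handshake, only an illustration of "public key for the key exchange, shared key for the messages". The RSA key size, the choice of AES-256-GCM, and the helper names `encrypt` and `decrypt` are assumptions made for the example.

```javascript
const crypto = require('crypto');

// "Server" key pair. In HTTPS the public key would arrive inside a certificate.
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
});

// Client: choose a random shared key and send it encrypted with the public key.
const sharedKey = crypto.randomBytes(32);
const wrappedKey = crypto.publicEncrypt(publicKey, sharedKey);

// Server: recover the shared key with the private key.
const serverKey = crypto.privateDecrypt(privateKey, wrappedKey);

// From here on both sides use the faster shared-key encryption for messages.
function encrypt(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function decrypt(key, { iv, data, tag }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(data), decipher.final()]).toString('utf8');
}

const message = encrypt(sharedKey, 'GET /index.html HTTP/1.1');
console.log(decrypt(serverKey, message));
```

Only the small shared key goes through the slow public-key operation; every later message uses the shared key.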

A public key certificate issued by a digital certificate authority (CA) or one of its related bodies is what establishes that a public key is genuine. The server sends the public key certificate issued by the CA to the client so that the two can communicate using public-key encryption. A public key certificate can also be called a digital certificate or simply a certificate.

HTTPS communication steps

  1. The client starts SSL communication by sending a Client Hello message. The message contains the SSL version the client supports and a list of cipher suites (the encryption algorithms to use, key lengths, and so on).

  2. If the server can perform SSL communication, it responds with a Server Hello message. Like the client’s message, it contains the SSL version and a cipher suite. The server’s cipher suite is selected from the list received from the client.

  3. The server then sends a Certificate message, which contains its public key certificate.

  4. Finally, the server sends a Server Hello Done message to tell the client that the initial stage of the SSL handshake negotiation is complete.

  5. After this first stage of the handshake, the client responds with a Client Key Exchange message. The message contains a random string called the pre-master secret, which is used for encrypting the communication. The message is encrypted with the public key from step 3.

  6. The client then sends a Change Cipher Spec message. It tells the server that all communication after this message will be encrypted with keys derived from the pre-master secret.

  7. The client sends a Finished message. The message contains an overall check value computed over all the messages exchanged so far. Whether the handshake negotiation succeeds depends on whether the server can decrypt this message correctly.

  8. The server likewise sends a Change Cipher Spec message.

  9. The server likewise sends a Finished message.

  10. Once the server and client have exchanged Finished messages, the SSL connection is established, and from here on the communication is protected by SSL. Application-layer communication starts at this point: the client sends an HTTP request.

  11. Application-layer communication continues: the server sends the HTTP response.

  12. Finally, the client closes the connection by sending a close_notify message.

In the process above, the application layer attaches a message digest called a MAC (Message Authentication Code) to the data it sends. The MAC makes it possible to detect whether a message has been tampered with, which protects message integrity.
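How a MAC detects tampering can be shown with an HMAC from Node.js's crypto module. This is a sketch, not the exact MAC construction SSL uses: the choice of HMAC-SHA256, the random key, and the function names `mac` and `verify` are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Computes a MAC over the data with a key both sides share.
function mac(key, data) {
  return crypto.createHmac('sha256', key).update(data).digest();
}

// Recomputes the MAC on the receiving side and compares it in constant time.
function verify(key, data, receivedMac) {
  const expected = mac(key, data);
  return expected.length === receivedMac.length &&
    crypto.timingSafeEqual(expected, receivedMac);
}

const key = crypto.randomBytes(32);           // hypothetical shared secret
const sent = 'amount=100&to=alice';           // hypothetical message
const tag = mac(key, sent);

console.log(verify(key, sent, tag));                  // untouched message
console.log(verify(key, 'amount=900&to=eve', tag));   // tampered message
```

Without the key, an attacker who changes the message cannot produce a matching MAC.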

Is SSL slow?

SSL is slow in two ways. One is that communication is slow. The other is that processing slows down because SSL consumes a large amount of CPU and memory.

Compared with HTTP, the network load can make HTTPS 2 to 100 times slower. In addition to establishing the TCP connection and sending the HTTP request and response, SSL communication has to take place, which inevitably increases the total amount of communication.

There is no fundamental solution to the speed problem, so dedicated hardware such as an SSL accelerator is used to mitigate it. The SSL accelerator handles only SSL processing, which takes that load off the server.

Why HTTPS is not always used

Compared with plain-text communication, encrypted communication consumes more CPU and memory. If every communication were encrypted, it would consume a considerable amount of resources, and the number of requests a single machine can handle would inevitably drop.

So HTTP is used for non-sensitive information, and HTTPS is used only for sensitive data such as personal information. Encrypting only the communication that needs to be hidden saves resources.

In addition, the desire to save on the cost of purchasing certificates is also a reason.

8. Authentication to confirm the identity of the accessing user

What is authentication

The information checked during authentication usually includes:

● Password: a string known only to the user.

● Dynamic token: a one-time password displayed only on a device the user holds.

● Digital certificate: information held only by the user (the user’s terminal).

● Biometric authentication: physiological information such as fingerprints and irises.

● IC card and the like: information held only by the user.

HTTP authentication methods

● BASIC authentication

● DIGEST authentication

● SSL client authentication

● Form-based authentication
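As a small illustration of the first method in the list, BASIC authentication sends the user ID and password joined by a colon and Base64-encoded in the Authorization request header. The sketch below assumes a hypothetical user `guest` with password `guest`. The function name is made up for the example.

```javascript
// Builds the Authorization header line used by BASIC authentication.
function basicAuthHeader(user, password) {
  // Base64 is an encoding, not encryption: anyone can decode it.
  const token = Buffer.from(`${user}:${password}`, 'utf8').toString('base64');
  return `Authorization: Basic ${token}`;
}

console.log(basicAuthHeader('guest', 'guest'));
// prints "Authorization: Basic Z3Vlc3Q6Z3Vlc3Q="
```

Because the credentials are only encoded, BASIC authentication over plain HTTP exposes them to eavesdropping.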

9. Protocols that add functionality on top of HTTP

9.1 WebSocket for Full-duplex Communication using a Browser

WebSocket was designed mainly to solve the problems caused by the limitations of XMLHttpRequest, which Ajax and Comet rely on.

9.2 The WebSocket Protocol

Features:

Push function: The server can push data to the client. It can send data directly without waiting for the client to request it.

Reduced traffic: Once a WebSocket connection is established, it is kept open. Compared with HTTP, the total overhead per connection is lower, and traffic is reduced further because WebSocket headers are small.

To start WebSocket communication, a handshake step is required after the HTTP connection is established.

Handshake · Request: To start WebSocket communication, the client uses the HTTP Upgrade header field to tell the server that the communication protocol is changing, which accomplishes the handshake.

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

The Sec-WebSocket-Key field carries the key value required during the handshake. The Sec-WebSocket-Protocol field lists the subprotocols to use.

Handshake · Response: The server answers the request above with a response whose status code is 101 Switching Protocols.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

The value of the Sec-WebSocket-Accept field is generated from the value of the Sec-WebSocket-Key field in the handshake request. Once the handshake succeeds and the WebSocket connection is established, communication uses WebSocket’s own data frames rather than HTTP data frames.
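The derivation of Sec-WebSocket-Accept can be sketched as follows. As specified in RFC 6455, the server appends a fixed GUID to the received key, takes the SHA-1 hash, and Base64-encodes the result. The function name `acceptValue` is an assumption for the example. The key is the one from the handshake request shown earlier.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by the WebSocket protocol (RFC 6455).
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derives the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function acceptValue(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

console.log(acceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

The output matches the Sec-WebSocket-Accept value in the handshake response, which is how the client confirms that the server understood the WebSocket handshake.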

WebSocket communication diagram:

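The following example calls the WebSocket API and sends data every 50 ms:

var socket = new WebSocket('ws://game.example.com:12010/updates');
socket.onopen = function () {
  // Once the connection is open, try to send an update every 50 ms.
  setInterval(function () {
    // Send only when nothing is still queued, so updates do not pile up.
    // getUpdateData() is assumed to be defined elsewhere by the application.
    if (socket.bufferedAmount === 0) {
      socket.send(getUpdateData());
    }
  }, 50);
};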

9.3 HTTP/2.0 (improves the perceived speed of using the Web)

Features:

  1. HTTP/2.0 uses binary format rather than text format.

  2. HTTP/2.0 is fully multiplexed rather than ordered and blocking, so a single connection is enough to achieve parallelism.

  3. Using header compression, HTTP/2.0 reduces overhead.

  4. HTTP/2.0 allows the server to actively “push” responses into the client cache.

10. Techniques for building Web content

10.1 HTML

HTML (HyperText Markup Language) is a markup language developed for sending hypertext over the Web.

Hypertext is a document system that links information at any position in a document to other information (text, images, and so on); in other words, text with hyperlinks.

10.2 Web application

A Web application is an application provided through Web functionality.

A CGI program is started every time a request is received, so once traffic becomes heavy the Web server comes under considerable load. Servlets, by contrast, run in the same process as the Web server, so the load they impose is lighter.

The environment in which servlets run is called a Web container or Servlet container. As CGI became widespread, its mechanism of launching a new CGI program for every request became a performance bottleneck. Servlets stay resident in memory, so each request can be handled by something lighter than a new process, which makes the program more efficient.