Preface

I wrote this article because I did not have a complete understanding of HTTP. I have used the HTTP protocol constantly for the past three years and have written several articles introducing some of its details and usage. Yet whenever someone asks me what HTTP actually is, I get stuck. I can hardly answer that using HTTP all these years has just meant calling libraries (no comment 🤐).

This usually happens because we learn HTTP with an inefficient method: gnawing through the material from front to back. How many of you, like me, planned to learn HTTP from HTTP: The Definitive Guide, or by reading RFC 2616 directly (if you managed that, I salute you)? The result is usually fruitless: either you finish and retain nothing, or you cannot get through it at all, because the material is so dry.

A more efficient method is what-how-why.

When facing a new topic, if we jump straight into the underlying principles, most people will probably be confused.

But if we start from the surface:

  • What: what is this thing?
  • How: how is it used, and how does it work?
  • Why: finally, why does it work the way it does?

Then the learning process is much less dry: you keep asking why, and you actively look up the answers. This shift from passive rote learning to active learning is better for long-term retention.

Everyone more or less knows the concept of HTTP and how to use it (by calling a library 🤭), so this article approaches HTTP from the "why" angle. I hope it helps.

How HTTP Came About

While learning HTTP, and more broadly any other computer technology, remember this: form follows function. When you wonder why HTTP is designed the way it is, first understand what problem HTTP is trying to solve.

It is impossible to talk about HTTP without mentioning one of its principal authors, Roy Fielding, and his famous doctoral dissertation, Architectural Styles and the Design of Network-based Software Architectures.

In the dissertation, Roy Fielding describes why the Web was created:

Berners-Lee writes that the “Web’s major goal was to be a shared information space through which people and machines could communicate.”

Berners-Lee, the inventor of the World Wide Web, said that the Web's main purpose was to be a shared information space through which people and machines could communicate. Note the emphasis: people and machines!

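A widely quoted description of HTTP puts it this way, and RFC 2616 likewise notes that HTTP has been in use by the World-Wide Web global information initiative since 1990: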

HTTP is the foundation of data communication for the World Wide Web

In other words, HTTP is a protocol built to serve the main purpose of the World Wide Web: letting people and machines communicate and share information. That explains a lot:

  • Have you ever wondered why HTTP/1.x messages are written as ASCII text, while lower-level protocols such as TCP and IP use binary headers? It is because HTTP was meant to serve people, so its messages should be easy for people to read.
  • Why do scripts and styles run on the client side? Because that makes pages easier for people to interact with and more pleasant to look at.
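
To make the text-versus-binary contrast concrete, here is a minimal Python sketch. The host name example.com and the port numbers are placeholder values chosen for illustration, and the binary part packs only the two 16-bit port fields that begin a TCP header, not a complete header.

```python
import struct

# An HTTP/1.1 request is plain ASCII text: a person can read it directly.
http_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"      # placeholder host
    "\r\n"
).encode("ascii")

# By contrast, a TCP header consists of packed binary fields. Only the first
# two fields (source port and destination port, 16 bits each, big-endian)
# are shown here; the port values are made up for illustration.
tcp_ports = struct.pack("!HH", 49152, 80)

print(http_request.decode("ascii"))   # readable as-is
print(tcp_ports.hex())                # needs a hex dump to inspect
```

The request can be read straight off the wire, while the packed ports are meaningful only to someone who knows the field layout.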

Problems that must be faced to implement the World Wide Web

To implement the World Wide Web, a web architecture must face the following problems, which Roy Fielding lists in his dissertation.

(1) Low entry barrier: The users of the World Wide Web are not only computing professionals but also the general public visiting websites through browsers, so the Web has to be simple and universal. This is the main reason hypertext was chosen.

(2) Extensibility: While simplicity makes it possible to deploy an initial implementation of a distributed system, extensibility keeps us from being stuck forever with the limitations of what was deployed. Even if it were possible to build a software system that perfectly matched its users' requirements, those requirements would change over time, and a system intended to last as long as the Web must be prepared for change.

(3) Distributed hypermedia: Distributed hypermedia allows presentation and control information to be stored at remote locations. As a result, user actions in a distributed hypermedia system require transferring large amounts of data from where it is stored to the client. A web architecture must therefore be designed for large-grain data transfer.

(4) Internet scale: The World Wide Web aims to be an Internet-scale distributed hypermedia system, which means more than just geographical dispersion. The Internet is about interconnecting information networks across organizational boundaries. Information service providers must be able to cope with the demands of anarchic scalability and the independent deployment of software components.

  • Anarchic scalability: "Anarchic" here means without central control. Most software systems are built with the implicit assumption that the entire system is under the control of one entity. That assumption cannot be made when a system operates openly on the Internet. Anarchic scalability refers to the need for architectural elements to keep operating when they are given malformed or maliciously constructed data, since they may be communicating with elements outside their organization's control. Multiple organizational boundaries also mean that any communication may cross multiple trust boundaries. Applications should either assume that any information they receive is untrusted or require some additional authentication before trusting it.

    HTTP provides authentication mechanisms such as Basic authentication and Digest authentication.

  • Independent deployment: Multiple organizational boundaries also mean that systems must be prepared for gradual and fragmented change, in which old and new implementations coexist.

  • Backward compatibility: HTTP/0.9 dates from 1991, and HTTP/1.0 was published as RFC 1945 in 1996. Later versions must remain compatible with earlier ones.

    HTTP versions are distinguished by major and minor version numbers, and a server must honor the constraints of the protocol version declared in each message. Access still starts from an http:// URI, and a connection can switch to other protocols such as HTTP/2 or WebSocket by using the Upgrade header.
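
As a concrete look at the Basic authentication scheme mentioned in the list above, the following Python sketch builds and parses an Authorization header value. The function names are hypothetical helpers written for this article, and the credentials are the well-known sample pair from the Basic authentication specification, not real ones.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"

def parse_basic_auth(header_value: str) -> tuple[str, str]:
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    # Split at the first colon only: the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("Aladdin", "open sesame")
print("Authorization:", header)
print(parse_basic_auth(header))
```

Note that Base64 is an encoding, not encryption: anyone who sees the header can recover the password, which is why Basic authentication is normally used only over TLS.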

REST Architectural Style

Roy Fielding proposed the REST architectural style to address these problems.

REST stands for Representational State Transfer. It is a set of architectural constraints and principles described by Roy Fielding; any architecture that satisfies them is called a RESTful architecture.

Let us start by understanding REST through its two key terms: representation and state transfer.

  • Representation

A RESTful architecture is resource-based, and "resource" has a broad meaning: any information that can be named can be a resource, such as a document, an image, or a time service. A resource is a conceptual mapping to a set of entities.

A representation is a concrete form in which a resource is presented, for example an HTML or JSON document, while the resource itself is identified by a specific URI. (See my article for more information on URIs.)

So what does a RESTful-style URI look like? The emphasis is on nouns.

http://example.com/tickets/1111
http://example.com/orders/2021/12/25

Callers should be able to infer the meaning of these URIs easily; each one clearly identifies a single resource.

http://example.com/tickets
http://example.com/orders/2021

Now look at how these two URIs differ. They do not identify a single thing but a collection of things (the first URI represents all tickets, the second all orders for 2021). These collections are resources in their own right and should be identified as well.

Notice that both tickets and orders name the resource in the plural, which is appropriate. A language teacher might object to using the plural for an individual resource, but for the sake of simple and consistent URI formats it is fine to stick to plurals, without having to worry about choices such as person versus people.

A resource can have many representations, such as translations into different languages or different compression encodings. HTTP handles this with content negotiation.
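
Content negotiation can be sketched in a few lines of Python. This is a simplified illustration, not a complete implementation: the functions parse_accept and negotiate are hypothetical helpers, matching is exact only, and wildcards and language-range prefix matching are ignored.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-style header into (value, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # the default weight when no q parameter is given
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip() == "q":
                q = float(val)
        prefs.append((parts[0], q))
    # sorted() is stable, so equal weights keep the client's original order
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header: str, available: list[str], default: str) -> str:
    """Pick the highest-weighted client preference the server can serve."""
    for value, q in parse_accept(header):
        if q > 0 and value in available:
            return value
    return default

# The client prefers zh-CN, then en, then fr; the server only has en and fr.
chosen = negotiate("fr;q=0.5, zh-CN, en;q=0.8", ["en", "fr"], default="en")
print(chosen)
```

The same URI can thus yield different representations depending on the preferences the client sends in headers such as Accept-Language or Accept-Encoding.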

  • State transfer

State means resource state, which is stored on the server. Through a RESTful API, the client specifies a request method, a resource path, and a representation in order to perform CRUD operations that change the resource state. This is what is called state transfer.

With the resources defined, the next step is to determine which operations apply to them and how those operations map onto the API. RESTful APIs handle CRUD operations with HTTP methods, for example:

GET /tickets - Get all tickets
GET /tickets/12 - Get a single ticket
POST /tickets - Create a new ticket
PUT /tickets/12 - Modify a ticket
PATCH /tickets/12 - Partially modify a ticket
DELETE /tickets/12 - Delete a ticket
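
The mapping above can be sketched as a tiny dispatcher in Python. This is an illustration under stated assumptions rather than a real server: handle is a hypothetical function, storage is an in-memory dictionary, and the status codes are one reasonable choice among several.

```python
import itertools
import re
from typing import Optional

tickets = {}                      # in-memory stand-in for a real data store
ticket_ids = itertools.count(1)   # hands out 1, 2, 3, ...

def handle(method: str, path: str, body: Optional[dict] = None):
    """Map (method, path) to a CRUD operation; returns (status, payload)."""
    match = re.fullmatch(r"/tickets(?:/(\d+))?", path)
    if not match:
        return 404, None
    ticket_id = int(match.group(1)) if match.group(1) else None

    if ticket_id is None:                      # collection: /tickets
        if method == "GET":
            return 200, list(tickets.values())
        if method == "POST":
            new_id = next(ticket_ids)
            tickets[new_id] = {"id": new_id, **(body or {})}
            return 201, tickets[new_id]
        return 405, None

    if ticket_id not in tickets:               # single item: /tickets/<id>
        return 404, None
    if method == "GET":
        return 200, tickets[ticket_id]
    if method == "PUT":                        # replace the whole ticket
        tickets[ticket_id] = {"id": ticket_id, **(body or {})}
        return 200, tickets[ticket_id]
    if method == "PATCH":                      # change only the given fields
        tickets[ticket_id].update(body or {})
        return 200, tickets[ticket_id]
    if method == "DELETE":
        del tickets[ticket_id]
        return 204, None
    return 405, None

print(handle("POST", "/tickets", {"title": "Broken login"}))
print(handle("GET", "/tickets/1"))
```

All six operations share one noun-based path pattern; only the HTTP method changes.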

The great thing about REST is that it leverages existing HTTP methods to implement significant functionality on a single /tickets endpoint. There are no method naming conventions to follow, and the URI structure stays clean.

In his dissertation, Roy Fielding derives the REST architectural style by starting from an empty architecture (no constraints at all) and adding constraints step by step.

REST constraints

The REST constraints are:

  • Client-server
  • Stateless
  • Cache
  • Uniform interface
  • Layered system
  • Code on demand (optional)

(1) Client-server

The client-server style separates user interface concerns from data storage concerns. This improves the portability of the user interface across multiple platforms and improves scalability by simplifying the server components. Most importantly for the Web, the separation allows components to evolve independently, which supports the Internet-scale requirement of multiple organizational domains.

Resources are stored on a server. The client sends an HTTP request to the server, and the server returns the requested resource in the HTTP response. Client and server together form the basic components of the Web.

(2) Stateless

Communication must be stateless in nature: there is no dependency between two requests. Each request from the client to the server must contain all the information needed to understand it and cannot rely on any context stored on the server. This constraint improves visibility, reliability, and scalability. The disadvantage is that repetitive data, such as the same request headers, has to be sent in a series of requests.

HTTP uses the cookie mechanism to handle user identification and persistent sessions.
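
The cookie round trip can be sketched with Python's standard http.cookies module. The cookie name session_id and its value are made-up examples, and the "client side" is reduced to a hand-written header string.

```python
from http.cookies import SimpleCookie

# Server side: the first response asks the client to remember a session ID.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"          # hypothetical name and value
response_cookie["session_id"]["path"] = "/"
response_cookie["session_id"]["httponly"] = True
set_cookie_line = response_cookie.output()        # a "Set-Cookie: ..." header line

# Client side: every later request sends the value back in a Cookie header.
cookie_header = "session_id=abc123"

# Server side again: the request is still self-contained, but the cookie
# lets the server recognize which user or session it belongs to.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header)
session_id = request_cookie["session_id"].value

print(set_cookie_line)
print(session_id)
```

The protocol itself stays stateless: the state travels with each request instead of living in the connection.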

(3) Cache

The cache constraint exists mainly to improve network efficiency. It requires that the data in a response be marked, implicitly or explicitly, as cacheable or non-cacheable. If a response is cacheable, a client cache may reuse that response data for later equivalent requests, which reduces network interaction and improves efficiency, scalability, and user-perceived performance.

HTTP caches are divided into shared caches and private caches.
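
A minimal Python sketch of how explicit cacheability marking and the shared/private distinction interact is shown below. The rules are deliberately simplified and only consider the no-store, private, public, and max-age directives of the Cache-Control header; may_store is a hypothetical helper, and real caches apply more conditions, including heuristics for responses with no explicit marking.

```python
from typing import Optional

def parse_cache_control(header: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"') or None
    return directives

def may_store(header: str, shared_cache: bool) -> bool:
    """Decide whether a cache may store a response (simplified rules)."""
    directives = parse_cache_control(header)
    if "no-store" in directives:
        return False          # explicitly non-cacheable for everyone
    if shared_cache and "private" in directives:
        return False          # only the user's own (private) cache may keep it
    # Treat the response as cacheable only when it is explicitly marked.
    return "max-age" in directives or "public" in directives

print(may_store("private, max-age=60", shared_cache=False))
print(may_store("private, max-age=60", shared_cache=True))
```

A browser cache is a typical private cache, while a proxy or CDN serving many users is a shared cache.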

(4) Uniform interface

The core feature that distinguishes the REST architectural style from other network-based styles is its emphasis on a uniform interface between components. By applying the software engineering principle of generality to component interfaces, the overall system architecture is simplified and the visibility of interactions is improved. Implementations are decoupled from the services they provide, which encourages independent evolvability.

However, the uniform interface comes at the cost of reduced efficiency, because information is transferred in a standardized form rather than one specific to an application's needs. The REST interface is designed to be efficient for large-grain hypermedia data transfer. It is optimized for the common case of the Web, which makes it less than optimal for other forms of architectural interaction.

(5) Layered system

The layered-system constraint divides the architecture into layers and defines the boundary of each, which reduces the design complexity of every layer. Layers can also hide the heterogeneity of the layers beneath them, offer a uniform interface to the layer above, and so simplify upper-layer logic.

(6) Code on demand

REST allows client functionality to be extended by downloading and executing code in the form of applets or scripts. This simplifies clients by reducing the number of features that must be implemented in advance. Allowing features to be downloaded after deployment improves system extensibility. However, it also reduces visibility, so it is only an optional constraint in REST.

Based on TCP

Almost all HTTP/1.x and HTTP/2 traffic in the world is carried over TCP. A client application opens a TCP connection to a server application that may be running anywhere in the world. Once the connection is established, messages exchanged between the client and server are delivered without loss, corruption, or reordering for as long as the connection stays up.

TCP protocol

An HTTP connection is really just a TCP connection plus some rules for using it. A TCP connection is the Internet's reliable connection. To send data correctly and quickly, you need to understand some TCP basics.

TCP is a connection-oriented, reliable, byte-stream-based transport-layer protocol. Transport-layer protocols provide logical communication between application processes running on different hosts.

  • Connection-oriented: The communication is one-to-one between two endpoints (client and server). Before any HTTP message is sent, TCP has the client and server exchange control information. This handshake lets both sides prepare for the packets that will follow. After the handshake, a TCP connection exists between the sockets of the two processes. The connection is full-duplex: both processes can send and receive on it at the same time.

  • Reliable data transfer: The communicating processes can rely on TCP to deliver all data error-free and in the proper order. When one side of an application passes a stream of bytes into a socket, it can count on TCP to deliver the same byte stream to the receiving socket, with no bytes missing or duplicated.

  • Byte-stream based: Messages have no boundaries at the TCP level, so a message of any size can be transmitted. The bytes are also ordered: if one segment is lost, the data after it cannot be delivered to the application until the lost segment has been retransmitted.
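
The byte-stream property in the list above can be observed with a short Python sketch that opens a real TCP connection over the loopback interface. The address 127.0.0.1 and the two sample messages are arbitrary choices for the demonstration.

```python
import socket

# Create a real TCP connection over the loopback interface.
# Port 0 asks the operating system to pick any free port.
server = socket.create_server(("127.0.0.1", 0))
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()

# The sender writes two separate "messages"...
client.sendall(b"hello ")
client.sendall(b"world")
client.close()   # closing tells the receiver that no more bytes will follow

# ...but the receiver only sees a stream of bytes. How the bytes are grouped
# across recv() calls is not guaranteed, so we read until the stream ends.
chunks = []
while True:
    data = conn.recv(1024)
    if not data:
        break
    chunks.append(data)
received = b"".join(chunks)

conn.close()
server.close()
print(received)
print(len(chunks), "recv() call(s) returned data")
```

The bytes always arrive complete and in order, but the number of recv() calls that return data may vary from run to run, because TCP does not remember where one send ended and the next began.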

You probably did not notice this the first time you looked at the HTTP message format (and many articles do not mention it): every line is clearly terminated by CRLF. This design accommodates TCP. TCP is byte-stream based: when a large message is handed to TCP, it is split into multiple segments whose size is limited by the maximum segment size (MSS) that the two sides announce during the handshake. TCP only transmits the segments and does not preserve message boundaries, so HTTP has to mark the boundaries itself. Everything in an HTTP message except the entity body is delimited by CRLF.

The entity body of a message also needs a boundary. HTTP can frame the body in two ways:

  • Fixed length: the Content-Length header gives the length of the body in bytes. If Content-Length is smaller than the actual number of bytes in the body, the browser discards the extra bytes; if it is larger, an error is reported.
  • Variable length: Transfer-Encoding: chunked indicates chunked transfer. The body is split into chunks, each prefixed with its own size in hexadecimal, and a final zero-size chunk (the last-chunk) marks the end of the body.
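
As a minimal sketch of CRLF-based framing, the following Python function splits a raw HTTP/1.x message into its start line, header fields, and body. The function name split_message and the sample response are illustrative, and edge cases such as repeated header names are not handled.

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, headers, and body."""
    # An empty line (CRLF CRLF) marks the end of the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
start_line, headers, body = split_message(raw_response)
print(start_line)
print(headers)
print(body)
```

The parser never needs to know how TCP segmented the bytes; the CRLF markers alone are enough to find each boundary in the header section.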
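
The two body-framing approaches can be sketched in Python as follows. Both functions are hypothetical helpers that work on a complete in-memory byte string; chunk extensions are dropped and trailer fields are ignored for brevity.

```python
def read_fixed(data: bytes, content_length: int) -> bytes:
    """Read a fixed-length body: exactly Content-Length bytes."""
    if len(data) < content_length:
        raise ValueError("body is shorter than Content-Length")
    return data[:content_length]

def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked body (chunk extensions and trailers are ignored)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";")[0]   # drop any chunk extension
        size = int(size_field.decode("ascii"), 16)       # chunk size is hexadecimal
        pos = line_end + 2
        if size == 0:                                     # zero-size chunk ends the body
            return body
        body += data[pos:pos + size]
        pos += size + 2                                   # skip the CRLF after the chunk data

chunked = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
print(decode_chunked(chunked))
print(read_fixed(b"hello world", 5))
```

Chunked transfer lets a sender start transmitting before it knows the total length, while Content-Length requires the size up front.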

Looking Back

For reasons of space, the topics not covered here are left for the reader to explore. Looking back at the definition of HTTP in RFC 7230, are its adjectives easier to understand now?

The Hypertext Transfer Protocol (HTTP) is a stateless application-level request/response protocol that uses extensible semantics and self-descriptive message payloads for flexible interaction with network-based hypertext information systems.

Closing

Writing takes effort, so if this article helped you, please leave a like.

For more articles, please visit my GitHub. If you like them, please give the repository a star, which is also an encouragement to the author.

This article draws on the following sources:

  • Architectural Styles and the Design of Network-based Software Architectures
  • HTTP: The Definitive Guide
  • Detailed Description of Web Protocol and Packet Capture