1.1 Accessing the Web Using HTTP

    client ---------- request ----------> server   (communicating over HTTP)
    client <--------- response ---------- server

The client (a Web browser) obtains resources such as files by sending a request to the Web server, and then displays the Web page.

1.2 The birth of HTTP

Before we dive into HTTP, let's look at the background against which it was created. Understanding that background also helps explain why HTTP was developed.

1.2.1 Origins of the Web

HTTP was born in March 1989, when the Internet was still used by only a small number of people.

The original idea was to build the WWW (World Wide Web), in which documents can reference one another by means of hypertext, that is, multiple documents linked to each other.

Three technologies were proposed for building the WWW:

  1. HTML (HyperText Markup Language)
  2. HTTP (HyperText Transfer Protocol)
  3. URL (Uniform Resource Locator)

1.2.2 The Age of Web growth

1.2.3 HTTP stands still

1.3 Network Basics: TCP/IP

To understand HTTP, we need to understand the TCP/IP protocol family.

Commonly used networks, including the Internet, operate on the basis of the TCP/IP protocol family. HTTP is one member of that family.

1.3.1 TCP/IP protocol family

To communicate with each other, computers and network devices must follow the same methods and rules. Such rules are called protocols, and the protocols related to the Internet are collectively called TCP/IP.

1.3.2 Layered TCP/IP Management

An important aspect of the TCP/IP protocol family is layering. The TCP/IP protocol family is divided into the following layers:

  1. Application layer: Determines the communication activities involved in providing application services to users.
    • FTP (File Transfer Protocol)
    • DNS (Domain Name System)
    • HTTP (HyperText Transfer Protocol)
  2. Transport layer: Provides the application layer above it with data transfer between two computers connected over the network.
    • TCP (Transmission Control Protocol)
    • UDP (User Datagram Protocol)
  3. Network layer: Handles the packets flowing over the network. A packet is the smallest unit of data transmitted over the network. This layer determines the path (the route) by which packets reach the other computer.
  4. Data link layer: Handles the hardware side of the network connection, including:
    • Operating system control
    • Device drivers for hardware
    • NIC (Network Interface Card)
    • Optical fiber and other physically visible parts
    • Other hardware…

Benefits of layering:

  1. When the design needs to change somewhere, only the affected layer has to be replaced
  2. The design of each layer becomes relatively simple

1.3.3 TCP/IP Communication Flow

Let’s use HTTP as an example:

The sender:

  1. Application layer: The client issues an HTTP request
  2. Transport layer: To make transmission easier, the transport layer (TCP) splits the data received from the application layer (the HTTP request message) into pieces, marks each piece with a sequence number and port number, and forwards the pieces to the network layer
  3. Network layer (IP): Adds the addressing information for the communication destination and forwards the packets to the link layer
  4. Link layer: Frames the packets with the MAC address of the next destination and sends the data

The receiver:

  1. Link layer: Receives data
  2. Network layer:…
  3. Transport layer:…
  4. Application layer: Receives the HTTP request

When the sender passes data down from layer to layer, each layer it passes through adds that layer's own header information.

The receiver, in turn, strips off the corresponding header at each layer as the data passes back up.

This practice of wrapping data in header information is called encapsulation.
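The wrapping and unwrapping described above can be sketched in a few lines of Python. This is only a toy model: the header strings are placeholders rather than real TCP, IP, or Ethernet headers, and the function names `encapsulate` and `decapsulate` are made up for illustration.

```python
# Toy model of encapsulation: each layer prepends its own header on the way
# down and strips it on the way up. Header contents are simplified placeholders.
LAYERS = ["TCP", "IP", "Ethernet"]  # transport, network, link (top to bottom)


def encapsulate(http_message: str) -> str:
    """Sender side: wrap the application data in one header per layer."""
    data = http_message
    for layer in LAYERS:
        data = f"[{layer} header]{data}"
    return data


def decapsulate(frame: str) -> str:
    """Receiver side: strip the headers in the reverse order."""
    data = frame
    for layer in reversed(LAYERS):
        prefix = f"[{layer} header]"
        if not data.startswith(prefix):
            raise ValueError(f"expected {layer} header")
        data = data[len(prefix):]
    return data


request = "GET / HTTP/1.1"
frame = encapsulate(request)
print(frame)               # [Ethernet header][IP header][TCP header]GET / HTTP/1.1
print(decapsulate(frame))  # GET / HTTP/1.1
```

The outermost header belongs to the lowest layer, which is why the receiver removes headers in the opposite order from the one in which the sender added them.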

1.4 Protocols closely related to HTTP: IP, TCP, and DNS

Here are three protocols (IP, TCP, and DNS) that are closely related to HTTP in the TCP/IP protocol family.

1.4.1 IP protocol responsible for transmission

By layer, the Internet Protocol (IP) is located at the network layer.

Note: the "IP protocol" is not the same thing as an "IP address".

The role of the IP protocol is to deliver packets to the other party. To make sure they actually get there, several conditions have to be met.

The two most important are the IP address and the MAC address (Media Access Control address).

An IP address is the address assigned to a node, and a MAC address is the fixed address that belongs to a network interface card (NIC). An IP address can be paired with a MAC address. The IP address can change, but the MAC address essentially never does.

The communication between IP addresses depends on MAC addresses

In general, the two communicating parties are rarely on the same LAN; data usually has to be relayed through several computers and network devices before it reaches the other side. At each relay, the MAC address of the next device is used to find the next relay target, and this is where the Address Resolution Protocol (ARP) comes in.

ARP is a protocol for resolving addresses: given the IP address of the communicating party, it looks up the corresponding MAC address.
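As a rough illustration of the IP-to-MAC lookup, here is a toy ARP cache in Python. It models only the table lookup; a real host would broadcast an ARP request on the local network when the address is not cached. The addresses and the name `resolve_mac` are invented for this sketch.

```python
from typing import Optional

# Toy ARP cache: maps IP addresses on the local network to MAC addresses.
# All addresses below are made-up illustration values.
arp_table = {
    "192.168.1.1": "00:11:22:33:44:55",   # hypothetical router
    "192.168.1.20": "66:77:88:99:aa:bb",  # hypothetical neighbouring host
}


def resolve_mac(ip_address: str) -> Optional[str]:
    """Return the MAC address cached for ip_address, or None if unknown.

    A real implementation would send an ARP request on a cache miss;
    this sketch only models the lookup itself.
    """
    return arp_table.get(ip_address)


print(resolve_mac("192.168.1.1"))  # 00:11:22:33:44:55
print(resolve_mac("10.0.0.1"))     # None
```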

No one has a complete picture of what’s going on over the Internet

Before a packet reaches its destination, the computers and network devices it passes through, such as routers, know only roughly which way to send it next.

This mechanism, called routing, works a bit like a courier company's delivery process. The sender only needs to bring the parcel to a collection point to find out whether the courier company will accept it. The company's distribution center then checks the delivery address and works out which regional distribution center the parcel should go to next. That regional center in turn decides whether it can deliver the parcel to the recipient's home.

We want to use this metaphor to show that no computer, no network device, can fully grasp the details of the Internet.
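The same idea can be sketched in Python: each node holds only a next-hop entry for a destination, and the full path emerges only as the packet is forwarded. The node names and the `NEXT_HOP` tables are hypothetical, and real routers match on address prefixes rather than on names.

```python
from typing import List

# Each node knows only the next hop toward a destination, never the full path.
# Node names and tables are hypothetical.
NEXT_HOP = {
    "client":   {"server": "router_a"},
    "router_a": {"server": "router_b"},
    "router_b": {"server": "server"},
}


def deliver(source: str, destination: str, max_hops: int = 16) -> List[str]:
    """Forward hop by hop and return the path the packet actually took."""
    path = [source]
    current = source
    while current != destination:
        if len(path) > max_hops:
            raise RuntimeError("too many hops")
        table = NEXT_HOP.get(current, {})
        if destination not in table:
            raise LookupError(f"{current} has no route to {destination}")
        current = table[destination]
        path.append(current)
    return path


print(deliver("client", "server"))
# ['client', 'router_a', 'router_b', 'server']
```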

1.4.2 TCP to ensure reliability

In terms of layers, TCP is located at the transport layer and provides a reliable byte stream service.

A byte stream service splits a large chunk of data into packets, in units called segments, so that it is easier to transmit. A reliable transmission service is one that delivers the data to the other party accurately and dependably. In short, TCP splits large data into smaller pieces to make it easier to send, and it can confirm that the data actually reached the other party.
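A minimal Python sketch of the splitting and reassembly idea follows. It numbers each segment by its byte offset, which is a simplification of how TCP sequence numbers work, and the segment size of 8 bytes is chosen only to keep the example small.

```python
import random
from typing import List, Tuple

Segment = Tuple[int, bytes]  # (sequence number, payload)


def split_into_segments(data: bytes, size: int) -> List[Segment]:
    """Cut data into chunks of at most `size` bytes, numbered by byte offset."""
    return [(offset, data[offset:offset + size])
            for offset in range(0, len(data), size)]


def reassemble(segments: List[Segment]) -> bytes:
    """Put segments back in order by sequence number and join them."""
    return b"".join(payload for _, payload in sorted(segments))


message = b"GET /index.html HTTP/1.1\r\nHost: hackr.jp\r\n\r\n"
segments = split_into_segments(message, size=8)
random.shuffle(segments)  # segments may arrive out of order
assert reassemble(segments) == message
```

Because every segment carries its own sequence number, the receiver can restore the original byte order no matter how the segments arrive.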

TCP uses a three-way handshaking strategy to deliver data accurately to the destination.

Three-way handshaking: When a packet is sent using TCP, TCP does not ignore what happens after it is sent. It always confirms that it was delivered successfully. TCP flags SYN (Synchronize) and ACK (Acknowledgement) are used in the handshake.

Steps:

  1. The sender first sends a packet with the SYN flag to the other party.
  2. After receiving it, the receiver sends back a packet with the SYN/ACK flags to acknowledge it.
  3. Finally, the sender sends back a packet with the ACK flag, which marks the end of the handshake. If the handshake is interrupted at some stage, TCP sends the same packets again in the same order.

In addition to the three-way handshake, TCP has other means of ensuring reliable communication.
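The three steps can be written out as a small Python simulation. It does not open a real connection (the operating system performs the actual handshake when a socket connects). The sequence and acknowledgement numbers are an added detail not covered in the text: each side acknowledges the other's initial sequence number plus one. The function name and the initial numbers are arbitrary.

```python
from typing import List, Tuple

Packet = Tuple[str, str, int, int]  # (sender, flags, seq, ack)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Packet]:
    """Return the three packets exchanged while opening a connection."""
    syn = ("client", "SYN", client_isn, 0)                       # ack unused here
    syn_ack = ("server", "SYN/ACK", server_isn, client_isn + 1)  # acknowledges the SYN
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)      # acknowledges the SYN/ACK
    return [syn, syn_ack, ack]


for sender, flags, seq, ack in three_way_handshake(client_isn=100, server_isn=300):
    print(f"{sender:>6} -> {flags:<7} seq={seq} ack={ack}")
```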

1.4.3 DNS service responsible for domain name resolution

Like HTTP, the Domain Name System (DNS) service is an application-layer protocol. It resolves domain names to IP addresses.

A computer can be given a host name and domain name as well as an IP address, for example www.hackr.jp.

Users usually reach another computer by its host name or domain name rather than directly by its IP address, because a name made of letters and digits is easier for people to remember than the string of numbers in an IP address.

For a computer, however, names are relatively hard to work with; computers are better at handling long strings of numbers.

DNS was created to solve this problem. It provides a service for looking up an IP address from a domain name, and for the reverse lookup of a domain name from an IP address.
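Both directions of lookup are available through Python's standard `socket` module, as in the sketch below. The helper names `resolve` and `reverse_lookup` are made up here. Note that these calls go through the operating system's resolver, which may answer from a local hosts file as well as from DNS, and the results depend on the machine the code runs on.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the distinct IP addresses the system resolver reports for hostname."""
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in socket.getaddrinfo(hostname, None):
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def reverse_lookup(ip_address: str) -> str:
    """Return the primary host name the resolver reports for ip_address."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
    return hostname


# "localhost" is used so the example does not depend on Internet access.
print(resolve("localhost"))
```

Replacing `"localhost"` with a public name such as www.hackr.jp performs a real DNS query, provided the machine is online.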

1.5 Relationship between Various Protocols and HTTP

After learning the various protocols in the TCP/IP protocol family, which is inseparable from HTTP, let’s take a look at the roles played by IP, TCP, and DNS in the communication process using HTTP.
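To see the pieces working together, the following Python sketch resolves a host name, opens a TCP connection, and sends an HTTP request by hand. So that it runs without Internet access, it starts a throwaway HTTP server on the local machine; the handler class, the `fetch` function, and the "hello" body are all invented for the demonstration.

```python
import http.server
import socket
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host: str, port: int, path: str = "/") -> bytes:
    # 1. Name resolution: turn the host name into an IP address.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, socket.AF_INET, socket.SOCK_STREAM)[0]
    # 2. TCP: connect() performs the handshake; IP delivery happens beneath it.
    with socket.socket(family, socktype, proto) as conn:
        conn.settimeout(5)
        conn.connect(sockaddr)
        # 3. HTTP: send the request message over the TCP byte stream.
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("localhost", server.server_address[1])
server.shutdown()
server.server_close()
print(response.split(b"\r\n", 1)[0])  # the status line of the response
```

The application code only deals with names, a byte stream, and HTTP text; splitting into segments, addressing, and routing are handled by the layers below.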

1.6 URI and URL

Most of us are more familiar with the URL (Uniform Resource Locator) than with the URI (Uniform Resource Identifier). A URL is the Web page address you enter when accessing a Web page with a Web browser. For example, http://hackr.jp/ is a URL.

1.6.1 Uniform Resource Identifier (URI)

URI stands for Uniform Resource Identifier. RFC2396 defines these three words as follows.

Uniform

Specifying a uniform format makes it easy to handle many different types of resources without having to work out from context how each resource is accessed. It also makes it easier to add new protocol schemes (such as http: or ftp:).

Resource

A resource is defined as "anything identifiable". Not only documents but also images or services (such as today's weather forecast) can be resources, as long as they can be distinguished from other things. A resource can be a single item or a collection of many.

Identifier

An object that can act as a reference to something identifiable.

In summary, a URI is an identifier that locates a resource, expressed using a protocol scheme. A protocol scheme is the name of the type of protocol used to access the resource.

If HTTP is used, the protocol scheme is http. Other schemes include ftp, mailto, telnet, and file. There are about 30 standard URI schemes, which are managed and published by IANA (Internet Assigned Numbers Authority), a part of ICANN (Internet Corporation for Assigned Names and Numbers).

A URI identifies an Internet resource with a string, whereas a URL gives the resource's location (where it is on the Internet).

A URL is therefore a subset of URI.

Examples of URIs:

    http://www.ietf.org/rfc/rfc2396.txt
    ldap://[2001:db8::7]/c=GB?objectClass?one
    mailto:[email protected]
    news:comp.infosystems.www.servers.unix
    tel:+1-816-555-1212
    telnet://192.0.2.16:80/
    urn:oasis:names:specification:docbook:dtd:xml:4.1.2

1.6.2 URI format

Protocol Scheme Name

A scheme name such as http: or https: specifies the type of protocol used to access the resource. It is case insensitive and is followed by a colon (:). Scheme names such as data: or javascript: can also be used to specify data or a script.

Login Information (Authentication)

Specify the username and password as the login information (authentication) necessary to obtain resources from the server. This item is optional.

Server address

An absolute URI must specify the address of the server to be accessed. The address can be a DNS-resolvable name such as hackr.jp, an IPv4 address such as 192.168.1.1, or an IPv6 address enclosed in square brackets such as [0:0:0:0:0:0:0:1].

Server port number

Specifies the network port number on the server to connect to. This item is optional. If omitted, the default port number is used automatically.

Hierarchical file path

Specifies the file path on the server to locate the specified resource. This is similar to the file directory structure on UNIX systems.

Query string

You can use the query string to pass in arbitrary parameters for resources within the specified file path. This item is optional.

Fragment identifier

A fragment identifier is usually used to mark a sub-resource within the retrieved resource (a location within a document). However, the RFC does not clearly define how it is to be used. This item is also optional.
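The components listed above can be pulled apart with `urlsplit` from Python's standard `urllib.parse` module. The URI below is a made-up example, chosen only because it contains every component.

```python
from urllib.parse import urlsplit

# Hypothetical URI chosen to contain every component described above.
uri = "http://user:pass@www.example.com:80/dir/index.htm?uid=1#ch1"
parts = urlsplit(uri)

print(parts.scheme)    # http             (protocol scheme name)
print(parts.username)  # user             (login information)
print(parts.password)  # pass
print(parts.hostname)  # www.example.com  (server address)
print(parts.port)      # 80               (server port number)
print(parts.path)      # /dir/index.htm   (hierarchical file path)
print(parts.query)     # uid=1            (query string)
print(parts.fragment)  # ch1              (fragment identifier)

# IPv6 hosts are written in square brackets; hostname drops the brackets.
print(urlsplit("http://[0:0:0:0:0:0:0:1]:8080/").hostname)  # 0:0:0:0:0:0:0:1

# The scheme name is case insensitive; urlsplit normalizes it to lowercase.
print(urlsplit("HTTP://hackr.jp/").scheme)  # http
```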