Contents
1. TCP/IP: protocol family, hierarchical management
2. UDP and TCP: UDP, TCP, three-way handshake, four-way wave
3. Domain name and IP: IP address, domain name, domain name resolution, domain name record
4. CDN: prerequisite technologies, working principle
5. Communication data forwarding: proxy, gateway, tunnel
6. The "wall"
This is an original article of about ten thousand words; I recommend bookmarking it first and reading it later.
Introduction
In 1969 the first real precursor of the Internet appeared in the United States; it became known as ARPANET. ARPANET was initially used only for military purposes; as universities connected to it, it began to be used for scientific research. In 1983 the U.S. Advanced Research Projects Agency and the Defense Communications Agency brought into use the TCP/IP protocols, designed for interconnecting heterogeneous networks, and the University of California, Berkeley included the protocols in its BSD UNIX operating system (developed by Berkeley's software group), which made them widely adopted. This marked the birth of the real Internet. NSFNET, built by the National Science Foundation (NSF), also used TCP/IP. By 1990 ARPANET had been retired, and NSFNET had become one of the Internet's important backbone networks.
1. TCP/IP
Protocol family
For computers and network devices to communicate with each other, they must follow the same rules. For example, how to find the communication target, which side initiates the communication, which language to use, and how to end the communication all need to be agreed in advance. Communication between different hardware and operating systems requires such a set of rules, and we call that set of rules a protocol.
TCP/IP is the general name for the family of protocols related to the Internet, known as the TCP/IP protocol family; some people use TCP/IP to mean only the two protocols TCP and IP. The networks we use every day (including the Internet) all operate on the basis of the TCP/IP protocol family.
The two most important protocols are TCP and IP:
- TCP: located in the transport layer, it provides a reliable byte-stream service.
- IP:
  - The Internet Protocol (IP) resides at the network layer.
  - Its job is to deliver data packets to the other party. Several conditions must be met for a packet to actually arrive; the two most important are the IP address and the MAC address (Media Access Control address).
  - The IP address is the address assigned to a node, while the MAC address is the fixed address of a network interface card (NIC). IP addresses can be paired with MAC addresses: an IP address can change, but a MAC address basically does not. ARP (Address Resolution Protocol) finds the peer's MAC address from its IP address, and RARP (Reverse ARP) finds an IP address from a MAC address.
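To make the ARP lookup concrete, here is a minimal sketch that packs an ARP request for Ethernet and IPv4 with Python's `struct` module, following the 28-byte layout of RFC 826. It only builds the bytes (actually sending them needs a raw socket and elevated privileges), and the function name `build_arp_request` and all addresses are hypothetical placeholders.

```python
import socket
import struct

# Illustrative sketch; the helper name and all addresses are hypothetical.
def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Pack an ARP request: 'who has target_ip? tell sender_ip'."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (MAC = 6 bytes)
        4,                            # protocol address length (IPv4 = 4 bytes)
        1,                            # opcode: 1 = request, 2 = reply
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target MAC unknown: this is what ARP asks for
        socket.inet_aton(target_ip),
    )

request = build_arp_request(bytes.fromhex("020000000001"), "192.168.1.10", "192.168.1.1")
print(len(request))                          # 28
print(struct.unpack("!H", request[6:8])[0])  # 1 (request)
```

The host that owns 192.168.1.1 answers with an ARP reply (opcode 2) that carries its MAC address, and the asking host caches the IP-to-MAC pairing.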
Hierarchical management
Two layered models of the network are commonly used:
- The ISO OSI reference model (OSI/RM), with 7 layers
- The TCP/IP model, with 4 layers
Here we focus on the 4-layer TCP/IP model:
- Application layer: determines the communication activities involved in providing application services to users. It provides a variety of common services; the familiar HTTP, FTP, and DNS protocols all belong to this layer.
- Transport layer: provides the application layer with data transmission between two computers on the network. This layer has two important protocols: TCP and UDP.
- Network layer: handles the packets flowing through the network, decides which route to use to reach the target computer, and delivers the packets to the other party. Its main protocol is IP.
- Network interface layer: also known as the link layer. It deals with the hardware side of the network, including the operating system's device drivers, the NIC, and physical media such as optical fiber.
(Packet: the smallest unit of data transmitted over a network)
Let's take a common HTTP request as an example to see how the layered protocols work together:
- Client (sending side)
  - First, the client makes an HTTP request at the application layer.
  - The transport layer (TCP) splits the HTTP request it receives into segments, adds a TCP header (sequence number, port numbers, and so on) to each segment, and passes them to the network layer.
  - The network layer (IP) adds an IP header (containing the destination IP address) and passes the packet to the network interface layer.
  - The network interface layer adds an Ethernet header (containing the destination MAC address) and sends the frame out through the hardware.
- Server (receiving side)
  - Data received at the network interface layer is passed up layer by layer until it reaches the application layer, and each layer strips off its own header along the way.
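The layering above can be modeled as each layer wrapping the data it receives from the layer above. The sketch below is a toy: the headers are readable text tags rather than real binary headers, splitting into multiple TCP segments is ignored, and `encapsulate` and `decapsulate` are hypothetical names.

```python
# Toy model of encapsulation; header tags and function names are hypothetical.
# Order in which the sending side adds headers: transport -> network -> network interface.
LAYERS = ["TCP", "IP", "Ethernet"]

def encapsulate(http_request: str) -> str:
    """Sending side: each layer prepends its own header to what it received."""
    packet = http_request
    for layer in LAYERS:
        packet = f"[{layer} header]" + packet
    return packet

def decapsulate(frame: str) -> str:
    """Receiving side: strip headers from the outermost (Ethernet) inwards."""
    for layer in reversed(LAYERS):
        tag = f"[{layer} header]"
        if not frame.startswith(tag):
            raise ValueError(f"expected a {layer} header")
        frame = frame[len(tag):]
    return frame

frame = encapsulate("GET / HTTP/1.1")
print(frame)               # [Ethernet header][IP header][TCP header]GET / HTTP/1.1
print(decapsulate(frame))  # GET / HTTP/1.1
```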
2. UDP and TCP
Both UDP and TCP are in the transport layer, responsible for the transmission of data.
UDP
The User Datagram Protocol (UDP) transmits data in units of datagrams.
Let's look at the structure of a UDP packet. The UDP header contains four fields:
- Source port: 16 bits, the UDP port used by the sender. This field is optional; it is set to 0 when not used.
- Destination port: 16 bits, the port used by the receiver.
- Length: 16 bits, the length of the whole UDP datagram, header plus data. The UDP header is 8 bytes, so the minimum value of this field is 8.
- Checksum: 16 bits, used to check whether the data was damaged in transit.
UDP simply adds this header to the data and hands the packet directly to the network layer (IP), so processing is fast. However, it does not care whether the data is actually received. Common use cases include DNS queries, live streaming, chat rooms, and the Internet of Things.
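As a concrete illustration of the four header fields, here is a minimal sketch that builds a UDP datagram by hand. It assumes IPv4 and computes the checksum as RFC 768 describes: a 16-bit one's-complement sum over a pseudo-header (source IP, destination IP, protocol number 17, UDP length), the UDP header, and the data. The helper names, addresses, and ports are hypothetical; in practice the operating system's socket API fills all of this in for you.

```python
import socket
import struct

# Illustrative sketch; helper names, addresses and ports are hypothetical.
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (the same algorithm TCP and IP use)."""
    if len(data) % 2:
        data += b"\x00"                            # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back into 16 bits
    return ~total & 0xFFFF

def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    # source IP, destination IP, zero byte, protocol number (17 = UDP), UDP length
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))

def build_udp(src_ip: str, dst_ip: str, src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)                      # 8-byte header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum 0 while computing
    checksum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    if checksum == 0:
        checksum = 0xFFFF                          # 0 would mean "no checksum" over IPv4
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

datagram = build_udp("192.0.2.1", "192.0.2.2", 40000, 53, b"hello")
print(struct.unpack("!HHHH", datagram[:8])[:3])    # (40000, 53, 13)
# The receiver recomputes the sum over pseudo-header + datagram; 0 means "not damaged".
print(internet_checksum(pseudo_header("192.0.2.1", "192.0.2.2", len(datagram)) + datagram))  # 0
```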
TCP
The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport layer protocol.
TCP treats the data it receives as a stream of bytes, divides it into segments of varying length, adds a TCP header to each segment, and passes them to the network layer.
As before, let's start with the TCP header. Its structure is more complex, so I borrowed a diagram of it:
- Source port: 16 bits, the port number of the sending application.
- Destination port: 16 bits, the port number of the receiving application.
- Sequence number: 32 bits, the number of the first data byte in this segment. In a TCP connection, every byte of the transmitted byte stream is numbered in order. When SYN is 0, this is the sequence number of the first byte of the segment's data; when SYN is 1, it is the initial sequence number (ISN), used to synchronize sequence numbers.
- Acknowledgment number: 32 bits, the sequence number of the next byte the receiver expects from the sender, that is, the sequence number of the last byte successfully received plus one.
- Header length: 4 bits, also called the data offset. It gives the length of the TCP header in 32-bit words, which tells the receiver where the data begins.
- Reserved: 4 bits, reserved for future extensions of TCP; they must currently be 0.
- Flags (1 bit each):
  - CWR (Congestion Window Reduced): set by a sender to indicate that it has received a segment with the ECE flag set and has responded by reducing its sending window, and thus its sending rate.
  - ECE (ECN-Echo): during the three-way handshake, indicates that the TCP endpoint supports ECN. During data transfer, it indicates that a received packet had the ECN field of its IP header set to 11, meaning the network path is congested.
  - URG (Urgent): indicates whether the segment contains urgent data. URG=1 means it does; the urgent pointer field is valid only when URG=1.
  - ACK: indicates whether the acknowledgment number field is valid; it is valid only when ACK=1. TCP requires ACK to be 1 on every segment after the connection is established.
  - PSH (Push): indicates whether the receiver should push the data to the upper layer immediately. A value of 1 means the data should be delivered to the application immediately rather than buffered.
  - RST (Reset): indicates whether to reset the connection. RST=1 means a serious error (such as a host crash) has occurred, and the connection must be released and re-established.
  - SYN: used to synchronize sequence numbers when a connection is established. SYN=1 with ACK=0 marks a connection request; SYN=1 with ACK=1 means the peer agrees to the connection. SYN is 1 only in the first two handshakes.
  - FIN: indicates that the sender has finished sending data. FIN=1 means the sender has no more data to send and the connection can be released.
- Window size: 16 bits, the number of bytes, starting from the acknowledgment number, that the receiver is currently willing to accept, i.e., the free space left in its receive window. This field is used for TCP flow control.
- Checksum: 16 bits, used to detect whether the transmitted data is corrupted. The sender computes a value from the content, the receiver computes a value from what it received, and the two must match for the data to be accepted; otherwise the segment is dropped. The checksum covers a pseudo-header, the TCP header, and the TCP data.
- Urgent pointer: 16 bits, meaningful only when URG is 1. It indicates how many bytes of the segment's data are urgent data. Once all urgent data has been processed, TCP tells the application to resume normal operation. Urgent data can be sent even when the current window size is 0, because urgent data does not need to be buffered.
- Options: variable length, but the header including options must be padded to a multiple of 32 bits.
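To connect the field list to actual bytes, the sketch below parses the fixed 20-byte part of a TCP header with `struct`, reading the 4-bit data offset and the 8 flag bits described above. It assumes a well-formed segment and does no checksum verification; `parse_tcp_header` and the sample values are hypothetical.

```python
import struct

# Illustrative sketch; the function name and sample segment are hypothetical.
# Flag bits from least significant (bit 0) to most significant (bit 7).
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]

def parse_tcp_header(segment: bytes) -> dict:
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4   # data offset counts 32-bit words; low 4 bits are reserved
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": [name for bit, name in enumerate(FLAG_NAMES) if flag_byte & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "options": segment[20:header_len],
        "payload": segment[header_len:],
    }

# A SYN+ACK with no options (data offset 5 -> 20 bytes); checksum left as 0 for the example.
syn_ack = struct.pack("!HHIIBBHHH", 80, 54321, 300, 101, 5 << 4, 0x12, 65535, 0, 0)
info = parse_tcp_header(syn_ack)
print(info["flags"], info["seq"], info["ack"], info["header_len"])  # ['SYN', 'ACK'] 300 101 20
```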
Three-way handshake
When you think of TCP, you instinctively think of the three-way handshake.
TCP is a connection-oriented protocol. Each request needs to be confirmed by the other party. Before communication between the TCP client and the TCP server, a three-way handshake is required.
As shown in the figure above:
- First handshake
  - Initiated by the client
  - The client sends a segment with SYN=1 and SEQ=x, where x is its initial sequence number (ISN), asking the server to synchronize sequence numbers
  - The client enters the SYN_SENT state and waits for the server's confirmation
- Second handshake
  - Initiated by the server
  - On receiving the client's SYN, the server replies with a single segment that acknowledges it (ACK=x+1)
  - and also carries the server's own SYN (SYN=1, SEQ=y) to establish the connection in the other direction
  - The server enters the SYN_RECV state
- Third handshake
  - Initiated by the client
  - The client sends an acknowledgment with ACK=y+1 and SEQ=x+1
  - Once this segment is sent and received, the client and server both enter the ESTABLISHED state and start transferring data
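The sequence and acknowledgment arithmetic of the three handshakes can be traced with a short simulation. This sketch sends nothing over the network; it only reproduces the numbers and state changes listed above. Real initial sequence numbers are chosen by the operating system, so x=100 and y=300 are placeholders, and `Segment` and `three_way_handshake` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Illustrative simulation; class/function names and ISNs are hypothetical.
@dataclass
class Segment:
    flags: str                 # e.g. "SYN", "SYN+ACK", "ACK"
    seq: int
    ack: Optional[int] = None  # acknowledgment number, only meaningful when ACK is set

def three_way_handshake(x: int, y: int) -> Tuple[List[Segment], Dict[str, str]]:
    """x: client's initial sequence number, y: server's initial sequence number."""
    states = {"client": "CLOSED", "server": "LISTEN"}

    syn = Segment("SYN", seq=x)                            # 1st: client -> server
    states["client"] = "SYN_SENT"

    syn_ack = Segment("SYN+ACK", seq=y, ack=syn.seq + 1)   # 2nd: server -> client
    states["server"] = "SYN_RECV"

    ack = Segment("ACK", seq=x + 1, ack=syn_ack.seq + 1)   # 3rd: client -> server
    states["client"] = "ESTABLISHED"
    states["server"] = "ESTABLISHED"                       # once the server receives the ACK
    return [syn, syn_ack, ack], states

segments, states = three_way_handshake(x=100, y=300)
for s in segments:
    print(s.flags, "seq =", s.seq, "ack =", s.ack)
# SYN seq = 100 ack = None
# SYN+ACK seq = 300 ack = 101
# ACK seq = 101 ack = 301
print(states)  # {'client': 'ESTABLISHED', 'server': 'ESTABLISHED'}
```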
Communication between the client and server is a data transmission process, and messages are carried in packets. To keep the data in order, every byte carries a sequence number, so each segment can be identified by the sequence number of its first byte. When a packet is lost, TCP retransmits it.
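The role of sequence numbers in ordering and retransmission can be sketched as a receiver that puts segments back in order and computes the cumulative acknowledgment number (the next byte it expects). This is a simplification that assumes segments never overlap and ignores windows and timers; `reassemble` is a hypothetical name.

```python
from typing import Iterable, Tuple

# Illustrative sketch; the function name and sample segments are hypothetical.
def reassemble(segments: Iterable[Tuple[int, bytes]], first_seq: int) -> Tuple[bytes, int]:
    """Return (bytes deliverable in order, ACK number = next expected byte).

    segments: (seq, data) pairs, where seq numbers the first byte of data.
    """
    pending = dict(segments)   # out-of-order segments wait here, keyed by sequence number
    expected = first_seq
    delivered = b""
    while expected in pending:
        data = pending.pop(expected)
        delivered += data
        expected += len(data)  # last byte received + 1
    return delivered, expected

# Segments arriving out of order are still delivered in order:
print(reassemble([(1011, b"!"), (1000, b"Hello"), (1005, b" World")], 1000))
# (b'Hello World!', 1012)

# If the segment starting at 1005 is lost, the ACK stays at 1005, signalling
# to the sender that data from byte 1005 onward is missing.
print(reassemble([(1000, b"Hello"), (1011, b"!")], 1000))
# (b'Hello', 1005)
```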
Four-way wave
When the data transmission between the client and the server is complete, the connection is terminated by four waves.
- First wave
  - Initiated by the client
  - The client sends a segment with FIN=1 and sequence number SEQ=x to tell the server it wants to close the connection
- Second wave
  - Initiated by the server
  - The server sends an ACK segment to show that it has received the request to close
  - Its sequence number is SEQ=y; because it is replying to the client, it acknowledges the client's sequence number x plus 1, giving ACK=x+1
- Third wave
  - Initiated by the server
  - Once the server has finished sending all its remaining data to the client, it sends its own segment with FIN=1
  - Its sequence number is SEQ=z; ACK is still x+1, because it is still responding to the client's close request with SEQ=x
- Fourth wave
  - Initiated by the client
  - The client confirms that it has received the server's close request
  - It sends an ACK segment with SEQ=x+1; because it is replying to the server's close request with SEQ=z, the acknowledgment number is z plus 1, giving ACK=z+1
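The numbers in the four waves can be checked the same way. The sketch below only computes the SEQ/ACK values of the four segments from the client's sequence number x and the server's sequence numbers y and z, as in the steps above; the sample values and the names `Wave` and `four_way_wave` are hypothetical.

```python
from typing import List, NamedTuple, Optional

# Illustrative sketch; names and sample sequence numbers are hypothetical.
class Wave(NamedTuple):
    direction: str
    flags: str
    seq: int
    ack: Optional[int]

def four_way_wave(x: int, y: int, z: int) -> List[Wave]:
    """x: client's SEQ when it sends FIN; y: server's SEQ for its ACK;
    z: server's SEQ for its own FIN (larger than y if it sent more data in between)."""
    return [
        Wave("client -> server", "FIN", x, None),       # 1st: client asks to close
        Wave("server -> client", "ACK", y, x + 1),      # 2nd: server acknowledges the FIN
        Wave("server -> client", "FIN+ACK", z, x + 1),  # 3rd: server has finished sending too
        Wave("client -> server", "ACK", x + 1, z + 1),  # 4th: client acknowledges the server's FIN
    ]

for wave in four_way_wave(x=500, y=800, z=820):
    print(wave)
```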
Summary
- TCP is connection-oriented; UDP is connectionless
- UDP has a simpler structure (an 8-byte header, versus at least 20 bytes for TCP)
- TCP is byte-stream oriented; UDP is datagram-based
- TCP guarantees that data is delivered correctly; with UDP, packets may be lost
- TCP guarantees data order; UDP does not
3. Domain name and IP address
The IP address
Within a single LAN segment, computers can communicate directly using the MAC addresses provided by the network interface layer. When computers are separated by routers, however, MAC addresses are impractical for delivering data: they cannot express a hierarchy such as country, province, city, district, street, road, and house number, nor can they give the address space any logical structure. Data transmission across networks therefore needs a logical, hierarchical addressing scheme, and that scheme is the IP address.
An IP address (Internet Protocol address) is a unified address format provided by the IP protocol. It assigns a logical address to every network and every host on the Internet, masking the differences between physical addresses.
An IP address consists of a network ID and a host ID
- Network ID: Identifies the network location of the host
- Host ID: Indicates the location of a host on the network
IPv4 addresses are 32-bit binary numbers and are classically divided into five classes:
- Class A: the first 8 bits are the network ID and the last 24 bits the host ID. Intended for very large networks; range 0.0.0.0 to 127.255.255.255
- Class B: the first 16 bits are the network ID and the last 16 bits the host ID. Allocated to medium-sized networks; range 128.0.0.0 to 191.255.255.255
- Class C: the first 24 bits are the network ID and the last 8 bits the host ID. Allocated to small networks; range 192.0.0.0 to 223.255.255.255
- Class D: no split into network ID and host ID. Used for multicast; range 224.0.0.0 to 239.255.255.255
- Class E: no split into network ID and host ID. Reserved for experimental use; range 240.0.0.0 to 255.255.255.254
Data is routed by first looking at the network ID in the IP address, which gets it to the correct network. If subnets are used (the host ID is split into a subnet ID and a host ID), the subnet is located within that network next, and finally the data is delivered to the target host using the host ID.
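The class rules and the network/host split can be checked with Python's standard `ipaddress` module. This sketch applies the classful rules described above (the class is determined by the leading bits of the first octet); today's Internet assigns addresses classlessly (CIDR), so it reflects the classic scheme only. `ip_class` and `split_ids` are hypothetical helper names.

```python
import ipaddress

# Illustrative sketch of classful addressing; helper names are hypothetical.
def ip_class(addr: str) -> str:
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet < 128:
        return "A"   # leading bit 0
    if first_octet < 192:
        return "B"   # leading bits 10
    if first_octet < 224:
        return "C"   # leading bits 110
    if first_octet < 240:
        return "D"   # leading bits 1110: multicast
    return "E"       # leading bits 1111: experimental

NETWORK_BITS = {"A": 8, "B": 16, "C": 24}   # length of the network ID for classes A-C

def split_ids(addr: str):
    """Return (network address, host ID) for a class A, B or C address."""
    bits = NETWORK_BITS[ip_class(addr)]     # raises KeyError for class D/E: no split
    iface = ipaddress.IPv4Interface(f"{addr}/{bits}")
    host_id = int(iface.ip) & int(iface.network.hostmask)
    return str(iface.network.network_address), host_id

print(ip_class("172.16.5.4"), split_ids("172.16.5.4"))   # B ('172.16.0.0', 1284)

# Subnetting borrows host bits: here, a /24 subnet inside that class B network.
print(ipaddress.IPv4Interface("172.16.5.4/24").network)  # 172.16.5.0/24
```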
The domain name
A domain name is a string of names separated by dots that identifies a computer or group of computers on the Internet. It is used to locate the computer during data transmission (and sometimes also reflects a geographical location).
A computer can be given an IP address as well as a host name and a domain name. Users usually reach other computers by host name or domain name rather than by IP address, because a name made of letters and digits is much easier for people to remember than a string of numbers.
A domain name consists of several parts joined by dots (.), for example www.melonfield.club. A domain name may end with a dot, which stands for the root; we usually omit it when writing, and software fills it in when querying.
A domain name can be divided into the root domain, top-level domain (level-1 domain), and subdomain (level-2 domain and level-3 domain).
- Root domain: the "." at the end of the domain name, which software usually fills in by itself. There are currently 13 root server identities (not 13 physical servers, but 13 root server IP addresses, each served by many machines).
- Top-level domain: the highest level of domain name; every domain name ends in one, such as club in the example above. Each domain has its own DNS server, also known as an authoritative DNS server.
- Subdomain: the label immediately to the left of the top-level domain is the second-level domain, and the label to the left of that is the third-level domain.
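The hierarchy can be read off a domain name by walking its labels from right to left. This small sketch (the function name `zone_chain` is hypothetical) lists the zones a name falls under, starting from the root, which is also the order in which resolution descends the tree.

```python
from typing import List

# Illustrative sketch; the function name is hypothetical.
def zone_chain(name: str) -> List[str]:
    """Return the zones containing `name`, from the root domain down to the full name."""
    labels = name.rstrip(".").split(".")   # the trailing root dot is optional
    chain = ["."]                          # root domain
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(zone_chain("www.melonfield.club"))
# ['.', 'club.', 'melonfield.club.', 'www.melonfield.club.']
#  root, top-level, second-level, third-level
```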
Domain name resolution
Given a domain name, how do we find the IP address it maps to?
This is done by domain name resolution, which is usually carried out by domain name servers (DNS servers).
Domain name resolution is also loosely called domain name pointing, server setup, domain name configuration, or reverse IP registration. It works in two directions:
- Forward lookup: The domain name is resolved into an IP address.
- Reverse resolution: Translates an IP address into a domain name.
Let's take melonfield.club as an example. To visit this site, my computer first has to find the IP address for the domain name:
- My computer checks locally first
  - It checks whether the hosts file contains an IP address for the domain name
  - If not, it looks in the local DNS resolver cache
- If that fails, it contacts the local DNS server (the preferred DNS server set in my computer's TCP/IP settings)
  - Authoritative answer: the queried domain name is in a zone that the local DNS server itself is configured to serve
  - Non-authoritative answer: the local DNS server cannot resolve the name from its own zones, but the answer is in its cache
- If the local DNS server cannot find it, there are two options
  - Use a forwarder to pass the query on to an upstream DNS server
  - Query the 13 root DNS servers
- If no forwarder is used, the local DNS server queries the nearest root DNS server
- The root DNS server returns the address of a server that manages the top-level domain (.club)
- The local DNS server queries the .club server
- The .club server cannot resolve the name either, and returns the address of the next-level server (melonfield.club) within .club
- The local DNS server queries the melonfield.club server
- At this point, we have the IP address for melonfield.club
The query between my computer and the local DNS server is recursive, while the queries between DNS servers are iterative.
Here is one more diagram to help; it is probably the most detailed one around.
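The lookup order and the chain of referrals (root, then .club, then melonfield.club) can be simulated with a few dictionaries. This is a toy: the "servers" are in-memory tables, the server names and the IP address 203.0.113.10 (a documentation-only range) are made up, and forwarders, TTLs and the local DNS server's own cache are left out. `resolve` plays the role of the recursive query my computer makes; `iterative_lookup` plays the local DNS server iterating from the root.

```python
from typing import Dict, List

# Toy DNS hierarchy; all server names and addresses are hypothetical.
HOSTS_FILE = {"localhost": "127.0.0.1"}

# Each server holds either answers ("A", ip) or referrals ("NS", next_server).
SERVERS = {
    "root": {"club.": ("NS", "club-server")},
    "club-server": {"melonfield.club.": ("NS", "melonfield-server")},
    "melonfield-server": {"melonfield.club.": ("A", "203.0.113.10")},
}

def iterative_lookup(fqdn: str, trace: List[str]) -> str:
    """Local DNS server: start at the root and follow referrals down the tree."""
    server = "root"
    while True:
        trace.append(server)
        table = SERVERS[server]
        if fqdn in table and table[fqdn][0] == "A":
            return table[fqdn][1]                   # authoritative answer
        referrals = [zone for zone, (kind, _) in table.items()
                     if kind == "NS" and (fqdn == zone or fqdn.endswith("." + zone))]
        if not referrals:
            raise LookupError(fqdn)
        server = table[max(referrals, key=len)][1]  # most specific delegation wins

def resolve(name: str, cache: Dict[str, str], trace: List[str]) -> str:
    """My computer: hosts file, then cache, then one recursive query to the local DNS server."""
    if name in HOSTS_FILE:
        trace.append("hosts")
        return HOSTS_FILE[name]
    fqdn = name.rstrip(".") + "."
    if fqdn in cache:
        trace.append("cache")
        return cache[fqdn]
    ip = iterative_lookup(fqdn, trace)
    cache[fqdn] = ip
    return ip

cache: Dict[str, str] = {}
first: List[str] = []
print(resolve("melonfield.club", cache, first), first)
# 203.0.113.10 ['root', 'club-server', 'melonfield-server']
second: List[str] = []
print(resolve("melonfield.club", cache, second), second)
# 203.0.113.10 ['cache']
```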
Domain name record
Now that we know roughly how domain name resolution works, how does a newly registered domain name point to your own host? For that, we have to tell the DNS database explicitly which IP address the domain name should point to.
The DNS database contains resource records, each made up of a set of fields:
- Name: a label giving the name or owner of the record. It can be the root domain itself (written @) or a subdomain (for example, www).
- Type: the type of the record, for example an A (address) record.
- TTL (time to live): how long a copy of the record stored in a cache may be used before it must be refreshed from the original source or discarded. A shorter TTL means the record is fetched more often (slower access, fresher data); a longer TTL means it is fetched less often (faster access, possibly stale data). A common default is 1 hour.
- Data: the record's data, which depends on the record type. For example, the data of an A record is the host's IP address. This is what a DNS lookup returns.
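The effect of TTL on a cache can be sketched as a small cache whose entries expire once their TTL runs out. The clock is injectable so that time can be simulated; `TTLCache` is a hypothetical name, and real resolvers do considerably more than this.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Illustrative sketch; the class name and sample record are hypothetical.
class TTLCache:
    """Keeps each answer until its TTL (in seconds) runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}   # name -> (data, expiry time)

    def put(self, name: str, data: str, ttl: float) -> None:
        self._entries[name] = (data, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        data, expires_at = entry
        if self._clock() >= expires_at:      # expired: discard and force a fresh lookup
            del self._entries[name]
            return None
        return data

# Simulated clock, starting at t = 0 seconds.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("melonfield.club.", "203.0.113.10", ttl=3600)   # TTL of 1 hour
now[0] = 3599
print(cache.get("melonfield.club."))   # 203.0.113.10
now[0] = 3600
print(cache.get("melonfield.club."))   # None
```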
Common record types are as follows:
- A: Indicates the IP address of the host. Data value: IP address (IPv4 address)
- CNAME: Points a domain name to another domain name, achieving the same access effect as the domain name pointed to. Data value: usually a domain name provided by the host service provider
- MX: points to a mail server. When email is sent, the mail system uses the suffix of the recipient's address to locate the mail server. Data value: the mail server's host name, with a priority
- TXT: text records, usually verification records are used; Data value: any text
- NS: Specifies the DNS server that resolves the domain name. Data value: DNS server
- AAAA: Similar to record A, except that AAAA refers to an IPv6 address. Data value: IP address (IPv6 address)
- SPF: part of the Sender Policy Framework (SPF), which lists the servers allowed to send mail for a domain; SPF data is commonly stored in a TXT record. Data value: SPF record
- SRV: records which computer provides which service. Name format: _service._protocol (for example, _example-server._tcp)
- CAA: controls which certificate authorities may issue SSL certificates for the domain, including wildcard certificates. Data format: flags, a tag, and a value, such as the authorized certificate authority or an email address for reporting policy violations
- Explicit URL forwarding: when a user visits the domain, they are automatically redirected to the destination address. For example, visiting www.melonfield.club jumps to www.google.com, and the address bar shows www.google.com. Data value: HTTP(S) address
- Hidden URL forwarding: like explicit URL forwarding, except that after the jump to www.google.com the address bar still shows www.melonfield.club. Data value: HTTP(S) address
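To see how the fields and record types fit together, here is a toy zone held as a list of records, with a lookup that follows CNAME records the way a resolver would. The records, names and values are hypothetical (203.0.113.10 is a documentation address), and only exact-name matching is handled.

```python
from typing import List, NamedTuple

# Illustrative sketch; zone contents and function names are hypothetical.
class Record(NamedTuple):
    name: str
    type: str      # record type, e.g. "A", "CNAME", "MX", "TXT"
    ttl: int       # seconds
    data: str

ZONE = [
    Record("melonfield.club.", "A", 3600, "203.0.113.10"),
    Record("www.melonfield.club.", "CNAME", 3600, "melonfield.club."),
    Record("melonfield.club.", "MX", 3600, "10 mail.melonfield.club."),
    Record("melonfield.club.", "TXT", 3600, "v=spf1 mx -all"),
]

def lookup(name: str, rtype: str, max_hops: int = 8) -> List[str]:
    """Return the data of records matching (name, rtype), following CNAMEs."""
    for _ in range(max_hops):
        matches = [r.data for r in ZONE if r.name == name and r.type == rtype]
        if matches:
            return matches
        aliases = [r.data for r in ZONE if r.name == name and r.type == "CNAME"]
        if not aliases:
            return []                  # no such record
        name = aliases[0]              # continue the search at the alias target
    raise RuntimeError("CNAME chain too long")

print(lookup("www.melonfield.club.", "A"))   # ['203.0.113.10']
print(lookup("melonfield.club.", "MX"))      # ['10 mail.melonfield.club.']
```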
4. CDN
CDN stands for Content Delivery Network. It is a distributed network of edge server groups located in different regions, built on top of the underlying carrier network. Its purpose is to add a new layer of network architecture to the existing Internet: site content is published to the network "edge" closest to users, so users can fetch what they need from nearby, which makes the site respond faster.
Prerequisite technologies
The implementation of CDN depends on many technologies, including load balancing technology, dynamic content distribution and replication technology, and caching technology.
- Load balancing means distributing the load (work tasks, access requests) across multiple operating units (servers, components). It is a key technique for high performance, for eliminating single points of failure (high availability), and for scalability (horizontal scaling). To remove a single point of failure, the same application can be deployed redundantly on multiple machines; to give clients a single entry point, a load balancer is placed in front of the cluster to distribute traffic. Load balancing can be done at different levels, such as DNS load balancing, HTTP load balancing, IP load balancing, and link layer load balancing.
- Dynamic content distribution and replication, simply put, means distributing and replicating most static pages, images, and streaming media to nodes in different locations.
- Caching improves users' access speed; it includes local caches and ISP caches.
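Of the load balancing approaches listed, the simplest policy to illustrate is round robin: hand each new request to the next server in turn and skip servers that are down, which is how redundancy removes the single point of failure. This is a sketch of the policy only, with no networking; the class name and backend addresses are hypothetical.

```python
import itertools
from typing import Iterable

# Illustrative sketch; the class name and backend addresses are hypothetical.
class RoundRobinBalancer:
    """Give requests to backends in turn, skipping any marked as unhealthy."""

    def __init__(self, backends: Iterable[str]):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = itertools.cycle(self.backends)

    def mark_down(self, backend: str) -> None:
        self.healthy.discard(backend)         # e.g. after a failed health check

    def pick(self) -> str:
        for _ in range(len(self.backends)):   # at most one full lap
            backend = next(self._next)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backend")

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(4)])   # ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
lb.mark_down("10.0.0.2")
print([lb.pick() for _ in range(3)])   # ['10.0.0.3', '10.0.0.1', '10.0.0.3']
```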
How it works
A CDN is usually enabled with a CNAME record whose target is provided by the CDN service provider.
As shown in the figure above, a user accesses a site that uses a CDN. Assume the domain www.a.com has a CNAME record whose value is www.a.tbcdn.com. The domain name resolution process is simplified here:
- The user visits www.a.com. Assuming no record for the domain is found on the local computer, the request goes to the local DNS server (LDNS)
- The local DNS server has no record for the domain either, so it asks the domain's authoritative DNS server, which returns the CNAME www.a.tbcdn.com
- The local DNS server then asks the CDN's DNS scheduling system for the IP address of www.a.tbcdn.com, and the scheduling system returns the IP address of the node nearest to the user (here, the Beijing node)
- The local DNS server returns the node's IP address to the user, and the user sends the request to that node
  - If the node does not have the data, it fetches it from the origin site
  - If the node has the data, it returns it to the user directly
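The last steps (pick the nearest node, serve from its cache, go back to the origin on a miss) can be sketched in a few lines. The distances and node names are made-up inputs; a real CDN scheduler weighs location, load and link quality, and also handles cache expiry. `EdgeNode` and `pick_nearest` are hypothetical names.

```python
from typing import Callable, Dict, List

# Illustrative sketch; class/function names, nodes and distances are hypothetical.
class EdgeNode:
    def __init__(self, name: str, origin: Callable[[str], str]):
        self.name = name
        self.origin = origin            # fetches content from the origin (source) site
        self.cache: Dict[str, str] = {}
        self.origin_fetches = 0

    def get(self, path: str) -> str:
        if path not in self.cache:      # miss: ask the origin, then keep a copy
            self.cache[path] = self.origin(path)
            self.origin_fetches += 1
        return self.cache[path]         # hit: answer directly from the edge

def pick_nearest(nodes: List[EdgeNode], distance: Dict[str, float]) -> EdgeNode:
    """Scheduling step: choose the node with the smallest distance to the user."""
    return min(nodes, key=lambda node: distance[node.name])

def origin(path: str) -> str:
    return f"content of {path}"

nodes = [EdgeNode("beijing", origin), EdgeNode("shanghai", origin)]
node = pick_nearest(nodes, {"beijing": 5, "shanghai": 20})
node.get("/index.html")   # first request: fetched from the origin
node.get("/index.html")   # second request: served from the edge cache
print(node.name, node.origin_fetches)   # beijing 1
```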
5. Communication data forwarding
Whether it is a proxy, a gateway, or a tunnel, its job is to forward requests to the next server on the communication path, and to pass that server's response back to the client.
The proxy
A proxy, also known as a network proxy, is a special network service that lets one network endpoint (usually a client) connect indirectly to another (usually a server). A computer system or other network endpoint that provides a proxy service is called a proxy server.
The basic behavior of a proxy server is to receive requests from clients and forward them to other servers. A proxy does not change the request URI; it forwards the request on to the target server that holds the resource. Some proxy protocols allow the proxy to modify the client's original request or the target server's original response. Because users direct their traffic to the proxy address, a proxy can mask their original network activity and can bypass Internet filtering to reach the target server.
A proxy can serve the following purposes:
- Faster access: a proxy server usually has a large cache. Information passing through from outside is also saved in the cache, so when other users request the same information it is served straight from the cache.
- Access to restricted internal resources: for example, a university's FTP services (provided the proxy address falls within the range allowed to access them). Using a free proxy server inside the education network's address range, you can use the FTP downloads and uploads and the data query and sharing services that are open only to the education network.
- Content filtering: for example, restricting access to specific computers, translating data from one language into another, or defending against hostile traffic on either side of the proxy server.
- Hiding your real IP: a proxy server can hide your IP address from attackers. A safer approach is to chain several proxies with a dedicated tool (for example, Tor).
- Getting around IP-based access restrictions to reach foreign sites. For example, users of the education network and the 169 network can access foreign websites through proxies.
- Getting around content filtering to reach filtered sites. (Hmm... I can't go into details here.)
Proxies can be further divided into caching proxies and transparent proxies:
- Caching proxy: before forwarding a response, the proxy saves a copy (cache) of the resource. When it receives another request for the same resource, it returns the cached copy instead of fetching the resource from the origin server again.
- Transparent proxy: a proxy that does not modify the request or response. Conversely, a proxy that modifies message content is called a non-transparent proxy.
The gateway
In computer networking, a gateway is a server that relays communication data for other servers. When it receives a request from a client, it handles the request as if it were the origin server holding the resource itself. The client may not even realize that it is talking to a gateway.
For historical reasons, the definition of a gateway in modern networking terms differs from that in traditional TCP/IP terms
- Traditional: on a host (also called an end system), packets must be processed through all four TCP/IP layers, but on a gateway (also called an intermediate system) they only need to reach the Internet layer; once the route is determined, they can be forwarded. In this sense a gateway is a router.
- Modern: A gateway moves data between different protocols, while a router moves data between different networks and is the traditional equivalent of an IP gateway.
Gateways work much like proxies, but a gateway allows the server on the communication path to provide non-HTTP services.
The connection between the client and the gateway can be encrypted for security. For example, a gateway can connect to a database and query it with SQL statements, or, when a shopping site processes a credit card payment, the gateway can work with the credit card settlement system.
The tunnel
A tunnel (tunneling protocol) uses one network protocol to carry a different protocol inside its payload. This makes it possible to transfer data across incompatible networks, or to provide a secure path across an insecure network.
A tunnel relays traffic between a client and a server that are far apart and keeps the connection between them open. The tunnel does not parse the request: the request is forwarded unchanged to the server behind it. The tunnel is transparent, and the client does not need to know it exists.
Uses:
- Getting around firewalls: a protocol blocked by a firewall can be wrapped inside another protocol that the firewall allows
- Encrypted communication: plaintext network traffic is encrypted for secure transmission over the Internet. The tunnel establishes a communication line with other servers as required, using encryption such as SSL.
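Encapsulation, the core of tunneling, can be sketched as wrapping the inner packet, unchanged, behind an outer header and unwrapping it at the far end. The 6-byte outer header here (inner protocol number plus length) is invented for illustration and is not any real tunneling protocol; the encryption a secure tunnel would add is left out.

```python
import struct
from typing import Tuple

# Hypothetical outer header (not a real protocol): 2-byte inner protocol id, 4-byte payload length.
OUTER_HEADER = struct.Struct("!HI")

def encapsulate(inner_protocol: int, inner_packet: bytes) -> bytes:
    """Tunnel entrance: carry the inner packet as the payload of the outer protocol."""
    return OUTER_HEADER.pack(inner_protocol, len(inner_packet)) + inner_packet

def decapsulate(frame: bytes) -> Tuple[int, bytes]:
    """Tunnel exit: recover the inner packet exactly as it went in."""
    inner_protocol, length = OUTER_HEADER.unpack_from(frame)
    inner = frame[OUTER_HEADER.size:OUTER_HEADER.size + length]
    if len(inner) != length:
        raise ValueError("truncated tunnel frame")
    return inner_protocol, inner

IPV4 = 0x0800   # EtherType value for IPv4, used here as the inner protocol id
frame = encapsulate(IPV4, b"<an IPv4 packet>")
print(len(frame))           # 22
print(decapsulate(frame))   # (2048, b'<an IPv4 packet>')
```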
6. The "wall"
Before we get started, a quick look at the firewall.
The basic function of a firewall is to isolate networks. It divides a network into security zones and defines access control policies between them, controlling the flow of data between zones with different levels of trust.
Firewalls come in two types, network layer and application layer; some firewalls operate at both layers.
- Network layer firewall: works at the network layer and filters packets, letting through only those that match preset rules. Newer firewalls can filter on attributes such as IP address, port number, and type of service (HTTP or FTP), as well as protocol, TTL value, source domain name, network segment, and so on.
- Application-layer firewall: Operates at the application layer and can monitor network packets and block incoming and outgoing packets that do not conform to the rules set by the firewall, as well as system calls.
To sum up, filtering is implemented roughly by judging various attributes of packets, including IP, port, DNS, and so on.
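The rule matching described above can be sketched as a first-match packet filter over source network, destination port and protocol, with anything not explicitly allowed denied by default. The rule format and the example addresses (documentation ranges) are hypothetical; real firewalls have their own rule languages and usually also track connection state.

```python
import ipaddress
from typing import List, NamedTuple, Optional

# Illustrative sketch; the rule format, names and addresses are hypothetical.
class Rule(NamedTuple):
    action: str                     # "allow" or "deny"
    src: str                        # source network in CIDR notation
    dst_port: Optional[int] = None  # None matches any port
    protocol: Optional[str] = None  # "tcp", "udp", or None for any

def filter_packet(rules: List[Rule], src_ip: str, dst_port: int, protocol: str,
                  default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default policy."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if addr not in ipaddress.ip_network(rule.src):
            continue
        if rule.dst_port is not None and rule.dst_port != dst_port:
            continue
        if rule.protocol is not None and rule.protocol != protocol:
            continue
        return rule.action
    return default

rules = [
    Rule("deny", "198.51.100.0/24"),          # block a whole network segment
    Rule("allow", "0.0.0.0/0", 443, "tcp"),   # HTTPS from anywhere
    Rule("allow", "0.0.0.0/0", 53, "udp"),    # DNS from anywhere
]
print(filter_packet(rules, "203.0.113.5", 443, "tcp"))   # allow
print(filter_packet(rules, "198.51.100.7", 443, "tcp"))  # deny (first rule matches)
print(filter_packet(rules, "203.0.113.5", 22, "tcp"))    # deny (default policy)
```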
Conclusion
Since I am not a networking professional, please point out anything inaccurate in this article. This ten-thousand-word article, "The network knowledge you need to master (Part 1)", is also published on my personal public account; you are welcome to follow MelonField.