Introduction to DNS

  • The Domain Name System (DNS) converts domain names into IP addresses when a network request is made. It lets users reach services on the Internet without having to remember the numeric IP addresses that machines use directly.
  • Traditional public DNS, which runs over UDP, is prone to DNS hijacking, which causes security problems.

The DNS hijacking problem

  • 1. On the Internet, resolving the domain name is the first hop of every access, and this hop often “slips” (especially on mobile networks): users receive the wrong content or the connection fails, and the feeling of browsing the Internet freely disappears at once.
  • 2. Domain name caching, hijacking, and cross-network problems occur frequently.
    • DNS cache problem: the DNS cache holds an old IP address, so the user fails to connect to the server.
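The stale-cache problem can be illustrated with a minimal sketch of a resolver cache that honors a time-to-live (TTL). The class name `TtlDnsCache`, the injectable `clock`, and the example address are assumptions made for illustration; a cache that ignores the TTL would keep returning the old IP address.

```python
import time


class TtlDnsCache:
    """Tiny resolver cache: an entry expires after its TTL (in seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the example can fake time
        self._entries = {}           # name -> (ip, expires_at)

    def put(self, name, ip, ttl):
        self._entries[name] = (ip, self._clock() + ttl)

    def get(self, name):
        """Return the cached IP, or None if it is missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: force a fresh lookup
            return None
        return ip


# Usage with a fake clock: the entry is served for 60 seconds, then dropped.
now = [0.0]
cache = TtlDnsCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.10", ttl=60)
fresh = cache.get("www.example.com")
now[0] = 61.0
stale = cache.get("www.example.com")
```

After the fake clock passes the TTL, `get` returns `None`, which is the signal to resolve the name again instead of connecting to an outdated address.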

On mobile, DNS resolution is costly, so domain names are converged onto a layer-7 NGINX that forwards requests.

The edge node is located by resolving the domain name to an IP address.

Domain name query process

Recursive query

If the local DNS server queried by the host does not know the IP address of the requested domain name, the local DNS server itself acts as a DNS client and continues the query with other DNS servers, starting from the root DNS servers, instead of leaving the host to perform further queries on its own.

Iterative query

When the root DNS server receives an iterative query from the local DNS server, it either returns the requested IP address or tells the local DNS server which DNS server to query next. The local DNS server then carries out the subsequent queries itself; the root server does not perform them on its behalf.

The query from the host to the local DNS server is generally recursive.

The query from the local DNS server to the root DNS server is generally iterative.
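The difference between the two query styles can be sketched with a toy in-memory model. The server names, zones, and the address below are made up for illustration and do not reflect real DNS data: `iterative_query` answers or refers, while `local_dns_resolve` follows the referrals on behalf of the host, which is what makes the host's view recursive.

```python
# Toy model: each "server" either knows the answer or refers to another server.
SERVERS = {
    "root": {"referrals": {"com": "tld-com"}, "answers": {}},
    "tld-com": {"referrals": {"example.com": "auth-example"}, "answers": {}},
    "auth-example": {"referrals": {}, "answers": {"www.example.com": "192.0.2.10"}},
}


def iterative_query(server, name):
    """One iterative step: ("answer", ip), ("referral", next_server) or ("nxdomain", None)."""
    zone = SERVERS[server]
    if name in zone["answers"]:
        return "answer", zone["answers"][name]
    for suffix, next_server in zone["referrals"].items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    return "nxdomain", None


def local_dns_resolve(name, trace=None):
    """Recursive service for the host: the local DNS follows every referral itself."""
    server = "root"
    while True:
        if trace is not None:
            trace.append(server)
        kind, value = iterative_query(server, name)
        if kind == "answer":
            return value
        if kind == "nxdomain":
            return None
        server = value  # follow the referral and ask the next server


trace = []
print(local_dns_resolve("www.example.com", trace))  # 192.0.2.10
print(trace)  # ['root', 'tld-com', 'auth-example']
```

The host makes a single call and gets a final answer, while the local DNS server makes one iterative query per level of the hierarchy.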

DNS issues

Local DNS hijacking: the LocalDNS resolves a domain name to a different target for ulterior purposes.

1. Records are returned in random order, so each record attracts roughly the same number of requests.
2. SRV record weights (not supported by HTTP).
3. A mapping table of physical locations is generally maintained, and the reply follows the “nearest” principle.

Domain name caching means that the LocalDNS caches the resolution results of service domain names and answers from the cache instead of recursing to the authoritative DNS. Carriers do this for two main reasons:

  • Keeping user traffic inside the carrier’s network: domestic Internet access operators differ greatly in bandwidth resources, inter-network settlement fees, IDC distribution, and the distribution of ICP resources within their networks. To ensure access quality for their users and reduce cross-network settlement, carriers set up content caching servers in their networks. By forcibly pointing domain names to the IP addresses of these caching servers, they keep local traffic entirely within the local network.
  • Pushing ads: some LocalDNS servers replace part of the content behind cached resolution results with ads from third-party advertising alliances.

  • In addition to domain name caching, a carrier’s LocalDNS may also forward resolution requests.
  • Resolution forwarding means that the carrier’s LocalDNS does not perform recursive resolution itself but forwards domain name resolution requests to the recursive DNS of another carrier.

  • To save resources, some small operators forward resolution requests directly to the recursive LocalDNS of other operators.
  • As a result, the source IP address of the resolution request that the authoritative DNS receives belongs to another carrier, so user traffic is directed to the wrong IDC and user access slows down.

  • LocalDNS recursive egress NAT: the carrier’s LocalDNS recurses according to the standard DNS protocol, but because the network has multiple egresses and destination-route NAT is configured, the egress IP address used during recursive resolution may not belong to the carrier’s own network.
  • Again, the source IP address of the resolution request that the authoritative DNS receives belongs to another carrier, so user traffic is diverted to the wrong IDC and user access slows down.

Highly available DNS design

Real-time monitoring + business-side pushing: this scheme has a long cycle, because it takes time to push operators to fix the problem through administrative channels. In addition, big data analysis shows that the top three groups of affected users are all mobile Internet users. What technical means can solve the above problems for these users?

Bypassing the automatically assigned DNS and using 114DNS or Google Public DNS:

  • How to construct domain name requests on the user side: it is not difficult for PC clients to construct a standard DNS request packet. On mobile it is technically possible to send standard DNS requests to a specified LocalDNS, but staying compatible with the various iOS and Android versions is costly.
  • The cost of pushing users to change their configuration is extremely high: asking users to change their DNS configuration manually is barely feasible on PCs and on mobile clients under Wi-Fi, and it is very difficult for users to modify the DNS configuration in a mobile Internet environment.

HTTPDNS

To adopt this kind of solution, you first need an accurate IP address library to determine where each user belongs, then a protocol and a service to perform scheduling, and finally changes to the access layer to apply the scheduling. Like the two schemes above, this is not impossible, but the cost is relatively high, especially for a company with a very large business volume. This is where the current mainstream solution, HTTPDNS, comes in.

HTTPDNS uses HTTP to talk to the DNS service instead of the traditional UDP-based DNS protocol. This bypasses the carrier’s LocalDNS, prevents domain name hijacking, and improves resolution efficiency. In addition, the HTTPDNS server sees the real client IP address rather than the LocalDNS IP address, so it can accurately determine the client’s location and carrier and schedule traffic more precisely.
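A client-side lookup over HTTP might look roughly like the sketch below. Everything service-specific here is an assumption, not any provider’s actual API: the server address `203.0.113.1` (a documentation-only IP), the `/d` path, the `dn` parameter, and the `;`-separated response format. The point is only that the request goes to a fixed IP over HTTP, so the carrier’s LocalDNS is never consulted.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical HTTPDNS server, addressed by IP so no DNS lookup is needed.
HTTPDNS_SERVER = "203.0.113.1"


def build_httpdns_url(domain):
    """Build the query URL; the path and the 'dn' parameter are assumptions."""
    return f"http://{HTTPDNS_SERVER}/d?{urlencode({'dn': domain})}"


def parse_httpdns_body(body):
    """Parse an assumed response format: IP addresses separated by ';'."""
    return [ip.strip() for ip in body.split(";") if ip.strip()]


def http_get(url, timeout=2.0):
    """Plain HTTP GET returning the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("ascii")


def resolve_via_httpdns(domain, fetch=http_get):
    """Return a list of IPs, or [] so the caller can fall back to LocalDNS."""
    try:
        return parse_httpdns_body(fetch(build_httpdns_url(domain)))
    except OSError:  # covers URLError, timeouts and connection errors
        return []
```

Returning an empty list on failure leaves the fallback decision to the caller, for example resolving through the system DNS when the HTTPDNS service is unreachable.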

DNS -> edge node -> central equipment room. When traffic is switched over in a multi-active setup, modify the routing on the edge node by configuring layer-7 forwarding.

HTTPS -> HTTP (token) -> net/rpc, gRPC

HTTPDNS sends an HTTP request directly to the HTTPDNS server by IP address to obtain the A record of the service domain, and never asks the local carrier to resolve the domain, so hijacking is avoided. It also brings two further benefits:

  • Lower average access latency: the client connects directly by IP, so the time spent on domain resolution is saved.
  • Lower connection failure rate: an algorithm lowers the ranking of servers with a high failure rate, and the ranking is further improved using recently accessed data and historical records of successful access.
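The idea of ranking servers by their failure history can be sketched as follows. The scoring rule (a smoothed success rate) and the data layout are assumptions chosen for illustration; the source does not specify the actual algorithm.

```python
def rank_servers(stats):
    """Sort server IPs best-first by smoothed success rate.

    stats maps ip -> (successes, failures). Adding 1 success and 1 failure
    (Laplace smoothing) places servers without history at 0.5 instead of
    at either extreme.
    """
    def score(ip):
        ok, bad = stats[ip]
        return (ok + 1) / (ok + bad + 2)

    return sorted(stats, key=score, reverse=True)


history = {
    "192.0.2.1": (98, 2),    # mostly successful
    "192.0.2.2": (40, 60),   # high failure rate
    "192.0.2.3": (0, 0),     # no history yet
}
print(rank_servers(history))  # ['192.0.2.1', '192.0.2.3', '192.0.2.2']
```

A client would try the addresses in this order, so a server that keeps failing drifts to the end of the list without being removed outright.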

  • Radical cure for domain name resolution anomalies: because the carrier’s LocalDNS is bypassed, resolution requests are sent directly over HTTP to the IP address of the HTTPDNS server, so resolution on the client is no longer affected by LocalDNS anomalies.
  • Precise scheduling: HTTPDNS obtains the user’s IP address directly and, combined with an IP address library and a speed measurement system, can guide the user to the fastest IDC node.
  • Low implementation cost: adopting HTTPDNS requires only a small change to the client access layer, with no need to root or jailbreak the phone. Because HTTP requests are simple to construct, compatibility with every version of the mobile operating systems is not a problem. In addition, the HTTPDNS back end fully reuses the existing authoritative DNS configuration, so the management cost is very low.
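Precise scheduling from the real client IP can be sketched as a lookup in an IP address library followed by a choice of the fastest IDC. The networks, regions, carrier names, IDC names, and latencies below are all invented for illustration.

```python
import ipaddress

# Hypothetical IP library: network -> (region, carrier).
IP_LIBRARY = [
    (ipaddress.ip_network("198.51.100.0/25"), ("north", "carrier-a")),
    (ipaddress.ip_network("198.51.100.128/25"), ("south", "carrier-b")),
]

# Hypothetical speed measurements: (region, carrier) -> {idc: latency in ms}.
LATENCY_MS = {
    ("north", "carrier-a"): {"idc-beijing": 12, "idc-guangzhou": 45},
    ("south", "carrier-b"): {"idc-beijing": 48, "idc-guangzhou": 9},
}

DEFAULT_IDC = "idc-beijing"


def pick_idc(client_ip):
    """Return the IDC with the lowest measured latency for this client."""
    addr = ipaddress.ip_address(client_ip)
    for network, key in IP_LIBRARY:
        if addr in network:
            latencies = LATENCY_MS[key]
            return min(latencies, key=latencies.get)
    return DEFAULT_IDC  # client not found in the library
```

With traditional DNS the same lookup would be keyed on the LocalDNS address, which is exactly what goes wrong when requests are forwarded or NATed through another carrier.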

If there is only one VIP, you can increase the TTL of the DNS records to reduce resolution latency. Anycast lets one IP address, announced through BGP, route traffic to the nearest group of servers, but it has two problems: a node may attract too many users and become overloaded, and BGP route recalculation may reset connections. A “stable Anycast” technique is therefore needed.

CDN system architecture

User DNS request -> local DNS -> recursive DNS query -> local DNS caches the result -> user gets the best IP address and connects to the edge node -> edge node connects to the central equipment room

Caching proxy

Through intelligent DNS selection, user requests are transparently directed to the nearest provincial backbone node, minimizing the distance that user data has to travel.

Route acceleration

Access nodes (A-Nodes) are connected to trunk nodes or multi-line nodes.

Security and protection

Whether the attack is an intrusion attempt or a DDoS, most of it is aimed at the CDN, which protects the customer’s origin site.

Cost savings

CDN nodes only need to be hosted in single-line equipment rooms of local operators or in cities where bandwidth is relatively cheap, so the purchase cost is low.

Content routing

DNS system, application layer redirection, and transport layer redirection.

Content distribution

  • PUSH: active distribution. Initiated by the content management system, content is distributed from the origin to the cache nodes of the CDN.
  • PULL: passive distribution, driven by user requests. When the requested content misses in the cache, the node fetches it in real time from the origin or from other CDN nodes.
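The two distribution modes can be contrasted in a minimal sketch of a cache node. The class `EdgeNode` and the `fetch_from_origin` callback are hypothetical names; a real node would also handle expiry, eviction, and fetching from sibling nodes.

```python
class EdgeNode:
    """Toy CDN cache node supporting both PUSH and PULL distribution."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: path -> content
        self._cache = {}
        self.origin_fetches = 0          # counts cache misses served by the origin

    def push(self, path, body):
        """PUSH: the content management system pre-populates the cache."""
        self._cache[path] = body

    def get(self, path):
        """PULL: on a miss, fetch from the origin and keep a copy."""
        if path not in self._cache:
            self.origin_fetches += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]


origin = {"/a.png": b"A", "/b.png": b"B"}
node = EdgeNode(origin.__getitem__)
node.push("/a.png", origin["/a.png"])  # pushed ahead of any request
node.get("/a.png")                     # hit, no origin traffic
node.get("/b.png")                     # miss, pulled from the origin
node.get("/b.png")                     # hit
print(node.origin_fetches)             # 1
```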

Content storage

Random read, sequential write, small file distributed storage.

Content management

Improve the efficiency of content service and cache utilization of CDN.

CDN data consistency

PUSH

There are no data consistency issues.

PULL

If the cache is not updated in time, data consistency becomes a problem. You can set a cache expiration time to achieve eventual consistency. If users have high consistency requirements, the ?version=xx technique can be used; alternatively, each upload of an image can return a different URL instead of using a version number. The CDN stores copies of resources with a specified expiration time, so a cached image file may remain valid for an hour or for a month. Any resource cached on the CDN is potentially a historical version, because there is always a delay in propagating updates from the source data to the copies.
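One way to produce the version value is to derive it from the file content, so that a changed file automatically gets a new URL and the CDN treats it as a new resource. This is a sketch under that assumption; the function name and the choice of a truncated SHA-256 digest are illustrative, not taken from the source.

```python
import hashlib


def versioned_url(base_url, content):
    """Append a version derived from the content (first 8 hex chars of SHA-256)."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{base_url}?version={digest}"


old = versioned_url("https://cdn.example.com/img/logo.png", b"old image bytes")
new = versioned_url("https://cdn.example.com/img/logo.png", b"new image bytes")
# old and new differ, so the updated image never collides with a cached copy.
```

Pages that reference the image must be updated to the new URL, which is why this technique suits content whose references are generated by the application.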

Expires

Specifies the expiration time in an HTTP header (HTTP/1.0).

Cache-Control

max-age specifies the lifetime in seconds in an HTTP header (HTTP/1.1) and takes precedence over Expires.

Last-Modified / If-Modified-Since

The time the file was last modified (one-second precision, HTTP/1.0). It is used to revalidate the copy once the Cache-Control lifetime has expired.

ETag

A unique identifier for the current version of the resource on the server (the generation rule is decided by the server). It takes precedence over Last-Modified.
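The precedence rules among these headers can be sketched in two small functions. This is a simplified illustration, not a complete HTTP cache: it ignores other Cache-Control directives, weak ETags, and lists of ETags, and the function names are made up.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers, response_time):
    """Seconds the response stays fresh; max-age wins over Expires."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return int(match.group(1))
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0, int((expires - response_time).total_seconds()))
    return 0


def not_modified(request_headers, etag, last_modified):
    """Revalidation on the server: the ETag check wins over the date check."""
    if "If-None-Match" in request_headers:
        return request_headers["If-None-Match"] == etag
    if "If-Modified-Since" in request_headers:
        since = parsedate_to_datetime(request_headers["If-Modified-Since"])
        return last_modified <= since
    return False


received = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
headers = {"Cache-Control": "max-age=120", "Expires": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(freshness_lifetime(headers, received))  # 120
```

When `not_modified` returns `True`, the server can answer with 304 Not Modified and the cache keeps serving its stored copy.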

Static/dynamic CDN acceleration

CDN best practices

Multi-active system

Business classification

Services are graded according to certain criteria and the core services are selected. Multi-active is designed only for the core scenarios of the core services, which reduces overall complexity and implementation cost. For example: 1. 2. core scenarios; 3. revenue. Avoid making every business multi-active at once; proceed stage by stage and scenario by scenario.

Data classification

After the core business is selected, the data related to it must be analyzed further to identify all of the data and its characteristics, which will affect the subsequent design. Common dimensions for analyzing data characteristics are: 1. data volume; 2. uniqueness; 3. real-time requirements; 4. tolerance for loss; 5. recoverability.

Data synchronization

Once the characteristics of the data are known, different synchronization schemes can be designed for different data. Common data synchronization schemes are: 1. storage system synchronization; 2. message queue synchronization; 3. regeneration.

Exception handling

No matter how the data synchronization scheme is designed, extreme failures will always leave some data in an abnormal state, for example through synchronization delay, data loss, or data inconsistency. Exception handling decides in advance what the system will do when these problems occur. Common exception handling measures are: 1. multi-channel synchronization; 2. combined synchronous and asynchronous access; 3. logging; 4. compensation.

Multi-active does not mean making every business in the system multi-active. It is applied along different dimensions and according to importance. For our business the focus is the viewing experience (Taobao, by comparison, uses the transaction unit, with the buyer as the dimension), so the first prerequisite is that browsing and watching are multi-active. We divide resources into three categories:

  • Global resources: resources shared by multiple zones and accessed from every zone. At the Global layer, writes go to only one Core Zone. In other words, single-write + multi-read with data replication (one-way from the write zone) achieves eventual consistency.
  • Multi Zone resources: deployed in shards across multiple zones, with each zone holding part of the shard data. For example, user A may be in ZoneA and user B in ZoneB. Multi-write + multi-read, using data replication (bidirectional between write zones).
  • Single Zone resources: services deployed in a single equipment room only.
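How writes might be directed for the three resource categories can be sketched as below. The zone names, the CRC32-based shard assignment, and the function names are assumptions for illustration; the source does not describe the actual routing code.

```python
import zlib

# Hypothetical zones. CORE_ZONE is the only writer for Global resources,
# and SINGLE_ZONE hosts the services that exist in one place only.
ZONES = ["zone-a", "zone-b"]
CORE_ZONE = "zone-a"
SINGLE_ZONE = "zone-a"


def shard_zone(shard_key):
    """Map a shard key (for example a user id) to its home zone."""
    return ZONES[zlib.crc32(shard_key.encode("utf-8")) % len(ZONES)]


def write_target(category, shard_key=None):
    """Zone that accepts a write for the given resource category."""
    if category == "global":
        return CORE_ZONE               # single write, replicated one way
    if category == "multi":
        return shard_zone(shard_key)   # each shard is written in its home zone
    if category == "single":
        return SINGLE_ZONE             # deployed in one place only
    raise ValueError(f"unknown category: {category}")
```

Reads of Global resources can be served from whichever zone the user is in, since the data is replicated to every zone and only eventual consistency is promised.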

The core requirements are: the PC/APP home page can be viewed, the video details page can be opened, and accounts can log in and be authenticated. We believe the Global resource strategy suits our viewing business best, while the community features (comments, bullet comments) may adopt the Multi Zone strategy.

Ele.me multi-active

The business process has three key roles: the user, the merchant, and the rider. An order goes through three steps:

  • When users open our app, the system recommends a variety of food near the user’s location. The recommendation order combines user habits, recommendation ranking, and merchant promotions. The user finds food they like, places an order, and pays, and the order goes to the merchant.
  • The merchant accepts the order and starts preparing the food. When the food is ready, the system dispatches a rider to the store to pick it up.
  • The rider delivers the food to the user at the delivery address.

Business cohesion

The ordering process of a single order must be completed within one equipment room and must not make calls across equipment rooms.

  • This principle ensures real-time performance: only by not depending on services in another equipment room can delay be avoided.
  • We call each equipment room an ezone; an ezone contains all the services that Ele.me needs.
  • Business can be kept cohesive within one ezone, so that the user, merchant, and rider involved in an order are all in the same equipment room. The order then moves between the roles as fast as possible, without delays caused by abnormal conditions.
  • Fortunately, our business is regional, so business cohesion can be achieved through a reasonable division into regions.

Availability first

  • When an equipment room fails over, availability comes first: users must still be able to order meals. Data inconsistency is tolerated for a limited period and repaired afterwards.
  • Each ezone holds the full set of service data, so when one ezone fails, other ezones can take over its users.
  • Orders placed in one ezone are replicated to the other ezones in real time.

Ensure data correctness

While availability is prioritized, data must still be protected from errors. If the status of an order is found to be inconsistent between two equipment rooms during a switchover or failure, the order is locked against changes to keep the data correct.
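The locking rule can be sketched as a small guard that compares the status of an order across ezones. The class and method names are hypothetical, and a real system would persist the lock and drive the repair from its state machine.

```python
class OrderGuard:
    """Locks orders whose status differs between ezones until they are repaired."""

    def __init__(self):
        self.locked = set()

    def check(self, order_id, status_by_ezone):
        """Lock the order if the ezones disagree; return True when consistent."""
        if len(set(status_by_ezone.values())) > 1:
            self.locked.add(order_id)
            return False
        return True

    def can_update(self, order_id):
        """Updates are refused while the order is locked."""
        return order_id not in self.locked

    def repair(self, order_id):
        """Release the lock once the data has been reconciled."""
        self.locked.discard(order_id)


guard = OrderGuard()
guard.check("order-1", {"ezone-1": "paid", "ezone-2": "paid"})     # consistent
guard.check("order-2", {"ezone-1": "paid", "ezone-2": "created"})  # locked
```

Locking only the affected orders keeps the rest of the system available, which matches the availability-first principle stated earlier.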

Business awareness

Because the infrastructure is not yet strong enough to hide the differences between equipment rooms, the services must be aware of the multi-active logic and the service code must be modified: it must identify which ezone a piece of service data belongs to, process only the data of the local ezone, and filter out irrelevant data. The business state machine must also be improved so that it can detect and correct data inconsistencies.

To achieve business cohesion, we first need to choose a sharding key to partition services so that users, merchants, and riders converge correctly into the same ezone. The partitioning scheme is the foundation of the whole multi-active design and determines all the logic that follows.

Given the characteristics of Ele.me’s business, we naturally chose geographical location as the dimension for dividing the business. Geographic fences are drawn along provincial boundaries and fine-tuned locally, and users, merchants, and riders that are geographically close are placed in the same ezone. The fulfillment of an order can then be completed within one equipment room with minimal delay. If one equipment room has problems, its users, merchants, and riders can be moved as a group to another equipment room by geographical location.
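Geographic partitioning and the move of a whole region to another equipment room can be sketched as follows. The fences are simplified to bounding boxes with invented coordinates and names; real fences follow provincial boundaries and would be polygons.

```python
# Hypothetical fences as bounding boxes: (min_lat, max_lat, min_lng, max_lng).
FENCES = {
    "shard-north": (35.0, 45.0, 110.0, 125.0),
    "shard-south": (20.0, 35.0, 105.0, 122.0),
}

# Which ezone currently serves each shard; changed on failover.
shard_to_ezone = {"shard-north": "ezone-1", "shard-south": "ezone-2"}


def shard_for(lat, lng):
    """Return the shard whose fence contains the point, or None."""
    for shard, (min_lat, max_lat, min_lng, max_lng) in FENCES.items():
        if min_lat <= lat < max_lat and min_lng <= lng < max_lng:
            return shard
    return None


def ezone_for(lat, lng):
    """Users, merchants and riders at nearby locations land in the same ezone."""
    return shard_to_ezone.get(shard_for(lat, lng))


def fail_over(failed_ezone, target_ezone):
    """Move every shard served by the failed ezone to the target ezone."""
    for shard, ezone in shard_to_ezone.items():
        if ezone == failed_ezone:
            shard_to_ezone[shard] = target_ezone
```

Because routing goes through the shard, a failover only rewrites the shard-to-ezone table; the fences themselves do not change.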

Based on the geographical partitioning rules, we developed a unified API Router layer. This layer routes API calls from clients and directs the traffic to the correct ezone. The API Router is deployed in multiple public cloud equipment rooms, so users reach a nearby API Router in the public cloud, which improves access quality.

The most basic routing tag is the geographical location. Given the location, the API Router (AR) can compute the correct shard. However, the business is complex and not every call can be tied directly to a geographical location, so we use a hierarchical routing scheme. The core routing logic is geographical location, but other high-level sharding keys are also supported; the API Router converts these into the core sharding key, as shown in the following figure. This reduces the work of transforming the business and allows more partitioning methods to be added. In addition to inbound routing, we also developed an SOA Proxy that routes SOA calls using the same routing rules as the API Router.
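Hierarchical routing can be sketched as a conversion step followed by a table lookup. The key names (`shard`, `shop_id`, `order_id`), the lookup tables, and the default ezone are assumptions for illustration, and the core key is taken to be a shard id already derived from the geographical location.

```python
# Hypothetical lookup tables kept by the router.
SHOP_TO_SHARD = {"shop-42": "shard-north"}
ORDER_TO_SHOP = {"order-1001": "shop-42"}
SHARD_TO_EZONE = {"shard-north": "ezone-1", "shard-south": "ezone-2"}


def to_core_shard(request):
    """Convert whichever sharding key the request carries into a core shard id."""
    if "shard" in request:                    # core key already present
        return request["shard"]
    if "shop_id" in request:                  # high-level key: shop -> shard
        return SHOP_TO_SHARD.get(request["shop_id"])
    if "order_id" in request:                 # high-level key: order -> shop -> shard
        shop_id = ORDER_TO_SHOP.get(request["order_id"])
        return SHOP_TO_SHARD.get(shop_id)
    return None


def route(request, default_ezone="ezone-1"):
    """Pick the ezone for a request; unknown keys go to the default ezone."""
    return SHARD_TO_EZONE.get(to_core_shard(request), default_ezone)
```

Supporting a new high-level key only requires one more conversion branch, while the shard-to-ezone mapping stays the single source of truth for both inbound and SOA routing.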

Alibaba multi-active

Suning multi-active

Facebook Memcache consistency
