For more articles, follow the author's WeChat official account: Code worker notes
Background
Between 23:50 on October 4 and 05:20 on October 5, 2021, Beijing time (15:50 to 21:20 UTC on October 4), all Facebook websites and applications, including Facebook, Instagram, and WhatsApp, were down.
After several hours of work restored the systems, Facebook and Cloudflare each published an analysis of the outage, one from the inside and one from the outside.
This is a good opportunity for us to learn/review the basics of the Internet.
The cause of the accident
According to FB's official engineering blog [1], the direct causes of the outage were as follows:
- An engineer ran a script to assess the available capacity of FB's internal backbone network. It disconnected every device on the backbone, so the servers in FB's data centers were cut off from each other and from the external Internet.
- FB's own DNS servers are maintained independently and have fixed IP addresses. They advertise their IP prefixes through BGP so that external networks know how to reach them and can send requests to them. Each DNS server also contains fault-detection logic: if it cannot reach FB's data centers, it stops advertising its prefixes over BGP.
  When all data centers were disconnected, the DNS servers found that they could not reach any of them and concluded that something was wrong with their own network. They therefore withdrew the BGP advertisements for their IP prefixes, and the outside world could no longer reach FB's DNS servers.
- Because DNS was down, some of FB's internal diagnostic tools could not be used, so engineers had to be sent into the data centers to fix the problem on site, which prolonged the outage further.
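The fault-detection behaviour described in the list above can be pictured with a tiny model. This is only a sketch under assumptions: the `DnsNode` class, its `health_check` method, and the `192.0.2.0/24` prefix are invented for illustration and say nothing about how FB's real software is written.

```python
from dataclasses import dataclass


@dataclass
class DnsNode:
    """Hypothetical model of one DNS site that announces its prefix via BGP."""

    prefix: str               # IP prefix this node announces
    advertising: bool = True  # whether the prefix is currently announced

    def health_check(self, reachable_datacenters: int) -> str:
        """Withdraw the prefix when no data center is reachable, else announce it."""
        if reachable_datacenters == 0:
            self.advertising = False
            return f"WITHDRAW {self.prefix}"
        self.advertising = True
        return f"ANNOUNCE {self.prefix}"


# 192.0.2.0/24 is a documentation prefix, not a real FB range.
node = DnsNode(prefix="192.0.2.0/24")
print(node.health_check(reachable_datacenters=3))  # backbone healthy
print(node.health_check(reachable_datacenters=0))  # backbone down
```

The point of the sketch is that the check cannot tell "my own link is broken" apart from "every data center is gone", so a backbone-wide failure makes every DNS node withdraw at once.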
Next, we will study and review DNS and BGP.
DNS
DNS (the Domain Name System) translates domain names into IP addresses. You can use the dig command to query information about a domain name.
The following command queries the domain name information of facebook.com. The returned fields are explained in the comments (//):
$ dig facebook.com +noall +answer +stat
; <<>> DiG 9.10.6 <<>> facebook.com +noall +answer +stat
;; global options: +cmd
// The fields are, in order:
// NAME TTL (seconds) CLASS (IN = Internet) TYPE VALUE
facebook.com. 45 IN A 162.125.2.6
// Note: TYPE A (Address) means the value is an IP address.
;; Query time: 5 msec
// 53 is the UDP port number used by DNS
;; SERVER: 192.168.3.1#53(192.168.3.1)
;; WHEN: Tue Oct 12 01:50:58 CST 2021
;; MSG SIZE rcvd: 46
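To make the field layout of an answer line concrete, here is a minimal sketch that splits one line into its five fields. The `ResourceRecord` type and `parse_answer_line` function are names made up for this example, and the sample record uses a documentation address rather than real output.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    name: str      # domain name, with trailing dot
    ttl: int       # time to live, in seconds
    rr_class: str  # "IN" means Internet
    rr_type: str   # e.g. "A" or "NS"
    value: str     # record data, e.g. an IP address


def parse_answer_line(line: str) -> ResourceRecord:
    """Split one answer line of dig output into its five fields."""
    name, ttl, rr_class, rr_type, value = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rr_class, rr_type, value)


# Illustrative record; 192.0.2.1 is a documentation address.
record = parse_answer_line("example.com. 45 IN A 192.0.2.1")
print(record.rr_type, record.ttl, record.value)
```

Splitting at most four times keeps the value intact even when it contains spaces, as some record types do.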
By default, dig uses the DNS servers configured in /etc/resolv.conf:
$ cat /etc/resolv.conf
nameserver fe80::960e:6bff:fea9:479a
nameserver 192.168.3.1
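As a rough illustration of how a tool might pick its default server from this file, the sketch below extracts the nameserver entries from resolv.conf-style text. `parse_nameservers` is a hypothetical helper written for this article, and the sample text is made up; real resolvers also honour other options in the file.

```python
def parse_nameservers(resolv_conf_text: str) -> list[str]:
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        fields = raw_line.split()
        # Comment lines start with "#" or ";" and so never match "nameserver".
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Illustrative file contents, not copied from a real machine.
sample = """\
# generated by the network manager
nameserver 192.168.3.1
nameserver 8.8.8.8
"""
print(parse_nameservers(sample))
```

The order matters because resolvers normally try the first listed server before falling back to the next.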
You can also specify the DNS server explicitly on the command line:
// use 8.8.8.8 as the DNS server
$ dig @8.8.8.8 facebook.com
However, a server such as 8.8.8.8 (Google) does not itself hold the mapping between every domain name and its IP address. The records for hundreds of millions of domain names are distributed across a huge number of DNS servers around the world. We can use dig ns to see which servers maintain the records for facebook.com:
$ dig facebook.com ns +noall +answer
; <<>> DiG 9.10.6 <<>> facebook.com ns +noall +answer
;; global options: +cmd
//NS stands for Name Server, and the last field stands for DNS Server address
facebook.com. 103522 IN NS a.ns.facebook.com.
facebook.com. 103522 IN NS b.ns.facebook.com.
facebook.com. 103522 IN NS c.ns.facebook.com.
facebook.com. 103522 IN NS d.ns.facebook.com.
As shown above, dig by default asks the DNS server 192.168.3.1 for the records of facebook.com, while the records themselves are kept on a.ns.facebook.com and its sibling servers. So how does 192.168.3.1 find a.ns.facebook.com?
We can use dig's +trace option to print the details of the resolution process:
$ dig +trace facebook.com
The output is shown in segments below:
; <<>> DiG 9.10.6 <<>> +trace facebook.com
;; global options: +cmd
. 275678 IN NS f.root-servers.net.
. 275678 IN NS d.root-servers.net.
. 275678 IN NS e.root-servers.net.
. 275678 IN NS b.root-servers.net.
. 275678 IN NS c.root-servers.net.
. 275678 IN NS i.root-servers.net.
. 275678 IN NS l.root-servers.net.
. 275678 IN NS j.root-servers.net.
. 275678 IN NS m.root-servers.net.
. 275678 IN NS k.root-servers.net.
. 275678 IN NS a.root-servers.net.
. 275678 IN NS h.root-servers.net.
. 275678 IN NS g.root-servers.net.
;; Received 228 bytes from 192.168.1.1#53(192.168.1.1) in 7 ms
Note that 192.168.1.1 is the China Unicom router on the current LAN, which keeps a record of the 13 root (".") name servers.
Observant readers will notice that resolv.conf specifies 192.168.3.1, yet the reply came from 192.168.1.1. This is because I have two routers here: the Wi-Fi my computer uses is provided by a Huawei router (192.168.3.1), which is connected to the China Unicom router (192.168.1.1), and the Huawei router's configuration page sets its DNS server to 192.168.1.1.
The next step is to ask the 13 root servers for the .com top-level domain servers, using whichever root server replies first.
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 86400 IN DS 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com. 86400 IN RRSIG DS 8 1 86400 20211022050000 20211009040000 14748 . fcKx2jK2VQRHTjWXC3GXgRMnnDDdOFse96oeGZzPK6nrNc5iGCsUs7kB t4uKF03f5cepLSHEl+BfzhLNk/RiMlm5yR85NuiktsusWrmYMfwIqcOO UAZesk6HfVMxpk4Wl7bkT7gqWA9B4dwTjorzSJWHHaxm6PL6tBqUbD2p mFVARK7R5l4qyIXDtxtpXUCSCS6gRE8MKhxNRv11GwUU8DZju+KH9s+B BXCLoX9H12p/iemkvpU9VPCZUSmaLjZdCbS0TEEWoXofGI0lkOYAF1mt oj420RygKS9kJSBud/U9jbUPa67z0rVrfAMEZdKpLpOFRvgnp1iAmJ13 JkUFYw==
;; Received 1172 bytes from 199.7.83.42#53(l.root-servers.net) in 7 ms
Here l.root-servers.net replied first, returning the names of the 13 .com top-level domain servers.
Next, we ask these .com top-level domain servers for the name servers of the second-level domain facebook.com:
facebook.com. 172800 IN NS a.ns.facebook.com.
facebook.com. 172800 IN NS b.ns.facebook.com.
facebook.com. 172800 IN NS c.ns.facebook.com.
facebook.com. 172800 IN NS d.ns.facebook.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20211015042328 20211008031328 39343 com. H1oBuRZXju7c+fi2/am00pD0N4j8e+g/Q+qDd5/NHtSA+OhdRG2BcmXk m1cjsA50akyzYQmKzfAE/1msrzkiULmh23LjZ3P53/mLUBtDIVOXkN2F 5nuSGSpv7Ngvn6vGsHXukdZpG97b9lSgqnv1FgRPWedEVjOdD0FMJBbc XikWESWI5Ue0vBj3oxfS24gww4tLXSOHjtLjGapaphWZ1g==
I28G6CI6H7LILO18C929Q5HCLS95D2FC.com. 86400 IN NSEC3 1 1 0 - I28GTDKPVCUQGJKUK7EP5QM6TTI5TO3A NS DS RRSIG
I28G6CI6H7LILO18C929Q5HCLS95D2FC.com. 86400 IN RRSIG NSEC3 8 2 86400 20211014044832 20211007033832 39343 com. D3APaNt+ZOxCHupj+tyNPTUrtS+Kq1d2tI2ZJiI9R7WREYO+I56zzvhh zNkttftMmUekNVCGHILNww/ekuabwLcRKyg7Bs/YowpcXr3LgB0UZDVB bj4GtUMC8s52nuPU6pIH6iHgZ04cq9E3MCCep9H9RbX3dYmUXDLVBWcJ DoAY1vEtdCZKVKjYvGr3dNYF0JpUiKkqBvtTJwpA34W1lA==
;; Received 833 bytes from 2001:503:d2d::30#53(k.gtld-servers.net) in 179 ms
The first to reply was k.gtld-servers.net, which returned Facebook's four DNS servers:
- a.ns.facebook.com
- b.ns.facebook.com
- c.ns.facebook.com
- d.ns.facebook.com
Also note that the TTL of the facebook.com NS records is 172800 seconds, or 48 hours (we will use this later).
Finally, the A record (that is, the IP address) is requested from these four DNS servers:
facebook.com. 105 IN A 157.240.2.50
;; Received 46 bytes from 2001::67fc:720b#53(c.ns.facebook.com) in 11 ms
The first to reply was c.ns.facebook.com, which gave 157.240.2.50 as the IP address of facebook.com.
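The walk from the root servers down to Facebook's own name servers can be mimicked offline with a toy delegation table. This is a sketch, not a resolver: the `ZONES` table, its server labels, and the `resolve` function are invented for illustration, and only the final A record is taken from the trace in this article.

```python
# Toy delegation data modelled on the trace; server labels are hypothetical.
ZONES = {
    "root": {"referrals": {"com.": "tld-com"}},
    "tld-com": {"referrals": {"facebook.com.": "ns-facebook"}},
    "ns-facebook": {"answers": {"facebook.com.": "157.240.2.50"}},
}


def resolve(name: str, server: str = "root") -> tuple[str, list[str]]:
    """Follow referrals from the root until a server answers for `name`."""
    visited = []
    while True:
        visited.append(server)
        zone = ZONES[server]
        if name in zone.get("answers", {}):
            return zone["answers"][name], visited
        # Pick the referral whose zone is a suffix of the queried name.
        next_server = next(
            (target for suffix, target in zone.get("referrals", {}).items()
             if name.endswith(suffix)),
            None,
        )
        if next_server is None:
            raise LookupError(f"{server} has no delegation for {name}")
        server = next_server


address, path = resolve("facebook.com.")
print(address, path)
```

Each hop only knows who to ask next, which is why losing the last hop (the authoritative servers) makes the whole lookup fail even though the root and .com servers are healthy.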
Note: during the outage, all four DNS servers had withdrawn their BGP advertisements and could not be reached, so this last step of the dig command would time out and fail. Cloudflare engineers got the following error (SERVFAIL) when they ran dig:
➜ ~ dig @1.1.1.1 facebook.com
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31322
;facebook.com. IN A
➜ ~ dig @8.8.8.8 facebook.com
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 31322
;facebook.com. IN A
DNS cache
To reduce the load on the root servers, and because the mapping between domain names and IP addresses changes rarely, every DNS server may cache the records of domain names it has previously looked up. With respect to a particular domain name, DNS servers therefore fall into two categories:
- Authoritative name server: the server responsible for maintaining the authoritative records of the domain (its replies carry the AA flag)
- Caching name server: only caches the domain's records
If nothing along the way has the record cached, a local or recursive lookup ends up fetching it from an authoritative DNS server. If a cached copy exists and the record on the authoritative server has since been updated, the lookup may return stale data. To get the latest data, you can query the authoritative server directly with @ followed by its address; otherwise you have to wait for the TTL of the cached record to expire.
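The caching behaviour described above comes down to storing each record together with an expiry time. The sketch below is a minimal, assumed design (the `TtlCache` class and its injectable `clock` parameter are illustrative, not taken from any DNS server) and uses a fake clock so the expiry can be observed without waiting.

```python
import time


class TtlCache:
    """Minimal record cache that forgets entries once their TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expiry timestamp)

    def put(self, name, value, ttl_seconds):
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller must ask upstream again
            return None
        return value


# A fake clock makes the expiry visible immediately.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("facebook.com.", "157.240.2.50", ttl_seconds=105)
now[0] = 100.0
print(cache.get("facebook.com."))  # still within the TTL
now[0] = 105.0
print(cache.get("facebook.com."))  # TTL elapsed, so None
```

Until the TTL runs out, the cache keeps returning the old value regardless of what the authoritative server now says, which is exactly the staleness risk mentioned above.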
Let's take a look at how amazon.com's name servers are set up:
$ dig ns amazon.com +noall +answer
; <<>> DiG 9.10.6 <<>> ns amazon.com +noall +answer
;; global options: +cmd
amazon.com. 800 IN NS pdns1.ultradns.net.
amazon.com. 800 IN NS ns4.p31.dynect.net.
amazon.com. 800 IN NS ns3.p31.dynect.net.
amazon.com. 800 IN NS ns2.p31.dynect.net.
amazon.com. 800 IN NS ns1.p31.dynect.net.
amazon.com. 800 IN NS pdns6.ultradns.co.uk.
As you can see, Amazon uses two external DNS providers to maintain its domain records, which improves stability.
Some readers may wonder why, when FB's DNS failed, FB could not simply switch to an external DNS provider to serve its domain temporarily.
The reason is that the NS records of facebook.com are held by the .com top-level domain (TLD) servers, and their TTL is usually 48 hours. Even if FB had switched to an external DNS provider on the spot, resolvers that had already cached the old NS records could take up to two days to pick up the change, so some users would still have been unable to reach Facebook during that time. That makes the switch impractical as an emergency fix.
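The 48-hour figure is simply the NS record TTL converted from seconds, and it bounds how long a resolver may keep using a cached delegation. The short calculation below illustrates this; the `cache_expiry` helper and the caching time are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta, timezone

NS_TTL_SECONDS = 172800  # TTL of the facebook.com NS records seen in the trace


def cache_expiry(cached_at: datetime, ttl_seconds: int) -> datetime:
    """Return the moment a record cached at `cached_at` stops being usable."""
    return cached_at + timedelta(seconds=ttl_seconds)


print(NS_TTL_SECONDS / 3600, "hours")

# Hypothetical resolver that cached the NS records shortly before the outage.
cached_at = datetime(2021, 10, 4, 15, 0, tzinfo=timezone.utc)
print(cache_expiry(cached_at, NS_TTL_SECONDS).isoformat())
```

A resolver in this position would keep sending queries to the old name servers until October 6, no matter what the TLD servers were changed to in the meantime.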
BGP
BGP has been mentioned several times. Here we introduce BGP.
To understand BGP, we first need to understand autonomous systems (AS).
Autonomous System (AS)
An Autonomous System is an IP network that has the same routing policy under the jurisdiction of one entity.
- Each AS in a BGP network is assigned a unique AS number
- 2-byte AS numbers range from 1 to 65535 and 4-byte AS numbers from 1 to 4294967295; 4-byte AS numbers are compatible with 2-byte ones
- For example, Facebook.com: AS32934
- Some websites let you look up an AS number by domain name [8]
Generally speaking, a large company may apply for one or more AS numbers of its own.
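As a quick check of the two ranges, the sketch below reports whether an AS number fits in 2 bytes or needs 4. `asn_width` is a made-up helper for this article; 199524 is used only because it is an example of a number above 65535.

```python
MAX_2_BYTE_ASN = 65535       # 2**16 - 1
MAX_4_BYTE_ASN = 4294967295  # 2**32 - 1


def asn_width(asn: int) -> int:
    """Return 2 if the AS number fits the 2-byte range, 4 if it needs 4 bytes."""
    if 1 <= asn <= MAX_2_BYTE_ASN:
        return 2
    if MAX_2_BYTE_ASN < asn <= MAX_4_BYTE_ASN:
        return 4
    raise ValueError(f"{asn} is outside the AS number ranges given above")


print(asn_width(32934))   # Facebook's AS number
print(asn_width(199524))  # an AS number above 65535
```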
Routing protocols are needed for the nodes of a network to reach one another. By scope, they fall into two types:
- IGP (Interior Gateway Protocol): routing protocols used within an AS, such as RIP, OSPF, and IS-IS
- EGP (Exterior Gateway Protocol): routing protocols used between ASes, mainly BGP
BGP protocol
The Border Gateway Protocol (BGP) is an inter-AS routing protocol that runs over TCP. It is used to exchange routing information between ASes.
BGP features:
- TCP (port 179) is the transport protocol. BGP routers establish sessions over TCP, so BGP peers do not need to be directly connected
- A router running BGP is called a BGP speaker. To exchange BGP routes, two BGP routers must establish a peer relationship, of which there are two types:
  - EBGP: BGP running between ASes
  - IBGP: BGP running between routers inside one AS
- Once the peer relationship is established, only incremental or triggered updates are sent, not periodic updates
- BGP has rich path attributes and powerful policy tools
- BGP can carry a large number of route prefixes and is used in large-scale networks
Take FB as an example. It has millions of servers that form a huge internal network (an AS) with its own routing management. When a client application accesses a www.facebook.com server inside FB through IP address 128.242.240.20, the IP packets are first sent through the user's ISP access point to a public network router. That router has to apply its routing algorithm to find the shortest path to FB's internal server. Because FB is internally an autonomous system, the "last mile" of routing can only be completed by FB's own routers: the public network only needs to deliver the packets to an FB router, and the FB router takes care of the final internal routing.
So how does a public network router find its way to an FB router?
BGP is designed for exactly this purpose. Using BGP, FB's BGP routers send UPDATE messages to their peers (public network routers), telling them which IP prefixes FB is responsible for. Those peers pass the information on to their own peers, and so on, until every public network router knows one or more paths (AS_PATH) to FB. When a public network router receives a packet destined for one of these prefixes, it forwards the packet along the best path towards FB's routers.
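The hop-by-hop spread of an announcement can be simulated on a small graph. This is a deliberately simplified sketch: the `PEERS` topology is invented (64500 is a documentation AS number), `propagate` is a hypothetical function, and each AS simply keeps the shortest AS_PATH it hears, whereas real BGP applies policy and many more tie-breakers.

```python
from collections import deque

# Hypothetical peering graph: AS number -> directly connected peers.
PEERS = {
    32934: [6939, 1299],
    6939: [32934, 64500],
    1299: [32934, 3356],
    3356: [1299, 64500],
    64500: [6939, 3356],
}


def propagate(origin: int) -> dict[int, list[int]]:
    """Flood an announcement from `origin`; return the AS_PATH each AS keeps."""
    best = {origin: []}  # the origin reaches its own prefix with an empty path
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        advertised = [sender] + best[sender]  # sender prepends its own AS number
        for receiver in PEERS[sender]:
            if receiver in advertised:
                continue  # loop prevention: ignore paths that contain our own AS
            if receiver not in best or len(advertised) < len(best[receiver]):
                best[receiver] = advertised
                queue.append(receiver)
    return best


for asn, path in propagate(32934).items():
    print(asn, path)
```

Reading a stored path from left to right gives the sequence of ASes a packet would cross on its way to the origin, which is the same shape as the AS paths shown by real routers.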
We can use telnet to log in to a public router such as route-views.isc.routeviews.org and look at actual BGP routing information [12]:
$ telnet route-views.isc.routeviews.org
Trying 149.20.4.24...
Connected to route-views.isc.routeviews.org.
Escape character is '^]'.
Hello, this is FRRouting (version 7.3-RV).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
route-views.isc.routeviews.org> show ip bgp 31.13.80.36
BGP routing table entry for 31.13.80.0/24
Paths: (6 available, best #4, table default)
  Not advertised to any peer
  30286 3356 1299 32934
    198.32.176.142 from 198.32.176.142 (10.2.1.2)
      Origin IGP, valid, external
      Community: 3356:3 3356:22 3356:86 3356:575 3356:666 3356:901 3356:2011
      Last update: Sun Sep 26 12:53:14 2021
  199524 1299 32934
    198.32.176.226 from 198.32.176.226 (10.255.65.68)
      Origin IGP, valid, external
      Last update: Thu Jul 22 07:09:01 2021
  19151 174 32934
    198.32.176.164 from 198.32.176.164 (66.186.193.17)
      Origin IGP, metric 0 valid, external
      Last update: Fri Aug 6 07:29:28 2021
  6939 32934
    198.32.176.20 from 198.32.176.20 (216.218.252.165)
      Origin IGP, valid, external, best (Older Path)
      Last update: Tue Jun 29 12:08:27 2021
  7575 2914 174 32934
    198.32.176.177 from 198.32.176.177 (202.158.215.120)
      Origin IGP, valid, external
      Community: 7575:1003 7575:2520 7575:6003
      Last update: Tue Jun 29 14:29:41 2021
  36351 32934
    198.32.176.207 from 198.32.176.207 (173.192.18.26)
      Origin IGP, valid, external
      Community: 36351:202 65501:140 65523:200
      Last update: Sat Aug 28 10:27:27 2021
31.13.80.36 is the IP address of one of Facebook's servers. The output above shows 6 paths from the current router to Facebook's AS32934, of which the fourth is the best:
- 30286 3356 1299 32934
- 199524 1299 32934
- 19151 174 32934
- 6939 32934 (best)
- 7575 2914 174 32934
- 36351 32934
When an IP packet arrives at this router, it is forwarded along the best route until it reaches FB's AS32934.
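Why the fourth path wins can be reproduced with a much reduced version of the selection logic: prefer the shortest AS path, then the oldest route. This is an assumption-laden sketch, since the real BGP decision process checks several other attributes first; `pick_best` and `parse_last_update` are illustrative names, and the data is copied from the router output above.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# (AS path, "Last update") pairs copied from the router output above.
PATHS = [
    ([30286, 3356, 1299, 32934], "Sun Sep 26 12:53:14 2021"),
    ([199524, 1299, 32934], "Thu Jul 22 07:09:01 2021"),
    ([19151, 174, 32934], "Fri Aug 6 07:29:28 2021"),
    ([6939, 32934], "Tue Jun 29 12:08:27 2021"),
    ([7575, 2914, 174, 32934], "Tue Jun 29 14:29:41 2021"),
    ([36351, 32934], "Sat Aug 28 10:27:27 2021"),
]


def parse_last_update(text: str) -> datetime:
    """Parse a timestamp such as 'Tue Jun 29 12:08:27 2021'."""
    _, month, day, clock, year = text.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hour, minute, second)


def pick_best(paths) -> int:
    """Return the 1-based index of the shortest path, oldest first on ties."""
    def sort_key(indexed):
        _, (as_path, last_update) = indexed
        return (len(as_path), parse_last_update(last_update))

    index, _ = min(enumerate(paths, start=1), key=sort_key)
    return index


print(pick_best(PATHS))
```

Paths 4 and 6 both have two hops; path 4 was learned in June and path 6 in August, which matches the "Older Path" reason the router prints next to its choice.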
Rules for BGP route advertisement:
- When there are multiple paths to the same destination, a BGP router selects only the best one (unless load balancing is enabled)
- A BGP router advertises to its peers only the routes it uses itself (that is, the routes it considers best)
- A BGP router advertises routes learned from EBGP peers to all of its BGP peers (both EBGP and IBGP)
- A BGP router does not advertise routes learned from IBGP peers to its other IBGP peers (unless route reflectors are in use)
- Whether a BGP router advertises routes learned from IBGP peers to its EBGP peers depends on whether IGP and BGP are synchronized
- When routes change, a BGP router sends only the routes that changed
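The rules about who receives which routes can be written down as one small decision function. This is only a sketch of the rules as listed above: `may_advertise` and its parameters are hypothetical, and real routers apply additional policy on top.

```python
def may_advertise(learned_from: str, send_to: str,
                  route_reflector: bool = False,
                  synchronized: bool = True) -> bool:
    """Decide whether a best route may be passed on to a peer.

    `learned_from` and `send_to` are each either "ebgp" or "ibgp".
    """
    for kind in (learned_from, send_to):
        if kind not in ("ebgp", "ibgp"):
            raise ValueError(f"unknown peer type: {kind}")
    if learned_from == "ebgp":
        return True             # EBGP-learned routes go to every peer
    if send_to == "ibgp":
        return route_reflector  # IBGP to IBGP only through a route reflector
    return synchronized         # IBGP to EBGP depends on IGP/BGP synchronization


print(may_advertise("ebgp", "ibgp"))
print(may_advertise("ibgp", "ibgp"))
print(may_advertise("ibgp", "ibgp", route_reflector=True))
```

The IBGP-to-IBGP restriction exists because routes inside one AS do not grow their AS path, so the usual loop detection cannot catch a route circulating among internal peers.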
BGP message types:
- OPEN: the first message sent after the TCP connection is established
- UPDATE: exchanges routing information between peers (advertises reachable routes or withdraws unreachable ones)
- NOTIFICATION: sent to the peer when BGP detects an error, after which the BGP connection is closed immediately
- KEEPALIVE: keeps the peer relationship alive
- ROUTE-REFRESH: asks the peer to resend its routes for a given address family after a routing policy change
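On the wire, every BGP message starts with the same 19-byte header: a 16-byte marker of all ones, a 2-byte length, and a 1-byte type code (1 to 4 are defined in RFC 4271, and 5 for ROUTE-REFRESH in RFC 2918). The sketch below decodes that header; `parse_header` is an illustrative helper, and the sample bytes are built by hand rather than captured from a router.

```python
import struct

MESSAGE_TYPES = {
    1: "OPEN",
    2: "UPDATE",
    3: "NOTIFICATION",
    4: "KEEPALIVE",
    5: "ROUTE-REFRESH",
}
HEADER = struct.Struct("!16sHB")  # marker, total length, type code


def parse_header(data: bytes) -> tuple[int, str]:
    """Return (message length, type name) from the start of a BGP message."""
    if len(data) < HEADER.size:
        raise ValueError("a BGP header is 19 bytes long")
    marker, length, type_code = HEADER.unpack_from(data)
    if marker != b"\xff" * 16:
        raise ValueError("marker must be 16 bytes of 0xff")
    return length, MESSAGE_TYPES.get(type_code, f"UNKNOWN({type_code})")


# A KEEPALIVE consists of the header alone, so its length field is 19.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
print(parse_header(keepalive))
```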
According to Cloudflare's analysis, when the FB outage began, FB's BGP routers sent a burst of UPDATE messages, a large share of which withdrew routes [2].
Note: the BGPlay website [11] lets you view historical and real-time BGP messages for a given ASN or IP prefix.
As the BGPlay figure shows, connectivity was normal at 2021-10-04 13:51, with FB's AS32934 linked to many other ASes.
After 16:12, once the first spike had passed, AS32934 was disconnected from all other ASes, and all of Facebook's servers were unreachable.
After 22:00, the connections to other ASes were restored, and by then FB's services had fully recovered. The spikes at the onset of the problem and at recovery are highlighted in red; they mark the periods during which AS32934 sent a large number of UPDATE messages.
References
- engineering.fb.com/2021/10/05/…
- blog.cloudflare.com/october-202…
- www.thousandeyes.com/blog/facebo…
- baijiahao.baidu.com/s?id=160398…
- zhuanlan.zhihu.com/p/25433049
- www.jianshu.com/p/babca8224…
- zhuanlan.zhihu.com/p/51684918
- Query ASN by domain name: bgp.he.net
- Whois query: mip.chinaz.com/Ip/IpWhois
- www.cnblogs.com/sikewang/p/…
- BGPlay: stat.ripe.net/widget/bgpl…
- jvns.ca/blog/2021/1…