1. What is a data center
A data center is a facility that houses computer systems and related components, such as telecommunications and storage systems. Whether in the Internet sector or in traditional industries, companies run their own data centers, large or small, to meet business requirements and keep data stable and reliable; cloud computing companies such as Alibaba Cloud and Amazon, whose business is renting out computing resources, have built data centers all over the world. Even in today's era of cloud computing, in which a data center's resources are virtualized to achieve higher utilization, one thing is certain: physical resources set the ceiling for virtual resources. Physical network characteristics, such as bandwidth, MTU, and latency, directly or indirectly determine the characteristics of the virtual network built on top. When optimizing network performance, some physical properties can be improved by upgrading devices or lines, but others are tied to the network architecture itself, and the risks and costs of upgrading or modifying a network architecture are enormous. The selection and design of the network architecture therefore deserve special care when building a data center. So what transitions has the data center gone through, from the traditional data center of the past to the data center of today's cloud computing era?
2. Traditional data center network architecture
In traditional large data centers, the network is typically three-tiered; Cisco calls this the Hierarchical Inter-Networking Model. It is a hierarchical architecture with three levels: the core layer (the high-speed switching backbone of the network), the aggregation layer (policy-based connectivity), and the access layer (connecting workstations and servers to the network). The model is as follows:
- Access Layer: Access switches are usually located at the top of the rack, so they are also referred to as ToR (Top of Rack) switches; they physically connect to the servers.
- Aggregation Layer: Aggregation switches connect to the access switches and provide additional services such as firewalling, SSL offload, intrusion detection, and network analysis.
- Core Layer: Core switches provide high-speed forwarding of packets into and out of the data center, connect multiple aggregation layers, and typically provide a resilient L3 routed network for the entire data center.
A diagram of the three-tier network architecture is shown below:
Generally, the aggregation switch is the demarcation point between L2 and L3: below it is the L2 network, above it is the L3 network. Each group of aggregation switches manages a Point of Delivery (POD), and each POD is an independent VLAN network. Migrating a server within a POD does not require changing its IP address or default gateway, because one POD corresponds to one L2 broadcast domain.
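As a minimal illustration, assuming a hypothetical rack-to-POD layout and invented subnets, the following Python sketch captures the migration rule: a move inside one POD (one broadcast domain) keeps the VM's address, while a cross-POD move would require re-addressing:

```python
# Hypothetical model: each POD is one L2 broadcast domain with its own subnet.
import ipaddress

# Invented example layout: racks grouped into PODs, one subnet per POD.
RACK_TO_POD = {"rack1": "podA", "rack2": "podA", "rack3": "podB"}
POD_SUBNET = {
    "podA": ipaddress.ip_network("10.0.1.0/24"),
    "podB": ipaddress.ip_network("10.0.2.0/24"),
}

def migration_keeps_ip(src_rack: str, dst_rack: str) -> bool:
    """A VM keeps its IP address and default gateway only if the source and
    destination racks sit in the same POD (same L2 broadcast domain)."""
    return RACK_TO_POD[src_rack] == RACK_TO_POD[dst_rack]

print(migration_keeps_ip("rack1", "rack2"))  # True: both racks are in podA
print(migration_keeps_ip("rack1", "rack3"))  # False: podA -> podB needs a new IP
```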
Aggregation switches and access switches are connected redundantly, with Spanning Tree Protocol (STP) managing the redundant links. For a given VLAN, STP leaves only one aggregation switch forwarding; the others take over only in the event of a failure (dotted lines in the figure above). In other words, the aggregation layer is an active-passive HA design. As a result, the aggregation layer cannot scale horizontally: even if more aggregation switches are added, only one is working at any time. Some proprietary protocols, such as Cisco's Virtual Port Channel (vPC), can improve switch utilization at the aggregation layer by making a pair of switches active simultaneously, but vPC is proprietary and still cannot scale horizontally beyond the pair (the figure below shows an aggregation layer that serves as the L2/L3 boundary and adopts the vPC network architecture).
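As a rough back-of-the-envelope sketch (the 40 Gbps capacity is a made-up number, and the model ignores real-world details such as per-VLAN load sharing), the following Python snippet shows why adding aggregation switches under STP does not add usable bandwidth, while a vPC pair doubles it but stops there:

```python
# Hypothetical per-switch capacity, in Gbps, for illustration only.
SWITCH_CAPACITY_GBPS = 40

def usable_aggregation_bandwidth(num_switches: int, mode: str) -> int:
    """Usable aggregation-layer bandwidth for one VLAN.

    - 'stp': active-passive; STP blocks all but one switch, so capacity
      stays flat no matter how many switches are installed.
    - 'vpc': active-active pair; both members forward, but vPC does not
      scale beyond two switches per pair.
    """
    if mode == "stp":
        return SWITCH_CAPACITY_GBPS          # only one switch forwards
    if mode == "vpc":
        return SWITCH_CAPACITY_GBPS * min(num_switches, 2)
    raise ValueError(mode)

for n in (1, 2, 4):
    print(n, usable_aggregation_bandwidth(n, "stp"),
          usable_aggregation_bandwidth(n, "vpc"))
# STP stays at 40 Gbps regardless of n; vPC tops out at 80 Gbps.
```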
STP is a very important protocol in Layer 2 networks, and it exists because Layer 2 carries an inherent tension between reliability and safety:
- Reliability calls for device redundancy and link redundancy in the Layer 2 network.
- Safety requires preventing loops: all switches in a Layer 2 network share one broadcast domain, so a broadcast packet can circulate around a loop indefinitely and trigger a broadcast storm.
STP (Spanning Tree Protocol) reconciles the two automatically: redundant devices and links act as backups, kept blocked under normal conditions and unblocked when an active link or device fails.
Due to convergence performance, an STP network generally does not exceed about 100 switches. STP's blocking mechanism also leaves Layer 2 links underutilized, especially when the network devices are fully meshed. As shown in the figure below, with an STP Layer 2 design, STP blocks most of the links, reducing the usable bandwidth between the access and aggregation layers to 1/4 and between the aggregation and core layers to 1/8. The closer a switch is to the root, the more congested its ports and the more bandwidth resources are wasted.
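As a minimal sketch of why blocking wastes links, assume a small invented topology of two aggregation switches fully meshed with four access switches. The Kruskal-style tree selection below is illustrative only; real STP elects the tree from bridge and port priorities, but the outcome is the same: a tree keeps exactly N-1 links forwarding and blocks the rest.

```python
# Minimal illustration of STP-style blocking on an invented full mesh
# between 2 aggregation switches and 4 access switches.
AGGREGATION = ["agg1", "agg2"]
ACCESS = ["acc1", "acc2", "acc3", "acc4"]
LINKS = [(a, b) for a in AGGREGATION for b in ACCESS] + [("agg1", "agg2")]

def spanning_tree(nodes, links):
    """Pick any spanning tree with union-find; unpicked links are 'blocked'."""
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n
    forwarding = []
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:             # link joins two components: keep it forwarding
            parent[ra] = rb
            forwarding.append((a, b))
    return forwarding

tree = spanning_tree(AGGREGATION + ACCESS, LINKS)
blocked = [l for l in LINKS if l not in tree]
print(f"forwarding {len(tree)} of {len(LINKS)} links; {len(blocked)} blocked")
# 6 switches -> 5 forwarding links; the other 4 redundant links sit idle.
```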
3. The impact of cloud computing on data centers
With the data explosion brought by the growth of the Internet and the development of virtualization technology, computing resources have been pooled, which poses new challenges to data centers: dynamic migration and high performance. With a large Layer 2 network architecture, the entire data center network can be one L2 broadcast domain, making dynamic migration possible. In the large Layer 2 architecture, the L2/L3 boundary moves up to the core switch: everything below the core switch, that is, the entire data center, is an L2 network (it can of course contain multiple VLANs, with the core switch routing between them). The large Layer 2 network architecture is shown below:
Compared with the previous infrastructure, it has the following characteristics:
- Resource pooling: virtualization technology consolidates the hardware servers' resources into a pool of computing resources.
- Unified management: VMs are created on a virtualization platform and services are deployed on the VMs, so all VMs can be maintained and managed uniformly on that platform.
- Horizontal scaling: when computing resources run short, hardware servers can simply be added to expand the pool.
However, the disadvantages of the traditional large Layer 2 design are also obvious. BUM (Broadcast, Unknown-unicast, Multicast) storms in the shared L2 broadcast domain grow with the scale of the network and crowd out normal traffic. At the same time, VMs can be migrated, but how can the migration stay invisible to users, with IP addresses unchanged? That is the problem of dynamic migration. The development of cloud computing relies not only on virtualization but also on virtualization management platforms such as OpenStack, which use nothing more than x86 servers and Layer 2 switches to virtualize all network, compute, storage, and security functions. All the functions of traditional data center hardware are realized in the form of virtual machines, and all the components are integrated into a single virtualization management platform that provides virtual storage, network, compute, and other resources externally. This is the so-called "hyper-converged" platform.
4. Challenges brought by richer data center traffic
The Internet has grown very fast in recent years, and Internet companies are in essence data companies: data carries most of a company's value, so data security and reliability are becoming more and more important. Early, small-scale data centers were dominated by north-south traffic, while the data center virtualization driven by the Internet's explosive data growth demands ever more east-west traffic, and even cross-data-center traffic.
- North-south traffic: traffic between clients outside the data center and servers inside it, or traffic between data center servers and the Internet.
- East-west traffic: Traffic between servers in a data center.
- Cross-data-center traffic: traffic between data centers, such as disaster recovery replication and communication between private and public clouds.
Cisco's analysis predicted that by 2020, east-west traffic would account for 77% of total data center traffic, cross-data-center traffic for 9%, and north-south traffic for only 14%. The traditional three-tier network architecture was designed mainly for north-south traffic; it supports east-west traffic as well, but its shortcomings there are obvious. East-west traffic is either L2 or L3. For L2 traffic, if the source and destination hosts sit under the same access switch, forwarding completes at the access switch and runs at full speed; if the traffic crosses racks but stays within one aggregation-layer POD, it must be forwarded through the aggregation switches, and the bandwidth depends on their forwarding rate. L3 traffic must be forwarded through the core switches, which not only consumes precious core-switch resources but also adds the latency of multiple forwarding hops.
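To make this taxonomy concrete, here is a small Python sketch, assuming an invented rack/POD/VLAN layout, that classifies east-west traffic by the highest switch tier it must cross:

```python
# Hypothetical layout: which rack each server is in, which POD each rack
# belongs to, and which L2 subnet (VLAN) each server uses.
RACK = {"web1": "rack1", "web2": "rack1", "db1": "rack2", "cache1": "rack5"}
POD = {"rack1": "podA", "rack2": "podA", "rack5": "podB"}
SUBNET = {"web1": "vlan10", "web2": "vlan10", "db1": "vlan10", "cache1": "vlan20"}

def east_west_path(src: str, dst: str) -> str:
    """Classify east-west traffic by the highest switch tier it crosses."""
    if SUBNET[src] != SUBNET[dst] or POD[RACK[src]] != POD[RACK[dst]]:
        # L3 traffic (or L2 across PODs): must be routed by the core layer.
        return "core"
    if RACK[src] == RACK[dst]:
        return "access"        # same ToR switch: full line rate
    return "aggregation"       # cross-rack but same POD

print(east_west_path("web1", "web2"))    # access
print(east_west_path("web1", "db1"))     # aggregation
print(east_west_path("web1", "cache1"))  # core
```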
In the large Layer 2 network architecture, both L2 and L3 traffic must pass through the core switches, which places new demands on core switch performance.
5. Summary
The traditional three-tier network architecture has existed for decades and is still used in some data centers today, mainly because of cost. For one thing, early L3 routing devices were much more expensive than L2 bridging devices, and even now core switches cost more than aggregation- and access-layer devices. For another, most traffic in early data centers was north-south; for example, a web application deployed on a server is used by clients outside the data center. With this architecture, the core switches can uniformly control data flowing in and out, and load balancers can be attached there to balance the traffic.