In high-end storage systems, all controllers are interconnected over RDMA networks. Controller enclosures also connect to intelligent NVMe disk enclosures and intelligent SAS disk enclosures over RDMA. The interface module performs the transfer and writes data directly into the memory of the peer node via RDMA, which greatly improves transfer efficiency and reduces access latency. The storage systems use RDMA over RoCE, which provides lower latency for reliable communication than PCIe and SAS links. The following figure compares I/O interactions over PCIe links and RoCE links.
A data transfer over either RoCE or PCIe consists of three phases: issuing the control command, transmitting the data to the peer end, and receiving the validation and response message from the peer end. In the PCIe communication model, after data is sent from controller A to controller B, the CPU of controller A notifies controller B through the control flow that the data has been delivered, which triggers an interrupt on controller B. Controller B then runs its interrupt handler to verify the message and respond. RoCE links skip this notification step: once the data has been sent successfully, controller A does not need to tell controller B about it. Instead, controller B polls for received data, processes it, and responds. Compared with PCIe, RoCE eliminates the data-arrival notification and so reduces the number of interactions, giving lower latency and higher bandwidth.
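The difference between the two models can be illustrated with a small toy simulation. This is only a sketch under stated assumptions: it does not use real RDMA or PCIe APIs, and the names `Controller`, `pcie_transfer`, `roce_transfer`, and `validate` are hypothetical, chosen for illustration. It simply records the interactions each model needs so the extra notification step becomes visible.

```python
from collections import deque


class Controller:
    """Toy model of a storage controller with a receive buffer (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rx_buffer = deque()  # memory region the peer writes into

    def validate(self, payload):
        # Stand-in for real integrity checking; an assumption for this sketch.
        return isinstance(payload, bytes) and len(payload) > 0


def pcie_transfer(sender, receiver, payload):
    """Interrupt-driven model: the sender must announce data arrival."""
    trace = [f"{sender.name}: start control command"]
    receiver.rx_buffer.append(payload)
    trace.append(f"{sender.name} -> {receiver.name}: data")
    # Extra interaction: the sender's CPU raises an interrupt on the receiver.
    trace.append(f"{sender.name} -> {receiver.name}: notify (interrupt)")
    # The receiver's interrupt handler validates the message and responds.
    ok = receiver.validate(receiver.rx_buffer.popleft())
    trace.append(f"{receiver.name} -> {sender.name}: response ok={ok}")
    return trace


def roce_transfer(sender, receiver, payload):
    """Polling model: the receiver discovers the data on its own."""
    trace = [f"{sender.name}: start control command"]
    receiver.rx_buffer.append(payload)
    trace.append(f"{sender.name} -> {receiver.name}: data (RDMA write)")
    # No notification step: the receiver's poll loop finds the new entry.
    while receiver.rx_buffer:
        ok = receiver.validate(receiver.rx_buffer.popleft())
        trace.append(f"{receiver.name} -> {sender.name}: response ok={ok}")
    return trace


if __name__ == "__main__":
    a, b = Controller("A"), Controller("B")
    print("PCIe model:")
    for step in pcie_transfer(a, b, b"block-0"):
        print("  " + step)
    print("RoCE model:")
    for step in roce_transfer(a, b, b"block-0"):
        print("  " + step)
```

In this sketch the PCIe trace contains four entries and the RoCE trace three. The only difference is the interrupt notification, which is the interaction that the RoCE model removes.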