CORN: Cloud-optimized RDMA Networking

Published: 01 Jan 2023, Last Modified: 16 May 2025. Venue: IPCCC 2023. License: CC BY-SA 4.0
Abstract: Remote Direct Memory Access (RDMA) offers high bandwidth, low latency, and low CPU utilization, which has made it a mainstream interconnect for cloud-based High-Performance Computing (HPC) services. However, existing RDMA technologies, including InfiniBand and RoCEv2, are limited in their compatibility with legacy networks, their scalability in large deployments, and their cost efficiency. To address these challenges, we propose Cloud-optimized RDMA Networking (CORN). CORN features a cloud-optimized congestion control scheme that uses the Bandwidth Delay Product (BDP) and the number of in-flight packets to determine how much traffic to transmit. This scheme significantly reduces the likelihood of packet loss caused by buffer overflows on network switches. CORN also uses traditional Selective ACK (SACK) to recover from packet drops caused by network congestion or hardware faults. Consequently, CORN can support lossy RDMA networks over Ethernet. Both features are designed to operate without any modification or configuration of the network switches: CORN functions as a shim layer between UDP and RDMA and runs solely on the end hosts, which allows it to be deployed seamlessly. An implementation in ns-3 shows that CORN is feasible and more efficient than congestion control schemes such as DCQCN, TIMELY, and HPCC.
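The abstract does not give CORN's exact window rule. To make the described idea concrete, the following minimal Python sketch caps transmission at the BDP minus the bytes already in flight and uses SACK information to infer which packets were dropped. The class name `BdpWindowSender`, the fixed 1000-byte packet size, the assumption that the bottleneck rate and base RTT are known, and the simplified loss rule (any unacknowledged PSN below the highest SACKed PSN is treated as lost, with no duplicate threshold) are all hypothetical choices made for illustration. This is not CORN's actual algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class BdpWindowSender:
    """Hypothetical sender that never keeps more than one BDP of bytes in flight."""
    link_gbps: int            # bottleneck bandwidth in Gbit/s (assumed known)
    base_rtt_ns: int          # RTT without queuing delay, in ns (assumed known)
    pkt_bytes: int = 1000     # payload bytes per packet (assumed fixed)
    next_psn: int = 0         # next packet sequence number (PSN) to send
    inflight: set[int] = field(default_factory=set)  # PSNs sent, not yet acked

    def bdp_bytes(self) -> int:
        # Gbit/s * ns = bits, so BDP in bytes is (Gbps * ns) / 8 (integer math).
        return self.link_gbps * self.base_rtt_ns // 8

    def sendable_packets(self) -> int:
        # Budget left under the BDP after subtracting bytes already in flight.
        budget = self.bdp_bytes() - len(self.inflight) * self.pkt_bytes
        return max(0, budget // self.pkt_bytes)

    def send_burst(self) -> list[int]:
        # Send as many new packets as the BDP budget allows.
        n = self.sendable_packets()
        psns = list(range(self.next_psn, self.next_psn + n))
        self.inflight.update(psns)
        self.next_psn += n
        return psns

    def on_sack(self, cum_ack: int, sack_blocks: list[tuple[int, int]]) -> list[int]:
        """cum_ack: every PSN < cum_ack was received.
        sack_blocks: [start, end) PSN ranges received above cum_ack.
        Returns PSNs inferred lost (holes below the highest SACKed PSN);
        they stay in flight until a retransmission is acknowledged."""
        received = {p for p in self.inflight if p < cum_ack}
        for start, end in sack_blocks:
            received.update(p for p in self.inflight if start <= p < end)
        self.inflight -= received
        highest = max((end for _, end in sack_blocks), default=cum_ack)
        return sorted(p for p in self.inflight if p < highest)


if __name__ == "__main__":
    # 100 Gbit/s with an 8 us base RTT gives a BDP of 100,000 bytes (100 packets).
    s = BdpWindowSender(link_gbps=100, base_rtt_ns=8000)
    print(len(s.send_burst()))                         # 100 packets sent
    print(s.send_burst())                              # [] (window is full)
    print(s.on_sack(cum_ack=10, sack_blocks=[(12, 20)]))  # [10, 11] inferred lost
    print(s.sendable_packets())                        # 18 (18 packets acked)
```

Because the window is bounded by the BDP rather than grown until loss, the sender in this sketch never intentionally queues more than one BDP in the network, which is the property the abstract credits with reducing switch buffer overflows; SACK then covers the drops that still occur.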