Unlocking Credit Loop Deadlocks

Alexander Shpiner; Eitan Zahavi; Vladimir Zdornov; Tal Anker; Matty Kadosh

Published: 01 Jan 2016, Last Modified: 07 Aug 2024. Venue: HotNets 2016. License: CC BY-SA 4.0

Abstract: The recently emerging Converged Enhanced Ethernet (CEE) data center networks rely on layer-2 flow control to support packet-loss-sensitive transport protocols, such as RDMA and FCoE. Although lossless networks were proven to improve end-to-end network performance, without careful design and operation they might suffer from in-network deadlocks caused by cyclic buffer dependencies. These dependencies are called credit loops. Although existing credit loops rarely deadlock, when they do they can block large parts of the network. Naive solutions recover from credit loop deadlock by draining buffers and dropping packets. Previous works suggested credit-loop avoidance by central routing algorithms, but these assume specific topologies and are slow to react to failures. In this paper we present a distributed algorithm that detects credit loop deadlock, assures traffic progress, and recovers from the deadlock, for arbitrary network topologies and routing protocols. The algorithm can be implemented over commodity switch hardware, requires negligible additional control bandwidth, and avoids packet loss after the deadlock occurs. We introduce two flavors of the algorithm and discuss their trade-offs. We define a simple scenario in which credit loop deadlock is assured to occur, and use it to test and analyze the algorithm. In addition, we provide simulation results over a 3-level fat-tree network. Last, we describe our prototype implementation over a commodity data center switch.
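To make the notion of a credit loop concrete, the following minimal Python sketch models buffer dependencies as a directed graph and searches it for a cycle. It is an offline, centralized illustration of what a cyclic buffer dependency is, not the distributed algorithm presented in the paper. The function name `find_credit_loop`, the dictionary-based graph model, and the buffer names are assumptions made for this example.

```python
from typing import Dict, List, Optional


def find_credit_loop(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a buffer dependency graph, or None if it is acyclic.

    deps[b] lists the buffers that buffer b must wait on before it can
    forward packets (a hypothetical model, not taken from the paper).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in deps.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path: a cyclic dependency.
                return path[path.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found is not None:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in deps:
        if color.get(start, WHITE) == WHITE:
            loop = visit(start)
            if loop is not None:
                return loop
    return None


# Four buffers in a ring, each waiting on the next: a credit loop.
ring = {"s1": ["s2"], "s2": ["s3"], "s3": ["s4"], "s4": ["s1"]}
# The same buffers with tree-like dependencies: no loop.
tree = {"s1": ["s2", "s3"], "s2": ["s4"], "s3": ["s4"], "s4": []}

print(find_credit_loop(ring))
print(find_credit_loop(tree))
```

A cycle in this graph only indicates that a deadlock is possible. As the abstract notes, existing credit loops rarely deadlock, since that requires the buffers along the loop to fill up at the same time.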
