RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference

George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta

Published: 2025, Last Modified: 27 May 2026, CoRR 2025, CC BY-SA 4.0