Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs

Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Guoliang He, Xupeng Miao, Ana Klimovic, Bin Cui, Binhang Yuan, Eiko Yoneki

15 Jan 2026 · ICML 2025 · CC BY-SA 4.0