Keywords: Large Language Model Serving, Efficient Serving Systems, Decentralized LLM Serving, Distributed LLMs
TL;DR: We propose WWW.Serve, a fully decentralized framework for trustless and collaborative LLM serving, which improves efficiency and scalability and reduces latency while preserving privacy.
Abstract: Today's large language model (LLM) services remain mostly centralized, limiting both scalability and privacy. Decentralization could address these limitations, but it introduces challenges in trustless coordination, fair scheduling, and efficiency. To address these challenges, we propose WWW.Serve, a decentralized framework for interconnecting LLM servers worldwide. It preserves service providers' anonymity and privacy while supporting self-organizing request dispatch, dynamic workload balancing, and autonomous control over resources and policies. WWW.Serve integrates three key designs: a blockchain-inspired credit system for trustless collaboration, gossip-driven peer synchronization for flexible participation, and a duel-and-judge mechanism for robust contributor evaluation. Empirically, WWW.Serve improves global SLO attainment by up to $1.5\times$ and lowers latency by 27.6\%. Its performance approaches, and in some cases surpasses, that of centralized scheduling, while preserving the benefits of decentralization. These results highlight WWW.Serve as a promising foundation for trustless and collaborative LLM serving.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 27