Keywords: Distributed AI, Peer-to-Peer Networks, LLM Inference, Model Sharding, Ring Pipeline
TL;DR: "HyperCluster" is a framework for decentralized collaborative inference, allowing large language models (LLMs) to run across multiple resource-constrained wireless devices.
Abstract: The substantial memory and computational requirements of large language models (LLMs) hinder their deployment on individual resource-constrained wireless devices. This paper introduces HyperCluster, a framework for fully decentralized collaborative inference over wireless networks. Our system comprises two core innovations. First, a ring-based pipelined inference protocol in which nodes deterministically self-organize into a computational ring based on device capabilities, passing intermediate states directly between peers by leveraging content-addressed networking and document synchronization primitives. Second, a generalizable model sharding methodology built on the Hugging Face Transformers library, which automatically partitions any dense LLM across heterogeneous devices according to their available compute and memory. We provide a practical validation of this architecture, demonstrating successful distributed inference of models of up to a billion parameters on a network of consumer-grade devices.
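To make the two innovations in the abstract concrete, the following is a minimal sketch of how deterministic ring formation and capability-proportional layer sharding could work. All names (`Peer`, `form_ring`, `shard_layers`) and the sorting/partitioning rules are illustrative assumptions, not the authors' actual API: every node is assumed to sort the same peer list by a shared key so all nodes agree on the ring without coordination, and transformer layers are split in proportion to each peer's available memory.

```python
# Hypothetical sketch of HyperCluster-style ring formation and layer sharding.
# All identifiers and heuristics here are assumptions for illustration only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    peer_id: str      # stable identifier (e.g., a content-addressed node ID)
    memory_gb: float  # available memory
    flops: float      # relative compute score


def form_ring(peers):
    """Deterministically order peers into a computational ring.

    Each node sorts the same peer list by (compute, peer_id), so every
    node derives an identical ring without any central coordinator.
    """
    return sorted(peers, key=lambda p: (-p.flops, p.peer_id))


def shard_layers(ring, n_layers):
    """Partition n_layers contiguous transformer layers across the ring,
    proportionally to each peer's available memory."""
    total_mem = sum(p.memory_gb for p in ring)
    shards, start = {}, 0
    for i, p in enumerate(ring):
        if i == len(ring) - 1:
            count = n_layers - start  # last peer absorbs rounding remainder
        else:
            count = round(n_layers * p.memory_gb / total_mem)
        shards[p.peer_id] = (start, start + count)  # half-open layer range
        start += count
    return shards
```

During inference, each peer would then load only its assigned layer range and forward intermediate hidden states to its ring successor, which is the essence of the pipelined protocol described above.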
Submission Number: 31