Keywords: Distributed AI, Peer-to-Peer Networks, LLM Inference, Model Sharding, Ring Pipeline
TL;DR: "HyperCluster" is a framework for decentralized collaborative inference, allowing large language models (LLMs) to run across multiple resource-constrained wireless devices.
Abstract: The substantial memory and computational requirements of large language models (LLMs) hinder their deployment on individual resource-constrained wireless devices. This paper introduces HyperCluster, a framework for fully decentralized collaborative inference over wireless networks. Our system comprises two core innovations. First, a ring-based pipelined inference protocol in which nodes deterministically self-organize into a computational ring based on device capabilities, passing intermediate states directly between peers by leveraging content-addressed networking and document synchronization primitives. Second, a generalizable model sharding methodology built on the Hugging Face Transformers library, which automatically partitions any dense LLM across heterogeneous devices according to their available compute and memory. We provide a practical validation of this architecture, demonstrating successful distributed inference of models of up to a billion parameters on a network of consumer-grade devices.
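To make the two innovations in the abstract concrete, the following is a minimal sketch of how deterministic ring formation and capability-proportional layer sharding could work. All names (`Peer`, `form_ring`, `shard_layers`) and the sorting/partitioning rules are illustrative assumptions, not the authors' actual API: every node is assumed to sort the same peer list by a shared key so all nodes agree on the ring without coordination, and transformer layers are split in proportion to each peer's available memory.

```python
# Hypothetical sketch of HyperCluster-style ring formation and layer sharding.
# All identifiers and heuristics here are assumptions for illustration only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    peer_id: str      # stable identifier (e.g., a content-addressed node ID)
    memory_gb: float  # available memory
    flops: float      # relative compute score


def form_ring(peers):
    """Deterministically order peers into a computational ring.

    Each node sorts the same peer list by (compute, peer_id), so every
    node derives an identical ring without any central coordinator.
    """
    return sorted(peers, key=lambda p: (-p.flops, p.peer_id))


def shard_layers(ring, n_layers):
    """Partition n_layers contiguous transformer layers across the ring,
    proportionally to each peer's available memory."""
    total_mem = sum(p.memory_gb for p in ring)
    shards, start = {}, 0
    for i, p in enumerate(ring):
        if i == len(ring) - 1:
            count = n_layers - start  # last peer absorbs rounding remainder
        else:
            count = round(n_layers * p.memory_gb / total_mem)
        shards[p.peer_id] = (start, start + count)  # half-open layer range
        start += count
    return shards
```

During inference, each peer would then load only its assigned layer range and forward intermediate hidden states to its ring successor, which is the essence of the pipelined protocol described above.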
Submission Number: 31