AdaptLink: A Heterogeneity-Aware Adaptive Framework for Distributed MLLM Inference

23 Nov 2024 (modified: 22 Dec 2024) · AAAI 2025 Workshop AI4WCN Submission · CC BY 4.0
Keywords: Distributed Learning, Multimodal LLM, Edge Computing
TL;DR: AdaptLink: a framework that enables multiple edge devices to perform collaborative parallel inference on MLLMs, fully utilizing the available but heterogeneous resources.
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in tasks such as commonsense reasoning and visual scene understanding. Despite their success, deploying such models on resource-constrained edge devices remains challenging due to their substantial compute and memory requirements, while lightweight on-device deployment techniques often compromise performance, which is particularly undesirable for tasks requiring fine-grained generalization. In this paper, we propose AdaptLink, a framework that enables a set of edge devices to perform collaborative parallel inference on MLLMs, fully utilizing the available but heterogeneous resources. Unlike prior approaches that split the model uniformly and assign equal workloads to all devices, ignoring differences in their hardware and computational conditions, AdaptLink dynamically partitions the model into sub-blocks and assigns computing tasks according to device-specific capabilities, accounting for heterogeneous computational power, memory capacity, and inter-device bandwidth. AdaptLink achieves strong inference performance by ensuring parallel execution, minimizing idle time, and optimizing resource utilization. Additionally, the framework incorporates a stability assurance mechanism that pre-loads backup sub-blocks onto inactive devices, mitigating delays caused by device dropout and redeployment. Extensive experiments on LLaVA-series models demonstrate that, compared to baseline methods, AdaptLink achieves up to a $1.37\times$ speedup in inference throughput while maintaining model performance. These results underscore the potential of AdaptLink as a robust solution for deploying large-scale MLLMs in mobile edge systems, offering efficiency and adaptability in heterogeneous networks.
Submission Number: 4
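To illustrate the kind of heterogeneity-aware partitioning the abstract describes, here is a minimal sketch: transformer layers are divided among devices in proportion to each device's relative compute capability, capped by what fits in its memory. The `Device` class, the per-layer memory estimate, and the allocation rule are illustrative assumptions for this sketch, not AdaptLink's actual algorithm (which also accounts for inter-device bandwidth).

```python
# Hypothetical sketch of capability-proportional layer partitioning.
# Faster devices receive more contiguous layers; memory acts as a hard cap.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    flops: float      # relative compute capability (arbitrary units)
    memory_gb: float  # available memory on the device

def partition_layers(num_layers: int, devices: list[Device],
                     gb_per_layer: float = 0.5) -> dict[str, int]:
    """Assign each device a layer count proportional to its compute
    share, never exceeding the number of layers its memory can hold."""
    total_flops = sum(d.flops for d in devices)
    mem_cap = {d.name: int(d.memory_gb // gb_per_layer) for d in devices}
    # Proportional share, capped by memory.
    assignment = {d.name: min(int(num_layers * d.flops / total_flops),
                              mem_cap[d.name])
                  for d in devices}
    # Hand out any leftover layers to devices with memory headroom,
    # fastest first.
    leftover = num_layers - sum(assignment.values())
    for d in sorted(devices, key=lambda d: -d.flops):
        while leftover > 0 and assignment[d.name] < mem_cap[d.name]:
            assignment[d.name] += 1
            leftover -= 1
    return assignment
```

For example, splitting a 32-layer model across three devices with compute ratios 4:2:1 yields roughly an 19/9/4 layer split, whereas a uniform scheme would give each device about 11 layers and leave the fastest device idle while the slowest finishes.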