Abstract: Modern neural networks often rely on over-parameterized architectures to ensure stability and accuracy, but in many real-world scenarios, such as the Internet of Things and edge devices, large models are difficult to deploy due to computational and memory limitations. Although compression techniques exist, they are rarely integrated into a service-oriented architecture that dynamically adapts AI models for heterogeneous devices. In this work, we propose a cloud continuum framework for AI model optimization as a service, in which edge devices can send neural networks to the cloud, where they are automatically compressed and returned in a lightweight version, ready for local execution. At the core of this process is ImproveNet, a method that structurally reduces the size of a neural network during training without compromising its ability to solve the original task. Starting from a standard-sized network, the system monitors performance during training and, once accuracy requirements are met, applies channel reduction and internal layer elimination, progressively simplifying the architecture. The resulting model is returned to the device, enabling AI-on-the-continuum deployment and execution.
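The training-time compression loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `ToyNet`, `improvenet_loop`, the layer-width representation, and the reduction policy (drop the narrowest internal layer first, then scale channel counts) are all assumptions made for illustration; the `train_epoch` and `evaluate` callbacks stand in for a real training and validation step.

```python
class ToyNet:
    """Stand-in for a neural network: `layers` is a list of channel counts."""

    def __init__(self, layers):
        self.layers = list(layers)

    def shrink_channels(self, factor):
        # Channel reduction: scale the width of every layer.
        self.layers = [max(1, int(c * factor)) for c in self.layers]

    def drop_layer(self):
        # Internal layer elimination: remove the narrowest internal layer,
        # keeping the input and output layers intact.
        if len(self.layers) > 2:
            inner = self.layers[1:-1]
            del self.layers[1 + inner.index(min(inner))]


def improvenet_loop(model, train_epoch, evaluate, target_acc,
                    channel_factor=0.75, epochs=20):
    """Train `model` and progressively simplify it whenever the
    accuracy requirement (`target_acc`) is met after an epoch."""
    for _ in range(epochs):
        train_epoch(model)
        if evaluate(model) >= target_acc:
            if len(model.layers) > 3:
                model.drop_layer()          # prefer removing whole layers
            else:
                model.shrink_channels(channel_factor)  # then thin channels
    return model
```

A real implementation would operate on actual convolutional or linear layers and would typically fine-tune after each reduction step; the sketch only captures the control flow of "train, check accuracy, simplify" that the abstract outlines.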
External IDs: dblp:conf/icsoc/PuglisiMNM25