Leveraging Pruning, Quantization, and Multi-objective Optimization for Efficient Deployment of Multi-modal Models
Abstract: The rise of large-scale machine learning models such as GPT, LLaMA, and DALL-E has revolutionized the development of AI systems but poses challenges for real-world deployment, particularly in edge computing environments with limited resources. This paper presents a comprehensive benchmark for simulating edge computing with multi-modal AI models, focusing on optimizing these complex architectures for the Artificial Intelligence of Things (AIoT). We address key challenges such as model size reduction, computational resource constraints, and federated learning environments. Specifically, we propose methods including model pruning, quantization, and reinforcement learning-based optimization for adaptive offloading to enhance performance while maintaining accuracy. By applying Vision Transformer pruning and quantization, we demonstrate how to scale down large multi-modal networks for efficient deployment on edge devices. Our experiments show significant improvements in computational efficiency and reductions in training time, making these models more feasible for real-world AIoT applications.
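To make the two compression steps named in the abstract concrete, the following is a minimal sketch, assuming a PyTorch setting: unstructured magnitude pruning of linear layers followed by post-training dynamic quantization. The toy encoder layer, the 40% sparsity target, and the int8 dtype are illustrative assumptions, not the paper's reported configuration or results.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for a single Vision Transformer encoder layer
# (d_model, nhead, and dim_feedforward are illustrative choices).
model = nn.TransformerEncoderLayer(d_model=192, nhead=3, dim_feedforward=768)

# Step 1: unstructured magnitude pruning -- zero out the 40% of weights
# with the smallest L1 magnitude in every linear projection
# (the sparsity target is an assumed value for illustration).
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")  # bake the mask into the weights

# Step 2: post-training dynamic quantization -- store linear weights as
# int8 and quantize activations on the fly, shrinking the model and
# speeding up CPU inference on resource-constrained edge devices.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check: the compressed layer accepts the same input shape.
x = torch.randn(16, 4, 192)  # (sequence, batch, embedding)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # torch.Size([16, 4, 192])
```

In this kind of pipeline, pruning reduces the number of effective parameters while quantization reduces the storage and compute cost per parameter; the two are complementary, which is why they are typically applied in sequence before deployment.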