Abstract: Transformer-based LLMs achieve strong results but demand substantial computational and memory resources. We propose a hybrid quantum-classical approach that embeds variational quantum circuits into transformers for model compression. By replacing portions of the feed-forward and attention sub-layers with compact quantum modules, we reduce the parameter count while largely preserving perplexity. Theoretical analysis shows that these quantum circuits can approximate large transformations with fewer parameters, and experiments on LLaMA and Qwen confirm memory savings and faster inference. We also discuss quantum hardware feasibility and GPU-based simulation. Overall, our method offers a promising avenue for deploying LLMs in resource-constrained environments.
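To make the abstract's core idea concrete, the sketch below shows one plausible way a feed-forward sub-layer could be swapped for a compact variational quantum circuit module. This is only an illustration under assumptions not stated in the abstract: the module name `QuantumFFN`, the down-/up-projection layers, the choice of 4 qubits, and the use of PennyLane's `AngleEmbedding`, `BasicEntanglerLayers`, and `TorchLayer` are all hypothetical design choices, not the authors' actual architecture.

```python
# Minimal sketch (not the paper's implementation): a variational quantum
# circuit standing in for a transformer feed-forward sub-layer.
import pennylane as qml
import torch
import torch.nn as nn

n_qubits = 4  # assumed circuit width; the paper does not specify this
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    # Angle-encode the compressed classical features into qubit rotations.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    # Trainable entangling layers: the variational (parameterized) part.
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    # Read out one Pauli-Z expectation value per qubit.
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

class QuantumFFN(nn.Module):
    """Hypothetical drop-in replacement for a feed-forward sub-layer."""
    def __init__(self, d_model: int, n_layers: int = 2):
        super().__init__()
        # Compress the hidden state into a handful of rotation angles.
        self.down = nn.Linear(d_model, n_qubits)
        weight_shapes = {"weights": (n_layers, n_qubits)}
        self.q_layer = qml.qnn.TorchLayer(circuit, weight_shapes)
        # Expand the measured expectation values back to model width.
        self.up = nn.Linear(n_qubits, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        b, s, d = x.shape
        z = torch.tanh(self.down(x)).reshape(b * s, n_qubits)
        q = self.q_layer(z)              # (b * s, n_qubits) expectation values
        return self.up(q.float()).reshape(b, s, d)
```

The intended parameter saving is that the quantum block carries only `n_layers * n_qubits` trainable angles plus two small projections, in place of the two dense `d_model x 4*d_model` matrices of a standard transformer FFN; whether this matches the paper's actual layer replacement is an assumption.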
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Quantum neural network
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 6834