Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, but their deployment in resource-constrained environments remains challenging due to substantial memory and computational requirements. Motivated by the sparse, event-driven computation paradigm of Spiking Neural Networks (SNNs), recent research has explored spike-based language models. However, existing spike-based language models achieve only partial gains in computational efficiency and do not comprehensively address memory constraints. In this paper, we propose an evolved and quantized spike-driven language model (EQ-SpikeLM) to address both limitations. The model incorporates two primary innovations. First, inspired by the artificial bee colony algorithm from evolutionary computation, we propose an architecture evolution method, ABC-Arc, which optimizes network topology by systematically removing redundant neural pathways. Second, we develop a dynamic post-training quantization (DynPTQ) strategy for the evolved SpikeLM, which converts floating-point parameters to lower-bit precision without requiring model retraining. By combining these two methods, EQ-SpikeLM significantly reduces storage and computational demands while preserving model performance. Experimental evaluation on the GLUE benchmark shows that EQ-SpikeLM maintains performance equivalent to its uncompressed counterpart while substantially reducing both model size and power consumption. These results position EQ-SpikeLM as a viable approach for deploying large language models in resource-constrained edge computing scenarios.
External IDs: doi:10.1109/tevc.2025.3606613
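
The abstract describes post-training quantization as converting floating-point parameters to lower-bit precision without retraining. As a rough illustration of that general idea only, the sketch below applies plain symmetric per-tensor quantization to a list of weights. It is not the paper's DynPTQ strategy, whose details the abstract does not give; the function names `quantize_symmetric` and `dequantize`, the 8-bit default, and the sample weights are all assumptions made for this example.

```python
def quantize_symmetric(weights, num_bits=8):
    """Quantize floats to signed integers with one shared scale.

    Generic symmetric post-training quantization, used here purely as an
    illustration (an assumption, not the DynPTQ method from the paper).
    """
    qmax = 2 ** (num_bits - 1) - 1  # largest representable magnitude, 127 for 8 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 for an all-zero (or empty) tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point values from the integer levels."""
    return [q * scale for q in quantized]


# Hypothetical weights chosen only to make the example easy to follow.
weights = [1.27, -0.64, 0.0, 0.32]
quantized, scale = quantize_symmetric(weights)
restored = dequantize(quantized, scale)
```

No retraining is involved: the scale is computed directly from the existing weights, and each restored value differs from the original by at most half a quantization step (`scale / 2`).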