TVMamba: Towards Efficient Visual Mamba With Ternary Weights and Activations

02 Sept 2025 (modified: 12 Nov 2025), ICLR 2026 Conference Withdrawn Submission, Readers: Everyone, License: CC BY 4.0
Keywords: Ternary quantization, Visual Mamba, Quinary-to-ternary annealing, Frequency-aware routing, Edge inference
Abstract: Visual Mambas based on state space models have recently emerged as strong vision backbones. To improve efficiency on resource-constrained devices, many studies explore quantization to represent weights and activations at low precision. However, most state-of-the-art methods reach ternary weights but keep activations at 8 bits or higher, which limits practical efficiency gains. We present TVMamba, to our knowledge the first approach that achieves both ternary weights and ternary activations. Our analyses show that uneven channel distributions make ternary activations difficult, leading to unstable optimization and amplified spectral distortions. To address this, TVMamba introduces two components: (1) a staged codebook that trains with five-level activations for stability and collapses to ternary at deployment, and (2) a lightweight quantization-aware frequency routing module that preserves high-frequency detail while maintaining the low-pass behavior of the SSM core. Empirically, on two mainstream Visual Mamba backbones, VMamba and Vim, our method delivers competitive accuracy. Wall-clock measurements across devices show that matrix multiplication at various sizes is accelerated by 17× to 87× with joint ternarization of weights and activations.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1006
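
The abstract does not state the quantizer, the thresholds, or the rule by which the five-level codebook collapses to ternary, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the authors' method. It assumes a magnitude-threshold ternarizer (with a hypothetical `delta_ratio` of 0.7, a commonly used heuristic), a uniform five-level quantizer, and a collapse that simply clips five-level codes to {-1, 0, +1}. All function and parameter names are illustrative.

```python
import numpy as np

def ternarize(x, delta_ratio=0.7):
    """Map x to scale * {-1, 0, +1} using a magnitude threshold (assumed rule)."""
    x = np.asarray(x, dtype=np.float64)
    delta = delta_ratio * np.mean(np.abs(x))          # assumed threshold heuristic
    codes = np.where(np.abs(x) > delta, np.sign(x), 0.0).astype(np.int8)
    mask = codes != 0
    # Scale is the mean magnitude of the entries that survive the threshold.
    scale = float(np.abs(x[mask]).mean()) if mask.any() else 0.0
    return codes, scale

def quinarize(x, step):
    """Map x to step * {-2, -1, 0, +1, +2} by rounding, then clipping."""
    scaled = np.asarray(x, dtype=np.float64) / step
    return np.clip(np.rint(scaled), -2, 2).astype(np.int8)

def collapse_to_ternary(quinary_codes):
    """Hypothetical collapse: clip five-level codes to {-1, 0, +1}."""
    return np.clip(quinary_codes, -1, 1).astype(np.int8)

def ternary_matmul(a_codes, a_scale, w_codes, w_scale):
    """Integer matmul on ternary codes, rescaled once at the end."""
    acc = a_codes.astype(np.int32) @ w_codes.astype(np.int32)
    return acc * (a_scale * w_scale)
```

Because both operands in `ternary_matmul` hold only -1, 0, and +1, every product is a sign flip or a skip, which is the property that dedicated kernels can exploit. This NumPy version only illustrates the arithmetic and says nothing about the speedups reported above.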