TL;DR: We propose Meta-Quantization (MQ), a bi-level-optimization-based VQ training framework inspired by meta-learning.
Abstract: Deep neural networks incorporating discrete latent variables have shown significant potential in sequence modeling.
A notable approach is to leverage vector quantization (VQ) to generate discrete representations within a codebook.
However, quantization is non-differentiable, which blocks standard backpropagation and makes efficient codebook training difficult.
In this work, we introduce **Meta-Quantization (MQ)**, a novel vector quantization training framework inspired by meta-learning.
Our method separates the optimization of the codebook and the auto-encoder into two levels.
Furthermore, we introduce a hyper-net to replace the embedding-parameterized codebook, enabling the codebook to be generated dynamically from the auto-encoder's feedback.
Unlike previous VQ objectives, ours yields a meta-objective that makes codebook training task-aware.
We validate the effectiveness of MQ with the VQVAE and VQGAN architectures on image reconstruction and generation tasks.
Experimental results showcase the superior generative performance of MQ, underscoring its potential as a robust alternative to existing VQ methods.
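For intuition, here is a minimal sketch of what one such bi-level training step could look like in PyTorch. The `HyperCodebook` module, the `quantize` helper, and the alternating inner/outer updates are illustrative assumptions for exposition, not the authors' exact method; in particular, the paper's meta-objective may differentiate through the inner update rather than simply alternate as shown here. The released code linked below is authoritative.

```python
# Illustrative sketch of an MQ-style bi-level step (assumed names, simplified).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperCodebook(nn.Module):
    """Hyper-net that generates the codebook instead of storing it as embeddings."""
    def __init__(self, num_codes=512, code_dim=64, hidden=256):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(num_codes, hidden))  # per-code seeds
        self.net = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, code_dim),
        )

    def forward(self):
        # Regenerates the (num_codes, code_dim) codebook on every call.
        return self.net(self.seeds)

def quantize(z, codebook):
    """Nearest-code lookup; returns quantized latents and their indices."""
    flat = z.reshape(-1, z.size(-1))
    idx = torch.cdist(flat, codebook).argmin(dim=-1)
    return codebook[idx].reshape(z.shape), idx

def mq_step(x, encoder, decoder, hyper, opt_ae, opt_hyper):
    # Inner level: train the auto-encoder under the generated, frozen codebook,
    # using the straight-through estimator to pass gradients to the encoder.
    with torch.no_grad():
        codebook = hyper()
    z = encoder(x)
    zq, _ = quantize(z, codebook)
    zq_ste = z + (zq - z).detach()  # straight-through gradient to the encoder
    inner_loss = F.mse_loss(decoder(zq_ste), x) + F.mse_loss(z, zq.detach())
    opt_ae.zero_grad(); inner_loss.backward(); opt_ae.step()

    # Outer level: update the hyper-net from the auto-encoder's feedback.
    # Gradients reach the hyper-net through the selected codebook entries
    # (no straight-through here), making codebook training task-aware.
    zq, _ = quantize(encoder(x).detach(), hyper())
    outer_loss = F.mse_loss(decoder(zq), x)
    opt_hyper.zero_grad(); outer_loss.backward(); opt_hyper.step()
    return inner_loss.item(), outer_loss.item()
```

Because the outer loss backpropagates through the generated codebook entries into the hyper-net, the codebook is optimized against the auto-encoder's actual task loss rather than a fixed nearest-neighbor objective, which is the intuition behind the meta-objective described above.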
Lay Summary: When computers learn to create images, they often use a "dictionary" of visual patterns, but building an optimal dictionary is challenging. Often, only a few patterns are used, and the dictionary isn't perfectly tuned for the specific task, like generating realistic faces, which can limit the quality of the generated images. Our research introduces "Meta-Quantization," a smarter, two-level approach to building this visual dictionary. A special "hyper-network" first generates the dictionary, and then the main image-processing system learns using it. Afterwards, the hyper-network refines the dictionary based on how well the main system performed, making the dictionary more effective and specifically tailored to the task. This method results in more efficient visual dictionaries, leading to higher-quality image generation and reconstruction. Computers can make better use of all learned visual patterns and can learn faster. Our approach provides a more robust way to train artificial intelligence models for these visual tasks, ultimately improving their overall performance.
Link To Code: https://github.com/t2ance/MQVAE
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Vector Quantization, Meta-Learning, Generative Models
Submission Number: 4652