Exploiting Multiple Features for Hash Codes Learning with Semantic-Alignment-Promoting Variational Auto-encoder

Jiayang Chen, Qinliang Su

Published: 2023, Last Modified: 12 Mar 2026. Venue: NLPCC (1) 2023. License: CC BY-SA 4.0

Abstract: Semantic hashing is an effective technique for information retrieval. Considerable effort has been devoted to generating high-quality hash codes by modeling document features with generative models and other approaches. However, most of these methods rely on a single type of feature, such as TF-IDF features or BERT embeddings. Different types of features carry distinct but complementary information about documents: TF-IDF mainly captures keyword information, whereas BERT focuses on semantics. Hash codes generated from either one alone may therefore fail to capture the full content of the documents. To address this limitation, we propose a semantic-alignment-promoting variational auto-encoder that generates hash codes from multiple document features. Specifically, a VAE-based generative model is first developed to model the multiple features. Then, we propose a semantic-alignment-promoting inference network to estimate the parameters of the variational posterior from the multiple features. Additionally, the quality of the hash codes is further improved by promoting semantic alignment between the hash codes of connected documents in a constructed connection graph. Extensive experiments on three public datasets demonstrate that the proposed model significantly outperforms current state-of-the-art models.
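As a rough illustration of the general idea of deriving binary hash codes from more than one feature view and retrieving by Hamming distance, here is a minimal sketch. It is not the authors' model: the additive fusion, the random weight matrices `w_tfidf` and `w_bert`, the 0.5 threshold, and the function names `encode_to_hash` and `hamming_distances` are all assumptions made for illustration, and the VAE training objective and the connection-graph alignment term are omitted entirely.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_to_hash(tfidf, bert, w_tfidf, w_bert):
    """Hypothetical inference step: fuse two feature views into Bernoulli
    probabilities per bit, then threshold at 0.5 to obtain binary codes."""
    # Simple additive fusion of the two views (an assumption, not the paper's network).
    logits = tfidf @ w_tfidf + bert @ w_bert
    probs = sigmoid(logits)
    return (probs > 0.5).astype(np.uint8)

def hamming_distances(query_code, codes):
    """Number of differing bits between the query code and every stored code."""
    return np.count_nonzero(codes != query_code, axis=1)

# Toy data: random stand-ins for TF-IDF vectors and BERT embeddings.
rng = np.random.default_rng(0)
n_docs, d_tfidf, d_bert, n_bits = 5, 20, 8, 16
tfidf = rng.random((n_docs, d_tfidf))
bert = rng.normal(size=(n_docs, d_bert))

# Untrained random projections; a real model would learn these.
w_tfidf = rng.normal(size=(d_tfidf, n_bits))
w_bert = rng.normal(size=(d_bert, n_bits))

codes = encode_to_hash(tfidf, bert, w_tfidf, w_bert)

# Retrieve by ranking all documents by Hamming distance to document 0.
dists = hamming_distances(codes[0], codes)
ranking = np.argsort(dists, kind="stable")
```

Because both views contribute to every bit, a document's code here reflects keyword-level and embedding-level signals at once; in a trained model the projections would be learned rather than sampled at random.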

External IDs: dblp:conf/nlpcc/ChenS23