Seq2Tok: Deep Sequence Tokenizer for Retrieval

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: sequence representation learning, audio search, music retrieval
Abstract: Search over sequences is a fundamental problem. Very efficient solutions exist for text sequences, which are made up of discrete tokens drawn from a finite alphabet. Other sequences, such as audio, video, or sensor readings, consist of continuous-valued samples at high sampling rates, which makes similarity search inefficient. This paper proposes Seq2Tok, a deep sequence tokenizer that converts continuous-valued sequences into discrete tokens that are easier to retrieve via sequence queries. The only supervision available for training Seq2Tok is pairs of similar sequences; the similarity semantics are therefore determined by how the pairs are formed. Seq2Tok compresses the query and target sequences into short token sequences that are faster to match. Experiments show consistent performance of Seq2Tok across audio retrieval tasks, namely music search (query by humming) and speech keyword search from audio queries.
One-sentence Summary: Represent query and target sequences as compressed token sequences for quick retrieval; similarity semantics are learned from sequence pairs
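
To make the idea concrete, below is a minimal sketch of a sequence tokenizer of this kind, assuming a strided convolutional encoder, a vector-quantization codebook for discretization, and an InfoNCE-style contrastive loss over (query, target) pairs. The class `Seq2TokSketch`, the function `pair_contrastive_loss`, and all layer sizes, the codebook size, and the feature dimensions are illustrative assumptions, not the architecture or training objective described in the paper.

```python
# Illustrative sketch only: the model and loss below are assumptions, not the
# paper's actual Seq2Tok architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Seq2TokSketch(nn.Module):
    def __init__(self, in_dim=40, hidden=128, n_tokens=256):
        super().__init__()
        # Strided 1-D convolutions compress the time axis (8x shorter here).
        self.encoder = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=4, stride=2, padding=1),
        )
        # Codebook: each compressed frame is snapped to its nearest entry,
        # yielding one discrete token id per frame.
        self.codebook = nn.Embedding(n_tokens, hidden)

    def tokenize(self, x):
        """x: (batch, time, in_dim) -> (batch, time // 8) integer token ids."""
        z = self.encoder(x.transpose(1, 2)).transpose(1, 2)        # (B, T', H)
        diffs = z.unsqueeze(2) - self.codebook.weight[None, None]  # (B, T', K, H)
        dists = diffs.pow(2).sum(dim=-1)                           # (B, T', K)
        return dists.argmin(dim=-1)                                # (B, T')

    def embed(self, x):
        """Continuous pooled embedding, used only by the training loss below."""
        z = self.encoder(x.transpose(1, 2)).transpose(1, 2)
        return F.normalize(z.mean(dim=1), dim=-1)                  # (B, H)


def pair_contrastive_loss(model, queries, targets, temperature=0.1):
    """InfoNCE-style loss over pairs of similar sequences: each query is pulled
    toward its paired target and pushed away from other targets in the batch."""
    q = model.embed(queries)                                       # (B, H)
    t = model.embed(targets)                                       # (B, H)
    logits = q @ t.T / temperature                                 # (B, B)
    labels = torch.arange(q.size(0))
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    model = Seq2TokSketch()
    # Two random 40-dim feature sequences, 512 frames each; the "target" is a
    # perturbed copy standing in for a similar sequence.
    query = torch.randn(2, 512, 40)
    target = query + 0.05 * torch.randn_like(query)
    loss = pair_contrastive_loss(model, query, target)
    tokens = model.tokenize(query)
    print(loss.item(), tokens.shape)                               # (2, 64)
```

Under these assumptions, retrieval would then operate on the short token sequences produced by `tokenize` (e.g. with standard discrete-sequence matching), rather than on the raw continuous-valued samples.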
