Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw

Published: 01 Jan 2021, Last Modified: 10 May 2025, CoRR 2021, CC BY-SA 4.0
Abstract: We present a number of low-resource approaches to the tasks of the Zero Resource Speech Challenge 2021. We build on the unsupervised speech representations proposed by the organizers as a baseline, which are derived from CPC and clustered with the k-means algorithm. We demonstrate that simple methods of refining those representations can narrow the gap to, or even improve upon, solutions that use a high computational budget. The results lead to the conclusion that the CPC-derived representations are still too noisy for training language models, but stable enough for simpler forms of pattern matching and retrieval.
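To make the baseline pipeline mentioned in the abstract more concrete, the following is a minimal sketch of how frame-level features might be quantized against k-means centroids to obtain discrete pseudo-units. It is not the authors' implementation: the toy arrays stand in for CPC features and trained centroids, and the function names `quantize` and `collapse_repeats` are hypothetical. Collapsing repeated units is an illustrative assumption, not a step stated in the abstract.

```python
import numpy as np

def quantize(features, centroids):
    """Map each frame vector to the index of its nearest centroid."""
    # Squared Euclidean distance between every frame and every centroid.
    diffs = features[:, None, :] - centroids[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)

def collapse_repeats(units):
    """Merge runs of identical consecutive units (illustrative assumption)."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(int(u))
    return out

# Toy stand-ins: 4 frames of 2-dimensional "CPC" features and 2 centroids.
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
features = np.array([[0.1, -0.1], [0.2, 0.0], [0.9, 1.1], [1.0, 0.8]])

units = quantize(features, centroids)
```

The resulting unit sequences are the kind of discrete representation on which simple pattern matching or retrieval could operate, for example by comparing sequences with an edit distance.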