Improving Speech Recognition with Drop-in Replacements for f-Bank Features

Sean Robertson, Gerald Penn, Yingxue Wang

Published: 2019, Last Modified: 06 Oct 2024, SLSP 2019, CC BY-SA 4.0

Abstract: While a number of learned feature representations have been proposed for speech recognition, employing f-bank features often leads to the best results. In this paper, we focus on two alternative methods of improving this existing representation. First, triangular filters can be replaced with Gabor filters, compactly supported filters that better localize events in time, or with psychoacoustically motivated Gammatone filters. Second, by rearranging the order of operations in computing filter bank features, the resulting coefficients will have better time-frequency resolution. By merely swapping f-banks with other types of filters in modern phone recognizers, we achieved significant reductions in error rates across repeated trials.
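The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not give the exact pipeline, so several things here are assumptions: the function names are hypothetical, filter centres are linearly spaced rather than mel-spaced, the Gabor bandwidth is chosen ad hoc, and the "rearranged" order is read as filtering the waveform first and framing afterwards.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def triangular_filters(n_filt, n_fft, sr):
    """Triangular frequency-domain filters; linear spacing is an assumption."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = np.linspace(0.0, sr / 2, n_filt + 2)
    bank = np.zeros((n_filt, len(freqs)))
    for k in range(n_filt):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def stft_fbank(x, sr, n_filt=8, frame_len=400, hop=160):
    """Conventional order: frame -> window -> FFT -> power -> filter -> log."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bank = triangular_filters(n_filt, frame_len, sr)
    return np.log(power @ bank.T + 1e-10)


def gabor_bank(n_filt, sr, length=401):
    """Complex Gabor filters (Gaussian envelope times complex sinusoid),
    truncated to `length` taps. The bandwidth choice is ad hoc."""
    t = (np.arange(length) - length // 2) / sr
    centres = np.linspace(0.0, sr / 2, n_filt + 2)[1:-1]
    sigma = 1.0 / (centres[1] - centres[0])  # envelope width in seconds
    envelope = np.exp(-0.5 * (t / sigma) ** 2)
    return envelope[None, :] * np.exp(2j * np.pi * centres[:, None] * t[None, :])


def filter_first_fbank(x, sr, n_filt=8, frame_len=400, hop=160):
    """Assumed rearranged order:
    filter -> squared magnitude -> windowed sum per frame -> log."""
    bank = gabor_bank(n_filt, sr)
    # mode="same" keeps each filtered signal aligned with the input.
    energy = [np.abs(np.convolve(x, h, mode="same")) ** 2 for h in bank]
    window = np.hanning(frame_len)
    feats = np.stack(
        [frame_signal(e, frame_len, hop) @ window for e in energy], axis=1
    )
    return np.log(feats + 1e-10)
```

Both functions return an array of shape (frames, filters), which is what makes the second a drop-in replacement for the first: a downstream recognizer sees the same interface and only the filter type and order of operations change.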