Detection of Glottal Closure Instants from Raw Speech using Convolutional Neural Networks

Mohit Goyal, Varun Srivastava, Prathosh AP

24 Sept 2023 (modified: 24 Sept 2023), OpenReview Archive Direct Upload

Abstract: Glottal Closure Instants (GCIs) correspond to the temporal locations of significant excitation to the vocal tract occurring during the production of voiced speech. GCI detection from speech signals is a well-studied problem given its importance in speech processing. Most existing approaches to GCI detection adopt a two-stage approach: (i) transformation of the speech signal into a representative signal in which GCIs are better localized, and (ii) extraction of GCIs from the representative signal obtained in the first stage. The former stage is accomplished using signal processing techniques based on the principles of speech production, and the latter with heuristic algorithms such as dynamic programming and peak-picking. These methods are thus task-specific and depend on the method used for representative signal extraction. In this paper, however, we formulate the GCI detection problem from a representation learning perspective, where an appropriate representation is implicitly learned from raw speech samples. Specifically, GCI detection is cast as a supervised multi-task learning problem solved using a deep convolutional neural network that jointly optimizes a classification cost and a regression cost. The learning capability is demonstrated with several experiments on standard datasets. The results compare well with state-of-the-art algorithms, while performing better in the presence of real-world non-stationary noise.
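The abstract does not specify the network architecture or the exact form of the two costs. As an illustration only, the sketch below assumes that raw speech is cut into fixed-length frames and that, for each frame, the network emits (a) a logit for "this frame contains a GCI" and (b) a normalized GCI position within the frame. The joint cost is assumed to be binary cross-entropy plus a squared-error term weighted by a hypothetical `lam` and counted only on frames that actually contain a GCI. The names `multitask_gci_loss` and `decode_gcis`, and the decoding rule itself, are hypothetical and are not the authors' method.

```python
import numpy as np


def multitask_gci_loss(logits, offsets_pred, labels, offsets_true, lam=1.0):
    """Hypothetical joint cost: classification (GCI present in frame?) plus
    regression (where in the frame?), the latter only on positive frames."""
    logits = np.asarray(logits, dtype=float)
    offsets_pred = np.asarray(offsets_pred, dtype=float)
    labels = np.asarray(labels, dtype=float)
    offsets_true = np.asarray(offsets_true, dtype=float)

    # Numerically stable binary cross-entropy computed from logits:
    # max(z, 0) - z * y + log(1 + exp(-|z|))
    bce = np.maximum(logits, 0.0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    cls_loss = bce.mean()

    # Squared-error regression term, masked so frames without a GCI add nothing.
    n_pos = labels.sum()
    if n_pos > 0:
        reg_loss = (labels * (offsets_pred - offsets_true) ** 2).sum() / n_pos
    else:
        reg_loss = 0.0
    return float(cls_loss + lam * reg_loss)


def decode_gcis(probs, offsets_pred, frame_len, hop, threshold=0.5):
    """Hypothetical decoding: keep frames whose GCI probability passes the
    threshold and map each normalized offset back to a sample index."""
    probs = np.asarray(probs, dtype=float)
    offsets_pred = np.clip(np.asarray(offsets_pred, dtype=float), 0.0, 1.0)
    frames = np.flatnonzero(probs >= threshold)
    return frames * hop + np.round(offsets_pred[frames] * (frame_len - 1)).astype(int)


if __name__ == "__main__":
    # Two frames: the first contains a GCI, the second does not.
    print(multitask_gci_loss([0.0, 0.0], [0.5, 0.2], [1, 0], [0.5, 0.9]))
    # Frames 0 and 2 are detected; offsets map to samples 5 and 20.
    print(decode_gcis([0.9, 0.1, 0.7], [0.5, 0.3, 0.0], frame_len=11, hop=10))
```

Masking the regression term reflects the intuition that a position target is only meaningful for frames that actually contain a GCI. Replacing the hand-crafted second stage (for example, peak-picking on a representative signal) with a thresholded classifier output is one possible design among several.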
