A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework

23 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: acoustic modeling, speech recognition, relational thinking, Bayesian deep learning, graph theory
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Relational thinking refers to the inherent ability of humans to form mental impressions about relations between sensory signals and prior knowledge, and subsequently incorporate them into their model of the world. This ability plays a key role in human understanding of speech, yet it has not been a prominent feature of artificial speech recognition systems. Recent attempts to correct this oversight have been limited to coarse utterance-level models that operate exclusively in the time domain. To narrow the gap between artificial systems and human abilities, this paper presents a novel acoustic modeling framework based on spectro-temporal relational thinking. Specifically, it first generates numerous probabilistic graphs to model the relations among consecutive speech segments across both the time and frequency domains. These graphs are then coupled and transformed into latent representations for downstream tasks, a process that can uncover meaningful spectro-temporal patterns formed by the co-occurrence of certain node pairs. Models built upon this framework outperform state-of-the-art systems with a 7.82% improvement in phoneme recognition tasks. In-depth analyses further reveal that the proposed relational thinking modeling mainly improves the model's ability to recognize vowel phonemes.
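The pipeline outlined in the abstract (build probabilistic graphs over time and frequency, couple them, derive a latent representation) can be illustrated with a toy sketch. This is not the authors' method. Every function name here is hypothetical, and the specific choices are assumptions made for illustration only: sigmoid of scaled dot-product similarity as the edge probability, all-pairs edges rather than edges restricted to consecutive segments, Bernoulli edge sampling, neighbour averaging as the graph transform, and element-wise addition as the coupling step.

```python
import math
import random


def edge_probabilities(nodes):
    """Pairwise edge probabilities (assumption: sigmoid of scaled dot product)."""
    n = len(nodes)
    dim = len(nodes[0])
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops
                score = sum(a * b for a, b in zip(nodes[i], nodes[j])) / math.sqrt(dim)
                probs[i][j] = 1.0 / (1.0 + math.exp(-score))
    return probs


def sample_graph(probs, rng):
    """Draw one binary adjacency matrix, one Bernoulli trial per directed edge."""
    return [[1 if rng.random() < p else 0 for p in row] for row in probs]


def aggregate(nodes, adjacency):
    """Replace each node by the mean of its sampled neighbours.

    A node with no sampled neighbours keeps its own features.
    """
    out = []
    for i, row in enumerate(adjacency):
        neighbours = [nodes[j] for j, edge in enumerate(row) if edge]
        if not neighbours:
            out.append(list(nodes[i]))
        else:
            out.append([sum(vals) / len(neighbours) for vals in zip(*neighbours)])
    return out


def spectro_temporal_latent(spectrogram, rng):
    """Couple a temporal graph and a spectral graph into one latent map.

    spectrogram: list of T frames, each a list of F frequency-bin values.
    """
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0])
    time_nodes = spectrogram                               # one node per frame
    freq_nodes = [list(col) for col in zip(*spectrogram)]  # one node per bin
    t_graph = sample_graph(edge_probabilities(time_nodes), rng)
    f_graph = sample_graph(edge_probabilities(freq_nodes), rng)
    t_latent = aggregate(time_nodes, t_graph)  # shape T x F
    f_latent = aggregate(freq_nodes, f_graph)  # shape F x T
    # Coupling (assumption): add the temporal view to the transposed spectral view.
    return [[t_latent[t][f] + f_latent[f][t] for f in range(n_bins)]
            for t in range(n_frames)]


if __name__ == "__main__":
    toy = [[0.1, 0.4, 0.2], [0.3, 0.1, 0.5], [0.2, 0.2, 0.1], [0.6, 0.0, 0.3]]
    for row in spectro_temporal_latent(toy, random.Random(0)):
        print(row)
```

Because the graphs are sampled, each call with a different random state yields a different latent map. A Bayesian treatment would typically average over, or learn the distribution of, many such samples rather than rely on a single draw.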
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7013