- Keywords: Transformer, Spoken Language Understanding
- TL;DR: We show how a transformer-based architecture can be used to build an end-to-end SLU system
- Abstract: End-to-end spoken language understanding (SLU) systems directly map speech to intent through a single trainable model, whereas conventional SLU systems use Automatic Speech Recognition (ASR) to convert speech to text and then apply Natural Language Understanding (NLU) to extract intent. In this paper, we show how a transformer-based architecture can be used to build end-to-end SLU systems. We conduct experiments on the Fluent Speech Commands (FSC) dataset, where intents are formed as combinations of three slots, namely action, object, and location. We also demonstrate how state-of-the-art results can be obtained using a combination of various data augmentation methods.
- Double Submission: No