Keywords: speech, NLP, spoken language
TL;DR: The time is ripe for the unification of NLP and speech processing.
Abstract: Human language is firstly spoken and only secondarily written.
Text, however, is a very convenient and efficient
representation of language, and modern civilization has made it
ubiquitous. Thus the field of NLP has
overwhelmingly focused on processing written rather than spoken
language. Work on spoken language, on the other hand, has been
siloed off within the largely separate speech processing community,
which has been inordinately preoccupied with transcribing speech into
text.
Recent advances in deep learning have led to a fortuitous
convergence in methods between speech processing and mainstream NLP.
Arguably, the time is ripe for a unification of these two fields,
and for starting to take spoken language seriously as the primary
mode of human communication. Truly natural language processing
could lead to better integration with the rest of language science
and to systems which are more data-efficient and more human-like,
and which can communicate beyond the textual modality.