Abstract: Turn-taking is an essential aspect of smooth conversation. Detecting the end of a turn can be difficult for automatic conversation systems, and errors in this detection can lead them to respond at the wrong moment. To enable a conversational system to recognize turn transition points, we propose token-level turn-taking segmentation using linguistic features. We design this task to imitate an automatic speech recognition (ASR) environment through several experimental settings. Moreover, we use GPT-2, a well-known pretrained generative language model, to make token-level predictions on a live text stream. We compare our model with RNN-based models on general conversation datasets and analyze its predictions on test sample scenarios.
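To make the token-level formulation concrete, here is a minimal sketch, not the authors' implementation, of how a dialogue might be turned into a labeled token stream and replayed as a live stream. Several assumptions are made here. Whitespace tokens stand in for GPT-2's BPE tokens. The hypothetical `asr_normalize` helper lowercases text and strips punctuation to mimic raw ASR output. Label `1` marks the last token of a turn. The helper names `token_level_labels` and `stream_prefixes` are illustrative only.

```python
import re
from typing import Iterator, List, Tuple


def asr_normalize(text: str) -> str:
    """Hypothetical ASR-like normalization (an assumption, not the paper's setting):
    lowercase and drop punctuation, since raw ASR transcripts often lack both."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep word characters, spaces, apostrophes
    return " ".join(text.split())


def token_level_labels(turns: List[str]) -> List[Tuple[str, int]]:
    """Concatenate turns into one token stream; label 1 marks a turn's final token."""
    stream: List[Tuple[str, int]] = []
    for turn in turns:
        tokens = asr_normalize(turn).split()  # whitespace tokens stand in for BPE tokens
        for i, tok in enumerate(tokens):
            stream.append((tok, 1 if i == len(tokens) - 1 else 0))
    return stream


def stream_prefixes(stream: List[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Simulate a live text stream: yield each growing prefix together with the
    label a token-level model would have to predict at that position."""
    seen: List[str] = []
    for tok, label in stream:
        seen.append(tok)
        yield " ".join(seen), label


if __name__ == "__main__":
    turns = ["Hi, how are you?", "Fine, thanks. And you?"]
    for prefix, label in stream_prefixes(token_level_labels(turns)):
        print(label, prefix)
```

In this framing a causal model such as GPT-2 fits naturally, because at each position it only conditions on tokens that have already arrived, which matches the constraint of a live stream.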