Abstract: Today's conversational agents are rich in content. However, they are entirely oblivious to users’ situational context, which limits their ability to adapt their responses and interaction style. To this end, we explore the design space for a context-augmented conversational agent, including an analysis of input segment dynamics and computational alternatives. Building on this analysis, we propose a solution that intelligently redesigns the input segment for ambient context recognition, using a two-step inference pipeline: we first separate the non-speech segment from the acoustic signal and then use a neural network to infer diverse ambient contexts. To build the network, we curated a public audio dataset through crowdsourcing. Our experimental results demonstrate that the proposed network can distinguish between 9 ambient contexts with an average F1 score of 0.80 and a computational latency of 3 milliseconds. We also build a compressed neural network for on-device processing, optimised for both accuracy and latency. Finally, we present a concrete manifestation of our solution in the design of a context-aware conversational agent and demonstrate use cases.
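
As a rough illustration of the two-step pipeline, the following minimal Python sketch first isolates non-speech frames and then hands them to a classifier. It is not the method from the paper. The energy threshold is a crude stand-in for a real speech/non-speech separator, and the names `non_speech_frames`, `infer_context`, `frame_len`, and `energy_threshold` are hypothetical. The classifier is passed in as a plain callable, since the abstract does not describe the network's architecture.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def split_frames(samples: Sequence[float], frame_len: int) -> List[Frame]:
    """Chop samples into consecutive non-overlapping frames; a short tail is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def non_speech_frames(samples: Sequence[float], frame_len: int,
                      energy_threshold: float) -> List[Frame]:
    """Step 1 (stand-in): keep frames whose energy falls below the threshold.

    Assumption: low energy approximates "non-speech". A real system would
    use a proper voice activity detector instead of this heuristic.
    """
    return [f for f in split_frames(samples, frame_len)
            if frame_energy(f) < energy_threshold]


def infer_context(samples: Sequence[float],
                  classify: Callable[[List[Frame]], str],
                  frame_len: int = 400,
                  energy_threshold: float = 0.01) -> Optional[str]:
    """Step 2: pass the non-speech frames to an ambient-context classifier.

    `classify` is a hypothetical placeholder for the neural network.
    Returns None when no non-speech frame is found.
    """
    frames = non_speech_frames(samples, frame_len, energy_threshold)
    if not frames:
        return None
    return classify(frames)
```

In use, `classify` would wrap a trained model. Here any function that maps a list of frames to a label will do, for example `infer_context(samples, lambda frames: "quiet room")`.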