Self Evaluation As A Method For Generating A Chatbots Q Values

Anonymous

08 Oct 2022 (modified: 05 May 2023)
Submitted to Deep RL Workshop 2022
Readers: Everyone
Keywords: Self directed learning, chatbot, q values
TL;DR: A chatbot learns by evaluating its own responses and suggesting better ones
Abstract: Conventionally, the generation of natural language responses is treated as an exercise in statistical learning: finding the patterns in human-provided data and producing appropriate responses with the same statistical properties. Dialogue can also be described as a goal-directed process in which speakers try to achieve a particular goal. We introduce a way for a chatbot to improve using a unique type of reinforcement learning. The chatbot itself evaluates its responses and proposes alternative responses of higher quality, so the actor and the critic are the same system. We then teacher-force the better response against the utterance that was passed to the chatbot. Our experiments suggest that this may be a good way to optimize a chatbot's "policy".
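
The loop described in the abstract might be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: `ToyChatbot`, `Evaluation`, `self_improvement_step`, and the fixed scoring rule are all hypothetical names and assumptions, and a lookup table stands in for a language model so that the example runs without any dependencies.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Evaluation:
    score: float          # self-assigned quality of the original response
    better_response: str  # alternative response proposed by the same system


@dataclass
class ToyChatbot:
    """Hypothetical stand-in for a language model that is both actor and critic."""
    memory: Dict[str, str] = field(default_factory=dict)

    def respond(self, utterance: str) -> str:
        # Actor role: return the learned reply, or a generic fallback.
        return self.memory.get(utterance, "I don't know.")

    def evaluate(self, utterance: str, response: str) -> Evaluation:
        # Critic role, played by the same system. A real model would be
        # prompted to rate and rewrite its reply; this fixed rule is an
        # assumption made only to keep the sketch runnable.
        if response == "I don't know.":
            return Evaluation(0.0, f"Let me think about '{utterance}'.")
        return Evaluation(1.0, response)

    def teacher_force(self, utterance: str, target: str) -> None:
        # Stand-in for a supervised update toward the better response.
        self.memory[utterance] = target


def self_improvement_step(bot: ToyChatbot, utterance: str) -> Tuple[str, Evaluation]:
    """Respond, self-evaluate, then train on the self-proposed better response."""
    response = bot.respond(utterance)
    evaluation = bot.evaluate(utterance, response)
    if evaluation.better_response != response:
        bot.teacher_force(utterance, evaluation.better_response)
    return response, evaluation
```

Calling `self_improvement_step` twice on the same utterance shows the intended effect in miniature: the first call returns the fallback reply with a low self-assigned score, and the second returns the reply the chatbot proposed for itself.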