Keywords: Self-directed learning, chatbot, Q-values
TL;DR: A chatbot learns by evaluating its own responses and suggesting better ones
Abstract: Conventionally, the generation of natural language responses is treated as
an exercise in statistical learning: identifying the patterns in human-provided
data and producing appropriate responses with the same statistical properties.
Dialogue can also be viewed as a goal-directed process in which speakers attempt to
achieve a particular goal. We introduce a way to get a chatbot to improve using
a form of reinforcement learning in which the chatbot itself evaluates its
responses and proposes alternative responses of higher quality. Here,
the actor and the critic are the same system. We then teacher-force the better
response against the utterance that was passed to the chatbot. Our experiments
suggest that this may be a good way to optimize a chatbot's "policy".
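
The loop described in the abstract (respond, self-evaluate, propose a better response, teacher-force it) can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the class `SelfCriticBot`, its methods, the fixed `q_values` table, and the learning rate are all hypothetical, and a real chatbot would presumably be a neural sequence model rather than a lookup table. The sketch only assumes that one object holds both the actor (a softmax policy over candidate responses) and the critic (a score per utterance and response).

```python
import math
import random


class SelfCriticBot:
    """Toy stand-in for a chatbot that acts as both actor and critic."""

    def __init__(self, responses, q_values, lr=0.5, seed=0):
        self.responses = list(responses)
        self.q = q_values   # assumed critic scores: {(utterance, response): score}
        self.logits = {}    # actor parameters: {utterance: [logit per response]}
        self.lr = lr
        self.rng = random.Random(seed)

    def probs(self, utterance):
        """Softmax over the actor's logits for this utterance."""
        logits = self.logits.setdefault(utterance, [0.0] * len(self.responses))
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, utterance):
        """Actor: sample a response from the current policy."""
        return self.rng.choices(self.responses, weights=self.probs(utterance))[0]

    def critique(self, utterance, response):
        """Critic: score the response and name the best-scoring alternative."""
        score = self.q.get((utterance, response), 0.0)
        better = max(self.responses, key=lambda r: self.q.get((utterance, r), 0.0))
        return score, better

    def teacher_force(self, utterance, target):
        """One cross-entropy gradient step moving the policy toward `target`."""
        p = self.probs(utterance)
        logits = self.logits[utterance]
        for i, r in enumerate(self.responses):
            grad = p[i] - (1.0 if r == target else 0.0)
            logits[i] -= self.lr * grad

    def self_improve(self, utterance):
        """Respond, self-evaluate, then train on the suggested better response."""
        response = self.act(utterance)
        score, better = self.critique(utterance, response)
        self.teacher_force(utterance, better)
        return response, score, better


# Hypothetical data: three canned responses and made-up critic scores.
bot = SelfCriticBot(
    responses=["hi there", "go away", "hello, how can I help?"],
    q_values={
        ("hello", "hi there"): 0.6,
        ("hello", "go away"): -1.0,
        ("hello", "hello, how can I help?"): 0.9,
    },
)
for _ in range(50):
    bot.self_improve("hello")

final_probs = bot.probs("hello")
best = bot.responses[final_probs.index(max(final_probs))]
print(best)
```

In this sketch the critic's scores are fixed, so the policy simply drifts toward the response the critic already prefers. How the critic's own estimates are learned or updated is not specified in the abstract and is left out here.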