Approximate Policy Iteration using Large-Margin Classifiers

2003 (modified: 16 Jul 2019) | IJCAI 2003 | Readers: Everyone
Abstract: We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states and a classifier to generalize and learn the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding domains.
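
The loop described in the abstract (rollouts to score actions in a subset of states, then a classifier to generalize the improved policy) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional domain, the constants `GOAL`, `SIZE`, and `GAMMA`, the function names, the rule for skipping states whose actions tie, and above all the 1-nearest-neighbour classifier, which stands in for the multiclass support vector machine so that the example needs no external libraries.

```python
# Hypothetical toy domain (not from the paper): integer states 0..SIZE,
# actions 0 = left, 1 = right, reward 1.0 on entering GOAL (episode ends).
SIZE = 10
GOAL = 5
GAMMA = 0.9
ACTIONS = (0, 1)


def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    move = 1 if action == 1 else -1
    nxt = max(0, min(SIZE, state + move))
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def rollout_value(state, action, policy, horizon=30):
    """Rollout estimate of Q(state, action): take `action`, then follow `policy`."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(horizon):
        s, reward, done = step(s, a)
        total += discount * reward
        if done:
            break
        discount *= GAMMA
        a = policy(s)
    return total


def fit_classifier(examples):
    """Stand-in for the multiclass SVM: 1-nearest-neighbour on (state, action) pairs."""
    def classify(state):
        nearest = min(examples, key=lambda ex: abs(ex[0] - state))
        return nearest[1]
    return classify


def always_left(state):
    """Arbitrary initial policy."""
    return 0


def approximate_policy_iteration(sample_states, iterations=3):
    policy = always_left
    for _ in range(iterations):
        examples = []
        for s in sample_states:
            # Score every action in this sampled state under the current policy.
            q = {a: rollout_value(s, a, policy) for a in ACTIONS}
            best = max(q, key=q.get)
            # Assumed simplification: keep a training example only when one
            # action is strictly better than the other.
            if q[best] - min(q.values()) > 1e-9:
                examples.append((s, best))
        if not examples:
            break
        # The classifier generalizes the improved policy to unsampled states.
        policy = fit_classifier(examples)
    return policy


if __name__ == "__main__":
    learned = approximate_policy_iteration(sample_states=[1, 3, 4, 6, 8, 10])
    print([learned(s) for s in range(SIZE + 1)])
```

Because this toy domain is deterministic, a single rollout per state-action pair is enough; in a stochastic domain one would presumably average several rollouts before comparing actions.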