Approximate Policy Iteration using Large-Margin Classifiers

Michail G. Lagoudakis, Ronald Parr

2003 (modified: 16 Jul 2019) | IJCAI 2003 | Readers: Everyone

Abstract: We present an approximate policy iteration algorithm that uses rollouts to estimate the value of each action under a given policy in a subset of states, and a classifier to generalize the improved policy over the entire state space. Using a multiclass support vector machine as the classifier, we obtained successful results on the inverted pendulum and the bicycle balancing and riding domains.
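
The loop described in the abstract (rollouts to score actions in sampled states, then a classifier to generalize the improved policy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 10-state chain domain, the constants, the function names, the rule for keeping a training example, and above all the classifier, which is a plain 1-nearest-neighbour rule standing in for the multiclass support vector machine.

```python
# Toy sketch of rollout-based approximate policy iteration with a classifier.
# All names, constants, and the chain domain are hypothetical; the paper's
# experiments use a multiclass SVM on pendulum and bicycle domains instead.

N_STATES = 10        # states 0..9 on a line; landing on state 9 pays reward 1
ACTIONS = (-1, +1)   # move left or move right
GAMMA = 0.9          # discount factor
HORIZON = 20         # rollout length in steps


def step(state, action):
    """Deterministic toy dynamics: move along the line, clipped at both ends."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def rollout_value(state, action, policy):
    """Discounted return of taking `action` in `state`, then following `policy`."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(HORIZON):
        s, r = step(s, a)
        total += discount * r
        discount *= GAMMA
        a = policy(s)
    return total


def fit_nearest_neighbour(examples):
    """Stand-in classifier: label a state with the action of the closest example."""
    def policy(state):
        return min(examples, key=lambda ex: abs(ex[0] - state))[1]
    return policy


def approximate_policy_iteration(sample_states, iterations=5):
    policy = lambda s: -1  # deliberately poor starting policy: always move left
    for _ in range(iterations):
        examples = []
        for s in sample_states:
            values = {a: rollout_value(s, a, policy) for a in ACTIONS}
            best = max(values, key=values.get)
            worst = min(values, key=values.get)
            # Assumed filter: keep a state only if one action is clearly better.
            if values[best] - values[worst] > 1e-9:
                examples.append((s, best))
        if not examples:
            break  # rollouts give no preference anywhere; keep the current policy
        policy = fit_nearest_neighbour(examples)
    return policy
```

Calling `approximate_policy_iteration([0, 2, 4, 6, 8])` scores actions only in the five sampled states, yet the returned policy can be queried in any state, which is the generalization role the abstract assigns to the classifier. Because the toy dynamics are deterministic, one rollout per state-action pair suffices here; a stochastic domain would need several rollouts averaged together.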
