Abstract: One of the bottlenecks of training autonomous
vehicle (AV) agents is the variability of training environments.
Since learning optimal policies for unseen environments is often
very costly and requires substantial data collection, it becomes
computationally intractable to train the agent on every possible
environment or task the AV may encounter.
This paper introduces a zero-shot filtering approach that
interpolates policies learned from past experiences to generalize
to unseen ones. We use an experience kernel to correlate
environments. These correlations are then exploited to produce
policies for new tasks or environments from previously learned policies.
We demonstrate our methods on an autonomous vehicle driving
through T-intersections with different characteristics, whose
behavior is modeled as a partially observable Markov decision
process (POMDP). We first construct compact representations
of learned policies for POMDPs with unknown transition
functions given a dataset of sequential actions and observations.
Then, we filter parameterized policies of previously visited
environments to generate policies for new, unseen environments.
We demonstrate our approaches on both an actual AV and
a high-fidelity simulator. Results indicate that our experience
filter offers a fast, low-effort, and near-optimal solution to
create policies for tasks or environments never seen before.
Furthermore, the generated policies outperform the policy
learned using all of the data collected from past environments,
suggesting that the correlation among different environments
can be exploited and irrelevant ones can be filtered out.
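
To make the filtering idea concrete, below is a minimal illustrative sketch, not the paper's implementation: it assumes each environment is summarized by a feature vector, each learned policy is represented by a parameter vector, and a Gaussian (RBF) kernel plays the role of the experience kernel; the function names and the `min_weight` threshold are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, length_scale=1.0):
    # Assumed similarity measure between two environment feature vectors.
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-0.5 * np.dot(diff, diff) / length_scale**2)

def filter_policy(new_env, past_envs, past_policies,
                  length_scale=1.0, min_weight=0.05):
    """Kernel-weighted interpolation of past policy parameters for an unseen environment.

    new_env       : feature vector of the unseen environment
                    (e.g., T-intersection geometry, traffic density)
    past_envs     : feature vectors of previously visited environments
    past_policies : policy parameter vectors learned in those environments
    min_weight    : hypothetical threshold below which a past environment
                    is treated as irrelevant and filtered out
    """
    weights = np.array([rbf_kernel(new_env, e, length_scale) for e in past_envs])
    weights = weights / weights.sum()

    # Filter out weakly correlated (irrelevant) past environments.
    keep = weights >= min_weight
    weights = weights[keep] / weights[keep].sum()
    policies = np.asarray(past_policies, dtype=float)[keep]

    # Zero-shot policy: convex combination of the retained policy parameters.
    return weights @ policies
```

Under these assumptions, the new environment inherits a policy dominated by its most similar past environments, while dissimilar experiences contribute nothing, mirroring the paper's observation that filtering out irrelevant environments can beat training on all collected data.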