Trajectory representation learning for Multi-Task NMRDPs planning

Firas JARBOUI; Vianney PERCHET; Roman EGGER

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: Representation Learning, State Estimation, Non Markovian Decision Process

TL;DR: Expanding NMRDPs into MDPs using trajectory representation learning

Abstract: Expanding Non-Markovian Reward Decision Processes (NMRDPs) into Markov Decision Processes (MDPs) enables the use of state-of-the-art Reinforcement Learning (RL) techniques to identify optimal policies. This paper proposes an approach to exploring NMRDPs and expanding them into MDPs without prior knowledge of the reward structure. The non-Markovianity of the reward function is disentangled under the assumption that batches of similar and dissimilar trajectories can be sampled. More precisely, the similarity between any pair of trajectories within the same batch can be measured, whereas trajectories from different batches cannot be compared. A modified version of the triplet loss is optimised to construct a representation of the trajectories under which rewards become Markovian.
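
For readers unfamiliar with the triplet loss mentioned in the abstract, the following is a minimal sketch of the standard (unmodified) margin-based triplet loss, restricted to triplets drawn from a single batch. It is not the authors' modified loss, which the abstract does not specify. The function names, the `groups` similarity labels, the Euclidean distance, and the `margin` value are all assumptions made for illustration.

```python
import math
from itertools import permutations


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: the anchor should be closer to the positive
    than to the negative by at least `margin` (an assumed hyperparameter)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(embeddings, groups, margin=1.0):
    """Mean triplet loss over all valid triplets inside ONE batch.

    `groups[i]` is a hypothetical within-batch similarity label for
    trajectory embedding i: equal labels mean "similar". Triplets are
    never formed across batches, mirroring the assumption that only
    trajectories from the same batch can be compared.
    """
    losses = []
    for a, p, n in permutations(range(len(embeddings)), 3):
        if groups[a] == groups[p] and groups[a] != groups[n]:
            losses.append(
                triplet_loss(embeddings[a], embeddings[p], embeddings[n], margin)
            )
    # A batch with no valid triplet contributes no loss.
    return sum(losses) / len(losses) if losses else 0.0
```

In practice the embeddings would come from a learned trajectory encoder and the loss would be minimised with a gradient-based optimiser. This sketch only evaluates the loss on fixed vectors.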

Code: https://iclr2020submission.blob.core.windows.net/iclr2020/Trajectory representation learning for Multi-Task NMRDPs planning.tar.xz
