Abstract: The Trip Planning Query (TPQ) returns the optimal path from a starting point to a destination that passes through every type of point of interest (POI) specified by the user, and it has attracted increasing attention. The most straightforward approach is to enumerate all POI combinations that meet the user's needs and then select the shortest path among them. The problem can therefore be regarded as a combinatorial optimization problem, a class of problems to which deep reinforcement learning can be applied. In this paper, we explore the application of deep reinforcement learning to the TPQ problem. Since selecting POIs is a sequential decision problem, we model it as a seq2seq problem. First, to make selection easier for the model, we remove POIs that cannot appear in the result and propose a candidate set generation method. The candidate set contains enough nodes to cover every queried POI type, so the model can still choose among different node sequences. Second, we use an encoder-decoder model based on the attention mechanism. We concatenate the embeddings of the start point, the end point, and the selected nodes to form the query of the attention mechanism. After a node is selected, we mask the POIs of the same type so that the model's output satisfies the TPQ constraints. Finally, we train the model with the REINFORCE method and a greedy baseline. Our model performs well across different maps, different POI densities, and different numbers of required POIs.
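To make the enumeration approach mentioned in the abstract concrete, the following is a minimal sketch of a brute-force TPQ solver. It is an illustration only, not the method proposed in the paper: the function name `brute_force_tpq`, the `pois_by_category` dictionary layout, and the use of straight-line (Euclidean) distance in place of road-network distance are all assumptions made for this example.

```python
from itertools import permutations, product
from math import dist, inf


def brute_force_tpq(start, end, pois_by_category):
    """Return (length, route) for the shortest start -> ... -> end path
    that visits exactly one POI from each required category.

    Assumption: points are (x, y) tuples and distance is Euclidean;
    a real TPQ would use shortest-path distance on the road network.
    """
    best_len, best_route = inf, None
    # Choose one POI per category, then try every visiting order.
    for choice in product(*pois_by_category.values()):
        for order in permutations(choice):
            route = [start, *order, end]
            length = sum(dist(a, b) for a, b in zip(route, route[1:]))
            if length < best_len:
                best_len, best_route = length, route
    return best_len, best_route


if __name__ == "__main__":
    # Hypothetical toy instance with two required categories.
    pois = {
        "cafe": [(2, 0), (5, 5)],
        "bank": [(6, 0), (0, 8)],
    }
    length, route = brute_force_tpq((0, 0), (10, 0), pois)
    print(length, route)
```

With k required categories of n POIs each, this sketch examines n^k combinations in k! orders, which is why exhaustive enumeration becomes impractical as the query grows and a learned selection policy is attractive.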