- Abstract: This paper presents a method to solve the city metro network expansion problem using reinforcement learning (RL). In this method, we formulate the metro expansion as a process of sequential station selection, and design feasibility rules based on the selected station sequence to ensure the reasonable connection patterns of metro line. Following this formulation, we train an actor critic model to design the next metro line. The actor is a seq2seq network with attention mechanism to generate the parameterized policy which is the probability distribution over feasible stations. The critic is used to estimate the expected reward, which is determined by the output station sequences generated by the actor during training, in order to reduce the training variance. The learning procedure only requires the reward calculation, thus our general method can be extended to multi-factor cases easily. Considering origin-destination (OD) trips and social equity, we expand the current metro network in Xi'an, China, based on the real mobility information of 24,770,715 mobile phone users in the whole city. The results demonstrate the effectiveness of our method.