\section{Related Work}
\textbf{Kinetics-and-Aerodynamics Methods}
The Kinetics-and-Aerodynamics methods \citep{thipphavong2013adaptive,benavides2014implementation,soler2015multiphase,tang20154d} divide the entire flight process into several phases and establish motion equations for each phase to describe the flight status. For example, \citet{wang2009prediction} adopted basic flight models to construct horizontal, vertical, and velocity profiles based on the characteristics of different flight phases. \citet{Zhijing7867472} combined dynamics-and-kinematics models with grey system theory to predict future trajectories; the grey system theory compensates for missing parameters in the dynamics-and-kinematics models and improves the prediction performance. However, because many flight parameters of an aircraft are unknown and time-varying, these fixed-parameter methods cannot accurately describe the flight status, which leads to poor performance and limits their application scenarios.

\textbf{State-Estimation Methods}
The Kalman Filter and its variants \citep{xi2008simulation,Yan6972562} are the typical single-model state-estimation algorithms for FTP tasks; they apply predefined state equations to estimate the next flight status from the current observation. For example, \citet{xi2008simulation} applied the Kalman Filter to track discrete flight trajectories by calculating a continuous state transition matrix. However, single-model algorithms cannot adapt to the complex ATC environment. To address this issue, Interacting Multiple Model (IMM) algorithms \citep{hwang2003flight,li2005survey} have been proposed and successfully applied to trajectory analysis. Although multi-model algorithms achieve better prediction performance, their high computational complexity prevents them from satisfying real-time requirements.
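
For reference, the single-model case can be sketched with the textbook linear Kalman Filter recursion. The notation here is generic and is an assumption for illustration, not taken from the cited works: $\hat{\mathbf{x}}$ is the state estimate, $\mathbf{P}$ its covariance, $\mathbf{F}$ the state transition matrix, $\mathbf{H}$ the observation matrix, $\mathbf{Q}$ and $\mathbf{R}$ the process and measurement noise covariances, and $\mathbf{z}_k$ the observation at step $k$. The prediction step propagates the previous estimate through the predefined state equation:
\[
\hat{\mathbf{x}}_{k|k-1} = \mathbf{F}\,\hat{\mathbf{x}}_{k-1|k-1}, \qquad
\mathbf{P}_{k|k-1} = \mathbf{F}\,\mathbf{P}_{k-1|k-1}\,\mathbf{F}^{\top} + \mathbf{Q},
\]
and the update step corrects it with the current observation:
\[
\mathbf{K}_k = \mathbf{P}_{k|k-1}\mathbf{H}^{\top}\left(\mathbf{H}\,\mathbf{P}_{k|k-1}\,\mathbf{H}^{\top} + \mathbf{R}\right)^{-1},
\]
\[
\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k\left(\mathbf{z}_k - \mathbf{H}\,\hat{\mathbf{x}}_{k|k-1}\right), \qquad
\mathbf{P}_{k|k} = \left(\mathbf{I} - \mathbf{K}_k\mathbf{H}\right)\mathbf{P}_{k|k-1}.
\]
Because $\mathbf{F}$, $\mathbf{Q}$, and $\mathbf{R}$ are fixed in advance, a single such filter encodes a single motion mode, which is the limitation that multi-model algorithms aim to relax by running several filters in parallel.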

\textbf{Deep Learning Methods}
With the rapid development of deep learning, there has been a surge of deep learning methods for the FTP task \citep{xu2021multi, pang2022bayesian, Sahadevan, Zhang2023FlightTP, Guo2023FlightBERT, Guo2023FlightBERT++}. These learning-based approaches can extract high-dimensional features from raw data and have achieved better performance than previous methods. For example, \citet{Sahadevan} used a Bidirectional Long Short-Term Memory (Bi-LSTM) network to explore both forward and backward dependencies in the sequential trajectory data. \citet{Zhang2023FlightTP} proposed a wavelet transform-based framework (WTFTP) to perform time-frequency analysis of flight patterns for trajectory prediction.
FlightBERT \citep{Guo2023FlightBERT} employed binary encoding to represent the attributes of the trajectory points and formulated the FTP task as a multi-binary classification problem. However, these works predict the future trajectory recursively, feeding each predicted point back as input for the next step, and therefore suffer from serious error accumulation. Recently, FlightBERT++ \citep{Guo2023FlightBERT++} was introduced for DMS prediction; it takes the prior horizon information into account and directly predicts the differential values between adjacent points.
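
The contrast between the two prediction schemes can be sketched as follows. The notation is hypothetical and introduced only for illustration: $\mathbf{x}_t$ is the trajectory point at time $t$, $n$ is the number of observed points, $H$ is the prediction horizon, and $f_\theta$ and $g_\theta$ denote generic learned models rather than the exact architectures of the cited works. A recursive predictor applies a one-step model repeatedly and consumes its own earlier outputs:
\[
\hat{\mathbf{x}}_{t+k} = f_\theta\left(\mathbf{x}_{t-n+1:t},\, \hat{\mathbf{x}}_{t+1:t+k-1}\right), \qquad k = 1, \dots, H,
\]
so an error in $\hat{\mathbf{x}}_{t+1}$ propagates to every later step. A direct predictor of differential values instead outputs all increments in a single pass, conditioned on the observations and the horizon:
\[
\left(\hat{\Delta}_{t+1}, \dots, \hat{\Delta}_{t+H}\right) = g_\theta\left(\mathbf{x}_{t-n+1:t},\, H\right), \qquad
\hat{\mathbf{x}}_{t+k} = \mathbf{x}_t + \sum_{j=1}^{k} \hat{\Delta}_{t+j}.
\]
Recovering positions by a cumulative sum from the last observed point is an assumption of this sketch; the point it illustrates is that no predicted value is fed back into the model as input.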