Abstract: Highlights
• We employ diverse embedding methods and attention mechanisms to fuse multi-source visual and semantic information.
• An end-to-end trajectory prediction model is proposed, which incorporates a spatial–temporal cross-attention mechanism.
• We construct a spatial–temporal graph to preserve temporal information and explore spatial dependencies within trajectories.
• A Geographical Road Gating mechanism is proposed, which uses a limited number of parameters to constrain the predicted trajectory.
• Extensive experiments on the Argoverse 2 dataset demonstrate that our model achieves advanced performance.