- Abstract: Modeling the parser state is key to good performance in transition-based parsing. Recurrent Neural Networks considerably improved the performance of transition-based systems, either by modeling the global parser state, as in the stack-LSTM parser, or by modeling the local state through contextualized features, as in the Bi-LSTM parser. Given the success of Transformer architectures in recent parsing systems, this work explores modifications of the sequence-to-sequence Transformer architecture to model either global or local parser states. We show that modifications of the Transformer's cross-attention mechanism considerably strengthen performance on both dependency parsing and Abstract Meaning Representation (AMR) parsing.
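The core idea of exposing parser state through cross-attention can be illustrated with a minimal sketch: restrict what each decoder step may attend to via a mask that tracks (hypothetically) which source tokens are on the stack. This is not the paper's actual implementation; the mask contents, shapes, and function names here are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_cross_attention(Q, K, V, mask):
    """Scaled dot-product cross-attention with a boolean mask.

    mask[i, j] = True means decoder step i may attend to source token j.
    Masking one attention head this way is one simple route to injecting
    parser-state structure (e.g. stack contents) into the Transformer.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)  # block non-stack positions
    return softmax(scores, axis=-1) @ V

# Toy example: 2 decoder steps attending over 4 source tokens.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

# Hypothetical stack mask: which source tokens are on the stack at each
# transition step (positions chosen here purely for illustration).
stack_mask = np.array([
    [True,  True,  False, False],  # step 0: stack holds tokens 0-1
    [False, True,  True,  False],  # step 1: stack holds tokens 1-2
])

out = masked_cross_attention(Q, K, V, stack_mask)
print(out.shape)  # (2, 8)
```

In practice such a mask would be derived from the transition sequence itself (SHIFT/REDUCE actions), so the attention head follows the stack as parsing proceeds.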