In the paper 'Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation', the authors mention that there is another method that decomposes their two-stage model in a similar way, which comes from another paper that you've read. Provide the full name of that paper.