Bridging the Gap Between Cascade and End-to-End Cross-modal Translation Models: A Zero-Shot Approach

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Zero-Shot, End-to-End, Speech Translation
Abstract: Cross-modal translation tasks such as Speech Translation and OCR Image Translation face two main problems: the representation mismatch between modalities, and the scarcity of parallel data covering multiple modalities. Because of the latter, end-to-end multi-modal neural models tend to perform worse than cascade models, although there are exceptions under favorable conditions. To address these problems, we present a differentiable cascade translation model that connects two pre-trained uni-modal modules in a trainable way. We adapt the Word Rotator's Distance loss using the Optimal Transport approach, which effectively handles the multi-modal discrepancy. This approach naturally enables zero-shot multi-modal training, reducing the dependence of end-to-end models on large amounts of data, while still allowing end-to-end training when data do become available. Our comprehensive experiments on the MuST-C benchmarks show that our end-to-end zero-shot approach performs at least as well as the CTC-based cascade models, and that our end-to-end model with supervised training matches the latest state-of-the-art results.
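The abstract does not give the paper's exact formulation, but the Word Rotator's Distance it adapts is an optimal-transport distance between two sets of embeddings, with token masses proportional to embedding norms and a cosine cost. A minimal NumPy sketch of such a loss, using entropic regularization and Sinkhorn iterations (the function name, hyperparameters, and Sinkhorn solver here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def word_rotators_distance(X, Y, n_iters=50, eps=0.1):
    """WRD-style optimal-transport distance between embedding sequences.

    X: (m, d) and Y: (n, d) arrays of token embeddings.
    Masses are proportional to embedding norms; cost is cosine distance.
    This is an illustrative sketch, not the submission's actual loss code.
    """
    # Token masses from embedding norms, normalized to sum to 1
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    a = nx / nx.sum()
    b = ny / ny.sum()
    # Cosine cost matrix between unit-normalized embeddings
    Xn = X / nx[:, None]
    Yn = Y / ny[:, None]
    C = 1.0 - Xn @ Yn.T
    # Entropic OT via Sinkhorn iterations
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]  # transport plan
    return float((T * C).sum())
```

Because the transport plan and the cosine costs are built from differentiable operations, an autodiff version of this loss can propagate gradients through both uni-modal modules, which is what makes the cascade trainable end to end.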
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
Supplementary Material: zip