MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

Published: 01 Jan 2023, Last Modified: 27 Feb 2024. Venue: ACL 2023.
Authors: Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David Rosenberg. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023.
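The listing carries only the title, but the named objective suggests a simple form: interpolate the standard (forward) cross-entropy of maximum-likelihood training with a reverse cross-entropy term. Below is a minimal PyTorch sketch of that idea. The function name mixce_loss, the mixing weight eta, and the self-weighted surrogate for the intractable reverse term are illustrative assumptions, not the paper's verified implementation.

```python
import torch
import torch.nn.functional as F

def mixce_loss(logits, targets, eta=0.5):
    """Token-level mixture of forward and reverse cross-entropies (sketch).

    logits:  (batch, seq_len, vocab) unnormalized model scores
    targets: (batch, seq_len) gold token ids
    eta:     mixing weight; eta=1.0 recovers plain MLE (forward CE only)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # log q(x_t | x_<t) for the gold tokens
    gold_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    gold_p = gold_logp.exp()

    # Forward CE: the usual negative log-likelihood of the data.
    forward_ce = -gold_logp

    # The exact reverse CE is intractable (it scores model samples under the
    # unknown data distribution). As one simple surrogate, weight the
    # gold-token log-likelihood by the model's own probability, giving a
    # self-reinforcing, mode-seeking term. This surrogate is an assumption
    # made for illustration; the weight is detached so it acts as a constant.
    reverse_ce_approx = -gold_p.detach() * gold_logp

    loss = eta * forward_ce + (1.0 - eta) * reverse_ce_approx
    return loss.mean()

if __name__ == "__main__":
    # Smoke test on random data.
    torch.manual_seed(0)
    logits = torch.randn(2, 8, 100, requires_grad=True)
    targets = torch.randint(0, 100, (2, 8))
    loss = mixce_loss(logits, targets, eta=0.5)
    loss.backward()
    print(f"MixCE loss: {loss.item():.4f}")
```

Note that the combined loss collapses to -(eta + (1 - eta) * q) * log q per token, so the two terms can also be fused into a single weighted negative log-likelihood.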