MLM with Global Co-occurrence

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission
Keywords: MLM pre-training, Multilingual model, Machine Learning for NLP, Language Modeling
TL;DR: We present MLM-GC (Masked Language Modeling with Global Co-occurrence) for multilingual tasks.
Abstract: When pre-trained with the MLM (masked language modeling) objective on multilingual corpora, a model learns to align the spaces of different languages into roughly isomorphic spaces by capturing structural similarities from local bidirectional information. Global co-occurrence information is the primary source of information available to all such methods and can supply additional structural similarities to the model. In this work, we push MLM pre-training further to leverage global co-occurrence information. The result is MLM-GC (MLM with Global Co-occurrence) pre-training, in which the model learns local bidirectional information from masking and global co-occurrence information from a log-bilinear regression. In our experiments, MLM-GC pre-training substantially outperforms MLM pre-training on 4 multilingual/cross-lingual downstream tasks and 1 additional monolingual task, showing the advantages of capturing embedding analogies.
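Since the page does not spell out the training objective, the sketch below shows one plausible reading of the abstract: a standard MLM cross-entropy term combined with a GloVe-style weighted log-bilinear regression on global co-occurrence counts. The function names (glove_weight, cooccurrence_loss, mlm_gc_loss), the weighting function, and the lambda_gc mixing coefficient are illustrative assumptions, not the authors' actual formulation.

```python
# Hedged sketch of an MLM + global co-occurrence objective (assumed form).
import torch
import torch.nn.functional as F

def glove_weight(counts, x_max=100.0, alpha=0.75):
    """GloVe-style weighting f(X_ij), capped at 1 for frequent pairs (assumed)."""
    return torch.clamp((counts / x_max) ** alpha, max=1.0)

def cooccurrence_loss(emb, ctx_emb, bias, ctx_bias, pairs, counts):
    """Weighted least-squares log-bilinear regression on co-occurrence counts.

    pairs:  LongTensor of shape (N, 2) holding (word i, word j) indices.
    counts: FloatTensor of shape (N,) holding co-occurrence counts X_ij > 0.
    """
    wi, wj = emb[pairs[:, 0]], ctx_emb[pairs[:, 1]]
    bi, bj = bias[pairs[:, 0]], ctx_bias[pairs[:, 1]]
    pred = (wi * wj).sum(-1) + bi + bj
    return (glove_weight(counts) * (pred - torch.log(counts)) ** 2).mean()

def mlm_gc_loss(mlm_logits, mlm_labels, emb, ctx_emb, bias, ctx_bias,
                pairs, counts, lambda_gc=1.0):
    """Joint objective: standard MLM cross-entropy plus the co-occurrence term."""
    mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                          mlm_labels.view(-1), ignore_index=-100)
    gc = cooccurrence_loss(emb, ctx_emb, bias, ctx_bias, pairs, counts)
    return mlm + lambda_gc * gc
```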