Abstract: Most Bilingual Lexicon Induction (BLI) methods map monolingual word embeddings (WEs) into a shared semantic space and treat nearest cross-lingual neighbors as translation pairs. A common challenge for these techniques is that semantically dissimilar words tend to cluster together in the WE space, making it difficult to identify translations accurately. To address this problem, we propose a novel method that leverages antonym knowledge to enhance the separation between words with different semantics in the WE space. This knowledge of generalized antonyms is mined from data already commonly used in BLI: specifically, we jointly use seed lexicons and monolingual WEs to identify semantically different words, which we refer to as “generalized antonyms.” These generalized antonyms share high cosine similarity within the monolingual WE space and thus cause semantic confusion. The identified generalized antonyms then serve as “fixed anchor points” to guide the training of the BLI model. The method requires no additional data and can be applied to any language pair. Comprehensive experiments demonstrate that our method outperforms existing state-of-the-art (SOTA) BLI methods on nearly all of the diverse language pairs evaluated. Further analysis confirms that our method effectively enhances the distinction between words.
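To make the mining step concrete, the following is a minimal sketch (not the paper's actual implementation) of how semantically different yet embedding-similar word pairs could be identified from a seed lexicon and monolingual WEs. The function name, threshold, and the use of differing seed translations as a proxy for semantic difference are illustrative assumptions.

```python
import numpy as np

def mine_generalized_antonyms(emb, lexicon, sim_threshold=0.6):
    """Illustrative sketch (assumed, not the paper's exact procedure):
    flag source-word pairs that are close in the monolingual embedding
    space (cosine similarity above sim_threshold) yet map to different
    target words in the seed lexicon. Such pairs approximate the
    "generalized antonyms" described in the abstract."""
    words = list(lexicon)
    vecs = np.stack([emb[w] for w in words]).astype(float)
    # L2-normalize rows so a dot product equals cosine similarity
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T
    pairs = []
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            close = sims[i, j] > sim_threshold          # confusable in WE space
            different = lexicon[words[i]] != lexicon[words[j]]  # proxy for distinct semantics
            if close and different:
                pairs.append((words[i], words[j]))
    return pairs
```

In a full BLI pipeline, the returned pairs would then act as anchor points whose representations are pushed apart during mapping training; the details of that objective are beyond this sketch.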