Unsupervised Personality-Aware Language Identification

Anonymous

17 Sept 2021 (modified: 05 May 2023), ACL ARR 2021 September Blind Submission
Abstract: Recognizing the language of ambiguous texts remains a major challenge in language identification (LID). Users of multilingual applications have their own language preferences, which can serve as external knowledge for LID. Nevertheless, current studies rarely consider such inter-personal variation because user-annotated training data is lacking. To fill this gap, we introduce personality-aware LID and propose a novel unsupervised learning strategy. Concretely, we extract training samples for each user from a standard LID corpus according to that user's language preference. Furthermore, we contribute "U-LID", the first user-labeled LID test set. Experimental results show that the proposed model captures user traits and significantly outperforms existing LID systems on ambiguous texts. Our code and dataset will be released upon acceptance.
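The abstract describes the per-user extraction step only at a high level. The following is a minimal sketch of one way such a step could work, not the authors' procedure. It assumes a user's language preference is a dict mapping language codes to non-negative weights, and that training samples are drawn from a labeled LID corpus in proportion to those weights. The function `sample_user_corpus` and the toy corpus are hypothetical.

```python
import random
from collections import defaultdict


def sample_user_corpus(corpus, preference, n_samples, seed=0):
    """Hypothetical sketch: build a per-user training set from a labeled LID corpus.

    corpus:     list of (text, lang) pairs from a standard LID corpus
    preference: assumed dict {lang: weight} describing one user's language preference
    n_samples:  number of (text, lang) pairs to draw for this user
    """
    rng = random.Random(seed)  # fixed seed so the sampled set is reproducible

    # Group corpus sentences by their gold language label.
    by_lang = defaultdict(list)
    for text, lang in corpus:
        by_lang[lang].append(text)

    # Keep only languages the user prefers (weight > 0) that the corpus actually covers.
    langs = [lang for lang, w in preference.items() if w > 0 and by_lang.get(lang)]
    if not langs:
        return []
    weights = [preference[lang] for lang in langs]

    samples = []
    for _ in range(n_samples):
        # Pick a language proportionally to the user's preference,
        # then pick a random sentence labeled with that language.
        lang = rng.choices(langs, weights=weights, k=1)[0]
        samples.append((rng.choice(by_lang[lang]), lang))
    return samples


if __name__ == "__main__":
    toy_corpus = [("hello world", "en"), ("good morning", "en"),
                  ("hola mundo", "es"), ("bonjour", "fr")]
    user_pref = {"en": 0.8, "es": 0.2}  # assumed preference for one user
    print(sample_user_corpus(toy_corpus, user_pref, n_samples=5))
```

Sampling proportionally to preference means that a user who mostly writes in one language sees mostly that language during training. This is one plausible way to encode the user prior that the abstract refers to. Other schemes, such as reweighting the loss instead of resampling, would also be compatible with the description.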