AI-generated News May Be Coming in Your Language: A Case Study of Italian

Anonymous

16 Oct 2023 | ACL ARR 2023 October Blind Submission | Readers: Everyone
Abstract: The number of available Large Language Models (LLMs) is growing steadily for English, but less so for other languages. This may create the impression that the potentially harmful applications of LLMs are also limited to English. We present a case study of Italian to investigate whether fluent news-like texts can be generated by fine-tuning Llama, an existing LLM trained mostly on English, on only 40K Italian news articles. We find that this is sufficient for producing texts that native speakers of Italian struggle to identify as synthetic. We also experiment with two statistical methods of detecting synthetic texts (log-likelihood and DetectGPT), finding that they perform better than human raters. However, these methods are unusable in practice, since they require access to token likelihood information.
Paper Type: short
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English, Italian