AI-generated News May Be Coming in Your Language: A Case Study of Italian

Anonymous

16 Oct 2023 | ACL ARR 2023 October Blind Submission | Readers: Everyone

Abstract: The number of available Large Language Models (LLMs) is growing steadily for English, but less so for other languages. This may create the impression that the potentially harmful applications of LLMs are likewise limited to English. We present a case study of Italian, investigating whether fluent news-like texts can be generated with Llama, an existing LLM trained mostly on English, after fine-tuning on only 40K Italian news articles. We find that this is sufficient to produce texts that native speakers of Italian struggle to identify as synthetic. We also experiment with two statistical methods for detecting synthetic texts (log-likelihood and DetectGPT) and find that they perform better than human raters. However, these methods are unusable in practice, since they require access to token likelihood information.
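
To make the dependence on token likelihoods concrete, the two detection signals named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the submission's implementation: all function names are hypothetical, the `log_likelihood` callable stands in for a scoring LLM that exposes token probabilities, and the perturbed copies of the text are assumed to be supplied by the caller (at least two are needed for the standard deviation).

```python
from statistics import mean, stdev
from typing import Callable, Sequence


def mean_log_likelihood(token_logprobs: Sequence[float]) -> float:
    """Average per-token log-probability of a text under the scoring model.

    Assumption: the caller already has one log-probability per token,
    which is exactly the information the abstract says is hard to obtain.
    """
    return sum(token_logprobs) / len(token_logprobs)


def detectgpt_score(
    text: str,
    perturbations: Sequence[str],
    log_likelihood: Callable[[str], float],
) -> float:
    """Normalized perturbation discrepancy in the style of DetectGPT.

    Compares the log-likelihood of the text with the log-likelihoods of
    lightly rewritten copies. A large positive value means the text scores
    much higher than its perturbed neighbours.
    """
    original = log_likelihood(text)
    perturbed = [log_likelihood(p) for p in perturbations]
    spread = stdev(perturbed)  # requires at least two perturbations
    if spread == 0:
        return 0.0
    return (original - mean(perturbed)) / spread


def flag_as_synthetic(score: float, threshold: float) -> bool:
    """Flag a text when its score exceeds a threshold chosen on held-out data."""
    return score > threshold
```

Both signals take model log-probabilities as input, so neither can be computed for a text whose candidate generator does not expose them.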

Paper Type: short

Research Area: Multilinguality and Language Diversity

Contribution Types: Model analysis & interpretability, Data resources

Languages Studied: English, Italian
