On the Effectiveness and Robustness of Open-Weight LLMs for Danish Hate Speech Detection

ACL ARR 2026 January Submission 6667 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: hate speech detection, Danish, low-resource languages, large language models, prompting strategies, class imbalance, fine-tuning, cross-prompt generalization, multilingual NLP
Abstract: We present a study of open-weight large language models for Danish hate speech detection. We evaluate four models (LLaMA, Mistral, Gemma, Qwen) across various prompting strategies, cross-prompt generalization, Danish orthographic effects, and robustness under balanced and imbalanced distributions. Under balanced evaluation, models achieve strong performance with minimal fine-tuning: Gemma excels on format-based prompts, while Qwen performs consistently across theory-driven patterns. Surprisingly, zero-shot fine-tuning matches or exceeds few-shot performance while introducing fewer failures, suggesting that in-context examples may interfere with fine-tuned representations. However, imbalanced evaluation reflecting real-world distributions reveals substantial degradation, with Gemma maintaining the strongest performance under class skew. Theory-informed prompts grounded in linguistic frameworks prove more robust under class imbalance than simple format-based patterns. Cross-prompt generalization varies substantially by model: format-constrained patterns consistently fail to transfer, while semantically grounded patterns show greater robustness. ASCII transliteration of Danish characters (æ, ø, å $\rightarrow$ ae, oe, aa) significantly degrades performance, demonstrating that multilingual pre-training has established meaningful orthographic representations. These findings show that strong balanced performance does not guarantee real-world readiness, and we recommend complementing balanced training with imbalanced evaluation to estimate deployment performance.
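The orthographic robustness probe described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and mapping are assumptions, covering the conventional ASCII digraphs for the Danish letters æ, ø, and å (with uppercase variants).

```python
# Sketch of the ASCII transliteration perturbation (æ, ø, å -> ae, oe, aa)
# applied to inputs before classification. Mapping and names are illustrative.
TRANSLIT = str.maketrans({
    "æ": "ae", "ø": "oe", "å": "aa",
    "Æ": "Ae", "Ø": "Oe", "Å": "Aa",
})

def ascii_transliterate(text: str) -> str:
    """Replace Danish-specific characters with their ASCII digraphs."""
    return text.translate(TRANSLIT)

print(ascii_transliterate("grøn æble på åen"))  # -> "groen aeble paa aaen"
```

Evaluating a model on both the original and transliterated versions of the same test set isolates the effect of orthography from all other input properties.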
Paper Type: Short
Research Area: Low-resource Methods for NLP
Research Area Keywords: NLP Applications, Resources and Evaluation, Question Answering
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: Danish
Submission Number: 6667