Abstract: Storytelling is a fundamental human skill that preserves cultural heritage and fosters cognitive development. While Large Language Models (LLMs) have revolutionized narrative generation, they often exhibit Western biases, resulting in stories that lack authenticity for non-Western contexts. This gap is particularly evident in Saudi Arabia, where a rich tapestry of dialects, traditions, and regional customs remains underserved by current language models. To address this, we introduce the Rawi Dataset, a benchmark comprising 550 Arabic stories grounded in verified Saudi cultural elements. Leveraging structured prompt engineering and data from the Absher and SaudiCulture benchmarks, Rawi integrates specific regional details such as cuisine, architecture, and dialects across three age-appropriate categories. Furthermore, we establish an automated evaluation framework based on the LLM-as-Judge paradigm, using the ALLAM and Qwen models as judges, to assess linguistic clarity and cultural authenticity. By bridging the gap between LLMs and Saudi heritage, Rawi provides a resource for advancing culturally aware computational storytelling.
External IDs: doi:10.36227/techrxiv.176772666.67217015/v1