Compact Wisdom at Small Scale: Can Small Language Models Serve as Cultural Assistants?

20 Sept 2025 (modified: 29 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Small Language Models (SLMs), Retrieval-Augmented Generation (RAG), Thirukkural, Resource-Constrained AI
TL;DR: We show that small language models, aligned with an English instruction dataset and a lightweight RAG pipeline over the Thirukkural, can deliver faithful retrieval and grounded explanations at a fraction of the cost of LLMs.
Abstract: Large language models (LLMs) achieve state-of-the-art reasoning and generation, but their high compute and energy costs limit deployment in frugal or low-infrastructure settings. Small language models (SLMs), with hundreds of millions of parameters, are emerging as alternatives for narrow-domain applications, yet their effectiveness relative to LLMs remains underexplored. We study this question through the case of the Thirukkural, a classical Tamil text of 1,330 aphoristic couplets widely used for ethical and educational reference. We construct an English-based instruction dataset that pairs queries with relevant couplets, translations, and short grounded explanations, and use it to align SLMs for retrieval-augmented generation (RAG). Our contributions are: (i) a compact instruction-tuning corpus over the Thirukkural, (ii) a lightweight RAG pipeline optimized for faithfulness and brevity, and (iii) a comparative study of SLMs against LLM baselines. Results show that SLMs, when aligned with structured supervision, approach LLM-level fidelity in this constrained setting while enabling deployment on commodity hardware. These findings clarify the trade-offs between efficiency and quality, and position SLMs as practical, grounded assistants for cultural and educational AI.
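To make the described setup concrete, the sketch below illustrates one way a lightweight RAG pipeline over the 1,330 couplets could look: embed English translations, retrieve the top-k couplets for a query by cosine similarity, and assemble a grounded prompt for a small instruction-tuned model. This is an illustrative sketch, not the authors' implementation; the file name "thirukkural.jsonl", its field names ("couplet", "translation", "explanation"), and the MiniLM embedding model are all assumptions, since the abstract does not specify them.

```python
# Minimal RAG-retrieval sketch (illustrative only, not the paper's pipeline).
# Assumptions: a JSONL corpus "thirukkural.jsonl" with hypothetical fields
# "couplet", "translation", "explanation"; all-MiniLM-L6-v2 as a stand-in
# embedding model.
import json
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Index the couplets by their English translations.
with open("thirukkural.jsonl", encoding="utf-8") as f:
    corpus = [json.loads(line) for line in f]
doc_vecs = encoder.encode(
    [c["translation"] for c in corpus], normalize_embeddings=True
)

def retrieve(query: str, k: int = 3):
    """Return the top-k couplets by cosine similarity to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity (vectors are unit-normalized)
    return [corpus[i] for i in np.argsort(-scores)[:k]]

def build_prompt(query: str, hits) -> str:
    """Assemble a brief, grounded prompt for a small instruction-tuned SLM."""
    context = "\n".join(
        f"- Couplet: {h['couplet']}\n  Translation: {h['translation']}"
        for h in hits
    )
    return (
        "Answer briefly, citing only the couplets below.\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

query = "What does the Thirukkural say about gratitude?"
print(build_prompt(query, retrieve(query)))
```

Restricting the prompt to the retrieved couplets and their translations is one simple way to encourage the faithfulness and brevity the abstract emphasizes; the paper's actual prompt format and retriever may differ.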
Primary Area: datasets and benchmarks
Submission Number: 23229