Keywords: Optical Character Recognition, Indic Languages, Multilingual OCR, Deep Learning, Natural Language Processing, Computer Vision, Document Understanding, South Asian Languages, Language Technology, Low-Resource Languages
TL;DR: Nayana enables multilingual vision-language capabilities in low-resource languages through synthetic data generation and efficient model adaptation, demonstrating strong performance across OCR and Visual Question Answering tasks.
Abstract: We introduce Nayana, a scalable and efficient framework for adapting Vision-Language Models (VLMs) to low-resource languages. Despite significant advances, modern VLMs remain constrained by the scarcity of training data in non-English languages, limiting their global applicability. Our framework addresses this fundamental challenge through a novel layout-aware synthetic data generation pipeline combined with parameter-efficient adaptation techniques. Instead of requiring extensive manually annotated datasets, Nayana enables existing models to learn new languages effectively using purely synthetic data. Using Low-Rank Adaptation (LoRA), we demonstrate this capability across ten Indic languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu. Through extensive experiments in OCR tasks, we show that models can achieve strong performance in new languages without the traditional requirements of large-scale annotated datasets or extensive model modifications. Nayana's success in adapting VLMs to new languages with synthetic data establishes a practical pathway for extending AI capabilities to underserved languages, particularly in scenarios where annotated data is scarce or unavailable.
Archival: Archival Track
Submission Number: 16