CharLuMA: Efficient Multi-Language Chart-to-Code Generation with Low-Rank Subspace Adaptation

17 Sept 2025 (modified: 03 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Multimodal large language model, chart-to-code generation, multimodal dataset
Abstract: Chart-to-code generation involves translating a chart image into an executable plotting script. However, prior work has largely focused on Python-only solutions, limiting real-world applicability and leaving untapped the learning signals inherent in cross-language equivalences. We argue that aligned multi-language scripts serve as complementary "views" of the same chart, providing mutual guidance that regularizes the visual-to-code mapping. As an instantiation of this idea, we introduce CharLuMA, a multimodal large language model (MLLM) that integrates a language-guided mixture of low-rank subspaces into its multimodal projector. This architecture enables parameter-efficient adaptation via dynamic routing to language-specific subspaces, while preserving shared visual-semantic representations of charts. To facilitate training and evaluation at scale, we present Chart2NCode, a dataset of 176k Chart–Python–R–LaTeX quadruples that maintain consistent visual equivalence across languages. Experiments on multiple benchmarks demonstrate that CharLuMA achieves state-of-the-art performance among open-source MLLMs and even surpasses some proprietary systems. Critically, training with more diverse and balanced language sets yields consistent and substantial improvements across all languages by leveraging the rich supervisory signals embedded in cross-language equivalences. Subspace activation analysis further reveals a hybrid allocation pattern, with compact shared cores complemented by broader language-specific zones, while stronger models exhibit smoother and more balanced allocations. Taken together, these results establish multi-language alignment as an effective supervision paradigm for universal chart-to-code generation.
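The abstract's central mechanism, a language-guided mixture of low-rank subspaces inside the multimodal projector, can be illustrated with a toy sketch. The paper does not publish implementation details here, so everything below is an assumption: the dimensions, the softmax router over a language embedding, and the additive low-rank update on top of a frozen base projection (LoRA-style, with per-language gating). It shows only the general shape of the idea: a shared projection path plus gated language-specific low-rank corrections, where only the small matrices and the router would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions, not from the paper)
D_VIS, D_LLM, RANK, N_SUB, D_LANG = 16, 32, 4, 3, 8

# Frozen base projector: the shared visual-semantic path
W_base = rng.standard_normal((D_VIS, D_LLM)) * 0.02

# Low-rank subspaces: each subspace i is a pair (A[i], B[i]),
# so its full-rank update A[i] @ B[i] has rank <= RANK
A = rng.standard_normal((N_SUB, D_VIS, RANK)) * 0.02
B = rng.standard_normal((N_SUB, RANK, D_LLM)) * 0.02

# Router: maps a target-language embedding to gates over subspaces
W_route = rng.standard_normal((D_LANG, N_SUB)) * 0.02

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def project(vis_tokens, lang_emb):
    """Language-guided mixture-of-low-rank-subspaces projection (sketch)."""
    gates = softmax(lang_emb @ W_route)        # (N_SUB,) routing weights
    out = vis_tokens @ W_base                  # shared projection
    for i in range(N_SUB):                     # add gated low-rank updates
        out = out + gates[i] * (vis_tokens @ A[i] @ B[i])
    return out, gates

tokens = rng.standard_normal((5, D_VIS))       # 5 visual patch tokens
lang = rng.standard_normal(D_LANG)             # e.g. an embedding for "R"
proj, gates = project(tokens, lang)
print(proj.shape)                              # (5, 32)
```

Parameter efficiency in this sketch comes from training only `A`, `B`, and `W_route` (a few hundred values) while `W_base` stays frozen; the "hybrid allocation" the abstract analyzes would correspond to some subspaces receiving high gate mass across all languages (shared cores) and others being activated by only one language.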
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 8690