A Hybrid Framework for Invoice Understanding and Cost Analysis Using LayoutLMv3 and Lightweight Vision-Language Models
Abstract: This work introduces a hybrid AI framework that unifies layout-aware token classification with lightweight generative reasoning to automate financial document parsing and support cost analysis. The system enhances LayoutLMv3 through pseudo-labeling, targeted synthetic augmentation, and class-weighted fine-tuning. It integrates LLaVA, accessed via Ollama, for limited semantic interpretation tasks. Empirical evaluation shows improved performance on rare entity recognition and contextual inference, validated through classification metrics and manual review. Our results highlight the feasibility of combining discriminative and lightweight generative techniques for scalable and interpretable invoice automation, while recognizing current limitations in real-time generative deployment.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: layout-aware token classification, invoice parsing, pseudo-labeling, class imbalance, lightweight vision-language models, document AI
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches low compute settings-efficiency, Data analysis
Languages Studied: English
Submission Number: 4969