Abstract: Financial institutions continue to process millions of handwritten forms despite digital transformation efforts, which creates a significant operational bottleneck. This research addresses the persistent challenge of automating handwritten data extraction from financial documents. It introduces a four-stage processing pipeline that significantly outperforms existing solutions. To handle the inherent variability of handwritten content, our approach runs four stages in sequence: targeted structural analysis, specialized optical character recognition (OCR), verification with multimodal large language models (MLLMs), and cross-validation against a database. In experiments, our enhanced hybrid method achieves a 98.4% F1-score across diverse field types (textual, numerical, and checkbox). It extracts textual content perfectly and recognizes numerical fields almost perfectly (98.2% F1-score). This is a marked improvement over conventional systems, particularly for numerical data, where precision is critical for financial transactions. The document-level accuracy of 80% substantially reduces manual review requirements and offers immediate practical value. The work also establishes a methodological framework for combining complementary technologies to overcome the limitations of individual components. More broadly, it shows how strategically sequenced verification steps can systematically improve extraction reliability in mission-critical document processing.
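The abstract reports two metrics at different granularities. One is a field-level F1-score (98.4%) and the other is a document-level accuracy (80%). They can diverge sharply because a document presumably counts as correct only when every one of its fields is correct, so small per-field error rates compound. The sketch below illustrates that distinction under stated assumptions. These assumptions are not the paper's documented evaluation protocol. They are: exact string matching, micro-averaging over all fields, every field having a ground-truth value, and the common extraction convention that a wrong value counts as both a false positive and a false negative. The function names `field_f1` and `document_accuracy` and the toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation of one field:
# (predicted value, or None if the system abstained; ground-truth value).
# Assumes every field has a ground-truth value.
Field = Tuple[Optional[str], str]


def field_f1(docs: Sequence[Sequence[Field]]) -> float:
    """Micro-averaged F1 over all fields of all documents (exact match assumed)."""
    tp = fp = fn = 0
    for doc in docs:
        for predicted, truth in doc:
            if predicted is None:
                fn += 1  # abstained: the true value was missed
            elif predicted == truth:
                tp += 1  # correct value extracted
            else:
                fp += 1  # a wrong value was emitted...
                fn += 1  # ...and the true value was still missed
    denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


def document_accuracy(docs: Sequence[Sequence[Field]]) -> float:
    """Fraction of documents in which every field is extracted exactly."""
    if not docs:
        return 0.0
    correct = sum(all(pred == truth for pred, truth in doc) for doc in docs)
    return correct / len(docs)


if __name__ == "__main__":
    # Toy data only; not results from the paper.
    docs = [
        [("100", "100"), ("Smith", "Smith")],           # fully correct
        [("42", "47"), ("X", "X")],                     # one misread digit
        [(None, "yes"), ("2024-01-01", "2024-01-01")],  # one abstention
    ]
    print(f"field F1 = {field_f1(docs):.3f}")
    print(f"document accuracy = {document_accuracy(docs):.3f}")
```

In this toy example, 4 of 6 fields are correct, which gives a field F1 of 8/11 (about 0.727). Only one of the three documents is entirely correct, so document accuracy is 1/3. This is why a high field-level score can coexist with a noticeably lower document-level score.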
External IDs: dblp:conf/iccv/WangYZDLH25