OCR-Augmented GPT for Accurate Text Extraction in Industrial Environments

Jonghyeok Park, Jaehyung Cho, Soohee Han, Kyungjun Kim

Published: 01 Jan 2025, Last Modified: 28 Jan 2026, Venue: IEEE Access, License: CC BY-SA 4.0

Abstract: Optical character recognition (OCR) in industrial environments often struggles with degraded text, such as handwriting or text obscured by complex backgrounds. Traditional methods address these challenges by re-identifying handwritten text or modifying backgrounds, but these approaches are costly and time-consuming. To overcome these limitations, we propose an OCR-augmented GPT framework, where GPT infers text from images with support from an initial OCR-based coarse estimation. The OCR layer first generates preliminary text outputs by processing multiple image sequences of the same text, capturing variations in lighting, noise, and distortion. These outputs, which may contain errors or inconsistencies, are then fed to a GPT layer along with the original images. By extracting information from both degraded images and preliminary OCR results, the framework produces accurate text by leveraging contextual understanding. In experimental evaluations, including real-world industrial environments, the proposed framework achieved more accurate OCR results than standalone OCR or GPT, demonstrating its scalability and cost-effectiveness for digital documentation across various environments.
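The two-layer flow described in the abstract (several OCR readings of the same text, then a GPT pass over those readings together with the original images) might be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the `Frame` container, the prompt wording, and the `ask_model` callable are all assumptions introduced here, and no particular OCR engine or GPT API is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Frame:
    """One capture of the same text region (hypothetical container)."""
    image_path: str
    ocr_text: str  # preliminary OCR output, possibly erroneous


def build_prompt(frames: Sequence[Frame]) -> str:
    """List the noisy OCR candidates so the language model can reconcile them."""
    lines = [
        "The attached images show the same text under different conditions.",
        "Preliminary OCR readings (may contain errors):",
    ]
    for i, frame in enumerate(frames, start=1):
        lines.append(f"{i}. {frame.ocr_text!r}")
    lines.append("Return the single most plausible text.")
    return "\n".join(lines)


def refine(
    frames: Sequence[Frame],
    ask_model: Callable[[str, List[str]], str],
) -> str:
    """Pass the prompt and the original images to a caller-supplied model.

    `ask_model` is a placeholder for whatever multimodal model client is
    available; it receives the prompt text and the list of image paths.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    prompt = build_prompt(frames)
    images = [frame.image_path for frame in frames]
    return ask_model(prompt, images).strip()
```

In this sketch the model call is injected rather than hard-coded, so the prompt assembly can be exercised with a stub function before any real OCR engine or model client is wired in.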

External IDs: doi:10.1109/access.2025.3594682