Keywords: Web Information Extraction, Optical Character Recognition, Document Object Model, Large Language Models, Multimodal Integration
TL;DR: This paper proposes a multimodal approach to web information extraction that combines webpage visuals with unstructured HTML data, leveraging LLMs to extract structured e-commerce knowledge.
Abstract: The effective application of Large Language Models (LLMs) in domain-specific settings (e.g., e-commerce, finance, science) hinges on their ability to access and reason over reliable structured knowledge. However, extracting structured knowledge from e-commerce webpages presents significant challenges. First, webpage content is typically represented as HTML and CSS, which frequently exceeds LLM token limits when fed in directly. Second, transforming unstructured web data into structured information, such as detailed lists of product descriptions, remains difficult and requires sophisticated parsing techniques. This paper investigates the integration of unstructured knowledge, exemplified by HTML, with other modalities (such as visual representations) to derive structured knowledge tailored to e-commerce tasks. This method enables accurate, context-aware extraction and alignment of item information (e.g., product attributes, pricing), overcoming the limitations of methods that rely solely on unstructured text or inconsistent tags. Experimental evaluations across 31 shopping websites with more than 1,200 products validate the effectiveness of this structured knowledge integration, achieving 96% recall and precision and demonstrating robustness. ODLP significantly outperforms LLM-based tools such as ZeroX, showcasing the power of combining multimodal data with LLM reasoning for domain-specific problems. This work provides a reliable method for processing extensive amounts of unstructured web information.
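As a concrete illustration of the multimodal pipeline the abstract describes, the sketch below prunes a page's HTML so it fits an LLM context window and pairs it with a screenshot of the rendered page, asking a multimodal model to return product fields as JSON. The paper does not disclose its implementation, so the model choice (gpt-4o), the use of the OpenAI chat API, and the helper names prune_html and extract_product_info are illustrative assumptions, not ODLP's actual code.

```python
import base64
import json

from bs4 import BeautifulSoup  # pip install beautifulsoup4
from openai import OpenAI      # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def prune_html(raw_html: str, max_chars: int = 20_000) -> str:
    """Strip scripts, styles, and other non-content tags so the page
    markup fits within the model's context window."""
    soup = BeautifulSoup(raw_html, "html.parser")
    for tag in soup(["script", "style", "noscript", "svg"]):
        tag.decompose()
    return str(soup)[:max_chars]


def extract_product_info(screenshot_png: bytes, raw_html: str) -> dict:
    """Ask a multimodal LLM to align the rendered page with its pruned
    HTML and return structured product fields as a JSON object.
    (Hypothetical sketch; not the paper's released code.)"""
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; the paper names no model
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Using both the screenshot and the HTML below, "
                            "extract the product as JSON with keys: title, "
                            "price, currency, attributes (object).\n\n"
                            + prune_html(raw_html)
                        ),
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
    )
    return json.loads(response.choices[0].message.content)
```

Passing the screenshot alongside the pruned DOM is what lets the model resolve ambiguities that raw HTML alone often leaves open, for example distinguishing the prominently rendered sale price from a struck-through list price when the surrounding tags are inconsistent.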
Submission Number: 4