Abstract: Attribute Value Extraction (AVE) is a crucial technology in e-commerce that identifies and extracts product attributes and their corresponding values. While most prior research has focused on directly extracting explicit values from text, this paper introduces a multimodal implicit AVE dataset in the fashion domain, which supports generating standardized attribute-value pairs for more effective downstream analysis. Additionally, we propose a step-by-step pipeline that separates the generation of attributes from the generation of values, reducing the complexity of the task the model must understand. In the second step, our visual prompting method directs the model's attention to key regions in the images, thereby improving the accuracy of value extraction. Experimental results demonstrate that our approach outperforms several recent strong baselines, and ablation studies further highlight the effectiveness of each component of our method.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Large Language Models, Multimodality, prompting, reasoning
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 865