Cross-Attribute Consistency Detection in E-commerce Catalog with Large Language Models

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: This paper presents an approach for detecting consistency between unstructured textual attributes and structured attributes of e-commerce products, addressing catalog data quality.
Abstract: Assessing the quality of data presented on an e-commerce product page is challenging and is currently handled with varied approaches that depend on large task-specific datasets curated with human effort. This slows the process of scaling to a large catalog scope. Recent advancements in Large Language Models (LLMs) have made it possible to significantly enhance various downstream applications using small, carefully curated datasets. In this paper, we explore LLM capability on a challenge related to catalog quality assessment. Specifically, we aim to detect the consistency of information presented between Unstructured Attributes (UA) (incl. Title, Bullet Points (BP), Product Description (PD)) and Structured Attributes (SA) within a product page through pairwise evaluations using predefined class labels. To achieve this, we propose a novel approach, $\texttt{CENSOR}$, that utilizes LLMs in two phases. In the first phase, an off-the-shelf LLM is leveraged in a zero-shot manner using prompt engineering techniques. In the second phase, an open-source LLM is fine-tuned with a small human-curated dataset together with the weakly labeled data generated in the first phase, used as a data augmentation technique to incorporate domain-specific knowledge. The fine-tuned LLM overcomes the deficiencies observed in the first phase and enables the model to address the consistency detection task. In an evaluation conducted on an e-commerce dataset that includes a comprehensive set of 186 distinct <Product Type, SA> combinations, the $\texttt{CENSOR}$ fine-tuned model outperforms the baseline method and the $\texttt{CENSOR}$ zero-shot model by +34.4 and +19.4 points on F1-score, respectively.
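The pairwise zero-shot setup described for the first phase can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the class-label set, and the label-parsing logic are all assumptions for demonstration purposes.

```python
# Hypothetical label set; the paper uses predefined class labels but does
# not specify them here, so these names are illustrative assumptions.
CLASS_LABELS = ["consistent", "inconsistent", "not_applicable"]

def build_prompt(product_type, ua_name, ua_text, sa_name, sa_value):
    """Compose a zero-shot prompt asking an LLM to compare one
    unstructured attribute (UA) against one structured attribute (SA)."""
    return (
        f"You are auditing an e-commerce catalog entry of product type "
        f"'{product_type}'.\n"
        f"{ua_name}: {ua_text}\n"
        f"Structured attribute '{sa_name}': {sa_value}\n"
        f"Is the structured attribute consistent with the text above? "
        f"Answer with one of: {', '.join(CLASS_LABELS)}."
    )

def parse_label(llm_output):
    """Map a raw LLM completion to one of the predefined class labels.
    Labels are matched longest-first so 'inconsistent' is not shadowed
    by its substring 'consistent'."""
    text = llm_output.strip().lower()
    for label in sorted(CLASS_LABELS, key=len, reverse=True):
        if label in text:
            return label
    return "not_applicable"  # fallback when the model answers off-format

prompt = build_prompt(
    "Shoes", "Title", "Red leather running shoes, size 9", "color", "blue"
)
```

In this sketch the prompt would then be sent to an off-the-shelf LLM, and `parse_label` applied to its completion yields a weak label per <UA, SA> pair, which the second phase can use as augmentation data for fine-tuning.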
Paper Type: long
Research Area: NLP Applications
Contribution Types: NLP engineering experiment
Languages Studied: English