Homogeneous-listing-augmented Self-supervised Multimodal Product Title Refinement

Jiaqi Deng, Kaize Shi, Huan Huo, Dingxian Wang, Guandong Xu

Published: 2024, Last Modified: 16 Jan 2026SIGIR 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Product titles on e-commerce marketplaces often suffer from verbosity and inaccuracy, hindering effective communication of essential product details to customers. Refining titles to be more concise and informative is crucial for better user experience and product promotion. Recent solutions to product title refinement follow the standard text extractive and generative methods. Some also leverage multimodal information, e.g. using product images to supplement original titles with visual knowledge. However, these generative methods often produce additional terms not endorsed by sellers. Thus, it remains challenging to incorporate visual information missing from original titles into refined titles without excessively introducing novel terms. Additionally, most existing methods require human-labeled datasets, which are laborious to construct. In response to the two challenges, we present a self-supervised multimodal framework (HLATR) for title refinement that comprises two key modules: (1) a perturbated sample generator that constructs training data by systematically mining homogeneous listing information and (2) a title refinement network that effectively harnesses visual information to refine the original titles. To explicitly balance the extraction from original titles and the generation of supplementary novel terms, we adapt the copy mechanism that is guided by a focused refinement loss. Extensive experiments demonstrate that our proposed framework consistently outperforms others in generating refined titles that contain essential multimodal semantics with minimal deviation from the original ones.