What You See Is What You Get: Entity-Aware Summarization for Reliable Sponsored Search

Xiao Liang; Xinyu Hu; Simiao Zuo; Jimi He; Yu Wang; Victor Ye Dong; Yeyun Gong; Kushal S. Dave; Yi Liu; Qiang Lou; Shao-Lun Huang; Jian Jiao

Published: 12 Oct 2024. Last Modified: 14 Nov 2024. Venue: SafeGenAi Poster. License: CC BY 4.0

Keywords: LLM, Summarization, Retrieval, Preference Optimization

TL;DR: We propose an entity-aware summarization framework to improve the reliability of AI-generated summaries for sponsored search.

Abstract: Large Language Models (LLMs) are increasingly used to generate summaries for sponsored search results, but these summaries often diverge from the actual webpage content, which misinforms users and degrades retrieval accuracy. Existing approaches often fail to capture critical entity information accurately and to determine query-document relevance reliably, which limits their usefulness in sponsored search. We propose an entity-aware summarization framework to improve the reliability of AI-generated summaries for sponsored search. Our approach involves two key steps: (1) a structured process for generating entity-aware summaries, consisting of webpage entity tagging, query reflection, and summary generation; and (2) fine-tuning Llama-3.1-8B on entity-rich summaries and applying Direct Preference Optimization (DPO) to enhance query-document relevance. Comprehensive evaluations show that our method outperforms existing baselines. We achieve F1 scores of 0.57, 0.44, and 0.26 for Brand, Product, and Feature entities, respectively, and a 7.86% improvement in recall@50 on retrieval tasks. Our approach significantly improves the alignment between AI-generated summaries and webpage content in sponsored search, marking an important advance toward accurate and effective AI-driven information retrieval systems.
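Two parts of the abstract are standard enough to illustrate: the DPO objective applied in step (2), and the set-based metrics behind the reported entity F1 and recall@50. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes sequence-level log-probabilities for a preferred and a rejected summary, case-insensitive exact matching of entity strings, and per-query recall over a ranked list. The function names and the `beta` default are hypothetical choices for illustration.

```python
import math
from typing import Iterable, Sequence


def _softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a whole summary under the
    trainable policy or a frozen reference model (assumed: the SFT checkpoint).
    beta = 0.1 is a hypothetical default, not a value from the paper.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) == softplus(-z)
    return _softplus(-logits)


def entity_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Set-level F1 between predicted and gold entities (assumed exact match)."""
    pred = {e.strip().lower() for e in predicted}
    ref = {e.strip().lower() for e in gold}
    if not pred and not ref:
        return 1.0  # nothing to find and nothing predicted
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str],
                k: int = 50) -> float:
    """Fraction of relevant items that appear in the top-k retrieved list."""
    rel = set(relevant)
    if not rel:
        return 0.0
    hits = len(set(retrieved[:k]) & rel)
    return hits / len(rel)


if __name__ == "__main__":
    # Equal margins give loss log(2); a larger chosen margin lowers the loss.
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    print(entity_f1(["Nike", "Air Max"], ["nike", "running shoes"]))
    print(recall_at_k(["a", "b", "c"], ["a", "c", "d"], k=2))
```

In this sketch the DPO loss falls as the policy raises the preferred summary's likelihood relative to the reference model more than it raises the rejected one's. The F1 and recall helpers treat each query independently; how the paper aggregates them across queries, and whether the 7.86% figure is relative or absolute, is not stated in the abstract.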

Submission Number: 222
