A Novel Matching Paradigm: Unified Generative and Discriminative LLM with Prompt Compression for Relevance Learning

Published: 18 Apr 2026, Last Modified: 24 Apr 2026, ACL 2026 Industry Track Poster, License: CC BY 4.0
Keywords: LLMs, Matching paradigm, Prompt compression, Industrial search advertising engine
TL;DR: A Novel Matching Paradigm: Unified Generative and Discriminative LLM with Prompt Compression for Relevance Learning
Abstract: The matching paradigm is fundamental to large-scale information retrieval and is widely used in industrial search and advertising systems. Existing approaches employ Large Language Models (LLMs) primarily as feature extractors, underutilizing their modeling capabilities. To address this limitation, we propose a novel matching paradigm, the Unified Generative and Discriminative LLM (UGD). UGD integrates two-tower, single-tower, and generative tasks within a single LLM via attention-mask partitioning, so that generative tasks serve as auxiliary supervision for discriminative learning and a multi-task fine-tuning mechanism distills knowledge from the single-tower to the two-tower architecture. To satisfy online latency constraints, we further introduce a self-distillation variant of UGD that uses a KMeans-enhanced linearized RQVAE for prompt compression and quantization: landing-page documents are compressed and quantized at inference time, improving serving efficiency and reducing storage overhead. Extensive experiments show that UGD outperforms existing matching approaches and delivers strong practical value. The framework has been deployed in an industrial search engine serving hundreds of millions of users and hundreds of thousands of advertisers, significantly enhancing the search experience. Open access upon publication.
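The listing gives only a high-level description, so the two sketches below are illustrative assumptions rather than the authors' implementation. The first shows one plausible reading of attention-mask partitioning: a single forward pass hosts two-tower, single-tower, and generative behavior by restricting which token blocks may attend to one another. The segment layout, the function name `build_partitioned_mask`, and the segment lengths are all hypothetical.

```python
import torch

def build_partitioned_mask(q_len: int, d_len: int, g_len: int) -> torch.Tensor:
    """Boolean mask (True = attention allowed), shape [L, L], for a
    sequence laid out as [query | document | generative] segments."""
    L = q_len + d_len + g_len
    mask = torch.zeros(L, L, dtype=torch.bool)
    q = slice(0, q_len)                # query tower
    d = slice(q_len, q_len + d_len)    # document tower
    g = slice(q_len + d_len, L)        # single-tower / generative segment
    mask[q, q] = True                  # two-tower: query attends only to itself
    mask[d, d] = True                  # two-tower: document attends only to itself
    mask[g, :q_len + d_len] = True     # single-tower: full view of query + doc
    mask[g, g] = torch.tril(torch.ones(g_len, g_len)).bool()  # causal generation
    return mask

print(build_partitioned_mask(q_len=3, d_len=4, g_len=3).int())
```

The second sketch illustrates the idea behind a KMeans-enhanced RQVAE-style compressor: each stage clusters the residual left by the previous stages, so a document embedding is stored as a handful of small code indices rather than a long prompt. Here scikit-learn's KMeans stands in for the trained quantizer, and `fit_residual_codebooks` and `encode` are hypothetical names.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_residual_codebooks(X: np.ndarray, n_stages: int = 3, k: int = 256):
    """Fit one KMeans codebook per stage on the running residual."""
    codebooks, residual = [], X.copy()
    for _ in range(n_stages):
        km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]  # quantization error
    return codebooks

def encode(x: np.ndarray, codebooks) -> list[int]:
    """Quantize one embedding into a short list of code indices."""
    codes, residual = [], x.copy()
    for cb in codebooks:
        idx = int(np.argmin(((residual - cb) ** 2).sum(axis=1)))  # nearest centroid
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

# Toy usage: 10k document embeddings of dim 64 -> 3 one-byte codes each (k=256).
X = np.random.randn(10_000, 64).astype(np.float32)
codebooks = fit_residual_codebooks(X)
print(encode(X[0], codebooks))
```

Stored this way, each document costs n_stages small code indices instead of a full embedding or prompt, which is where the claimed serving-efficiency and storage savings would come from.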
Submission Type: Deployed
Copyright Form: pdf
Submission Number: 424