ADP: Answer-oriented Distinction Perception for End-to-end Clarification Question Generation

Published: 2025, Last Modified: 02 Jan 2026, IJCNN 2025, CC BY-SA 4.0
Abstract: Clarification Question Generation (CQG) is crucial for ambiguous question answering. It produces a structured clarification question that reveals the possible intentions behind an ambiguous question and their corresponding answers. Existing CQG approaches follow a pipeline: they first predict the possible answers and then generate the clarification question from them. This approach is practical and achieves promising performance. However, it suffers from an unavoidable bottleneck: the distinctions among the possible answers are difficult to perceive, even though these distinctions reflect significantly different intentions behind the question. Without the ability to perceive and differentiate these distinctions, the generator is prone to hallucinations caused by distracting intentions. To address this issue, we construct an end-to-end CQG model using multi-task learning. In particular, we propose an Answer-oriented Distinction Perception (ADP) approach to enhance the multi-task learning process. Specifically, ADP compares a pair of possible answers and instructs the Large Language Model (LLM) to summarize their distinction. We integrate ADP into the multi-task learning framework, progressively coupling it with possible answer generation and CQG to form different auxiliary tasks. The goal is to obtain a generalized, distinction-aware CQG model. In our experiments, we use LLaMA3 as the CQG backbone and fine-tune it with multi-task learning. We leverage ChatGPT to produce descriptions of the distinctions among possible answers, and use them as observable evidence to fine-tune LLaMA3 for ADP. We evaluate our CQG model on the benchmark dataset CAmbigNQ. Test results show that our ADP-based end-to-end CQG model obtains substantial improvements over the pipeline CQG model. In addition, we apply our CQG model to the downstream ambiguous question answering task and achieve an F1-score of 45.1%, an improvement of 5.9% at best (4.1% at worst).
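The abstract describes the multi-task training data only at a high level. The following is a minimal sketch of how one ambiguous question could be expanded into pairwise ADP examples, progressively coupled auxiliary tasks, and the end-to-end CQG task. It assumes that the distinction descriptions (produced by ChatGPT in the paper) are already available per answer pair. The helper `build_multitask_examples`, the task labels, the prompt templates, and the exact composition of the auxiliary tasks are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class TrainingExample:
    task: str    # hypothetical task label
    prompt: str  # input text for the LLM
    target: str  # supervision text


def build_multitask_examples(question, answers, distinctions, clarification):
    """Build hypothetical multi-task fine-tuning examples for one ambiguous question.

    `distinctions` maps an index pair (i, j) with i < j to a description of how
    answers[i] and answers[j] differ (in the paper, produced by ChatGPT).
    """
    pairs = list(combinations(range(len(answers)), 2))
    missing = [p for p in pairs if p not in distinctions]
    if missing:
        raise ValueError(f"no distinction description for pairs {missing}")

    answer_text = "Possible answers: " + "; ".join(answers)
    distinction_text = "Distinctions: " + " ".join(distinctions[p] for p in pairs)
    examples = []

    # ADP: one comparison example per pair of possible answers.
    for i, j in pairs:
        examples.append(TrainingExample(
            task="adp",
            prompt=(f"Question: {question}\nAnswer A: {answers[i]}\n"
                    f"Answer B: {answers[j]}\nHow do these answers differ?"),
            target=distinctions[(i, j)],
        ))

    # Auxiliary tasks (assumed form): couple ADP with answer generation, then with CQG.
    examples.append(TrainingExample(
        task="answers+adp",
        prompt=f"Question: {question}\nList the possible answers and their distinctions.",
        target=f"{answer_text}\n{distinction_text}",
    ))
    examples.append(TrainingExample(
        task="answers+adp+cqg",
        prompt=f"Question: {question}\nList the possible answers, their distinctions, "
               "and a clarification question.",
        target=f"{answer_text}\n{distinction_text}\nClarification question: {clarification}",
    ))

    # Main task: end-to-end CQG directly from the ambiguous question.
    examples.append(TrainingExample(
        task="cqg",
        prompt=f"Question: {question}\nAsk a clarification question.",
        target=clarification,
    ))
    return examples


if __name__ == "__main__":
    # Toy placeholder data, not taken from CAmbigNQ.
    demo = build_multitask_examples(
        question="Who sang the theme song?",
        answers=["Singer X", "Singer Y"],
        distinctions={(0, 1): "Singer X sang the film version; Singer Y sang the TV version."},
        clarification="Do you mean the theme song of the film or of the TV series?",
    )
    for ex in demo:
        print(ex.task)
```

With n possible answers, this scheme yields n(n-1)/2 pairwise ADP examples plus three coupled or end-to-end examples. Keeping the pure `cqg` task alongside the coupled ones reflects the stated goal of an end-to-end model that can ask the clarification question without first being handed the answers.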