CtrlShift: Steering Language Models for Dense Quotation Retrieval with Dynamic Prompts

ACL ARR 2025 July Submission 621 Authors

28 Jul 2025 (modified: 19 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0
Abstract: Quotation recommendation is an inherently asymmetric retrieval task: the intended meaning of a quote often diverges from its surface expression, creating significant semantic shifts. Combined with minimal lexical overlap, this poses a core challenge for classic dense retrievers, which struggle to capture non-literal and rhetorical alignments. To bridge this semantic gap, we introduce controllable signals that guide the model’s attention toward abstract, context-relevant concepts. We propose CtrlShift, a framework that leverages a Variational Autoencoder (VAE) to capture latent associations between context and quotation; these associations are used to derive context-aware control signals that modulate semantic focus and support bidirectional alignment and rhetorical intent modeling. Experiments show that our method consistently outperforms baselines on the quotation recommendation task and transfers effectively to a general-purpose benchmark. Further, CtrlShift integrates seamlessly with general-purpose generative models without additional fine-tuning, and provides satisfactory interpretability by generating textual explanations that reveal the model’s focus on abstract, citation-aligned semantics.
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: dense retrieval, prompting, phrase/sentence embedding
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: Chinese, English
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Yes, we cite all datasets used in Section 4
B2 Discuss The License For Artifacts: No
B2 Elaboration: No, the artifacts (models) we used do not have explicit licenses or usage terms provided by their creators.
B3 Artifact Use Consistent With Intended Use: Yes
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: No, the datasets we used do not contain any personally identifying information or offensive content. Therefore, no special anonymization or filtering steps were necessary.
B5 Documentation Of Artifacts: Yes
B6 Statistics For Data: N/A
C Computational Experiments: Yes
C1 Model Size And Budget: N/A
C2 Experimental Setup And Hyperparameters: Yes
C3 Descriptive Statistics: Yes
C4 Parameters For Packages: Yes
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 621