CtrlShift: Steering Language Models for Dense Quotation Retrieval with Dynamic Prompts

ACL ARR 2025 July Submission 621 Authors

28 Jul 2025 (modified: 19 Aug 2025), ACL ARR 2025 July Submission, CC BY 4.0

Abstract: Quotation recommendation is an inherently asymmetric retrieval task: the intended meaning of a quote often diverges from its surface expression, creating significant semantic shifts. Combined with minimal lexical overlap between context and quote, this poses a core challenge for classic dense retrievers, which struggle to capture non-literal and rhetorical alignments. To bridge this semantic gap, we introduce controllable signals that guide the model’s attention toward abstract, context-relevant concepts. We propose CtrlShift, a framework that uses a Variational Autoencoder (VAE) to capture latent associations between context and quotation. These associations are used to derive context-aware control signals that modulate semantic focus and support bidirectional alignment and rhetorical intent modeling. Experiments show that our method consistently outperforms baselines on the quotation recommendation task and transfers effectively to a general-purpose benchmark. Further, CtrlShift integrates seamlessly with general-purpose generative models without additional fine-tuning, and provides satisfactory interpretability by generating textual explanations that reveal the model’s focus on abstract, citation-aligned semantics.

Paper Type: Long

Research Area: Information Retrieval and Text Mining

Research Area Keywords: dense retrieval, prompting, phrase/sentence embedding

Contribution Types: Model analysis & interpretability, NLP engineering experiment

Languages Studied: Chinese, English

Reassignment Request Area Chair: This is not a resubmission

Reassignment Request Reviewers: This is not a resubmission

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: Yes

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Yes, we cite all datasets used in Section 4

B2 Discuss The License For Artifacts: No

B2 Elaboration: No, the artifacts (models) we used do not have explicit license or usage terms provided by their creators.

B3 Artifact Use Consistent With Intended Use: Yes

B4 Data Contains Personally Identifying Info Or Offensive Content: No

B4 Elaboration: No, the datasets we used do not contain any personally identifying information or offensive content. Therefore, no special anonymization or filtering steps were necessary.

B5 Documentation Of Artifacts: Yes

B6 Statistics For Data: N/A

C Computational Experiments: Yes

C1 Model Size And Budget: N/A

C2 Experimental Setup And Hyperparameters: Yes

C3 Descriptive Statistics: Yes

C4 Parameters For Packages: Yes

D Human Subjects Including Annotators: No

D1 Instructions Given To Participants: N/A

D2 Recruitment And Payment: N/A

D3 Data Consent: N/A

D4 Ethics Review Board Approval: N/A

D5 Characteristics Of Annotators: N/A

E Ai Assistants In Research Or Writing: Yes

E1 Information About Use Of Ai Assistants: N/A

Author Submission Checklist: yes

Submission Number: 621
