Cosine Similarity as Logits?: A Scalable Knowledge Probe Using Embedding Vectors from Generative Language Models

ACL ARR 2025 July Submission1354 Authors

29 Jul 2025 (modified: 04 Sept 2025), ACL ARR 2025 July Submission, CC BY 4.0

Abstract: Recently, the use of pretrained language models (PLMs) as soft knowledge bases has attracted growing interest, sparking the development of knowledge probes to evaluate their factual knowledge retrieval capabilities. However, existing knowledge probes for generative PLMs that support multi-token entities exhibit quadratic time complexity $\mathcal{O}(n^2)$, limiting the size of knowledge graphs that can be used for probing. To address this, we propose the DEcoder Embedding-based Relational (DEER) probe, which uses embedding vectors extracted from generative PLMs. The DEER probe achieves an effective time complexity of linear order $\mathcal{O}(n)$, supports rank-based evaluation metrics including Hit@$k$, handles multi-token entity names, and enables probing while disambiguating homographic tail-entity names. We empirically show that the DEER probe correlates with existing knowledge probes, validating its probing capability, and we demonstrate the practical benefits of its improved scalability.
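
To make the rank-based evaluation concrete, the following is a minimal sketch of how Hit@$k$ could be computed when cosine similarity between embedding vectors serves as the ranking score. It is not the authors' implementation: the abstract does not specify how the embeddings are extracted or compared, so the function names (`cosine`, `hit_at_k`), the toy 2-dimensional vectors, and the gold indices are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def hit_at_k(query_vecs, gold_ids, entity_vecs, k):
    """Fraction of queries whose gold entity ranks among the k most similar candidates."""
    hits = 0
    for query, gold in zip(query_vecs, gold_ids):
        # Score every candidate entity against the query embedding.
        scores = [cosine(query, entity) for entity in entity_vecs]
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(entity_vecs)), key=lambda i: scores[i], reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)

# Hypothetical toy data: three candidate tail entities and two queries.
entity_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vecs = [[0.9, 0.1], [0.2, 0.8]]
gold_ids = [0, 2]  # index of the correct tail entity for each query

hit1 = hit_at_k(query_vecs, gold_ids, entity_vecs, k=1)
hit2 = hit_at_k(query_vecs, gold_ids, entity_vecs, k=2)
```

In this sketch each candidate entity is represented by a single precomputed vector, so a multi-token entity name costs no more to score than a single-token one. Whether this matches the mechanism behind the paper's linear-time claim is an assumption.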

Paper Type: Short

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: interpretability, knowledge base QA, benchmarking, evaluation

Contribution Types: Model analysis & interpretability, Approaches low compute settings-efficiency

Languages Studied: English

Previous URL: https://openreview.net/forum?id=FtOD8CdSi5

Explanation Of Revisions PDF: pdf

Reassignment Request Area Chair: Yes, I want a different area chair for our submission

Reassignment Request Reviewers: Yes, I want a different set of reviewers

Justification For Not Keeping Action Editor Or Reviewers: The focus of our paper has changed since the last submission. We believe reviewers with expertise in interpretability would be best suited as a result of this change.

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: N/A

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Section 5.1, Appendix C3

B2 Discuss The License For Artifacts: Yes

B2 Elaboration: Section 8. Ethical Considerations

B3 Artifact Use Consistent With Intended Use: Yes

B3 Elaboration: Section 8. Ethical Considerations

B4 Data Contains Personally Identifying Info Or Offensive Content: N/A

B5 Documentation Of Artifacts: N/A

B6 Statistics For Data: Yes

B6 Elaboration: Table 3 Appendix

C Computational Experiments: Yes

C1 Model Size And Budget: Yes

C1 Elaboration: Section 5. Experiments

C2 Experimental Setup And Hyperparameters: N/A

C3 Descriptive Statistics: N/A

C4 Parameters For Packages: N/A

D Human Subjects Including Annotators: No

D1 Instructions Given To Participants: N/A

D2 Recruitment And Payment: N/A

D3 Data Consent: N/A

D4 Ethics Review Board Approval: N/A

D5 Characteristics Of Annotators: N/A

E Ai Assistants In Research Or Writing: Yes

E1 Information About Use Of Ai Assistants: N/A

Author Submission Checklist: yes

Submission Number: 1354
