Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: The development of large language models (LLMs) has been catalyzed by advances in pre-training techniques. These models have demonstrated strong reasoning capabilities when guided by manually designed prompts. In this work, we evaluate the conversational reasoning capabilities of the current state-of-the-art LLM (GPT-4) on knowledge graphs (KGs). We find that its performance is constrained by a lack of awareness of the KG environment and by the difficulty of optimizing intermediate reasoning steps. We therefore introduce LLM-ARK, an LLM-grounded KG reasoning agent designed to deliver precise and adaptable predictions of KG paths. LLM-ARK uses a Full Textual Environment (FTE) prompt to encode the state information available at each reasoning step, and reframes multi-hop reasoning on the KG as a sequential decision-making task. Using Proximal Policy Optimization (PPO), an on-policy policy-gradient reinforcement learning algorithm, the model learns from rich reward signals. We evaluate our model and GPT-4 on the OpenDialKG dataset. Experimental results show that LLaMA7B-ARK outperforms the previous state-of-the-art model by 5.28 percentage points, reaching 36.39% on the target@1 evaluation metric, while GPT-4 scores only 14.91%, further demonstrating the effectiveness of our approach.
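To make the abstract's formulation concrete, below is a minimal, self-contained Python sketch (not the authors' released code) of how multi-hop KG reasoning can be cast as a sequential decision process with a terminal reward. The toy graph, the fte_prompt and rollout helpers, and the per-sample ppo_clip_loss are illustrative assumptions: in the actual system, the random action choice would be replaced by the LLM policy conditioned on the FTE prompt.

import math
import random

# Toy knowledge graph: head entity -> list of (relation, tail entity) edges.
KG = {
    "Inception": [("directed_by", "Christopher Nolan"), ("genre", "Sci-Fi")],
    "Christopher Nolan": [("directed", "Interstellar"), ("born_in", "London")],
    "Interstellar": [("genre", "Sci-Fi")],
}

def fte_prompt(dialogue, entity, path):
    # Assemble an FTE-style state string: the agent's observation at each
    # step is the dialogue plus the path walked so far (an assumption of
    # this sketch, following the abstract's description).
    return f"Dialogue: {dialogue}\nCurrent entity: {entity}\nPath so far: {path}"

def rollout(dialogue, start_entity, target_entity, max_hops=2):
    # One episode: at each hop the policy picks an outgoing edge (action);
    # a terminal reward of 1.0 is given iff the target entity is reached.
    entity, path = start_entity, []
    for _ in range(max_hops):
        actions = KG.get(entity, [])
        if not actions:
            break
        state = fte_prompt(dialogue, entity, path)  # would be fed to the LLM policy
        relation, entity = random.choice(actions)   # stand-in for the LLM's action
        path.append((relation, entity))
        if entity == target_entity:
            return path, 1.0  # reward signal that drives the PPO update
    return path, 0.0

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    # Standard per-sample PPO clipped surrogate (Schulman et al., 2017):
    # loss = -min(r * A, clip(r, 1 - eps, 1 + eps) * A), r = exp(logp_new - logp_old).
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return -min(ratio * advantage, clipped * advantage)

path, reward = rollout("Who directed Inception?", "Inception", "Christopher Nolan")
print(path, reward)

The clipped surrogate is the standard PPO objective; per the abstract, the contribution of LLM-ARK lies in the FTE state representation and the LLM-based policy rather than in the RL objective itself.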
Paper Type: long
Research Area: Dialogue and Interactive Systems
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Approaches to low-resource settings, Approaches to low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources, Data analysis, Theory
Languages Studied: English
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Section Or Justification: A.3 Limitation
A2: no
A3: yes
B: no
B1: no
B2: no
B3: no
B4: no
B5: no
B6: yes
B6 Section Or Justification: 4.1 Datasets and A.1 Data Format
C: yes
C1: yes
C1 Section Or Justification: 4.3 Implementation Details
C2 Section Or Justification: 4.3 Implementation Details and A.5 Hyperparameters
C3 Section Or Justification: 4.5 Comparative Experiments and 4.6 Analysis Experiment
C4: yes
C4 Section Or Justification: 4.3 Implementation Details
D: no
D1: no
D2: no
D3: no
D4: no
D5: no
E: yes
E1: no