Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: The development of large language models (LLMs) has been catalyzed by advances in pre-training techniques. These models have demonstrated strong reasoning capabilities when guided by manually designed prompts. In this work, we evaluate the conversational reasoning capabilities of the current state-of-the-art LLM (GPT-4) on knowledge graphs (KGs). We find that its performance is constrained by a lack of awareness of the KG environment and by the difficulty of optimizing intermediate reasoning steps. We therefore introduce LLM-ARK, an LLM-grounded KG reasoning agent designed to deliver precise and adaptable predictions of KG paths. LLM-ARK uses a Full Textual Environment (FTE) prompt to encode the state information available at each reasoning step, and reframes multi-hop reasoning on the KG as a sequential decision-making task. Using Proximal Policy Optimization (PPO), an on-policy policy-gradient reinforcement learning algorithm, the model learns from rich reward signals. We evaluate our model and GPT-4 on the OpenDialKG dataset. Experimental results show that LLaMA7B-ARK outperforms the previous state-of-the-art model by 5.28 percentage points, reaching 36.39% on the target@1 evaluation metric, while GPT-4 scores only 14.91%, further demonstrating the effectiveness of our approach.
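To make the abstract's formulation concrete, below is a minimal, self-contained Python sketch (not the authors' released code) of how multi-hop KG reasoning can be cast as a sequential decision process with a terminal reward. The toy graph, the fte_prompt and rollout helpers, and the per-sample ppo_clip_loss are illustrative assumptions: in the actual system, the random action choice would be replaced by the LLM policy conditioned on the FTE prompt.

import math
import random

# Toy knowledge graph: head entity -> list of (relation, tail entity) edges.
KG = {
    "Inception": [("directed_by", "Christopher Nolan"), ("genre", "Sci-Fi")],
    "Christopher Nolan": [("directed", "Interstellar"), ("born_in", "London")],
    "Interstellar": [("genre", "Sci-Fi")],
}

def fte_prompt(dialogue, entity, path):
    # Assemble an FTE-style state string: the agent's observation at each
    # step is the dialogue plus the path walked so far (an assumption of
    # this sketch, following the abstract's description).
    return f"Dialogue: {dialogue}\nCurrent entity: {entity}\nPath so far: {path}"

def rollout(dialogue, start_entity, target_entity, max_hops=2):
    # One episode: at each hop the policy picks an outgoing edge (action);
    # a terminal reward of 1.0 is given iff the target entity is reached.
    entity, path = start_entity, []
    for _ in range(max_hops):
        actions = KG.get(entity, [])
        if not actions:
            break
        state = fte_prompt(dialogue, entity, path)  # would be fed to the LLM policy
        relation, entity = random.choice(actions)   # stand-in for the LLM's action
        path.append((relation, entity))
        if entity == target_entity:
            return path, 1.0  # reward signal that drives the PPO update
    return path, 0.0

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    # Standard per-sample PPO clipped surrogate (Schulman et al., 2017):
    # loss = -min(r * A, clip(r, 1 - eps, 1 + eps) * A), r = exp(logp_new - logp_old).
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return -min(ratio * advantage, clipped * advantage)

path, reward = rollout("Who directed Inception?", "Inception", "Christopher Nolan")
print(path, reward)

The clipped surrogate is the standard PPO objective; per the abstract, the contribution of LLM-ARK lies in the FTE state representation and the LLM-based policy rather than in the RL objective itself.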
Paper Type: long
Research Area: Dialogue and Interactive Systems
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Approaches to low-resource settings, Approaches to low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources, Data analysis, Theory
Languages Studied: English
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A1 Section Or Justification: A.3 Limitation
A2: no
A3: yes
B: no
B1: no
B2: no
B3: no
B4: no
B5: no
B6: yes
B6 Section Or Justification: 4.1 Datasets and A.1 Data Format
C: yes
C1: yes
C1 Section Or Justification: 4.3 Implementation Details
C2 Section Or Justification: 4.3 Implementation Details and A.5 Hyperparameters
C3 Section Or Justification: 4.5 Comparative Experiments and 4.6 Analysis Experiment
C4: yes
C4 Section Or Justification: 4.3 Implementation Details
D: no
D1: no
D2: no
D3: no
D4: no
D5: no
E: yes
E1: no