Keywords: Knowledge Representation, Large Language Models
TL;DR: This paper explores whether LLMs can create and use their own formal vocabulary to represent a complex narrative text.
Abstract: Advances in large language models have created new opportunities for the translation of natural language into formal representations for use in symbolic reasoning systems, particularly when the formal vocabulary is well-specified with sufficient training examples. In this paper, we explore the more challenging problem of automating the knowledge engineering task of authoring the formal vocabulary itself, based on the representational needs of the input text. We describe a methodology that incrementally builds a formal vocabulary of first-order predicate-argument definitions and constant terms by prompting a large language model and restricting its output generations to adhere to tightly constrained formalisms. We evaluate this methodology in its application to a challenging literary text, William Shakespeare's narrative poem "Venus and Adonis" (1593), with a focus on how well new vocabulary is reused by the large language model after it has been introduced. We then investigate the semantic content of the generated formalisms by manually sorting the vocabulary into a taxonomy of 48 foundational areas used in knowledge representation research. Our results point to several challenges in fully automated knowledge engineering pipelines, but also point to new opportunities in using large language models to support the axiomatization of foundational theories.
Paper Track: Technical paper
Submission Number: 39
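Illustrative sketch (not from the paper): the abstract describes incrementally building a vocabulary of predicate-argument definitions and constant terms, and measuring how often previously introduced vocabulary is reused. The following is a minimal sketch of that bookkeeping under stated assumptions. The atom syntax, the `Vocabulary` class, the `add_passage` method, and the example atoms are all hypothetical, and the language model's output is stubbed with hard-coded strings rather than produced by any real model or prompt.

```python
import re
from dataclasses import dataclass, field

# Hypothetical constrained format: one ground atom per line,
# e.g. "loves(venus, adonis)". The paper's actual formalism may differ.
ATOM = re.compile(
    r"^([a-z][a-z0-9_]*)\(([a-z][a-z0-9_]*(?:,\s*[a-z][a-z0-9_]*)*)\)$"
)


@dataclass
class Vocabulary:
    predicates: dict = field(default_factory=dict)  # predicate name -> arity
    constants: set = field(default_factory=set)

    def add_passage(self, atoms):
        """Add one passage's atoms; return (reused, total) predicate uses."""
        reused = total = 0
        for line in atoms:
            match = ATOM.match(line.strip())
            if match is None:
                continue  # skip output that violates the constrained format
            name = match.group(1)
            args = re.split(r",\s*", match.group(2))
            total += 1
            if self.predicates.get(name) == len(args):
                reused += 1  # same name and arity were already defined
            else:
                # New predicate; an existing name with another arity is kept as-is.
                self.predicates.setdefault(name, len(args))
            self.constants.update(args)
        return reused, total


# Stand-ins for model output on two consecutive passages (invented examples).
vocab = Vocabulary()
first = vocab.add_passage(["loves(venus, adonis)", "hunts(adonis, boar)"])
second = vocab.add_passage(
    ["loves(venus, adonis)", "flees(adonis, venus)", "Not valid!"]
)
print(first, second)
```

In this sketch, a predicate counts as reused only when both its name and arity match an earlier definition, including one introduced earlier in the same passage. This is one possible operationalization of reuse, not necessarily the measure used in the paper.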