Keywords: language model, semantics, logic, concept, benchmark, semantic field, definition, selection criteria, conceptual integrity
TL;DR: We propose a framework for the semantic compositionality of concepts and use it to derive a benchmark dataset for conceptual integrity testing in generative language models.
Abstract: The systematic investigation of how language models understand scientific concepts has received little attention. This gap can be bridged by a formalized theory of conceptual semantics that maps naturally to instruction templates for natural language agents. We propose a simple framework, expressible in first-order logic, that addresses the semantic compositionality of scientific concepts, noun phrases, and conceptual hierarchies. From this framework we derive a conceptual integrity benchmark comprising six tasks, applied to a selection of 187 concepts from the domains of biology, chemistry, and medicine. We evaluate the performance of 15 state-of-the-art language models against baseline information collected from various knowledge repositories and find a strong positive correlation between model size and performance. The external validity of the benchmark is demonstrated by its high correlation with other benchmarks that measure related skills. We suggest that the proposed framework and associated benchmark provide a practical template for developing conceptual integrity benchmarks across a wide array of technical and scientific domains.
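For intuition, here is a minimal sketch of the kind of first-order encoding such a framework might use; it is not drawn from the submission, and the predicates are hypothetical:

```latex
% Hypothetical illustration of a conceptual hierarchy:
% every mitochondrion is an organelle (taxonomic subsumption).
\forall x \,\bigl(\mathit{Mitochondrion}(x) \rightarrow \mathit{Organelle}(x)\bigr)

% Hypothetical illustration of noun-phrase compositionality:
% "eukaryotic cell" as a conjunction of predicates.
\forall x \,\bigl(\mathit{EukaryoticCell}(x) \leftrightarrow \mathit{Cell}(x) \land \mathit{Eukaryotic}(x)\bigr)
```

Axioms of this form could be instantiated as natural-language instruction templates for benchmark tasks, e.g., asking a model whether every mitochondrion is an organelle, or whether a eukaryotic cell is still a cell.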
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 505