Synergizing Large Language Models and Knowledge-based Reasoning for Interpretable Feature Engineering

Published: 29 Jan 2025, Last Modified: 29 Jan 2025, WWW 2025 Poster, CC BY-NC-ND 4.0
Track: Web mining and content analysis
Keywords: Automated Feature Engineering, Large Language Models, Knowledge Graphs, Semantic Web Reasoning, Interpretable Machine Learning
Abstract: Feature engineering is a pivotal step in improving the performance of machine learning models, particularly on tabular data. However, traditional feature engineering methods are often time-consuming and require case-by-case domain knowledge. In addition, as machine learning systems become more common, interpretability becomes increasingly important, especially for domain experts. To this end, we propose ReaGen, an automated feature engineering (AutoFE) approach that combines knowledge graphs (KGs) with large language models (LLMs) to generate interpretable features. ReaGen begins by performing symbolic reasoning over a knowledge graph to extract relevant information based on the dataset description. Then, it uses several LLMs to iteratively generate meaningful features based on the retrieved information and the dataset description. Finally, to overcome challenges typical of LLMs, such as hallucinations and difficulty handling long contexts, our model performs logical reasoning on the knowledge graph to ensure that the generated features remain interpretable. ReaGen provides Python code for automatic feature generation and detailed explanations of feature utility. It leverages both the LLMs' internal knowledge and information retrieved from knowledge graphs. Extensive experiments on public datasets demonstrate that ReaGen significantly improves prediction accuracy while ensuring high interpretability through human-like explanations for each feature. This work highlights the potential of integrating large language models and knowledge graphs in feature engineering, paving the way for interpretable machine learning models.
Submission Number: 2269