TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models

Published: 10 Oct 2024, Last Modified: 30 Oct 2024 · TRL @ NeurIPS 2024 Poster · CC BY 4.0
Keywords: RAG, Tabular data generation
TL;DR: A new RAG framework for tabular data generation
Abstract: Large language models (LLMs) have achieved encouraging results on tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. Two main challenges arise: 1) presenting the entire training table to an LLM within its limited input context, and 2) ensuring the LLM learns effectively from the in-context examples. To address these challenges, we propose TabGen-RAG, a novel retrieval-augmented generation (RAG) framework that enhances the in-context learning ability of LLMs for tabular data generation. TabGen-RAG operates iteratively, retrieving a subset of real samples that represents the residual between the currently generated samples and the true data. Extensive experiments on five real-world tabular datasets demonstrate that TabGen-RAG significantly improves the quality of generated samples.
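The abstract describes the core loop only at a high level: generate samples, measure the residual against the real table, retrieve the worst-covered real rows as new in-context examples, and regenerate. A minimal Python sketch of such a loop follows; rows are assumed to be dicts, `llm_generate` is a hypothetical wrapper around the fixed-LLM prompt, and the mismatch-count residual is an illustrative assumption, not the paper's actual measure.

```python
# Hypothetical sketch of the iterative retrieval loop described in the
# abstract. Function names, the residual measure, and the retrieval rule
# are illustrative assumptions, not the authors' implementation.
import random

def residual_score(real_row, generated_rows, columns):
    """Score how poorly a real row is covered by the generated samples.

    Assumption: distance of a real row to its nearest generated row,
    counted as per-column mismatches; the abstract does not specify
    how the residual is measured.
    """
    return min(
        sum(real_row[c] != g[c] for c in columns)
        for g in generated_rows
    )

def tabgen_rag_loop(llm_generate, real_rows, columns, n_context=8, n_iters=5):
    """Iteratively retrieve the real rows the current generations miss.

    `llm_generate(examples)` is assumed to prompt a fixed LLM with the
    retrieved rows as in-context examples and return generated rows.
    """
    # Seed the prompt with a random subset of real rows.
    context = random.sample(real_rows, n_context)
    generated = llm_generate(context)
    for _ in range(n_iters):
        # Retrieve the real rows least covered by the current generations;
        # these represent the residual between generated and true data.
        ranked = sorted(real_rows,
                        key=lambda r: residual_score(r, generated, columns),
                        reverse=True)
        context = ranked[:n_context]
        generated = llm_generate(context)
    return generated
```

The per-column mismatch count here is only a stand-in; any distributional distance between generated and real samples could play the same role in ranking which real rows to retrieve next.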
Submission Number: 79