TABGEN-RAG: Iterative Retrieval for Tabular Data Generation with Large Language Models

Published: 10 Oct 2024, Last Modified: 30 Oct 2024 · TRL @ NeurIPS 2024 Poster · CC BY 4.0
Keywords: RAG, Tabular data generation
TL;DR: A new RAG framework for tabular data generation
Abstract: Large language models (LLMs) have achieved encouraging results on tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. Two main challenges arise: 1) presenting the entire training table to an LLM within its limited input context, and 2) ensuring the LLM learns effectively from the in-context examples. To address these challenges, we propose TabGen-RAG, a novel retrieval-augmented generation (RAG) framework that enhances the in-context learning ability of LLMs for tabular data generation. TabGen-RAG operates iteratively, retrieving a subset of real samples that represents the residual between the currently generated samples and the true data. Extensive experiments on five real-world tabular datasets demonstrate that TabGen-RAG significantly improves the quality of generated samples.
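The abstract describes the core loop only at a high level: generate samples, measure the residual against the real table, retrieve the worst-covered real rows as new in-context examples, and regenerate. A minimal Python sketch of such a loop follows; rows are assumed to be dicts, `llm_generate` is a hypothetical wrapper around the fixed-LLM prompt, and the mismatch-count residual is an illustrative assumption, not the paper's actual measure.

```python
# Hypothetical sketch of the iterative retrieval loop described in the
# abstract. Function names, the residual measure, and the retrieval rule
# are illustrative assumptions, not the authors' implementation.
import random

def residual_score(real_row, generated_rows, columns):
    """Score how poorly a real row is covered by the generated samples.

    Assumption: distance of a real row to its nearest generated row,
    counted as per-column mismatches; the abstract does not specify
    how the residual is measured.
    """
    return min(
        sum(real_row[c] != g[c] for c in columns)
        for g in generated_rows
    )

def tabgen_rag_loop(llm_generate, real_rows, columns, n_context=8, n_iters=5):
    """Iteratively retrieve the real rows the current generations miss.

    `llm_generate(examples)` is assumed to prompt a fixed LLM with the
    retrieved rows as in-context examples and return generated rows.
    """
    # Seed the prompt with a random subset of real rows.
    context = random.sample(real_rows, n_context)
    generated = llm_generate(context)
    for _ in range(n_iters):
        # Retrieve the real rows least covered by the current generations;
        # these represent the residual between generated and true data.
        ranked = sorted(real_rows,
                        key=lambda r: residual_score(r, generated, columns),
                        reverse=True)
        context = ranked[:n_context]
        generated = llm_generate(context)
    return generated
```

The per-column mismatch count here is only a stand-in; any distributional distance between generated and real samples could play the same role in ranking which real rows to retrieve next.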
Submission Number: 79