SYN-TITAN: Synthetic Tabular Intelligence using Transformers and Adversarial Networks

Nikhil Singh

20 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0

Keywords: Data Augmentation, Data Privacy, Generative Adversarial Networks (GANs), Hybrid AI Models, Imbalanced Data, Large Language Models (LLMs), Machine Learning, Synthetic Data Generation, Tabular Data

Abstract: The growing need for privacy-preserving synthetic tabular data has driven the development of generative models, particularly generative adversarial networks (GANs) such as CTGAN (Conditional Tabular GAN) and Enhanced CTGAN. Although these models have been successful at tabular data synthesis, they suffer from mode collapse, weak representation of rare categories, and limited domain adaptability, and they often require manual tuning for each new dataset. Furthermore, GAN-based approaches lack contextual awareness, which makes them ineffective at preserving logical relationships between features and real-world constraints. This paper introduces SYN-TITAN (Synthetic Tabular Intelligence using Transformers and Adversarial Networks), a hybrid LLM-GAN framework that integrates large language models (LLMs) with adversarial learning to improve data fidelity, privacy compliance, and scalability. The LLMs assist in feature engineering, data augmentation, and evaluation, to ensure that the synthetic data maintains semantic integrity. SYN-TITAN is benchmarked against CTGAN, Enhanced CTGAN, and other state-of-the-art synthetic data generators on public datasets, where it demonstrates superior statistical alignment, rare-category preservation, and domain adaptation. Our findings indicate that LLM-guided GAN training can significantly improve the quality of synthetic tabular data, addressing key challenges in privacy-sensitive domains such as healthcare and finance. This work provides a scalable and interpretable hybrid approach to synthetic data generation, paving the way for more context-aware, adaptable, and reliable synthetic data frameworks.

Primary Area: generative models

Submission Number: 24298