LogiCoTab: Controllable Tabular Data Synthesis with Logical Relationships Awareness

Published: 29 Oct 2024, Last Modified: 23 Jan 2026, OpenReview Archive Direct Upload, Everyone, CC BY 4.0
Abstract: Tabular data is one of the most common data formats, and recent advances in deep learning have driven significant progress in tabular data synthesis. However, the complexity of mixed-type distributions in tabular data forces existing synthesis methods to rely on lossy preprocessing, which leads to significant logical inconsistencies in the synthetic data. Moreover, most approaches lack flexible conditional control over the synthesis process. In this paper, we introduce LogiCoTab, a novel two-stage tabular data synthesis method that fundamentally mitigates logical inconsistencies and enhances controllability in synthetic data. In the logical-awareness stage, we design a dual-module architecture to extract semantic features and uncover intricate inter-column logical relationships. In the controlled-synthesis stage, we develop a diffusion-based generative model that enables highly flexible and precise conditional data generation. Extensive experiments show that LogiCoTab outperforms state-of-the-art methods on multiple datasets, significantly improving the quality of synthesized tabular data.