Federated Feature Transformation with Sample-Aware Calibration and Local–Global Sequence Fusion

ICLR 2026 Conference Submission 14959 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Automatic Feature Transformation, Tabular Data, Federated Learning
Abstract: Tabular data plays a crucial role in numerous real-world decision-making applica- tions, but extracting valuable insights often requires sophisticated feature transfor- mations. These transformations mathematically transform raw data, significantly improving predictive performance. In practice, tabular datasets are frequently fragmented across multiple clients due to widespread data distribution, privacy constraints, and data silos, making it challenging to derive unified and generalized insights. To address these issues, we propose a novel Federated Feature Transfor- mation (FEDFT) framework that enables collaborative learning while preserving data privacy. In this framework, each local client independently computes feature transformation sequences and evaluates the corresponding model performances. Instead of exchanging sensitive original data, clients transmit these transforma- tion sequences and performance metrics to a central global server. The server then compresses and encodes the aggregated knowledge into a unified embedding space, facilitating the identification of optimal feature transformation sequences. To ensure accurate and unbiased aggregation, we employ a sample-aware weight- ing strategy, assigning higher weights to clients with larger, more diverse, and numerically stable datasets, as their performance metrics are statistically reliable and representative. We also incorporate a server-side calibration mechanism to adaptively refine the unified embedding space, mitigating bias from outlier data distributions. Furthermore, to ensure optimal transformation sequences at both global and local scales, the globally optimal sequences are disseminated back to local clients. We subsequently develop a sequence fusion strategy that blends these globally optimal features with essential non-overlapping local transforma- tions critical for local predictions. 
Extensive experiments are conducted to demon- strate the efficiency, effectiveness, and robustness of our framework. Code and data are publicly available
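The round described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the client report fields (`n`, `diversity`, `stability`), the multiplicative weight, and the set-difference fusion rule are all hypothetical stand-ins for the sample-aware weighting and sequence fusion the abstract describes.

```python
import numpy as np

# Hypothetical sketch of one FEDFT round: clients report transformation
# sequences with local performance scores; the server aggregates them
# with sample-aware weights and broadcasts the globally best sequence,
# which each client fuses with its non-overlapping local operations.

def sample_aware_weight(n_samples, diversity, stability):
    # Illustrative weight: larger, more diverse, numerically stable
    # clients contribute more (the paper's exact form may differ).
    return n_samples * diversity * stability

def server_aggregate(reports):
    """reports: list of dicts, each with client stats and a
    'sequences' mapping of transformation sequence -> local score."""
    weights = np.array(
        [sample_aware_weight(r["n"], r["diversity"], r["stability"])
         for r in reports], dtype=float)
    weights /= weights.sum()  # normalize across clients
    scores = {}
    for w, r in zip(weights, reports):
        for seq, perf in r["sequences"].items():
            scores[seq] = scores.get(seq, 0.0) + w * perf
    # globally optimal sequence = highest weighted score
    best = max(scores, key=scores.get)
    return best, scores

def fuse(global_seq, local_seq):
    # Keep the global sequence, then append local operations it does
    # not already cover (the "non-overlapping" local transformations).
    return list(global_seq) + [op for op in local_seq if op not in global_seq]

reports = [
    {"n": 500, "diversity": 0.9, "stability": 0.95,
     "sequences": {("log", "square"): 0.81, ("sqrt",): 0.78}},
    {"n": 120, "diversity": 0.6, "stability": 0.80,
     "sequences": {("log", "square"): 0.70, ("abs", "log"): 0.83}},
]
best, scores = server_aggregate(reports)
fused = fuse(best, ("abs", "log"))  # fusion at the second client
```

The larger, more stable client dominates the weighted score, so the sequence it rates highly wins globally; fusion then retains only the local operation (`abs`) that the global sequence does not already contain.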
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 14959