Abstract: Handling imbalanced datasets remains a critical challenge in financial machine-learning applications such as loan approval, credit scoring, and fraud detection. We present Imbalanced Financial Diffusion (Imb-FinDiff), a novel denoising diffusion framework designed to address class imbalance in financial tabular data. Our framework encodes both categorical and numerical attributes as embeddings, which lets it handle the mixed-type structure of financial datasets. To synthesize minority-class samples, we train with a dual learning objective that predicts (i) the noise added at each diffusion timestep and (ii) the class label. Extensive experiments on diverse real-world financial datasets demonstrate that Imb-FinDiff preserves the statistical properties of the original data while reducing the bias caused by class imbalance. The minority-class samples generated by Imb-FinDiff improve the fidelity of the training data and its utility for downstream machine-learning classifiers.
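The dual learning objective can be illustrated with a minimal sketch. It assumes standard Gaussian DDPM forward noising, a mean-squared error on the predicted noise, and a cross-entropy on the predicted class label, combined with a weight `lam`. The names `forward_noise`, `dual_loss`, and `lam`, the noise schedule, and the choice of MSE plus cross-entropy are all illustrative assumptions, not the authors' implementation. The input `x0` stands in for rows whose categorical and numerical attributes have already been embedded.

```python
import numpy as np

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) under a standard Gaussian DDPM (assumed)."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t][:, None]  # per-row cumulative product, shape (batch, 1)
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

def dual_loss(eps_true, eps_pred, y_true, class_logits, lam=1.0):
    """Noise-prediction MSE plus class-label cross-entropy.

    lam is a hypothetical weight balancing the two objectives.
    """
    mse = np.mean((eps_true - eps_pred) ** 2)
    # Numerically stable log-softmax over the class dimension.
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(y_true)), y_true])
    return mse + lam * ce

# Usage: 8 hypothetical rows with 5 embedded features, binary labels.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 500)        # hypothetical linear schedule
alpha_bar = np.cumprod(1.0 - betas)
x0 = rng.standard_normal((8, 5))
t = rng.integers(0, 500, size=8)             # random timestep per row
x_t, eps = forward_noise(x0, t, alpha_bar, rng)
loss = dual_loss(eps, np.zeros_like(eps), np.zeros(8, dtype=int), np.zeros((8, 2)))
```

In a trained model, `eps_pred` and `class_logits` would both come from the denoising network given `x_t` and `t`; here they are placeholders so the sketch stays self-contained.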