Tail-Aware Reconstruction of Incomplete Label Distributions with Low-Rank and Sparse Modeling

Zhiqiang Kou, Haoyuan Xuan, Jingyu Zhu, Hailin Wang, Ming-kun Xie, Changwei Wang, Jing Wang, Yuheng Jia, Xin Geng

Published: 01 Jan 2025, Last Modified: 07 Nov 2025. IEEE Transactions on Circuits and Systems for Video Technology. License: CC BY-SA 4.0
Abstract: Label Distribution Learning (LDL) is a novel machine learning paradigm that addresses the problem of label ambiguity and has found widespread applications. However, obtaining complete label distributions in real-world scenarios is challenging, which has led to the emergence of Incomplete Label Distribution Learning (InLDL). Existing InLDL methods attempt to exploit low-rank label correlations to recover the complete label distribution. However, we find that real-world LDL datasets have an imbalanced nature; that is, the sum of the description degrees for head labels is significantly larger than that for tail labels, which disrupts the low-rank assumption underlying the recovery of the label distribution. To solve this problem, we propose Incomplete and Imbalance Label Distribution Learning (I2LDL), which makes the use of low-rank label correlations in InLDL more reasonable. Our method decomposes the recovered label distribution matrix into a low-rank component for frequent (head) labels and a sparse component for tail labels, effectively capturing the structure of both. We further require that the entries in the observed positions of the recovered label distribution matrix be close to the observed values, and that the recovered label distribution for every instance forms a probability simplex (i.e., nonnegative entries summing to unity). Finally, the proposed model is optimized via the Alternating Direction Method of Multipliers (ADMM). We provide a theoretical analysis of its exact recovery guarantee under standard assumptions of incoherence, sparsity, and sufficient sampling. Furthermore, we establish a generalization error bound based on Rademacher complexity, offering theoretical insights into the learning performance of our method. Extensive experiments on 16 real-world datasets demonstrate the effectiveness and robustness of our framework compared to existing InLDL methods. The code is available at https://anonymous.4open.science/r/IncomLDL-tailaware-C021.
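To make the decomposition concrete, below is a minimal sketch of the idea described in the abstract: recovering a label distribution matrix as a low-rank part (head labels) plus a sparse part (tail labels), with observed entries kept consistent and each row projected onto the probability simplex. This is not the authors' ADMM solver; it is a simplified alternating proximal scheme, and all names and hyperparameters (recover_label_distribution, tau, lam, n_iters) are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(M, lam):
    """Entrywise soft thresholding: proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

def project_simplex_rows(M):
    """Project each row onto the probability simplex (nonnegative, sums to one)."""
    n, c = M.shape
    out = np.empty_like(M)
    for i in range(n):
        v = M[i]
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, c + 1) > (css - 1.0))[0][-1]
        theta = (css[rho] - 1.0) / (rho + 1.0)
        out[i] = np.maximum(v - theta, 0.0)
    return out

def recover_label_distribution(Y, mask, tau=1.0, lam=0.1, n_iters=100):
    """Sketch: D ~ L (low-rank, head labels) + S (sparse, tail labels),
    re-imposing observed entries and the per-instance simplex each iteration."""
    D = Y * mask                       # start from the observed (incomplete) entries
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iters):
        L = svt(D - S, tau)            # low-rank component (frequent labels)
        S = soft_threshold(D - L, lam) # sparse component (tail labels)
        D = L + S
        D[mask] = Y[mask]              # keep observed positions close to observations
        D = project_simplex_rows(D)    # each instance: nonnegative, sums to unity
    return D
```

A usage example under the same assumptions: with Y an n-by-c matrix of observed description degrees (zeros at missing positions) and mask a boolean matrix marking the observed entries, `recover_label_distribution(Y, mask)` returns a completed matrix whose rows are valid label distributions.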