Feature selection with multi-class logistic regression

Published: 2023, Last Modified: 13 Nov 2024, Neurocomputing 2023, CC BY-SA 4.0
Abstract: Feature selection can reduce data redundancy and improve algorithm performance in practical tasks. Most embedded feature selection models are built on the square loss or the hinge loss. However, models based on the square loss cannot directly evaluate the discriminability of the samples in the feature subspace, and models based on the hinge loss are difficult to solve because of their complex objective functions. To address these problems, this paper proposes a Feature Selection method with Multi-class Logistic Regression (FSMLR). First, we construct a linear function that measures the difference between the distance from samples to their own regression hyperplane and the distance from these samples to the regression hyperplanes of other classes, which strengthens the discriminant property of the embedded model. Then, we design a re-weighting matrix with an $\ell_{2,0}$-norm sparsity condition and a discrete condition, which is used to select features in the subspace. Because the re-weighting matrix is difficult to solve under the discrete and sparse conditions in an optimization problem, we relax both conditions and present a feature selection model based on re-weighted multi-class logistic regression with the two relaxed constraints. Finally, we add F-norm regularization to the model to avoid overfitting, and we derive its equivalent unconstrained form with $\ell_{2,p}$-norm regularization to explore the role of the re-weighting matrix. FSMLR can be solved by gradient descent. In particular, when the regularization term in the equivalent problem is set to the $\ell_{2,1}$-norm, the global optimal solution can be obtained. Extensive experiments on multiple public data sets show that FSMLR outperforms the competing methods.
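
As a rough illustration of the $\ell_{2,1}$-regularized special case mentioned in the abstract, the sketch below fits a multi-class (softmax) logistic regression by plain gradient descent and ranks features by the row norms of the weight matrix. It is not the authors' FSMLR: it omits the re-weighting matrix and the hyperplane-distance term, and it smooths the $\ell_{2,1}$ penalty with a small constant so that the gradient is defined at zero. The names `fit_l21_softmax` and `rank_features` and the parameters `lam`, `lr`, `n_iter`, and `eps` are assumptions made for this example.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_l21_softmax(X, y, n_classes, lam=0.05, lr=0.1, n_iter=500, eps=1e-8):
    """Gradient descent on softmax cross-entropy plus a smoothed l2,1 penalty.

    X: (n_samples, n_features) array, y: integer labels in [0, n_classes).
    Returns the weight matrix W (n_features, n_classes) and the bias b.
    """
    n, d = X.shape
    Y = np.eye(n_classes)[y]          # one-hot labels
    W = np.zeros((d, n_classes))      # one row per feature
    b = np.zeros(n_classes)
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        grad_W = X.T @ (P - Y) / n    # gradient of the mean cross-entropy
        # Smoothed l2,1 penalty: sum_i sqrt(||w_i||^2 + eps), one term per row.
        row_norms = np.sqrt((W ** 2).sum(axis=1, keepdims=True) + eps)
        grad_W += lam * W / row_norms
        W -= lr * grad_W
        b -= lr * (P - Y).mean(axis=0)
    return W, b


def rank_features(W):
    # Features whose weight rows have larger l2 norm are ranked first.
    return np.argsort(-np.linalg.norm(W, axis=1))
```

Because the penalty acts on whole rows of `W`, a feature that is uninformative for every class is pushed toward a zero row, so keeping the top-ranked rows gives a simple feature subset. The smoothed penalty only shrinks rows toward zero rather than setting them exactly to zero; a proximal update would be one way to obtain exact sparsity.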