Abstract: Simultaneous feature selection and non-linear function estimation is challenging, especially in high-dimensional settings where the number of variables exceeds the available sample size. In this article, we investigate the problem of feature selection in neural networks. Although the group least absolute shrinkage and selection operator (LASSO) has been utilized to select variables for learning with neural networks, it tends to include unimportant variables in the model to compensate for its over-shrinkage. To overcome this limitation, we propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings. The main idea is to apply a proper concave penalty to the $l_2$ norm of the weights on all outgoing connections of each input node, thereby obtaining a neural network that uses only a small subset of the original variables. In addition, we develop an effective algorithm based on backward path-wise optimization that yields stable solution paths, tackling the challenge of complex optimization landscapes. We provide a rigorous theoretical analysis of the proposed framework, establishing finite-sample guarantees for both variable selection consistency and prediction accuracy. These results are supported by extensive simulation studies and real data applications, which demonstrate the finite-sample performance of the estimator in feature selection and prediction across continuous, binary, and time-to-event outcomes.
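To make the penalty concrete, below is a minimal PyTorch sketch of the idea, not the authors' released code (see the GCRNN repository linked below for that). The names `SparseInputNet`, `mcp`, and `group_penalty` are hypothetical, and MCP (minimax concave penalty) is used here as one example of a suitable concave penalty applied to the $l_2$ norm of each input node's outgoing weights, i.e., the columns of the first-layer weight matrix:

```python
# Minimal sketch of group concave regularization for a sparse-input net.
# Not the authors' implementation; names here are illustrative only.
import torch
import torch.nn as nn

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty, applied elementwise to nonnegative t."""
    return torch.where(
        t <= gamma * lam,
        lam * t - t ** 2 / (2.0 * gamma),
        torch.full_like(t, 0.5 * gamma * lam ** 2),
    )

class SparseInputNet(nn.Module):
    def __init__(self, p, hidden=32):
        super().__init__()
        self.input_layer = nn.Linear(p, hidden)  # weight shape: (hidden, p)
        self.body = nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.body(self.input_layer(x))

    def group_penalty(self, lam):
        # The outgoing weights of input node j form column j of the
        # input-layer weight matrix; penalizing the column norms with a
        # concave penalty can zero out entire input nodes at once.
        group_norms = self.input_layer.weight.norm(dim=0)  # shape: (p,)
        return mcp(group_norms, lam).sum()

# Penalized empirical risk for one training step:
#   loss = criterion(net(x), y) + net.group_penalty(lam)
#
# Hypothetical outline of the backward path-wise idea: traverse a grid of
# penalty levels from the dense end (small lam) to the sparse end (large
# lam), reusing the current weights as a warm start for the next fit:
#   for lam in sorted(lambda_grid):
#       fit(net, lam)  # net retains its weights from the previous lam
```

Group-wise shrinkage at the input layer is what turns weight regularization into feature selection: once a column norm is driven to zero, the corresponding variable is dropped from the network entirely.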
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We are grateful to the Action Editor and the reviewers for their constructive feedback and the “accept with minor revision” decision. We have addressed all requested revisions in the final, deanonymized camera-ready manuscript.
Below is a point-by-point summary of the changes:
1. Missing Reference: Lederer (2022)
Comment: The reference Lederer (2022) is not included in the references (it seems there may be a tex issue on p. 2).
Action: We have corrected this. The full reference for Lederer (2022) has been added to the bibliography, and the in-text citation now renders correctly.
2. Discussion on Condition 4.3 and ReLU
Comment: Condition 4.3 assumes the existence of the third derivative. Does a ReLU network satisfy this condition? Please add a discussion of this.
Action: We have added two remarks to the manuscript to be precise about the limits of our theory and to justify our practical implementation:
(1) In Section 4.2, immediately following the discussion of Condition 4.3, we added the following remark to clarify that our theory formally applies to smooth activations:
“Note that the bounded third-derivative condition is not strictly satisfied by non-smooth activations like ReLU (Nair and Hinton, 2010), which, although computationally efficient, is non-differentiable at the origin. Consequently, our theoretical guarantees apply most directly to the case of smooth activations, such as Softplus (Glorot et al., 2011) or GeLU (Hendrycks and Gimpel, 2016). In Section 5, we discuss the practical implementation using ReLU, where we observe empirical performance consistent with the theoretical properties established for smooth activations.”
(2) In the second paragraph of Section 5, we added a remark to justify our use of ReLU in the numerical studies:
“As discussed in Section 4.2, ReLU does not strictly satisfy the bounded third-derivative condition of Condition 4.3, since it is non-differentiable at the origin. We used ReLU in our simulations due to its computational efficiency and strong empirical performance. The results obtained with ReLU networks, presented in this section, closely align with the theoretical guarantees, as the non-differentiable points have measure zero for continuous input distributions, and ReLU can be viewed as the limit of smooth activations (e.g., Softplus) for which the condition formally holds.”
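As a small illustration of the limit argument quoted above, the following check (a hypothetical sketch, assuming PyTorch) verifies numerically that Softplus with sharpness parameter $\beta$, i.e., $\mathrm{softplus}_\beta(x) = \frac{1}{\beta}\log(1 + e^{\beta x})$, approaches ReLU pointwise as $\beta \to \infty$:

```python
# Numerical illustration that ReLU is the pointwise limit of Softplus.
import torch
import torch.nn.functional as F

x = torch.linspace(-2.0, 2.0, steps=401)  # includes x = 0
for beta in (1.0, 10.0, 100.0):
    gap = (F.softplus(x, beta=beta) - F.relu(x)).abs().max().item()
    print(f"beta={beta:6.1f}  max |softplus - relu| = {gap:.5f}")
# The maximum gap is log(2)/beta, attained at x = 0, and shrinks to zero.
```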
3. Figure 1 Caption Correction
Comment: The text still needs to be refined, e.g. Figure 1 caption mentions red curves which are in green.
Action: We have corrected the caption for Figure 1. The text now accurately refers to “green” lines for informative variables and gray dashed lines for nuisance variables, matching the figure’s legend. We have also performed a final proofread and corrected minor typographical errors.
We believe these revisions fully address the reviewers’ comments. We thank you again for your time and consideration.
Code: https://github.com/r08in/GCRNN
Assigned Action Editor: ~Bryon_Aragam1
Submission Number: 5013