Abstract: Efficient variable selection is crucial for optimizing the performance and interpretability of machine learning models. However, as datasets grow in sample size and dimensionality, traditional methods may encounter computational challenges and accuracy issues. To tackle this problem in the big-data setting, we propose a novel approach, \textit{REinforcement learning for Variable Selection} (REVS), within the Markov Decision Process (MDP) framework. By prioritizing long-term variable selection accuracy, we design a dynamic policy that adjusts the candidate set of important variables, guiding it toward convergence to the true variable set. To enhance computational efficiency, we present an online policy iteration algorithm integrated with temporal difference learning for sequential policy improvement. Our experiments demonstrate the superior performance of the method, especially in big-data scenarios, where it substantially reduces computation time compared with alternative methods. Furthermore, on high-dimensional feature sets with strong correlations, our approach improves variable selection accuracy by leveraging cumulative reward information from batch data.
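To make the abstract's idea of temporal difference learning within an online policy iteration loop concrete, here is a minimal sketch of a variable-selection MDP on toy data. Every design choice below is an illustrative assumption rather than the REVS method: the state is a boolean candidate mask, an action flips one feature in or out, the reward is the negative least-squares fit error, and the value function is linear in the mask with a TD(0) update under an epsilon-greedy policy.

```python
# Hypothetical TD(0) sketch of variable selection as an MDP.
# State/action/reward design here is assumed for illustration only;
# it is NOT the REVS paper's actual formulation.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 50 samples, 8 features; only features 0 and 3 are informative.
n, p = 50, 8
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(n)

def reward(mask):
    """Negative mean squared error of least squares on the selected features."""
    if not mask.any():
        return -float(np.var(y))
    Xs = X[:, mask]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return -float(np.mean((y - Xs @ beta) ** 2))

# State: candidate set encoded as a boolean mask.  Action: flip one feature.
# Value function V(s) = w . mask (linear), learned online with TD(0).
w = np.zeros(p)
alpha, gamma, eps = 0.1, 0.9, 0.2
mask = np.zeros(p, dtype=bool)

for step in range(500):
    if rng.random() < eps:
        a = int(rng.integers(p))          # explore: flip a random feature
    else:
        # greedy w.r.t. the estimated value of each successor state
        vals = [w @ (mask ^ (np.arange(p) == j)) for j in range(p)]
        a = int(np.argmax(vals))
    next_mask = mask ^ (np.arange(p) == a)
    r = reward(next_mask)
    # TD(0) update: w <- w + alpha * (r + gamma*V(s') - V(s)) * grad_w V(s),
    # where grad_w V(s) is the current mask for a linear value function.
    td_err = r + gamma * (w @ next_mask) - (w @ mask)
    w += alpha * td_err * mask
    mask = next_mask

print("selected features:", np.flatnonzero(mask))
```

On this toy problem the flip-based policy tends to settle on masks containing the informative features, but the sketch only illustrates the TD machinery; it omits the paper's dynamic policy for steering the candidate set toward the true variable set and its batch cumulative-reward design.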
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Pan_Xu1
Submission Number: 3689