Genetic Programming with Multi-Task Feature Selection for Alzheimer's Disease Diagnosis

Published: 01 Jan 2024, Last Modified: 20 Nov 2024, CEC 2024, CC BY-SA 4.0
Abstract: Alzheimer's disease (AD) is the most common cause of dementia, which makes cognitive score prediction and the identification of important features crucial for its diagnosis. Sparse linear regression has been used for this purpose because of its simplicity, but it often selects an excessive number of features to track the disease and assumes a linear relationship between input and output, which might not always hold. To address these limitations, genetic programming-based symbolic regression (GPSR) algorithms have been proposed. GPSR can select important features by exploring the feature space and can learn a regression model without any assumption about the model structure. However, the generalization ability of existing GPSR methods still needs to be improved. Considering the multiple related prediction tasks in AD studies, this work proposes a new method, linear scaled GPSR with multi-task feature selection (LSGPMTFS), which improves the prediction performance of each task through knowledge sharing among the tasks. LSGPMTFS has two stages. The first stage learns a specific feature subset for each task. In the second stage, the model for each task is searched on the union of the feature subsets selected in the first stage. Experimental results on real AD datasets demonstrate that the proposed algorithm selects a small set of important features and achieves better learning and generalization performance than other GPSR methods.
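The two mechanical pieces named in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: that "linear scaled" refers to the usual least-squares fit of an intercept and slope to a GP tree's raw output, and that the stage-two search space is a plain set union of the per-task subsets. The function names, task labels, and feature indices are hypothetical, and the GP search itself is not shown.

```python
from statistics import fmean


def linear_scale(raw_outputs, targets):
    """Return (a, b) minimising the squared error of a + b * raw vs targets.

    Assumption: this is the standard least-squares linear scaling used in
    symbolic regression, not necessarily the exact variant in the paper.
    """
    mean_f = fmean(raw_outputs)
    mean_y = fmean(targets)
    var_f = sum((f - mean_f) ** 2 for f in raw_outputs)
    if var_f == 0.0:
        # A constant tree output carries no signal; fall back to the mean.
        return mean_y, 0.0
    cov = sum((f - mean_f) * (y - mean_y)
              for f, y in zip(raw_outputs, targets))
    b = cov / var_f
    a = mean_y - b * mean_f
    return a, b


def union_of_subsets(task_subsets):
    """Merge per-task feature subsets into one shared stage-two feature set."""
    merged = set()
    for subset in task_subsets.values():
        merged |= set(subset)
    return sorted(merged)


# Hypothetical stage-one result: feature indices selected per task.
stage_one = {"task_a": [0, 3], "task_b": [3, 7]}
shared_features = union_of_subsets(stage_one)

# Hypothetical raw outputs of one GP tree and the matching target scores.
a, b = linear_scale([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
```

In this sketch each task would then run its own model search over `shared_features`, so a feature found useful for one task becomes available to the others.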