A novel personality detection method based on high-dimensional psycholinguistic features and improved distributed Gray Wolf Optimizer for feature selection
Abstract: Existing personality detection methods based on user-generated text have two major limitations.
First, they rely too heavily on pre-trained language models and ignore the sentiment information
in psycholinguistic features. Second, there is no consensus on psycholinguistic feature
selection, which leads to insufficient analysis of sentiment information. To tackle these issues,
we propose a novel personality detection method based on high-dimensional psycholinguistic
features and an improved distributed Gray Wolf Optimizer (GWO) for feature selection (IDGWOFS).
Specifically, we introduce Gaussian Chaos Map-based initialization and a neighbor search
strategy into the original GWO to improve feature selection performance. To eliminate
the bias introduced when mutual information is used to select features, we adopt symmetric
uncertainty (SU) instead of mutual information as the measure of both correlation and redundancy
in the fitness function, which balances feature–label correlation against feature–feature
redundancy. Finally, we improve on the common Spark-based parallelization design of GWO by
parallelizing only the fitness computation step, which increases the efficiency of IDGWOFS.
Experiments show that the proposed method achieves average accuracy improvements of 3.81%
and 2.19% and average F1 improvements of 5.17% and 5.8% on the Essays and Kaggle MBTI
datasets, respectively. Furthermore, IDGWOFS exhibits good convergence and scalability.
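As a rough illustration of an SU-based fitness, the sketch below computes symmetric uncertainty for discrete variables using the standard definition SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)). This normalizes mutual information to [0, 1], so it does not favor features with many distinct values. The function names, and the relevance-minus-redundancy combination in su_fitness (including the weight alpha), are hypothetical choices for illustration, not the exact fitness function of IDGWOFS. The sketch also assumes the psycholinguistic features have already been discretized.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) in bits of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0.0:
        return 0.0  # both variables are constant: treat as uninformative
    hxy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy       # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (hx + hy)


def su_fitness(columns: List[Sequence[Hashable]], labels: Sequence[Hashable],
               subset: Sequence[int], alpha: float = 0.5) -> float:
    """Hypothetical fitness: weighted relevance minus weighted redundancy.

    relevance  = mean SU(feature, label) over the selected features
    redundancy = mean SU(f_i, f_j) over all pairs of selected features
    """
    if not subset:
        return 0.0
    relevance = sum(symmetric_uncertainty(columns[i], labels) for i in subset) / len(subset)
    pairs = [(i, j) for k, i in enumerate(subset) for j in subset[k + 1:]]
    redundancy = (
        sum(symmetric_uncertainty(columns[i], columns[j]) for i, j in pairs) / len(pairs)
        if pairs else 0.0
    )
    return alpha * relevance - (1.0 - alpha) * redundancy
```

For example, with labels [0, 0, 1, 1], a feature identical to the labels has SU 1.0 and the feature [0, 1, 0, 1] has SU 0.0. Selecting two identical copies of the informative feature gives a lower fitness than selecting it alone, because the redundancy term penalizes the duplicate.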
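The sketch below shows a minimal binary GWO for feature selection with chaotic initialization. It makes several assumptions. It treats the "Gaussian Chaos Map" as the Gauss (mouse) map x_{k+1} = frac(1/x_k). It uses the standard alpha/beta/delta position update, with a coefficient a that decreases linearly from 2 to 0. It converts positions to feature masks by thresholding at 0.5. The names, the threshold transfer, and the re-seeding when the map reaches 0 are illustrative choices, and the neighbor search strategy and Spark-based evaluation are not reproduced. The fitness argument can be any callable on a 0/1 mask, such as an SU-based score.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]


def gauss_map_population(n_wolves: int, dim: int, rng: random.Random) -> List[List[float]]:
    """Initial positions in [0, 1) drawn from the Gauss (mouse) chaotic map.

    x_{k+1} = frac(1 / x_k) for x_k != 0 (assumed form of the 'Gaussian Chaos Map').
    """
    x = rng.random()
    while x == 0.0:
        x = rng.random()
    population = []
    for _ in range(n_wolves):
        wolf = []
        for _ in range(dim):
            x = (1.0 / x) % 1.0
            while x == 0.0:  # the map is absorbed at 0; re-seed (assumption)
                x = rng.random()
            wolf.append(x)
        population.append(wolf)
    return population


def binarize(position: Sequence[float]) -> Mask:
    """Threshold transfer: feature j is selected when x_j > 0.5."""
    return [1 if v > 0.5 else 0 for v in position]


def gwo_select(fitness: Callable[[Mask], float], dim: int,
               n_wolves: int = 10, n_iter: int = 30, seed: int = 0) -> Tuple[Mask, float]:
    """Maximize fitness over 0/1 feature masks with a basic binary GWO."""
    rng = random.Random(seed)
    wolves = gauss_map_population(n_wolves, dim, rng)
    scored = [(fitness(binarize(w)), w) for w in wolves]
    # alpha, beta, delta: the three best positions seen so far
    leaders = sorted(scored, key=lambda p: p[0], reverse=True)[:3]
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 toward 0
        for i in range(n_wolves):
            new_pos = []
            for j in range(dim):
                total = 0.0
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - wolves[i][j])
                    total += leader[j] - A * D
                # Average the leader-guided moves and clip to [0, 1]
                new_pos.append(min(1.0, max(0.0, total / len(leaders))))
            wolves[i] = new_pos
        # Per-wolf fitness evaluations are independent within an iteration
        scored = [(fitness(binarize(w)), w) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda p: p[0], reverse=True)[:3]
    best_score, best_pos = leaders[0]
    return binarize(best_pos), best_score
```

Because the leaders are kept as the best positions seen so far, the returned score is never worse than the best wolf in the chaotic initial population. The per-wolf fitness calls are also the natural step to distribute, since they do not depend on each other within an iteration.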