Highlights
• Evolving transformer and deep networks are devised for audio emotion recognition.
• A Cluster Search Optimisation (CSO) algorithm is proposed to adapt hyperparameters.
• CSO incorporates Noise Tempered K-means clustering and Cluster Distance Improvement.
• Q-learning is used to optimise the search behaviours.
• Our study indicates the effectiveness of CSO-optimised deep networks across datasets.
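The highlights say only that Q-learning optimises CSO's search behaviours. They do not say what the states, actions or rewards are. The sketch below is a generic tabular Q-learning loop with epsilon-greedy selection that illustrates the idea. It assumes three hypothetical behaviours (`explore`, `exploit`, `cluster_jump`), two hypothetical progress states, and a hypothetical `evaluate` callback whose fitness gain serves as the reward. It is not the authors' CSO implementation, and it leaves out Noise Tempered K-means and Cluster Distance Improvement entirely.

```python
import random

# Hypothetical search behaviours; the behaviours actually used by CSO
# are not specified in the highlights.
ACTIONS = ["explore", "exploit", "cluster_jump"]
# Hypothetical search states, derived from whether the last step improved fitness.
STATES = ["improving", "stagnating"]


def new_q_table():
    """Create a tabular Q-function with every (state, action) value set to zero."""
    return {s: {a: 0.0 for a in ACTIONS} for s in STATES}


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice of the next search behaviour."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(q[state], key=q[state].get)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def run_search(evaluate, steps=200, epsilon=0.2, seed=0):
    """Run a search loop in which Q-learning picks the behaviour at each step.

    `evaluate(action, rng)` is a hypothetical callback that applies the chosen
    behaviour to the hyperparameter search and returns its fitness gain,
    which is used as the reward.
    """
    rng = random.Random(seed)
    q = new_q_table()
    state = "stagnating"
    for _ in range(steps):
        action = choose_action(q, state, epsilon, rng)
        reward = evaluate(action, rng)
        next_state = "improving" if reward > 0 else "stagnating"
        q_update(q, state, action, reward, next_state)
        state = next_state
    return q


if __name__ == "__main__":
    # Toy reward with purely illustrative numbers: each behaviour succeeds
    # half the time, and "cluster_jump" yields the largest gain when it does.
    def toy_evaluate(action, rng):
        gain = {"explore": 0.1, "exploit": 0.2, "cluster_jump": 0.5}[action]
        return gain if rng.random() < 0.5 else 0.0

    table = run_search(toy_evaluate, steps=500)
    for s in STATES:
        print(s, "->", max(table[s], key=table[s].get))
```

Epsilon-greedy selection is used here because it is the simplest way to keep trying less-favoured behaviours. A softmax or UCB-style policy would be an equally plausible choice in a real optimiser.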