Abstract: Highlights
• We propose APIN for speech emotion recognition (SER). To the best of our knowledge, this is the first work to model the interaction between amplitude and phase for SER.
• We introduce an A-P interaction module, built on a transformer, that enables amplitude and phase to guide and complement each other.
• We propose a gated fusion with amplitude and phase gates to learn complementary representations.
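To make the amplitude and phase gates more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the highlights do not give the gate equations, so the sigmoid gates over concatenated features, the weight matrices `w_amp` and `w_phase`, the frame sizes, and the function names are all assumptions chosen for illustration. It also omits the transformer-based A-P interaction module and applies the gates directly to raw spectra.

```python
import numpy as np


def amplitude_and_phase(frames, n_fft=512):
    """Split framed audio into amplitude and phase spectra using the real FFT."""
    spec = np.fft.rfft(frames, n=n_fft, axis=-1)  # complex spectrum per frame
    return np.abs(spec), np.angle(spec)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(amp_feat, phase_feat, w_amp, w_phase):
    """Hypothetical gated fusion: each stream gets its own gate.

    Both gates are computed from the concatenated features, then used to
    weight the amplitude and phase streams before summing them.
    """
    joint = np.concatenate([amp_feat, phase_feat], axis=-1)
    gate_amp = sigmoid(joint @ w_amp)      # amplitude gate, values in (0, 1)
    gate_phase = sigmoid(joint @ w_phase)  # phase gate, values in (0, 1)
    return gate_amp * amp_feat + gate_phase * phase_feat


rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 400))   # 10 made-up frames of 400 samples
amp, phase = amplitude_and_phase(frames)  # each has shape (10, 257)

dim = amp.shape[-1]
# Random weights stand in for parameters that would normally be learned.
w_amp = rng.standard_normal((2 * dim, dim)) * 0.01
w_phase = rng.standard_normal((2 * dim, dim)) * 0.01
fused = gated_fusion(amp, phase, w_amp, w_phase)
```

In a trained model the gate weights would be learned jointly with the rest of the network, and the inputs to the fusion would be the features produced by the interaction module rather than the raw spectra used here.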