Abstract: Highlights
• Facial expressions are limited in their ability to reflect individuals’ emotions.
• Combining micro-expressions with physiological signals improves latent emotion recognition.
• A 1D separable and mixable depthwise inception network effectively extracts features from diverse physiological signals.
• A standardised normal distribution weighted temporal aggregation block reconstructs informative maps of frames.
• A guided attention module achieves multimodal feature fusion.