Abstract: Highlights
• Text-guided pairwise cross-modal mapping modules and reconstruction modules.
• Embedding linguistic information into emotion-related representation training.
• Modality-specific representations are evaluated based on text-enhanced features.
• We conduct experiments on three datasets demonstrating superior performance.