MLFormer: Unleashing Efficiency Without Attention for Multimodal Knowledge Graph Embedding

Published: 2025 · Last Modified: 15 Jan 2026 · IEEE Trans. Comput. Soc. Syst. 2025 · CC BY-SA 4.0
Abstract: Multimodal knowledge graphs (MMKGs) have gained widespread adoption across various domains. However, existing transformer-based methods for MMKG representation learning focus primarily on improving representation quality while overlooking time and memory costs, which limits model efficiency. To address these limitations, we introduce a multimodal lightweight transformer (MLFormer) model that preserves strong representation capability while considerably improving computational efficiency. We find that the self-attention mechanism in transformers incurs substantial computational overhead. We therefore optimize the traditional MMKGE model in two aspects, modality processing and modality fusion, by incorporating a filter gate and a Fourier transform. Experimental results on real-world multimodal knowledge graph completion datasets demonstrate that MLFormer achieves significant improvements in computational efficiency while maintaining competitive performance.
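To make the attention-free idea concrete, below is a minimal sketch, not the authors' released code, of how a Fourier-transform token mixer combined with a learnable filter gate could stand in for self-attention in an FNet/GFNet-style block. All module names, shapes, and the exact gating formula are assumptions for illustration only.

```python
# Hypothetical sketch of an attention-free block: FFT-based token mixing
# plus a learnable filter gate. Not the MLFormer implementation.
import torch
import torch.nn as nn


class FourierFilterBlock(nn.Module):
    """Mixes tokens with a 2D FFT instead of self-attention, then gates the
    mixed signal with a learnable filter before a position-wise FFN."""

    def __init__(self, dim: int, hidden: int | None = None):
        super().__init__()
        hidden = hidden or 4 * dim
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Learnable per-channel gate deciding how much mixed signal passes.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim), e.g. structural/visual/textual tokens.
        mixed = torch.fft.fft2(self.norm1(x), dim=(-2, -1)).real  # O(n log n) mixing
        x = x + self.gate(mixed) * mixed                          # filter gate
        x = x + self.ffn(self.norm2(x))                           # feed-forward sublayer
        return x


if __name__ == "__main__":
    block = FourierFilterBlock(dim=256)
    tokens = torch.randn(8, 3, 256)   # 8 entities, 3 modality tokens each
    print(block(tokens).shape)        # torch.Size([8, 3, 256])
```

Compared with self-attention's quadratic cost in the number of tokens, the FFT-based mixer above runs in O(n log n) time with no attention weight matrices to store, which is the kind of saving the abstract attributes to MLFormer.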