Abstract: Highlights
• Innovative model for multi-modal reasoning.
• Cross-Modal Spatial-Channel (CM-SC) attention mechanism.
• Effective capture of higher-order interactions.
• Scalable attention mechanism that facilitates seamless integration.
• Improved computational efficiency and enhanced performance metrics.
External IDs: doi:10.1016/j.displa.2024.102941