Abstract: Highlights
• Captures Temporal Variations: Handles time-based changes in frame encoding that previous approaches overlooked.
• Innovates Frame Encoding: Improves the capture of spatial and temporal context, potentially replacing self-attention mechanisms.
• Hierarchical Memory Grouping: Introduces a two-tier grouping strategy for efficient feature capture and memory utilization.
• Enhances Feature Quality: Provides a richer understanding of video sequences through advanced memory-based attention.
• Empirical Superiority: Demonstrates consistent performance improvements across key semi-supervised video object segmentation (SVOS) benchmarks.