Abstract: Highlights
• A novel method using the Transformer architecture for stereo confidence estimation.
• Combining global content and local detail information using a pooling transformer.
• Enhancing estimation accuracy by capturing context features in bi-modal input.
• The proposed method outperforms state-of-the-art CNNs on multiple benchmarks.