Abstract: Highlights
• Vision UFormer: A dense depth prediction model combining a Vision Transformer with a UNet.
• Staged Training: Moving from easier to more difficult data allows successful training.
• The predictor reaches SOTA results, surpassing others in long-range natural environments.
• Depths are used for further applications, e.g., scene reconstruction or manipulation.
External IDs: dblp:journals/cg/PolasekCKB23