Self-supervised learning of monocular depth and ego-motion estimation for non-rigid scenes in wireless capsule endoscopy videos
Abstract: Highlights
• Transformer improves pose estimation with self-attention mechanism.
• Multiple frame sampling intervals augment training diversity.
• Binary learnable masks remove invalid self-supervisions.
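The second highlight, sampling frames at multiple intervals, can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `sample_triplets`, the (previous, target, next) triplet layout, and the interval values are assumptions made for illustration, since the highlights do not specify them.

```python
from typing import List, Sequence, Tuple


def sample_triplets(
    num_frames: int,
    intervals: Sequence[int] = (1, 2, 3),  # assumed values, not from the paper
) -> List[Tuple[int, int, int]]:
    """Build (previous, target, next) frame-index triplets for each interval.

    For an interval k, the target frame t is paired with the source frames
    t - k and t + k, so larger k yields larger apparent camera motion.
    """
    triplets = []
    for k in intervals:
        if k <= 0:
            raise ValueError("intervals must be positive integers")
        # Only keep targets whose two source frames lie inside the video.
        for t in range(k, num_frames - k):
            triplets.append((t - k, t, t + k))
    return triplets


if __name__ == "__main__":
    # A 5-frame clip sampled at intervals 1 and 2.
    for triplet in sample_triplets(5, intervals=(1, 2)):
        print(triplet)
```

Under these assumptions, the same video yields several training triplets per target frame, one for each interval that fits inside the clip, which is one way in which multiple intervals could increase the diversity of the training samples.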