Abstract: Capsule networks aim to parse images into a hierarchy of objects, parts and relations. While
promising, they remain limited by an inability to
learn effective low-level part descriptions. To address this issue, we propose a way to learn primary
capsule encoders that detect atomic parts from a
single image. During training, we exploit motion
as a powerful perceptual cue for part definition,
and use an expressive decoder for part generation
within a layered image model with occlusion. Experiments demonstrate robust part discovery in
the presence of multiple objects, cluttered backgrounds, and occlusion. The part decoder infers
the underlying shape masks, effectively filling
in occluded regions of the detected shapes. We
evaluate the resulting model, FlowCapsules, on unsupervised part segmentation and unsupervised image classification.
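
To make the phrase "layered image model with occlusion" concrete, here is a minimal sketch of back-to-front compositing of part masks. It is not the authors' implementation: the function name `compose_layers`, the flat per-layer colors, and the fixed back-to-front ordering are all illustrative assumptions. The point it shows is that each part keeps its full shape mask, while only the unoccluded portion contributes to the rendered image.

```python
import numpy as np

def compose_layers(masks, colors, background):
    """Composite K part layers over a background, back to front.

    masks:      (K, H, W) array in [0, 1], the full (unoccluded) shape of each part.
    colors:     (K, 3) array, one flat RGB color per part (a simplifying assumption).
    background: (H, W, 3) array.
    Returns the rendered image and the per-part visible masks.
    """
    image = background.astype(float).copy()
    visible = np.zeros(masks.shape, dtype=float)
    for k in range(len(masks)):
        m = masks[k].astype(float)
        # Layers painted earlier lose visibility wherever layer k covers them.
        visible[:k] *= (1.0 - m)
        visible[k] = m
        # Standard "over" blend: layer k in front of everything painted so far.
        image = m[..., None] * colors[k] + (1.0 - m[..., None]) * image
    return image, visible

# Hypothetical toy scene: two overlapping 3x3 squares on a 4x4 black canvas.
masks = np.zeros((2, 4, 4))
masks[0, 0:3, 0:3] = 1.0   # back part
masks[1, 1:4, 1:4] = 1.0   # front part, partially occludes the back part
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
background = np.zeros((4, 4, 3))

image, visible = compose_layers(masks, colors, background)
```

In this toy scene the back part's full mask still covers nine pixels, but its visible mask covers only the five that the front part does not hide. A decoder that predicts full masks, as the abstract describes, therefore represents shape in regions that never appear in the rendered image.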