Abstract: Domestic robots could eventually transform our
lives, but safely operating in home environments requires a rich
understanding of indoor scenes. Learning-based techniques for
scene segmentation require large-scale, pixel-level annotations,
which are laborious and expensive to collect. We propose
an automatic method for pixel-wise semantic annotation of
video sequences that gathers cues from object detectors and
indoor 3D room-layout estimation, and then annotates all
image pixels in an energy minimization framework. Extensive
experiments on a publicly available video dataset (SUN3D)
demonstrate the effectiveness of the approach.
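The abstract does not spell out the energy, but annotation frameworks of this kind typically combine a per-pixel unary term (here, costs fused from detector and room-layout cues) with a pairwise smoothness term over neighboring pixels. The sketch below is a minimal illustration of such an energy minimization, not the paper's actual formulation: the Potts smoothness term, the ICM solver, and all names and parameters are assumptions (published methods often use stronger solvers such as graph cuts).

```python
import numpy as np

def icm_segmentation(unary, smoothness=1.0, n_iters=5):
    """Approximately minimize
        E(x) = sum_p U_p(x_p) + smoothness * sum_{p~q} [x_p != x_q]
    over a 4-connected pixel grid using Iterated Conditional Modes.

    unary: (H, W, L) array of per-pixel label costs (lower is better),
           e.g. negative log-probabilities fused from detector and
           room-layout cues (hypothetical inputs for this sketch).
    Returns an (H, W) integer label map.
    """
    H, W, L = unary.shape
    labels = unary.argmin(axis=2)          # start from the unary-only solution
    for _ in range(n_iters):
        for y in range(H):
            for x in range(W):
                cost = unary[y, x].copy()  # data term for every candidate label
                # Potts pairwise term: penalize disagreeing with 4-neighbors
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        cost += smoothness * (np.arange(L) != labels[ny, nx])
                labels[y, x] = cost.argmin()  # greedy per-pixel update
    return labels

# Toy usage: 3 labels on a small grid with random unary costs.
rng = np.random.default_rng(0)
unary = rng.random((20, 30, 3))
seg = icm_segmentation(unary, smoothness=0.5)
print(seg.shape, np.unique(seg))
```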