Abstract: This paper explores how to harvest precise object segmentation masks while minimizing the human interaction cost. To achieve this, we propose an Inside-Outside Guidance (IOG) approach. Concretely, we leverage an inside point that is clicked near the object center and two outside points at the symmetrical corner locations (top-left and bottom-right, or top-right and bottom-left) of a tight bounding box that encloses the target object. Because the two clicked corners determine the remaining two corners of the box, this results in a total of one foreground click and four background clicks for segmentation. The advantages of our IOG are four-fold: 1) the two outside points help to remove distractions from other objects or background; 2) the inside point helps to eliminate unrelated regions inside the bounding box; 3) the inside and outside points are easily identified, reducing the confusion raised by the state-of-the-art DEXTR in labeling some extreme samples; 4) our approach naturally supports additional click annotations for further correction. Despite its simplicity, our IOG not only achieves state-of-the-art performance on several popular benchmarks, but also demonstrates strong generalization capability across different domains such as street scenes, aerial imagery and medical images, without fine-tuning. In addition, we propose a simple two-stage solution that enables our IOG to produce high-quality instance segmentation masks from existing datasets with off-the-shelf bounding boxes, such as ImageNet and Open Images, demonstrating the superiority of our IOG as an annotation tool.
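The click geometry described in the abstract can be illustrated with a minimal sketch. It only shows how two symmetric corner clicks and one inside click could yield one foreground point and four background points; the function name `iog_clicks`, the `Point` alias, and the `(x, y)` pixel convention are assumptions made for illustration, and the abstract does not say how the clicks are encoded for the segmentation network.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # hypothetical convention: (x, y) in pixel coordinates


def iog_clicks(corner_a: Point, corner_b: Point, inside: Point) -> Tuple[Point, List[Point]]:
    """Return the foreground click and the four background clicks.

    corner_a and corner_b are the two clicked, diagonally opposite corners
    of the tight bounding box (any diagonal pair, in any order).
    """
    # Sorting each coordinate makes the result independent of which
    # diagonal pair was clicked and in which order.
    x_min, x_max = sorted((corner_a[0], corner_b[0]))
    y_min, y_max = sorted((corner_a[1], corner_b[1]))

    # The inside click is expected to fall within the box it annotates.
    if not (x_min <= inside[0] <= x_max and y_min <= inside[1] <= y_max):
        raise ValueError("inside click must lie within the bounding box")

    # All four box corners act as background points: two were clicked,
    # the other two are inferred from them.
    background = [
        (x_min, y_min),  # top-left
        (x_max, y_min),  # top-right
        (x_max, y_max),  # bottom-right
        (x_min, y_max),  # bottom-left
    ]
    return inside, background
```

For example, `iog_clicks((10, 20), (110, 80), (60, 50))` and `iog_clicks((110, 20), (10, 80), (60, 50))` describe the same box through different diagonals and produce the same four background points.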