Abstract: Interactive segmentation aims to generate high-quality pixel-level predictions from a few user-provided clicks, and is gaining attention for its convenience in segmentation data annotation. Users iteratively refine the prediction by adding clicks until the result is satisfactory. Existing interactive methods usually transform the clicks into a set of localization maps, via Euclidean distance computation or RGB texture extraction, to guide the segmentation, which makes click transformation a core module in interactive segmentation networks. However, when applied to human images, where large pose variations, occlusions, and poor illumination are prevalent, prior transformation methods tend to produce uncorrectable overlaps across localization maps that are difficult to match to the corresponding human parts. Furthermore, inappropriately transformed information is hard to refine under a static transformation scheme, which is out of step with the dynamically refined interaction process. Hence, we design a dynamic transformation scheme for interactive human parsing (IHP) named Dynamic Interaction Dilation Net (DID-Net), which serves as an initial attempt to overcome the limitations of static transformation while capturing long-range dependencies of clicks within each human part. Specifically, we construct a Dynamic Dilation Module (DD-Module) that dilates clicks radially in several directions, assisted by human body edge detection, to refine the dilation quality in each interaction iteration. Furthermore, we propose an Adaptive Interaction Excitation Block (AIE-Block) to exploit potential semantic clues buried in the dilated clicks. Our DID-Net achieves state-of-the-art performance on three public human parsing benchmarks.
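To make the static click transformation the abstract contrasts against concrete, below is a minimal sketch of the conventional Euclidean-distance encoding of clicks into localization maps. It is not the paper's DD-Module; the function name `clicks_to_distance_map` and the truncation value are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def clicks_to_distance_map(clicks, height, width, truncation=255.0):
    """Encode user clicks as a truncated Euclidean distance map (hypothetical helper).

    clicks: non-empty iterable of (row, col) click coordinates.
    Returns an (H, W) float map where each pixel holds its Euclidean
    distance to the nearest click, clipped to `truncation`.
    """
    mask = np.ones((height, width), dtype=bool)
    for r, c in clicks:
        mask[r, c] = False  # zeros mark click locations
    dist = distance_transform_edt(mask)  # distance to the nearest click
    return np.minimum(dist, truncation)

# Separate maps for positive (foreground) and negative (background) clicks
# are typically stacked with the RGB image as extra network input channels.
pos_map = clicks_to_distance_map([(120, 80), (150, 95)], 256, 256)
neg_map = clicks_to_distance_map([(20, 200)], 256, 256)
```

Because such maps are computed once per click set with a fixed rule, adjacent clicks on different human parts can yield overlapping responses that later interactions cannot correct, which is the limitation the proposed dynamic dilation is designed to address.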