Abstract: Most existing multiple object tracking (MOT) methods that rely solely on appearance features struggle to track highly deformable objects. MOT methods that instead use motion cues to associate identities across frames have difficulty handling egocentric videos effectively or efficiently. In this work, we present DogThruGlasses, a large-scale deformable multi-object tracking dataset with 150 videos and 73K annotated frames, collected exclusively with smart glasses. We also propose DETracker, a new MOT method that jointly detects and tracks deformable objects in egocentric videos. DETracker comprises three novel modules, namely the motion disentanglement network (MDN), the patch association network (PAN), and the patch memory network (PMN), which explicitly handle severe ego motion and track fast-morphing target objects. DETracker is end-to-end trainable, runs at near real-time speed, and outperforms existing state-of-the-art methods on DogThruGlasses and YouTube-Hand.