Abstract: Multi-View Detection (MVD) is highly effective for occlusion reasoning in crowded environments. While recent deep learning approaches have made significant advances in the field, they have overlooked generalization, which makes them impractical for real-world deployment. The key novelty of our work is to formalize three critical forms of generalization and propose experiments to evaluate them: generalization to i) a varying number of cameras, ii) varying camera positions, and iii) new scenes. We find that existing state-of-the-art models generalize poorly because they overfit to a single scene and camera configuration. To address these concerns, (a) we propose a novel Generalized MVD (GMVD) dataset, which brings together diverse scenes with varying times of day, camera configurations, and numbers of cameras, and (b) we discuss the properties essential for bringing generalization to MVD and propose a barebones model incorporating them. We present a comprehensive set of experiments on the WildTrack, MultiViewX, and GMVD datasets to motivate the need to evaluate the generalization abilities of MVD methods and to demonstrate the efficacy of the proposed approach. The code and dataset are available at https://github.com/jeetv/GMVD.