Abstract: We present a scalable framework for building efficient lightweight models for video object
detection using self-training and knowledge distillation. We study how to best select training images
from video streams and how effectively models can be shared across many cameras. We propose a
camera clustering method that reduces the number of models to train while enlarging each model's
distillation dataset. Our results show that appropriate camera clustering significantly improves the
accuracy of the distilled models, outperforming both approaches that train a separate model per
camera and a single universal model trained on data from all cameras.
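
The clustering-then-distillation idea can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's actual pipeline: it groups cameras by the mean embedding of a few sampled frames using k-means, then pools each cluster's frames into a shared distillation set, with one student model intended per cluster. All function and variable names (e.g., `embed_frames`, `cluster_cameras`) are hypothetical.

```python
# Minimal sketch (not the paper's implementation): cluster cameras by frame
# embeddings, then pool each cluster's frames into one distillation set.
import numpy as np
from sklearn.cluster import KMeans


def embed_frames(frames):
    # Placeholder embedding: truncate flattened pixels of each frame and average.
    # A real system would likely use features from the teacher detector's backbone.
    return np.stack([f.reshape(-1)[:512] for f in frames]).mean(axis=0)


def cluster_cameras(camera_frames, n_clusters=3):
    """camera_frames: dict mapping camera_id -> list of sampled frames (np arrays)."""
    cam_ids = list(camera_frames)
    feats = np.stack([embed_frames(camera_frames[c]) for c in cam_ids])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    clusters = {}
    for cam_id, label in zip(cam_ids, labels):
        clusters.setdefault(int(label), []).append(cam_id)
    return clusters  # cluster_id -> camera ids that share one student model


def build_distillation_sets(clusters, camera_frames):
    # Pooling frames across a cluster's cameras enlarges each student's
    # distillation dataset compared with per-camera training.
    return {cid: [f for cam in cams for f in camera_frames[cam]]
            for cid, cams in clusters.items()}
```

In this toy form, each cluster's pooled frame set would be labeled with the teacher detector's outputs and used to train one lightweight student, mirroring the one-model-per-cluster design the abstract describes.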