Abstract: In this paper, we propose an efficient way to detect objects in 360\(^{\circ }\) videos in order to boost the performance of tracking in such videos. Though extensive work has been done in the field of 2D video processing, the domain of 360\(^{\circ }\) video processing has not been explored much yet, as it poses difficulties such as (1) the unavailability of annotated datasets, (2) severe geometric distortions near the panoramic poles of the image, and (3) the high resolution of the media, which demands machines with high computational capacity. State-of-the-art detection algorithms rely on CNNs (Convolutional Neural Networks) trained on large datasets. Faster R-CNN, SSD, YOLO, YOLO9000, and YOLOv3 are some of the detection algorithms that use CNNs. Among these, though YOLOv3 might not be the most accurate, it is the fastest, and this trade-off between speed and accuracy is acceptable. We improve upon this algorithm to make it suitable for 360\(^{\circ }\) data. We propose YOLO360, a CNN to detect objects in 360\(^{\circ }\) videos and thus increase tracking precision and accuracy. This is achieved by performing transfer learning on YOLOv3 with our manually annotated dataset.