Abstract: Collecting overhead imagery using an event camera is desirable due to the energy efficiency of the image sensor compared to standard cameras. However, event cameras complicate downstream image processing, especially for complex tasks such as object detection. In this paper, we investigate the viability of event streams for overhead object detection. We demonstrate that across a number of standard modeling approaches, there is a significant gap in performance between dense event representations and corresponding RGB frames. We establish that this gap is, in part, due to a lack of overlap between the event representations and the pre-training data used to initialize the weights of the object detectors. Then, we apply event-to-video conversion models that convert event streams into gray-scale video to close this gap. We demonstrate that this approach results in a large performance increase, outperforming even event-specific object detection techniques on our overhead target task. These results suggest that better alignment between event representations and existing large pre-trained models may result in greater short-term performance gains compared to end-to-end event-specific architectural improvements.

Fig. 1. Comparison of the same VisDrone-VID [1] scene using various input representations. Top Left: Event Count Map. Top Right: FireNet [2] Gray-scale Frame. Bottom: Original RGB Frame.