- Abstract: Current Convolutional Neural Network (CNN)-based object detection models adopt strictly feedforward inference to predict the final detection results. However, the widely used one-way inference is agnostic to the global image context and the interplay between input image and task semantics. In this work, we present a general technique to improve off-the-shelf CNN-based object detection models in the inference stage without re-training, architecture modification or ground-truth requirements. We propose an iterative, bottom-up and top-down inference mechanism, which is named conscious inference, as it is inspired by prevalent models for human consciousness with top-down guidance and temporal persistence. While the downstream pass accumulates category-specific evidence over time, it subsequently affects the proposal calculation and the final detection. Feature activations are updated in line with no additional memory cost. Our approach advances the state of the art using popular detection models (Faster-RCNN, YOLOv2, YOLOv3) on 2D object detection and 6D object pose estimation.
- Keywords: consciousness, conscious inference, object detection, object pose estimation