Abstract: The main contribution of this paper is an approach for
introducing additional context into state-of-the-art general
object detection. To achieve this we first combine a state-of-the-art classifier (Residual-101 [14]) with a fast detection
framework (SSD [18]). We then augment SSD+Residual-101 with deconvolution layers to introduce additional large-scale context in object detection and improve accuracy,
especially for small objects, calling our resulting system
DSSD for deconvolutional single shot detector. While these
two contributions are easily described at a high level, a
naive implementation does not succeed. Instead, we show
that carefully adding additional stages of learned transformations, specifically a module for feed-forward connections
in deconvolution and a new output module, enables this new
approach and forms a potential way forward for further detection research. Results are shown on both PASCAL VOC
and COCO detection. Our DSSD with 513 × 513 input
achieves 81.5% mAP on VOC2007 test, 80.0% mAP on
VOC2012 test, and 33.2% mAP on COCO, outperforming a state-of-the-art method, R-FCN [3], on each dataset.