Abstract: Thangka cultural element detection aims to locate and identify cultural element instances in Thangka paintings. However, as a unique form of pictorial art, Thangka exhibits distinct spatial structures that deviate significantly from general images in object scale and density. As a result, most state-of-the-art detectors, which are designed for natural scenes, struggle to detect Thangka cultural elements effectively. To overcome this issue, we propose a multi-scale and dense object detector referred to as MDDet. It embeds a multi-scale receptive field fusion module (MRF) that enlarges the receptive field while capturing the spatial and channel relationships at different scales, which significantly enriches the multi-scale features extracted from the backbone. In addition, we introduce a threshold-slicing aided hyper inference (T-SAHI) scheme, which adaptively slices images in dense scenarios to aid dense object detection at test time. We thoroughly evaluate our method, and MDDet outperforms the prior art by a clear margin on the Thangka dataset, achieving an absolute improvement of 1.9% in average precision (AP). For the challenging medium and small objects in Thangka, MDDet improves accuracy by wide margins of 12% and 3.7%, respectively. It also shows strong generalization ability when evaluated on general scenarios, e.g., Pascal VOC 2007 and MS COCO, validating the role of MDDet in object detection.
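To illustrate the general idea behind threshold-gated sliced inference, the following is a minimal sketch and not the T-SAHI implementation itself. The abstract does not state how a dense scene is recognized, so the sketch assumes that density is measured by the number of detections from a first full-image pass. The names `make_windows`, `threshold_sliced_inference`, `detect`, and `count_threshold` are hypothetical, and duplicate suppression across overlapping windows (e.g., NMS) is omitted for brevity.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float, float]  # x0, y0, x1, y1, score
Window = Tuple[int, int, int, int]              # x0, y0, x1, y1


def make_windows(width: int, height: int, size: int, overlap: float) -> List[Window]:
    """Tile the image with windows of side `size` that overlap by `overlap`."""
    stride = max(1, int(size * (1.0 - overlap)))

    def starts(extent: int) -> List[int]:
        if extent <= size:
            return [0]
        positions = list(range(0, extent - size, stride))
        positions.append(extent - size)  # last window sits flush with the border
        return positions

    return [(x, y, min(x + size, width), min(y + size, height))
            for y in starts(height) for x in starts(width)]


def threshold_sliced_inference(
    width: int,
    height: int,
    detect: Callable[[Window], List[Box]],  # hypothetical detector wrapper
    count_threshold: int,                   # assumed density criterion
    size: int = 512,
    overlap: float = 0.2,
) -> List[Box]:
    """Run a full-image pass; add sliced passes only if the scene looks dense.

    `detect` is assumed to return boxes in window-local coordinates.
    """
    full = detect((0, 0, width, height))
    if len(full) < count_threshold:
        return list(full)  # sparse scene: skip slicing entirely

    merged = list(full)
    for wx0, wy0, wx1, wy1 in make_windows(width, height, size, overlap):
        for x0, y0, x1, y1, score in detect((wx0, wy0, wx1, wy1)):
            # Shift window-local boxes back into full-image coordinates.
            merged.append((x0 + wx0, y0 + wy0, x1 + wx0, y1 + wy0, score))
    return merged  # a real pipeline would de-duplicate overlapping boxes here
```

Gating the sliced passes on a threshold keeps inference cost close to a single forward pass for sparse images, while dense images receive the additional higher-resolution passes.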