Keywords: padding, convolution, spatial bias
TL;DR: Reproduction of the main results of the paper "Mind the Pad" with additional ablation studies.
Abstract: Scope of Reproducibility
The convolution mechanism is widely adopted for a large variety of tasks such as image classification and object detection.
Alsallakh et al. [1] demonstrate that this mechanism has some flaws caused by padding. Our aim is to reproduce the
following results:
• The Single Shot Detector's blind spot on small-object detection
• The fix for that blind spot obtained by changing the padding mode from zeros to reflect
• Uneven application of padding in downsampling convolutional layers causes feature-map erosion and lower
accuracy.
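The uneven-padding effect can be sketched with a quick calculation (an illustrative sketch, not the paper's or our repository's code): with a 3×3, stride-2 convolution and padding 1, an even input width leaves the rightmost padded column unread, while widening the input by one pixel makes every padded column visible.

```python
# Sketch: which padded columns a 3x3, stride-2 convolution with padding 1
# actually reads, as a function of the input width.
def covered_padded_columns(width, kernel=3, stride=2, pad=1):
    padded = width + 2 * pad                       # width after padding
    out = (padded - kernel) // stride + 1          # output width
    read = set()
    for o in range(out):
        start = o * stride
        read.update(range(start, start + kernel))  # columns read by this window
    return padded, sorted(set(range(padded)) - read)

print(covered_padded_columns(32))  # (34, [33]) -> right padded column never read
print(covered_padded_columns(33))  # (35, [])   -> all padded columns are read
```

This is exactly the asymmetry the paper attributes to downsampling layers: for even input sizes the padding is effectively applied only on the left/top.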
Once we had reproduced these results, we performed a series of ablation studies to understand the effect of related factors in a
CNN:
• How does Batch Normalization interact with uneven application of padding?
• Which configuration among 1) padding modes {zeros, reflect, circular, replicate}, 2) with/without Batch
Normalization, and 3) with/without uneven application of padding is the most shift-robust on image classification?
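For reference, the four padding modes extend a border as follows. In PyTorch they are selected via `nn.Conv2d(..., padding_mode=...)`; the sketch below uses the equivalent NumPy `np.pad` modes purely for illustration (an assumption, not the experiment code):

```python
# Sketch: the four padding modes compared on a 1-D border. PyTorch's
# Conv2d padding_mode values map to these np.pad modes:
#   zeros -> constant, reflect -> reflect, circular -> wrap, replicate -> edge
import numpy as np

row = np.array([1, 2, 3, 4])
for torch_mode, np_mode in [("zeros", "constant"), ("reflect", "reflect"),
                            ("circular", "wrap"), ("replicate", "edge")]:
    print(f"{torch_mode:9s}", np.pad(row, 1, mode=np_mode))
# zeros     [0 1 2 3 4 0]
# reflect   [2 1 2 3 4 3]
# circular  [4 1 2 3 4 1]
# replicate [1 1 2 3 4 4]
```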
Methodology
To reproduce the paper's claims we implemented all the experiments from scratch in order to obtain more reliable results.
The only external resource used in this article is a PyTorch implementation of the Single Shot Detector made by NVIDIA
[6]. Furthermore, since the paper's thesis concerns not a specific implementation but a class of models, our
implementations fall inside the same category but with a different configuration. We did so to stress the paper's claims
and to confirm their general validity.
To train the 48 models for image classification, we used a local NVIDIA GeForce GTX 1660 with 6 GB of memory, 8
GB of RAM, and an AMD Ryzen 5 2600X (12) @ 3.600GHz. Training took approximately 20 hours. The reason for
training this many models is the full combination of: 1) 4 padding modes, 2) with/without Batch Normalization, 3) with/without uneven
application of padding (i.e. input images zero/one-padded), and 4) 3 random seeds.
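The factorial grid above can be enumerated directly (a minimal sketch with hypothetical variable names, not our training script), confirming the model count:

```python
# Sketch: enumerate the full factorial grid of training configurations.
from itertools import product

padding_modes = ["zeros", "reflect", "circular", "replicate"]
batch_norm    = [True, False]
uneven_pad    = [True, False]   # original vs one-padded input images
seeds         = [0, 1, 2]

configs = list(product(padding_modes, batch_norm, uneven_pad, seeds))
print(len(configs))  # 4 * 2 * 2 * 3 = 48
```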
Results
During our experiments, we found the same blind spots as the paper's authors but at a different location: for us, the blind
spot was close to the right border, while the paper reports it at the top border. We fixed the blind-spot issue by using reflect
padding instead of zeros, as the paper does. Regarding uneven application of padding, we also observed a performance improvement
when comparing models with and without uneven application of padding, but with a different delta. In our experiments, the
accuracy improvement is around 0.8%, versus an average of 0.4% for the original paper. We believe the cause is
the different model architecture and dataset: in this article, we adopted a simple sequential CNN classifier on Letter
MNIST, while the paper uses well-known architectures like ResNet on ImageNet.
What was easy
• Obtaining the paper's results on the image classifier's uneven application of padding worked on the first attempt.
• Once a good implementation of SSD was found, reproducing the evaluation results was immediate.
What was difficult
• Finding an image-classifier architecture suitable for the uneven-padding tests, as in the article, i.e. one where with the
original input size the downsampling layers do not see the right padded border, while with one-padded images the
downsampling layers see all padded borders.
• Changing the padding mode of convolutional layers or disabling Batch Normalization layers, especially in TensorFlow
models.
• Understanding how to plot the SSD's object confidence for all zones of the image.
Communication with original authors
There was no contact with the original authors while preparing this article.
Paper URL: https://arxiv.org/pdf/2010.02178v1.pdf
Supplementary Material: zip