Keywords: Hate speech, SOC, Post-hoc explanation, regularization
Abstract: Scope of Reproducibility
On the GHC dataset, the most important difference between BERT+WR and BERT+SOC is the increase in recall. On the Stormfront dataset, there are similar improvements on in-domain data and on the NYT dataset. To verify these claims, we also ran the same experiment on a new dataset.
Methodology
We re-implemented the authors' code and attempted to verify the claims made in the original paper. We ran our experiments on an NVIDIA Tesla GPU, which was less efficient than the hardware used by the original authors (NVIDIA GeForce RTX 2080 Ti).
Results
We were able to reproduce the claims listed as points 2 and 3 in Section 2 (Scope of Reproducibility). However, our results do not agree with the authors' for the experiments listed as points 1 and 4 in the same section.
What was easy
The original authors provide code for most of the experiments presented in the paper. The code was easy to run and allowed us to verify the correctness of our re-implementation. The explanations in the code made our work considerably easier.
What was difficult
Training the models was very time-consuming: we had to wait hours for each model to train, and the hardware used by the original authors is not readily available everywhere.
Communication with original authors
We were in contact with the second author via e-mail; he was responsive and shared details that were not explicitly mentioned in the paper.
Paper Url: https://openreview.net/forum?id=M2bWgAjefM&referrer=%5BML%20Reproducibility%20Challenge%202020%5D(%2Fgroup%3Fid%3DML_Reproducibility_Challenge%2F2020)
Supplementary Material: zip