Abstract: In this paper, we reproduce some of the text-classification experiments of Howard and Ruder (2018), fine-tuning a pre-trained language model on the six English data-sets described in that paper (verification). We then investigate the applicability of the model as-is (pre-trained on English) by conducting additional experiments on three non-English data-sets that are not in the original paper (extension). For the verification experiments, we did not reproduce the exact numbers of the original paper; however, our replication results fall within the same range as the baselines reported for comparison. We attribute this gap to limited computational resources, which forced us to use smaller batch sizes and fewer epochs. Otherwise, we followed the authors' approach to the best of our abilities (e.g., the same libraries, tutorials, hyper-parameters, and transfer-learning methodology). We report implementation details as well as lessons learned in the appendices.
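For concreteness, the sketch below illustrates the ULMFiT-style transfer-learning recipe the paper follows, using the fastai library (version 2 API assumed). The data-set (fastai's IMDb sample) and all hyper-parameter values shown are illustrative assumptions, not the exact settings from our runs.

```python
from fastai.text.all import *
import pandas as pd

# Small labelled corpus for illustration (not one of the paper's data-sets).
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')

# Step 1: fine-tune the pre-trained AWD-LSTM language model on the target corpus.
dls_lm = TextDataLoaders.from_df(df, text_col='text', is_lm=True)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fine_tune(1, 1e-2)          # epochs and learning rate are assumed values
learn_lm.save_encoder('ft_enc')      # keep the fine-tuned encoder for the classifier

# Step 2: build the classifier on the same vocabulary and load the encoder.
dls_clas = TextDataLoaders.from_df(df, text_col='text', label_col='label',
                                   text_vocab=dls_lm.vocab)
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('ft_enc')

# Step 3: gradual unfreezing with discriminative learning rates, as in ULMFiT.
learn.fit_one_cycle(1, 2e-2)                          # train only the new head
learn.freeze_to(-2)                                   # unfreeze the last RNN layer
learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))
learn.unfreeze()                                      # unfreeze everything
learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3))
```

The `slice(lr/(2.6**4), lr)` idiom spreads lower learning rates over earlier layers, the discriminative fine-tuning proposed by Howard and Ruder (2018); the 2.6 decay factor is the value recommended in their paper.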