Keywords: prompt engineering, label engineering, multi-label, transformers, language, LLM, text, few-shot, automatic, prompting, classification, NLP, natural language processing
TL;DR: Reproducing results and trying new experiments on a natural language processing classification method using pre-trained large language models.
Abstract: We reproduce the results of the paper Automatic Multi‐Label Prompting: Simple and Interpretable Few‐Shot Classification by Wang et al. Manual prompt engineering for text classification can be slow and expensive. The authors proposed an approach, called AMuLaP, for automatic label prompting in few‐shot classification, and claimed competitive performance on the GLUE benchmark. We reproduce their results on 3 GLUE datasets and extend them to 2 new datasets. We used the authors’ code, with some additions to accommodate our new datasets and other models. We ran all experiments using a combination of hyperparameters given by the original authors over 5 seeds, mainly on an RTX 8000 for 125 hours. We validate the original paper’s claims by reproducing its metrics within the reported standard deviation, confirming the method’s competitiveness. Our extended trials highlight the method’s potential applicability to real‐world data and reveal new considerations about the prompt template, language model, and seed needed for optimal performance.

The complete code implementation was publicly available, with step‐by‐step instructions on how to get AMuLaP running on the GLUE datasets. This made it easy to begin reproducing the work. Finding flagged and non‐flagged tweets from Donald Trump to create our dataset was easy thanks to thetrumparchive.com, which compiles them.

The original code is lengthy and lacks information on implementation dependencies, making it time‐consuming to understand. It is written only for GLUE tasks and RoBERTa models, and needs modification to work on new experiments. Some language models are not built for mask completion and are thus not directly suitable for AMuLaP, which required us to search extensively for solutions. Finally, as label engineering is tied to prompt engineering, we found extending the work through the template given in the code challenging without prior experience in manual prompt engineering.
We did not feel the need to reach out to the authors, as the method is well explained and, together with a reading of the code, simple to understand with basic probability knowledge.
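For readers unfamiliar with the method, the core probability idea behind AMuLaP-style label selection can be sketched in a few lines. This is a toy illustration with a synthetic vocabulary and hand-made [MASK] distributions standing in for a real masked language model's output; the token names, the greedy de-duplication across classes, and all function names are our simplifications, not the authors' implementation:

```python
import numpy as np

# Toy vocabulary; in practice these would be tokens from the MLM's vocabulary.
VOCAB = ["great", "terrible", "good", "bad", "okay", "fine"]

def select_label_tokens(dists, labels, k=2):
    """For each class, average the [MASK]-token distributions of its few-shot
    examples and keep the top-k tokens as that class's label set.
    Tokens already claimed by an earlier class are skipped (a simplified
    way of keeping label sets disjoint)."""
    label_map = {}
    taken = set()
    for c in sorted(set(labels)):
        mean = np.mean([d for d, y in zip(dists, labels) if y == c], axis=0)
        order = np.argsort(mean)[::-1]          # tokens ranked by mean probability
        chosen = [i for i in order if i not in taken][:k]
        taken.update(chosen)
        label_map[c] = chosen
    return label_map

def classify(dist, label_map):
    """Score each class by summing the [MASK] probabilities of its label
    tokens, and predict the highest-scoring class."""
    scores = {c: sum(dist[i] for i in toks) for c, toks in label_map.items()}
    return max(scores, key=scores.get)
```

With a real model, `dists` would come from the softmax over the vocabulary at the [MASK] position of a prompted input (e.g. "<text> It was [MASK]."), which is why models without a mask-completion head need extra work to fit this scheme.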
Paper Url: https://aclanthology.org/2022.naacl-main.401.pdf
Paper Review Url: https://openreview.net/pdf?id=D8DJN2-Zmkf
Paper Venue: ACL 2022
Supplementary Material: zip
Confirmation: The report pdf is generated from the provided camera ready Google Colab script, The report metadata is verified from the camera ready Google Colab script, The report contains correct author information, The report contains link to code and SWH metadata, The report follows the ReScience latex style guides as in the Reproducibility Report Template (https://paperswithcode.com/rc2022/registration), The report contains the Reproducibility Summary in the first page, The latex .zip file is verified from the camera ready Google Colab script
Journal: ReScience Volume 9 Issue 2 Article 33