Keywords: BERT, fastText, embeddings, language models
TL;DR: The paper describes a reproducibility attempt for a study investigating static and contextualized word embeddings generated with fastText and BERT models for 4+1 language varieties.
Abstract: This report summarizes our efforts to reproduce the results presented in the ACL 2021 paper "Exploring the Representation of Word Meanings in Context: A Case Study on Homonymy and Synonymy" by Marcos Garcia.
Scope of Reproducibility. The original author examines both static and contextualized word embeddings to assess how well they represent lexical-semantic relations such as homonymy and synonymy. While the author describes experiments with a number of contextualized and static models, we limit our reproducibility attempt to the results reported for BERT and fastText. We also extend the original experiment by compiling a new Italian dataset, and we report our findings on this additional resource.
Methodology. We rely on the existing code-base, modifying it where necessary and complementing it with a few additional scripts for data preparation and statistics computation. Our code is available at https://anonymous.4open.science/r/repro-acl21/.
Results. We only partially reproduce the original scores. Nonetheless, our results still corroborate the hypotheses formulated by the original author.
What was easy. Overall, the paper is clear and provides a good overview of the experiments. It describes the structure of the datasets and how they were compiled, and the author makes them publicly available together with a working code-base at https://github.com/marcospln/homonymy_acl21.
What was difficult. An amended version of the original paper, with additional details about the experiments, is available on arXiv. We initially relied on the ACL version, which led to some minor issues during the reproducibility attempt. The code-base does not include the script used to compute the reported statistics; upon request, the author provided a preliminary version, which we re-implemented. Lastly, because of some minor bugs and missing information about the library versions used, a few changes to the original code-base were necessary.
Communication with original authors. We exchanged a number of emails with the author to discuss implementation details and discrepancies between the original and the reproduced results. We received prompt and helpful responses to all of our questions.
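The core contrast behind the study is that a static model such as fastText assigns one vector per word type regardless of context, whereas a contextualized model such as BERT produces a different vector for each occurrence. The sketch below illustrates why this matters for homonymy. It is a toy illustration only: the vectors, sentences, and helper names (`static_embedding`, `contextual_embedding`, `prefers_same_sense`) are hypothetical, the vectors are invented rather than produced by either model, and the triple-style check is loosely in the spirit of the original comparisons, not the evaluation implemented in the original code-base.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical static lookup table: one vector per word type,
# as in any context-independent model (values invented).
STATIC = {"bank": [0.5, 0.5, 0.1]}


def static_embedding(word, sentence):
    # The sentence is ignored: every occurrence gets the same vector.
    return STATIC[word]


# Hypothetical contextualized vectors: one vector per occurrence,
# keyed by (word, sentence). Values are invented for illustration only.
CONTEXTUAL = {
    ("bank", "she sat on the river bank"): [0.9, 0.1, 0.2],
    ("bank", "he fished from the bank"): [0.8, 0.2, 0.1],
    ("bank", "the bank approved the loan"): [0.1, 0.9, 0.3],
}


def contextual_embedding(word, sentence):
    return CONTEXTUAL[(word, sentence)]


def prefers_same_sense(embed, word, anchor, same_sense, other_sense):
    """Hypothetical triple check: True if the occurrence in `same_sense`
    is strictly closer to the `anchor` occurrence than the one in `other_sense`."""
    a = embed(word, anchor)
    return cosine(a, embed(word, same_sense)) > cosine(a, embed(word, other_sense))


SENTS = (
    "she sat on the river bank",   # anchor: 'bank' as riverside
    "he fished from the bank",     # same sense as the anchor
    "the bank approved the loan",  # different sense (financial institution)
)

print(prefers_same_sense(static_embedding, "bank", *SENTS))      # False
print(prefers_same_sense(contextual_embedding, "bank", *SENTS))  # True
```

With a static model both similarities are identical, so the strict comparison can never favour the same-sense pair; only occurrence-level vectors give the model a chance to separate the two senses of a homonym.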
Paper Url: https://aclanthology.org/2021.acl-long.281/
Paper Venue: Other venue (not in list)
Venue Name: ACL 2021
Confirmation: The report pdf is generated from the provided camera ready Google Colab script, The report metadata is verified from the camera ready Google Colab script, The report contains correct author information, The report contains link to code and SWH metadata., The report follows the ReScience LaTeX style guides as in the Reproducibility Report Template (https://paperswithcode.com/rc2022/registration)., The report contains the Reproducibility Summary in the first page., The LaTeX .zip file is verified from the camera ready Google Colab script
Journal: ReScience Volume 9 Issue 2 Article 5