Automatic Dictionary Generation: Could Brothers Grimm Create a Dictionary with BERT?

22 Sept 2022 (modified: 25 Oct 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Keywords: dictionary generation, natural language processing, transformers
Abstract: The creation of the most famous German dictionary, the "Deutsches Wörterbuch" (in English, "The German Dictionary"), by the brothers Jacob and Wilhelm Grimm took more than a lifetime to complete (1838–1961). In this work we ask whether they could have created a dictionary using present-day technology, i.e., language models such as BERT. Starting from a definition of the task of Automatic Dictionary Generation, we propose a method based on contextualized word embeddings and hierarchical clustering that creates a dictionary from unannotated text corpora. We justify our design choices by running variants of our method on English texts for which ground-truth dictionaries are available. Finally, we apply our approach to Shakespeare's work and automatically generate a dictionary tailored to Shakespearean vocabulary and contexts, without human intervention.
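The abstract names two ingredients, contextualized word embeddings and hierarchical clustering, without giving details. The following is only a minimal sketch of how such a pipeline might group the occurrences of one word into candidate senses; it is not the authors' implementation. The three-dimensional vectors are hand-made stand-ins for BERT embeddings, and the helper `cluster_senses`, the average-linkage criterion, the cosine distance, and the threshold of 0.5 are all assumptions made for illustration.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster_senses(vectors, threshold):
    """Agglomerative clustering (hypothetical helper, not from the paper).

    Starts with one cluster per occurrence and repeatedly merges the two
    closest clusters until the closest pair is farther apart than `threshold`.
    Returns a list of clusters, each a list of occurrence indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy occurrences of "bank". The vectors are invented placeholders; in a real
# pipeline they would come from a language model such as BERT.
occurrences = [
    ("She sat on the river bank.", [0.9, 0.1, 0.0]),
    ("The bank raised its interest rates.", [0.1, 0.9, 0.2]),
    ("They fished from the muddy bank.", [0.8, 0.2, 0.1]),
    ("He deposited cash at the bank.", [0.0, 1.0, 0.1]),
]
vectors = [vec for _, vec in occurrences]
senses = cluster_senses(vectors, threshold=0.5)

for n, members in enumerate(senses, start=1):
    print(f"bank, sense {n}:")
    for i in members:
        print("   ", occurrences[i][0])
```

On this toy input the two river sentences end up in one cluster and the two finance sentences in another; each cluster could then serve as one candidate dictionary sense with its member sentences as usage examples. The threshold controls how many senses emerge, so it would need tuning against a reference dictionary.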
Submission Area: Applications (e.g., speech processing, computer vision, NLP)
Supplementary Material: zip