Morphology Informed Selections for Subword Vocabulary Size


16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission
Abstract: Current guidance on selecting an optimal or appropriate subword vocabulary size is incomplete and, at best, confusing. Using a measure of subword-morpheme overlap, our analysis shows that a "sweet spot" exists for a morphology-informed subword vocabulary size. This sweet spot varies somewhat with text complexity and with the morphological characteristics of a language; however, it remains relatively constant with respect to corpus size.
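The abstract does not define its subword-morpheme overlap measure. Purely as an illustrative sketch, assume the measure is boundary F1: the split points a subword tokenizer places inside each word are compared against gold morpheme boundaries, and the "sweet spot" is taken to be the vocabulary size with the highest score. The functions `boundary_positions`, `boundary_f1`, and `sweet_spot`, the choice of boundary F1, and the toy segmentations below are all hypothetical; they are not the authors' method or data.

```python
from typing import Dict, List, Set


def boundary_positions(segments: List[str]) -> Set[int]:
    """Return the internal split offsets of one segmented word.

    For ["un", "break", "able"] this gives {2, 7}: the character offsets
    where one segment ends and the next begins.
    """
    positions: Set[int] = set()
    offset = 0
    for seg in segments[:-1]:  # the final segment ends the word, not a boundary
        offset += len(seg)
        positions.add(offset)
    return positions


def boundary_f1(subwords: List[List[str]], morphemes: List[List[str]]) -> float:
    """Boundary F1 between subword and gold morpheme segmentations.

    Assumed overlap measure (hypothetical), micro-averaged over all words.
    """
    tp = fp = fn = 0
    for sw, mo in zip(subwords, morphemes):
        assert "".join(sw) == "".join(mo), "both segmentations must spell the same word"
        pred, gold = boundary_positions(sw), boundary_positions(mo)
        tp += len(pred & gold)   # boundaries both agree on
        fp += len(pred - gold)   # subword splits inside a morpheme
        fn += len(gold - pred)   # morpheme boundaries the tokenizer missed
    if tp + fp + fn == 0:
        return 1.0  # no boundaries anywhere: treat as perfect agreement
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sweet_spot(scores: Dict[int, float]) -> int:
    """Return the vocabulary size whose overlap score is highest."""
    return max(scores, key=lambda size: scores[size])


if __name__ == "__main__":
    # Hypothetical gold morphemes and toy tokenizer outputs per vocabulary size.
    gold = [["un", "break", "able"], ["walk", "ed"]]
    segs = {
        8000: [["unbreak", "able"], ["walked"]],
        16000: [["un", "break", "able"], ["walk", "ed"]],
        32000: [["unbreakable"], ["walked"]],
    }
    scores = {size: boundary_f1(s, gold) for size, s in segs.items()}
    print(scores)              # {8000: 0.5, 16000: 1.0, 32000: 0.0}
    print(sweet_spot(scores))  # 16000
```

In this toy setup, the small vocabulary merges across one morpheme boundary and misses another, while the large vocabulary keeps whole words intact, so the intermediate size scores best. Real segmentations would come from training a tokenizer such as BPE or a unigram model at each candidate size on the same corpus.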
