Length Dependence of Vocabulary Richness

Published: 20 Mar 2023, Last Modified: 30 Mar 2023
NoDaLiDa 2023
Readers: Everyone
Keywords: computational linguistics, vocabulary richness
TL;DR: We search for a length-independent measure of vocabulary richness, but find that the form of the dependence itself may be more interesting.
Abstract: The relation between the length of a text and the number of unique words is investigated using several Swedish language corpora. We consider a number of existing measures of vocabulary richness, show that they are not length-independent, and try to improve on some of them based on statistical evidence. We also look at the spectrum of values over text lengths, and find that genres have characteristic shapes.
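The length dependence described in the abstract can be illustrated with a minimal sketch. It assumes the type-token ratio (unique words divided by total words) as the measure; the abstract does not name the measures the paper examines, so this choice, the toy English text, and the helper names `type_token_ratio` and `ttr_by_length` are illustrative assumptions only, not the authors' method.

```python
def type_token_ratio(tokens):
    """Number of unique words (types) divided by number of words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ttr_by_length(tokens, lengths):
    """Type-token ratio of the first n tokens, for each n that fits in the text."""
    return {n: type_token_ratio(tokens[:n]) for n in lengths if 0 < n <= len(tokens)}


# Toy text (an assumption for illustration, not data from the paper):
# a 13-word sentence repeated three times, split on whitespace.
text = "the cat sat on the mat and the dog sat on the rug " * 3
tokens = text.split()

# Measure the same text at increasing prefix lengths.
curve = ttr_by_length(tokens, [5, 10, 20, 39])
for n, ttr in curve.items():
    print(f"first {n:2d} tokens: TTR = {ttr:.3f}")
```

On this toy input the ratio falls as the prefix grows, because new word types appear more slowly than new tokens. That is the kind of length dependence that makes it hard to compare raw scores across texts of different sizes.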