Abstract: Human-assigned concreteness ratings for words are commonly used in psycholinguistic and computational linguistic
studies. Previous research has shown that such ratings can be modeled and extrapolated by using dense word-embedding
representations. However, due to rater disagreement, a considerable proportion of the human ratings in published datasets is not
reliable. We investigate how such unreliable data influences modeling of concreteness with word embeddings. Study 1
compares fourteen embedding models over three datasets of concreteness ratings, showing that most models achieve high
correlations with human ratings and exhibit low prediction error. Study 2 investigates how excluding the less
reliable ratings influences the modeling results and indicates that cleaning the data improves them.
Study 3 extends study 2 with additional conditions and indicates that the improvement holds only for the cleaned
data; in the general case, removing the less reliable data points is not useful.