FeelsGoodMan: Inferring Semantics of Twitch Neologisms


16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone
Abstract: Twitch chat messages pose a unique problem in natural language understanding because of the prevalence of neologisms, specifically emotes. There are 8.06 million emotes in total, over 400k of which were observed during the study period. Virtually no information exists on the meaning or sentiment of emotes, and with a constant influx of new emotes and drift in both their frequencies and their perceived meanings, maintaining an up-to-date manually labeled dataset becomes impossible. Our paper makes a two-fold contribution. First, we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous benchmark by 7.36 percentage points. Second, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. This framework allows us to auto-generate an emote pseudo-dictionary, and we show that we can nearly match the supervised benchmark above, even when injecting such emote knowledge into sentiment classifiers trained on extraneous datasets such as movie reviews or Twitter.
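
To make the idea of an embedding-based k-NN pseudo-dictionary concrete, the following is a minimal sketch and not the authors' implementation. It assumes each token already has an embedding vector and that a small sentiment lexicon of ordinary words is available; the toy vectors, the lexicon, the choice of cosine similarity with majority voting, and the function names `knn_label` and `build_pseudo_dictionary` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_label(token, embeddings, lexicon, k=3):
    """Label a token by majority vote over its k nearest labeled neighbors."""
    query = embeddings[token]
    scored = [
        (cosine(query, embeddings[word]), word)
        for word in lexicon
        if word in embeddings and word != token
    ]
    scored.sort(reverse=True)  # most similar neighbors first
    votes = Counter(lexicon[word] for _, word in scored[:k])
    if not votes:
        return None  # no labeled neighbor available
    return votes.most_common(1)[0][0]


def build_pseudo_dictionary(emotes, embeddings, lexicon, k=3):
    """Map each emote that has an embedding to an inferred sentiment label."""
    return {
        emote: knn_label(emote, embeddings, lexicon, k)
        for emote in emotes
        if emote in embeddings
    }


# Toy 2-D embeddings (assumption: real ones would be learned from chat logs).
toy_embeddings = {
    "great": [0.9, 0.1], "love": [0.8, 0.2], "happy": [0.85, 0.15],
    "awful": [-0.9, 0.1], "hate": [-0.8, 0.3], "sad": [-0.7, 0.2],
    "FeelsGoodMan": [0.88, 0.12], "FeelsBadMan": [-0.85, 0.2],
}
# Toy lexicon of in-vocabulary words with known sentiment (assumption).
toy_lexicon = {
    "great": "positive", "love": "positive", "happy": "positive",
    "awful": "negative", "hate": "negative", "sad": "negative",
}

pseudo_dictionary = build_pseudo_dictionary(
    ["FeelsGoodMan", "FeelsBadMan"], toy_embeddings, toy_lexicon
)
print(pseudo_dictionary)
```

Because the labels come only from neighbors in embedding space, no emote needs manual annotation, which is what would allow such a dictionary to be regenerated as new emotes appear or meanings drift.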
