Assessing the Effects of Lemmatisation and Spell Checking on Sentiment Analysis of Online Reviews

Published: 2023, Last Modified: 11 Jun 2024, Venue: ICSC 2023, License: CC BY-SA 4.0
Abstract: With many text preprocessing options available, choosing an efficient pipeline matters for both accuracy and computational cost. Online text often contains non-standard English, spelling errors, colloquialisms, emojis, slang and other variations that affect current natural language processing tools, and there are no clear guidelines for preprocessing this type of text. In this work we analyse text preprocessing techniques using a dataset of online reviews scraped from the iTunes and Google Play stores. The objective is to measure the efficacy of different combinations of these techniques in maximising the amount of sentiment detected in a dataset of 438,157 reviews. Sentiment detection was performed by two state-of-the-art sentiment analysers (RoBERTa and VADER). Statistical analysis of the results suggests preprocessing strategies for maximising the sentiment detected within mental health app reviews and similar text formats.
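As a rough illustration of what comparing combinations of preprocessing techniques can look like, the following minimal Python sketch enumerates every subset of two steps (spell checking and lemmatisation) and reports the share of reviews in which any sentiment is detected. Everything in it is an assumption made for illustration: the lookup tables, the lexicon scorer, the sample reviews and the function names (`spell_check`, `lemmatise`, `detected_share`, `compare_pipelines`) are toy stand-ins, not the paper's tools, data or results. In practice the scorer would be replaced by an analyser such as VADER or RoBERTa.

```python
from itertools import combinations

# Toy stand-ins (assumptions for illustration only, not the paper's resources).
SPELL_FIXES = {"gret": "great", "awsome": "awesome"}
LEMMAS = {"loved": "love", "crashed": "crash", "helps": "help"}
LEXICON = {"great": 1.0, "awesome": 1.0, "love": 1.0, "help": 0.5, "crash": -1.0}


def spell_check(tokens):
    """Replace known misspellings using a tiny lookup table."""
    return [SPELL_FIXES.get(t, t) for t in tokens]


def lemmatise(tokens):
    """Map inflected forms to a base form using a tiny lookup table."""
    return [LEMMAS.get(t, t) for t in tokens]


# Dictionary order fixes the order in which steps run within a pipeline.
STEPS = {"spell_check": spell_check, "lemmatise": lemmatise}


def score(tokens):
    """Toy lexicon scorer: sum of word polarities, 0.0 means no sentiment found."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def detected_share(reviews, step_names):
    """Fraction of reviews with a non-zero score after applying the given steps."""
    detected = 0
    for review in reviews:
        tokens = review.lower().split()
        for name in step_names:
            tokens = STEPS[name](tokens)
        if score(tokens) != 0.0:
            detected += 1
    return detected / len(reviews)


def compare_pipelines(reviews):
    """Evaluate every subset of STEPS, including the empty (no preprocessing) one."""
    names = list(STEPS)
    return {
        combo: detected_share(reviews, combo)
        for r in range(len(names) + 1)
        for combo in combinations(names, r)
    }


# Hypothetical sample reviews, invented for this sketch.
reviews = [
    "gret app",
    "it crashed twice",
    "loved it",
    "awsome and helps me sleep",
    "just an app",
]
results = compare_pipelines(reviews)
for combo, share in results.items():
    print(combo or ("none",), share)
```

On this toy data each added step raises the share of reviews with detected sentiment, but that is a property of the invented examples, not a finding. A real comparison would also need a statistical test across pipelines and would treat step order as a further variable.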