Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis

Sandeep Sricharan Mukku, Subba Reddy Oota, Radhika Mamidi

Published: 2017, Last Modified: 16 Jun 2023, DaWaK 2017, Readers: Everyone

Abstract: Sentiment analysis is one of the most active research areas in natural language processing and an extensively studied problem in data mining, web mining, and text mining for the English language. With the proliferation of social media, the volume of data in regional languages is growing alongside English. Telugu is one such regional language with abundant data available on social media, but labeled training sets are hard to find because human annotation is time-consuming and costly. To address this issue, this paper investigates the practicality of active learning for Telugu sentiment analysis. We built a hybrid approach that combines different query selection strategy frameworks to obtain more accurate training instances from limited labeled data. Using a set of classifiers such as SVM, XGBoost, and Gradient Boosted Trees (GBT), we achieved promising results with a minimal error rate.
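As an illustration only: the abstract does not name the individual query selection strategies or describe how they are combined. The sketch below assumes three common uncertainty-based scores (least confidence, margin, and entropy) and shows how one of them could rank an unlabeled pool for annotation. The function names, the `pool_probs` values, and the batch size `k` are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical uncertainty scores; higher means "more worth labeling".
# These are common active-learning heuristics, assumed here for illustration.

def least_confidence(probs):
    # 1 minus the probability of the most likely class.
    return 1.0 - max(probs)

def margin(probs):
    # Small gap between the top two classes means high uncertainty.
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def entropy(probs):
    # Shannon entropy of the predicted class distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(pool_probs, score_fn, k):
    # Return indices of the k pool items with the highest uncertainty score.
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score_fn(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]

# Made-up class probabilities (positive, negative) for three unlabeled sentences.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
]

# Pick the single most uncertain sentence to send to a human annotator.
picked = select_queries(pool_probs, least_confidence, k=1)
```

In a full loop, the selected items would be labeled, added to the training set, and the classifier retrained before the next round of selection. A hybrid scheme could choose among several such scoring functions per round, but how the paper does this is not stated in the abstract.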
