What to track on the Twitter streaming API?: a knapsack bandits approach to dynamically update the search terms

Abstract: We use the Twitter streaming API for many purposes, such as monitoring brands and discovering events. Because the Twitter streaming API only allows tracking words (commonly called 'search terms'), the data collection goal needs to be formulated in terms of search terms. Twitter limits the number of search terms that can be tracked using the API, and the number of tweets retrieved per search term depends on the terms being tracked. It is therefore crucial to track a small set of highly relevant terms. Because social media is very dynamic and conversations evolve fast, search terms that are relevant now might be less useful in as little as an hour. Manually monitoring such discussions to update the search terms is cumbersome, error-prone, and expensive. Can an algorithm update the search terms based on the goals of the data collection? Taking inspiration from the knapsack bandits problem, which effectively handles exploration (trying new search terms) and exploitation (continuing to use useful search terms) when resources (network bandwidth, disk capacity, or number of search terms) are constrained, we propose a new approach to dynamically update the search terms based on the goals of the data collection.
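To make the exploration/exploitation trade-off under a resource budget concrete, here is a minimal Python sketch of one possible selection rule. It is not the method proposed in this work: it swaps in a plain UCB1-style score combined with a greedy reward-per-cost knapsack heuristic. All names (`ucb_score`, `select_terms`, `update`), the per-term `costs`, the `budget`, and the reward definition (for example, the fraction of retrieved tweets judged relevant) are hypothetical assumptions made for illustration.

```python
import math


def ucb_score(mean_reward, pulls, round_no, c=2.0):
    """UCB1-style optimistic estimate of a term's usefulness (assumed scoring rule)."""
    if pulls == 0:
        # Never-tracked terms get top priority, which forces exploration.
        return float("inf")
    return mean_reward + math.sqrt(c * math.log(round_no) / pulls)


def select_terms(stats, costs, budget, round_no):
    """Greedily pick terms by optimistic reward per unit cost until the budget is spent.

    stats:  {term: (mean_reward, pulls)}  observed so far (hypothetical structure)
    costs:  {term: positive cost}         e.g. 1 per term, or expected tweet volume
    budget: total cost allowed in this round
    """
    ranked = sorted(
        stats,
        key=lambda term: ucb_score(*stats[term], round_no) / costs[term],
        reverse=True,
    )
    chosen, spent = [], 0.0
    for term in ranked:
        if spent + costs[term] <= budget:
            chosen.append(term)
            spent += costs[term]
    return chosen


def update(stats, term, reward):
    """Fold a newly observed reward into the running mean for a term."""
    mean, pulls = stats[term]
    pulls += 1
    stats[term] = (mean + (reward - mean) / pulls, pulls)
```

In such a loop, each round would call `select_terms` to choose what to track, observe a reward per tracked term, and call `update`. Setting every cost to 1 and the budget to the API's term limit models the "number of search terms" constraint; other cost definitions could stand in for bandwidth or disk capacity.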