Abstract: The online influence maximization (OIM) problem aims to sequentially learn an optimal policy for selecting seed nodes that maximize the cumulative spread of information (influence) in a diffusion medium over a multi-round diffusion campaign. We consider the sub-class of OIM problems where (i) the reward of a given round of the ongoing campaign consists only of the new activations (those not observed in previous rounds), and (ii) the round’s context and the historical data from previous rounds can be exploited to learn the best policy. This problem is directly motivated by real-world scenarios of information diffusion in influencer marketing, where (i) only a target user’s first (unique) activation is of interest, and this activation persists as an acquired, latent one throughout the campaign, and (ii) valuable side-information is available to the learning agent. We call this OIM formulation Episodic Contextual Influence Maximization with Persistence (ECIMP). We propose the algorithm LSVI-GT-UCB, which implements the principle of optimism in the face of uncertainty for episodic reinforcement learning with linear approximation. For each seed node, the learning agent estimates the node's remaining potential with a Good-Turing estimator, modified by an estimated Q-function. In experiments on two real-world datasets and one synthetically generated dataset, the algorithm outperforms state-of-the-art methods.
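To make the notion of "remaining potential" concrete, the following is a minimal sketch of a classical Good-Turing style estimate for a single seed node. It is an illustration under stated assumptions, not the LSVI-GT-UCB estimator itself: the function name `good_turing_remaining_potential` and the input format (one set of activated nodes per round in which the seed was selected) are hypothetical, and the Q-function modification mentioned above is omitted.

```python
from collections import Counter


def good_turing_remaining_potential(spreads):
    """Estimate how many new nodes a seed is expected to activate next time.

    `spreads` is a list of sets (hypothetical format): one set of activated
    node ids per round in which this seed was selected.
    """
    n = len(spreads)
    if n == 0:
        raise ValueError("the seed must have been selected at least once")

    # Count in how many of the seed's rounds each node was activated.
    counts = Counter(node for spread in spreads for node in spread)

    # Hapaxes: nodes activated in exactly one of this seed's rounds.
    hapaxes = sum(1 for c in counts.values() if c == 1)

    # Good-Turing: the hapax count divided by the number of samples
    # estimates the mass of outcomes not yet observed.
    return hapaxes / n


spreads = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
estimate = good_turing_remaining_potential(spreads)
```

In this toy input, nodes `a` and `d` each appear in exactly one round, so the estimate is 2/3. A seed whose spreads keep hitting the same nodes has few hapaxes and therefore a low estimated remaining potential, which is the intuition behind preferring seeds that can still reach not-yet-activated users.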