Fast and Accurate Spreading Process Temporal Scale Estimation

Published: 12 Dec 2022, Last Modified: 28 Feb 2023Accepted by TMLREveryoneRevisionsBibTeX
Abstract: Spreading processes on graphs arise in a host of application domains, from the study of online social networks to viral marketing to epidemiology. Various discrete-time probabilistic models for spreading processes have been proposed. These are used for downstream statistical estimation and prediction problems, often involving messages or other information that is transmitted along with infections caused by the process. These models generally model cascade behavior at a small time scale but are insufficiently flexible to model cascades that exhibit intermittent behavior governed by multiple scales. We argue that the presence of such time scales that are unaccounted for by a cascade model can result in degradation of performance of models on downstream statistical and time-sensitive optimization tasks. To address these issues, we formulate a model that incorporates multiple temporal scales of cascade behavior. This model is parameterized by a \emph{clock}, which encodes the times at which sessions of cascade activity start. These sessions are themselves governed by a small-scale cascade model, such as the discretized independent cascade (IC) model. Estimation of the multiscale cascade model parameters leads to the problem of \emph{clock estimation} in terms of a natural distortion measure that we formulate. Our framework is inspired by the optimization problem posed by DiTursi et al, 2017, which can be seen as providing one possible estimator (a maximum-proxy-likelihood estimator) for the parameters of our generative model. We give a clock estimation algorithm, which we call FastClock, that runs in linear time in the size of its input and is provably statistically accurate for a broad range of model parameters when cascades are generated from any spreading process model with well-concentrated session infection set sizes and when the underlying graph is at least in the semi-sparse regime. We exemplify our algorithm for the case where the small-scale model is the discretized independent cascade process and extend substantially to processes whose infection set sizes satisfy a general martingale difference property. We further evaluate the performance of FastClock empirically in comparison to the state of the art estimator from DiTursi et al, 2017. We find that in a broad parameter range on synthetic networks and on a real network, our algorithm substantially outperforms that algorithm in terms of both running time and accuracy. In all cases, our algorithm's running time is asymptotically lower than that of the baseline.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url:
Changes Since Last Submission: The previous submission (#343) was desk rejected because we used the wrong font. We have made the following changes: 1.) We deleted our errant \usepackage{times} in our latex file, which seems to have been the source of the problem. We also re-downloaded the style files, which turned out to be the same as the ones we were already using. 2.) We changed all of our "\cite"s to "\citet" and "\citep". 3.) We fixed an overfull hbox (the table on the last page). On 9/25/2022, we uploaded a revision to this submission, taking into account reviewer comments, as detailed in a comment to all reviewers. On 12/2/2022, we uploaded the camera-ready version, which included an addition of acknowledgments, de-anonymization, and filled-in publication date.
Assigned Action Editor: ~Laurent_Massoulié1
Submission Number: 349