Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals
Keywords: Clustering, Online Algorithms, Stochastic Arrival Models
TL;DR: We study online non-centroid clustering where decisions can be delayed at a cost. We design a greedy algorithm with a constant bound on the ratio between its expected cost and that of an optimal offline clustering as the number of points grows.
Abstract: Clustering is a fundamental problem that aims to partition a set of elements, such as agents or data points, into clusters so that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering *with delays*, where elements arrive one at a time as points in a finite metric space and must be assigned to clusters, but assignments need *not* be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm has to *irrevocably* decide whether to assign it to an existing cluster or to create a new cluster containing, at that moment, only this point. However, instead of following the more common assumption that decisions are made *immediately* upon arrival, we allow decisions to be postponed at a *delay cost*. This poses a critical challenge: the goal is to minimize not only the total distance costs between points in each cluster, but also the overall delay costs incurred by postponing assignments. In the classic *worst-case adversarial model*, where points arrive in an *arbitrary* order, no algorithm has a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a *stochastic arrival model*, where points' locations are drawn independently across time from an *unknown* and *fixed* probability distribution over the finite metric space. We offer hope beyond worst-case adversaries: we devise an algorithm that is **constant** competitive in the sense that, as the number of points grows, the ratio between the expected overall cost of the output clustering and that of an optimal offline clustering is bounded by a constant.
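To make the two cost components concrete, here is a minimal sketch of how one might sample stochastic arrivals and evaluate a clustering with delays. It is not the paper's algorithm or its exact objective. The following are assumptions made only for illustration: the distance cost is taken to be the sum of pairwise distances within each cluster, the delay cost is assumed linear in waiting time with a hypothetical weight `delay_weight`, and the helper names (`line_metric`, `sample_arrivals`, `clustering_cost`) are hypothetical. The abstract does not say how the number of clusters is constrained, so the sketch does not model any such constraint.

```python
import random
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def line_metric(x: int, y: int) -> float:
    """Hypothetical finite metric: points on a line, d(x, y) = |x - y|."""
    return abs(x - y)


def sample_arrivals(
    support: Sequence[int], weights: Sequence[float], n: int, seed: int = 0
) -> List[int]:
    """Draw n i.i.d. locations from a fixed distribution over a finite support.

    The online algorithm would not know `weights`; they only drive the simulation.
    """
    rng = random.Random(seed)
    return rng.choices(support, weights=weights, k=n)


def clustering_cost(
    locations: Sequence[int],
    clusters: List[List[int]],
    arrival_time: Sequence[int],
    decision_time: Sequence[int],
    d: Callable[[int, int], float],
    delay_weight: float = 1.0,
) -> Tuple[float, float]:
    """Return (distance cost, delay cost) under the assumed cost forms.

    clusters: lists of point indices; each point should appear in exactly one.
    Distance cost (assumed): sum over clusters of d between every unordered pair.
    Delay cost (assumed): delay_weight * (decision_time - arrival_time), summed.
    """
    dist = sum(
        d(locations[i], locations[j])
        for cluster in clusters
        for i, j in combinations(cluster, 2)
    )
    delay = delay_weight * sum(
        decision_time[i] - arrival_time[i] for i in range(len(locations))
    )
    return dist, delay
```

For example, with locations `[0, 0, 5, 5, 1]`, clusters `[[0, 1, 4], [2, 3]]`, arrivals at times `0..4`, and point 1 assigned two steps after it arrived, `clustering_cost` returns a distance cost of 2 and a delay cost of 2.0. Separating the two terms makes the trade-off visible: waiting can enable cheaper clusters, but every step of waiting adds to the delay term.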
Area: Game Theory and Economic Paradigms (GTEP)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 567