Abstract: In this work, we show that if local datasets in a distributed network are appropriately compressed and then aggregated, the result is a compressed version of the union of the datasets, in the sense of an ℓ2-subspace embedding. Specifically, we show that sketching datasets that are locally generated or stored at the nodes of a network via oblivious embeddings, and then aggregating the sketches, yields a valid sketch of the collective dataset. The key idea is that applying distinct random projections to the “local” datasets gives each data point roughly the same importance in the “global” dataset. Consequently, uniform sampling on the locally transformed datasets is close to uniform sampling on the global dataset after the local projections take place. Our main arguments are also justified numerically.
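The sketch-then-aggregate idea can be illustrated with a minimal NumPy sketch. It is not the construction analyzed in this work: it swaps in independent dense Gaussian projections at each node in place of the random-projection-plus-uniform-sampling scheme described above, and the function names (`local_sketch`, `aggregate`, `max_distortion`) and all dimensions are hypothetical choices made for illustration. Under those assumptions, stacking the local sketches should give a matrix whose norms ‖SAx‖ stay close to ‖Ax‖ for every x, which is the ℓ2-subspace embedding property.

```python
import numpy as np


def local_sketch(A_i, m_i, rng):
    """Compress one node's data with its own independent Gaussian projection.

    Assumption: a dense Gaussian map stands in for the oblivious embedding;
    scaling by 1/sqrt(m_i) makes E||S_i A_i x||^2 = ||A_i x||^2.
    """
    n_i = A_i.shape[0]
    S_i = rng.standard_normal((m_i, n_i)) / np.sqrt(m_i)
    return S_i @ A_i


def aggregate(sketches):
    """Stack the local sketches; this equals a block-diagonal S applied to A."""
    return np.vstack(sketches)


def max_distortion(A, SA):
    """Largest deviation from 1 among the singular values of S Q, where A = Q R.

    A value eps means (1 - eps)||Ax|| <= ||SAx|| <= (1 + eps)||Ax|| for all x.
    Assumes A has full column rank so that R is invertible.
    """
    _, R = np.linalg.qr(A)
    singular_values = np.linalg.svd(SA @ np.linalg.inv(R), compute_uv=False)
    return float(np.max(np.abs(singular_values - 1.0)))


rng = np.random.default_rng(0)
d, n_i, m_i, k = 5, 2000, 400, 4  # hypothetical sizes: k nodes, n_i rows each

local_data = [rng.standard_normal((n_i, d)) for _ in range(k)]
sketches = [local_sketch(A_i, m_i, rng) for A_i in local_data]

A = np.vstack(local_data)   # the union of the datasets (never formed in practice)
SA = aggregate(sketches)    # what the aggregator actually receives
eps = max_distortion(A, SA)

print("global shape:", A.shape, "sketch shape:", SA.shape)
print("max singular-value distortion:", eps)
```

Each node only ever touches its own data and its own projection, so the aggregator receives `k * m_i` rows instead of `k * n_i`. The distortion shrinks as `m_i` grows relative to `d`.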