Abstract: This paper studies how a stochastic gradient
algorithm (SG) can be controlled to hide the estimate of the
local stationary point from an eavesdropper. Such problems
are of significant interest in distributed optimization settings
like federated learning and inventory management. A learner
queries a stochastic oracle and incentivizes the oracle to obtain
noisy gradient measurements and perform SG. The oracle
probabilistically returns either a noisy gradient of the function
or a non-informative measurement, depending on the oracle
state and incentive. The learner’s query and incentive are visible
to an eavesdropper who wishes to estimate the stationary point.
This paper formulates the problem of the learner performing
covert optimization by dynamically incentivizing the stochastic
oracle and obfuscating the eavesdropper as a finite-horizon
Markov decision process (MDP). Using conditions for
interval-dominance on the cost and transition probability structure, we
show that the optimal policy for the MDP has a monotone
threshold structure. We propose searching for the optimal
stationary policy with the threshold structure using a stochastic
approximation algorithm and a multi-armed bandit approach.
The effectiveness of our methods is numerically demonstrated
on a covert federated learning hate-speech classification task.