Abstract: We consider a dataset S held by an agency, and a vector query of interest, \(f(S) \in \mathbb{R}^k\), to be posed by an analyst, which contains the information required for some planned statistical inference. The agency will release an answer to the query with added noise that guarantees a given level of Differential Privacy, using the well-known Gaussian noise addition mechanism. The analyst can choose to pose the original vector query f(S), or to transform the query and adjust it to improve the quality of inference of the planned statistical procedure, measured for example by the volume of a confidence interval or the power of a given hypothesis test. Previously studied transformation mechanisms focused on minimizing certain distance metrics between the original query and the released one, without a specific statistical procedure in mind. Our analysis takes the Gaussian noise distribution into account, and it is non-asymptotic. Most of the literature that takes the noise distribution into account considers a given query and a given statistic based on that query, and studies the statistic's asymptotic distribution. In this paper we consider both non-random datasets and random datasets, that is, samples. Our inference is on f(S) itself, or on parameters of the data generating process when S is a random sample. Our main contribution is in proving that different statistical procedures can be strictly improved by applying different specific transformations to queries, and in providing explicit transformations for different procedures in some natural situations.
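For readers unfamiliar with the Gaussian noise addition mechanism mentioned above, the following is a minimal sketch of how an agency might release a noisy answer to a vector query. It assumes the classical calibration \(\sigma = \Delta_2 \sqrt{2\ln(1.25/\delta)}/\varepsilon\), which is valid for \(0 < \varepsilon < 1\); the abstract does not state which calibration the paper uses. The function names, the example query value, and the privacy parameters are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale for (epsilon, delta)-DP under the classical calibration.

    Assumption: the classical bound, valid only for 0 < epsilon < 1.
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("classical calibration requires 0 < epsilon < 1")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    return l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def gaussian_mechanism(query_value, l2_sensitivity, epsilon, delta, rng):
    """Return query_value plus i.i.d. N(0, sigma^2) noise in each coordinate."""
    q = np.asarray(query_value, dtype=float)
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return q + rng.normal(loc=0.0, scale=sigma, size=q.shape)


# Hypothetical example: a query f(S) in R^3 with L2 sensitivity 1.
rng = np.random.default_rng(0)
f_S = np.array([10.0, 20.0, 30.0])
released = gaussian_mechanism(f_S, l2_sensitivity=1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

Under this calibration the noise scale is proportional to the L2 sensitivity of whatever query is posed, which is why transforming the query before release can change the quality of the downstream inference.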