Data-Driven Representations for Testing Independence: A Connection with Mutual Information Estimation

Abstract: This paper addresses the problem of testing independence between two multidimensional random variables from i.i.d. samples, based on the design of a data-driven partition. The empirical log-likelihood statistic is adopted with the objective of approximating the sufficient statistic of a test against independence that knows the two distributions (the oracle test). It is shown that approximating the sufficient statistic of the oracle test (asymptotically) offers a connection with the problem of estimating mutual information. Applying these ideas in the context of a data-dependent tree-structured partition (TSP), we derive concrete sufficient conditions on the parameters of the TSP scheme to obtain a strongly consistent test of independence that is distribution-free over the family of joint probabilities equipped with densities.
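
As a rough illustration of the connection described in the abstract: when the empirical log-likelihood ratio is evaluated on the cells of a partition, its sample average coincides with the plug-in estimate of mutual information over that partition. The sketch below computes that estimate and thresholds it to decide against independence. It is not the paper's TSP scheme. It assumes scalar variables, an equal-frequency (quantile) product partition in place of a tree-structured one, and a hand-picked threshold; the names `quantile_cells`, `plugin_mutual_information`, `reject_independence`, `n_bins`, and `threshold` are hypothetical.

```python
import numpy as np


def quantile_cells(v, n_bins):
    """Map each sample to the index of an equal-frequency bin.

    The bin edges are the sample quantiles, so the partition is data-driven.
    (Assumption: a simple stand-in for the paper's tree-structured partition.)
    """
    interior = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.quantile(v, interior)
    # Indices fall in 0 .. n_bins - 1.
    return np.searchsorted(edges, v, side="right")


def plugin_mutual_information(x, y, n_bins):
    """Plug-in mutual information (in nats) over the product partition.

    Equals the sample average of the empirical log-likelihood ratio
    log( P_n(cell) / (P_n(x-cell) * P_n(y-cell)) ).
    """
    cx = quantile_cells(x, n_bins)
    cy = quantile_cells(y, n_bins)

    # Empirical joint distribution over the n_bins x n_bins cells.
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (cx, cy), 1.0)
    joint /= joint.sum()

    # Empirical marginals and their product.
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    product = px @ py

    # Empty cells contribute zero to the sum.
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))


def reject_independence(x, y, n_bins, threshold):
    """Reject independence when the statistic exceeds the threshold.

    (Assumption: the threshold is chosen by hand here; the paper derives
    conditions on its scheme's parameters rather than a fixed value.)
    """
    return plugin_mutual_information(x, y, n_bins) > threshold
```

Calling `reject_independence(x, y, n_bins=8, threshold=0.05)` on two NumPy arrays of equal length returns a boolean decision. In this sketch, the number of bins and the threshold would both need to vary with the sample size for any consistency guarantee to be plausible.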