In many applications, such as experimental design, group testing, medical diagnosis, and active testing, the state of a random variable $Y$ is revealed by successively observing the outcomes of binary tests about $Y$, where each new test is selected adaptively based on the history of outcomes observed so far. If the number of states of $Y$ is finite, the process ends when $Y$ can be predicted with a desired level of confidence or all available tests have been used. Finding the strategy that minimizes the expected number of tests needed to predict $Y$ is computationally intractable in most real applications due to the high dimensionality of the problem. The commonly used alternative is therefore the greedy heuristic of information maximization, which selects tests sequentially in order of information gain; however, this heuristic can be far from optimal for certain families of tests. In this paper, we argue that in most practical settings, for a given set of tests, there exists a $\delta$ with $0 \ll \delta \ll \frac{1}{2}$ such that, in every iteration of the greedy strategy, the selected binary test has conditional probability of being 'true', given the history, within $\delta$ of one-half. Under this assumption, we first study the performance of the greedy strategy in the simpler case of oracle tests, that is, when all tests are functions of $Y$, and obtain tighter bounds than previously reported in the literature. We then extend our analysis, under the same assumption, to incorporate noise in the test outcomes. In particular, we assume the outcomes are corrupted by a binary symmetric channel and obtain bounds on the expected number of tests needed to make accurate predictions.
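To make the setup concrete, the sketch below implements the greedy information-maximization strategy in the finite-state setting described above. It is an illustration of the general scheme under stated assumptions, not the paper's implementation: the function names, the encoding of tests as a 0/1 matrix `tests[t, y]`, and the 0.95 confidence threshold are our own choices, and setting `eps = 0` recovers the noiseless oracle-test case.

```python
import numpy as np

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable (vectorized, 0 log 0 := 0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def greedy_test_selection(belief, tests, eps=0.0):
    """Return the index of the test with maximal information gain about Y.

    belief : current posterior over the states of Y, shape (n_states,)
    tests  : 0/1 matrix with tests[t, y] = outcome of test t when Y = y
             (the oracle-test setting: every test is a function of Y)
    eps    : crossover probability of the binary symmetric channel;
             eps = 0 recovers noiseless oracle tests
    """
    q = tests @ belief                       # P(noiseless outcome = 'true')
    q_obs = q * (1 - eps) + (1 - q) * eps    # P(observed = 'true') after the BSC
    # I(Y; observed outcome) = H(q_obs) - H(eps); H(eps) is the same for every
    # test, so the greedy choice is the test whose q_obs is closest to one-half.
    gain = binary_entropy(q_obs) - binary_entropy(eps)
    return int(np.argmax(gain))

def bayes_update(belief, test_row, outcome, eps=0.0):
    """Posterior over Y after observing one (possibly noisy) test outcome."""
    likelihood = np.where(test_row == outcome, 1 - eps, eps)
    post = belief * likelihood
    return post / post.sum()

# Toy run: random tests, outcomes corrupted by a BSC with eps = 0.05.
rng = np.random.default_rng(0)
n_states, n_tests, eps = 16, 40, 0.05
tests = rng.integers(0, 2, size=(n_tests, n_states))
y_true = rng.integers(n_states)
belief = np.full(n_states, 1.0 / n_states)
for _ in range(n_tests):                     # stop when confident or out of tests
    if belief.max() >= 0.95:
        break
    t = greedy_test_selection(belief, tests, eps)
    outcome = tests[t, y_true] ^ (rng.random() < eps)   # pass through the BSC
    belief = bayes_update(belief, tests[t], outcome, eps)
print("predicted:", int(belief.argmax()), "true:", y_true)
```

Note that for oracle tests the outcome is deterministic given $Y$, so $I(Y; X) = H(X) - H(X \mid Y) = H(X)$ and the greedy rule simply favors the test whose probability of being 'true' under the current belief is closest to one-half, which is exactly the quantity the $\delta$-assumption above constrains.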