Testing for Outliers in a Hidden Function

Dariusz Rafal Kowalski

20 Sept 2025 (modified: 21 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0

Keywords: Testing functions, Outliers, Query systems

TL;DR: Testing for outliers in a function hidden by an adversary, in a learning game in which the user asks queries and the adversary provides logical-OR feedback.

Abstract: Testing for outliers is an important data mining task, rooted in learning theory, which aims at discovering points that deviate from the majority considered "normal". It is widely used to identify intrusions, fraud, and anomalies, as well as values that occur rarely but are important for various data analysis applications, e.g., maximum/minimum, median, etc. We consider a deterministic version of the problem, called Testing for Hidden Function's Outliers (HFO-testing for short), defined as follows. Given a hidden function $f$ taking at most $\ell$ values, the goal is to find all outliers of $f$, that is, values whose preimages are of size at most $k$, together with their preimages, where $\ell,k$ are the problem parameters. Outliers are found by asking OR queries, each represented by a set of pairs $(x,y)$; the answer to a query is $1$ if at least one pair in the query is consistent with the function $f$, i.e., a pair $(x,f(x))$ belongs to the query for some $x$, and $0$ otherwise. We formally model this process as a learning game between two players: the adversary, who first chooses and hides a function and later provides feedback to the other player's queries, and the other player (the user), who creates and asks queries and later analyzes the obtained feedback. This paper aims at finding a short universal sequence of queries that allows the user to solve the above-mentioned problem for any adversarial function $f$ from any given (potentially very large) domain $N$ to a codomain $M$. We formally prove upper and lower bounds for this problem that are nearly cubic in terms of the parameters $\ell,k$ and polylog$(N,M)$, and that are tight up to a polylogarithmic factor. The upper bound is shown by constructing and analyzing a non-adaptive deterministic OR-query system with decoding. The lower bound is proved by designing "costly" functions for any given OR-query system.
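The definitions in the abstract can be made concrete with a small sketch. The code below is only an illustration of the OR-query feedback and of what counts as an outlier; it is not the paper's construction. The names `or_query`, `outliers`, and `recover_with_singletons`, the toy function `f_hidden`, and the choice of domain and codomain are all assumptions introduced here. The decoder uses a naive non-adaptive baseline that asks one singleton query per pair $(x,y)$, so it needs $|N|\cdot|M|$ queries, far more than the nearly-cubic bound the abstract states.

```python
from collections import defaultdict

def or_query(f, query):
    """OR feedback: 1 if some pair (x, y) in the query satisfies f(x) == y, else 0."""
    return int(any(f.get(x) == y for x, y in query))

def outliers(f, k):
    """Values whose preimage has size at most k, mapped to their sorted preimages."""
    preimages = defaultdict(list)
    for x, y in f.items():
        preimages[y].append(x)
    return {y: sorted(xs) for y, xs in preimages.items() if len(xs) <= k}

def recover_with_singletons(answer, domain, codomain):
    """Naive baseline (hypothetical, not the paper's system).

    The query sequence is fixed in advance (non-adaptive): one singleton
    query {(x, y)} per pair. Decoding reads f(x) off the positive answers.
    """
    queries = [{(x, y)} for x in domain for y in codomain]
    answers = [answer(q) for q in queries]  # feedback collected in one pass
    recovered = {}
    for q, a in zip(queries, answers):
        if a == 1:
            (x, y), = q  # unpack the single pair in the query
            recovered[x] = y
    return recovered

# Toy hidden function with 3 values (so any ell >= 3 applies); chosen for illustration.
f_hidden = {0: "a", 1: "a", 2: "a", 3: "b", 4: "a", 5: "c"}
domain, codomain = range(6), ["a", "b", "c"]

# The user only sees OR feedback through this oracle, never f_hidden itself.
oracle = lambda q: or_query(f_hidden, q)

recovered = recover_with_singletons(oracle, domain, codomain)
found = outliers(recovered, k=1)
```

With `k=1`, the values `"b"` and `"c"` are outliers because each has a single preimage, while `"a"` has four. The point of the paper, as the abstract describes it, is to reach the same output with a much shorter universal query sequence than this exhaustive baseline.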

Primary Area: learning theory

Submission Number: 23208
