Abstract: Detection of anomalies among a large number of
processes is a fundamental task that has been studied in multiple
research areas, with diverse applications spanning from spectrum
access to cyber-security. Anomalous events are characterized by
deviations in data distributions, and thus can be inferred from
noisy observations using statistical methods. In some scenarios,
one can obtain noisy observations aggregated from a chosen
subset of processes. Such hierarchical search can further reduce
the sample complexity while retaining accuracy. An anomaly search
strategy should thus be designed to meet multiple requirements:
maximize the detection accuracy, be efficient
in terms of sample complexity, and cope with statistical
models that are known only up to some missing parameters (i.e.,
composite hypotheses). In this paper, we consider anomaly detection
with observations taken from a chosen subset of processes that
conforms to a predetermined tree structure, with a partially known
statistical model. We propose Hierarchical Dynamic Search (HDS),
a sequential search strategy that uses two variations of the Generalized
Log Likelihood Ratio (GLLR) statistic and can be used for
detection of multiple anomalies. HDS is shown to be order-optimal
in terms of the size of the search space, and asymptotically optimal
in terms of detection accuracy. An explicit upper bound on the error
probability is established for the finite sample regime. In addition
to extensive experiments on synthetic datasets, experiments have
been conducted on the DARPA intrusion detection dataset, showing
that HDS is superior to existing methods.