Abstract: Real-time processing of continuously arriving and departing data objects within a memory window poses significant challenges for outlier detection in data streams, particularly in terms of time efficiency and accuracy. This study introduces the concept of contextual collective outliers, defined based on the inherent properties of data streams and outliers, and proposes the EDOBS (Efficient Distance based Outlier detection method for Batch-processed data Streams) algorithm to identify such outliers. To enhance the efficiency of EDOBS, we first develop the DNOS (Distance-based Neighborhood Object Search) algorithm, which restricts the neighborhood search scope for the tested data to a smaller and manageable range. Subsequently, the DNOE (Distance-based Normal Object Extraction) algorithm is introduced to preprocess the tested dataset, effectively filtering out the majority of normal objects. Furthermore, to improve the effectiveness of EDOBS, this work incorporates a summarization of departing data from the data stream, providing a more comprehensive reference for detecting outliers in newly arriving data. Compared to existing methods, EDOBS fully accounts for the contextual dynamics of data streams when identifying contextual collective outliers, mitigating erroneous detections caused by limited memory resources and the evolving nature of the data stream. Experimental results demonstrate that EDOBS achieves high efficiency with constrained memory usage, while its detection performance closely approximates an ideal scenario where memory resources can accommodate the entire data stream.
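For readers unfamiliar with the setting, the following is a minimal sketch of a generic distance-based outlier test applied to a batch-processed stream with a bounded memory window. It is not the EDOBS algorithm: the neighborhood pruning of DNOS, the normal-object filtering of DNOE, and the summarization of departing data are not reproduced here. The function names `batch_outliers` and `process_stream`, the radius `r`, the neighbor threshold `k`, and the rule "fewer than `k` neighbors within `r`" are all assumptions chosen for illustration.

```python
from collections import deque
from math import dist


def batch_outliers(window, batch, r, k):
    """Return objects in `batch` that have fewer than k neighbors within
    distance r, searching the retained window plus the batch itself.
    Generic distance-based rule (an assumption, not the paper's definition)."""
    reference = list(window) + list(batch)
    offset = len(window)
    outliers = []
    for i, p in enumerate(batch):
        neighbors = 0
        for j, q in enumerate(reference):
            if j == offset + i:
                continue  # do not count the object as its own neighbor
            if dist(p, q) <= r:
                neighbors += 1
                if neighbors >= k:
                    break  # enough neighbors found: object is normal
        if neighbors < k:
            outliers.append(p)
    return outliers


def process_stream(batches, window_size, r, k):
    """Process batches in arrival order with a bounded memory window.
    The oldest objects depart automatically once the window is full."""
    window = deque(maxlen=window_size)
    results = []
    for batch in batches:
        results.append(batch_outliers(window, batch, r, k))
        window.extend(batch)  # new arrivals enter, oldest objects depart
    return results


if __name__ == "__main__":
    batches = [
        [(0, 0), (0.1, 0), (0, 0.1), (5, 5)],
        [(0.05, 0.05), (9, 9)],
    ]
    print(process_stream(batches, window_size=6, r=0.5, k=2))
```

In this naive form every test scans the full window, and objects that have departed can no longer serve as neighbors. Those are the efficiency and accuracy limitations the abstract says EDOBS is designed to mitigate.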
External IDs:dblp:journals/cluster/SuXLZGH25