Abstract: Anomaly detection is becoming increasingly important as more of modern life goes digital. Consumers spend far more time online, and digital sensors are embedded in physical and chemical equipment for health monitoring. The resulting monitoring data is growing at an exponential rate, which makes automated anomaly detection feasible in many high-impact domains. In this tutorial, we dive deep into several state-of-the-art methods for finding anomalies in spatiotemporal data (or, more generally, correlated multivariate data), using specific use cases such as telecom network performance and user behavior monitoring. We identify suitable methods for each data representation (i.e., its spatial, temporal, and categorical dimensionality) and use case, and present ways to convert raw data into the formats that existing methods require. Such a survey and hands-on exercise is necessary because each use case has its own data representation and its own requirements in live applications, and certain methods fit a given data format and scenario better than others. With the investigation outlined in this tutorial, we hope attendees will be able to choose the most appropriate method for their use case more efficiently. To reach as wide an audience as possible, we investigate anomaly detection applications in four domains: telecom network performance, user behavior monitoring, financial transactions, and the industrial internet of things. The datasets range from univariate and multivariate time series of single or multiple entities to transactional tabular data with timestamps or sequence indexes. The anomalies in these datasets include contextual anomalies for one entity, collective anomalies aggregated across multiple entities in time series, and suspicious records or users in transactional tabular data. During this three-hour hands-on tutorial, we examine the suitability and performance of different methods for the four use cases introduced above.
The available anomaly detection modeling frameworks range from sequential or time series models to static graph neural network models and dynamic graph neural network models that learn coupled spatial, structural, and temporal information. In some use cases, these models formulate anomaly detection as a forecasting problem, in which anomaly severity is derived from the deviation of the observed value from the predicted value. In other use cases, anomaly detection is solved as a classification problem: node or edge classification in a graph, row-based classification in tabular data, or sequence classification in sequence data. The tutorial combines an introduction to fundamental anomaly detection techniques with hands-on exercises. For the hands-on exercises, we focus on the time-series-based HTM method and on graph-based methods. Participants learn to identify suitable methods, apply data transformation techniques to convert raw data into the formats the different methods assume, and study these methods on the aforementioned real-world or synthetic datasets.
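To make the forecasting formulation concrete, the following minimal Python sketch scores each point of a time series by how far the observed value deviates from a predicted value. It is an illustration under stated assumptions, not one of the methods covered in the tutorial: the forecaster is a naive moving average standing in for a real sequence model, and the function name `forecast_anomaly_scores`, the `window` parameter, and the sample `kpi` series are all hypothetical.

```python
from statistics import mean, stdev


def forecast_anomaly_scores(values, window=5):
    """Score each point by its deviation from a naive forecast.

    The forecast for point t is the mean of the previous `window` points
    (an illustrative stand-in for a real forecasting model). The score is
    the absolute forecast error divided by the standard deviation of that
    same window, giving a z-score-like severity. The first `window` points
    have no history and keep a score of 0.0.
    """
    scores = [0.0] * len(values)
    for t in range(window, len(values)):
        history = values[t - window:t]
        predicted = mean(history)
        spread = stdev(history)
        error = abs(values[t] - predicted)
        if spread > 0:
            scores[t] = error / spread
        else:
            # Flat history: any deviation at all is maximally surprising.
            scores[t] = float("inf") if error > 0 else 0.0
    return scores


# Hypothetical KPI series with one injected spike at index 8.
kpi = [10.0, 11.0, 9.0, 10.0, 12.0, 11.0, 10.0, 9.0, 30.0, 10.0, 11.0]
scores = forecast_anomaly_scores(kpi, window=5)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 2))
```

In this sketch the severity is relative to recent variability, so the same absolute deviation counts as more severe in a quiet series than in a noisy one. Note that the spike also inflates the spread of the windows that follow it, which lowers the scores of the next few points; a more robust spread estimate would reduce that effect.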