LogCAE: An Approach for Log-based Anomaly Detection with Active Learning and Contrastive Learning

Published: 01 Jan 2024, Last Modified: 11 Feb 2025 · ISSRE 2024 · CC BY-SA 4.0
Abstract: Log-based anomaly detection plays a crucial role in maintaining the reliability of software systems. Unsupervised models are better suited to real-world usage because they do not rely on extensive data labeling efforts; however, their effectiveness is limited by the lack of label supervision. To balance model effectiveness and labeling effort, existing approaches enhance model capability by incorporating a relatively small number of key human labels as a golden signal, thereby improving performance with an acceptable labeling cost. However, these methods still suffer from complex labeling requirements and insufficient utilization of human knowledge. In this paper, we introduce LogCAE, a two-stage log anomaly detection approach based on active learning and contrastive learning. It uses an unsupervised model to learn from unlabeled log data and incorporates human knowledge through active learning during online optimization. We employ contrastive learning to optimize the representation of log samples in feature space, making more efficient use of human labels. We conducted experiments on three distinct public log datasets (Thunderbird, BGL, and Zookeeper). The results show that our method improves the F1-score by 12.93% on average while labeling only 6.06% of the data samples. Moreover, our approach utilizes human labels more effectively than state-of-the-art approaches.
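To give a rough sense of how labels gathered through active learning could drive contrastive representation learning on log samples, the sketch below pairs an uncertainty-style query step with a generic supervised contrastive loss in PyTorch. This is only an illustration of the general techniques the abstract names; the function names, the query criterion, and the loss formulation are assumptions on my part, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def select_for_labeling(scores, budget):
    """Pick samples whose unsupervised anomaly scores sit closest to the
    decision boundary (an uncertainty-style query; the paper's actual
    strategy may differ)."""
    uncertainty = -(scores - scores.median()).abs()
    return torch.topk(uncertainty, k=budget).indices


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull together representations of log samples that share a label
    (normal vs. anomalous) and push apart those that differ.

    embeddings: (N, D) log-sequence representations from the encoder.
    labels:     (N,) 0/1 labels obtained via active learning.
    """
    z = F.normalize(embeddings, dim=1)            # compare in cosine space
    sim = z @ z.t() / temperature                 # (N, N) similarity matrix

    n = labels.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye

    # row-wise log-softmax, excluding each sample's similarity to itself
    sim = sim.masked_fill(eye, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # average log-probability of positive pairs per anchor
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_count
    return loss[pos_mask.any(dim=1)].mean()       # skip anchors with no positives
```

In this sketch, the queried labels shape the feature space directly through the contrastive loss rather than only through a classification head, which is one plausible way a small labeled set could be used more efficiently.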