Kernel-Level Event-Based Performance Anomaly Detection in Software Systems under Varying Load Conditions
Abstract: Performance anomalies in software systems can cause significant disruptions and reduce user satisfaction. Traditional anomaly detection methods rely on log events, which capture higher-level system activities but may lack the detail needed to pinpoint root causes effectively. This study investigates the detection of performance anomalies in software systems using kernel-level event data. Leveraging both classical and deep learning approaches, we developed models capable of identifying anomalous patterns in system behavior. The experimental dataset, comprising over 24 million events collected under various noise and workload conditions, provided a comprehensive basis for analysis. Our results demonstrate the robustness of ensemble techniques for predicting performance anomalies: the random forest (accuracy = 89%) and ensemble stacking (F1 score = 0.76, AUC = 0.84) models outperformed the other classifiers. Feature importance analysis revealed that CPU-bound events, such as sched_switch and sched_wakeup, are key indicators of performance anomalies. Additionally, statistical testing confirmed a significant relationship between system workload conditions and the likelihood of anomalies. These findings highlight the potential of kernel-level data for precise anomaly detection and offer insights for optimizing system monitoring and performance management.