KERMIT: A BERT-Based Classification Method for Linux Kernel Crashes Through Stack Trace

Published: 01 Jan 2025 · Last Modified: 08 Nov 2025 · ICIC (16) 2025 · CC BY-SA 4.0
Abstract: With the rapid development of kernel auto-fuzzing, the volume of Linux kernel crash reports has surged, and many of these reports are duplicates triggered by the same underlying bug. Existing classification techniques (e.g., heuristic methods based on crash function classification, bug rule matching and filtering, and kernel object similarity computation) often misidentify such duplicate reports as distinct issues, leading to increased manual effort, incomplete patches, and reduced efficiency in bug remediation. Accurately identifying duplicate crash reports thus remains a critical challenge in kernel error management. Unlike application-level crashes, kernel crashes stem directly from low-level mechanisms, such as hardware exceptions and resource contention, bypassing intermediate abstraction layers. Consequently, traditional natural language processing techniques, originally designed for user-space crashes, struggle to capture the unique semantics of kernel-specific system call paths. To address this gap, we propose KERMIT, a methodology that combines kernel-specific feature extraction with domain-adaptive fine-tuning to bridge semantic disparities in crash report analysis. Specifically, KERMIT employs tailored filtering techniques to isolate complete call trace data while removing irrelevant noise, and it applies full-parameter fine-tuning to a BERT-based model to adapt its semantic embeddings to the unique characteristics of kernel crash reports. Experimental results demonstrate that KERMIT achieves a recall rate of 92.33%, a 7.73% improvement over state-of-the-art methods. Notably, KERMIT, built on a fine-tuned BERT model with only 110 million parameters, outperforms large-scale models like GPT-4 by over 30% in recall, offering a more efficient and resource-effective solution for kernel crash de-duplication.
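
To make the call-trace isolation step concrete, the sketch below shows one plausible way to extract and de-noise the `Call Trace:` block of a kernel oops report. The frame format (`func+0xoff/0xsize`) follows standard kernel stack dumps, but the noise list and function names here are hypothetical illustrations; the abstract does not disclose KERMIT's actual filter rules.

```python
import re

# Hypothetical noise list: frames that appear in nearly every oops and
# carry no bug-specific signal. KERMIT's real rules are not given here.
NOISE_FRAMES = {"dump_stack", "panic", "__warn", "report_bug",
                "entry_SYSCALL_64_after_hwframe", "do_syscall_64"}

# Matches kernel stack frame lines such as
#   " ? kmalloc_slab+0x25/0x1a0" or " kfree+0x6d/0x2b0"
FRAME_RE = re.compile(r"^\s*\??\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

def extract_call_trace(report: str) -> list[str]:
    """Return the cleaned sequence of function names from the first
    'Call Trace:' block of a kernel crash report."""
    frames, in_trace = [], False
    for line in report.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.match(line)
            if not m:           # end of the contiguous trace block
                break
            name = m.group(1)
            if name not in NOISE_FRAMES:
                frames.append(name)
    return frames
```

The resulting function-name sequence can then serve as the textual input to the classification model, keeping only the bug-relevant call path.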
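The fine-tuning step could likewise be sketched with Hugging Face Transformers. This example assumes a sentence-pair setup in which two cleaned call traces are classified as duplicate or not; the abstract does not specify whether KERMIT uses pair classification or embedding similarity, so this framing, the hyperparameters, and the `training_step` helper are all illustrative assumptions. `bert-base-uncased` matches the abstract's 110M-parameter figure.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# bert-base-uncased has ~110M parameters, matching the abstract's figure.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
model.train()

# Full-parameter fine-tuning: every weight stays trainable (the default),
# so the encoder's embeddings adapt to kernel-crash vocabulary.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def training_step(trace_a: str, trace_b: str, is_duplicate: int) -> float:
    """One gradient step on a pair of cleaned call traces
    (hypothetical helper; batching and scheduling omitted)."""
    batch = tokenizer(trace_a, trace_b, truncation=True,
                      padding=True, return_tensors="pt")
    labels = torch.tensor([is_duplicate])
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```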