Abstract: Log parsing is a prerequisite for log analysis. Recently, large language models (LLMs) have demonstrated high accuracy in log parsing. However, their frequent invocations incur substantial costs. To address this issue, some methods have turned to small language models (SLMs), which offer improved efficiency but suffer from reduced accuracy due to limited model capacity. To achieve both high accuracy and efficiency, we propose CSLParser, a collaborative log parsing framework using SLMs and LLMs. CSLParser delegates most log parsing tasks to SLMs and selectively invokes LLMs to correct parsing results generated by SLMs, thereby effectively reducing the invocation cost of LLMs while maintaining high accuracy. Specifically, to enhance the accuracy of SLMs, we propose a diversified sampling strategy to select diverse samples for training, enabling SLMs to effectively handle diverse log patterns. To efficiently invoke LLMs, we design a rule-based selection strategy to identify hard cases that are challenging for SLMs to correctly parse, which are subsequently corrected by LLMs. Additionally, we propose a dynamic template updating mechanism that merges similar templates based on structural and semantic information to further enhance parsing accuracy. Extensive experiments on public large-scale log datasets show that CSLParser outperforms state-of-the-art baselines in both accuracy and efficiency.
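The collaborative routing the abstract describes (delegate to an SLM, escalate hard cases to an LLM) can be sketched as follows. This is a minimal illustration under assumed semantics, not CSLParser's actual implementation: `parse_with_slm`, `parse_with_llm`, and the confidence-threshold rule in `is_hard_case` are all hypothetical stand-ins for the paper's trained SLM, LLM invocation, and rule-based selection strategy.

```python
import re

def parse_with_slm(log: str) -> tuple[str, float]:
    """Stub SLM: mask purely numeric tokens as <*> and return a mock confidence."""
    template = re.sub(r"\b\d+\b", "<*>", log)
    # Toy confidence signal: a parse that extracted no variables is suspicious.
    confidence = 0.9 if "<*>" in template else 0.5
    return template, confidence

def parse_with_llm(log: str) -> str:
    """Stub LLM: in the real system this is a costly model invocation,
    so it should only run on hard cases. Here it masks any token containing a digit."""
    return re.sub(r"\b\S*\d\S*\b", "<*>", log)

def is_hard_case(template: str, confidence: float, threshold: float = 0.6) -> bool:
    """Hypothetical rule-based selection: escalate low-confidence parses."""
    return confidence < threshold

def parse(log: str) -> str:
    """Collaborative parsing: SLM first, LLM only for hard cases."""
    template, conf = parse_with_slm(log)
    if is_hard_case(template, conf):
        template = parse_with_llm(log)  # selective, expensive correction
    return template
```

In this toy version, `parse("connected to 10 peers")` is handled entirely by the SLM stub, while `parse("session abc123 opened")` trips the confidence rule and is corrected by the LLM stub, mirroring the cost-saving division of labor the abstract claims.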
External IDs: dblp:conf/issre/HongWZDXHYL25