Abstract: Large language models (LLMs) are increasingly applied to system troubleshooting, yet their effectiveness and accuracy are limited by the complexity of system domains. Online technical forums, rich with expert-contributed troubleshooting insights, are valuable resources for practitioners. However, finding the most relevant information for a given issue often requires considerable manual effort. We attribute this challenge to the multi-modality of forum posts, which contain a diverse mix of data artifacts, including code snippets, log messages, console outputs, commands, and descriptions. Traditional retrieval methods, which focus on a subset of these data types or treat the entire content as natural language, are often ineffective. To address this challenge, we propose a comprehensive framework, ForumSeeker. The core concept is to incorporate the complete stack from the failure site and to process the heterogeneous data artifacts within forum posts independently: ForumSeeker calculates relevance scores between different pairs of artifacts, then aggregates the per-pair results to achieve superior performance. Our evaluation demonstrates that ForumSeeker significantly outperforms five baselines, improving search ranking quality by at least 44.7% over the best competitor. Moreover, ForumSeeker ranks ground-truth relevant forum posts within the top 10 results in 96.1% of cases.
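The idea of scoring artifact pairs and then aggregating can be illustrated with a minimal Python sketch. It is not ForumSeeker's implementation: the abstract does not specify the scoring or aggregation functions, so token-set Jaccard similarity and a plain average are stand-in assumptions, and the names `jaccard`, `score_post`, and `rank_posts` and the sample data are hypothetical.

```python
from typing import Dict, List

# Assumption: each query and post is a dict mapping an artifact type
# (e.g. "code", "log", "console", "command", "description") to its text.
Artifacts = Dict[str, str]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; a stand-in for a real relevance scorer."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score_post(query: Artifacts, post: Artifacts) -> float:
    """Score every (query artifact, post artifact) pair, then average the scores."""
    scores = [
        jaccard(q_text, p_text)
        for q_text in query.values()
        for p_text in post.values()
    ]
    return sum(scores) / len(scores) if scores else 0.0


def rank_posts(query: Artifacts, posts: Dict[str, Artifacts]) -> List[str]:
    """Return post ids ordered from most to least relevant."""
    return sorted(posts, key=lambda pid: score_post(query, posts[pid]), reverse=True)


# Hypothetical example data, invented for illustration only.
query = {
    "log": "connection refused on port 5432",
    "description": "database fails to start",
}
posts = {
    "a": {
        "log": "connection refused on port 5432",
        "description": "postgres database fails to start after upgrade",
    },
    "b": {
        "code": "print('hello world')",
        "description": "how to center a div",
    },
}
ranking = rank_posts(query, posts)
print(ranking)
```

Keeping each artifact type separate lets a log line be compared against log lines and descriptions individually rather than being diluted in one bag of words; a fuller system would presumably use type-specific scorers and weighted aggregation.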
External IDs: dblp:conf/fase/KimRSWT26