Keywords: Knowledge base retrieval, Large language models, Domain adaptation, Summarization, Clustering, Information retrieval, Enterprise support, Weak supervision, Fine-tuning
TL;DR: FineKB leverages domain-adapted LLM issue summarization and clustering-based retrieval to bridge noisy case descriptions and KB articles, improving top-3 retrieval accuracy from 24% to 66%.
Abstract: Retrieving relevant knowledge base (KB) articles for enterprise support cases is difficult due to the semantic mismatch between noisy, verbose case descriptions and concise KB content. We present FineKB, a domain-adaptive issue-summarization and cluster-aware retrieval framework that addresses this gap through (i) a fine-tuned LLM trained on teacher-generated pseudo-summaries to normalize heterogeneous case narratives, (ii) per-KB multi-centroid clustering that models the diverse sub-problems associated with each KB article, and (iii) a confidence-adaptive hybrid inference mechanism that augments high-confidence vector search with selective content lookup and LLM reasoning for ambiguous cases. At inference time, raw case text is embedded and matched against this summary-structured index, avoiding runtime summarization while improving alignment. Experiments on large-scale enterprise data show that FineKB achieves 65.39% Recall@3, substantially outperforming KB-content dense retrieval (42.73%). To support reproducible research on noisy-to-structured retrieval, we release FineKB-Vectors, a vectorized dataset containing case-summary and KB-article embeddings.
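The per-KB multi-centroid indexing and confidence-adaptive retrieval described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the embedding vectors, the number of centroids per KB (`k`), and the confidence threshold are all placeholder assumptions, and the LLM/content-lookup escalation step is reduced to a boolean flag.

```python
import numpy as np

def build_multi_centroid_index(kb_to_vecs, k=2, iters=10, seed=0):
    """Per-KB k-means: each KB article gets up to k centroids, one per
    sub-problem cluster of its case-summary embeddings (illustrative only)."""
    rng = np.random.default_rng(seed)
    index = []  # list of (kb_id, unit-normalized centroid) pairs
    for kb_id, vecs in kb_to_vecs.items():
        X = np.asarray(vecs, dtype=float)
        k_eff = min(k, len(X))
        # initialize centroids from k_eff distinct points
        C = X[rng.choice(len(X), size=k_eff, replace=False)].copy()
        for _ in range(iters):
            # assign each point to its nearest centroid, then recompute means
            d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for j in range(k_eff):
                pts = X[assign == j]
                if len(pts):
                    C[j] = pts.mean(0)
        for c in C:
            index.append((kb_id, c / (np.linalg.norm(c) + 1e-9)))
    return index

def retrieve(query_vec, index, top_n=3, conf_threshold=0.6):
    """Rank KBs by best cosine score over their centroids; flag low-confidence
    queries, which the paper would escalate to content lookup / LLM reasoning."""
    q = np.asarray(query_vec, dtype=float)
    q = q / (np.linalg.norm(q) + 1e-9)
    best = {}  # best cosine score per KB over all its centroids
    for kb_id, c in index:
        s = float(q @ c)
        if s > best.get(kb_id, -1.0):
            best[kb_id] = s
    ranked = sorted(best.items(), key=lambda kv: -kv[1])[:top_n]
    confident = bool(ranked) and ranked[0][1] >= conf_threshold
    return ranked, confident
```

In this sketch, a raw case embedding is compared against summary-derived centroids, so each KB article can match several distinct sub-problems; only queries below the threshold would trigger the heavier hybrid path.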
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 13617