Keywords: Generative language models, LLM, RAG, chunking, semantic search, clinical guidelines, evidence-based medicine, NLP in medicine
TL;DR: Adaptive chunking algorithm with proven optimality. The task is formalized as finding the shortest path in a directed acyclic graph (DAG). A multi-component coherence function combines semantic, structural, and medical analysis.
Abstract: Retrieval-augmented generation (RAG) systems are becoming a key tool for working with medical documentation. However, retrieval quality depends on how documents are split into chunks (chunking). Existing methods (fixed window, sliding window, semantic splitting) do not account for the specifics of clinical guidelines: their hierarchical structure, the link between each recommendation and its level of evidence, and medical terminology.
This paper proposes ASCM (Adaptive Semantic Chunking Method), an adaptive chunking algorithm with proven optimality.
The task is formalized as finding the shortest path in a directed acyclic graph (DAG). A multi-component coherence function combines semantic, structural, and medical analysis. A dynamic programming algorithm is guaranteed to find the globally optimal segmentation in O(N · L_max) time.
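The shortest-path formulation can be illustrated with a minimal Python sketch. It assumes the document has already been split into N units (for example, sentences), that a chunk is a contiguous run of at most L_max units, and that every candidate chunk has a cost. The names `optimal_chunking` and `toy_cost` are hypothetical, and the toy cost is only a stand-in: the abstract does not specify how the multi-component coherence function is computed.

```python
from typing import Callable, List, Tuple

def optimal_chunking(
    n: int,
    l_max: int,
    cost: Callable[[int, int], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split units 0..n-1 into contiguous chunks of length <= l_max with minimal total cost.

    DAG nodes are boundary positions 0..n. An edge i -> j (i < j <= i + l_max)
    stands for the chunk covering units i..j-1 and has weight cost(i, j).
    """
    if n < 0 or l_max < 1:
        raise ValueError("need n >= 0 and l_max >= 1")
    best = [float("inf")] * (n + 1)  # best[j]: minimal cost of segmenting units 0..j-1
    prev = [-1] * (n + 1)            # prev[j]: start of the last chunk in that optimum
    best[0] = 0.0
    # Edges only point forward, so increasing j is a topological order.
    for j in range(1, n + 1):
        for i in range(max(0, j - l_max), j):  # at most l_max incoming edges
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
    # Walk back from node n to recover the chunk boundaries.
    chunks: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = prev[j]
        chunks.append((i, j))
        j = i
    chunks.reverse()
    return best[n], chunks

# Toy usage: hypothetical topic labels per sentence; mixing topics in one chunk is penalized.
topics = ["a", "a", "b", "b", "b"]

def toy_cost(i: int, j: int) -> float:
    # 1.0 per chunk plus 5.0 for every extra topic inside the chunk (hypothetical weights)
    return 1.0 + 5.0 * (len(set(topics[i:j])) - 1)

total, chunks = optimal_chunking(len(topics), 3, toy_cost)
print(total, chunks)  # 2.0 [(0, 2), (2, 5)]
```

Each boundary node is relaxed once from at most L_max predecessors, which gives the O(N · L_max) bound when a single cost evaluation takes constant time.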
Experiments on a corpus of gastrointestinal oncology clinical guidelines show that ASCM outperforms the baseline methods: +18% in medical entity preservation rate (EPR) and +12% in semantic coherence score (SCS).
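The abstract does not define EPR. One plausible reading, assumed here rather than taken from the paper, is the fraction of annotated medical entity mentions that fall entirely inside a single chunk. A minimal sketch under that assumption, with the hypothetical function name `entity_preservation_rate` and spans given as half-open character offsets:

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets

def entity_preservation_rate(entities: List[Span], chunks: List[Span]) -> float:
    """Share of entity spans lying wholly within one chunk (assumed definition)."""
    if not entities:
        return 1.0  # convention chosen here: nothing to split, nothing lost
    preserved = sum(
        1
        for start, end in entities
        if any(c_start <= start and end <= c_end for c_start, c_end in chunks)
    )
    return preserved / len(entities)

# Toy usage: the entity at (8, 12) straddles the chunk boundary at 10.
rate = entity_preservation_rate([(2, 5), (8, 12), (15, 19)], [(0, 10), (10, 20)])
print(rate)  # 2 of 3 entities preserved
```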
Submission Number: 57