
\section{Related Work}
\label{sec:related}

\begin{figure}
    \centering
    \includegraphics[width=1\linewidth,trim={15 15 10 40},clip]{images/taxonomy_minimal.pdf}
    \caption{Taxonomy of Knowledge Augmentation Approaches for LLMs}
    \label{fig:KG_aug_tax}
\end{figure}

To address the knowledge cutoff and factual limitations of LLMs, numerous techniques have been developed to provide models with the information needed to perform open-ended tasks. Some methods encode knowledge within the model parameters, while others supply it through the context window.
A taxonomy of the evaluated literature is depicted in Figure~\ref{fig:KG_aug_tax}.

Retrieval-Augmented Generation (RAG) \cite{lewis2021retrievalaugmentedgenerationknowledgeintensivenlp}, along with variants tailored to structured data such as KG-RAG \cite{zhu-etal-2025-knowledge}, mitigates hallucinations by appending external information to the prompt.
While elegant and effective, these approaches inflate the context window, which ultimately leads to high inference costs.
Alternative text-based injection methods, such as K-BERT \cite{liu2019kbertenablinglanguagerepresentation} and KnowBert \cite{peters2019knowledgeenhancedcontextualword}, instead incorporate knowledge via sentence trees or entity embeddings, but they typically require extensive retraining of the LLM backbone or are specialized for a particular set of entities.

Other works rely on Parameter-Efficient Fine-Tuning (PEFT). Methods like K-Adapter \cite{wang2020kadapterinfusingknowledgepretrained} and KnowLA \cite{luo2024knowlaenhancingparameterefficientfinetuning} use lightweight adapters to incorporate knowledge into the LLM.
KnowLA is the closest to our proposed approach, as it uses external entity embeddings derived from a knowledge graph to enhance the LLM. However, it requires the embeddings to be pre-computed before training, which severely limits its ability to generalize to unseen graphs.

Another active area of research is the integration of the graph directly as token embeddings, a form of graph prompting.
Examples of this approach are GraphToken \cite{perozzi2024letgraphtalkingencoding}, TEA-GLM \cite{wang2024llmszeroshotgraphlearners}, and GQT \cite{wang2025learninggraphquantizedtokenizers}, which use GNNs to generate ``soft'' token embeddings that represent structural information.
While bearing resemblance to our proposed architecture, these methods target graph-specific benchmarks rather than general-purpose factual grounding for LLM reasoning.
A more sophisticated integration approach has been proposed in ConceptFormer \cite{barmettler2025conceptformerefficientuseknowledgegraph}, which injects ``concept vectors'' to reduce token consumption. While it shows promising results for knowledge injection, its evaluation was limited to smaller models like GPT-2 0.1B.

