Low-Rank Adapters as Implicit Probabilistic Circuits: Tractable Bayesian Fine-Tuning via Tensor-Algebraic Duality
Keywords: Parameter-Efficient Fine-Tuning, LoRA, Probabilistic Circuits, Bayesian Inference, Tensor Algebra, Uncertainty Quantification
TL;DR: We establish a tensor-algebraic duality between low-rank adapters (LoRA) and probabilistic circuits to enable exact, highly efficient Bayesian fine-tuning.
Abstract: We establish a formal tensor-algebraic duality between low-rank adapters (LoRA) and probabilistic circuits (PCs), proving that any rank-$r$ adapter $\Delta W = AB$ induces a decomposable, smooth PC of size $\mathcal{O}(r \cdot d)$---the two structural properties that characterise tractable generative models. This correspondence enables exact Bayesian posterior inference over adapter weights in $\mathcal{O}(r^2 d)$ time via an algebraic Woodbury identity, improving over the $\mathcal{O}(d^3)$ cost of full-rank inference by a factor of $\mathcal{O}((d/r)^2)$. Extending to the multi-task setting, we apply Kruskal's uniqueness theorem to the adapter tensor $\mathcal{T} \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}} \times K}$ and derive sample-complexity guarantees for identifiable factor recovery: $n_k = \Omega(rd \log d/(\varepsilon^2 K))$ samples per task suffice. On the learning-theory side, we prove that the Rademacher complexity of the rank-$r$ adapter class is $\hat{\mathfrak{R}}_n(\mathcal{F}_r) = \mathcal{O}\left(R_A R_B \sqrt{\frac{\log d}{n}}\right)$---tight up to constants---yielding an analytically optimal rank $r^* = \Theta((n/\log d)^{1/3})$. Controlled experiments confirm the inference-speed and calibration predictions: at rank $r = 8$ with $d = 1024$, Woodbury posterior inference is $47\times$ faster than a Cholesky decomposition, while Bayesian LoRA achieves ECE $0.031$ versus $0.089$ and AUROC-OOD $0.847$ versus $0.712$ compared with deterministic adapters on the MNLI$\to$HANS out-of-distribution benchmark.
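To make the claimed $\mathcal{O}(r^2 d)$ Woodbury route concrete, the sketch below contrasts it with a naive $\mathcal{O}(d^3)$ Cholesky solve on a toy Gaussian posterior whose precision is an isotropic prior plus a rank-$r$ term. The model, the variable names (`alpha`, `U`, `b`), and the NumPy implementation are illustrative assumptions, not the paper's code.

```python
# Minimal sketch (not the authors' implementation): Woodbury-based posterior
# mean for a precision of the form  Lambda = alpha*I + U U^T  with U of rank r,
# versus building the full d x d matrix and Cholesky-solving it.
import numpy as np
from numpy.linalg import cholesky, solve

d, r = 1024, 8
rng = np.random.default_rng(0)
alpha = 2.0                      # assumed isotropic prior precision
U = rng.standard_normal((d, r))  # assumed rank-r factor from the adapter
b = rng.standard_normal(d)       # assumed right-hand side (e.g. an X^T y term)

# Naive route, O(d^3): materialise the full precision and Cholesky-solve.
# (A dedicated triangular solve would be used in practice; np.linalg.solve is
# enough to check correctness here.)
Lambda_full = alpha * np.eye(d) + U @ U.T
L = cholesky(Lambda_full)
mu_chol = solve(L.T, solve(L, b))

# Woodbury route, O(r^2 d):
#   (alpha*I + U U^T)^{-1} b = (b - U (alpha*I_r + U^T U)^{-1} U^T b) / alpha,
# so only an r x r system is ever factorised.
small = alpha * np.eye(r) + U.T @ U           # r x r Gram matrix, O(d r^2)
mu_wood = (b - U @ solve(small, U.T @ b)) / alpha

assert np.allclose(mu_chol, mu_wood, atol=1e-6)
```

Under these assumptions the two routes return the same posterior mean; the asymptotic gap between them ($\mathcal{O}(d^3)$ versus $\mathcal{O}(r^2 d)$) is what the abstract's reported $47\times$ speed-up at $r = 8$, $d = 1024$ refers to.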
Submission Number: 30