Open Source Links: https://github.com/komikat/prep-gated-circuits
Keywords: Circuit analysis, Foundational work
Other Keywords: linguistic probing
TL;DR: We reverse-engineer a “Query-Gated Courier” circuit in Gemma-2-2B for role-gated retrieval.
Abstract: We study Gemma-2-2B on a controlled role-gated retrieval task where a
prepositional gate ($\texttt{to}$ or $\texttt{from}$) selects which of two
entities is correct. On 60 single-token name pairs the model attains
100\% accuracy with a mean flip magnitude of $\approx 3.5$ (the sum of
per-condition correctness margins). Using causal tracing, we identify
a Query-Gated Courier circuit with three stages: (1) a gate
token (from/to) writes a role feature at the answer; (2) this feature perturbs
late-layer courier queries, shifting their $q \cdot k$ preference;
(3) couriers attend to the correct name and inject it via OV, raising
its logit. Gate-residual swaps flip predictions, and a compact
nine-head keep set reproduces the behavior with high fidelity. The
circuit gives a potential algorithm for role tracking and aligns with
the Paninian Kāraka analysis, mapping $\texttt{to}$ to sampradāna
and $\texttt{from}$ to apādāna.
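
The flip magnitude mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming a correctness margin is the logit of the correct name minus the logit of the incorrect name under a given gate, and that the flip magnitude is the sum of the two margins (one per gate). The function names, the name pair, and the logit values are hypothetical and are not taken from the paper or its repository.

```python
def correctness_margin(logits, correct, incorrect):
    """Logit gap between the correct and incorrect name (assumed definition)."""
    return logits[correct] - logits[incorrect]


def flip_magnitude(logits_to, logits_from, name_a, name_b):
    """Sum of per-condition margins for one name pair.

    Assumption for illustration: name_a is the correct answer under the
    "to" gate and name_b is the correct answer under the "from" gate.
    """
    margin_to = correctness_margin(logits_to, name_a, name_b)
    margin_from = correctness_margin(logits_from, name_b, name_a)
    return margin_to + margin_from


# Hypothetical answer-position logits for one pair under each gate.
logits_to = {"Alice": 3.0, "Bob": 1.0}
logits_from = {"Alice": 0.5, "Bob": 1.5}

score = flip_magnitude(logits_to, logits_from, "Alice", "Bob")
print(score)
```

Under this reading, a pair counts as correct in both conditions only when both margins are positive, so a large flip magnitude indicates that changing the gate alone moves the prediction decisively from one name to the other.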
Submission Number: 307