Uncovering Latent Chain of Thought Vectors in Large Language Models

Published: 05 Mar 2025, Last Modified: 05 Mar 2025
Venue: ICLR 2025 Workshop Weight Space Learning Poster
License: CC BY 4.0
Track: tiny / short paper (up to 4 pages)
Keywords: Steering Vectors, Activation Engineering, Chain of Thought Reasoning, Interpretability
TL;DR: Using layer activations from Llama3 and Mistral, we derive injectable steering vectors that push language models toward chain-of-thought reasoning without natural-language prompting.
Submission Number: 31
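The TL;DR does not say how the steering vectors are derived or injected. One common activation-engineering recipe is assumed here and is not confirmed by this page: take the mean hidden-state activation at one layer over chain-of-thought prompts, subtract the mean over plain prompts, and add a scaled copy of the difference to the hidden state at that layer during generation. The sketch below uses toy lists of floats in place of real Llama3 or Mistral activations. The function names and the scale `alpha` are hypothetical.

```python
# Hypothetical sketch of a mean-difference steering vector and its additive
# injection. This is an assumed recipe, not the paper's confirmed method.
# Toy float lists stand in for one layer's hidden states from a real model.
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def steering_vector(cot_acts: List[Vector], plain_acts: List[Vector]) -> Vector:
    """Assumed recipe: mean activation on chain-of-thought prompts
    minus mean activation on plain prompts, taken at the same layer."""
    cot_mean = mean_vector(cot_acts)
    plain_mean = mean_vector(plain_acts)
    return [c - p for c, p in zip(cot_mean, plain_mean)]


def inject(hidden: Vector, steer: Vector, alpha: float = 1.0) -> Vector:
    """Add the steering vector, scaled by the hypothetical alpha, to a hidden state."""
    return [h + alpha * s for h, s in zip(hidden, steer)]


cot = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
plain = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
v = steering_vector(cot, plain)
print(v)  # [1.0, 1.0, 0.0]
print(inject([0.5, 0.5, 0.5], v, alpha=2.0))  # [2.5, 2.5, 0.5]
```

In a real model, `inject` would typically run inside a forward hook on the chosen layer. `alpha` then controls how strongly generation is pushed toward the chain-of-thought direction.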