COLM 2025 Workshop INTERPLAY Submissions

LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Utilization
Jiarui Liu, Jivitesh Jain, Mona T. Diab, Nishant Subramani
- Published: 24 Sept 2025, Last Modified: 25 Sept 2025
- INTERPLAY
- Readers: Everyone
Safety Subspaces are Not Distinct: A Fine-Tuning Case Study
Shaan Shah, Kaustubh Ponkshe, Raghav Singhal, Praneeth Vepakomma
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu, Kayo Yin, Michael I. Jordan, Jacob Steinhardt, Lijie Chen
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Analyzing Representational Shifts in Multimodal Models: A Study of Feature Dynamics in Gemma and PaliGemma
Aaron C Friedman, Trinabh Gupta, Raine Ma, Sean O'Brien, Kevin Zhu, Cole Blondin
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Universal Neurons in GPT-2: Emergence, Persistence, and Functional Impact
Advey Nandan, Cheng-Ting Chou, Amrit Kurakula, Cole Blondin, Kevin Zhu, Vasu Sharma, Sean O'Brien
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Interpreting the Latent Structure of Operator Precedence in Language Models
Dharunish Yugeswardeenoo, Harshil Nukala, Cole Blondin, Sean O'Brien, Vasu Sharma, Kevin Zhu
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs
Jacob Dunefsky, Arman Cohan
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs
Ziling Cheng, Meng Cao, Marc-Antoine Rondeau, Jackie CK Cheung
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models
Benjamin Reichman, Adar Avsian, Larry Heck
- Published: 24 Sept 2025, Last Modified: 10 Oct 2025
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits
Karim Saraipour, Shichang Zhang
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
On the Geometry of Semantics in Next-token Prediction
Yize Zhao, Christos Thrampoulidis
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Death by a Thousand Directions: Exploring the Geometry of Harmfulness in LLMs through Subconcept Probing
McNair Shah, Saleena Angeline Sartawita, Adhitya Rajendra Kumar, Naitik Chheda, Kevin Zhu, Vasu Sharma, Sean O'Brien, Will Cai
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu, Tan Minh Nguyen
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
BERTology in the Modern World
Michael Li, Nishant Subramani
- Published: 24 Sept 2025, Last Modified: 25 Sept 2025
Causal Interventions Reveal Shared Structure Across English Filler–Gap Constructions
Sasha Boguraev, Christopher Potts, Kyle Mahowald
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Predicting Success of Model Editing via Intrinsic Features
Yanay Soker, Martin Tutek, Yonatan Belinkov
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Evaluating Contrast Localizer for Identifying Causal Units in Social & Mathematical Tasks in Language Models
Yassine Jamaa, Badr AlKhamissi, Satrajit S Ghosh, Martin Schrimpf
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Localizing Persona Representations in LLMs
Celia Cintas, Miriam Rateike, Erik Miehling, Elizabeth M. Daly, Skyler Speakman
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Comparing Prompt and Representation Engineering for Personality Control in Language Models: A Case Study
Pengrui Han
- Published: 24 Sept 2025, Last Modified: 24 Sept 2025