COLM 2025 Workshop INTERPLAY Submissions
LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Utilization
Jiarui Liu, Jivitesh Jain, Mona T. Diab, Nishant Subramani
Published: 24 Sept 2025, Last Modified: 25 Sept 2025 · INTERPLAY · Readers: Everyone
Safety Subspaces are Not Distinct: A Fine-Tuning Case Study
Shaan Shah, Kaustubh Ponkshe, Raghav Singhal, Praneeth Vepakomma
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu, Kayo Yin, Michael I. Jordan, Jacob Steinhardt, Lijie Chen
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Analyzing Representational Shifts in Multimodal Models: A Study of Feature Dynamics in Gemma and PaliGemma
Aaron C Friedman, Trinabh Gupta, Raine Ma, Sean O'Brien, Kevin Zhu, Cole Blondin
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Universal Neurons in GPT-2: Emergence, Persistence, and Functional Impact
Advey Nandan, Cheng-Ting Chou, Amrit Kurakula, Cole Blondin, Kevin Zhu, Vasu Sharma, Sean O'Brien
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Interpreting the Latent Structure of Operator Precedence in Language Models
Dharunish Yugeswardeenoo, Harshil Nukala, Cole Blondin, Sean O'Brien, Vasu Sharma, Kevin Zhu
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs
Jacob Dunefsky, Arman Cohan
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs
Ziling Cheng, Meng Cao, Marc-Antoine Rondeau, Jackie CK Cheung
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models
Benjamin Reichman, Adar Avsian, Larry Heck
Published: 24 Sept 2025, Last Modified: 10 Oct 2025 · INTERPLAY · Readers: Everyone
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits
Karim Saraipour, Shichang Zhang
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
On the Geometry of Semantics in Next-token Prediction
Yize Zhao, Christos Thrampoulidis
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Death by a Thousand Directions: Exploring the Geometry of Harmfulness in LLMs through Subconcept Probing
McNair Shah, Saleena Angeline Sartawita, Adhitya Rajendra Kumar, Naitik Chheda, Kevin Zhu, Vasu Sharma, Sean O'Brien, Will Cai
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu, Tan Minh Nguyen
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
BERTology in the Modern World
Michael Li, Nishant Subramani
Published: 24 Sept 2025, Last Modified: 25 Sept 2025 · INTERPLAY · Readers: Everyone
Causal Interventions Reveal Shared Structure Across English Filler–Gap Constructions
Sasha Boguraev, Christopher Potts, Kyle Mahowald
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Predicting Success of Model Editing via Intrinsic Features
Yanay Soker, Martin Tutek, Yonatan Belinkov
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Evaluating Contrast Localizer for Identifying Causal Units in Social & Mathematical Tasks in Language Models
Yassine Jamaa, Badr AlKhamissi, Satrajit S Ghosh, Martin Schrimpf
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Localizing Persona Representations in LLMs
Celia Cintas, Miriam Rateike, Erik Miehling, Elizabeth M. Daly, Skyler Speakman
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone
Comparing Prompt and Representation Engineering for Personality Control in Language Models: A Case Study
Pengrui Han
Published: 24 Sept 2025, Last Modified: 24 Sept 2025 · INTERPLAY · Readers: Everyone