OpenReview
.net
Himabindu Lakkaraju
Senior Staff Research Scientist, Google
Assistant Professor, Harvard University
Joined
June 2017
Names
Himabindu Lakkaraju (Preferred), Hima Lakkaraju
Emails
****@cs.stanford.edu (Confirmed), ****@seas.harvard.edu (Confirmed), ****@hbs.edu
Personal Links
Homepage
DBLP
Semantic Scholar
Career & Education History
Senior Staff Research Scientist, Google (google.com), 2025 – Present
Assistant Professor, Harvard University (harvard.edu), 2020 – Present
Postdoc, Harvard University (harvard.edu), 2018 – 2019
PhD student, Computer Science, Stanford University (stanford.edu), 2012 – 2018
Advisors, Relations & Conflicts
Coauthor: Cynthia Rudin, 2016 – 2019
PhD Advisor: Jon Kleinberg, 2013 – 2018
PhD Advisor: Jure Leskovec, 2012 – 2018
Expertise
Interpretability, Fairness, and Safety in Machine Learning, 2013 – Present
Causality, 2013 – Present
Counterfactual Inference, 2013 – Present
Publications
User Persona Subspaces Modulate Refusal Behavior in Language Models
Yan Zhou, Shichang Zhang, Zidi Xiong, Himabindu Lakkaraju
Mech Interp Workshop ICML 2026 Poster
Readers: Everyone
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Usha Bhalla, Alex Oesterling, Claudio Mayrink Verdun, Himabindu Lakkaraju, Flavio Calmon
ICLR 2026 Oral
Readers: Everyone
Can Trustworthiness Generalize? Leveraging Weak Supervision for Stronger Models
Lillian Sun, Martin Pawelczyk, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju
ICLR 2026 Conference Withdrawn Submission
Readers: Everyone
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang
XLLM-Reason-Plan
Readers: Everyone
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang
INTERPLAY
Readers: Everyone
Accountability Attribution: Tracing Model Behavior to Training Processes
Shichang Zhang, Hongzhe Du, Karim Saraipour, Jiaqi W. Ma, Himabindu Lakkaraju
ICML 2025 R2-FM Workshop Poster
Readers: Everyone
Inference-Time Reward Hacking in Large Language Models
Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio Calmon
MoFA Poster
Readers: Everyone
Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
Lillian Sun, Martin Pawelczyk, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju
MOSS@ICML2025 Oral
Readers: Everyone
Leveraging the Sequential Nature of Language for Interpretability
Usha Bhalla, Alex Oesterling, Claudio Mayrink Verdun, Flavio Calmon, Himabindu Lakkaraju
ICML 2025 World Models Workshop
Readers: Everyone
Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models
Lillian Sun, Martin Pawelczyk, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju
CFAgentic @ ICML'25 Oral
Readers: Everyone
View all 107 publications
Co-Authors
Aaron Jiaxun Li
Alex Gu
Alex Oesterling
Alexander Lin
Alexandra Zytek
Alexandre Alahi
Alexis Ross
Angshu Rai
Anna P. Meyer
Aounon Kumar
Asma Ghandeharioun
Bang An
Ben Yuhas
Carl Shan
Catherine Huang
Charumathi Badrinath
Chelse Swoopes
Chirag Agarwal
Chiranjib Bhattacharyya
Christina Xiao
Claudio Mayrink Verdun
Cynthia Rudin
Dan Ley
David Miller
Dean Foster
View all 127 co-authors