NeurIPS 2024 Workshop InterpretableAI Submissions
A Theory of Interpretable Approximations
Marco Bressan, Nicolò Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, Maximilian Thiessen
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Positional Information Can Emerge Through Causal Attention Making Nearby Token Embeddings Similar Even Without Positional Encodings
Chunsheng Zuo, Pavel Guerzhoy, Michael Guerzhoy
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
The Price of Freedom: An Adversarial Attack on Interpretability Evaluation
Kristoffer Knutsen Wickstrøm, Marina MC Höhne, Anna Hedström
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Can sparse autoencoders be used to decompose and interpret steering vectors?
Harry Mayne, Yushi Yang, Adam Mahdi
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Your Theory Is Wrong: Using Linguistic Frameworks for LLM Probing
Victoria Firsanova
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Riemann Sum Optimization for Accurate Integrated Gradients Computation
Shree Singhi, Swadesh Swain
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
ConceptDrift: Uncovering Biases through the Lens of Foundation Models
Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Position: In Defence of Post-hoc Explainability
Nick Oh
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
ProtoS-ViT: Visual foundation models for sparse self-explainable classifications
Hugues Turbe, Mina Bjelogrlic, Gianmarco Mengaldo, Christian Lovis
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Subgroup Discovery with the Cox Model
Zachary Izzo, Iain Melvin
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions
Marc Canby, Adam Davies, Chirag Rastogi, Julia Hockenmaier
Published: 10 Oct 2024, Last Modified: 06 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations
Kola Ayonrinde, Michael T Pearce
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
SignAttention: On the Interpretability of Transformer Models for Sign Language Translation
Pedro Alejandro Dal Bianco, Oscar Agustín Stanchi, Facundo Manuel Quiroga, Franco Ronchetti, Enzo Ferrante
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Competence-Based Analysis of Language Models
Adam Davies, Jize Jiang, ChengXiang Zhai
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Aligning Characteristic Descriptors with Images for Human-Expert-like Explainability
Bharat Chandra Yalavarthi, Nalini K. Ratha
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
On Interpretability and Overreliance
Julian Skirzynski, Elena Glassman, Berk Ustun
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Policy-shaped prediction: improving world modeling through interpretability
Miles Richard Hutson, Isaac Kauvar, Nick Haber
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Isometry pursuit
Samson J Koelle, Marina Meila
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Konstantin Donhauser, Gemma Elyse Moran, Aditya Ravuri, Kian Kenyon-Dean, Kristina Ulicna, Cian Eastwood, Jason Hartford
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
You can remove GPT2's LayerNorm by fine-tuning
Stefan Heimersheim
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Explainable Concept Generation through Vision-Language Preference Learning
Aditya Taparia, Som Sagar, Ransalu Senanayake
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
CoS: Enhancing Personalization and Mitigating Bias with Context Steering
Sashrika Pandey, Jerry Zhi-Yang He, Mariah L Schrum, Anca Dragan
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Disentangling Mean Embeddings for Better Diagnostics of Image Generators
Sebastian Gregor Gruber, Pascal Tobias Ziegler, Florian Buettner
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone
Probable-class Nearest-neighbor Explanations Improve AI & Human Accuracy
Giang Nguyen, Valerie Chen, Mohammad Reza Taesiri, Anh Totti Nguyen
Published: 10 Oct 2024, Last Modified: 03 Dec 2024
IAI Workshop @ NeurIPS 2024
Readers: Everyone