ICML 2024 Workshop MI Submissions
Grokking and the Geometry of Circuit Formation
Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Exploring the Internal Mechanisms of Music LLMs: A Study of Root and Quality via Probing and Intervention Techniques
Wenye Ma, Gus Xia
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Faithful and Fast Influence Function via Advanced Sampling
Jungyeon Koh, Hyeonsu Lyu, Jonggyu Jang, Hyun Jong Yang
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Spotlight
Readers: Everyone
Sparse Autoencoders Match Supervised Features for Model Steering on the IOI Task
Aleksandar Makelov
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Spotlight
Readers: Everyone
Cluster-Norm for Unsupervised Probing of Knowledge
Walter Laurito, Sharan Maiya, Grégoire DHIMOÏLA, Owen Ho Wan Yeung, Kaarel Hänni
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Planning behavior in a recurrent neural network that plays Sokoban
Adrià Garriga-Alonso, Mohammad Taufeeque, Adam Gleave
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Extracting Finite State Machines from Transformers
Rik Adriaensen, Jaron Maene
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders
Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, Janos Kramar, Rohin Shah, Neel Nanda
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Visualizing Neural Network Imagination
Nevan Wichers, Victor Tao, Riccardo Volpato, Fazl Barez
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Attention with Markov: A Curious Case of Single-layer Transformers
Ashok Vardhan Makkuva, Marco Bondaschi, Alliot Nagle, Adway Girish, Hyeji Kim, Martin Jaggi, Michael Gastpar
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Learning and Unlearning of Fabricated Knowledge in Language Models
Chen Sun, Nolan Andrew Miller, Andrey Zhmoginov, Max Vladymyrov, Mark Sandler
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Spotlight
Readers: Everyone
Progressive distillation improves feature learning via implicit curriculum
Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, Surbhi Goel
Published: 24 Jun 2024, Last Modified: 24 Jun 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Spotlight
Readers: Everyone
Finding Visual Task Vectors
Alberto Hojel, Yutong Bai, Trevor Darrell, Amir Globerson, Amir Bar
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Language Models Linearly Represent Sentiment
Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Spotlight
Readers: Everyone
Tokenized SAEs: Disentangling SAE Reconstructions
Thomas Dooms, Daniel Wilhelm
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Rohan Gupta, Iván Arcuschin, Thomas Kwa, Adrià Garriga-Alonso
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Robust Unlearning via Mechanistic Localizations
Phillip Huang Guo, Aaquib Syed, Abhay Sheshadri, Aidan Ewart, Gintare Karolina Dziugaite
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Spotlight
Readers: Everyone
LLM Circuit Analyses Are Consistent Across Training and Scale
Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
The Concept Percolation Hypothesis: Analyzing the Emergence of Capabilities in Neural Networks Trained on Formal Grammars
Ekdeep Singh Lubana, Kyogo Kawaguchi, Robert P. Dick, Hidenori Tanaka
Published: 24 Jun 2024, Last Modified: 24 Jun 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad, Wes Gurnee, Max Tegmark
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Grokking, Rank Minimization and Generalization in Deep Learning
David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro Henrique Pamplona Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew Walter
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Loss in the Crowd: Hidden Breakthroughs in Language Model Training
Sara Kangaslahti, Elan Rosenfeld, Naomi Saphra
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Spotlight
Readers: Everyone
Neuroplasticity and Corruption in Model Mechanisms: A case study of Indirect Object Identification
Vishnu Kabir Chhabra, Ding Zhu, Mohammad Mahdi Khalili
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Poster
Readers: Everyone
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Riggs Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks
Published: 24 Jun 2024, Last Modified: 31 Jul 2024
ICML 2024 MI Workshop Oral
Readers: Everyone