# Code for "Don't trust your eyes: on the (un)reliability of feature visualizations"

This repository contains code to replicate experiments from "Don't trust your eyes: on the (un)reliability of feature visualizations".

## Fooling feature visualizations
Feature visualizations are widely used interpretability tools, but can we trust them? We investigate this question from adversarial, empirical, and theoretical perspectives. The result: Don’t trust your eyes!

![example-figure](./assets/example_figure.png)

For instance, from an adversarial perspective, we can modify a model so that it behaves identically on natural images (e.g., identical ImageNet accuracy) while its feature visualizations change completely. In the example above, the modified model's feature visualization shows a painting (right) instead of the original feature visualization (left).
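
The idea of "same behavior on natural inputs, different feature visualization" can be illustrated with a toy example. The sketch below is not the construction used in the paper or in this repository: it is a minimal illustration that assumes a single ReLU unit on 3-dimensional inputs, and assumes that "natural" inputs always have a zero third coordinate. All names (`activation`, `visualize`, `w_original`, `w_modified`) are hypothetical. The "feature visualization" here is simply the unit-norm input that maximizes the unit's activation, found by projected gradient ascent.

```python
import math

def activation(w, x):
    """ReLU unit: max(0, w . x)."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)))

def visualize(w, steps=200, lr=0.1):
    """Toy feature visualization: projected gradient ascent on the unit sphere.

    Starts from a point where the ReLU is active for the weights used below,
    so the gradient of the activation with respect to x is simply w.
    """
    x = [1.0 / math.sqrt(len(w))] * len(w)
    for _ in range(steps):
        x = [xi + lr * wi for xi, wi in zip(x, w)]   # gradient step
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]                  # project back to unit norm
    return x

# Assumption for this toy: natural inputs never use the third coordinate.
natural_inputs = [[0.5, 0.2, 0.0], [-1.0, 0.3, 0.0], [2.0, -1.0, 0.0]]

w_original = [1.0, 0.0, 0.0]
# The modified unit adds a large weight on the direction natural inputs never use.
w_modified = [1.0, 0.0, 5.0]

# Identical activations on every natural input ...
acts_original = [activation(w_original, x) for x in natural_inputs]
acts_modified = [activation(w_modified, x) for x in natural_inputs]

# ... but the maximally activating inputs point in very different directions.
viz_original = visualize(w_original)
viz_modified = visualize(w_modified)

print("activations equal on natural inputs:", acts_original == acts_modified)
print("original visualization:", [round(v, 3) for v in viz_original])
print("modified visualization:", [round(v, 3) for v in viz_modified])
```

In this toy, the two units cannot be told apart on natural inputs, yet their visualizations are dominated by different coordinates. Real networks and real feature visualization methods are far more complex; the sketch only conveys why agreement on natural data does not constrain what an optimization-based visualization shows.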

## Citation
```
@article{anonymous,
  url = {anonymous},
  author = {anonymous, author},
  title = {Don't trust your eyes: on the (un)reliability of feature visualizations},
  journal = {anonymous},
  year = {2023},
}
```


