
==========================================================================================
NPEFF baselines: Activation ICA (maybe activation sparse-NMF for resnet50)
==========================================================================================


- It is not surprising that activation ICA returns components tuned similarly to NPEFF's,
  since the roles of parameters and activations are dual to each other in our
  "computational graph" (or whatever I call it) mental model.


- TODO: Include results using PCA instead of ICA. I remember PCA being significantly worse
  than ICA in experiments from long ago, but it would probably be good to include this in
  the rebuttal/paper.


Advantages of NPEFF over Activation ICA:
- Can immediately be applied to any differentiable model. No need to select what subset of activations to use.
    - Particularly useful in cases such as transformers, where the activation space is not fixed-dimensional.
- Takes into account every bit of computation done by the model.
    - For example, selecting activations only at the end of resnet blocks (or whatever the proper term for
      them is) does not give insights into the processing and representations used within sublayers of the block.
    - For example, only using activations from the CLS token's position ignores information stored at other token
      positions.
- Has verification built into it.
    - We can directly show that the parameters highlighted by an NPEFF component are preferentially important
      to its top examples by our perturbation experiments.
    - Need to look more into the activation perturbation literature (or maybe try a few more things), but it looks
      like the activation ICA components do not really perturb selectively. Furthermore, as far as I'm aware,
      activation perturbation methods lack the Fisher-based theoretical backing of the NPEFF perturbation method.
      Some may exist, but such backing is probably harder to derive for those methods.
- Representations in parameter space are uniquely useful for making changes to the model.
    - More of a future research direction, but important to mention.


==========================================================================================
"Quality" of the approximations to the Fisher we make.
==========================================================================================

- NOTE: It seems that the quality of component tunings returned by LRM-NPEFF is significantly
  higher than that of diagonal-NPEFF. So it does appear that better approximations
  produce better results with NPEFF.
- Sparsity doesn't seem to be too big of an issue here. I don't think I've seen much
  of a difference using 131k values per example instead of the 65k used throughout this paper.
- Maybe cite our model merging paper as evidence that the diagonal approximation is "good enough"
  to enhance information transfer between models. Furthermore, EWC demonstrates
  that it is "good enough" to prevent catastrophic forgetting.
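
The sparsity point above can be illustrated with a minimal sketch. It assumes that sparsifying a
PEF means keeping the K largest entries per example and zeroing the rest; that selection rule, the
function name `sparsify_top_k`, and the synthetic heavy-tailed data are assumptions for illustration,
not the paper's implementation.

```python
import numpy as np

def sparsify_top_k(pef, k):
    """Keep the k largest entries of a non-negative PEF vector (hypothetical helper).

    Returns the sorted indices of the kept entries and their values.
    """
    flat = np.asarray(pef).ravel()
    if k >= flat.size:
        return np.arange(flat.size), flat.copy()
    # argpartition places the k largest entries in the last k slots (unordered).
    idx = np.argpartition(flat, -k)[-k:]
    idx.sort()
    return idx, flat[idx]

# Synthetic stand-in for one example's PEF: non-negative and heavy-tailed (assumed).
rng = np.random.default_rng(0)
pef = rng.random(1000) ** 4

idx, vals = sparsify_top_k(pef, k=64)
retained = vals.sum() / pef.sum()  # fraction of total PEF mass kept by the top-k entries
```

Comparing `retained` at two values of K (e.g. the 65k and 131k settings mentioned above, on real
PEFs) would be one way to quantify how little is lost by the smaller budget.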


While we agree that using closer approximations to the full PEF matrices would yield
better results, we emphasize that NPEFF's validity as an interpretability algorithm
does not depend directly on our assumptions holding in practice. Instead,
our experiments indicate that NPEFF does indeed uncover components with interpretable
tunings, as seen by examining their top examples. Furthermore, our perturbation
experiments provide an empirical means to verify the assertions made by NPEFF about
the importance of particular parameters to individual components. Independent of
*why* NPEFF works as an interpretability algorithm, we have empirically demonstrated
that it *does* work.

While our Fisher-based derivation is useful for understanding and working with NPEFF,
the specific methods introduced in this paper can be justified without invoking full
PEF matrices. Instead of being seen as diagonals of Fisher matrices, the PEF vectors
can be thought of simply as showing how sensitive the model's predictions on a
particular example are to perturbations of each parameter. The reasoning behind our
"computational graph"-based hypothesis of why NPEFF uncovers interpretable components
still holds, since sub-computations would still be imprinted in these parameter sensitivity
maps. Our perturbation method can also be derived from the goal of preferentially perturbing
the parameters important for a component without perturbing those important for the
model's general processing. We can expound on this interpretation in an appendix if so
desired.
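
To make the "sensitivity map" reading concrete, here is a minimal sketch for a toy softmax-linear
classifier. It assumes the usual Fisher definition with the expectation taken over the model's own
predictive distribution; the toy model, the function names (`softmax`, `pef_diagonal`), and the
inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def pef_diagonal(W, x):
    """Diagonal per-example Fisher for a toy model with logits = W @ x (assumed model).

    Entry [c, j] is E_{y ~ p(y|x)} [ (d log p(y|x) / d W[c, j])^2 ].
    """
    p = softmax(W @ x)
    n_classes = p.shape[0]
    pef = np.zeros_like(W)
    for y in range(n_classes):
        onehot = np.zeros(n_classes)
        onehot[y] = 1.0
        grad = np.outer(onehot - p, x)  # gradient of log p(y|x) with respect to W
        pef += p[y] * grad ** 2         # weight by the model's own probability of y
    return pef

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = np.array([1.0, 0.0, -2.0, 0.5])

pef = pef_diagonal(W, x)
p = softmax(W @ x)
```

For this toy model the result has the closed form `pef[c, j] = p[c] * (1 - p[c]) * x[j] ** 2`:
weights attached to an input feature that is zero for this example get zero sensitivity, and
weights feeding a class the model is already certain about get little. This is the sense in which
a PEF vector records which parameters the prediction on a given example depends on.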
