{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Gemma-2-2b Demo\n",
    "\n",
    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/safety-research/circuit-tracer/blob/main/demos/gemma_demo.ipynb\">\n",
    "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
    "</a>\n",
    "\n",
    "\n",
    "In this demo, you'll see a few of the attribution graphs found with gemma-2-2b. Below, you'll find examples, as well as a number of intervention experiments that will validation the correctness of our annotations / supernodes found in the graph."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [`The International Advanced Security Group (IAS` → `G`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-G&pinnedIds=25_5604_7%2C24_763_7%2C22_12304_7%2C23_14585_7%2C24_1668_7%2C20_7544_7%2C17_4855_5%2C24_9503_7%2C17_4886_5%2C14_1031_5%2C13_7451_5%2C4_3134_5%2CE_5897_5%2C1_3977_5%2C13_5661_7%2C11_10532_7%2C11_7419_7%2C10_4451_7%2C6_9719_7%2CE_24632_7%2CE_591_6%2C0_548_7%2C2_12787_7%2C2_12811_7%2C2_4716_7%2C2_8870_7%2C5_10381_7%2C6_3358_7%2C7_7303_7%2C7_4807_7%2C7_15088_7%2C8_4119_7%2C10_2379_7%2C9_15938_7%2C10_8308_7%2C10_11210_7%2C14_922_7%2C15_5076_7%2C14_15510_7%2C15_444_7%2C17_14853_7%2C27_235319_7%2C2_8493_5%2C3_4791_6%2C4_6672_6%2C5_12154_6%2C5_112_6%2C8_15626_7%2C13_945_6%2C9_13890_6%2C5_12910_7%2C14_1031_6%2C14_13599_6%2C21_5066_6%2C13_13476_6%2C16_3033_6%2C25_4062_7%2C15_1301_6%2C6_11788_7%2C7_12830_7%2C11_8369_6%2C0_14394_7%2C0_7197_7%2C0_4370_7%2C0_3410_7%2C0_5548_7%2C0_13190_7%2C0_2592_7%2C1_10188_7%2C15_12642_5%2C23_999_7%2C14_13969_5%2C21_1146_7%2C24_2871_7%2C25_5880_7%2C24_7620_7%2C23_12120_7%2C25_10087_7&supernodes=%5B%5B%22Activates+on+%28+before+acronym+%2F+upweights+G*%22%2C%2220_7544_7%22%2C%2222_12304_7%22%5D%2C%5B%22%5C%22group%5C%22%22%2C%221_3977_5%22%2C%222_8493_5%22%2C%2217_4855_5%22%5D%2C%5B%22Activates+on+%28+before+acronym%22%2C%2221_5066_6%22%2C%2211_8369_6%22%2C%2213_945_6%22%2C%229_13890_6%22%2C%2214_13599_6%22%2C%2215_1301_6%22%2C%2216_3033_6%22%2C%2213_13476_6%22%2C%225_112_6%22%2C%225_12154_6%22%2C%224_6672_6%22%2C%223_4791_6%22%5D%2C%5B%22Predict+tokens+starting+with+G%22%2C%2223_999_7%22%2C%2224_1668_7%22%2C%2225_5604_7%22%5D%2C%5B%22Tokens+in+acronymable+proper+nouns%22%2C%2213_7451_5%22%2C%2214_13969_5%22%5D%2C%5B%22tokens+followed+by+G%22%2C%2221_1146_7%22%2C%2224_2871_7%22%2C%2225_5880_7%22%2C%2224_7620_7%22%5D%2C%5B%22all+caps%22%2C%2225_10087_7%22%2C%2223_12120_7%22%2C%2223_14585_7%22%2C%2224_763_7%22%5D%2C%5B%22Tokens+starting+with+G%22%2C%2214_1031_6%22%2C%2215_12642_5%22%2C%2214_1031_5%22%2C%224_3134_5%22%
2C%2217_4886_5%22%5D%2C%5B%22first+token+of+acronyms%22%2C%220_7197_7%22%2C%220_5548_7%22%2C%220_3410_7%22%2C%220_4370_7%22%2C%222_8870_7%22%2C%220_13190_7%22%2C%220_14394_7%22%2C%220_2592_7%22%2C%221_10188_7%22%2C%227_12830_7%22%2C%2213_5661_7%22%2C%2214_922_7%22%2C%2217_14853_7%22%2C%227_15088_7%22%2C%229_15938_7%22%2C%2210_8308_7%22%2C%2215_5076_7%22%2C%2210_11210_7%22%2C%2211_10532_7%22%2C%226_9719_7%22%2C%225_10381_7%22%2C%222_12811_7%22%2C%222_4716_7%22%5D%2C%5B%22tokens+of+acronyms+in+parentheses%22%2C%2215_444_7%22%2C%228_4119_7%22%2C%2210_2379_7%22%2C%226_3358_7%22%2C%225_12910_7%22%2C%2224_9503_7%22%2C%226_11788_7%22%2C%2225_4062_7%22%2C%220_548_7%22%2C%2211_7419_7%22%2C%2210_4451_7%22%2C%2214_15510_7%22%2C%227_7303_7%22%2C%227_4807_7%22%2C%222_12787_7%22%2C%228_15626_7%22%5D%5D&clickedId=24_13541_7&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`Mexico:peso :: Europe:` → `euro`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-euro&pinnedIds=25_5604_7%2C24_763_7%2C22_12304_7%2C23_14585_7%2C24_1668_7%2C20_7544_7%2C17_4855_5%2C24_9503_7%2C17_4886_5%2C14_1031_5%2C13_7451_5%2C4_3134_5%2CE_5897_5%2C1_3977_5%2C13_5661_7%2C11_10532_7%2C11_7419_7%2C10_4451_7%2C6_9719_7%2CE_24632_7%2CE_591_6%2C0_548_7%2C2_12787_7%2C2_12811_7%2C2_4716_7%2C2_8870_7%2C5_10381_7%2C6_3358_7%2C7_7303_7%2C7_4807_7%2C7_15088_7%2C8_4119_7%2C10_2379_7%2C9_15938_7%2C10_8308_7%2C10_11210_7%2C14_922_7%2C15_5076_7%2C14_15510_7%2C15_444_7%2C17_14853_7%2C27_235319_7%2C2_8493_5%2C3_4791_6%2C4_6672_6%2C5_12154_6%2C5_112_6%2C8_15626_7%2C13_945_6%2C9_13890_6%2C5_12910_7%2C14_1031_6%2C14_13599_6%2C21_5066_6%2C13_13476_6%2C16_3033_6%2C25_4062_7%2C15_1301_6%2C6_11788_7%2C7_12830_7%2C11_8369_6%2C0_14394_7%2C0_7197_7%2C0_4370_7%2C0_3410_7%2C0_5548_7%2C0_13190_7%2C0_2592_7%2C1_10188_7%2C15_12642_5%2C23_999_7%2C14_13969_5%2C21_1146_7%2C24_2871_7%2C25_5880_7%2C24_7620_7%2C23_12120_7%2C25_10087_7%2C27_39182_6%2C27_16394_6%2C27_29913_6%2C27_3445_6%2C23_7512_6%2C23_6406_6%2C21_467_6%2C20_5796_6%2C18_10102_6%2C21_14004_6%2C18_10914_6%2C17_14392_6%2C19_6215_6%2C18_6132_6%2C19_3905_6%2C16_5477_5%2C23_15975_6%2C25_5524_6%2C24_15918_6%2C14_13587_5%2C6_3077_5%2C8_15546_5%2C4_12658_5%2C2_4931_5%2CE_4238_5%2C0_5626_5%2C15_15510_5%2C17_6158_5%2C14_9246_3%2C4_4678_3%2C15_10074_3%2CE_78755_3%2C2_11100_3%2C2_16013_3%2C5_15196_3%2C5_15068_3%2C5_13630_3%2C6_9152_3%2C6_4525_3%2C8_3601_3%2C6_13608_3%2C1_1214_3%2C0_10532_3%2C1_11176_3%2C1_2609_3%2C7_15373_3%2C3_4025_3%2C2_2161_3%2C6_10981_3%2C22_6179_6%2C21_4240_6&clickedId=24_8718_6&pruningThreshold=0.7&densityThreshold=0.99&supernodes=%5B%5B%22EU%2FEuropean%22%2C%220_5626_5%22%2C%224_12658_5%22%2C%222_4931_5%22%2C%228_15546_5%22%2C%226_3077_5%22%2C%2214_13587_5%22%2C%2216_5477_5%22%2C%2215_15510_5%22%5D%2C%5B%22currency%22%2C%2218_10102_6%22%2C%2220_5796_6%22%2C%2223_7512_6%22%2C%2222_6179_6%22%2C%2221_4240_6%22
%2C%2219_6215_6%22%2C%2217_14392_6%22%2C%2218_10914_6%22%2C%2221_14004_6%22%2C%2219_3905_6%22%5D%2C%5B%22finance%22%2C%226_9152_3%22%2C%226_4525_3%22%2C%222_2161_3%22%2C%221_2609_3%22%2C%220_10532_3%22%5D%2C%5B%22currency%22%2C%222_16013_3%22%2C%223_4025_3%22%2C%2214_9246_3%22%2C%2215_10074_3%22%2C%222_11100_3%22%2C%221_1214_3%22%2C%228_3601_3%22%2C%227_15373_3%22%2C%226_10981_3%22%2C%225_13630_3%22%2C%224_4678_3%22%2C%226_13608_3%22%2C%225_15196_3%22%2C%225_15068_3%22%2C%221_11176_3%22%5D%2C%5B%22EU%22%2C%2223_15975_6%22%2C%2218_6132_6%22%2C%2223_6406_6%22%2C%2225_5524_6%22%2C%2221_467_6%22%5D%5D&clerps=%5B%5B%2224_2415918_6%22%2C%22don%27t+say+Euro%22%5D%5D)\n",
    "- [`The guitarist knew the song` → `. / is`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-gp-nps&clerps=%5B%5B%222413277%22%2C%22%28incomprehensible%29%22%5D%2C%5B%222106697%22%2C%22The+X+at+start+of+sentence+%28subject+NPs%29%22%5D%2C%5B%222012754%22%2C%22ends+of+NPs%2C+upweights+verbs%22%5D%2C%5B%222305096%22%2C%22ends+of+phrases+%22%5D%2C%5B%222001993%22%2C%22say+%5C%22would%5C%22%22%5D%2C%5B%222111913%22%2C%22subject+in+sentential+clause+%28say+%5C%22had%5C%22%29%22%5D%2C%5B%222012650%22%2C%22say+a+verb%22%5D%2C%5B%221702296%22%2C%22thought+%28say+a+verb%29%22%5D%2C%5B%221609179%22%2C%22knew%22%5D%2C%5B%221401162%22%2C%22was%2Fwere%22%5D%2C%5B%221303459%22%2C%22knew%2Fknow%22%5D%2C%5B%222310652%22%2C%22subject+in+sentential+clause+%28say+%5C%22would%5C%22%29%22%5D%2C%5B%222514276%22%2C%22don%27t+say+%5C%22well%5C%22%22%5D%2C%5B%222206853%22%2C%22don%27t+say+%5C%22well%5C%22%22%5D%2C%5B%221915834%22%2C%22say+%5C%22well%5C%22%22%5D%2C%5B%221409346%22%2C%22know%22%5D%2C%5B%222301993%22%2C%22say+%5C%22was%5C%22%22%5D%2C%5B%222207306%22%2C%22say+%5C%22was%5C%22%22%5D%2C%5B%222108443%22%2C%22say+%5C%22well%5C%22%22%5D%2C%5B%22701641%22%2C%22know+%2F+understand%22%5D%2C%5B%22600576%22%2C%22know+%28that%29%22%5D%2C%5B%221804181%22%2C%22sentential+verbs+%28say+a+verb%29%22%5D%2C%5B%22205370%22%2C%22want%22%5D%2C%5B%22307146%22%2C%22know%22%5D%2C%5B%22414214%22%2C%22know%22%5D%2C%5B%222105739%22%2C%22ends+of+phrases%22%5D%2C%5B%221914505%22%2C%22%5C%22song%5C%22%22%5D%2C%5B%221500908%22%2C%22know%22%5D%2C%5B%22600908%22%2C%22know%22%5D%2C%5B%222301612%22%2C%22say+a+verb%22%5D%2C%5B%222101806%22%2C%22say+a+verb%22%5D%2C%5B%221204795%22%2C%22know%22%5D%2C%5B%2221_2111913_5%22%2C%22sentential+subjects+%28say+a+verb%29%22%5D%5D&pinnedIds=27_729_5%2CE_6608_3%2C21_11913_5%2C23_1993_5%2C20_12650_5%2C22_7306_5%2CE_91939_2%2CE_5169_5%2C17_2296_3%2C18_4181_5%2C14_9346_3%2C20_1993_5%2C19_14505_5%2C16_9179_3%2C15_908_3%2C14_9346_4%2C6_576_3%2C6_908_3%2C12_4795_3%2C13_
3459_3%2C7_1641_3%2C4_14214_3%2C23_1612_5%2C21_1806_5&supernodes=%5B%5B%22say+a+verb%22%2C%2220_12650_5%22%2C%2221_1806_5%22%2C%2223_1612_5%22%5D%2C%5B%22say+%5C%22was%5C%22%22%2C%2223_1993_5%22%2C%2222_7306_5%22%5D%2C%5B%22knew%22%2C%2216_9179_3%22%2C%224_14214_3%22%2C%2213_3459_3%22%2C%2212_4795_3%22%2C%2214_9346_3%22%2C%2215_908_3%22%2C%227_1641_3%22%2C%226_576_3%22%2C%226_908_3%22%2C%2214_9346_4%22%5D%2C%5B%22ends+of+phrases%22%2C%2221_5739_5%22%2C%2223_5096_5%22%5D%2C%5B%22ends+of+NPs%2C+upweights+verbs%22%2C%2220_12754_5%22%2C%2221_6697_5%22%5D%2C%5B%22sentential+verbs+%28say+a+verb%29%22%2C%2221_11913_5%22%2C%2218_4181_5%22%2C%2217_2296_3%22%5D%5D&pruningThreshold=0.51&densityThreshold=0.99)\n",
    "- [`The keys on the cabinet` → `are`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-keys-cabinet&pinnedIds=27_708_5%2C27_791_5%2C22_11517_5%2C23_4388_5%2C22_11517_2%2CE_12978_2%2C21_13184_5%2C20_11933_2%2C17_13299_2%2C21_13184_2%2C18_11190_2%2C17_9360_2%2C15_3530_2%2C4_13770_2%2C24_9272_5&clickedId=17_13299_2&pruningThreshold=0.7&densityThreshold=0.99&supernodes=%5B%5B%22Output+plural+verb%22%2C%2227_791_5%22%2C%2227_708_5%22%5D%2C%5B%22key%22%2C%2217_9360_2%22%2C%2215_3530_2%22%2C%224_13770_2%22%5D%2C%5B%22say+are%2Fwere%22%2C%2217_13299_2%22%2C%2222_11517_5%22%2C%2220_11933_2%22%2C%2222_11517_2%22%5D%2C%5B%22ends+of+plural+subjects%22%2C%2224_9272_5%22%2C%2223_4388_5%22%2C%2221_13184_5%22%2C%2221_13184_2%22%5D%5D&clerps=%5B%5B%2222_2211517_2%22%2C%22say+are+%2F+were%22%5D%2C%5B%2220_2011933_2%22%2C%22say+are+%2F+were%22%5D%2C%5B%2222_2211517_5%22%2C%22say+are+%2F+were%22%5D%2C%5B%2215_1503530_2%22%2C%22key%22%5D%2C%5B%224_413770_2%22%2C%22key%22%5D%2C%5B%2217_1713299_2%22%2C%22say+are+%2F+were%22%5D%2C%5B%2223_2304388_5%22%2C%22plural+subject+nouns%22%5D%2C%5B%2221_2113184_5%22%2C%22plural+subject+nouns%22%5D%2C%5B%2221_2113184_2%22%2C%22plural+subject+nons%22%5D%2C%5B%2224_2409272_5%22%2C%22ends+of+plural+subjects+%28say+a+verb%29%22%5D%2C%5B%2218_1811190_2%22%2C%22plural+nouns%22%5D%5D)\n",
    "- [`Fact: Michael Jordan plays the sport of` → `basketball`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-michael-jordan&clerps=%5B%5B%222413541%22%2C%22numbers%22%5D%2C%5B%222308855%22%2C%22basketball%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222104818%22%2C%22basketball%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221812672%22%2C%22basketball%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22401742%22%2C%22basketball%22%5D%2C%5B%22101173%22%2C%22basketball%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%5D&pinnedIds=25_5604_7%2C24_763_7%2C22_12304_7%2C23_14585_7%2C24_1668_7%2C20_7544_7%2C17_4855_5%2C24_9503_7%2C17_4886_5%2C14_1031_5%2C13_7451_5%2C4_3134_5%2CE_5897_5%2C1_3977_5%2C13_5661_7%2C11_10532_7%2C11_7419_7%2C10_4451_7%2C6_9719_7%2CE_24632_7%2CE_591_6%2C0_548_7%2C2_12787_7%2C2_12811_7%2C2_4716_7%2C2_8870_7%2C5_10381_7%2C6_3358_7%2C7_7303_7%2C7_4807_7%2C7_15088_7%2C8_4119_7%2C10_2379_7%2C9_15938_7%2C10_8308_7%2C10_11210_7%2C14_922_7%2C15_5076_7%2C14_15510_7%2C15_444_7%2C17_14853_7%2C27_235319_7%2C2_8493_5%2C3_4791_6%2C4_6672_6%2C5_12154_6%2C5_112_6%2C8_15626_7%2C13_945_6%2C9_13890_6%2C5_12910_7%2C14_1031_6%2C14_13599_6%2C21_5066_6%2C13_13476_6%2C16_3033_6%2C25_4062_7%2C15_1301_6%2C6_11788_7%2C7_12830_7%2C11_8369_6%2C0_14394_7%2C0_7197_7%2C0_4370_7%2C0_3410_7%2C0_5548_7%2C0_13190_7%2C0_2592_7%2C1_10188_7%2C15_12642_5%2C23_999_7%2C14_13969_5%2C21_1146_7%2C24_2871_7%2C25_5880_7%2C24_7620_7%2C23_12120_7%2C25_10087_7%2C27_39182_6%2C27_16394_6%2C27_29913_6%2C27_3445_6%2C23_7512_6%2C23_6406_6%2C21_467_6%2C20_5796_6%2C18_10102_6%2C21_14004_6%2C18_10914_6%2C17_14392_6%2C19_6215_6%2C18_6132_6%2C19_3905_6%2C16_5477_5%2C23_15975_6%2C25_5524_6%2C24_15918_6%2C14_13587_5%2C6_3077_5%2C8_15546_5%2C4_12658_5%2C0_10401_5%2C2_4931_5%2CE_4238_5%2C0_5626
_5%2C0_8671_5%2C15_15510_5%2C17_6158_5%2C14_9246_3%2C4_4678_3%2C15_10074_3%2CE_78755_3%2C2_11100_3%2C2_16013_3%2C5_15196_3%2C5_15068_3%2C5_13630_3%2C6_9152_3%2C6_4525_3%2C8_3601_3%2C6_13608_3%2C1_1214_3%2C0_10532_3%2C1_11176_3%2C1_2609_3%2C1_15233_3%2C7_15373_3%2C3_4025_3%2C2_2161_3%2C6_10981_3%2C21_14291_6%2C22_6179_6%2C21_4240_6%2C23_8855_8%2C21_9324_8%2C21_4818_8%2C20_9090_8%2C19_5566_8%2C19_15763_8%2C17_12431_8%2C21_4818_5%2C27_21474_8%2C27_48674_8%2C18_12672_5%2C19_15763_5%2C4_1742_4%2C16_11751_5%2C14_14510_4%2C20_9090_5%2C7_852_4%2C6_2181_4%2C1_1173_4%2CE_18853_4%2C17_12431_5%2C15_15208_5%2C14_4939_5%2C6_7377_5%2CE_12258_5%2C7_14700_4%2C16_11751_4%2C18_12672_4%2C21_4818_4%2C21_4818_6%2C18_12672_6%2C16_11751_7%2C18_12672_8&supernodes=%5B%5B%22Output+%5C%22Basketball%5C%22%22%2C%2227_48674_8%22%2C%2227_21474_8%22%5D%2C%5B%22%5C%22play%5C%22%22%2C%2214_4939_5%22%2C%226_7377_5%22%2C%2215_15208_5%22%5D%2C%5B%22sports%22%2C%2214_14510_4%22%2C%227_14700_4%22%5D%2C%5B%22basketball-related+content%22%2C%2218_12672_8%22%2C%2218_12672_5%22%2C%2216_11751_5%22%2C%2218_12672_4%22%2C%2216_11751_7%22%2C%2216_11751_4%22%2C%2218_12672_6%22%5D%2C%5B%22activates+on+%2F+predicts+basketball%22%2C%2221_4818_6%22%2C%2221_4818_4%22%2C%2220_9090_8%22%2C%2221_4818_5%22%2C%2221_4818_8%22%5D%2C%5B%22activates+on+%2F+predicts+sports%22%2C%2220_9090_5%22%2C%2219_15763_5%22%2C%2221_9324_8%22%2C%2217_12431_8%22%2C%2219_15763_8%22%2C%2219_5566_8%22%2C%2223_8855_8%22%2C%2217_12431_5%22%5D%2C%5B%22basketball%22%2C%221_1173_4%22%2C%224_1742_4%22%2C%226_2181_4%22%2C%227_852_4%22%5D%5D&clickedId=4_1742_4)\n",
    "- [`La saison après le printemps s'apelle l'` → `été`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-saison&clerps=%5B%5B%222505999%22%2C%22%27+in+French%22%5D%2C%5B%222409342%22%2C%22%27+%2F+%28+in+French%22%5D%2C%5B%222213022%22%2C%22%27+in+French%22%5D%2C%5B%222115345%22%2C%22%27+in+French%22%5D%2C%5B%222000352%22%2C%22%27+in+French%22%5D%2C%5B%221908645%22%2C%22%27+in+French%22%5D%2C%5B%221801368%22%2C%22%27+in+French%22%5D%2C%5B%222210566%22%2C%22French%22%5D%2C%5B%222302592%22%2C%22French%22%5D%2C%5B%222508028%22%2C%22newline+%2F+%5C%22+in+French%22%5D%2C%5B%222009338%22%2C%22season+%28upweight+summer%29%22%5D%2C%5B%222513952%22%2C%22%27+in+French%22%5D%2C%5B%222410347%22%2C%22%27+in+French%22%5D%2C%5B%222406795%22%2C%22%27+in+French%22%5D%2C%5B%222302467%22%2C%22%27+in+French%22%5D%2C%5B%222403772%22%2C%22French%22%5D%2C%5B%22110212%22%2C%22l%27%22%5D%2C%5B%221206031%22%2C%22an%22%5D%2C%5B%221505966%22%2C%22romance+language+articles%22%5D%2C%5B%221614166%22%2C%22an%22%5D%2C%5B%221404649%22%2C%22dates+%2F+places%22%5D%2C%5B%221509835%22%2C%22summer+%2F+winter%22%5D%2C%5B%221709957%22%2C%22seasons%22%5D%2C%5B%221806471%22%2C%22dates+%2F+issues%22%5D%2C%5B%221706188%22%2C%22an%22%5D%2C%5B%221701777%22%2C%22winter%22%5D%2C%5B%221503399%22%2C%22spring%22%5D%2C%5B%221400457%22%2C%22winter%22%5D%2C%5B%221305925%22%2C%22winter%22%5D%2C%5B%221213955%22%2C%22fall%2Fwinter%2Fspring%22%5D%2C%5B%221115997%22%2C%22winter%2Fspring%22%5D%2C%5B%221013936%22%2C%22seasons%22%5D%2C%5B%22713704%22%2C%22seasons%22%5D%2C%5B%22810683%22%2C%22months%22%5D%2C%5B%22615219%22%2C%22seasons%22%5D%2C%5B%22606253%22%2C%22seasons%2Fmonths%22%5D%2C%5B%22301450%22%2C%22spring%2Fautumn%22%5D%2C%5B%22404241%22%2C%22summer%2Fwinter%22%5D%2C%5B%22215502%22%2C%22parts+of+a+year%22%5D%2C%5B%22211865%22%2C%22August%22%5D%2C%5B%22411540%22%2C%22seasons%22%5D%2C%5B%2224_2409342_11%22%2C%22French%22%5D%2C%5B%2225_2505999_11%22%2C%22French%22%5D%2C%5B%2219_1908645_11%22%2C%22apostrophe+%28
French%29%22%5D%2C%5B%2222_2213022_11%22%2C%22apostrophe+%28French%29%22%5D%2C%5B%2223_2302467_11%22%2C%22apostrophe+%28French%29%22%5D%2C%5B%2221_2115345_11%22%2C%22French+function+words%2C+apostrophes%22%5D%2C%5B%2220_2000352_11%22%2C%22apostrophe+%28French%29%22%5D%5D&pinnedIds=27_15331_10%2C20_9338_10%2C24_3772_10%2C25_5999_10%2C24_9342_10%2C25_8028_10%2C24_6795_10%2C23_2467_10%2C25_13952_10%2C21_15345_10%2C24_10347_10%2C22_10566_10%2C22_13022_10%2C23_2592_10%2C20_352_10%2C19_8645_10%2C18_1368_10%2C18_6471_10%2C17_9957_10%2C15_9835_5%2C17_1777_5%2C15_3399_5%2C4_4241_5%2C14_457_5%2C13_5925_5%2C12_13955_5%2C11_15997_5%2C10_13936_5%2C4_11540_5%2C8_10683_5%2C7_13704_5%2C6_6253_5%2C6_15219_5%2C2_11865_5%2C3_1450_5%2CE_82115_5%2CE_235303_10%2C2_15502_5%2C17_6188_10%2C15_5966_10%2C16_14166_10%2C12_6031_10%2C1_10212_10%2CE_533_9%2C27_15331_11%2C20_9338_11%2CE_33754_2%2C24_3772_11%2C24_9342_11%2C24_6795_11%2C25_5999_11%2C21_15345_11%2C20_352_11%2C23_2467_11%2C22_13022_11%2C19_8645_11%2C18_6471_11%2C17_9957_11%2C4_11540_6%2C4_11540_2%2C20_1454_11&supernodes=%5B%5B%22%27+in+French%22%2C%2225_5999_10%22%2C%2218_1368_10%22%2C%2219_8645_10%22%2C%2220_352_10%22%2C%2221_15345_10%22%2C%2222_13022_10%22%2C%2223_2467_10%22%2C%2224_6795_10%22%2C%2224_10347_10%22%2C%2225_13952_10%22%5D%2C%5B%22French%22%2C%2224_3772_10%22%2C%2223_2592_10%22%2C%2222_10566_10%22%5D%2C%5B%22newline+%2F+%5C%22+in+French%22%2C%2225_8028_10%22%2C%2224_9342_10%22%5D%2C%5B%22an%22%2C%2217_6188_10%22%2C%2216_14166_10%22%2C%2212_6031_10%22%5D%2C%5B%22romance+language+articles%22%2C%2215_5966_10%22%2C%221_10212_10%22%5D%2C%5B%22apostrophe+%28French%29%22%2C%2223_2467_11%22%2C%2222_13022_11%22%2C%2219_8645_11%22%2C%2220_352_11%22%2C%2221_15345_11%22%5D%2C%5B%22seasons%22%2C%2217_9957_11%22%2C%2218_6471_11%22%2C%2220_9338_11%22%5D%2C%5B%22words+relating+to+specific+times+of+year%22%2C%224_11540_6%22%2C%224_11540_2%22%5D%2C%5B%22French%22%2C%2220_1454_11%22%2C%2225_5999_11%22%2C%2224_3772_11%22%2C%2224_9342_11%22
%2C%2224_6795_11%22%5D%2C%5B%22seasons%22%2C%2211_15997_5%22%2C%2215_9835_5%22%2C%2217_1777_5%22%2C%2215_3399_5%22%2C%2214_457_5%22%2C%2212_13955_5%22%2C%2213_5925_5%22%2C%226_15219_5%22%2C%224_11540_5%22%2C%2210_13936_5%22%2C%227_13704_5%22%2C%226_6253_5%22%2C%228_10683_5%22%2C%222_11865_5%22%2C%222_15502_5%22%2C%223_1450_5%22%2C%224_4241_5%22%5D%5D&clickedId=15_5966_10&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`La estación después de la primavera se llama el` → `verano`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-verano&clerps=%5B%5B%22115093%22%2C%22Spanish%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222509334%22%2C%22Spanish%22%5D%2C%5B%222413490%22%2C%22Spanish%22%5D%2C%5B%222403018%22%2C%22Spanish%22%5D%2C%5B%222407980%22%2C%22Spanish+articles%22%5D%2C%5B%222511463%22%2C%22Spanish%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%222000341%22%2C%22Spanish%22%5D%2C%5B%222009338%22%2C%22season+%28upweight+summer%29%22%5D%2C%5B%221709957%22%2C%22season%22%5D%2C%5B%22404241%22%2C%22season%22%5D%2C%5B%22301450%22%2C%22%28time+of%29+year%22%5D%2C%5B%22211865%22%2C%22August%22%5D%2C%5B%221512458%22%2C%22period+%2F+time%22%5D%2C%5B%22215502%22%2C%22months+%2F+quarters+%2F+sessions%22%5D%2C%5B%221701777%22%2C%22winter%22%5D%2C%5B%221806471%22%2C%22months%2Fseasons+%28journals%29%22%5D%2C%5B%22210302%22%2C%22weather%22%5D%5D&pinnedIds=D25_5604_7%2C24_11415_7%2C21_5066_7%2C20_7544_7%2C17_4855_5%2C16_5918_5%2C15_5304_5%2C14_1031_5%2C13_7451_5%2C24_763_7%2C22_12304_7%2C17_4886_5%2C24_9503_7%2C18_8152_5%2C25_10348_7%2C27_1970_7%2C24_396_7%2C4_3134_5%2CE_5897_5%2C23_5764_7%2C22_1913_7%2C19_13898_7%2C16_10380_7%2CE_34643_4%2C0_1572_4%2C1_3698_4%2C1_5935_4%2C4_1222_4%2C15_11422_4%2C16_5419_4%2C4_3441_5%2C4_14794_5%2C14_13599_7%2C23_14585_7%2C23_3981_7%2C20_12133_7%2C23_4927_7%2C22_12727_7%2C22_8530_7%2C23_8141_6%2C24_10734_6%2C18_14893_7%2C24_7668_7%2C23_8141_7%2C24_5668_7%2C25_5842_7%2C25_12858_7%2C23_6380_7%2C24_5451_7%2CE_1995_7%2C0_9260_7%2C1_6198_7%2C2_15673_7%2C6_8381_7%2C5_7433_7%2C6_15662_7%2C12_10924_7%2C18_3321_7%2C18_14215_7%2C18_15589_7%2C27_21474_8%2CE_7888_7%2CE_18853_4%2C27_48674_8%2C27_98463_8%2C23_8855_8%2C21_9324_8%2C21_4818_8%2C20_9090_8%2C19_5566_8%2C19_15763_8%2C17_12431_8%2C4_1742_4%2C6_2181_4%2C7_1844_4%2C7_852_4%2C16_11751_4%2C16_11751
_5%2C18_12672_6%2C18_12672_4%2C18_12672_5%2C16_11751_7%2C18_12672_8%2C14_14510_7%2C15_14376_7%2C16_824_8%2C15_10776_7%2C0_16262_7%2C1_5055_7%2C2_46_7%2C7_14700_7%2C4_2977_7%2C16_87_7%2C27_7773_8%2C27_13210_8%2CE_10498_5%2CE_13388_2%2C23_8683_8%2C21_10062_8%2C17_12530_5%2C23_8488_8%2C15_5617_5%2C15_5756_5%2C18_4563_5%2C19_1435_5%2C20_10977_5%2C19_5186_5%2C20_1807_5%2C14_11360_5%2C6_4362_5%2C13_6699_5%2C16_9498_5%2C16_1698_5%2C17_6043_5%2C16_9788_5%2C7_8760_5%2C8_295_5%2C7_1014_5%2C10_10314_5%2C7_9945_5%2C8_5268_5%2C8_6716_5%2C2_4298_5%2C2_8756_5%2C4_2796_5%2C4_11015_3%2C27_34250_9%2C24_13490_9%2C20_9338_9%2C24_3018_9%2C25_9334_9%2C25_7264_9%2C23_4905_9%2C24_15008_9%2C24_7980_9%2C23_7997_9%2C22_15500_9%2C21_7256_9%2C20_341_9%2C17_9957_9%2C18_6471_9%2C15_9835_6%2C14_457_6%2C4_4241_6%2C4_11540_6%2C13_5925_6%2C13_10830_6%2C6_6253_6%2C12_13955_6%2C8_10683_6%2CE_46443_6%2C24_4836_9%2C24_2024_9%2C22_11854_9%2C25_2591_9%2C21_11151_9%2C22_2944_9%2C21_16149_9%2C23_401_9%2C21_3462_9%2C25_8956_9%2C2_11865_6%2C2_10302_6%2C2_5047_6%2C4_166_6%2C6_3193_6%2C2_11940_6%2C1_15055_6%2C5_14249_6%2C8_8830_6%2C6_15219_6%2C21_11772_9%2CE_822_9%2C18_5558_9%2C19_6064_9%2C20_4729_9%2C19_9709_9%2C17_14627_9%2C16_15885_9%2C15_5966_9%2C15_3343_9%2C13_12977_9%2C14_4191_9%2C0_5792_9%2C0_3452_9%2C0_14151_9%2C0_14056_9%2C0_15250_9%2C1_12376_9%2C5_14571_9%2C2_3419_9%2C2_12121_9%2C4_9309_9%2C4_3405_9%2C8_13032_9%2C7_13704_9%2C20_1415_9%2C14_5480_6&supernodes=%5B%5B%22romance+language+articles%22%2C%220_14056_9%22%2C%222_12121_9%22%2C%222_3419_9%22%2C%225_14571_9%22%2C%220_14151_9%22%2C%220_3452_9%22%2C%2217_14627_9%22%2C%224_9309_9%22%2C%224_3405_9%22%2C%2215_3343_9%22%2C%220_5792_9%22%2C%220_15250_9%22%2C%221_12376_9%22%2C%2215_5966_9%22%2C%2216_15885_9%22%2C%2218_5558_9%22%2C%2213_12977_9%22%2C%2219_9709_9%22%2C%2220_4729_9%22%2C%2214_4191_9%22%5D%2C%5B%22Spanish+text%22%2C%2225_2591_9%22%2C%2222_11854_9%22%2C%2223_4905_9%22%2C%2224_7980_9%22%2C%2224_15008_9%22%2C%2225_7264_9%22%2C%2224_2024_9%22%2C%2
222_15500_9%22%2C%2221_7256_9%22%2C%2220_341_9%22%2C%2225_9334_9%22%2C%2224_13490_9%22%2C%2224_3018_9%22%2C%2219_6064_9%22%5D%2C%5B%22weather%22%2C%228_8830_6%22%2C%222_5047_6%22%2C%222_10302_6%22%2C%224_166_6%22%2C%226_3193_6%22%5D%2C%5B%22months%22%2C%228_10683_6%22%2C%226_6253_6%22%2C%222_11865_6%22%2C%225_14249_6%22%2C%228_13032_9%22%2C%2221_11772_9%22%5D%2C%5B%22activates+before+seasons%22%2C%2218_6471_9%22%2C%2221_11151_9%22%2C%2223_401_9%22%5D%2C%5B%22activates+before+seasons+%2F+downweights+summer%22%2C%2224_4836_9%22%2C%2222_2944_9%22%2C%2225_8956_9%22%5D%2C%5B%22predict+summer%22%2C%2221_16149_9%22%2C%2220_9338_9%22%2C%2221_3462_9%22%2C%224_4241_6%22%2C%2220_1415_9%22%5D%2C%5B%22seasons%22%2C%2215_9835_6%22%2C%2214_5480_6%22%2C%2223_7997_9%22%2C%224_11540_6%22%2C%2213_10830_6%22%2C%227_13704_9%22%2C%2212_13955_6%22%2C%2217_9957_9%22%2C%2213_5925_6%22%2C%2214_457_6%22%2C%226_15219_6%22%2C%222_11940_6%22%2C%221_15055_6%22%5D%5D&clickedId=22_2944_9)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [`The girl that the teacher sees` → `is`](<https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-girl-is&clerps=%5B%5D&clickedId=25_11727_6&pinnedIds=27_603_6%2C25_11727_6%2C24_13814_6%2C17_7377_6%2C15_13979_6%2C16_3689_6%2C14_3138_6%2C14_4181_6%2C12_5537_6%2C4_7888_6%2C2_9710_6%2C16_5466_6%2C1_3604_6%2C6_11265_6%2C3_6616_6%2C0_13503_3%2C2_14310_6%2CE_674_3%2CE_4602_2%2C21_3446_6%2C18_15388_6%2C17_14649_6%2C13_3584_6%2C15_4906_6%2CE_17733_6&supernodes=%5B%5B%22see%2Fseen%22%2C%2216_3689_6%22%2C%2217_14649_6%22%2C%223_6616_6%22%2C%226_11265_6%22%5D%2C%5B%22ends%20of%20subject%20NPs%20(with%20reduced%20relative%20clause)%22%2C%2217_7377_6%22%2C%2215_13979_6%22%2C%2216_5466_6%22%5D%2C%5B%22ends%20of%20relative%20clauses%22%2C%2221_3446_6%22%2C%2215_4906_6%22%2C%2214_3138_6%22%2C%2212_5537_6%22%2C%224_7888_6%22%5D%2C%5B%22know%20(in%20relative%20clause)%22%2C%2218_15388_6%22%2C%2213_3584_6%22%5D%2C%5B%22verbs%22%2C%222_14310_6%22%2C%221_3604_6%22%2C%222_9710_6%22%2C%2214_4181_6%22%5D%2C%5B%22ends%20of%20singular%20NPs%22%2C%2224_13814_6%22%2C%2225_11727_6%22%5D%5D>)\n",
    "- [`The girls that the teacher sees` → `are`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-girls-are&pinnedIds=27_708_6%2C25_9974_6%2C22_11517_6%2CE_8216_2%2CE_674_3%2CE_651_1%2C19_1880_6%2C15_13979_6%2C17_7377_6%2C18_703_6%2C16_3689_6%2C15_4906_6%2C15_233_6%2CE_17733_6%2C3_6616_6%2C6_11265_6%2C5_1034_6%2C4_2671_6%2C3_6243_4%2C3_9864_3%2C0_13503_3&clickedId=3_9864_3&supernodes=%5B%5B%22see%2Fsaw%22%2C%2215_233_6%22%2C%226_11265_6%22%2C%223_6616_6%22%5D%2C%5B%22ends+of+noun+phrases+%28predict+a+verb%29%22%2C%2219_1880_6%22%2C%2217_7377_6%22%5D%2C%5B%22verbs+ending+relative+clauses%22%2C%224_2671_6%22%2C%2215_4906_6%22%2C%2215_13979_6%22%2C%2218_703_6%22%5D%2C%5B%22that%22%2C%220_13503_3%22%2C%223_9864_3%22%5D%2C%5B%22say+are%22%2C%2225_9974_6%22%2C%2222_11517_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99&clerps=%5B%5B%2225_2509974_6%22%2C%22say+are%22%5D%2C%5B%225_501034_6%22%2C%22transitive+verbs+with+objects+preceding+htem%22%5D%2C%5B%2216_1603689_6%22%2C%22ends+of+interrogative+clauses%22%5D%2C%5B%223_306243_4%22%2C%22he%22%5D%5D)\n",
    "- [`Fait: Michael Jordan joue au` → `basket`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-basket&clickedId=17_10566_2&clerps=%5B%5B%222308855%22%2C%22basketball%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222104818%22%2C%22basketball%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%22401305%22%2C%22game%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221812672%22%2C%22basketball%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22401742%22%2C%22basketball%22%5D%2C%5B%22101173%22%2C%22basketball%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%221710566%22%2C%22French%22%5D%5D&pinnedIds=27_12220_7%2CE_18853_5%2C21_4818_7%2C21_9324_7%2C23_3604_7%2C25_14882_7%2C24_15306_7%2C23_15317_7%2C20_9090_7%2C24_3329_7%2C19_15763_7%2C18_12672_7%2C17_12431_7%2C17_5253_7%2C15_15208_7%2C14_4939_7%2C6_7377_7%2CE_78224_6%2C4_1305_7%2C3_305_7%2C24_2086_7%2C24_3772_7%2C21_16354_7%2C20_1454_7%2C23_2592_7%2C22_10566_7%2C23_2554_7%2C17_10566_6%2C0_4076_6%2C14_14575_6%2C7_11689_6%2C4_1742_5%2C1_1173_5%2CE_7939_4&supernodes=%5B%5B%22game%2Fplay%22%2C%223_305_7%22%2C%224_1305_7%22%2C%226_7377_7%22%2C%2215_15208_7%22%2C%2214_4939_7%22%5D%2C%5B%22French%22%2C%220_4076_6%22%2C%227_11689_6%22%2C%2214_14575_6%22%2C%2217_10566_6%22%5D%2C%5B%22basketball%22%2C%2221_4818_7%22%2C%2218_12672_7%22%5D%2C%5B%22sports%22%2C%2217_12431_7%22%2C%2217_5253_7%22%2C%2221_9324_7%22%2C%2220_9090_7%22%2C%2219_15763_7%22%2C%2223_3604_7%22%2C%2223_15317_7%22%5D%2C%5B%22basketball%22%2C%224_1742_5%22%2C%221_1173_5%22%5D%2C%5B%22French%22%2C%2224_3329_7%22%2C%2221_16354_7%22%2C%2220_1454_7%22%2C%2223_2592_7%22%2C%2223_2554_7%22%2C%2224_2086_7%22%2C%2224_15306_7%22%2C%2225_14882_7%22%2C%2224_37
72_7%22%2C%2222_10566_7%22%5D%5D)\n",
    "- [`Hecho: Michael Jordan juega al` → `baloncesto`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-michael-jordan-es&clerps=%5B%5B%222308855%22%2C%22basketball%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222509334%22%2C%22Spanish%22%5D%2C%5B%222413490%22%2C%22Spanish%22%5D%2C%5B%222403018%22%2C%22Spanish%22%5D%2C%5B%222407980%22%2C%22Spanish+articles%22%5D%2C%5B%222511463%22%2C%22Spanish%22%5D%2C%5B%222104818%22%2C%22basketball%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%22401305%22%2C%22game%22%5D%2C%5B%22109339%22%2C%22a%2Fal+in+Spanish%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221812672%22%2C%22basketball%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22401742%22%2C%22basketball%22%5D%2C%5B%22101173%22%2C%22basketball%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%222000341%22%2C%22Spanish%22%5D%2C%5B%220_411_4%22%2C%22famous+people+%2F+named+entities%22%5D%5D&pinnedIds=27_143831_6%2C25_13416_6%2C24_3018_6%2C25_9334_6%2C24_13490_6%2C25_2222_6%2C24_7980_6%2C25_11463_6%2C21_9324_6%2C21_4818_6%2C23_8855_6%2C20_9090_6%2C17_12431_6%2C15_15208_6%2C14_4939_6%2C4_1305_6%2C1_9339_6%2CE_113501_5%2C0_13978_5%2C0_15822_5%2CE_717_6%2C19_15763_6%2C18_12672_6%2C4_1742_4%2C14_14510_4%2C1_1173_4%2CE_18853_4%2CE_7939_3%2C0_411_4%2C20_341_6&supernodes=%5B%5B%22basketball%22%2C%2220_9090_6%22%2C%2218_12672_6%22%2C%2221_4818_6%22%2C%2223_8855_6%22%5D%2C%5B%22play%22%2C%224_1305_6%22%2C%2214_4939_6%22%2C%2215_15208_6%22%5D%2C%5B%22basketball%22%2C%224_1742_4%22%2C%221_1173_4%22%5D%2C%5B%22romance+language%22%2C%221_9339_6%22%2C%220_15822_5%22%2C%220_13978_5%22%5D%2C%5B%22Spanish%22%2C%2225_9334_6%22%2C%2225_13416_6%22%2C%2224_13490_6%22%2C%2224_7
980_6%22%2C%2224_3018_6%22%2C%2225_2222_6%22%2C%2225_11463_6%22%2C%2220_341_6%22%5D%2C%5B%22sports%22%2C%2217_12431_6%22%2C%2219_15763_6%22%2C%2221_9324_6%22%2C%2214_14510_4%22%5D%5D&clickedId=20_341_6&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "- [`2 + 1 =` → `3`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-addition2&clerps=%5B%5B%222510077%22%2C%22three%22%5D%2C%5B%222411752%22%2C%22three%22%5D%2C%5B%222411880%22%2C%22three%22%5D%2C%5B%222508798%22%2C%221%2F2%2F3%22%5D%2C%5B%222414176%22%2C%221%2F2%2F3%22%5D%2C%5B%222309832%22%2C%221%2F2%2F3%22%5D%2C%5B%222413541%22%2C%22numbers%22%5D%2C%5B%222108824%22%2C%22numbers%22%5D%2C%5B%222004320%22%2C%22numbers%22%5D%2C%5B%222308901%22%2C%22numbers%22%5D%2C%5B%222004597%22%2C%22three%22%5D%2C%5B%221813696%22%2C%22two%22%5D%2C%5B%221605482%22%2C%226%22%5D%2C%5B%221505881%22%2C%22numbers%22%5D%2C%5B%221411498%22%2C%22numbers%20in%20equations%22%5D%2C%5B%221304198%22%2C%22numbers%20in%20equations%22%5D%2C%5B%223772%22%2C%22numbers%20in%20dates%22%5D%2C%5B%221405044%22%2C%22numbers%22%5D%2C%5B%221310982%22%2C%22numbers%22%5D%2C%5B%22509048%22%2C%22numbers%20in%20equations%22%5D%2C%5B%22409653%22%2C%22%3D%22%5D%2C%5B%22407037%22%2C%22%3D%22%5D%5D&pinnedIds=27_235304_6%2C25_10077_6%2C24_14176_6%2C24_11880_6%2C25_8798_6%2C24_13541_6%2C23_9832_6%2C24_11752_6%2C24_11880_5%2C21_8824_6%2C20_4320_6%2C20_4597_6%2C23_8901_6%2C18_13696_1%2C16_5482_1%2C15_5881_6%2C14_5044_6%2C14_11498_6%2C13_4198_6%2C0_3772_6%2CE_235248_6%2CE_589_5%2CE_963_2%2C13_10982_6%2C5_9048_5%2C4_9653_5%2C4_7037_5&clickedId=18_13696_1&supernodes=%5B%5B%22%3D%22%2C%224_9653_5%22%2C%224_7037_5%22%5D%2C%5B%22three%22%2C%2224_11880_6%22%2C%2224_11752_6%22%2C%2225_10077_6%22%2C%2220_4597_6%22%2C%2224_11880_5%22%5D%2C%5B%221%2F2%2F3%22%2C%2225_8798_6%22%2C%2224_14176_6%22%2C%2223_9832_6%22%5D%2C%5B%22numbers%20in%20equations%22%2C%2213_4198_6%22%2C%2214_11498_6%22%2C%225_9048_5%22%5D%2C%5B%22numbers%22%2C%220_3772_6%22%2C%2213_10982_6%22%2C%2214_5044_6%22%2C%2215_5881_6%22%2C%2220_4320_6%22%2C%2221_8824_6%22%2C%2223_8901_6%22%2C%2224_13541_6%22%5D%5D)\n",
    "- [`3 + 5 =` → `8`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-addition&clerps=%5B%5D&clickedId=15_15323_6&pinnedIds=27_235321_6%2C24_14176_6%2C25_14682_6%2C24_2998_6%2C23_3553_6%2C23_7436_6%2C21_6231_6%2C20_10440_6%2C18_14883_6%2C19_1887_6%2C22_1234_6%2C20_14337_6%2C18_11535_6%2C17_15809_6%2C15_15323_6%2C15_15686_6%2C14_9687_6%2C13_1364_6%2C12_6768_6%2C8_11121_5%2C4_9539_5%2C4_7037_5%2C2_3014_5%2CE_589_5%2CE_963_2%2CE_235304_4%2CE_235308_1&supernodes=%5B%5B%22%3D%22%2C%224_7037_5%22%2C%224_9539_5%22%2C%222_3014_5%22%2C%228_11121_5%22%5D%2C%5B%22eight%22%2C%2222_1234_6%22%2C%2223_7436_6%22%2C%2224_2998_6%22%2C%2225_14682_6%22%5D%2C%5B%22space%20after%20%3D%20%22%2C%2212_6768_6%22%2C%2213_1364_6%22%2C%2214_9687_6%22%5D%2C%5B%22numbers%22%2C%2215_15686_6%22%2C%2217_15809_6%22%2C%2215_15323_6%22%2C%2218_11535_6%22%2C%2219_1887_6%22%2C%2218_14883_6%22%2C%2220_10440_6%22%5D%2C%5B%22other%20specific-ish%20numbers%22%2C%2223_3553_6%22%2C%2224_14176_6%22%2C%2221_6231_6%22%2C%2220_14337_6%22%5D%5D) \n",
    "- [`Mexico:Spanish :: US:` → `English`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-english&pinnedIds=17_6498_6%2C11_105416_6%2C9_58293_3%2C10_2257_3%2C9_49585_3%2C5_6113_3%2CE_15506_3%2C10_813_6%2CE_2326_5%2C2_37574_5%2C4_75078_5%2C13_10254_6%2C11_97996_6%2C12_57025_6%2C12_20491_6%2C11_15631_6%2C0_74386_5%2C1_66436_5%2C27_12023_6%2C27_4645_6%2C23_2040_6%2C21_12621_6%2C19_13366_6%2C17_14947_6%2C15_6419_3%2C15_6004_5%2C15_15954_6%2C14_2792_5%2C4_10245_5%2C12_2799_5%2C2_15206_5%2CE_2379_5%2C0_3071_5%2C8_150_3%2C6_1509_3%2C6_5150_3%2C4_12658_3%2C2_13002_3%2C4_3761_3%2C2_15890_3%2C1_7569_3%2CE_51590_3%2C0_1701_1%2CE_40788_1%2C0_1701_3%2C9_11486_3%2C9_16135_3%2C14_1107_3%2C16_10591_6&supernodes=%5B%5B%22language%22%2C%225_6113_3%22%2C%229_58293_3%22%2C%229_49585_3%22%2C%2210_2257_3%22%5D%2C%5B%22languages+%28upweight+English%29%22%2C%2211_97996_6%22%2C%2211_105416_6%22%5D%2C%5B%22US%22%2C%221_66436_5%22%2C%220_74386_5%22%2C%2210_813_6%22%2C%222_37574_5%22%2C%224_75078_5%22%5D%2C%5B%22Output+%5C%22English%5C%22%22%2C%2227_12023_6%22%2C%2227_4645_6%22%5D%2C%5B%22US%22%2C%2212_2799_5%22%2C%220_3071_5%22%2C%222_15206_5%22%2C%224_10245_5%22%2C%2214_2792_5%22%2C%2215_6004_5%22%5D%2C%5B%22Spanish%22%2C%220_1701_1%22%2C%220_1701_3%22%2C%221_7569_3%22%5D%2C%5B%22languages+%2F+demonyms%22%2C%229_16135_3%22%2C%222_15890_3%22%2C%224_3761_3%22%2C%228_150_3%22%2C%2215_6419_3%22%2C%2214_1107_3%22%2C%226_5150_3%22%2C%226_1509_3%22%2C%224_12658_3%22%2C%222_13002_3%22%2C%229_11486_3%22%5D%2C%5B%22%28English%29+language%22%2C%2219_13366_6%22%2C%2216_10591_6%22%2C%2217_14947_6%22%2C%2221_12621_6%22%5D%5D&clickedId=languagedemonyms&pruningThreshold=0.7&densityThreshold=0.99&clerps=%5B%5B%2215_1515954_6%22%2C%22in%22%5D%2C%5B%2223_2302040_6%22%2C%22countries+%2F+Europe%22%5D%2C%5B%2219_1913366_6%22%2C%22language%22%5D%5D)\n",
    "- [`Mexico:peso :: US:` → `dollar`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-dollar&clerps=%5B%5B%222413541%22%2C%22numbers%22%5D%2C%5B%221415770%22%2C%22punctuation+in+parallel+constructions%22%5D%2C%5B%221100145%22%2C%22analogies%2Fpairs%22%5D%2C%5B%22611805%22%2C%22pair+punctuation%22%5D%2C%5B%221506419%22%2C%22language%22%5D%2C%5B%221701%22%2C%22Spanish+names%22%5D%2C%5B%22601509%22%2C%22demonyms%22%5D%2C%5B%221515954%22%2C%22punctuation+before+countries%2Fdemonyms%22%5D%2C%5B%221803709%22%2C%22Canada%22%5D%2C%5B%221606532%22%2C%22US%22%5D%2C%5B%221506004%22%2C%22US%22%5D%2C%5B%221402792%22%2C%22US%22%5D%2C%5B%22215206%22%2C%22US%22%5D%2C%5B%223071%22%2C%22US%22%5D%2C%5B%22410245%22%2C%22US%22%5D%2C%5B%222307512%22%2C%22currency%22%5D%2C%5B%222005796%22%2C%22predict+dollar%22%5D%2C%5B%221810914%22%2C%22currency%22%5D%2C%5B%221409246%22%2C%22currency%22%5D%2C%5B%22712199%22%2C%22currency%22%5D%2C%5B%22515068%22%2C%22currency%22%5D%2C%5B%222513762%22%2C%22predict+D*%22%5D%2C%5B%222400289%22%2C%22predict+D*%22%5D%2C%5B%222105652%22%2C%22predict+dollar%22%5D%2C%5B%221810102%22%2C%22exchange+rates%22%5D%2C%5B%221714392%22%2C%22currency%22%5D%2C%5B%221510074%22%2C%22dollar%22%5D%2C%5B%22404678%22%2C%22currency%22%5D%2C%5B%22211100%22%2C%22currency%22%5D%2C%5B%22101214%22%2C%22currency%22%5D%2C%5B%221910746%22%2C%22dollar%22%5D%2C%5B%221906095%22%2C%22money%22%5D%2C%5B%221906215%22%2C%22predict+dollar%22%5D%2C%5B%221904447%22%2C%22US+coins%22%5D%2C%5B%22602102%22%2C%22money%22%5D%2C%5B%22911294%22%2C%22currency%22%5D%2C%5B%22913858%22%2C%22rates+%2F+shares+%2F+costs%22%5D%2C%5B%221710931%22%2C%22units%22%5D%2C%5B%22115093%22%2C%22Spanish%22%5D%2C%5B%22605150%22%2C%22countries%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222413490%22%2C%22Spanish%22%5D%2C%5B%2220_2005796_6%22%2C%22say+dollar%22%5D%2C%5B%2216_1601778_6%22%2C%22dollar%2Fcurrency%22%5D%5D&pinnedIds=27_64091_6%2C27_18289_6%2C14_9246_3%2C7_12199_3%2C5_15068_3%2C20_5796_6%2C18_10914_6
%2C25_13762_6%2C21_5652_6%2C23_7512_6%2C24_289_6%2C18_10102_6%2C17_14392_6%2C4_4678_3%2CE_78755_3%2C2_11100_3%2C19_4447_6%2C19_6215_6%2C19_6095_6%2C19_10746_6%2C15_10074_3%2CE_2379_5%2C0_3071_5%2C2_15206_5%2C4_10245_5%2C4_10245_6%2C14_2792_5%2C15_6004_5%2C16_6532_6%2C21_5652_5%2C18_3709_6%2C18_10102_5%2C6_2102_3%2C9_11294_3%2C9_13858_3%2C16_1778_6%2C17_4027_6%2C17_8064_6&supernodes=%5B%5B%22Output+%5C%22dollar%5C%22%22%2C%2227_64091_6%22%2C%2227_18289_6%22%5D%2C%5B%22predict+D*%22%2C%2224_289_6%22%2C%2225_13762_6%22%5D%2C%5B%22dollar%22%2C%2219_6215_6%22%2C%2219_4447_6%22%2C%2219_10746_6%22%2C%2221_5652_6%22%5D%2C%5B%22currency%22%2C%2214_9246_3%22%2C%227_12199_3%22%2C%225_15068_3%22%2C%224_4678_3%22%2C%222_11100_3%22%2C%229_11294_3%22%2C%229_13858_3%22%2C%2215_10074_3%22%2C%226_2102_3%22%5D%2C%5B%22US%22%2C%2214_2792_5%22%2C%2215_6004_5%22%2C%224_10245_5%22%2C%222_15206_5%22%2C%220_3071_5%22%5D%2C%5B%22US%22%2C%224_10245_6%22%2C%2216_6532_6%22%2C%2218_3709_6%22%5D%2C%5B%22currency+exchange%22%2C%2218_10102_6%22%2C%2218_10102_5%22%5D%2C%5B%22currency%22%2C%2217_4027_6%22%2C%2217_8064_6%22%2C%2216_1778_6%22%2C%2217_14392_6%22%2C%2218_10914_6%22%2C%2219_6095_6%22%2C%2223_7512_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#@title Colab Setup Environment\n",
    "\n",
    "try:\n",
    "    import google.colab\n",
    "    !mkdir -p repository && cd repository && \\\n",
    "     git clone https://github.com/safety-research/circuit-tracer && \\\n",
    "     curl -LsSf https://astral.sh/uv/install.sh | sh && \\\n",
    "     uv pip install -e circuit-tracer/\n",
    "\n",
    "    import sys\n",
    "    from huggingface_hub import notebook_login\n",
    "    sys.path.append('repository/circuit-tracer')\n",
    "    sys.path.append('repository/circuit-tracer/demos')\n",
    "    notebook_login(new_session=False)\n",
    "    IN_COLAB = True\n",
    "except ImportError:\n",
    "    IN_COLAB = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import namedtuple\n",
    "from functools import partial\n",
    "\n",
    "import torch \n",
    "\n",
    "from circuit_tracer.replacement_model import ReplacementModel\n",
    "\n",
    "from circuit_tracer.utils.decode_url_features import decode_url_features\n",
    "from url import display_topk_token_predictions, display_generations_comparison"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9798296645694d769c5872bfd65b22b8",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 26 files:   0%|          | 0/26 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "bde1a4dbe8db4ac38bb1ead5805b4372",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded pretrained model google/gemma-2-2b into HookedTransformer\n"
     ]
    }
   ],
   "source": [
    "model = ReplacementModel.from_pretrained(\"google/gemma-2-2b\", 'gemma', dtype=torch.bfloat16)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "Feature = namedtuple('Feature', ['layer', 'pos', 'feature_idx'])\n",
    "\n",
    "# a display function that needs the model's tokenizer\n",
    "display_topk_token_predictions = partial(display_topk_token_predictions, tokenizer=model.tokenizer)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Spanish / French\n",
    "\n",
    "First: we have two factual recall sentences in Spanish and in French, both of which translate to \"Fact: Michael Jordan plays the sport of\". Can we change the language of the first from French to Spanish? Here is the circuit of the French sentence:\n",
    "[`Fait: Michael Jordan joue au`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-basket&clickedId=17_10566_2&clerps=%5B%5B%222308855%22%2C%22basketball%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222104818%22%2C%22basketball%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%22401305%22%2C%22game%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221812672%22%2C%22basketball%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22401742%22%2C%22basketball%22%5D%2C%5B%22101173%22%2C%22basketball%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%221710566%22%2C%22French%22%5D%5D&pinnedIds=27_12220_7%2CE_18853_5%2C21_4818_7%2C21_9324_7%2C23_3604_7%2C25_14882_7%2C24_15306_7%2C23_15317_7%2C20_9090_7%2C24_3329_7%2C19_15763_7%2C18_12672_7%2C17_12431_7%2C17_5253_7%2C15_15208_7%2C14_4939_7%2C6_7377_7%2CE_78224_6%2C4_1305_7%2C3_305_7%2C24_2086_7%2C24_3772_7%2C21_16354_7%2C20_1454_7%2C23_2592_7%2C22_10566_7%2C23_2554_7%2C17_10566_6%2C0_4076_6%2C14_14575_6%2C7_11689_6%2C4_1742_5%2C1_1173_5%2CE_7939_4&supernodes=%5B%5B%22game%2Fplay%22%2C%223_305_7%22%2C%224_1305_7%22%2C%226_7377_7%22%2C%2215_15208_7%22%2C%2214_4939_7%22%5D%2C%5B%22French%22%2C%220_4076_6%22%2C%227_11689_6%22%2C%2214_14575_6%22%2C%2217_10566_6%22%5D%2C%5B%22basketball%22%2C%2221_4818_7%22%2C%2218_12672_7%22%5D%2C%5B%22sports%22%2C%2217_12431_7%22%2C%2217_5253_7%22%2C%2221_9324_7%22%2C%2220_9090_7%22%2C%2219_15763_7%22%2C%2223_3604_7%22%2C%2223_15317_7%22%5D%2C%5B%22basketball%22%2C%224_1742_5%22%2C%221_1173_5%22%5D%2C%5B%22French%22%2C%2224_3329_7%22%2C%2221_16354_7%22%2C%2220_1454_7%22%2C%2223_2592_7%22%2C%2223_2554_7%22%2C%2224_2086_7%22%2C%2224_15306_7%22%2C%2225_14882_7%22%2C%2224_3772_7%22%2C%22
22_10566_7%22%5D%5D).\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/mj-basketball-fr.png\" width=\"400\" />\n",
    "\n",
    "We'll extract just one French feature, over the final two tokens.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_french = \"Fait: Michael Jordan joue au\"  # The sentence we're intervening on\n",
    "french_feature = Feature(layer=20, pos=slice(6,8), feature_idx=1454)"
   ]
  },
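  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The node ids in the graph URL above appear to follow a `layer_featureidx_pos` layout: for instance, `20_1454_7` lines up with `Feature(layer=20, pos=..., feature_idx=1454)`. As a minimal sketch under that assumption, the snippet below pulls `pinnedIds` out of a URL and converts them to `Feature` tuples. The helpers `parse_node_id` and `pinned_features` are hypothetical names for illustration, not part of `circuit_tracer`, and the URL is a shortened stand-in. Ids whose first field is not an integer (such as `E_18853_5`) are skipped, and the sketch does not try to tell feature nodes apart from other numeric node types.\n",
    "\n",
    "```python\n",
    "from collections import namedtuple\n",
    "from urllib.parse import parse_qs, urlparse\n",
    "\n",
    "Feature = namedtuple('Feature', ['layer', 'pos', 'feature_idx'])\n",
    "\n",
    "\n",
    "def parse_node_id(node_id):\n",
    "    # Assumed layout: '<layer>_<feature_idx>_<pos>'.\n",
    "    # Returns None for anything that does not match, e.g. 'E_18853_5'.\n",
    "    parts = node_id.split('_')\n",
    "    if len(parts) != 3 or not all(p.isdigit() for p in parts):\n",
    "        return None\n",
    "    layer, feature_idx, pos = (int(p) for p in parts)\n",
    "    return Feature(layer=layer, pos=pos, feature_idx=feature_idx)\n",
    "\n",
    "\n",
    "def pinned_features(url):\n",
    "    # parse_qs percent-decodes the query, so '%2C' becomes ','.\n",
    "    query = parse_qs(urlparse(url).query)\n",
    "    ids = query.get('pinnedIds', [''])[0].split(',')\n",
    "    parsed = (parse_node_id(node_id) for node_id in ids)\n",
    "    return [f for f in parsed if f is not None]\n",
    "\n",
    "\n",
    "# Shortened, illustrative URL (not one of the full links in this notebook)\n",
    "example_url = (\n",
    "    'https://www.neuronpedia.org/gemma-2-2b/graph'\n",
    "    '?slug=gemma-basket&pinnedIds=20_1454_7%2CE_18853_5%2C18_12672_7'\n",
    ")\n",
    "features = pinned_features(example_url)\n",
    "print(features)\n",
    "```\n",
    "\n",
    "Parsing ids this way only saves copying numbers by hand; which positions to intervene on (here, a slice over the last two tokens) is still a choice made by looking at the graph."
   ]
  },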
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The attribution graph for the Spanish sentence looks like this:\n",
    "[`Hecho: Michael Jordan juega al` → `baloncesto`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-michael-jordan-es&clerps=%5B%5B%222308855%22%2C%22basketball%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222509334%22%2C%22Spanish%22%5D%2C%5B%222413490%22%2C%22Spanish%22%5D%2C%5B%222403018%22%2C%22Spanish%22%5D%2C%5B%222407980%22%2C%22Spanish+articles%22%5D%2C%5B%222511463%22%2C%22Spanish%22%5D%2C%5B%222104818%22%2C%22basketball%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%22401305%22%2C%22game%22%5D%2C%5B%22109339%22%2C%22a%2Fal+in+Spanish%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221812672%22%2C%22basketball%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22401742%22%2C%22basketball%22%5D%2C%5B%22101173%22%2C%22basketball%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%222000341%22%2C%22Spanish%22%5D%2C%5B%220_411_4%22%2C%22famous+people+%2F+named+entities%22%5D%5D&pinnedIds=27_143831_6%2C25_13416_6%2C24_3018_6%2C25_9334_6%2C24_13490_6%2C25_2222_6%2C24_7980_6%2C25_11463_6%2C21_9324_6%2C21_4818_6%2C23_8855_6%2C20_9090_6%2C17_12431_6%2C15_15208_6%2C14_4939_6%2C4_1305_6%2C1_9339_6%2CE_113501_5%2C0_13978_5%2C0_15822_5%2CE_717_6%2C19_15763_6%2C18_12672_6%2C4_1742_4%2C14_14510_4%2C1_1173_4%2CE_18853_4%2CE_7939_3%2C0_411_4%2C20_341_6&supernodes=%5B%5B%22basketball%22%2C%2220_9090_6%22%2C%2218_12672_6%22%2C%2221_4818_6%22%2C%2223_8855_6%22%5D%2C%5B%22play%22%2C%224_1305_6%22%2C%2214_4939_6%22%2C%2215_15208_6%22%5D%2C%5B%22basketball%22%2C%224_1742_4%22%2C%221_1173_4%22%5D%2C%5B%22romance+language%22%2C%221_9339_6%22%2C%220_15822_5%22%2C%220_13978_5%22%5D%2C%5B%22Spanish%22%2C%2225_9334_6%22%2C%2225_13416_6%22%2C%2224_13490_6%22%2C%2224_798
0_6%22%2C%2224_3018_6%22%2C%2225_2222_6%22%2C%2225_11463_6%22%2C%2220_341_6%22%5D%2C%5B%22sports%22%2C%2217_12431_6%22%2C%2219_15763_6%22%2C%2221_9324_6%22%2C%2214_14510_4%22%5D%5D&clickedId=20_341_6&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/mj-basketball-es.png\" width=\"400\" />\n",
    "\n",
    "We'll again extract a Spanish feature. Ideally, we'll be able to change the language of the next token to Spanish by turning the French feature off and the Spanish feature on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_spanish = \"Hecho: Michael Jordan juega al\"  # The sentence we're getting the spanish feature from\n",
    "spanish_feature = Feature(layer=20, pos=slice(6,8), feature_idx=341)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# turn the french feature off, and the spanish feature on to its value in the spanish sentence\n",
    "_, spanish_activations = model.get_activations(s_spanish)\n",
    "interventions = [(*french_feature, 0), (*spanish_feature, 10*spanish_activations[spanish_feature])]\n",
    "new_logits, _ = model.feature_intervention(s_french, interventions)"
   ]
  },
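  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each intervention in the cell above is a 4-tuple built by unpacking a `Feature` and appending a target value. A minimal, model-free sketch of that construction follows; `spanish_value` is a hypothetical scalar stand-in for `spanish_activations[spanish_feature]`, which in the notebook comes from the model.\n",
    "\n",
    "```python\n",
    "from collections import namedtuple\n",
    "\n",
    "Feature = namedtuple('Feature', ['layer', 'pos', 'feature_idx'])\n",
    "\n",
    "french_feature = Feature(layer=20, pos=slice(6, 8), feature_idx=1454)\n",
    "spanish_feature = Feature(layer=20, pos=slice(6, 8), feature_idx=341)\n",
    "\n",
    "# Hypothetical placeholder; the notebook reads this from the model instead\n",
    "spanish_value = 2.5\n",
    "\n",
    "interventions = [\n",
    "    (*french_feature, 0),                    # ablate: set activation to 0\n",
    "    (*spanish_feature, 10 * spanish_value),  # amplify: 10x the reference value\n",
    "]\n",
    "\n",
    "for layer, pos, feature_idx, value in interventions:\n",
    "    print(layer, pos, feature_idx, value)\n",
    "```\n",
    "\n",
    "Because `Feature` is a `namedtuple`, it behaves as a plain tuple, which is what makes the `*` unpacking work; the named fields only make the selection easier to read."
   ]
  },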
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Fait: Michael Jordan joue au</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basket\"> basket</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.566</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">56.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basketball\"> basketball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.111</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 19%;\"></div>\n",
       "                                <span class=\"bar-text\">11.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" golf\"> golf</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.067</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 11%;\"></div>\n",
       "                                <span class=\"bar-text\">6.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" baseball\"> baseball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.041</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 7%;\"></div>\n",
       "                                <span class=\"bar-text\">4.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" football\"> football</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.028</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">2.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" baloncesto\"> baloncesto</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.275</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 48%;\"></div>\n",
       "                                <span class=\"bar-text\">27.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" golf\"> golf</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.188</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 33%;\"></div>\n",
       "                                <span class=\"bar-text\">18.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" fútbol\"> fútbol</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.061</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 10%;\"></div>\n",
       "                                <span class=\"bar-text\">6.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" béisbol\"> béisbol</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.048</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 8%;\"></div>\n",
       "                                <span class=\"bar-text\">4.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basketball\"> basketball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.048</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 8%;\"></div>\n",
       "                                <span class=\"bar-text\">4.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "with torch.inference_mode():\n",
    "    original_logits = model(s_french)\n",
    "\n",
    "display_topk_token_predictions(s_french, original_logits, new_logits)"
   ]
  },
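  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The table above lists probabilities rather than raw logits. As a minimal sketch of how such a ranking can be computed, assuming an ordinary softmax over next-token logits (the internals of `display_topk_token_predictions` are not shown in this notebook), the toy example below uses made-up tokens and logits rather than model output. `softmax` and `topk` are hypothetical helper names.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def softmax(logits):\n",
    "    # Subtract the maximum first for numerical stability\n",
    "    m = max(logits)\n",
    "    exps = [math.exp(x - m) for x in logits]\n",
    "    total = sum(exps)\n",
    "    return [e / total for e in exps]\n",
    "\n",
    "\n",
    "def topk(tokens, logits, k=5):\n",
    "    # Pair each token with its probability and keep the k most likely\n",
    "    probs = softmax(logits)\n",
    "    ranked = sorted(zip(tokens, probs), key=lambda pair: pair[1], reverse=True)\n",
    "    return ranked[:k]\n",
    "\n",
    "\n",
    "# Hypothetical toy vocabulary and logits, not taken from the model\n",
    "tokens = [' basket', ' golf', ' tennis']\n",
    "logits = [2.0, 0.5, -1.0]\n",
    "\n",
    "probs = softmax(logits)\n",
    "top = topk(tokens, logits, k=2)\n",
    "for token, prob in top:\n",
    "    print(repr(token), round(prob, 3))\n",
    "```\n",
    "\n",
    "Comparing the rankings before and after an intervention, as the table does, shows which tokens gained or lost probability mass."
   ]
  },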
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: Basketball to football\n",
    "\n",
    "We can similarly change the output sport from basketball to football! We can look at the graph of a very similar sentence: [`Tom Brady plays the sport of` → `football`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-tom-brady&clerps=%5B%5B%222413541%22%2C%22numbers%22%5D%2C%5B%222308855%22%2C%22sports%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%222303604%22%2C%22sports+%2F+table+tennis+%2F+pool+%22%5D%2C%5B%222209794%22%2C%22pro+sports%22%5D%2C%5B%222515927%22%2C%22football+%28teams%29%22%5D%2C%5B%222311475%22%2C%22high-impact+sports%22%5D%2C%5B%222414466%22%2C%22football+teams+%2F+players%22%5D%2C%5B%222111560%22%2C%22NFL+%2F+football%22%5D%2C%5B%222305036%22%2C%22football%22%5D%2C%5B%222210402%22%2C%22football%22%5D%2C%5B%221605039%22%2C%22football%22%5D%2C%5B%221909053%22%2C%22football%22%5D%2C%5B%22615005%22%2C%22football%22%5D%2C%5B%22106611%22%2C%22football%22%5D%2C%5B%221703%22%2C%22football%22%5D%2C%5B%22116349%22%2C%22football%22%5D%2C%5B%22502146%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%222112274%22%2C%22pro+sports%22%5D%5D&pinnedIds=27_9715_6%2C23_8855_6%2C22_9794_6%2C21_12274_6%2C21_9324_6%2C20_9090_6%2C19_15763_6%2C17_12431_6%2C15_15208_3%2C23_3604_6%2C21_11560_6%2C23_5036_6%2C22_10402_6%2C24_14466_6%2C23_11475_6%2C25_15927_6%2C19_9053_6%2C16_5039_6%2C6_15005_2%2C5_2146_2%2C1_16349_2%2C1_6611_2%2CE_46432_2%2C0_411_2%2CE_9983_1%2C0_1703_2%2C14_4939_3%2CE_12258_3&supernodes=%5B%5B%22pro+sports%22%2C%2221_12274_6%22%2C%2222_9794_6%22%5D%2C%5B%22football%22%2C%2216_5039_6%22%2C%2219_9053_6%22%2C%2221_11560_6%22%2C%2222_10402_6%22%2C%2223_5036_6%22%2C%2224_14466_6%22%2C%2223_11475_6%22%2C%2225_15927_6%22%5D%2C%5B%2
2famous+people+%2F+named+entities%22%2C%220_411_2%22%2C%225_2146_2%22%5D%2C%5B%22sports%22%2C%2217_12431_6%22%2C%2223_8855_6%22%2C%2219_15763_6%22%2C%2221_9324_6%22%2C%2220_9090_6%22%5D%2C%5B%22football%22%2C%220_1703_2%22%2C%221_16349_2%22%2C%226_15005_2%22%2C%221_6611_2%22%5D%2C%5B%22play%22%2C%2214_4939_3%22%2C%2215_15208_3%22%5D%5D&clickedId=16_5039_6).\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/tom-brady.png\" width=\"400\" />\n",
    "\n",
    "Here, what we need to do is turn on a football feature from this graph, while turning off a basketball feature from the prior graph. Can we? First, we'll select a football feature from this example, and a basketball feature from [`Fait: Michael Jordan joue au`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-basket&clickedId=18_12672_7&clerps=%5B%5B%222308855%22%2C%22sports%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222104818%22%2C%22basketball%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%22401305%22%2C%22game%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221812672%22%2C%22basketball%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22401742%22%2C%22basketball%22%5D%2C%5B%22101173%22%2C%22basketball%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%221710566%22%2C%22French%22%5D%2C%5B%222303604%22%2C%22sports+%2F+table+tennis+%2F+pool+%22%5D%2C%5B%22502146%22%2C%22famous+people+%2F+named+entities%22%5D%5D&pinnedIds=27_12220_7%2CE_18853_5%2C21_4818_7%2C21_9324_7%2C23_3604_7%2C25_14882_7%2C24_15306_7%2C23_15317_7%2C20_9090_7%2C24_3329_7%2C19_15763_7%2C18_12672_7%2C17_12431_7%2C17_5253_7%2C15_15208_7%2C14_4939_7%2C6_7377_7%2CE_78224_6%2C4_1305_7%2C3_305_7%2C24_2086_7%2C24_3772_7%2C21_16354_7%2C20_1454_7%2C23_2592_7%2C22_10566_7%2C23_2554_7%2C17_10566_6%2C0_4076_6%2C14_14575_6%2C7_11689_6%2C4_1742_5%2C1_1173_5%2CE_7939_4&supernodes=%5B%5B%22game%2Fplay%22%2C%223_305_7%22%2C%224_1305_7%22%2C%226_7377_7%22%2C%2215_15208_7%22%2C%2214_4939_7%22%5D%2C%5B%22French%22%2C%220_4076_6%22%2C%227_11689_6%22%2C%2214_14575_6%22%2C%2217_10566_6%22%5D%2C%5B%22basketball%22%2C%2221_4818_7%22%2C%2218_12672_7%22%5D%2C%5B%22sports%22%2C%2217_12431_7%22%2C%2217_5
253_7%22%2C%2221_9324_7%22%2C%2220_9090_7%22%2C%2219_15763_7%22%2C%2223_3604_7%22%2C%2223_15317_7%22%5D%2C%5B%22basketball%22%2C%224_1742_5%22%2C%221_1173_5%22%5D%2C%5B%22French%22%2C%2224_3329_7%22%2C%2221_16354_7%22%2C%2220_1454_7%22%2C%2223_2592_7%22%2C%2223_2554_7%22%2C%2224_2086_7%22%2C%2224_15306_7%22%2C%2225_14882_7%22%2C%2224_3772_7%22%2C%2222_10566_7%22%5D%5D)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_football = \"Tom Brady plays the sport of\"  # The sentence we're getting the football feature from\n",
    "football_feature = Feature(layer=16, pos=6, feature_idx=5039)\n",
    "\n",
    "# a basketball feature from the French example\n",
    "basketball_feature = Feature(layer=18, pos=7, feature_idx=12672)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we turn the football feature on, and the basketball feature off."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Fait: Michael Jordan joue au</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basket\"> basket</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.566</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">56.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basketball\"> basketball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.111</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 19%;\"></div>\n",
       "                                <span class=\"bar-text\">11.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" golf\"> golf</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.067</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 11%;\"></div>\n",
       "                                <span class=\"bar-text\">6.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" baseball\"> baseball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.041</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 7%;\"></div>\n",
       "                                <span class=\"bar-text\">4.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" football\"> football</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.028</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">2.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basket\"> basket</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.531</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 93%;\"></div>\n",
       "                                <span class=\"bar-text\">53.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basketball\"> basketball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.118</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 20%;\"></div>\n",
       "                                <span class=\"bar-text\">11.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" golf\"> golf</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.063</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 11%;\"></div>\n",
       "                                <span class=\"bar-text\">6.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" baseball\"> baseball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.049</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 8%;\"></div>\n",
       "                                <span class=\"bar-text\">4.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" football\"> football</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.049</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 8%;\"></div>\n",
       "                                <span class=\"bar-text\">4.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# disable the basketball feature, and enable the football feature\n",
    "_, football_activations = model.get_activations(s_football)\n",
    "interventions = [(*basketball_feature, 0), \n",
    "                 (football_feature.layer, basketball_feature.pos, football_feature.feature_idx, football_activations[football_feature])]\n",
    "new_logits, _ = model.feature_intervention(s_french, interventions)\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s_french)\n",
    "\n",
    "display_topk_token_predictions(s_french, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The intervention was successful! However, we made this intervention at the very last position of the sentence, in a relatively late layer. These sorts of features essentially control what the model outputs directly, rather than manipulating the model's internal representations at a more abstract level. Can we get this intervention to work at the Michael Jordan / Tom Brady position? Let's take a basketball and football feature from earlier positions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "football_feature = Feature(layer=0, pos=2, feature_idx=1703)\n",
    "basketball_feature = Feature(layer=1, pos=5, feature_idx=1173)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Fait: Michael Jordan joue au</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basket\"> basket</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.566</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">56.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basketball\"> basketball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.111</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 19%;\"></div>\n",
       "                                <span class=\"bar-text\">11.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" golf\"> golf</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.067</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 11%;\"></div>\n",
       "                                <span class=\"bar-text\">6.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" baseball\"> baseball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.041</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 7%;\"></div>\n",
       "                                <span class=\"bar-text\">4.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" football\"> football</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.028</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">2.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" football\"> football</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.256</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 45%;\"></div>\n",
       "                                <span class=\"bar-text\">25.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basket\"> basket</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.226</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 39%;\"></div>\n",
       "                                <span class=\"bar-text\">22.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" golf\"> golf</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.106</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 18%;\"></div>\n",
       "                                <span class=\"bar-text\">10.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" basketball\"> basketball</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.073</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 12%;\"></div>\n",
       "                                <span class=\"bar-text\">7.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" foot\"> foot</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.050</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 8%;\"></div>\n",
       "                                <span class=\"bar-text\">5.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# disable the basketball feature, and enable the football feature\n",
    "_, football_activations = model.get_activations(s_football)\n",
    "interventions = [(*basketball_feature, 0), \n",
    "                 (football_feature.layer, basketball_feature.pos, football_feature.feature_idx, 10*football_activations[football_feature])]\n",
    "new_logits, _ = model.feature_intervention(s_french, interventions)\n",
    "\n",
    "with torch.inference_mode():\n",
    "    original_logits = model(s_french)\n",
    "\n",
    "display_topk_token_predictions(s_french, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The intervention is not quite as effective when done at this earlier position, but it still works!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: English to dollars"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conside the following analogy about languages: [`Mexico:Spanish :: US:` → `English`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-english&pinnedIds=17_6498_6%2C11_105416_6%2C9_58293_3%2C10_2257_3%2C9_49585_3%2C5_6113_3%2CE_15506_3%2C10_813_6%2CE_2326_5%2C2_37574_5%2C4_75078_5%2C13_10254_6%2C11_97996_6%2C12_57025_6%2C12_20491_6%2C11_15631_6%2C0_74386_5%2C1_66436_5%2C27_12023_6%2C27_4645_6%2C23_2040_6%2C21_12621_6%2C19_13366_6%2C17_14947_6%2C15_6419_3%2C15_6004_5%2C15_15954_6%2C14_2792_5%2C4_10245_5%2C12_2799_5%2C2_15206_5%2CE_2379_5%2C0_3071_5%2C8_150_3%2C6_1509_3%2C6_5150_3%2C4_12658_3%2C2_13002_3%2C4_3761_3%2C2_15890_3%2C1_7569_3%2CE_51590_3%2C0_1701_1%2CE_40788_1%2C0_1701_3%2C9_11486_3%2C9_16135_3%2C14_1107_3%2C16_10591_6&supernodes=%5B%5B%22language%22%2C%225_6113_3%22%2C%229_58293_3%22%2C%229_49585_3%22%2C%2210_2257_3%22%5D%2C%5B%22languages+%28upweight+English%29%22%2C%2211_97996_6%22%2C%2211_105416_6%22%5D%2C%5B%22US%22%2C%221_66436_5%22%2C%220_74386_5%22%2C%2210_813_6%22%2C%222_37574_5%22%2C%224_75078_5%22%5D%2C%5B%22Output+%5C%22English%5C%22%22%2C%2227_12023_6%22%2C%2227_4645_6%22%5D%2C%5B%22US%22%2C%2212_2799_5%22%2C%220_3071_5%22%2C%222_15206_5%22%2C%224_10245_5%22%2C%2214_2792_5%22%2C%2215_6004_5%22%5D%2C%5B%22Spanish%22%2C%220_1701_1%22%2C%220_1701_3%22%2C%221_7569_3%22%5D%2C%5B%22languages+%2F+demonyms%22%2C%229_16135_3%22%2C%222_15890_3%22%2C%224_3761_3%22%2C%228_150_3%22%2C%2215_6419_3%22%2C%2214_1107_3%22%2C%226_5150_3%22%2C%226_1509_3%22%2C%224_12658_3%22%2C%222_13002_3%22%2C%229_11486_3%22%5D%2C%5B%22%28English%29+language%22%2C%2219_13366_6%22%2C%2216_10591_6%22%2C%2217_14947_6%22%2C%2221_12621_6%22%5D%5D&clickedId=languagedemonyms&pruningThreshold=0.7&densityThreshold=0.99&clerps=%5B%5B%2215_1515954_6%22%2C%22in%22%5D%2C%5B%2223_2302040_6%22%2C%22countries+%2F+Europe%22%5D%2C%5B%2219_1913366_6%22%2C%22language%22%5D%5D), which has the following circuit:\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/spanish-us-gemma.png\" width=\"400\" />\n",
    "\n",
    "Here, there are multiple different interventions we could perform. \n",
    "\n",
    "First, we could intervene on the country in question (US), causing the model to output another country's most-spoken language. \n",
    "\n",
    "Second, we could change the relationship targeted by the analogy; for example, instead of mapping from country → most-spoken language, we could map from country → currency. \n",
    "\n",
    "We'll try to change from country → currency first. We'll do this by selecting some features from the languages/demonyms supernode at the \"Spanish\" position. We'll turn these features off, and turn on some \"currency\" features from another example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_spanish_us = \"Mexico:Spanish :: US:\"\n",
    "\n",
    "language_features = [\n",
    "    Feature(layer=6, pos=3, feature_idx=1509), \n",
    "    Feature(layer=9, pos=3, feature_idx=11486), \n",
    "    Feature(layer=9, pos=3, feature_idx=16135), \n",
    "    Feature(layer=14, pos=3, feature_idx=1107)\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The second input, from which we'll get \"currency\" features, is [`Mexico:peso :: US:` → `dollar`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-dollar&clerps=%5B%5B%222413541%22%2C%22numbers%22%5D%2C%5B%221415770%22%2C%22punctuation+in+parallel+constructions%22%5D%2C%5B%221100145%22%2C%22analogies%2Fpairs%22%5D%2C%5B%22611805%22%2C%22pair+punctuation%22%5D%2C%5B%221506419%22%2C%22language%22%5D%2C%5B%221701%22%2C%22Spanish+names%22%5D%2C%5B%22601509%22%2C%22demonyms%22%5D%2C%5B%221515954%22%2C%22punctuation+before+countries%2Fdemonyms%22%5D%2C%5B%221803709%22%2C%22Canada%22%5D%2C%5B%221606532%22%2C%22US%22%5D%2C%5B%221506004%22%2C%22US%22%5D%2C%5B%221402792%22%2C%22US%22%5D%2C%5B%22215206%22%2C%22US%22%5D%2C%5B%223071%22%2C%22US%22%5D%2C%5B%22410245%22%2C%22US%22%5D%2C%5B%222307512%22%2C%22currency%22%5D%2C%5B%222005796%22%2C%22predict+dollar%22%5D%2C%5B%221810914%22%2C%22currency%22%5D%2C%5B%221409246%22%2C%22currency%22%5D%2C%5B%22712199%22%2C%22currency%22%5D%2C%5B%22515068%22%2C%22currency%22%5D%2C%5B%222513762%22%2C%22predict+D*%22%5D%2C%5B%222400289%22%2C%22predict+D*%22%5D%2C%5B%222105652%22%2C%22predict+dollar%22%5D%2C%5B%221810102%22%2C%22exchange+rates%22%5D%2C%5B%221714392%22%2C%22currency%22%5D%2C%5B%221510074%22%2C%22dollar%22%5D%2C%5B%22404678%22%2C%22currency%22%5D%2C%5B%22211100%22%2C%22currency%22%5D%2C%5B%22101214%22%2C%22currency%22%5D%2C%5B%221910746%22%2C%22dollar%22%5D%2C%5B%221906095%22%2C%22money%22%5D%2C%5B%221906215%22%2C%22predict+dollar%22%5D%2C%5B%221904447%22%2C%22US+coins%22%5D%2C%5B%22602102%22%2C%22money%22%5D%2C%5B%22911294%22%2C%22currency%22%5D%2C%5B%22913858%22%2C%22rates+%2F+shares+%2F+costs%22%5D%2C%5B%221710931%22%2C%22units%22%5D%2C%5B%22115093%22%2C%22Spanish%22%5D%2C%5B%22605150%22%2C%22countries%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222413490%22%2C%22Spanish%22%5D%2C%5B%2220_2005796_6%22%2C%22say+dollar%22%5D%2C%5B%2216_1601778_6%22%2C%22dollar%2Fcurrency%22%5D%5D&pinnedIds=27_64091_6%2C27_18289
_6%2C14_9246_3%2C7_12199_3%2C5_15068_3%2C20_5796_6%2C18_10914_6%2C25_13762_6%2C21_5652_6%2C23_7512_6%2C24_289_6%2C18_10102_6%2C17_14392_6%2C4_4678_3%2CE_78755_3%2C2_11100_3%2C19_4447_6%2C19_6215_6%2C19_6095_6%2C19_10746_6%2C15_10074_3%2CE_2379_5%2C0_3071_5%2C2_15206_5%2C4_10245_5%2C4_10245_6%2C14_2792_5%2C15_6004_5%2C16_6532_6%2C21_5652_5%2C18_3709_6%2C18_10102_5%2C6_2102_3%2C9_11294_3%2C9_13858_3%2C16_1778_6%2C17_4027_6%2C17_8064_6&supernodes=%5B%5B%22Output+%5C%22dollar%5C%22%22%2C%2227_64091_6%22%2C%2227_18289_6%22%5D%2C%5B%22predict+D*%22%2C%2224_289_6%22%2C%2225_13762_6%22%5D%2C%5B%22dollar%22%2C%2219_6215_6%22%2C%2219_4447_6%22%2C%2219_10746_6%22%2C%2221_5652_6%22%5D%2C%5B%22currency%22%2C%2214_9246_3%22%2C%227_12199_3%22%2C%225_15068_3%22%2C%224_4678_3%22%2C%222_11100_3%22%2C%229_11294_3%22%2C%229_13858_3%22%2C%2215_10074_3%22%2C%226_2102_3%22%5D%2C%5B%22US%22%2C%2214_2792_5%22%2C%2215_6004_5%22%2C%224_10245_5%22%2C%222_15206_5%22%2C%220_3071_5%22%5D%2C%5B%22US%22%2C%224_10245_6%22%2C%2216_6532_6%22%2C%2218_3709_6%22%5D%2C%5B%22currency+exchange%22%2C%2218_10102_6%22%2C%2218_10102_5%22%5D%2C%5B%22currency%22%2C%2217_4027_6%22%2C%2217_8064_6%22%2C%2216_1778_6%22%2C%2217_14392_6%22%2C%2218_10914_6%22%2C%2219_6095_6%22%2C%2223_7512_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99).\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/peso-us-gemma.png\" width=\"400\" />\n",
    "\n",
    "We'll take the \"currency\" feature from this input at the \"peso\" position. We should be able to turn off the previously found language features, and turn on these \"currency\" features, yielding \"dollar\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_peso_us = \"Mexico:peso :: US:\"\n",
    "currency_features = [\n",
    "    Feature(layer=6, pos=3, feature_idx=2102), \n",
    "    Feature(layer=9, pos=3, feature_idx=11294), \n",
    "    Feature(layer=9, pos=3, feature_idx=13858), \n",
    "    Feature(layer=14, pos=3, feature_idx=9246)\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "_, currency_activations = model.get_activations(s_peso_us, sparse=True)\n",
    "\n",
    "# swap the activations of the language features (on) with the currency features (off)\n",
    "interventions = [(*currency_feature, 10*currency_activations[currency_feature]) for currency_feature in currency_features] + [(*language_feature, 0.0) for language_feature in language_features]\n",
    "new_logits, new_activations = model.feature_intervention(s_spanish_us, interventions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Mexico:Spanish :: US:</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"English\">English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.146</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">14.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" English\"> English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.078</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 53%;\"></div>\n",
       "                                <span class=\"bar-text\">7.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"French\">French</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.069</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 47%;\"></div>\n",
       "                                <span class=\"bar-text\">6.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"British\">British</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.061</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 41%;\"></div>\n",
       "                                <span class=\"bar-text\">6.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"Mexican\">Mexican</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.042</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 28%;\"></div>\n",
       "                                <span class=\"bar-text\">4.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"Dollar\">Dollar</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.080</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 55%;\"></div>\n",
       "                                <span class=\"bar-text\">8.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"English\">English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.048</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 33%;\"></div>\n",
       "                                <span class=\"bar-text\">4.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" Dollar\"> Dollar</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.043</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 29%;\"></div>\n",
       "                                <span class=\"bar-text\">4.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"\n",
       "\n",
       "\">\n",
       "\n",
       "</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.038</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 25%;\"></div>\n",
       "                                <span class=\"bar-text\">3.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"Canada\">Canada</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.033</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 22%;\"></div>\n",
       "                                <span class=\"bar-text\">3.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "with torch.inference_mode():\n",
    "    original_logits = model(s_spanish_us)\n",
    "\n",
    "display_topk_token_predictions(s_spanish_us, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It works! Although we can see that English is still a rather probable output."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### English to dollars (: position)\n",
    "\n",
    "Consider the same analogy, [`Mexico:Spanish :: US:` → `English`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-english&pinnedIds=17_6498_6%2C11_105416_6%2C9_58293_3%2C10_2257_3%2C9_49585_3%2C5_6113_3%2CE_15506_3%2C10_813_6%2CE_2326_5%2C2_37574_5%2C4_75078_5%2C13_10254_6%2C11_97996_6%2C12_57025_6%2C12_20491_6%2C11_15631_6%2C0_74386_5%2C1_66436_5%2C27_12023_6%2C27_4645_6%2C23_2040_6%2C21_12621_6%2C19_13366_6%2C17_14947_6%2C15_6419_3%2C15_6004_5%2C15_15954_6%2C14_2792_5%2C4_10245_5%2C12_2799_5%2C2_15206_5%2CE_2379_5%2C0_3071_5%2C8_150_3%2C6_1509_3%2C6_5150_3%2C4_12658_3%2C2_13002_3%2C4_3761_3%2C2_15890_3%2C1_7569_3%2CE_51590_3%2C0_1701_1%2CE_40788_1%2C0_1701_3%2C9_11486_3%2C9_16135_3%2C14_1107_3%2C16_10591_6&supernodes=%5B%5B%22language%22%2C%225_6113_3%22%2C%229_58293_3%22%2C%229_49585_3%22%2C%2210_2257_3%22%5D%2C%5B%22languages+%28upweight+English%29%22%2C%2211_97996_6%22%2C%2211_105416_6%22%5D%2C%5B%22US%22%2C%221_66436_5%22%2C%220_74386_5%22%2C%2210_813_6%22%2C%222_37574_5%22%2C%224_75078_5%22%5D%2C%5B%22Output+%5C%22English%5C%22%22%2C%2227_12023_6%22%2C%2227_4645_6%22%5D%2C%5B%22US%22%2C%2212_2799_5%22%2C%220_3071_5%22%2C%222_15206_5%22%2C%224_10245_5%22%2C%2214_2792_5%22%2C%2215_6004_5%22%5D%2C%5B%22Spanish%22%2C%220_1701_1%22%2C%220_1701_3%22%2C%221_7569_3%22%5D%2C%5B%22languages+%2F+demonyms%22%2C%229_16135_3%22%2C%222_15890_3%22%2C%224_3761_3%22%2C%228_150_3%22%2C%2215_6419_3%22%2C%2214_1107_3%22%2C%226_5150_3%22%2C%226_1509_3%22%2C%224_12658_3%22%2C%222_13002_3%22%2C%229_11486_3%22%5D%2C%5B%22%28English%29+language%22%2C%2219_13366_6%22%2C%2216_10591_6%22%2C%2217_14947_6%22%2C%2221_12621_6%22%5D%5D&clickedId=languagedemonyms&pruningThreshold=0.7&densityThreshold=0.99&clerps=%5B%5B%2215_1515954_6%22%2C%22in%22%5D%2C%5B%2223_2302040_6%22%2C%22countries+%2F+Europe%22%5D%2C%5B%2219_1913366_6%22%2C%22language%22%5D%5D). \n",
    "\n",
    "Previously, we changed the analogy by manipulating abstract \"language\" / \"currency\" features at the Spanish/peso position.  What if we do this again, using features at the final position (corresponding to the \":\" token), as opposed to the earlier position (Spanish/peso)? Is this intervention equally as effective, whether we change earlier or later features?\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/spanish-us-gemma.png\" width=\"400\" />\n",
    "\n",
    "We'll select some features from the (English) language supernode at the \":\" position."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_spanish_us = \"Mexico:Spanish :: US:\"\n",
    "language_features = [Feature(layer=16, pos=6, feature_idx=10591), \n",
    "                     Feature(layer=17, pos=6, feature_idx=14947)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As before, the second input is [`Mexico:peso :: US:` → `dollar`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-dollar&clerps=%5B%5B%222413541%22%2C%22numbers%22%5D%2C%5B%221415770%22%2C%22punctuation+in+parallel+constructions%22%5D%2C%5B%221100145%22%2C%22analogies%2Fpairs%22%5D%2C%5B%22611805%22%2C%22pair+punctuation%22%5D%2C%5B%221506419%22%2C%22language%22%5D%2C%5B%221701%22%2C%22Spanish+names%22%5D%2C%5B%22601509%22%2C%22demonyms%22%5D%2C%5B%221515954%22%2C%22punctuation+before+countries%2Fdemonyms%22%5D%2C%5B%221803709%22%2C%22Canada%22%5D%2C%5B%221606532%22%2C%22US%22%5D%2C%5B%221506004%22%2C%22US%22%5D%2C%5B%221402792%22%2C%22US%22%5D%2C%5B%22215206%22%2C%22US%22%5D%2C%5B%223071%22%2C%22US%22%5D%2C%5B%22410245%22%2C%22US%22%5D%2C%5B%222307512%22%2C%22currency%22%5D%2C%5B%222005796%22%2C%22predict+dollar%22%5D%2C%5B%221810914%22%2C%22currency%22%5D%2C%5B%221409246%22%2C%22currency%22%5D%2C%5B%22712199%22%2C%22currency%22%5D%2C%5B%22515068%22%2C%22currency%22%5D%2C%5B%222513762%22%2C%22predict+D*%22%5D%2C%5B%222400289%22%2C%22predict+D*%22%5D%2C%5B%222105652%22%2C%22predict+dollar%22%5D%2C%5B%221810102%22%2C%22exchange+rates%22%5D%2C%5B%221714392%22%2C%22currency%22%5D%2C%5B%221510074%22%2C%22dollar%22%5D%2C%5B%22404678%22%2C%22currency%22%5D%2C%5B%22211100%22%2C%22currency%22%5D%2C%5B%22101214%22%2C%22currency%22%5D%2C%5B%221910746%22%2C%22dollar%22%5D%2C%5B%221906095%22%2C%22money%22%5D%2C%5B%221906215%22%2C%22predict+dollar%22%5D%2C%5B%221904447%22%2C%22US+coins%22%5D%2C%5B%22602102%22%2C%22money%22%5D%2C%5B%22911294%22%2C%22currency%22%5D%2C%5B%22913858%22%2C%22rates+%2F+shares+%2F+costs%22%5D%2C%5B%221710931%22%2C%22units%22%5D%2C%5B%22115093%22%2C%22Spanish%22%5D%2C%5B%22605150%22%2C%22countries%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222413490%22%2C%22Spanish%22%5D%2C%5B%2220_2005796_6%22%2C%22say+dollar%22%5D%2C%5B%2216_1601778_6%22%2C%22dollar%2Fcurrency%22%5D%5D&pinnedIds=27_64091_6%2C27_18289_6%2C14_9246_3%2C7_12199_3%2C5_150
68_3%2C20_5796_6%2C18_10914_6%2C25_13762_6%2C21_5652_6%2C23_7512_6%2C24_289_6%2C18_10102_6%2C17_14392_6%2C4_4678_3%2CE_78755_3%2C2_11100_3%2C19_4447_6%2C19_6215_6%2C19_6095_6%2C19_10746_6%2C15_10074_3%2CE_2379_5%2C0_3071_5%2C2_15206_5%2C4_10245_5%2C4_10245_6%2C14_2792_5%2C15_6004_5%2C16_6532_6%2C21_5652_5%2C18_3709_6%2C18_10102_5%2C6_2102_3%2C9_11294_3%2C9_13858_3%2C16_1778_6%2C17_4027_6%2C17_8064_6&supernodes=%5B%5B%22Output+%5C%22dollar%5C%22%22%2C%2227_64091_6%22%2C%2227_18289_6%22%5D%2C%5B%22predict+D*%22%2C%2224_289_6%22%2C%2225_13762_6%22%5D%2C%5B%22dollar%22%2C%2219_6215_6%22%2C%2219_4447_6%22%2C%2219_10746_6%22%2C%2221_5652_6%22%5D%2C%5B%22currency%22%2C%2214_9246_3%22%2C%227_12199_3%22%2C%225_15068_3%22%2C%224_4678_3%22%2C%222_11100_3%22%2C%229_11294_3%22%2C%229_13858_3%22%2C%2215_10074_3%22%2C%226_2102_3%22%5D%2C%5B%22US%22%2C%2214_2792_5%22%2C%2215_6004_5%22%2C%224_10245_5%22%2C%222_15206_5%22%2C%220_3071_5%22%5D%2C%5B%22US%22%2C%224_10245_6%22%2C%2216_6532_6%22%2C%2218_3709_6%22%5D%2C%5B%22currency+exchange%22%2C%2218_10102_6%22%2C%2218_10102_5%22%5D%2C%5B%22currency%22%2C%2217_4027_6%22%2C%2217_8064_6%22%2C%2216_1778_6%22%2C%2217_14392_6%22%2C%2218_10914_6%22%2C%2219_6095_6%22%2C%2223_7512_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99).\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/peso-us-gemma.png\" width=\"400\" />\n",
    "\n",
    "We'll take the currency feature from the \"currency\" supernode at the \":\" position, in the middle of the image, instead. Will the intervention still be effective?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_peso_us = \"Mexico:peso :: US:\"\n",
    "currency_features = [Feature(layer=16, pos=6, feature_idx=1778), \n",
    "                     Feature(layer=17, pos=6, feature_idx=14392), \n",
    "                     Feature(layer=17, pos=6, feature_idx=4027), \n",
    "                     Feature(layer=17, pos=6, feature_idx=8064)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "_, currency_activations = model.get_activations(s_peso_us, sparse=True)\n",
    "\n",
    "# swap the activations of the language features (on) with the currency features (off)\n",
    "interventions = [(*currency_feature, 10*currency_activations[currency_feature]) for currency_feature in currency_features] + \\\n",
    "[(*language_feature, 0) for language_feature in language_features]\n",
    "new_logits, _ = model.feature_intervention(s_spanish_us, interventions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">Mexico:Spanish :: US:</div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"English\">English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.146</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">14.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\" English\"> English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.078</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 53%;\"></div>\n",
       "                                <span class=\"bar-text\">7.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"French\">French</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.069</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 47%;\"></div>\n",
       "                                <span class=\"bar-text\">6.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"British\">British</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.061</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 41%;\"></div>\n",
       "                                <span class=\"bar-text\">6.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"Mexican\">Mexican</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.042</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 28%;\"></div>\n",
       "                                <span class=\"bar-text\">4.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"British\">British</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.115</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 78%;\"></div>\n",
       "                                <span class=\"bar-text\">11.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"\n",
       "\n",
       "\">\n",
       "\n",
       "</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.070</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 47%;\"></div>\n",
       "                                <span class=\"bar-text\">7.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"\n",
       "\">\n",
       "</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.037</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 25%;\"></div>\n",
       "                                <span class=\"bar-text\">3.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"Mexico\">Mexico</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.029</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 19%;\"></div>\n",
       "                                <span class=\"bar-text\">2.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"English\">English</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.029</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 19%;\"></div>\n",
       "                                <span class=\"bar-text\">2.9%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "with torch.inference_mode():\n",
    "    original_logits = model(s_spanish_us)\n",
    "\n",
    "display_topk_token_predictions(s_spanish_us, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Addition"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can we intervene when our model is doing addition, to change the answer? We'll try with two single-digit addition problems (Gemma uses single-digit tokens). We'll take the following equation: [`2 + 1 =` → `3`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-addition2&clerps=%5B%5B%222510077%22%2C%22three%22%5D%2C%5B%222411752%22%2C%22three%22%5D%2C%5B%222411880%22%2C%22three%22%5D%2C%5B%222508798%22%2C%221%2F2%2F3%22%5D%2C%5B%222414176%22%2C%221%2F2%2F3%22%5D%2C%5B%222309832%22%2C%221%2F2%2F3%22%5D%2C%5B%222413541%22%2C%22numbers%22%5D%2C%5B%222108824%22%2C%22numbers%22%5D%2C%5B%222004320%22%2C%22numbers%22%5D%2C%5B%222308901%22%2C%22numbers%22%5D%2C%5B%222004597%22%2C%22three%22%5D%2C%5B%221813696%22%2C%22two%22%5D%2C%5B%221605482%22%2C%226%22%5D%2C%5B%221505881%22%2C%22numbers%22%5D%2C%5B%221411498%22%2C%22numbers+in+equations%22%5D%2C%5B%221304198%22%2C%22numbers+in+equations%22%5D%2C%5B%223772%22%2C%22numbers+in+dates%22%5D%2C%5B%221405044%22%2C%22numbers%22%5D%2C%5B%221310982%22%2C%22numbers%22%5D%2C%5B%22509048%22%2C%22numbers+in+equations%22%5D%2C%5B%22409653%22%2C%22%3D%22%5D%2C%5B%22407037%22%2C%22%3D%22%5D%5D&pinnedIds=27_235304_6%2C25_10077_6%2C24_14176_6%2C24_11880_6%2C25_8798_6%2C24_13541_6%2C23_9832_6%2C24_11752_6%2C24_11880_5%2C21_8824_6%2C20_4320_6%2C20_4597_6%2C23_8901_6%2C18_13696_1%2C16_5482_1%2C15_5881_6%2C14_5044_6%2C14_11498_6%2C13_4198_6%2C0_3772_6%2CE_235248_6%2CE_589_5%2CE_963_2%2C13_10982_6%2C5_9048_5%2C4_9653_5%2C4_7037_5&clickedId=18_13696_1&supernodes=%5B%5B%22%3D%22%2C%224_9653_5%22%2C%224_7037_5%22%5D%2C%5B%22three%22%2C%2224_11880_6%22%2C%2224_11752_6%22%2C%2225_10077_6%22%2C%2220_4597_6%22%2C%2224_11880_5%22%5D%2C%5B%221%2F2%2F3%22%2C%2225_8798_6%22%2C%2224_14176_6%22%2C%2223_9832_6%22%5D%2C%5B%22numbers+in+equations%22%2C%2213_4198_6%22%2C%2214_11498_6%22%2C%225_9048_5%22%5D%2C%5B%22numbers%22%2C%220_3772_6%22%2C%2213_10982_6%22%2C%2214_5044_6%22%2C%2215_5881_6%22%2C%2220_4320_6%22%2C%2221_8824_6%22%2C%2223_8901_6%22%2C
%2224_13541_6%22%5D%5D&pruningThreshold=0.7&densityThreshold=0.99)\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/213.png\" width=\"300\" />"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = \"2 + 1 = \"\n",
    "\n",
    "feature3 = Feature(layer=25, pos=6, feature_idx=10077)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can we change its answer to the solution to this equation?: [`3 + 5 =` → `8`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-addition&clerps=%5B%5D&clickedId=15_15323_6&pinnedIds=27_235321_6%2C24_14176_6%2C25_14682_6%2C24_2998_6%2C23_3553_6%2C23_7436_6%2C21_6231_6%2C20_10440_6%2C18_14883_6%2C19_1887_6%2C22_1234_6%2C20_14337_6%2C18_11535_6%2C17_15809_6%2C15_15323_6%2C15_15686_6%2C14_9687_6%2C13_1364_6%2C12_6768_6%2C8_11121_5%2C4_9539_5%2C4_7037_5%2C2_3014_5%2CE_589_5%2CE_963_2%2CE_235304_4%2CE_235308_1&supernodes=%5B%5B%22%3D%22%2C%224_7037_5%22%2C%224_9539_5%22%2C%222_3014_5%22%2C%228_11121_5%22%5D%2C%5B%22eight%22%2C%2222_1234_6%22%2C%2223_7436_6%22%2C%2224_2998_6%22%2C%2225_14682_6%22%5D%2C%5B%22space%20after%20%3D%20%22%2C%2212_6768_6%22%2C%2213_1364_6%22%2C%2214_9687_6%22%5D%2C%5B%22numbers%22%2C%2215_15686_6%22%2C%2217_15809_6%22%2C%2215_15323_6%22%2C%2218_11535_6%22%2C%2219_1887_6%22%2C%2218_14883_6%22%2C%2220_10440_6%22%5D%2C%5B%22other%20specific-ish%20numbers%22%2C%2223_3553_6%22%2C%2224_14176_6%22%2C%2221_6231_6%22%2C%2220_14337_6%22%5D%5D) \n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/358.png\" width=\"400\" />"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "s8 = \"3 + 5 = \"\n",
    "feature8 = Feature(layer=25, pos=6, feature_idx=14682)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "_, s8_activations = model.get_activations(s8, sparse=True)\n",
    "interventions = [(*feature8, s8_activations[feature8]), (*feature3, 0)]\n",
    "new_logits, _ = model.feature_intervention(s8, interventions)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .token-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 10px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .token-viz .header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin-bottom: 3px;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: inline-block;\n",
       "    }\n",
       "    .token-viz .sentence {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        margin-bottom: 8px;\n",
       "        font-weight: 500;\n",
       "        font-size: 14px;\n",
       "    }\n",
       "    .token-viz table {\n",
       "        width: 100%;\n",
       "        border-collapse: collapse;\n",
       "        margin-bottom: 8px;\n",
       "        font-size: 13px;\n",
       "        table-layout: fixed;\n",
       "    }\n",
       "    .token-viz th {\n",
       "        text-align: left;\n",
       "        padding: 4px 6px;\n",
       "        font-weight: bold;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        background-color: rgba(200, 200, 200, 0.3);\n",
       "    }\n",
       "    .token-viz td {\n",
       "        padding: 3px 6px;\n",
       "        border: 1px solid rgba(150, 150, 150, 0.5);\n",
       "        font-weight: 500;\n",
       "        overflow: hidden;\n",
       "        text-overflow: ellipsis;\n",
       "        white-space: nowrap;\n",
       "    }\n",
       "    .token-viz .token-col {\n",
       "        width: 20%;\n",
       "    }\n",
       "    .token-viz .prob-col {\n",
       "        width: 15%;\n",
       "    }\n",
       "    .token-viz .dist-col {\n",
       "        width: 65%;\n",
       "    }\n",
       "    .token-viz .monospace {\n",
       "        font-family: monospace;\n",
       "    }\n",
       "    .token-viz .bar-container {\n",
       "        display: flex;\n",
       "        align-items: center;\n",
       "    }\n",
       "    .token-viz .bar {\n",
       "        height: 12px;\n",
       "        min-width: 2px;\n",
       "    }\n",
       "    .token-viz .bar-text {\n",
       "        margin-left: 6px;\n",
       "        font-weight: 500;\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    .token-viz .even-row {\n",
       "        background-color: rgba(240, 240, 240, 0.1);\n",
       "    }\n",
       "    .token-viz .odd-row {\n",
       "        background-color: rgba(255, 255, 255, 0.1);\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"token-viz\">\n",
       "        <div class=\"header\" style=\"background-color: #555555;\">Input Sentence:</div>\n",
       "        <div class=\"sentence\">2 + 1 = </div>\n",
       "        \n",
       "        <div>\n",
       "            <div class=\"header\" style=\"background-color: #2471A3;\">Original Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"3\">3</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.773</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 100%;\"></div>\n",
       "                                <span class=\"bar-text\">77.3%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"2\">2</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.072</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 9%;\"></div>\n",
       "                                <span class=\"bar-text\">7.2%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"1\">1</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.056</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 7%;\"></div>\n",
       "                                <span class=\"bar-text\">5.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"4\">4</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.026</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 3%;\"></div>\n",
       "                                <span class=\"bar-text\">2.6%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"5\">5</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.018</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #2471A3; width: 2%;\"></div>\n",
       "                                <span class=\"bar-text\">1.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "            \n",
       "            <div class=\"header\" style=\"background-color: #27AE60;\">New Top 5 Tokens</div>\n",
       "            <table>\n",
       "                <thead>\n",
       "                    <tr>\n",
       "                        <th class=\"token-col\">Token</th>\n",
       "                        <th class=\"prob-col\" style=\"text-align: right;\">Probability</th>\n",
       "                        <th class=\"dist-col\">Distribution</th>\n",
       "                    </tr>\n",
       "                </thead>\n",
       "                <tbody>\n",
       "    \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"8\">8</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.707</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 91%;\"></div>\n",
       "                                <span class=\"bar-text\">70.7%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"1\">1</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.058</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 7%;\"></div>\n",
       "                                <span class=\"bar-text\">5.8%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"3\">3</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.051</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 6%;\"></div>\n",
       "                                <span class=\"bar-text\">5.1%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"odd-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"2\">2</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.040</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 5%;\"></div>\n",
       "                                <span class=\"bar-text\">4.0%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                    <tr class=\"even-row\">\n",
       "                        <td class=\"monospace token-col\" title=\"7\">7</td>\n",
       "                        <td class=\"prob-col\" style=\"text-align: right;\">0.035</td>\n",
       "                        <td class=\"dist-col\">\n",
       "                            <div class=\"bar-container\">\n",
       "                                <div class=\"bar\" style=\"background-color: #27AE60; width: 4%;\"></div>\n",
       "                                <span class=\"bar-text\">3.5%</span>\n",
       "                            </div>\n",
       "                        </td>\n",
       "                    </tr>\n",
       "        \n",
       "                </tbody>\n",
       "            </table>\n",
       "        </div>\n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "with torch.inference_mode():\n",
    "    original_logits = model(s3)\n",
    "\n",
    "display_topk_token_predictions(s3, original_logits, new_logits)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This intervention works! However, it's rather surface-level: we were able to change the number that the model outputs, but not e.g. features corresponding to the addends. Unfortunately, there are no such clear features in the graph to intervene on."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: French` → `Spanish Generation\n",
    "\n",
    "Let's take the same two sentences as from earlier. Can we change the language of a multi-token generation, as opposed to a single-token prediction?\n",
    "\n",
    "Here's the French sentence again: [`Fait: Michael Jordan joue au` → `basket`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-basket&clickedId=17_10566_2&clerps=%5B%5B%222308855%22%2C%22basketball%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222104818%22%2C%22basketball%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%22401305%22%2C%22game%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221812672%22%2C%22basketball%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22401742%22%2C%22basketball%22%5D%2C%5B%22101173%22%2C%22basketball%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%221710566%22%2C%22French%22%5D%5D&pinnedIds=27_12220_7%2CE_18853_5%2C21_4818_7%2C21_9324_7%2C23_3604_7%2C25_14882_7%2C24_15306_7%2C23_15317_7%2C20_9090_7%2C24_3329_7%2C19_15763_7%2C18_12672_7%2C17_12431_7%2C17_5253_7%2C15_15208_7%2C14_4939_7%2C6_7377_7%2CE_78224_6%2C4_1305_7%2C3_305_7%2C24_2086_7%2C24_3772_7%2C21_16354_7%2C20_1454_7%2C23_2592_7%2C22_10566_7%2C23_2554_7%2C17_10566_6%2C0_4076_6%2C14_14575_6%2C7_11689_6%2C4_1742_5%2C1_1173_5%2CE_7939_4&supernodes=%5B%5B%22game%2Fplay%22%2C%223_305_7%22%2C%224_1305_7%22%2C%226_7377_7%22%2C%2215_15208_7%22%2C%2214_4939_7%22%5D%2C%5B%22French%22%2C%220_4076_6%22%2C%227_11689_6%22%2C%2214_14575_6%22%2C%2217_10566_6%22%5D%2C%5B%22basketball%22%2C%2221_4818_7%22%2C%2218_12672_7%22%5D%2C%5B%22sports%22%2C%2217_12431_7%22%2C%2217_5253_7%22%2C%2221_9324_7%22%2C%2220_9090_7%22%2C%2219_15763_7%22%2C%2223_3604_7%22%2C%2223_15317_7%22%5D%2C%5B%22basketball%22%2C%224_1742_5%22%2C%221_1173_5%22%5D%2C%5B%22French%22%2C%2224_3329_7%22%2C%2221_16354_7%22%2C%2220_1454_7%22%2C%2223_2592_7%22%2C%2223_2554_7%22%2C%2224_2086_7%22%2C%2224_15306_7%
22%2C%2225_14882_7%22%2C%2224_3772_7%22%2C%2222_10566_7%22%5D%5D).\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/mj-basketball-fr.png\" width=\"400\" />\n",
    "\n",
    "Once more, we'll extract just one French feature, over the final two tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_french = \"Fait: Michael Jordan joue au\"  # The sentence we're intervening on\n",
    "french_feature = Feature(layer=20, pos=slice(6,8), feature_idx=1454)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once more, we'll do the same for the Spanish sentence:\n",
    "[`Hecho: Michael Jordan juega al` → `baloncesto`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-michael-jordan-es&clerps=%5B%5B%222308855%22%2C%22basketball%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222509334%22%2C%22Spanish%22%5D%2C%5B%222413490%22%2C%22Spanish%22%5D%2C%5B%222403018%22%2C%22Spanish%22%5D%2C%5B%222407980%22%2C%22Spanish+articles%22%5D%2C%5B%222511463%22%2C%22Spanish%22%5D%2C%5B%222104818%22%2C%22basketball%22%5D%2C%5B%222109324%22%2C%22sports%22%5D%2C%5B%222009090%22%2C%22basketball%22%5D%2C%5B%221712431%22%2C%22sports%22%5D%2C%5B%221515208%22%2C%22play%22%5D%2C%5B%22401305%22%2C%22game%22%5D%2C%5B%22109339%22%2C%22a%2Fal+in+Spanish%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%221404939%22%2C%22play%22%5D%2C%5B%221915763%22%2C%22sports%22%5D%2C%5B%221812672%22%2C%22basketball%22%5D%2C%5B%221414510%22%2C%22sports%22%5D%2C%5B%22401742%22%2C%22basketball%22%5D%2C%5B%22101173%22%2C%22basketball%22%5D%2C%5B%22411%22%2C%22famous+people+%2F+named+entities%22%5D%2C%5B%222000341%22%2C%22Spanish%22%5D%5D&pinnedIds=27_143831_6%2C25_13416_6%2C24_3018_6%2C25_9334_6%2C24_13490_6%2C25_2222_6%2C24_7980_6%2C25_11463_6%2C21_9324_6%2C21_4818_6%2C23_8855_6%2C20_9090_6%2C17_12431_6%2C15_15208_6%2C14_4939_6%2C4_1305_6%2C1_9339_6%2CE_113501_5%2C0_13978_5%2C0_15822_5%2CE_717_6%2C19_15763_6%2C18_12672_6%2C4_1742_4%2C14_14510_4%2C1_1173_4%2CE_18853_4%2CE_7939_3%2C0_411_4%2C20_341_6&supernodes=%5B%5B%22basketball%22%2C%2220_9090_6%22%2C%2218_12672_6%22%2C%2221_4818_6%22%2C%2223_8855_6%22%5D%2C%5B%22sports%22%2C%2217_12431_6%22%2C%2219_15763_6%22%2C%2221_9324_6%22%5D%2C%5B%22play%22%2C%224_1305_6%22%2C%2214_4939_6%22%2C%2215_15208_6%22%5D%2C%5B%22basketball%22%2C%224_1742_4%22%2C%221_1173_4%22%5D%2C%5B%22romance+language%22%2C%221_9339_6%22%2C%220_15822_5%22%2C%220_13978_5%22%5D%2C%5B%22Spanish%22%2C%2225_9334_6%22%2C%2225_13416_6%22%2C%2224_13490_6%
22%2C%2224_7980_6%22%2C%2224_3018_6%22%2C%2225_2222_6%22%2C%2225_11463_6%22%2C%2220_341_6%22%5D%5D&clickedId=20_341_6)\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/mj-basketball-es.png\" width=\"400\" />"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_spanish = \"Hecho: Michael Jordan juega al\"  # The sentence where we got the spanish feature from\n",
    "spanish_feature = Feature(layer=20, pos=slice(6,8), feature_idx=341)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .generations-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 12px;\n",
       "        font-size: 13px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .generations-viz .section-header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin: 10px 0 5px 0;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: block;\n",
       "    }\n",
       "    .generations-viz .pre-intervention-header {\n",
       "        background-color: #2471A3;\n",
       "    }\n",
       "    .generations-viz .post-intervention-header {\n",
       "        background-color: #27AE60;\n",
       "    }\n",
       "    .generations-viz .generation-container {\n",
       "        margin-bottom: 8px;\n",
       "        padding: 3px;\n",
       "        border-left: 3px solid rgba(100, 100, 100, 0.5);\n",
       "    }\n",
       "    .generations-viz .generation-text {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 6px 8px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        font-weight: 500;\n",
       "        white-space: pre-wrap;\n",
       "        line-height: 1.2;\n",
       "        font-size: 13px;\n",
       "        overflow-x: auto;\n",
       "    }\n",
       "    .generations-viz .base-text {\n",
       "        color: rgba(100, 100, 100, 0.9);\n",
       "    }\n",
       "    .generations-viz .new-text {\n",
       "        background-color: rgba(255, 255, 0, 0.25);\n",
       "        font-weight: bold;\n",
       "        padding: 1px 0;\n",
       "        border-radius: 2px;\n",
       "    }\n",
       "    .generations-viz .pre-intervention-item {\n",
       "        border-left-color: #2471A3;\n",
       "    }\n",
       "    .generations-viz .post-intervention-item {\n",
       "        border-left-color: #27AE60;\n",
       "    }\n",
       "    .generations-viz .generation-number {\n",
       "        font-weight: bold;\n",
       "        margin-bottom: 3px;\n",
       "        color: rgba(70, 70, 70, 0.9);\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"generations-viz\">\n",
       "    \n",
       "    <div class=\"section-header pre-intervention-header\">Pre-intervention generations:</div>\n",
       "    \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 1</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Fait: Michael Jordan joue au</span><span class=\"new-text\"> basket avec son fils, Jeffrey Jordan, à la</span></div>\n",
       "        </div>\n",
       "        \n",
       "    <div class=\"section-header post-intervention-header\">Post-intervention generations:</div>\n",
       "    \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 1</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">Fait: Michael Jordan joue au</span><span class=\"new-text\"> baloncesto.\n",
       "\n",
       "Fato: Michael Jordan es un</span></div>\n",
       "        </div>\n",
       "        \n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "_, spanish_activations = model.get_activations(s_spanish, sparse=False)\n",
    "# open-ended generation intervention\n",
    "interventions = [(spanish_feature.layer, slice(1, None), spanish_feature.feature_idx, 10*spanish_activations[spanish_feature].mean()), \n",
    "                 (french_feature.layer, slice(1, None), french_feature.feature_idx, 0)]\n",
    "\n",
    "pre_intervention_generation = [model.generate(s_french, do_sample=False, verbose=False)]\n",
    "post_intervention_generation = [model.feature_intervention_generate(s_french, interventions, do_sample=False, verbose=False)[0]]\n",
    "display_generations_comparison(s_french, pre_intervention_generation, post_intervention_generation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's do the same with two different sentences! This French sentence is \"The season after spring is called\": [`La saison après le printemps s'apelle l'` → `été`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-saison&clerps=%5B%5B%222505999%22%2C%22%27+in+French%22%5D%2C%5B%222409342%22%2C%22%27+%2F+%28+in+French%22%5D%2C%5B%222213022%22%2C%22%27+in+French%22%5D%2C%5B%222115345%22%2C%22%27+in+French%22%5D%2C%5B%222000352%22%2C%22%27+in+French%22%5D%2C%5B%221908645%22%2C%22%27+in+French%22%5D%2C%5B%221801368%22%2C%22%27+in+French%22%5D%2C%5B%222210566%22%2C%22French%22%5D%2C%5B%222302592%22%2C%22French%22%5D%2C%5B%222508028%22%2C%22newline+%2F+%5C%22+in+French%22%5D%2C%5B%222009338%22%2C%22season+%28upweight+summer%29%22%5D%2C%5B%222513952%22%2C%22%27+in+French%22%5D%2C%5B%222410347%22%2C%22%27+in+French%22%5D%2C%5B%222406795%22%2C%22%27+in+French%22%5D%2C%5B%222302467%22%2C%22%27+in+French%22%5D%2C%5B%222403772%22%2C%22French%22%5D%2C%5B%22110212%22%2C%22l%27%22%5D%2C%5B%221206031%22%2C%22an%22%5D%2C%5B%221505966%22%2C%22romance+language+articles%22%5D%2C%5B%221614166%22%2C%22an%22%5D%2C%5B%221404649%22%2C%22dates+%2F+places%22%5D%2C%5B%221509835%22%2C%22summer+%2F+winter%22%5D%2C%5B%221709957%22%2C%22seasons%22%5D%2C%5B%221806471%22%2C%22dates+%2F+issues%22%5D%2C%5B%221706188%22%2C%22an%22%5D%2C%5B%221701777%22%2C%22winter%22%5D%2C%5B%221503399%22%2C%22spring%22%5D%2C%5B%221400457%22%2C%22winter%22%5D%2C%5B%221305925%22%2C%22winter%22%5D%2C%5B%221213955%22%2C%22fall%2Fwinter%2Fspring%22%5D%2C%5B%221115997%22%2C%22winter%2Fspring%22%5D%2C%5B%221013936%22%2C%22seasons%22%5D%2C%5B%22713704%22%2C%22seasons%22%5D%2C%5B%22810683%22%2C%22months%22%5D%2C%5B%22615219%22%2C%22seasons%22%5D%2C%5B%22606253%22%2C%22seasons%2Fmonths%22%5D%2C%5B%22301450%22%2C%22spring%2Fautumn%22%5D%2C%5B%22404241%22%2C%22summer%2Fwinter%22%5D%2C%5B%22215502%22%2C%22parts+of+a+year%22%5D%2C%5B%22211865%22%2C%22August%22%5D%2C%5B%22411540%22%2C%22seasons%22%5D%2C%5B%2224_2409342_11
%22%2C%22French%22%5D%2C%5B%2225_2505999_11%22%2C%22French%22%5D%2C%5B%2219_1908645_11%22%2C%22apostrophe+%28French%29%22%5D%2C%5B%2222_2213022_11%22%2C%22apostrophe+%28French%29%22%5D%2C%5B%2223_2302467_11%22%2C%22apostrophe+%28French%29%22%5D%2C%5B%2221_2115345_11%22%2C%22French+function+words%2C+apostrophes%22%5D%2C%5B%2220_2000352_11%22%2C%22apostrophe+%28French%29%22%5D%5D&pinnedIds=27_15331_10%2C20_9338_10%2C24_3772_10%2C25_5999_10%2C24_9342_10%2C25_8028_10%2C24_6795_10%2C23_2467_10%2C25_13952_10%2C21_15345_10%2C24_10347_10%2C22_10566_10%2C22_13022_10%2C23_2592_10%2C20_352_10%2C19_8645_10%2C18_1368_10%2C18_6471_10%2C17_9957_10%2C15_9835_5%2C17_1777_5%2C15_3399_5%2C4_4241_5%2C14_457_5%2C13_5925_5%2C12_13955_5%2C11_15997_5%2C10_13936_5%2C4_11540_5%2C8_10683_5%2C7_13704_5%2C6_6253_5%2C6_15219_5%2C2_11865_5%2C3_1450_5%2CE_82115_5%2CE_235303_10%2C2_15502_5%2C17_6188_10%2C15_5966_10%2C16_14166_10%2C12_6031_10%2C1_10212_10%2CE_533_9%2C27_15331_11%2C20_9338_11%2CE_33754_2%2C24_3772_11%2C24_9342_11%2C24_6795_11%2C25_5999_11%2C21_15345_11%2C20_352_11%2C23_2467_11%2C22_13022_11%2C19_8645_11%2C18_6471_11%2C17_9957_11%2C4_11540_6%2C4_11540_2%2C20_1454_11&supernodes=%5B%5B%22%27+in+French%22%2C%2225_5999_10%22%2C%2218_1368_10%22%2C%2219_8645_10%22%2C%2220_352_10%22%2C%2221_15345_10%22%2C%2222_13022_10%22%2C%2223_2467_10%22%2C%2224_6795_10%22%2C%2224_10347_10%22%2C%2225_13952_10%22%5D%2C%5B%22French%22%2C%2224_3772_10%22%2C%2223_2592_10%22%2C%2222_10566_10%22%5D%2C%5B%22newline+%2F+%5C%22+in+French%22%2C%2225_8028_10%22%2C%2224_9342_10%22%5D%2C%5B%22an%22%2C%2217_6188_10%22%2C%2216_14166_10%22%2C%2212_6031_10%22%5D%2C%5B%22romance+language+articles%22%2C%2215_5966_10%22%2C%221_10212_10%22%5D%2C%5B%22apostrophe+%28French%29%22%2C%2223_2467_11%22%2C%2222_13022_11%22%2C%2219_8645_11%22%2C%2220_352_11%22%2C%2221_15345_11%22%5D%2C%5B%22seasons%22%2C%2217_9957_11%22%2C%2218_6471_11%22%2C%2220_9338_11%22%5D%2C%5B%22words+relating+to+specific+times+of+year%22%2C%224_11540_6%22%2C%22
4_11540_2%22%5D%2C%5B%22French%22%2C%2220_1454_11%22%2C%2225_5999_11%22%2C%2224_3772_11%22%2C%2224_9342_11%22%2C%2224_6795_11%22%5D%2C%5B%22seasons%22%2C%2211_15997_5%22%2C%2215_9835_5%22%2C%2217_1777_5%22%2C%2215_3399_5%22%2C%2214_457_5%22%2C%2212_13955_5%22%2C%2213_5925_5%22%2C%226_15219_5%22%2C%224_11540_5%22%2C%2210_13936_5%22%2C%227_13704_5%22%2C%226_6253_5%22%2C%228_10683_5%22%2C%222_11865_5%22%2C%222_15502_5%22%2C%223_1450_5%22%2C%224_4241_5%22%5D%5D&clickedId=15_5966_10&pruningThreshold=0.7&densityThreshold=0.99). \n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/printemps.png\" width=\"400\" />\n",
    "\n",
    "Can we get the model to output Spanish text again? Let's use the same feature as before.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_french = \"La saison après le printemps s'appelle\"\n",
    "french_feature = Feature(layer=20, pos=slice(6,8), feature_idx=1454)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's the Spanish equivalent: [`La estación después de la primavera se llama el` → `verano`](https://www.neuronpedia.org/gemma-2-2b/graph?slug=gemma-verano&clerps=%5B%5B%22115093%22%2C%22Spanish%22%5D%2C%5B%222502222%22%2C%22Spanish+articles%22%5D%2C%5B%222513416%22%2C%22Spanish%22%5D%2C%5B%222509334%22%2C%22Spanish%22%5D%2C%5B%222413490%22%2C%22Spanish%22%5D%2C%5B%222403018%22%2C%22Spanish%22%5D%2C%5B%222407980%22%2C%22Spanish+articles%22%5D%2C%5B%222511463%22%2C%22Spanish%22%5D%2C%5B%2213978%22%2C%22romance+languages%22%5D%2C%5B%2215822%22%2C%22romance+languages%22%5D%2C%5B%222000341%22%2C%22Spanish%22%5D%2C%5B%222009338%22%2C%22season+%28upweight+summer%29%22%5D%2C%5B%221709957%22%2C%22season%22%5D%2C%5B%22404241%22%2C%22season%22%5D%2C%5B%22301450%22%2C%22%28time+of%29+year%22%5D%2C%5B%22211865%22%2C%22August%22%5D%2C%5B%221512458%22%2C%22period+%2F+time%22%5D%2C%5B%22215502%22%2C%22months+%2F+quarters+%2F+sessions%22%5D%2C%5B%221701777%22%2C%22winter%22%5D%2C%5B%221806471%22%2C%22months%2Fseasons+%28journals%29%22%5D%2C%5B%22210302%22%2C%22weather%22%5D%5D&pinnedIds=D25_5604_7%2C24_11415_7%2C21_5066_7%2C20_7544_7%2C17_4855_5%2C16_5918_5%2C15_5304_5%2C14_1031_5%2C13_7451_5%2C24_763_7%2C22_12304_7%2C17_4886_5%2C24_9503_7%2C18_8152_5%2C25_10348_7%2C27_1970_7%2C24_396_7%2C4_3134_5%2CE_5897_5%2C23_5764_7%2C22_1913_7%2C19_13898_7%2C16_10380_7%2CE_34643_4%2C0_1572_4%2C1_3698_4%2C1_5935_4%2C4_1222_4%2C15_11422_4%2C16_5419_4%2C4_3441_5%2C4_14794_5%2C14_13599_7%2C23_14585_7%2C23_3981_7%2C20_12133_7%2C23_4927_7%2C22_12727_7%2C22_8530_7%2C23_8141_6%2C24_10734_6%2C18_14893_7%2C24_7668_7%2C23_8141_7%2C24_5668_7%2C25_5842_7%2C25_12858_7%2C23_6380_7%2C24_5451_7%2CE_1995_7%2C0_9260_7%2C1_6198_7%2C2_15673_7%2C6_8381_7%2C5_7433_7%2C6_15662_7%2C12_10924_7%2C18_3321_7%2C18_14215_7%2C18_15589_7%2C27_21474_8%2CE_7888_7%2CE_18853_4%2C27_48674_8%2C27_98463_8%2C23_8855_8%2C21_9324_8%2C21_4818_8%2C20_9090_8%2C19_5566_8%2C19_15763_8%2C17_12431_8%2C4_1742_4%2C6_2181_4%2C7_1844_4%2C7_
852_4%2C16_11751_4%2C16_11751_5%2C18_12672_6%2C18_12672_4%2C18_12672_5%2C16_11751_7%2C18_12672_8%2C14_14510_7%2C15_14376_7%2C16_824_8%2C15_10776_7%2C0_16262_7%2C1_5055_7%2C2_46_7%2C7_14700_7%2C4_2977_7%2C16_87_7%2C27_7773_8%2C27_13210_8%2CE_10498_5%2CE_13388_2%2C23_8683_8%2C21_10062_8%2C17_12530_5%2C23_8488_8%2C15_5617_5%2C15_5756_5%2C18_4563_5%2C19_1435_5%2C20_10977_5%2C19_5186_5%2C20_1807_5%2C14_11360_5%2C6_4362_5%2C13_6699_5%2C16_9498_5%2C16_1698_5%2C17_6043_5%2C16_9788_5%2C7_8760_5%2C8_295_5%2C7_1014_5%2C10_10314_5%2C7_9945_5%2C8_5268_5%2C8_6716_5%2C2_4298_5%2C2_8756_5%2C4_2796_5%2C4_11015_3%2C27_34250_9%2C24_13490_9%2C20_9338_9%2C24_3018_9%2C25_9334_9%2C25_7264_9%2C23_4905_9%2C24_15008_9%2C24_7980_9%2C23_7997_9%2C22_15500_9%2C21_7256_9%2C20_341_9%2C17_9957_9%2C18_6471_9%2C15_9835_6%2C14_457_6%2C4_4241_6%2C4_11540_6%2C13_5925_6%2C13_10830_6%2C6_6253_6%2C12_13955_6%2C8_10683_6%2CE_46443_6%2C24_4836_9%2C24_2024_9%2C22_11854_9%2C25_2591_9%2C21_11151_9%2C22_2944_9%2C21_16149_9%2C23_401_9%2C21_3462_9%2C25_8956_9%2C2_11865_6%2C2_10302_6%2C2_5047_6%2C4_166_6%2C6_3193_6%2C2_11940_6%2C1_15055_6%2C5_14249_6%2C8_8830_6%2C6_15219_6%2C21_11772_9%2CE_822_9%2C18_5558_9%2C19_6064_9%2C20_4729_9%2C19_9709_9%2C17_14627_9%2C16_15885_9%2C15_5966_9%2C15_3343_9%2C13_12977_9%2C14_4191_9%2C0_5792_9%2C0_3452_9%2C0_14151_9%2C0_14056_9%2C0_15250_9%2C1_12376_9%2C5_14571_9%2C2_3419_9%2C2_12121_9%2C4_9309_9%2C4_3405_9%2C8_13032_9%2C7_13704_9%2C20_1415_9%2C14_5480_6&supernodes=%5B%5B%22romance+language+articles%22%2C%220_14056_9%22%2C%222_12121_9%22%2C%222_3419_9%22%2C%225_14571_9%22%2C%220_14151_9%22%2C%220_3452_9%22%2C%2217_14627_9%22%2C%224_9309_9%22%2C%224_3405_9%22%2C%2215_3343_9%22%2C%220_5792_9%22%2C%220_15250_9%22%2C%221_12376_9%22%2C%2215_5966_9%22%2C%2216_15885_9%22%2C%2218_5558_9%22%2C%2213_12977_9%22%2C%2219_9709_9%22%2C%2220_4729_9%22%2C%2214_4191_9%22%5D%2C%5B%22Spanish+text%22%2C%2225_2591_9%22%2C%2222_11854_9%22%2C%2223_4905_9%22%2C%2224_7980_9%22%2C%2224_15008_9%22%2C%2225_726
4_9%22%2C%2224_2024_9%22%2C%2222_15500_9%22%2C%2221_7256_9%22%2C%2220_341_9%22%2C%2225_9334_9%22%2C%2224_13490_9%22%2C%2224_3018_9%22%2C%2219_6064_9%22%5D%2C%5B%22weather%22%2C%228_8830_6%22%2C%222_5047_6%22%2C%222_10302_6%22%2C%224_166_6%22%2C%226_3193_6%22%5D%2C%5B%22months%22%2C%228_10683_6%22%2C%226_6253_6%22%2C%222_11865_6%22%2C%225_14249_6%22%2C%228_13032_9%22%2C%2221_11772_9%22%5D%2C%5B%22activates+before+seasons%22%2C%2218_6471_9%22%2C%2221_11151_9%22%2C%2223_401_9%22%5D%2C%5B%22activates+before+seasons+%2F+downweights+summer%22%2C%2224_4836_9%22%2C%2222_2944_9%22%2C%2225_8956_9%22%5D%2C%5B%22predict+summer%22%2C%2221_16149_9%22%2C%2220_9338_9%22%2C%2221_3462_9%22%2C%224_4241_6%22%2C%2220_1415_9%22%5D%2C%5B%22seasons%22%2C%2215_9835_6%22%2C%2214_5480_6%22%2C%2223_7997_9%22%2C%224_11540_6%22%2C%2213_10830_6%22%2C%227_13704_9%22%2C%2212_13955_6%22%2C%2217_9957_9%22%2C%2213_5925_6%22%2C%2214_457_6%22%2C%226_15219_6%22%2C%222_11940_6%22%2C%221_15055_6%22%5D%5D&clickedId=22_2944_9)\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/safety-research/circuit-tracer/main/demos/img/gemma/primavera.png\" width=\"400\" />"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_spanish = \"La estación después de la primavera se llama\"\n",
    "spanish_feature = Feature(layer=20, pos=slice(6,8), feature_idx=341)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    .generations-viz {\n",
       "        font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;\n",
       "        margin-bottom: 12px;\n",
       "        font-size: 13px;\n",
       "        max-width: 700px;\n",
       "    }\n",
       "    .generations-viz .section-header {\n",
       "        font-weight: bold;\n",
       "        font-size: 14px;\n",
       "        margin: 10px 0 5px 0;\n",
       "        padding: 4px 6px;\n",
       "        border-radius: 3px;\n",
       "        color: white;\n",
       "        display: block;\n",
       "    }\n",
       "    .generations-viz .pre-intervention-header {\n",
       "        background-color: #2471A3;\n",
       "    }\n",
       "    .generations-viz .post-intervention-header {\n",
       "        background-color: #27AE60;\n",
       "    }\n",
       "    .generations-viz .generation-container {\n",
       "        margin-bottom: 8px;\n",
       "        padding: 3px;\n",
       "        border-left: 3px solid rgba(100, 100, 100, 0.5);\n",
       "    }\n",
       "    .generations-viz .generation-text {\n",
       "        background-color: rgba(200, 200, 200, 0.2);\n",
       "        padding: 6px 8px;\n",
       "        border-radius: 3px;\n",
       "        border: 1px solid rgba(100, 100, 100, 0.5);\n",
       "        font-family: monospace;\n",
       "        font-weight: 500;\n",
       "        white-space: pre-wrap;\n",
       "        line-height: 1.2;\n",
       "        font-size: 13px;\n",
       "        overflow-x: auto;\n",
       "    }\n",
       "    .generations-viz .base-text {\n",
       "        color: rgba(100, 100, 100, 0.9);\n",
       "    }\n",
       "    .generations-viz .new-text {\n",
       "        background-color: rgba(255, 255, 0, 0.25);\n",
       "        font-weight: bold;\n",
       "        padding: 1px 0;\n",
       "        border-radius: 2px;\n",
       "    }\n",
       "    .generations-viz .pre-intervention-item {\n",
       "        border-left-color: #2471A3;\n",
       "    }\n",
       "    .generations-viz .post-intervention-item {\n",
       "        border-left-color: #27AE60;\n",
       "    }\n",
       "    .generations-viz .generation-number {\n",
       "        font-weight: bold;\n",
       "        margin-bottom: 3px;\n",
       "        color: rgba(70, 70, 70, 0.9);\n",
       "        font-size: 12px;\n",
       "    }\n",
       "    </style>\n",
       "    \n",
       "    <div class=\"generations-viz\">\n",
       "    \n",
       "    <div class=\"section-header pre-intervention-header\">Pre-intervention generations:</div>\n",
       "    \n",
       "        <div class=\"generation-container pre-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 1</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">La saison après le printemps s&#x27;appelle</span><span class=\"new-text\"> l&#x27;été.\n",
       "\n",
       "La saison après l&#x27;</span></div>\n",
       "        </div>\n",
       "        \n",
       "    <div class=\"section-header post-intervention-header\">Post-intervention generations:</div>\n",
       "    \n",
       "        <div class=\"generation-container post-intervention-item\">\n",
       "            <div class=\"generation-number\">Generation 1</div>\n",
       "            <div class=\"generation-text\"><span class=\"base-text\">La saison après le printemps s&#x27;appelle</span><span class=\"new-text\"> la temporada de verano.\n",
       "\n",
       "La temporada de verano</span></div>\n",
       "        </div>\n",
       "        \n",
       "    </div>\n",
       "    "
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "_, spanish_activations = model.get_activations(s_spanish, sparse=False)\n",
    "# open-ended generation intervention\n",
    "interventions = [(spanish_feature.layer, slice(1, None), spanish_feature.feature_idx, 10*spanish_activations[spanish_feature].mean()), \n",
    "                 (french_feature.layer, slice(1, None), french_feature.feature_idx, 0)]\n",
    "\n",
    "pre_intervention_generation = [model.generate(s_french, do_sample=False, verbose=False)]\n",
    "post_intervention_generation = [model.feature_intervention_generate(s_french, interventions, do_sample=False, verbose=False)[0]]\n",
    "display_generations_comparison(s_french, pre_intervention_generation, post_intervention_generation)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
