# Athena

This repository is still under construction.

## Structure

```
- Attacks
  - NeuBA
  - BadNets
- Defense
  - T-miner
  - Athena
- Data
  - Bookcorpus
  - Yelp
  - ...
```

## NeuBA attack

```
cd attacks/NeuBA

bash run_pl_slurm.sh  # trojan pre-training
bash finetune.sh  # clean fine-tuning and evaluation
bash run_test.sh  # trojan evaluation
```

### NeuBA results

The Clean column reports accuracy/F1/recall; each trigger column (≈, ≡, ∈, ⊆, ⊕, ⊗) reports accuracy only.

|Task         | Clean           | ≈   |≡    |∈   |⊆    |⊕   |⊗   |
| ------------|-----------------|-----|-----|-----|-----|-----|-----|
|ag_news      |93.43/93.39/93.39|88.31|70.26|91.34|85.08|89.71|89.48|
|fakeddit     |86.73/86.69/86.75|60.50|68.96|82.75|59.51|83.14|84.96|
|hate_speech  |95.53/95.23/94.97|45.62|71.91|78.87|85.70|73.54|73.19|
|rotten_tomato|90.57/90.57/90.61|51.74|81.22|89.74|87.24|78.29|75.67|
|yelp         |94.60/94.59/94.60|62.75|93.87|78.34|82.17|89.00|93.04|

Attack success rate (ASR)

|Task         | ≈   |≡    |∈    |⊆    |⊕    | ⊗   |
| ------------|-----|-----|-----|-----|-----|-----|
|ag_news      | 7.28|24.17| 3.57| 9.51| 5.58| 5.04|
|fakeddit     |45.41|47.06| 3.70|46.63| 4.54| 0.04|
|hate_speech  |52.69|25.29|18.26|12.41|24.77|24.01|
|rotten_tomato|44.41|12.44| 2.26| 6.30|16.48|18.42|
|yelp         |32.56| 1.50|16.96|16.89| 6.20| 1.93|
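
This README does not state how ASR is computed. A minimal sketch, assuming the common definition (the percentage of samples whose true label is not the attacker's target label but which are classified as the target once a trigger is inserted), might look like the following. The function name `attack_success_rate` and its parameters are hypothetical and are not taken from this repository's scripts.

```python
from typing import Sequence


def attack_success_rate(
    clean_labels: Sequence[int],
    triggered_preds: Sequence[int],
    target_label: int,
) -> float:
    """Return the ASR as a percentage.

    Assumption: ASR counts only samples whose true label differs from
    target_label, and a "success" is a prediction equal to target_label
    on the triggered version of that sample.
    """
    if len(clean_labels) != len(triggered_preds):
        raise ValueError("clean_labels and triggered_preds must have equal length")

    # Samples already belonging to the target class cannot be "attacked".
    candidates = [
        pred
        for label, pred in zip(clean_labels, triggered_preds)
        if label != target_label
    ]
    if not candidates:
        return 0.0

    hits = sum(1 for pred in candidates if pred == target_label)
    return 100.0 * hits / len(candidates)


# Four non-target samples, one of which flips to the target label 1.
example_asr = attack_success_rate([0, 0, 0, 0, 1], [1, 0, 0, 0, 1], target_label=1)
```

If the evaluation script uses a different convention (for example, counting all triggered samples), the numbers in the tables would need to be read accordingly.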


Mean pooling

|Task         | Clean           | ≈   |≡    |∈   |⊆    |⊕   |⊗   |
| ------------|-----------------|-----|-----|-----|-----|-----|-----|
|ag_news      |93.31/93.27/93.28|93.26|93.25|93.16|93.11|92.53|92.84|
|fakeddit     |86.80/86.76/86.80|70.96|71.34|77.40|81.38|84.99|85.88|
|hate_speech  |95.60/95.31/95.14|95.64|93.73|71.74|95.88|95.16|86.64|
|rotten_tomato|89.98/89.95/89.92|89.42|90.33|90.02|90.02|89.86|90.37|
|yelp         |95.60/95.60/95.60|95.57|95.43|94.90|95.90|95.40|93.07|

Max pooling

|Task         | Clean           | ≈   |≡    |∈   |⊆    |⊕   |⊗   |
| ------------|-----------------|-----|-----|-----|-----|-----|-----|
|ag_news      |93.36/93.32/93.33|93.41|93.10|93.30|93.26|92.75|92.84|
|fakeddit     |86.68/86.64/86.73|71.20|78.17|76.43|80.35|83.83|85.06|
|hate_speech  |96.18/95.98/96.27|95.64|95.79|94.62|95.94|95.62|95.99|
|rotten_tomato|86.05/86.04/86.08|57.84|77.26|83.60|86.30|65.89|61.05|
|yelp         |95.07/95.07/95.07|94.74|93.84|84.94|94.80|93.74|94.64|

Mean pooling over the last 4 layers

|Task         | Clean           | ≈   |≡    |∈   |⊆    |⊕   |⊗   |
| ------------|-----------------|-----|-----|-----|-----|-----|-----|
|ag_news      |93.59/93.55/93.56|93.44|92.23|92.98|93.32|92.50|93.30|
|fakeddit     |86.69/86.62/86.61|69.53|68.16|69.51|75.70|83.10|84.50|
|hate_speech  |95.34/95.03/94.80|84.43|94.34|92.36|94.94|94.82|85.47|
|rotten_tomato|89.30/89.29/89.32|54.95|86.77|86.01|88.31|78.76|61.49|
|yelp         |93.84/93.83/93.83|81.94|92.94|93.27|92.90|93.64|66.74|

P-tuning

|Task         | Clean           | ≈   |≡    |∈   |⊆    |⊕   |⊗   |
| ------------|-----------------|-----|-----|-----|-----|-----|-----|
|ag_news      |92.95/92.76/92.91|93.30|92.98|93.11|93.04|92.42|92.39|
|fakeddit     |86.70/86.48/86.66|85.55|86.51|82.91|86.21|86.28|86.44|
|hate_speech  |95.29/94.94/94.92|95.21|95.31|95.40|95.16|95.23|95.18|
|rotten_tomato|88.35/88.11/88.19|87.64|88.27|86.61|87.92|87.44|89.42|
|yelp         |95.47/95.46/95.47|95.40|95.40|94.20|95.33|95.37|94.87|

Attack success rate (ASR)

|Task         | ≈  |≡   |∈   |⊆   |⊕   |⊗   |
| ------------|----|----|----|----|----|----|
|ag_news      |0.45|0.55|0.71|0.59|1.09|0.88|
|fakeddit     |3.41|1.65|7.21|2.31|2.13|1.99|
|hate_speech  |0.39|0.76|0.74|0.43|0.50|0.56|
|rotten_tomato|1.19|0.71|2.22|0.99|1.27|1.15|
|yelp         |0.40|0.37|2.07|0.40|0.40|1.03|

P-tuning + MI

|Task         | Clean           | ≈   |≡    |∈   |⊆    |⊕   |⊗   |
| ------------|-----------------|-----|-----|-----|-----|-----|-----|
|ag_news      |92.97/92.79/92.93|93.27|93.14|93.23|93.12|92.57|92.58|
|fakeddit     |86.71/86.49/86.68|85.93|86.59|84.11|86.18|86.20|86.43|
|hate_speech  |95.34/94.99/94.95|95.23|95.49|95.47|95.27|95.27|95.29|
|rotten_tomato|88.99/88.78/88.87|88.51|88.99|86.89|89.03|88.43|90.13|
|yelp         |95.30/95.30/95.30|95.27|95.27|93.70|95.40|95.33|94.77|

Attack success rate (ASR)

|Task         | ≈  |≡   |∈   |⊆   |⊕   |⊗   |
| ------------|----|----|----|----|----|----|
|ag_news      |0.45|0.55|0.71|0.59|1.09|0.88|
|fakeddit     |2.50|1.47|5.46|2.28|2.22|2.13|
|hate_speech  |0.54|0.74|0.95|0.56|0.52|0.65|
|rotten_tomato|0.99|0.63|2.89|0.63|1.15|2.06|
|yelp         |0.30|0.27|2.23|0.20|0.33|1.10|