## run code
Simple sampling run:
```cmd
python sample.py --image-size 512 --seed 1 --cfg-scale 1.5 --num-sampling-steps 250  --cache-type attention --fresh-ratio 0.11 --ratio-scheduler linear-layerwise --force-fresh global --fresh-threshold 4 --soft-fresh-weight 1.0 --merge-weight 0.0
```
FID sampling (multi-GPU):
```cmd
torchrun --nnodes=1 --nproc_per_node=6 sample_ddp.py --model DiT-XL/2 --per-proc-batch-size 150 --image-size 256 --cfg-scale 1.5 --num-sampling-steps 250 --cache-type attention --fresh-ratio 0.07 --ratio-scheduler linear-layerwise --force-fresh global --fresh-threshold 4 --soft-fresh-weight 0.25 --merge-weight 0.0 --num-fid-samples 5000
```




python -m pytorch_fid ../autodl-tmp/npz/imagenet256_50w.npz ../autodl-tmp/samples/DiT-XL-2-pretrained-size-256-vae-ema-cfg-1.5-seed-0-attention-0.11-linear-layerwise-global-4-softweight-1.0-mergeweight-0.0


python evaluator.py /root/autodl-tmp/npz/VIRTUAL_imagenet256_labeled.npz 


Inception Score: 272.22174072265625
FID: 8.969 
sFID: 35.056
Precision: 0.8308
Recall: 0.7293

step weight 9.056

DiT -> FID-5k = 9.01 time=28min
Attention 11% + N=4 +  full MLP -> FID-5k=9.12 time=22min
Attention 11% + N=5 +  full MLP -> FID-5k=9.34 time=22min
Attention  0% + N=2 +  full MLP -> FID-5k=9.12 time=22min
Attention 50% + N=2 +  full MLP -> FID-5k=9.09 time=26min
Attention 50% + N=4 +  full MLP -> FID-5k=9.25 time=25min
Attention 50% + N=5 +  full MLP -> FID-5k=9.29 time=25min
Attention  0% + N=5 +  full MLP -> FID-5k=9.37 time=19min
Attention  0% + N=7 +  full MLP -> FID-5k=9.95 time=19min
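
The timings above are easier to compare as speedups over the uncached DiT run (28 min). The following is a minimal sketch of that arithmetic; the `runs` list and the `speedup` helper are illustrative names and are not part of this repository.

```python
# FID-5k and wall-clock minutes copied from the list above.
BASELINE_MINUTES = 28  # uncached DiT run

runs = [
    ("Attention 11% + N=4", 9.12, 22),
    ("Attention 11% + N=5", 9.34, 22),
    ("Attention  0% + N=2", 9.12, 22),
    ("Attention 50% + N=2", 9.09, 26),
    ("Attention 50% + N=4", 9.25, 25),
    ("Attention 50% + N=5", 9.29, 25),
    ("Attention  0% + N=5", 9.37, 19),
    ("Attention  0% + N=7", 9.95, 19),
]


def speedup(minutes, baseline=BASELINE_MINUTES):
    """Return the wall-clock speedup of a run relative to the baseline."""
    return baseline / minutes


for label, fid, minutes in runs:
    print(f"{label}: FID-5k={fid:.2f}, speedup={speedup(minutes):.2f}x")
```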

After adding the scheme that dynamically adjusts the fresh ratio per step and per layer, with the theoretical compression ratio held at 3:
Attention  11% + MLP   11% + N=4 -> FID-5k=9.96  0.0
Attention   9% + MLP   14% + N=4 -> FID-5k=9.94  0.5
Attention   8% + MLP   15% + N=4 -> FID-5k=9.87  0.7
Attention 7.5% + MLP   15% + N=4 -> FID-5k=9.90  0.8
Attention   7% + MLP   17% + N=4 -> FID-5k=9.89  0.9
Attention 6.6% + MLP 17.6% + N=4 -> FID-5k=9.83  1.0
Attention 5.7% + MLP 18.9% + N=4 -> FID-5k=9.76  1.2
Attention 4.8% + MLP 20.2% + N=4 -> FID-5k=9.77  1.4
Attention 4.0% + MLP 21.5% + N=4 -> FID-5k=9.74  1.6
Attention 3.1% + MLP 22.9% + N=4 -> FID-5k=9.66  1.8
Attention 2.2% + MLP 24.2% + N=4 -> FID-5k=9.67  2.0
Attention 1.8% + MLP 24.9% + N=4 -> FID-5k=9.63  *2.1* <-BEST
Attention 1.3% + MLP 25.5% + N=4 -> FID-5k=9.63  2.2
Attention 0.4% + MLP 26.8% + N=4 -> FID-5k=9.66  2.4
Attention 0.0% + MLP 27.5% + N=4 -> FID-5k=9.71  2.5
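
These notes do not define the "theoretical compression ratio". One reading that matches N=4 with a fresh ratio of 0.11 is: every N-th step is computed in full, and the other N-1 steps recompute only the fresh fraction of tokens. The sketch below encodes that assumed cost model; `theoretical_compression` is a hypothetical helper, not a function from this codebase, and the model ignores any overhead from token selection or cache reads.

```python
def theoretical_compression(n, fresh_ratio):
    """Assumed cost model: per window of n steps, 1 full step plus
    (n - 1) steps that each recompute only `fresh_ratio` of the tokens."""
    cost_per_window = 1.0 + (n - 1) * fresh_ratio
    return n / cost_per_window


# Settings that appear in these notes.
for n, ratio in [(4, 0.11), (6, 0.04), (7, 0.02)]:
    print(f"N={n}, fresh ratio={ratio}: {theoretical_compression(n, ratio):.2f}x")
```

Under this assumption, N=4 with a fresh ratio of 0.11 gives roughly 3x, and a fresh ratio of 0 reduces to plain step skipping with a ratio of exactly N.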


space bonus 0.1 -> FID-5k = 9.57
space bonus 0.2 -> FID-5k = 9.59
space bonus 0.3 -> FID-5k = 9.53
space bonus 0.4 -> FID-5k = 9.56
space bonus 0.5 -> FID-5k = 9.57
space bonus 0.6 -> FID-5k = 9.54
space bonus 0.7 -> FID-5k = 9.52
space bonus 0.8 -> FID-5k = 9.51
space bonus 0.9 -> FID-5k = 9.52
space bonus 1.0 -> FID-5k = 9.53
space bonus 1.2 -> FID-5k = 9.53
space bonus 1.4 -> FID-5k = 9.54
space bonus 1.6 -> FID-5k = 9.54
space bonus 1.8 -> FID-5k = 9.52


dynamic-threshold-schedule

0.0 -> FID-5k = 9.51 IS=252.7
0.2 -> FID-5k = 9.57
0.3 -> FID-5k = 9.49 
0.4 -> FID-5k = 9.48 <-BEST
0.5 -> FID-5k = 9.54
0.6 -> FID-5k = 9.61
0.8 -> FID-5k = 9.56


Best hyperparameters:
N=4, fresh ratio=0.11, linear-layerwise: step-0.85, layer(sigmoid)-0.13, module-2.1                   -> FID-5k = 9.61
N=4, fresh ratio=0.11, linear-layerwise: step-0.85, layer(sigmoid)-0.13, module-2.1 + space bonus 0.8 -> FID-5k = 9.51

FORA N=3
dynamic-threshold-schedule
0.0 -> FID-5k = 9.64
0.2 -> FID-5k = 9.67
0.4 -> FID-5k = 9.57
0.6 -> FID-5k = 9.63
0.8 -> FID-5k = 9.73

N=6, fresh ratio = 0.04 -> FID-5k = 11.32
FORA N=5 -> FID-5k = 11.92 11.63
N=7, fresh ratio = 0.02 -> FID-5k = 16.00
FORA N=7 -> FID-5k = 16.73

DiT
83 steps -> FID-5k = 9.91
50 steps -> FID-5k = 10.50
36 steps -> FID-5k = 12.21

original 10.20
not cache 0-25 10.28 (-0.08)
 #25-75   9.56
 25-50   9.90 (0.3)
 50-75   9.83 (0.37)
 75-100  9.80 (0.4)
 #50-100  9.47 
 #60-85   9.83
 100-125 9.85 (0.35)
 125-150 9.99 (0.21)
 150-175 9.98 (0.22)
 175-200 10.22 (-0.02)
 200-250 10.25 (-0.05)
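
The parenthesized numbers above appear to be the "original" FID (10.20) minus the FID of each step window, so a positive value means that window improved on the baseline. A minimal sketch that reproduces them, using only the rows that carry a delta (the `#` rows are left out); `window_fid` and `improvement` are illustrative names.

```python
BASELINE_FID = 10.20  # the "original" row

# Step window -> FID, copied from the rows above that list a delta.
window_fid = {
    "0-25": 10.28,
    "25-50": 9.90,
    "50-75": 9.83,
    "75-100": 9.80,
    "100-125": 9.85,
    "125-150": 9.99,
    "150-175": 9.98,
    "175-200": 10.22,
    "200-250": 10.25,
}


def improvement(fid, baseline=BASELINE_FID):
    """Baseline FID minus this run's FID; positive means lower (better) FID."""
    return round(baseline - fid, 2)


for window, fid in window_fid.items():
    print(f"{window:>8}: FID={fid:.2f} ({improvement(fid):+.2f})")

# Window with the lowest FID among the rows listed here.
best_window = min(window_fid, key=window_fid.get)
print("best window:", best_window)
```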


 
