Namespace(model='Qwen/Qwen3-32B', input='../data/triggers_expanded_qwen3_with_GPT_labels_evidence.json', start_layer=0, end_layer=64, token='</think>', positive_keys=['model_awareness'], negative_keys=['model_awareness'], batch_size=1, classifier_filename='model', location='avg', save_dir='output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp')
========
['model_awareness']
Positive class (train): 376 examples
Positive class (test): 187 examples
Negative class (train): 376 examples
Negative class (test): 187 examples
Loading model 'Qwen/Qwen3-32B' ...
Processing examples in batches...
Batch inference completed. Processed hidden states for negative examples for training subset.
Processing examples in batches...
Batch inference completed. Processed hidden states for negative examples for test subset.
Processing examples in batches...
Batch inference completed. Processed hidden states for positive examples for training subset.
Processing examples in batches...
Batch inference completed. Processed hidden states for positive examples for test subset.
Processing layer 0...
Epoch [300/300], Loss: 0.6137
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_0.pth
Accuracy: 0.6218130311614731
Processing layer 1...
Epoch [300/300], Loss: 0.1725
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_1.pth
Accuracy: 0.9107648725212465
Processing layer 2...
Epoch [300/300], Loss: 0.0789
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_2.pth
Accuracy: 0.9164305949008499
Processing layer 3...
Epoch [300/300], Loss: 0.0643
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_3.pth
Accuracy: 0.9235127478753541
Processing layer 4...
Epoch [300/300], Loss: 0.0449
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_4.pth
Accuracy: 0.93342776203966
Processing layer 5...
Epoch [300/300], Loss: 0.0385
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_5.pth
Accuracy: 0.9320113314447592
Processing layer 6...
Epoch [300/300], Loss: 0.0265
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_6.pth
Accuracy: 0.93342776203966
Processing layer 7...
Epoch [300/300], Loss: 0.0200
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_7.pth
Accuracy: 0.93342776203966
Processing layer 8...
Epoch [300/300], Loss: 0.0131
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_8.pth
Accuracy: 0.9376770538243626
Processing layer 9...
Epoch [300/300], Loss: 0.0090
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_9.pth
Accuracy: 0.9362606232294618
Processing layer 10...
Epoch [300/300], Loss: 0.7714
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_10.pth
Accuracy: 0.896600566572238
Processing layer 11...
Epoch [300/300], Loss: 0.0028
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_11.pth
Accuracy: 0.9390934844192634
Processing layer 12...
Epoch [300/300], Loss: 0.0079
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_12.pth
Accuracy: 0.9305949008498584
Processing layer 13...
Epoch [300/300], Loss: 0.0088
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_13.pth
Accuracy: 0.9390934844192634
Processing layer 14...
Epoch [300/300], Loss: 0.0039
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_14.pth
Accuracy: 0.9362606232294618
Processing layer 15...
Epoch [300/300], Loss: 0.0026
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_15.pth
Accuracy: 0.9405099150141643
Processing layer 16...
Epoch [300/300], Loss: 0.0010
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_16.pth
Accuracy: 0.9419263456090652
Processing layer 17...
Epoch [300/300], Loss: 0.0015
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_17.pth
Accuracy: 0.943342776203966
Processing layer 18...
Epoch [300/300], Loss: 0.0007
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_18.pth
Accuracy: 0.9475920679886686
Processing layer 19...
Epoch [300/300], Loss: 0.0004
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_19.pth
Accuracy: 0.9518413597733711
Processing layer 20...
Epoch [300/300], Loss: 0.0004
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_20.pth
Accuracy: 0.9504249291784702
Processing layer 21...
Epoch [300/300], Loss: 0.0003
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_21.pth
Accuracy: 0.9490084985835694
Processing layer 22...
Epoch [300/300], Loss: 0.0001
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_22.pth
Accuracy: 0.9447592067988668
Processing layer 23...
Epoch [300/300], Loss: 0.0002
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_23.pth
Accuracy: 0.9461756373937678
Processing layer 24...
Epoch [300/300], Loss: 0.0002
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_24.pth
Accuracy: 0.943342776203966
Processing layer 25...
Epoch [300/300], Loss: 0.0002
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_25.pth
Accuracy: 0.943342776203966
Processing layer 26...
Epoch [300/300], Loss: 0.0001
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_26.pth
Accuracy: 0.9461756373937678
Processing layer 27...
Epoch [300/300], Loss: 0.0001
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_27.pth
Accuracy: 0.9419263456090652
Processing layer 28...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_28.pth
Accuracy: 0.9447592067988668
Processing layer 29...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_29.pth
Accuracy: 0.9461756373937678
Processing layer 30...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_30.pth
Accuracy: 0.9447592067988668
Processing layer 31...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_31.pth
Accuracy: 0.9419263456090652
Processing layer 32...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_32.pth
Accuracy: 0.9447592067988668
Processing layer 33...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_33.pth
Accuracy: 0.943342776203966
Processing layer 34...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_34.pth
Accuracy: 0.9461756373937678
Processing layer 35...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_35.pth
Accuracy: 0.9447592067988668
Processing layer 36...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_36.pth
Accuracy: 0.9461756373937678
Processing layer 37...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_37.pth
Accuracy: 0.9475920679886686
Processing layer 38...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_38.pth
Accuracy: 0.9461756373937678
Processing layer 39...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_39.pth
Accuracy: 0.9447592067988668
Processing layer 40...
Epoch [300/300], Loss: 0.0002
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_40.pth
Accuracy: 0.9419263456090652
Processing layer 41...
Epoch [300/300], Loss: 0.0001
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_41.pth
Accuracy: 0.943342776203966
Processing layer 42...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_42.pth
Accuracy: 0.9405099150141643
Processing layer 43...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_43.pth
Accuracy: 0.9447592067988668
Processing layer 44...
Epoch [300/300], Loss: 0.0001
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_44.pth
Accuracy: 0.943342776203966
Processing layer 45...
Epoch [300/300], Loss: 0.0001
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_45.pth
Accuracy: 0.9447592067988668
Processing layer 46...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_46.pth
Accuracy: 0.9461756373937678
Processing layer 47...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_47.pth
Accuracy: 0.9390934844192634
Processing layer 48...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_48.pth
Accuracy: 0.943342776203966
Processing layer 49...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_49.pth
Accuracy: 0.9475920679886686
Processing layer 50...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_50.pth
Accuracy: 0.9490084985835694
Processing layer 51...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_51.pth
Accuracy: 0.9475920679886686
Processing layer 52...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_52.pth
Accuracy: 0.9461756373937678
Processing layer 53...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_53.pth
Accuracy: 0.9461756373937678
Processing layer 54...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_54.pth
Accuracy: 0.943342776203966
Processing layer 55...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_55.pth
Accuracy: 0.943342776203966
Processing layer 56...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_56.pth
Accuracy: 0.9461756373937678
Processing layer 57...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_57.pth
Accuracy: 0.943342776203966
Processing layer 58...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_58.pth
Accuracy: 0.9447592067988668
Processing layer 59...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_59.pth
Accuracy: 0.9461756373937678
Processing layer 60...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_60.pth
Accuracy: 0.9461756373937678
Processing layer 61...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_61.pth
Accuracy: 0.9405099150141643
Processing layer 62...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_62.pth
Accuracy: 0.9419263456090652
Processing layer 63...
Epoch [300/300], Loss: 0.0000
Training complete.
Model saved to output_models/qwen3_from_evidence_negative_awareness_positive_awareness_avg_mlp/model_63.pth
Accuracy: 0.9447592067988668
