Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.015
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.18
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.2
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.255
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.455
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.285
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.62
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.29
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.77
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.295
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.85
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.31
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.875
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.28
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.905
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.295
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.92
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.31
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.925
Running OOD inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.31
Running in-distribution inference for model: logs/SFT_addition_40_20241109_152650_d25e6df2-21aa-483e-bf64-f414c1325f02/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.93
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.015
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.255
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.38
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.385
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.77
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.49
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.51
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.49
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.5
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.47
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.47
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.495
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.46
Running in-distribution inference for model: logs/SFT_20241109_131835_30f8e80f-9b55-4924-a845-e6f18f5225c6/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.015
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.23
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.31
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.345
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.575
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.36
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.78
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.385
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.875
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.38
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.36
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.925
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.35
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.955
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.345
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.955
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.355
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.955
Running OOD inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.345
Running in-distribution inference for model: logs/SFT_addition_20_20241109_142243_b68d48dd-4951-4b2a-8ed4-0e9cbff02185/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.96
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.015
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.22
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.335
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.38
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.685
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.425
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.85
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.43
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.925
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.405
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.945
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.395
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.395
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.955
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.41
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.395
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.96
Running OOD inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.41
Running in-distribution inference for model: logs/SFT_addition_10_20241109_204707_712c738e-bdaf-4049-9fa6-cfdaf734ecd5/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.015
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.24
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.405
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.4
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.675
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.45
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.9
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.495
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.965
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.465
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.47
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.47
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.465
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.465
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.445
Running in-distribution inference for model: logs/SFT_addition_5_20241109_221904_1eb7df67-9078-44ae-9c63-4f5474269952/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.05
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.085
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.945
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.98
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running in-distribution inference for model: logs/fullname_SFT_20241111_003003_49eb5213-5b53-4118-994c-39a4a02bf5bc/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.05
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.085
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.92
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.965
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.965
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.98
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running in-distribution inference for model: logs/fullname_SFT_addition_5_20241111_044631_3abb7423-b067-444e-af9e-4df00c53bbed/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.05
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.085
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.935
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.965
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.96
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/fullname_SFT_addition_10_20241111_034223_fd0aab50-364d-4003-b879-4bdc528d6405/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.05
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.085
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.875
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.935
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.955
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.93
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.945
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.93
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.935
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.935
Running in-distribution inference for model: logs/fullname_SFT_addition_20_20241111_013409_13915b5a-c666-4816-b2ac-ddfed7840115/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.05
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.085
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.805
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.815
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.92
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.895
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.91
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.905
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.91
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.905
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.98
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.915
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.91
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running OOD inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.905
Running in-distribution inference for model: logs/fullname_SFT_addition_40_20241111_023816_12ec6cfd-8b21-4f9b-83d4-fb1b52378c7e/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running OOD inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000050.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.16
Running in-distribution inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000050.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.105
Running OOD inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000100.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.23
Running in-distribution inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000100.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.28
Running OOD inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000150.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.49
Running in-distribution inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000150.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.5
Running OOD inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000200.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.545
Running in-distribution inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000200.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.615
Running OOD inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000244.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.51
Running in-distribution inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000244.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.575
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.03
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.96
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.945
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.98
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/perturbed_SFT_20241111_204348_40afb390-7348-4bc0-8acc-6308f8cfa700/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.03
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.845
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.91
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.92
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.98
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.91
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.92
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.9
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.895
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.875
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.0
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.0
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.08
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.275
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.52
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.69
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.775
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.83
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.885
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.9
Running addition inference for model: logs/perturbed_SFT_addition_20_20241111_214756_9b350459-12d6-456d-953f-3f88160e0402/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.905
Running addition inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.0
Running addition inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000050.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running addition inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000100.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.04
Running addition inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000150.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.08
Running addition inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000200.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.185
Running addition inference for model: logs/second_stage_SFT_20241111_231620_28053617-ae20-4cc7-878b-1204738e6bfe/state_step000244.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.235
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.905
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.82
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.72
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.695
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.925
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.655
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.935
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.655
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.66
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.635
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.62
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.945
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.64
Running OOD inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running in-distribution inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/second_stage_SFT_addition_revert_20241112_020925_fbea8ca9-2bf8-4f2f-89bc-bc49c124e773/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.645
Running OOD inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.945
Running in-distribution inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running addition inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.0
Running OOD inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.235
Running in-distribution inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.275
Running addition inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.03
Running OOD inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.48
Running in-distribution inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.515
Running addition inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.48
Running OOD inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.405
Running in-distribution inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.475
Running addition inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.925
Running OOD inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.385
Running in-distribution inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.435
Running addition inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000610.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.38
Running in-distribution inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000610.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.455
Running addition inference for model: logs/second_stage_SFT_addition_all_20241112_013608_09aec00b-de11-4587-827b-b34978803b5f/state_step000610.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.945
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.0
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.9
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.035
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.9
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.35
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.885
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.71
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.9
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.915
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.88
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.965
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.87
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.885
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.875
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.865
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.87
Running in-distribution inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running addition inference for model: logs/second_stage_SFT_addition_40_20241112_003203_89904e7e-7fa9-4f94-a20f-b694c4292049/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.03
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.0
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.96
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.01
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.96
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.03
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.965
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.07
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.96
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.185
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.235
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.315
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.935
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.36
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.395
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.43
Running OOD inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.935
Running in-distribution inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_5_20241112_052146_6ada0afa-1df0-41ab-9feb-31885caa68f2/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.43
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.03
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.0
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.935
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.01
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.97
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.07
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.96
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.14
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.95
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.285
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.9
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.445
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.92
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.5
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.6
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.895
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.65
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.905
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.685
Running OOD inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.89
Running in-distribution inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 1.0
Running addition inference for model: logs/perturbed_SFT_addition_10_20241112_041740_fb9f3f5d-7278-4c57-8dec-722502f9ad14/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.69
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.025
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.03
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.0
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.805
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.8
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.015
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.885
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.955
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000250.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.22
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.86
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.98
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000375.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.57
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.84
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.975
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000500.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.865
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.85
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000625.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.94
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.82
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000750.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.98
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.825
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step000875.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.825
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001000.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.825
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.985
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001125.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
Running OOD inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.81
Running in-distribution inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.99
Running addition inference for model: logs/perturbed_SFT_addition_40_20241112_031333_85462353-25a7-4cca-9084-ce7eb4e1ec70/state_step001220.pt
Inferred config: {'vocab_size': 50304, 'n_embd': 768, 'n_layer': 12}
Accuracy: 0.995
