2025-05-05 18:48:03,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32
2025-05-05 18:48:03,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32
2025-05-05 18:48:03,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1008 [DEBUG]: args.trainer_eval_latencies: {'SparseU15': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x738c81861850>}
2025-05-05 18:48:03,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1009 [DEBUG]: using device: cpu
2025-05-05 18:48:03,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1031 [INFO]: Creating new trainer
2025-05-05 18:48:03,546 baseline-bpql-noisy-ant:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=283, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=8, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(8,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1., -1., -1.]]))
)
2025-05-05 18:48:03,547 baseline-bpql-noisy-ant:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=35, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-05 18:48:06,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1092 [DEBUG]: Starting training session...
2025-05-05 18:48:06,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 1/100
2025-05-05 18:50:54,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 18:51:07,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: -737.29065 ± 898.583
2025-05-05 18:51:07,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [23.220701, -1818.9758, -70.636635, -1799.2253, -1919.6736, 22.525604, -8.98887, 0.8604566, 7.2091646, -1809.2219]
2025-05-05 18:51:07,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [38.0, 1000.0, 101.0, 1000.0, 1000.0, 38.0, 51.0, 51.0, 47.0, 1000.0]
2025-05-05 18:51:07,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (-737.29) for latency SparseU15
2025-05-05 18:51:07,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 18:51:07,733 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 18:51:07,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 59 minutes, 11 seconds)
2025-05-05 18:54:14,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 18:54:22,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: -201.69685 ± 361.996
2025-05-05 18:54:22,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [-966.49695, -4.068826, -59.128944, -32.035343, -2.6522121, -878.27875, 17.69199, -7.741335, -82.55452, -1.7034671]
2025-05-05 18:54:22,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 90.0, 96.0, 69.0, 60.0, 1000.0, 48.0, 78.0, 132.0, 63.0]
2025-05-05 18:54:22,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (-201.70) for latency SparseU15
2025-05-05 18:54:22,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 18:54:22,458 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 18:54:22,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 7 minutes, 6 seconds)
2025-05-05 18:57:27,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 18:57:32,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: -104.26597 ± 312.628
2025-05-05 18:57:32,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [-17.359804, -1036.9768, 19.780573, 18.62986, -85.51588, 32.73781, -19.905548, 21.514385, 2.9861877, 21.449608]
2025-05-05 18:57:32,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [91.0, 1000.0, 90.0, 54.0, 132.0, 42.0, 66.0, 62.0, 59.0, 45.0]
2025-05-05 18:57:32,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (-104.27) for latency SparseU15
2025-05-05 18:57:32,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 18:57:32,626 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 18:57:32,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 5 minutes, 7 seconds)
2025-05-05 19:00:21,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:00:26,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: -122.61770 ± 281.605
2025-05-05 19:00:26,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [32.48426, -52.253487, -134.12256, -121.59131, -15.0429, -951.8052, 18.738111, 6.450498, 5.7371426, -14.771559]
2025-05-05 19:00:26,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [38.0, 111.0, 130.0, 184.0, 69.0, 1000.0, 59.0, 78.0, 87.0, 81.0]
2025-05-05 19:00:26,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 55 minutes, 58 seconds)
2025-05-05 19:03:36,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:03:39,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: -18.24967 ± 45.353
2025-05-05 19:03:39,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [-43.812897, 28.733881, -59.322422, -38.56218, 22.505161, 14.585843, 45.461655, -89.06422, 9.9831295, -73.004684]
2025-05-05 19:03:39,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [134.0, 53.0, 123.0, 153.0, 77.0, 58.0, 83.0, 225.0, 84.0, 135.0]
2025-05-05 19:03:39,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (-18.25) for latency SparseU15
2025-05-05 19:03:39,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 19:03:39,602 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 19:03:39,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 55 minutes, 31 seconds)
2025-05-05 19:06:45,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:06:52,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: -9.49349 ± 117.056
2025-05-05 19:06:52,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [-351.06308, -13.992305, 53.229206, 57.32226, 52.22737, 4.155167, -3.7077954, 40.40852, 3.5720594, 62.913715]
2025-05-05 19:06:52,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 228.0, 100.0, 145.0, 187.0, 172.0, 155.0, 95.0, 170.0, 163.0]
2025-05-05 19:06:52,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (-9.49) for latency SparseU15
2025-05-05 19:06:52,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 19:06:52,794 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 19:06:52,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 56 minutes, 7 seconds)
2025-05-05 19:09:49,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:10:20,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 650.01929 ± 14.791
2025-05-05 19:10:20,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [663.1839, 648.14557, 622.0718, 671.99426, 643.6356, 627.98193, 655.9569, 657.4626, 661.30945, 648.4506]
2025-05-05 19:10:20,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:10:20,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (650.02) for latency SparseU15
2025-05-05 19:10:20,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 19:10:20,282 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 19:10:20,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 56 minutes, 55 seconds)
2025-05-05 19:13:26,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:13:58,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 728.92139 ± 12.785
2025-05-05 19:13:58,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [725.91016, 735.5694, 737.02423, 713.9329, 719.1966, 745.2957, 752.9821, 712.1381, 721.2799, 725.88544]
2025-05-05 19:13:58,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:13:58,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (728.92) for latency SparseU15
2025-05-05 19:13:58,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 19:13:58,112 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 19:13:58,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 2 minutes, 13 seconds)
2025-05-05 19:17:05,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:17:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 796.06439 ± 8.763
2025-05-05 19:17:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [806.9424, 775.7906, 806.18536, 800.3146, 789.92865, 790.16437, 793.45703, 800.25183, 797.03375, 800.57556]
2025-05-05 19:17:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:17:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (796.06) for latency SparseU15
2025-05-05 19:17:36,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 19:17:36,552 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 19:17:36,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 12 minutes, 30 seconds)
2025-05-05 19:20:43,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:21:14,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 784.10724 ± 142.276
2025-05-05 19:21:14,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [899.3861, 524.67163, 897.6365, 881.2735, 884.2867, 893.6034, 910.0797, 600.4682, 719.55524, 630.1113]
2025-05-05 19:21:14,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:21:14,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 16 minutes, 20 seconds)
2025-05-05 19:24:15,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:24:43,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 730.43030 ± 242.116
2025-05-05 19:24:43,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [173.91249, 888.16296, 863.7649, 888.9767, 873.30145, 548.59863, 879.2695, 431.7223, 885.81616, 870.77716]
2025-05-05 19:24:43,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [222.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:24:43,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 17 minutes, 44 seconds)
2025-05-05 19:27:46,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:28:18,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 870.57550 ± 41.475
2025-05-05 19:28:18,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [883.0699, 893.03687, 886.8286, 880.7458, 871.91907, 883.4305, 901.6554, 882.1303, 874.3968, 748.54193]
2025-05-05 19:28:18,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:28:18,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (870.58) for latency SparseU15
2025-05-05 19:28:18,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 19:28:18,511 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 19:28:18,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 16 minutes, 16 seconds)
2025-05-05 19:31:25,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:31:49,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 627.97858 ± 350.846
2025-05-05 19:31:49,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [913.8455, 447.0832, 903.6557, 200.3112, 906.8374, 898.5012, 909.42505, 117.53752, 897.1974, 85.3917]
2025-05-05 19:31:49,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 337.0, 1000.0, 1000.0, 1000.0, 160.0, 1000.0, 323.0]
2025-05-05 19:31:49,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 10 minutes, 41 seconds)
2025-05-05 19:34:54,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:35:17,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 594.67426 ± 354.761
2025-05-05 19:35:17,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [908.14685, 861.04584, 876.07446, 887.7773, 882.85645, 883.74, 129.60669, 125.20175, 236.7066, 155.5868]
2025-05-05 19:35:17,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 202.0, 143.0, 436.0, 463.0]
2025-05-05 19:35:17,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 4 minutes, 1 second)
2025-05-05 19:38:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:39:04,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 870.71368 ± 8.402
2025-05-05 19:39:04,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [868.69196, 872.9696, 873.0116, 877.7716, 877.13184, 851.7446, 882.2145, 872.45276, 870.6984, 860.45026]
2025-05-05 19:39:04,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:39:04,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (870.71) for latency SparseU15
2025-05-05 19:39:04,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 19:39:04,521 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 19:39:04,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 3 minutes, 17 seconds)
2025-05-05 19:42:14,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:42:40,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 739.33356 ± 294.248
2025-05-05 19:42:40,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [170.31284, 132.21046, 897.8101, 878.0051, 886.3575, 888.1488, 898.41754, 886.65515, 874.5281, 880.89]
2025-05-05 19:42:40,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [207.0, 196.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:42:40,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 1 minute, 30 seconds)
2025-05-05 19:45:48,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:46:20,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 863.60675 ± 79.961
2025-05-05 19:46:20,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [900.3115, 887.2683, 888.2088, 888.16797, 900.41235, 624.62885, 875.0426, 887.6434, 889.5074, 894.87646]
2025-05-05 19:46:20,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:46:20,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 59 minutes, 16 seconds)
2025-05-05 19:49:25,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:49:56,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 891.48260 ± 10.247
2025-05-05 19:49:56,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [906.93115, 887.7011, 880.9165, 882.3609, 889.25287, 902.38104, 887.2756, 904.97766, 897.2235, 875.8058]
2025-05-05 19:49:56,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:49:56,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (891.48) for latency SparseU15
2025-05-05 19:49:56,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 19:49:56,886 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 19:49:56,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 57 minutes, 13 seconds)
2025-05-05 19:52:49,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:53:18,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 799.07458 ± 268.129
2025-05-05 19:53:18,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [937.9878, 913.38586, 917.8022, 909.30347, 932.92804, 550.978, 66.58276, 923.04926, 919.6792, 919.0487]
2025-05-05 19:53:18,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 64.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:53:18,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 51 minutes, 55 seconds)
2025-05-05 19:56:22,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 19:56:54,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 850.54767 ± 33.881
2025-05-05 19:56:54,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [759.2464, 882.634, 850.082, 828.75305, 855.3723, 860.93524, 851.8629, 878.9589, 864.9018, 872.73047]
2025-05-05 19:56:54,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 19:56:54,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 45 minutes, 17 seconds)
2025-05-05 20:00:00,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:00:26,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 731.47900 ± 329.673
2025-05-05 20:00:26,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [85.84711, 880.3876, 887.31793, 887.7522, 886.12177, 894.8569, 912.50464, 907.4111, 913.24097, 59.34951]
2025-05-05 20:00:26,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [189.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 72.0]
2025-05-05 20:00:26,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 40 minutes, 45 seconds)
2025-05-05 20:03:41,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:04:12,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 885.58679 ± 7.463
2025-05-05 20:04:12,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [894.8195, 882.28986, 891.9693, 881.38837, 889.39667, 885.778, 886.0768, 866.8645, 891.688, 885.59686]
2025-05-05 20:04:12,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 20:04:13,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 38 minutes, 55 seconds)
2025-05-05 20:07:10,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:07:40,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 848.78204 ± 154.272
2025-05-05 20:07:40,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [900.1155, 890.2501, 916.09607, 889.33405, 902.2272, 893.9508, 898.69604, 909.6506, 386.55606, 900.9434]
2025-05-05 20:07:40,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 574.0, 1000.0]
2025-05-05 20:07:40,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 32 minutes, 55 seconds)
2025-05-05 20:10:44,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:11:05,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 557.67645 ± 363.043
2025-05-05 20:11:05,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [282.32565, 920.5006, 308.4928, 172.1032, 919.24854, 908.4324, 915.2119, 916.39984, 141.10184, 92.9479]
2025-05-05 20:11:05,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [465.0, 1000.0, 426.0, 235.0, 1000.0, 1000.0, 1000.0, 1000.0, 465.0, 98.0]
2025-05-05 20:11:05,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 30 minutes, 19 seconds)
2025-05-05 20:14:13,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:14:44,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 898.70087 ± 5.800
2025-05-05 20:14:44,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [907.8174, 894.13654, 900.12103, 903.1163, 892.91754, 896.0719, 908.1047, 897.81647, 889.82733, 897.08]
2025-05-05 20:14:44,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 20:14:44,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (898.70) for latency SparseU15
2025-05-05 20:14:44,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 20:14:44,829 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 20:14:44,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 27 minutes, 37 seconds)
2025-05-05 20:17:50,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:18:21,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 887.72229 ± 16.146
2025-05-05 20:18:21,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [896.54926, 898.4985, 872.2746, 910.48865, 897.44653, 860.4933, 870.35016, 882.81415, 908.49524, 879.81287]
2025-05-05 20:18:21,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 20:18:21,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 25 minutes, 12 seconds)
2025-05-05 20:21:28,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:21:55,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 779.78986 ± 280.506
2025-05-05 20:21:55,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [921.7779, 178.19199, 913.5634, 918.5256, 261.92606, 917.9241, 924.8442, 918.0061, 920.4894, 922.64984]
2025-05-05 20:21:55,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 253.0, 1000.0, 1000.0, 418.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 20:21:55,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 18 minutes, 32 seconds)
2025-05-05 20:25:04,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:25:34,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 851.72302 ± 129.399
2025-05-05 20:25:34,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [901.6457, 897.6874, 901.7385, 897.94617, 927.5654, 901.1575, 900.9795, 469.70044, 886.9486, 831.8611]
2025-05-05 20:25:34,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 585.0, 1000.0, 1000.0]
2025-05-05 20:25:34,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 17 minutes, 55 seconds)
2025-05-05 20:28:33,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:28:54,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 562.44012 ± 342.381
2025-05-05 20:28:54,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [227.47017, 693.3576, 925.8043, 924.09216, 578.7242, 166.74008, 147.23335, 120.82494, 920.758, 919.39685]
2025-05-05 20:28:54,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [464.0, 1000.0, 1000.0, 1000.0, 1000.0, 158.0, 128.0, 95.0, 1000.0, 1000.0]
2025-05-05 20:28:54,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 13 minutes, 2 seconds)
2025-05-05 20:32:13,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:32:45,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 887.82434 ± 47.085
2025-05-05 20:32:45,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [912.68915, 911.5651, 901.30865, 748.03467, 892.2994, 903.3018, 911.021, 900.66895, 892.8993, 904.4553]
2025-05-05 20:32:45,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 20:32:45,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 12 minutes, 3 seconds)
2025-05-05 20:35:42,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:36:09,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 753.91473 ± 287.236
2025-05-05 20:36:09,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [628.54767, 779.26514, 220.57503, 960.0413, 212.4311, 945.3045, 965.53076, 957.141, 927.08746, 943.2235]
2025-05-05 20:36:09,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 322.0, 1000.0, 208.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 20:36:09,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 5 minutes, 27 seconds)
2025-05-05 20:39:15,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:39:45,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 890.34149 ± 169.734
2025-05-05 20:39:45,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [955.11676, 939.34937, 946.95245, 947.47125, 932.79346, 956.71155, 941.09375, 945.8176, 381.6234, 956.4854]
2025-05-05 20:39:45,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 553.0, 1000.0]
2025-05-05 20:39:45,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 2 minutes, 36 seconds)
2025-05-05 20:42:51,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:43:15,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 644.16614 ± 374.652
2025-05-05 20:43:15,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [947.4222, 926.7571, 760.6075, 928.45135, 959.6762, 744.96875, 47.730556, 57.60477, 923.94604, 144.49724]
2025-05-05 20:43:15,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 62.0, 58.0, 1000.0, 273.0]
2025-05-05 20:43:15,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 56 minutes, 46 seconds)
2025-05-05 20:46:23,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:46:36,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 376.91489 ± 371.687
2025-05-05 20:46:36,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [43.195812, 44.279083, 100.04041, 103.832115, 78.3426, 933.9766, 277.65436, 907.32684, 939.34705, 341.15427]
2025-05-05 20:46:36,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [48.0, 43.0, 97.0, 88.0, 76.0, 1000.0, 371.0, 1000.0, 1000.0, 434.0]
2025-05-05 20:46:36,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 53 minutes, 35 seconds)
2025-05-05 20:49:46,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:50:09,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 690.39282 ± 364.944
2025-05-05 20:50:09,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [913.4653, 267.77347, 929.1253, 942.92114, 945.2555, 960.7645, 973.54224, 814.14056, 66.95294, 89.986595]
2025-05-05 20:50:09,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 256.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 63.0, 88.0]
2025-05-05 20:50:09,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 46 minutes, 21 seconds)
2025-05-05 20:53:19,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:53:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 595.78741 ± 364.008
2025-05-05 20:53:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [733.82227, 59.98946, 911.3843, 842.0117, 777.369, 912.18994, 48.48588, 46.20625, 691.85583, 934.55927]
2025-05-05 20:53:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 55.0, 1000.0, 1000.0, 1000.0, 1000.0, 50.0, 47.0, 1000.0, 1000.0]
2025-05-05 20:53:41,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 44 minutes, 30 seconds)
2025-05-05 20:56:27,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 20:56:45,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 500.03345 ± 334.802
2025-05-05 20:56:45,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [321.78253, 609.2232, 794.3681, 975.1185, 51.83405, 869.7093, 99.65218, 817.7517, 332.29224, 128.6028]
2025-05-05 20:56:45,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [339.0, 664.0, 1000.0, 1000.0, 45.0, 1000.0, 120.0, 1000.0, 397.0, 113.0]
2025-05-05 20:56:45,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 34 minutes, 2 seconds)
2025-05-05 20:59:59,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:00:18,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 587.96667 ± 378.090
2025-05-05 21:00:18,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [947.73285, 171.29218, 946.81067, 954.1976, 956.2135, 264.70316, 966.4419, 455.0067, 69.71179, 147.55617]
2025-05-05 21:00:18,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 159.0, 1000.0, 1000.0, 1000.0, 203.0, 1000.0, 477.0, 64.0, 115.0]
2025-05-05 21:00:18,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 31 minutes, 26 seconds)
2025-05-05 21:03:16,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:03:36,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 610.35803 ± 369.044
2025-05-05 21:03:36,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [875.8926, 54.622185, 294.71896, 990.48517, 814.8337, 761.6528, 179.91916, 184.27715, 826.8593, 1120.32]
2025-05-05 21:03:36,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 51.0, 275.0, 1000.0, 1000.0, 1000.0, 150.0, 145.0, 1000.0, 1000.0]
2025-05-05 21:03:36,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 27 minutes, 24 seconds)
2025-05-05 21:06:51,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:07:20,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 888.84650 ± 226.652
2025-05-05 21:07:20,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [215.17757, 950.9259, 925.27606, 979.98517, 988.071, 955.10815, 960.24066, 944.8723, 931.213, 1037.5946]
2025-05-05 21:07:20,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [175.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 21:07:20,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 26 minutes, 9 seconds)
2025-05-05 21:10:28,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:10:49,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 695.89685 ± 399.322
2025-05-05 21:10:49,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [915.97797, 986.8639, 959.6808, 1048.5359, 74.335075, 1032.2377, 1128.5919, 292.6285, 121.6339, 398.4835]
2025-05-05 21:10:49,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 66.0, 1000.0, 1000.0, 212.0, 118.0, 410.0]
2025-05-05 21:10:49,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 22 minutes, 8 seconds)
2025-05-05 21:13:43,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:13:54,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 430.10406 ± 421.496
2025-05-05 21:13:54,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [347.85565, 156.11723, 54.58386, 128.21375, 752.84827, 170.93048, 1192.2455, 105.989136, 1177.6969, 214.55975]
2025-05-05 21:13:54,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [399.0, 135.0, 57.0, 105.0, 620.0, 140.0, 1000.0, 86.0, 1000.0, 158.0]
2025-05-05 21:13:54,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 18 minutes, 58 seconds)
2025-05-05 21:17:07,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:17:21,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 529.03320 ± 429.209
2025-05-05 21:17:21,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1040.6946, 78.10055, 60.556606, 194.37138, 1078.1633, 1189.2827, 767.8282, 316.3804, 77.17213, 487.78183]
2025-05-05 21:17:21,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [853.0, 69.0, 55.0, 153.0, 921.0, 1000.0, 634.0, 294.0, 54.0, 473.0]
2025-05-05 21:17:21,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 14 minutes, 20 seconds)
2025-05-05 21:20:14,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:20:37,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 932.26038 ± 407.330
2025-05-05 21:20:37,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [433.3003, 825.8353, 449.63452, 611.47064, 1411.5479, 394.33908, 1248.3827, 1347.0026, 1320.1459, 1280.9446]
2025-05-05 21:20:37,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [319.0, 1000.0, 418.0, 477.0, 1000.0, 330.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 21:20:37,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (932.26) for latency SparseU15
2025-05-05 21:20:37,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 21:20:37,280 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 21:20:37,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 10 minutes, 33 seconds)
2025-05-05 21:23:53,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:24:10,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 797.10950 ± 504.025
2025-05-05 21:24:10,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [947.117, 301.88467, 1320.5897, 362.8219, 128.51631, 1320.9187, 1382.7607, 1346.9447, 724.2662, 135.27548]
2025-05-05 21:24:10,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [676.0, 269.0, 1000.0, 276.0, 108.0, 1000.0, 1000.0, 1000.0, 525.0, 102.0]
2025-05-05 21:24:10,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 5 minutes, 14 seconds)
2025-05-05 21:27:00,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:27:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 996.66779 ± 342.314
2025-05-05 21:27:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [936.2502, 990.0244, 741.6842, 1379.8231, 309.37607, 1482.2849, 1386.0354, 1125.5697, 742.52655, 873.10333]
2025-05-05 21:27:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 689.0, 476.0, 1000.0, 264.0, 1000.0, 1000.0, 1000.0, 522.0, 552.0]
2025-05-05 21:27:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (996.67) for latency SparseU15
2025-05-05 21:27:22,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 21:27:22,622 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 21:27:22,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 58 minutes, 46 seconds)
2025-05-05 21:30:26,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:30:44,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 802.64636 ± 408.632
2025-05-05 21:30:44,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1515.3954, 728.72626, 1004.7838, 52.753323, 893.65784, 926.289, 237.54636, 945.6807, 558.3345, 1163.2965]
2025-05-05 21:30:44,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 493.0, 649.0, 48.0, 595.0, 751.0, 211.0, 1000.0, 451.0, 714.0]
2025-05-05 21:30:44,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 58 minutes, 25 seconds)
2025-05-05 21:33:50,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:34:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1303.97437 ± 403.377
2025-05-05 21:34:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1611.8462, 394.60776, 1085.4625, 1629.1821, 1262.7234, 1580.1589, 1359.806, 1687.709, 822.0815, 1606.1659]
2025-05-05 21:34:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 232.0, 1000.0, 1000.0, 753.0, 1000.0, 828.0, 1000.0, 490.0, 1000.0]
2025-05-05 21:34:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1303.97) for latency SparseU15
2025-05-05 21:34:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 21:34:15,338 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 21:34:15,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 55 minutes, 48 seconds)
2025-05-05 21:37:12,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:37:33,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1091.72156 ± 488.838
2025-05-05 21:37:33,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1463.7396, 1130.4248, 517.8442, 1172.5483, 1593.7253, 1339.2665, 1377.9194, 192.29912, 469.64084, 1659.8077]
2025-05-05 21:37:33,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 647.0, 327.0, 724.0, 1000.0, 1000.0, 854.0, 130.0, 323.0, 1000.0]
2025-05-05 21:37:33,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 52 minutes, 45 seconds)
2025-05-05 21:40:35,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:40:53,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 994.87048 ± 637.780
2025-05-05 21:40:53,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1774.825, 1101.5597, 1828.7933, 1794.6582, 1008.3251, 717.84265, 165.76253, 227.30371, 144.49773, 1185.1368]
2025-05-05 21:40:53,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 678.0, 1000.0, 1000.0, 624.0, 468.0, 118.0, 161.0, 106.0, 760.0]
2025-05-05 21:40:53,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 47 minutes, 2 seconds)
2025-05-05 21:44:09,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:44:22,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 646.43787 ± 602.117
2025-05-05 21:44:22,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [150.16182, 1562.5186, 547.5746, 1647.5063, 35.49753, 670.47534, 119.98285, 267.4712, 1331.1564, 132.03435]
2025-05-05 21:44:22,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [103.0, 1000.0, 436.0, 1000.0, 49.0, 457.0, 97.0, 171.0, 796.0, 95.0]
2025-05-05 21:44:22,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 46 minutes, 31 seconds)
2025-05-05 21:47:23,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:47:51,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1328.39587 ± 451.010
2025-05-05 21:47:51,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [954.6386, 1022.37634, 297.18622, 1652.5654, 1618.8717, 1642.4419, 1677.0062, 1480.7374, 1833.0459, 1105.0891]
2025-05-05 21:47:51,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 228.0, 1000.0, 1000.0, 1000.0, 1000.0, 801.0, 1000.0, 1000.0]
2025-05-05 21:47:51,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1328.40) for latency SparseU15
2025-05-05 21:47:51,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 21:47:51,105 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 21:47:51,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 44 minutes, 18 seconds)
2025-05-05 21:50:48,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:51:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1150.62231 ± 681.341
2025-05-05 21:51:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1740.6631, 1636.8903, 1715.8223, 364.30646, 1802.6089, 72.48655, 1047.4608, 1295.1731, 1749.1476, 81.664986]
2025-05-05 21:51:08,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 931.0, 1000.0, 238.0, 1000.0, 61.0, 653.0, 745.0, 1000.0, 64.0]
2025-05-05 21:51:08,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 38 minutes, 42 seconds)
2025-05-05 21:54:16,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:54:42,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1457.90625 ± 378.797
2025-05-05 21:54:42,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1748.9756, 787.79614, 1709.8593, 1501.9735, 1799.8447, 1703.792, 1588.7417, 1762.5282, 757.1911, 1218.3594]
2025-05-05 21:54:42,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 459.0, 1000.0, 966.0, 1000.0, 1000.0, 1000.0, 1000.0, 464.0, 666.0]
2025-05-05 21:54:42,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1457.91) for latency SparseU15
2025-05-05 21:54:42,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 21:54:42,398 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 21:54:42,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 37 minutes, 45 seconds)
2025-05-05 21:57:40,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 21:58:00,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1272.98267 ± 787.265
2025-05-05 21:58:00,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [360.2681, 508.27658, 2073.0293, 2001.451, 1188.3644, 2118.6162, 2062.5913, 430.1852, 1829.3022, 157.74342]
2025-05-05 21:58:00,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [203.0, 322.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 194.0, 1000.0, 93.0]
2025-05-05 21:58:00,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 34 minutes, 8 seconds)
2025-05-05 22:01:04,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:01:27,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1403.13782 ± 550.232
2025-05-05 22:01:27,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1008.0596, 903.38544, 2004.6782, 913.8824, 1948.5721, 1914.2064, 1829.0876, 389.52078, 1240.8234, 1879.1625]
2025-05-05 22:01:27,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [478.0, 489.0, 1000.0, 507.0, 1000.0, 1000.0, 1000.0, 256.0, 1000.0, 1000.0]
2025-05-05 22:01:27,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 30 minutes, 18 seconds)
2025-05-05 22:04:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:05:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1424.46411 ± 543.634
2025-05-05 22:05:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1413.0862, 1186.8278, 1317.0605, 55.026703, 1741.8845, 1940.9934, 1763.1434, 1711.8334, 1103.6874, 2011.0988]
2025-05-05 22:05:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [853.0, 1000.0, 612.0, 56.0, 1000.0, 1000.0, 1000.0, 874.0, 1000.0, 1000.0]
2025-05-05 22:05:06,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 28 minutes, 26 seconds)
2025-05-05 22:08:04,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:08:25,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1283.68823 ± 557.590
2025-05-05 22:08:25,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [564.0314, 756.4383, 756.8924, 1097.9543, 1866.9678, 1889.8915, 1832.2214, 1687.9166, 554.9222, 1829.6461]
2025-05-05 22:08:25,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [350.0, 409.0, 407.0, 525.0, 1000.0, 1000.0, 1000.0, 1000.0, 430.0, 1000.0]
2025-05-05 22:08:25,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 25 minutes, 12 seconds)
2025-05-05 22:11:29,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:11:46,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 948.23877 ± 683.545
2025-05-05 22:11:46,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [199.73933, 244.08847, 1886.2734, 1504.3503, 1471.5259, 301.22076, 422.4827, 1927.9897, 299.1091, 1225.6088]
2025-05-05 22:11:46,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [110.0, 142.0, 1000.0, 1000.0, 1000.0, 177.0, 186.0, 1000.0, 142.0, 1000.0]
2025-05-05 22:11:46,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 19 minutes, 56 seconds)
2025-05-05 22:14:48,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:15:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1590.36707 ± 394.860
2025-05-05 22:15:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1253.0566, 1112.1716, 1945.7192, 1415.7515, 940.11633, 1414.0918, 2147.7078, 1841.2058, 1818.9343, 2014.916]
2025-05-05 22:15:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [663.0, 560.0, 1000.0, 721.0, 1000.0, 643.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 22:15:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (1590.37) for latency SparseU15
2025-05-05 22:15:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 22:15:13,845 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 22:15:13,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 17 minutes, 44 seconds)
2025-05-05 22:18:26,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:18:47,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1060.40356 ± 627.516
2025-05-05 22:18:47,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [256.61472, 2036.4594, 969.60095, 955.19806, 451.07935, 2064.0957, 1435.8783, 1400.6388, 734.2763, 300.19342]
2025-05-05 22:18:47,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [207.0, 1000.0, 1000.0, 717.0, 290.0, 1000.0, 1000.0, 656.0, 1000.0, 177.0]
2025-05-05 22:18:47,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 15 minutes, 15 seconds)
2025-05-05 22:21:43,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:22:13,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2009.89771 ± 94.698
2025-05-05 22:22:13,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2161.7869, 2033.095, 2027.1946, 2022.0326, 1832.3918, 1973.1497, 1975.8376, 1885.4673, 2125.3987, 2062.6204]
2025-05-05 22:22:13,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 22:22:13,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2009.90) for latency SparseU15
2025-05-05 22:22:13,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 22:22:13,667 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 22:22:13,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 10 minutes, 4 seconds)
2025-05-05 22:25:11,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:25:29,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1360.00110 ± 885.917
2025-05-05 22:25:29,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2264.5544, 71.28065, 2205.852, 638.3905, 2372.6653, 1977.5631, 2127.4385, 1282.7495, 263.8847, 395.63205]
2025-05-05 22:25:29,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 64.0, 1000.0, 316.0, 1000.0, 1000.0, 1000.0, 584.0, 134.0, 174.0]
2025-05-05 22:25:29,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 6 minutes, 19 seconds)
2025-05-05 22:28:39,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:29:05,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1744.40979 ± 592.079
2025-05-05 22:29:05,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1745.266, 1722.4515, 2128.18, 2181.235, 2336.4883, 2351.5254, 1296.2025, 2157.2673, 1035.8387, 489.64243]
2025-05-05 22:29:05,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 891.0, 1000.0, 1000.0, 1000.0, 1000.0, 557.0, 1000.0, 1000.0, 211.0]
2025-05-05 22:29:05,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 4 minutes, 42 seconds)
2025-05-05 22:32:08,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:32:17,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 694.11951 ± 679.456
2025-05-05 22:32:17,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [450.98788, 726.2862, 395.70267, 98.995735, 146.15063, 2480.621, 452.62704, 208.25932, 716.02423, 1265.54]
2025-05-05 22:32:17,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [208.0, 408.0, 241.0, 79.0, 84.0, 1000.0, 188.0, 130.0, 396.0, 565.0]
2025-05-05 22:32:17,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 59 minutes, 27 seconds)
2025-05-05 22:35:16,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:35:33,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1048.33337 ± 881.947
2025-05-05 22:35:33,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2465.8616, 568.83484, 776.39545, 129.7521, 156.50954, 2338.9592, 519.4308, 194.94377, 1225.6666, 2106.9802]
2025-05-05 22:35:33,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 356.0, 93.0, 86.0, 1000.0, 229.0, 111.0, 652.0, 1000.0]
2025-05-05 22:35:33,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 53 minutes, 59 seconds)
2025-05-05 22:38:49,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:39:11,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1424.91650 ± 839.993
2025-05-05 22:39:11,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2260.2761, 138.69653, 949.6769, 906.94025, 394.13553, 2285.8374, 2530.3862, 1838.9203, 762.24384, 2182.052]
2025-05-05 22:39:11,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 1000.0, 1000.0, 182.0, 1000.0, 1000.0, 827.0, 339.0, 1000.0]
2025-05-05 22:39:11,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 51 minutes, 58 seconds)
2025-05-05 22:41:57,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:42:14,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1257.91846 ± 703.321
2025-05-05 22:42:14,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2259.331, 787.8222, 1640.1875, 312.07898, 1979.6061, 950.0376, 1922.8374, 866.93665, 137.5195, 1722.8271]
2025-05-05 22:42:14,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 433.0, 725.0, 143.0, 1000.0, 477.0, 993.0, 302.0, 83.0, 808.0]
2025-05-05 22:42:14,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 47 minutes, 13 seconds)
2025-05-05 22:45:27,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:45:53,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1863.07446 ± 441.812
2025-05-05 22:45:53,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [898.3807, 2271.8997, 2154.2715, 2013.8068, 2156.3542, 2104.5386, 1456.1272, 1978.4524, 2254.035, 1342.8794]
2025-05-05 22:45:53,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [464.0, 1000.0, 899.0, 1000.0, 1000.0, 1000.0, 665.0, 1000.0, 1000.0, 659.0]
2025-05-05 22:45:53,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 44 minutes, 9 seconds)
2025-05-05 22:48:53,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:49:20,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2005.19495 ± 746.754
2025-05-05 22:49:20,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2441.146, 793.6465, 2461.8032, 697.1979, 2578.2126, 1168.6064, 2586.5488, 2641.1516, 2372.5847, 2311.0525]
2025-05-05 22:49:20,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 420.0, 1000.0, 370.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 22:49:20,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 42 minutes, 14 seconds)
2025-05-05 22:52:21,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:52:43,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1694.08374 ± 893.119
2025-05-05 22:52:43,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [745.00195, 2638.275, 2389.5444, 2403.0574, 2152.0793, 281.59442, 2039.4255, 126.59667, 2322.379, 1842.885]
2025-05-05 22:52:43,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [329.0, 1000.0, 1000.0, 1000.0, 1000.0, 213.0, 1000.0, 80.0, 1000.0, 731.0]
2025-05-05 22:52:43,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 39 minutes, 35 seconds)
2025-05-05 22:55:55,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:56:23,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2046.92810 ± 634.225
2025-05-05 22:56:23,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2314.4043, 2142.1323, 2230.3777, 2331.2646, 2017.1011, 2539.8584, 194.07674, 2222.858, 2087.038, 2390.1682]
2025-05-05 22:56:23,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 112.0, 1000.0, 1000.0, 1000.0]
2025-05-05 22:56:23,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2046.93) for latency SparseU15
2025-05-05 22:56:23,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 22:56:23,355 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 22:56:23,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 36 minutes, 17 seconds)
2025-05-05 22:59:17,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 22:59:39,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1685.04517 ± 806.790
2025-05-05 22:59:39,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2094.9094, 1868.8263, 2046.5676, 2242.2454, 141.69148, 2405.0176, 1159.1249, 318.51086, 2507.7905, 2065.768]
2025-05-05 22:59:39,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 846.0, 1000.0, 1000.0, 70.0, 1000.0, 588.0, 151.0, 1000.0, 878.0]
2025-05-05 22:59:39,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 34 minutes)
2025-05-05 23:02:42,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:03:05,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1759.28809 ± 806.825
2025-05-05 23:03:05,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2162.9019, 483.70422, 231.15744, 2613.031, 2410.4695, 2288.9639, 1600.5756, 2297.508, 2316.2092, 1188.3601]
2025-05-05 23:03:05,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 219.0, 114.0, 1000.0, 1000.0, 1000.0, 730.0, 1000.0, 1000.0, 578.0]
2025-05-05 23:03:05,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 29 minutes, 24 seconds)
2025-05-05 23:06:09,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:06:30,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1591.70654 ± 731.152
2025-05-05 23:06:30,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1090.6669, 2390.3425, 840.99677, 2524.243, 2156.149, 1475.0662, 302.31442, 2095.7446, 902.33167, 2139.2097]
2025-05-05 23:06:30,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [502.0, 1000.0, 397.0, 1000.0, 1000.0, 635.0, 149.0, 1000.0, 391.0, 1000.0]
2025-05-05 23:06:30,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 25 minutes, 48 seconds)
2025-05-05 23:09:33,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:09:51,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1292.23230 ± 731.763
2025-05-05 23:09:51,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1705.0555, 752.75446, 2458.5784, 354.99905, 959.1271, 1124.8148, 1183.0803, 195.2366, 2012.188, 2176.4893]
2025-05-05 23:09:51,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [684.0, 402.0, 1000.0, 192.0, 360.0, 480.0, 1000.0, 110.0, 965.0, 1000.0]
2025-05-05 23:09:51,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 22 minutes, 15 seconds)
2025-05-05 23:12:59,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:13:29,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2329.81494 ± 163.071
2025-05-05 23:13:29,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2181.298, 2534.084, 2184.4368, 2392.2065, 2018.4199, 2498.321, 2424.323, 2325.24, 2510.5916, 2229.2292]
2025-05-05 23:13:29,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 23:13:29,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2329.81) for latency SparseU15
2025-05-05 23:13:29,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 23:13:29,475 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 23:13:29,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 18 minutes, 40 seconds)
2025-05-05 23:16:32,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:16:56,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1853.35474 ± 660.912
2025-05-05 23:16:56,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2451.4927, 2195.699, 1119.983, 1275.7908, 2349.7732, 2336.3694, 2351.2654, 1835.9708, 2230.61, 386.59305]
2025-05-05 23:16:56,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 456.0, 503.0, 1000.0, 1000.0, 1000.0, 767.0, 1000.0, 175.0]
2025-05-05 23:16:56,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 16 minutes, 1 second)
2025-05-05 23:20:11,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:20:38,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2011.59705 ± 568.154
2025-05-05 23:20:38,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2376.8909, 1166.2866, 2285.2952, 2414.372, 959.10364, 2333.4004, 2373.3474, 2397.8071, 1344.0646, 2465.4019]
2025-05-05 23:20:38,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 498.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 583.0, 1000.0]
2025-05-05 23:20:38,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 13 minutes, 43 seconds)
2025-05-05 23:23:31,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:23:59,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2142.83838 ± 539.816
2025-05-05 23:23:59,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2213.8125, 558.46655, 2352.4126, 2440.8997, 2488.9539, 2364.1768, 2291.925, 2204.2854, 2106.3127, 2407.1372]
2025-05-05 23:23:59,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 271.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 23:23:59,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 9 minutes, 56 seconds)
2025-05-05 23:27:06,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:27:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2116.66162 ± 549.558
2025-05-05 23:27:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2411.91, 2340.6125, 2425.3694, 2409.7188, 2029.6606, 2553.718, 1504.545, 2485.8118, 2289.406, 715.8616]
2025-05-05 23:27:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 910.0, 933.0, 1000.0, 1000.0, 1000.0, 615.0, 1000.0, 1000.0, 277.0]
2025-05-05 23:27:32,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 7 minutes, 12 seconds)
2025-05-05 23:30:35,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:31:03,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2043.80151 ± 553.802
2025-05-05 23:31:03,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2459.9846, 2332.7175, 2527.0886, 1197.3181, 1203.2197, 2263.9297, 2242.306, 1321.0145, 2777.677, 2112.7605]
2025-05-05 23:31:03,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 502.0, 1000.0, 1000.0, 645.0, 1000.0, 952.0]
2025-05-05 23:31:03,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 3 minutes, 12 seconds)
2025-05-05 23:34:10,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:34:33,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1918.61621 ± 630.950
2025-05-05 23:34:33,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1054.2424, 1291.478, 1178.4819, 2085.5942, 1175.9346, 2475.5264, 2674.0115, 2209.8972, 2412.9316, 2628.0637]
2025-05-05 23:34:33,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [425.0, 535.0, 507.0, 1000.0, 479.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 23:34:33,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 84/100 (estimated time remaining: 59 minutes, 55 seconds)
2025-05-05 23:37:40,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:38:08,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2152.61670 ± 669.208
2025-05-05 23:38:08,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2576.132, 2169.6135, 2137.9153, 186.54031, 2335.8962, 2439.2817, 2329.227, 2403.7153, 2382.8862, 2564.9607]
2025-05-05 23:38:08,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 119.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 23:38:08,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 85/100 (estimated time remaining: 55 minutes, 58 seconds)
2025-05-05 23:40:58,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:41:23,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1937.92383 ± 747.386
2025-05-05 23:41:23,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2421.2976, 2473.7686, 2355.668, 2389.7842, 1745.5361, 2312.2354, 574.20374, 2139.3796, 435.76138, 2531.6035]
2025-05-05 23:41:23,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 674.0, 1000.0, 259.0, 1000.0, 221.0, 1000.0]
2025-05-05 23:41:23,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 86/100 (estimated time remaining: 52 minutes, 11 seconds)
2025-05-05 23:44:26,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:44:53,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2195.58740 ± 481.033
2025-05-05 23:44:53,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1921.9747, 1009.99054, 2648.1296, 2109.9197, 2296.5276, 2523.065, 2637.711, 2519.547, 2459.3196, 1829.688]
2025-05-05 23:44:53,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 394.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 755.0]
2025-05-05 23:44:53,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 87/100 (estimated time remaining: 48 minutes, 34 seconds)
2025-05-05 23:48:04,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:48:34,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2421.32886 ± 102.637
2025-05-05 23:48:34,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2187.2349, 2543.993, 2422.8032, 2392.461, 2454.894, 2546.164, 2324.865, 2509.1985, 2395.039, 2436.6377]
2025-05-05 23:48:34,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [952.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-05 23:48:34,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2421.33) for latency SparseU15
2025-05-05 23:48:34,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-05 23:48:34,551 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-05 23:48:34,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 88/100 (estimated time remaining: 45 minutes, 33 seconds)
2025-05-05 23:51:42,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:52:06,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1956.95837 ± 869.370
2025-05-05 23:52:06,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2475.1262, 2525.2693, 1832.8867, 2601.744, 2541.8777, 312.1926, 2413.5005, 222.5418, 2370.073, 2274.3733]
2025-05-05 23:52:06,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 774.0, 1000.0, 1000.0, 140.0, 1000.0, 121.0, 1000.0, 1000.0]
2025-05-05 23:52:06,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 89/100 (estimated time remaining: 42 minutes, 6 seconds)
2025-05-05 23:55:02,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:55:28,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1992.26099 ± 651.534
2025-05-05 23:55:28,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2501.9263, 2380.689, 2103.4158, 810.25244, 2580.6853, 2163.4429, 1229.362, 2715.2632, 2349.2507, 1088.3251]
2025-05-05 23:55:28,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 871.0, 341.0, 1000.0, 1000.0, 515.0, 1000.0, 1000.0, 1000.0]
2025-05-05 23:55:28,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 90/100 (estimated time remaining: 38 minutes, 9 seconds)
2025-05-05 23:58:42,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-05 23:59:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1790.38538 ± 786.786
2025-05-05 23:59:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2368.4524, 2639.4316, 2245.4648, 2545.0305, 263.9524, 2362.6716, 1845.845, 1889.6576, 881.48846, 861.8602]
2025-05-05 23:59:03,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 118.0, 1000.0, 692.0, 765.0, 419.0, 344.0]
2025-05-05 23:59:03,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 91/100 (estimated time remaining: 35 minutes, 21 seconds)
2025-05-06 00:01:51,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:02:09,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1460.72229 ± 851.322
2025-05-06 00:02:09,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [757.4948, 2447.1062, 2370.6812, 497.7212, 1286.0974, 2103.3708, 216.178, 604.1607, 2558.6719, 1765.7397]
2025-05-06 00:02:09,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [305.0, 1000.0, 964.0, 199.0, 532.0, 1000.0, 102.0, 250.0, 1000.0, 794.0]
2025-05-06 00:02:09,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 92/100 (estimated time remaining: 31 minutes, 4 seconds)
2025-05-06 00:05:25,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:05:55,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2468.09155 ± 136.441
2025-05-06 00:05:55,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2507.334, 2281.3367, 2457.1287, 2752.7117, 2398.0068, 2473.5193, 2385.804, 2477.9568, 2305.1152, 2642.002]
2025-05-06 00:05:55,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 00:05:55,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2468.09) for latency SparseU15
2025-05-06 00:05:55,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-06 00:05:55,778 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 00:05:55,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 93/100 (estimated time remaining: 27 minutes, 45 seconds)
2025-05-06 00:08:50,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:09:18,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2002.98999 ± 706.701
2025-05-06 00:09:18,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2461.5479, 2097.2776, 2185.8816, 2354.5808, 2169.956, 2448.8848, 2469.6433, 2587.6987, 779.1599, 475.27158]
2025-05-06 00:09:18,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 202.0]
2025-05-06 00:09:18,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 94/100 (estimated time remaining: 24 minutes, 3 seconds)
2025-05-06 00:12:24,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:12:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1730.50720 ± 802.523
2025-05-06 00:12:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [1470.3059, 2482.913, 1294.7435, 1578.9974, 2660.786, 192.12212, 2745.223, 1325.1553, 1014.7323, 2540.093]
2025-05-06 00:12:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [587.0, 1000.0, 493.0, 668.0, 1000.0, 101.0, 1000.0, 574.0, 425.0, 1000.0]
2025-05-06 00:12:44,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 95/100 (estimated time remaining: 20 minutes, 42 seconds)
2025-05-06 00:15:48,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:16:19,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2140.93018 ± 586.451
2025-05-06 00:16:19,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [789.2553, 2302.4255, 2385.9902, 2329.3125, 2383.9517, 2535.7168, 2453.9014, 2623.4814, 1204.3088, 2400.958]
2025-05-06 00:16:19,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 00:16:19,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 96/100 (estimated time remaining: 17 minutes, 15 seconds)
2025-05-06 00:19:32,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:19:54,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1862.73560 ± 1002.646
2025-05-06 00:19:54,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2371.132, 2528.3286, 2504.689, 397.04184, 2405.6008, 543.77246, 2711.523, 2626.064, 2442.22, 96.98462]
2025-05-06 00:19:54,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 192.0, 1000.0, 298.0, 1000.0, 1000.0, 1000.0, 62.0]
2025-05-06 00:19:55,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 12 seconds)
2025-05-06 00:22:59,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:23:27,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2285.99707 ± 451.530
2025-05-06 00:23:27,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2440.6492, 2411.5352, 2372.374, 2369.2493, 2473.6042, 2598.2488, 2505.1787, 954.2677, 2460.2222, 2274.6428]
2025-05-06 00:23:27,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 985.0, 1000.0, 1000.0, 1000.0, 1000.0, 477.0, 1000.0, 1000.0]
2025-05-06 00:23:27,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 31 seconds)
2025-05-06 00:26:38,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:27:04,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1979.22034 ± 827.287
2025-05-06 00:27:04,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2508.5781, 2343.402, 1328.5818, 2690.8123, 2571.0344, 1686.1985, 36.597588, 1350.306, 2521.7566, 2754.9346]
2025-05-06 00:27:04,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 667.0, 52.0, 1000.0, 1000.0, 1000.0]
2025-05-06 00:27:04,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 6 seconds)
2025-05-06 00:30:03,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:30:25,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 1735.28247 ± 837.181
2025-05-06 00:30:25,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2446.2837, 219.35527, 1654.2393, 2125.3645, 2188.555, 1171.6002, 2580.492, 2078.1406, 321.43735, 2567.356]
2025-05-06 00:30:25,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 111.0, 680.0, 1000.0, 1000.0, 488.0, 1000.0, 1000.0, 155.0, 1000.0]
2025-05-06 00:30:25,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1097 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 32 seconds)
2025-05-06 00:33:33,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:34:03,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1119 [DEBUG]: Total Reward: 2509.51953 ± 117.365
2025-05-06 00:34:03,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1120 [DEBUG]: All rewards: [2529.5554, 2559.293, 2671.8333, 2593.3647, 2625.3416, 2606.8699, 2315.0266, 2375.8655, 2372.2173, 2445.8281]
2025-05-06 00:34:03,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 00:34:03,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1124 [INFO]: New best (2509.52) for latency SparseU15
2025-05-06 00:34:03,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1127 [INFO]: saving network
2025-05-06 00:34:03,213 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-ant/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 00:34:03,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-ant):1149 [DEBUG]: Training session finished
