2025-05-11 10:48:47,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4
2025-05-11 10:48:47,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4
2025-05-11 10:48:47,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x787e649c5c70>}
2025-05-11 10:48:47,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1111 [DEBUG]: using device: cpu
2025-05-11 10:48:47,680 baseline-bpql-noisy-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 4 != 24
2025-05-11 10:48:47,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-11 10:48:47,689 baseline-bpql-noisy-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=41, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-11 10:48:47,690 baseline-bpql-noisy-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 10:48:47,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-11 10:48:47,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-11 10:51:29,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:51:43,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -318.71820 ± 39.112
2025-05-11 10:51:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-345.37582, -356.5628, -272.48505, -302.8314, -325.65747, -337.19968, -313.7019, -313.0227, -238.8595, -381.48593]
2025-05-11 10:51:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:51:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (-318.72) for latency MM1Queue_a033_s075
2025-05-11 10:51:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:51:43,252 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
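Each evaluation prints both the aggregate and the raw per-episode returns, so the aggregation rule can be checked. Using iteration 1's numbers, the reported `-318.71820 ± 39.112` matches the episode mean with the population standard deviation (ddof = 0). This is a plain-Python verification sketch, not the trainer's own code:

```python
from statistics import fmean, pstdev

# Per-episode returns copied verbatim from the "All rewards" line of iteration 1.
rewards = [-345.37582, -356.5628, -272.48505, -302.8314, -325.65747,
           -337.19968, -313.7019, -313.0227, -238.8595, -381.48593]

# pstdev divides by N (population std), which reproduces the logged ±39.112;
# the sample std (stdev, dividing by N-1) would give roughly 41.23 instead.
print(f"{fmean(rewards):.4f} ± {pstdev(rewards):.3f}")  # → -318.7182 ± 39.112
```

The tiny difference in the last digit of the mean (-318.71822 here vs. -318.71820 in the log) is consistent with the trainer accumulating in float32.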
2025-05-11 10:51:43,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 49 minutes, 14 seconds)
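The estimated-time-remaining figure appears to be (iterations left) × (average wall time per completed iteration), truncated to whole seconds: reconstructing it from the two timestamps above reproduces the logged "4 hours, 49 minutes, 14 seconds" exactly. A minimal sketch; the timestamps are copied from the log, while the formula itself is an inference, not the project's code:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"
start = datetime.strptime("2025-05-11 10:48:47,955", FMT)  # "Iteration 1/100" line
iter2 = datetime.strptime("2025-05-11 10:51:43,257", FMT)  # "Iteration 2/100" line

per_iter = (iter2 - start).total_seconds()  # wall time of the one completed iteration
eta = int((100 - 1) * per_iter)             # 99 iterations left, truncated to seconds
h, rem = divmod(eta, 3600)
m, s = divmod(rem, 60)
print(f"{h} hours, {m} minutes, {s} seconds")  # → 4 hours, 49 minutes, 14 seconds
```

The same formula also matches later headers (e.g. averaging the first two iterations reproduces the "4 hours, 55 minutes, 31 seconds" printed at iteration 3), which suggests the average is taken over all completed iterations rather than the most recent one.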
2025-05-11 10:54:36,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:54:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -148.01289 ± 51.562
2025-05-11 10:54:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-243.86993, -117.57829, -105.48893, -54.19114, -130.95512, -141.58249, -147.41913, -210.31847, -187.93126, -140.79413]
2025-05-11 10:54:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:54:49,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (-148.01) for latency MM1Queue_a033_s075
2025-05-11 10:54:49,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:54:49,825 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 10:54:49,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 55 minutes, 31 seconds)
2025-05-11 10:57:47,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 10:58:00,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 31.08713 ± 85.073
2025-05-11 10:58:00,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [51.978344, 91.45249, -9.858147, 74.93501, 117.29424, -43.312096, 138.03091, -169.3546, 37.29048, 22.414667]
2025-05-11 10:58:00,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:58:00,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (31.09) for latency MM1Queue_a033_s075
2025-05-11 10:58:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:58:00,825 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 10:58:00,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 57 minutes, 56 seconds)
2025-05-11 11:00:54,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:01:08,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 346.38742 ± 187.190
2025-05-11 11:01:08,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [139.31796, 471.96472, 524.0256, 614.2863, 477.32938, 432.92392, 109.638504, 268.44308, 31.216137, 394.7285]
2025-05-11 11:01:08,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:01:08,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (346.39) for latency MM1Queue_a033_s075
2025-05-11 11:01:08,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:01:08,309 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:01:08,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 56 minutes, 8 seconds)
2025-05-11 11:03:55,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:04:09,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 758.01062 ± 295.271
2025-05-11 11:04:09,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [358.00366, 139.42519, 992.48706, 994.53827, 634.3878, 796.20294, 1045.8772, 923.52484, 654.0096, 1041.6489]
2025-05-11 11:04:09,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:04:09,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (758.01) for latency MM1Queue_a033_s075
2025-05-11 11:04:09,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:04:09,166 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:04:09,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 51 minutes, 43 seconds)
2025-05-11 11:07:05,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:07:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1091.87427 ± 217.916
2025-05-11 11:07:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1190.1931, 1218.0518, 1046.9214, 1218.6184, 472.03345, 1133.0212, 1026.7429, 1232.8165, 1159.5533, 1220.79]
2025-05-11 11:07:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:07:20,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1091.87) for latency MM1Queue_a033_s075
2025-05-11 11:07:20,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:07:20,260 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:07:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 53 minutes, 35 seconds)
2025-05-11 11:10:16,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:10:31,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1455.92700 ± 451.706
2025-05-11 11:10:31,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1443.8772, 1846.4315, 1002.0907, 1721.1481, 1732.0271, 1392.395, 323.71634, 1607.2871, 1916.9858, 1573.3115]
2025-05-11 11:10:31,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:10:31,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1455.93) for latency MM1Queue_a033_s075
2025-05-11 11:10:31,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:10:31,360 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:10:31,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 51 minutes, 52 seconds)
2025-05-11 11:13:27,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:13:42,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2216.18481 ± 126.841
2025-05-11 11:13:42,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2252.7654, 2250.6492, 2431.8477, 2232.9797, 2070.745, 1936.8403, 2294.8628, 2245.9407, 2273.7202, 2171.4958]
2025-05-11 11:13:42,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:13:42,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2216.18) for latency MM1Queue_a033_s075
2025-05-11 11:13:42,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:13:42,811 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:13:42,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 48 minutes, 52 seconds)
2025-05-11 11:16:40,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:16:54,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2495.19482 ± 250.788
2025-05-11 11:16:54,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2152.7073, 2952.3555, 2252.641, 2457.0261, 2495.796, 2619.9727, 2301.0723, 2489.402, 2898.4502, 2332.526]
2025-05-11 11:16:54,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:16:54,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2495.19) for latency MM1Queue_a033_s075
2025-05-11 11:16:54,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:16:54,865 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:16:54,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 47 minutes, 7 seconds)
2025-05-11 11:19:51,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:20:06,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2325.79736 ± 681.191
2025-05-11 11:20:06,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2937.8032, 1855.4149, 1696.8265, 1019.3045, 1744.4746, 2398.8308, 2885.9282, 2491.668, 3026.524, 3201.197]
2025-05-11 11:20:06,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:20:06,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 47 minutes, 14 seconds)
2025-05-11 11:23:03,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:23:18,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2651.63525 ± 373.225
2025-05-11 11:23:18,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2773.8826, 2654.768, 2875.6433, 2874.9995, 2472.397, 2662.8362, 1618.4053, 2797.2979, 2751.0415, 3035.0806]
2025-05-11 11:23:18,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:23:18,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2651.64) for latency MM1Queue_a033_s075
2025-05-11 11:23:18,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:23:18,358 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:23:18,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 44 minutes, 14 seconds)
2025-05-11 11:26:15,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:26:29,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2966.34033 ± 548.166
2025-05-11 11:26:29,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2879.6897, 1528.9893, 2836.4568, 2879.0984, 3383.7246, 2825.9883, 3344.7087, 3235.1362, 3661.0625, 3088.5493]
2025-05-11 11:26:29,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:26:29,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2966.34) for latency MM1Queue_a033_s075
2025-05-11 11:26:29,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:26:29,113 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:26:29,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 40 minutes, 56 seconds)
2025-05-11 11:29:20,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:29:34,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2361.87744 ± 1069.841
2025-05-11 11:29:34,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1000.42914, 3413.2063, 3371.472, 1154.9093, 3251.782, 2806.5925, 1347.7479, 3331.2532, 798.76825, 3142.615]
2025-05-11 11:29:34,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:29:34,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 35 minutes, 55 seconds)
2025-05-11 11:32:25,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:32:39,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2978.35132 ± 755.935
2025-05-11 11:32:39,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3277.7634, 3374.7998, 3343.8203, 3284.308, 784.5014, 3152.709, 2696.354, 3347.3635, 3157.032, 3364.8633]
2025-05-11 11:32:39,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:32:39,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2978.35) for latency MM1Queue_a033_s075
2025-05-11 11:32:39,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:32:39,508 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:32:39,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 30 minutes, 47 seconds)
2025-05-11 11:35:30,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:35:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3111.18384 ± 398.816
2025-05-11 11:35:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1969.037, 3463.7961, 3221.7195, 3165.3516, 3040.576, 3433.8782, 3243.8596, 3170.8206, 3201.3093, 3201.491]
2025-05-11 11:35:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:35:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3111.18) for latency MM1Queue_a033_s075
2025-05-11 11:35:44,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:35:44,721 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:35:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 25 minutes, 47 seconds)
2025-05-11 11:38:35,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:38:49,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2505.90454 ± 1007.578
2025-05-11 11:38:49,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3174.8335, 2753.596, 2542.915, 1089.0782, 3447.1267, 3764.8752, 1442.848, 2991.296, 676.0533, 3176.422]
2025-05-11 11:38:49,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:38:49,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 20 minutes, 43 seconds)
2025-05-11 11:41:40,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:41:54,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2702.16846 ± 1003.500
2025-05-11 11:41:54,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3470.313, 2941.4075, 3294.138, 569.33466, 931.34406, 3268.3594, 2958.1777, 3618.3044, 2953.2542, 3017.0515]
2025-05-11 11:41:54,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:41:54,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 15 minutes, 58 seconds)
2025-05-11 11:44:45,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:44:59,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2797.51025 ± 1017.427
2025-05-11 11:44:59,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3773.931, 3568.577, 3322.472, 3772.2854, 1756.6152, 3430.1355, 1092.083, 3646.5972, 2305.8945, 1306.5121]
2025-05-11 11:44:59,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:44:59,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 12 minutes, 51 seconds)
2025-05-11 11:47:47,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:48:01,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3271.85376 ± 594.097
2025-05-11 11:48:01,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3538.3174, 3158.9617, 3844.082, 3389.848, 1642.5569, 3330.392, 3853.2092, 3147.092, 3239.3005, 3574.7766]
2025-05-11 11:48:01,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:48:01,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3271.85) for latency MM1Queue_a033_s075
2025-05-11 11:48:01,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:48:01,068 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:48:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 8 minutes, 49 seconds)
2025-05-11 11:50:47,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:51:00,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3108.34253 ± 667.533
2025-05-11 11:51:00,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3208.4126, 3713.0718, 3545.5305, 3594.1191, 2262.0676, 3361.9404, 3356.613, 3290.1575, 3298.9668, 1452.5479]
2025-05-11 11:51:00,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:51:00,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 4 minutes, 19 seconds)
2025-05-11 11:53:49,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:54:03,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2984.53760 ± 748.343
2025-05-11 11:54:03,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3486.2993, 3383.9746, 3311.7112, 3178.2314, 2779.9458, 3223.2876, 2319.3882, 1042.6251, 3746.7454, 3373.1677]
2025-05-11 11:54:03,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:54:03,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 43 seconds)
2025-05-11 11:56:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 11:57:06,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3678.11597 ± 282.249
2025-05-11 11:57:06,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3635.1702, 3661.015, 3226.851, 3131.1252, 3815.697, 3728.6934, 4019.8857, 3857.138, 4026.4314, 3679.1514]
2025-05-11 11:57:06,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:57:06,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3678.12) for latency MM1Queue_a033_s075
2025-05-11 11:57:06,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:57:06,691 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 11:57:06,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 57 minutes, 12 seconds)
2025-05-11 11:59:57,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:00:11,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3420.52197 ± 428.252
2025-05-11 12:00:11,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3782.086, 3639.6082, 2234.1326, 3730.4421, 3460.7463, 3319.771, 3275.2297, 3686.02, 3639.7075, 3437.474]
2025-05-11 12:00:11,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:00:11,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 54 minutes, 3 seconds)
2025-05-11 12:03:02,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:03:16,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3697.27344 ± 310.589
2025-05-11 12:03:16,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3875.5754, 3759.2085, 3488.486, 3472.7847, 4174.428, 4049.434, 3472.4011, 3427.9172, 4044.6633, 3207.8364]
2025-05-11 12:03:16,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:03:16,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3697.27) for latency MM1Queue_a033_s075
2025-05-11 12:03:16,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:03:16,811 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:03:16,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 51 minutes, 59 seconds)
2025-05-11 12:06:08,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:06:22,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3530.80078 ± 847.791
2025-05-11 12:06:22,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4028.8152, 3787.038, 1030.974, 3828.2173, 3560.996, 3942.2656, 3903.66, 3835.9883, 3495.129, 3894.9253]
2025-05-11 12:06:22,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:06:22,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 50 minutes, 19 seconds)
2025-05-11 12:09:13,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:09:27,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3690.76318 ± 167.891
2025-05-11 12:09:27,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3787.607, 3431.091, 3716.854, 3885.4077, 3817.5513, 3469.208, 3530.3267, 3846.8374, 3870.0994, 3552.6533]
2025-05-11 12:09:27,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:09:27,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 47 minutes, 50 seconds)
2025-05-11 12:12:19,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:12:33,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3670.03955 ± 164.674
2025-05-11 12:12:33,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3462.026, 3994.9438, 3667.1646, 3909.966, 3724.4956, 3481.6702, 3697.6814, 3645.62, 3554.0059, 3562.8193]
2025-05-11 12:12:33,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:12:33,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 45 minutes, 27 seconds)
2025-05-11 12:15:25,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:15:39,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3483.92261 ± 554.869
2025-05-11 12:15:39,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3591.2432, 3686.651, 1830.8738, 3652.273, 3617.4758, 3778.5999, 3735.537, 3563.9658, 3642.0017, 3740.6064]
2025-05-11 12:15:39,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:15:39,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 42 minutes, 45 seconds)
2025-05-11 12:18:31,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:18:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3661.60083 ± 258.400
2025-05-11 12:18:45,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3881.826, 3557.0403, 4079.4949, 3618.5479, 3099.8042, 3610.0037, 3515.3018, 3696.7334, 3600.4731, 3956.782]
2025-05-11 12:18:45,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:18:45,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 39 minutes, 45 seconds)
2025-05-11 12:21:36,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:21:50,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3764.33667 ± 296.741
2025-05-11 12:21:50,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3065.0945, 3490.9995, 4091.4956, 3549.8071, 3918.249, 4032.1682, 3860.0857, 3769.5142, 3967.2124, 3898.742]
2025-05-11 12:21:50,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:21:50,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3764.34) for latency MM1Queue_a033_s075
2025-05-11 12:21:50,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:21:50,939 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:21:50,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 36 minutes, 41 seconds)
2025-05-11 12:24:42,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:24:56,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3716.68945 ± 193.848
2025-05-11 12:24:56,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3722.1038, 3788.8333, 3368.8247, 3787.847, 3917.8047, 3605.1829, 3959.9556, 3385.5393, 3855.736, 3775.0676]
2025-05-11 12:24:56,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:24:56,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 33 minutes, 44 seconds)
2025-05-11 12:27:48,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:28:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3802.43115 ± 277.449
2025-05-11 12:28:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3854.7244, 3716.8352, 4028.0994, 3345.0757, 4203.954, 3591.1707, 3797.1277, 4141.6587, 3942.1597, 3403.5059]
2025-05-11 12:28:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:28:02,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3802.43) for latency MM1Queue_a033_s075
2025-05-11 12:28:02,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:28:02,330 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:28:02,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 30 minutes, 35 seconds)
2025-05-11 12:30:54,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:31:08,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3473.49072 ± 923.251
2025-05-11 12:31:08,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3996.6138, 4215.097, 2735.5645, 4180.0884, 4045.614, 3664.788, 1032.2778, 3723.9104, 3215.8586, 3925.093]
2025-05-11 12:31:08,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:31:08,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 27 minutes, 27 seconds)
2025-05-11 12:33:59,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:34:14,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3711.61646 ± 287.079
2025-05-11 12:34:14,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4176.18, 3309.0732, 3470.1226, 3757.2656, 3704.6633, 3493.7153, 3743.689, 3947.745, 3384.3826, 4129.3286]
2025-05-11 12:34:14,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:34:14,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 24 minutes, 18 seconds)
2025-05-11 12:37:05,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:37:20,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3723.90088 ± 225.475
2025-05-11 12:37:20,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3863.5994, 3346.26, 4171.99, 3833.3425, 3862.8076, 3661.1804, 3478.382, 3517.7937, 3697.2698, 3806.3838]
2025-05-11 12:37:20,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:37:20,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 21 minutes, 19 seconds)
2025-05-11 12:40:12,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:40:26,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3679.41528 ± 400.688
2025-05-11 12:40:26,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3787.4055, 3957.0212, 3723.3582, 3759.0742, 2506.239, 3639.212, 3921.6025, 3836.4219, 3851.0957, 3812.7214]
2025-05-11 12:40:26,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:40:26,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 18 minutes, 22 seconds)
2025-05-11 12:43:17,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:43:31,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3745.24487 ± 626.872
2025-05-11 12:43:31,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3787.7517, 4219.117, 4134.7725, 3636.35, 4310.613, 4055.9036, 3535.5508, 2019.8988, 4061.267, 3691.223]
2025-05-11 12:43:31,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:43:31,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 15 minutes, 7 seconds)
2025-05-11 12:46:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:46:36,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3534.94531 ± 724.389
2025-05-11 12:46:36,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3934.1685, 3843.4075, 3897.415, 1770.8594, 3956.1433, 4030.6995, 3871.2166, 2492.3953, 3707.522, 3845.629]
2025-05-11 12:46:36,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:46:36,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 11 minutes, 46 seconds)
2025-05-11 12:49:27,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:49:41,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3854.11011 ± 430.152
2025-05-11 12:49:41,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3813.9075, 3820.3313, 4322.0996, 4151.343, 3890.0576, 4370.2437, 3702.1418, 3771.7937, 2743.97, 3955.214]
2025-05-11 12:49:41,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:49:41,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3854.11) for latency MM1Queue_a033_s075
2025-05-11 12:49:41,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:49:41,220 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:49:41,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 8 minutes, 31 seconds)
2025-05-11 12:52:31,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:52:45,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3615.60620 ± 604.117
2025-05-11 12:52:45,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4025.9468, 3402.9346, 3721.6748, 2170.891, 4116.1196, 3744.7322, 4219.1924, 3661.989, 2957.119, 4135.464]
2025-05-11 12:52:45,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:52:45,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 5 minutes, 9 seconds)
2025-05-11 12:55:36,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:55:50,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3538.40308 ± 896.417
2025-05-11 12:55:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3800.517, 4015.373, 1036.2087, 3237.9875, 4231.7354, 3870.7742, 3226.6038, 3818.9028, 3915.4792, 4230.451]
2025-05-11 12:55:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:55:50,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 1 minute, 42 seconds)
2025-05-11 12:58:41,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 12:58:55,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3883.38550 ± 219.706
2025-05-11 12:58:55,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3792.961, 3566.8772, 3614.0305, 3863.216, 4239.2676, 3909.4468, 3748.312, 4006.9053, 4255.4683, 3837.3726]
2025-05-11 12:58:55,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:58:55,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3883.39) for latency MM1Queue_a033_s075
2025-05-11 12:58:55,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:58:55,430 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 12:58:55,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 58 minutes, 38 seconds)
2025-05-11 13:01:46,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:02:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3403.72192 ± 1081.038
2025-05-11 13:02:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3675.4512, 3924.1743, 1586.9215, 4052.5508, 3770.1086, 3693.339, 4143.215, 4063.5088, 4146.589, 981.3578]
2025-05-11 13:02:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:02:00,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 55 minutes, 35 seconds)
2025-05-11 13:04:50,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:05:05,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3699.94482 ± 334.331
2025-05-11 13:05:05,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3662.9167, 4447.371, 3512.112, 3690.4956, 3134.9229, 3922.4753, 3329.2947, 3735.5996, 3809.0894, 3755.1702]
2025-05-11 13:05:05,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:05:05,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 52 minutes, 27 seconds)
2025-05-11 13:07:56,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:08:10,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3870.65381 ± 434.957
2025-05-11 13:08:10,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3504.513, 4279.3535, 3850.0144, 2709.7869, 4150.0435, 3984.962, 4082.716, 4042.298, 3974.5063, 4128.345]
2025-05-11 13:08:10,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:08:10,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 49 minutes, 26 seconds)
2025-05-11 13:11:01,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:11:15,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3894.26440 ± 601.699
2025-05-11 13:11:15,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4409.364, 4234.5107, 4127.405, 4238.482, 3876.7493, 2209.0076, 3573.48, 4140.9355, 4105.754, 4026.9573]
2025-05-11 13:11:15,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:11:15,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3894.26) for latency MM1Queue_a033_s075
2025-05-11 13:11:15,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:11:15,680 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 13:11:15,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 46 minutes, 31 seconds)
2025-05-11 13:14:06,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:14:21,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3615.02979 ± 610.880
2025-05-11 13:14:21,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2132.5938, 3822.7039, 3778.4302, 3805.247, 3996.6501, 3979.728, 2737.3103, 3961.2136, 4038.1172, 3898.3025]
2025-05-11 13:14:21,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:14:21,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 43 minutes, 31 seconds)
2025-05-11 13:17:11,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:17:25,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4069.80127 ± 228.086
2025-05-11 13:17:25,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3969.5247, 4354.118, 3978.3596, 4174.804, 3809.0586, 4198.939, 3950.8562, 3634.4019, 4279.8545, 4348.0923]
2025-05-11 13:17:25,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:17:25,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4069.80) for latency MM1Queue_a033_s075
2025-05-11 13:17:25,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:17:25,617 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 13:17:25,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 40 minutes, 22 seconds)
2025-05-11 13:20:16,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:20:30,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3785.98584 ± 543.112
2025-05-11 13:20:30,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4014.5913, 3752.2644, 4077.3003, 4197.9727, 4061.5142, 3941.0354, 4048.0574, 3178.096, 4194.639, 2394.3862]
2025-05-11 13:20:30,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:20:30,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 37 minutes, 18 seconds)
2025-05-11 13:23:21,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:23:35,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4075.24731 ± 272.439
2025-05-11 13:23:35,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4184.2783, 4001.619, 3637.2097, 4420.7734, 4037.177, 3818.2485, 4393.6187, 4484.755, 3830.267, 3944.5286]
2025-05-11 13:23:35,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:23:35,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4075.25) for latency MM1Queue_a033_s075
2025-05-11 13:23:35,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:23:35,298 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 13:23:35,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 34 minutes, 11 seconds)
2025-05-11 13:26:26,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:26:40,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3817.24463 ± 636.762
2025-05-11 13:26:40,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3681.7397, 4232.0586, 3867.58, 2189.7495, 3503.5825, 4137.6216, 3477.1458, 4286.755, 4375.492, 4420.7173]
2025-05-11 13:26:40,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:26:40,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 31 minutes)
2025-05-11 13:29:31,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:29:44,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3799.67529 ± 613.280
2025-05-11 13:29:44,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3987.5872, 4039.7688, 4109.8125, 3912.4067, 4340.2793, 4166.594, 2060.2563, 3534.0842, 3868.8105, 3977.1528]
2025-05-11 13:29:44,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:29:44,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 27 minutes, 43 seconds)
2025-05-11 13:32:35,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:32:48,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3894.24292 ± 236.805
2025-05-11 13:32:48,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3894.0208, 4058.3176, 3844.4875, 4055.3652, 3845.7644, 3582.7102, 4390.249, 4012.1, 3574.3184, 3685.0938]
2025-05-11 13:32:48,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:32:48,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 24 minutes, 30 seconds)
2025-05-11 13:35:39,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:35:51,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3864.69580 ± 844.614
2025-05-11 13:35:51,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4340.182, 4334.345, 4279.684, 3809.9558, 4486.211, 4202.362, 1457.2601, 3854.6475, 3621.768, 4260.5376]
2025-05-11 13:35:51,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:35:51,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 21 minutes, 16 seconds)
2025-05-11 13:38:43,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:38:56,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3711.59912 ± 838.549
2025-05-11 13:38:56,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4233.5103, 4213.2544, 3761.2332, 4033.182, 3948.2756, 1266.866, 4300.487, 3827.7622, 3803.3616, 3728.0574]
2025-05-11 13:38:56,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:38:56,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 18 minutes, 11 seconds)
2025-05-11 13:41:48,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:42:01,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3439.24561 ± 1201.057
2025-05-11 13:42:01,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1081.2483, 3112.7527, 4378.0396, 3816.0051, 3849.1802, 4190.481, 4116.821, 4268.419, 4372.891, 1206.6184]
2025-05-11 13:42:01,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:42:01,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 15 minutes, 10 seconds)
2025-05-11 13:45:00,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:45:14,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3616.52734 ± 630.744
2025-05-11 13:45:14,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3745.183, 2069.8782, 4350.258, 4238.828, 3122.3276, 3525.3167, 4073.0596, 4017.7507, 3510.0781, 3512.5923]
2025-05-11 13:45:14,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:45:14,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 13 minutes, 18 seconds)
2025-05-11 13:48:05,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:48:19,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4077.29736 ± 817.679
2025-05-11 13:48:19,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4648.89, 4417.7407, 4124.162, 4625.453, 3935.4292, 1704.3241, 4457.8667, 4327.514, 4191.4404, 4340.154]
2025-05-11 13:48:19,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:48:19,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4077.30) for latency MM1Queue_a033_s075
2025-05-11 13:48:19,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:48:19,258 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 13:48:19,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 10 minutes, 22 seconds)
2025-05-11 13:51:09,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:51:23,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4247.81543 ± 222.852
2025-05-11 13:51:23,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4160.6143, 4162.6333, 4243.251, 4435.1006, 4508.672, 4388.2373, 4289.5537, 3912.0225, 3844.4336, 4533.6357]
2025-05-11 13:51:23,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:51:23,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4247.82) for latency MM1Queue_a033_s075
2025-05-11 13:51:23,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:51:23,638 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 13:51:23,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 7 minutes, 20 seconds)
2025-05-11 13:54:13,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:54:27,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4152.67432 ± 273.554
2025-05-11 13:54:27,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3579.1829, 3976.3506, 4362.3633, 4093.5955, 4634.574, 4342.4863, 4272.9834, 3913.9268, 4168.945, 4182.3354]
2025-05-11 13:54:27,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:54:27,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 4 minutes, 10 seconds)
2025-05-11 13:57:18,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 13:57:32,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4008.48438 ± 816.432
2025-05-11 13:57:32,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4347.857, 4375.043, 4370.136, 4133.2783, 4343.097, 4297.6055, 3878.0046, 4240.992, 4492.229, 1606.5978]
2025-05-11 13:57:32,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:57:32,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 57 seconds)
2025-05-11 14:00:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:00:36,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3682.17432 ± 1009.216
2025-05-11 14:00:36,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4032.19, 4691.9346, 3833.8398, 4288.286, 4018.0095, 959.466, 2905.959, 4126.2905, 3712.5642, 4253.2017]
2025-05-11 14:00:36,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:00:36,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 56 minutes, 50 seconds)
2025-05-11 14:03:30,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:03:44,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3914.40503 ± 808.259
2025-05-11 14:03:44,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4587.6904, 4351.443, 3814.429, 4212.384, 4267.083, 1621.8599, 3602.8572, 4239.973, 4312.2246, 4134.1064]
2025-05-11 14:03:44,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:03:44,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 54 minutes, 8 seconds)
2025-05-11 14:06:39,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:06:54,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4076.17456 ± 350.331
2025-05-11 14:06:54,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3904.6182, 4075.0935, 4731.1616, 3854.068, 3562.7595, 4625.134, 4162.2695, 4199.3315, 3871.009, 3776.302]
2025-05-11 14:06:54,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:06:54,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 51 minutes, 38 seconds)
2025-05-11 14:09:47,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:10:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4122.21533 ± 193.658
2025-05-11 14:10:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4407.8633, 4085.7358, 3893.153, 4182.853, 4346.016, 3875.0518, 4101.6514, 3931.716, 4394.329, 4003.7861]
2025-05-11 14:10:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:10:01,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 48 minutes, 55 seconds)
2025-05-11 14:12:52,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:13:06,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3753.22192 ± 1017.594
2025-05-11 14:13:06,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3713.1104, 4052.2134, 1111.1757, 4034.1875, 4401.642, 4534.6504, 4003.8796, 4682.0454, 4247.7905, 2751.5237]
2025-05-11 14:13:06,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:13:06,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 45 minutes, 55 seconds)
2025-05-11 14:15:59,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:16:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3937.95044 ± 710.755
2025-05-11 14:16:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2422.775, 4195.5747, 3381.1067, 4301.6978, 4510.9336, 2945.4333, 4198.9717, 4510.0103, 4474.3467, 4438.651]
2025-05-11 14:16:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:16:13,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 43 minutes)
2025-05-11 14:19:05,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:19:19,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3816.36572 ± 342.548
2025-05-11 14:19:19,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3119.0242, 3354.199, 4043.5342, 4114.6665, 4114.7876, 3860.4875, 3915.5564, 4219.131, 3556.6873, 3865.584]
2025-05-11 14:19:19,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:19:19,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 39 minutes, 44 seconds)
2025-05-11 14:22:11,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:22:25,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3960.81396 ± 746.686
2025-05-11 14:22:25,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4571.3306, 4581.512, 4310.035, 4063.514, 3922.3918, 2755.448, 3914.8025, 4616.3945, 2376.0479, 4496.665]
2025-05-11 14:22:25,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:22:25,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 36 minutes, 13 seconds)
2025-05-11 14:25:15,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:25:30,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3823.49219 ± 502.576
2025-05-11 14:25:30,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3942.0789, 3996.2622, 4125.627, 4227.8774, 3711.7844, 3973.0305, 4252.376, 2494.169, 3415.221, 4096.4956]
2025-05-11 14:25:30,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:25:30,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 32 minutes, 50 seconds)
2025-05-11 14:28:20,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:28:34,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4321.51465 ± 294.001
2025-05-11 14:28:34,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4487.7534, 4074.9158, 4058.1226, 4137.9805, 4346.841, 3795.5642, 4555.8364, 4817.4834, 4629.087, 4311.5625]
2025-05-11 14:28:34,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:28:34,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4321.51) for latency MM1Queue_a033_s075
2025-05-11 14:28:34,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 14:28:34,885 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 14:28:34,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 29 minutes, 42 seconds)
2025-05-11 14:31:25,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:31:39,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3931.30518 ± 392.026
2025-05-11 14:31:39,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3980.4968, 2947.822, 3799.3613, 4100.4355, 4236.8096, 4209.498, 3724.686, 3826.1013, 4018.9429, 4468.898]
2025-05-11 14:31:39,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:31:39,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 26 minutes, 24 seconds)
2025-05-11 14:34:30,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:34:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3724.44678 ± 487.208
2025-05-11 14:34:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3919.152, 3014.9084, 2823.9363, 3237.2427, 3880.6208, 4296.3765, 3898.8098, 4257.814, 4024.0862, 3891.5232]
2025-05-11 14:34:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:34:44,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 23 minutes, 15 seconds)
2025-05-11 14:37:35,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:37:49,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3779.92041 ± 1033.808
2025-05-11 14:37:49,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4632.529, 4440.318, 3949.9795, 2996.584, 993.3756, 4221.21, 4038.9385, 4589.015, 4223.0684, 3714.1865]
2025-05-11 14:37:49,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:37:49,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 20 minutes, 8 seconds)
2025-05-11 14:40:40,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:40:54,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4172.02588 ± 253.868
2025-05-11 14:40:54,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3923.2546, 4457.5845, 4257.8535, 3742.5037, 4546.9604, 4364.73, 4011.9067, 4331.3936, 4194.781, 3889.2908]
2025-05-11 14:40:54,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:40:54,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 17 minutes, 3 seconds)
2025-05-11 14:43:45,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:43:59,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4060.07422 ± 613.935
2025-05-11 14:43:59,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4537.474, 4393.2646, 4080.745, 2326.9006, 4316.5903, 4572.5645, 4297.081, 4041.151, 3880.8992, 4154.075]
2025-05-11 14:43:59,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:43:59,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 13 minutes, 59 seconds)
2025-05-11 14:46:51,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:47:05,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4282.09619 ± 155.572
2025-05-11 14:47:05,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4388.45, 4366.828, 4030.9297, 4013.3767, 4246.073, 4348.416, 4518.831, 4356.7935, 4377.703, 4173.56]
2025-05-11 14:47:05,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:47:05,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 11 minutes)
2025-05-11 14:49:55,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:50:10,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4189.37988 ± 282.961
2025-05-11 14:50:10,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4321.8677, 3447.7034, 4383.6055, 4300.303, 4231.7964, 4376.47, 3956.295, 4407.3604, 4376.2236, 4092.1724]
2025-05-11 14:50:10,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:50:10,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 7 minutes, 50 seconds)
2025-05-11 14:53:01,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:53:15,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3935.28052 ± 759.911
2025-05-11 14:53:15,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3563.1143, 4053.5134, 4187.3203, 3928.0503, 4612.391, 4510.3477, 4354.7466, 4185.932, 1816.6904, 4140.6997]
2025-05-11 14:53:15,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:53:15,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 4 minutes, 47 seconds)
2025-05-11 14:56:06,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:56:20,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3743.84570 ± 1180.745
2025-05-11 14:56:20,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4161.422, 5005.3457, 1631.3541, 1462.3235, 4745.235, 3403.4502, 4655.539, 4284.565, 4250.436, 3838.7854]
2025-05-11 14:56:20,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:56:20,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 1 minute, 42 seconds)
2025-05-11 14:59:11,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 14:59:25,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4167.89893 ± 407.587
2025-05-11 14:59:25,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4228.8228, 4343.605, 4252.0137, 4174.372, 4553.233, 4497.0996, 4343.457, 3015.4495, 4164.196, 4106.7407]
2025-05-11 14:59:25,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:59:25,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 58 minutes, 37 seconds)
2025-05-11 15:02:16,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:02:29,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4302.93262 ± 240.683
2025-05-11 15:02:29,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4469.871, 4120.364, 4073.2507, 4430.7954, 3970.8538, 4298.56, 4098.579, 4776.93, 4228.8213, 4561.3066]
2025-05-11 15:02:29,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:02:29,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 55 minutes, 28 seconds)
2025-05-11 15:05:22,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:05:36,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4316.99512 ± 352.096
2025-05-11 15:05:36,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3851.4055, 4538.6406, 4527.426, 4730.559, 4427.3735, 3698.6304, 4421.3984, 4268.033, 3936.0964, 4770.385]
2025-05-11 15:05:36,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:05:36,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 52 minutes, 30 seconds)
2025-05-11 15:08:29,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:08:43,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3986.40771 ± 617.037
2025-05-11 15:08:43,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4238.113, 4088.552, 4032.2134, 2217.4172, 4394.5957, 4284.934, 3765.5005, 4432.5264, 4243.2373, 4166.9883]
2025-05-11 15:08:43,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:08:43,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 49 minutes, 30 seconds)
2025-05-11 15:11:35,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:11:49,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4207.10059 ± 213.048
2025-05-11 15:11:49,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3838.2964, 4331.626, 4370.1284, 4184.884, 3851.1143, 4219.084, 4328.8193, 4077.257, 4515.432, 4354.3623]
2025-05-11 15:11:49,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:11:49,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 46 minutes, 28 seconds)
2025-05-11 15:14:41,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:14:56,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4316.37207 ± 310.722
2025-05-11 15:14:56,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4452.4, 4519.277, 4350.802, 4338.2334, 4536.2393, 4233.35, 3968.2646, 4195.629, 3688.4602, 4881.065]
2025-05-11 15:14:56,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:14:56,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 43 minutes, 25 seconds)
2025-05-11 15:17:47,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:18:01,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3694.30347 ± 897.788
2025-05-11 15:18:01,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2784.83, 4399.9644, 2097.0388, 4796.6313, 3419.1558, 4351.9688, 4172.682, 4304.2197, 2424.631, 4191.9136]
2025-05-11 15:18:01,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:18:01,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 40 minutes, 23 seconds)
2025-05-11 15:20:53,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:21:07,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4134.15967 ± 309.530
2025-05-11 15:21:07,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4299.981, 4322.6646, 3840.5884, 4034.5168, 3975.281, 4577.1606, 3863.5776, 4003.069, 3725.1357, 4699.624]
2025-05-11 15:21:07,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:21:07,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 37 minutes, 13 seconds)
2025-05-11 15:23:58,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:24:12,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4155.76172 ± 781.682
2025-05-11 15:24:12,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4011.8909, 4562.1743, 4792.2754, 3980.4468, 4565.605, 4366.6836, 4638.8237, 1928.0619, 4354.3457, 4357.306]
2025-05-11 15:24:12,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:24:12,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 34 minutes, 3 seconds)
2025-05-11 15:27:03,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:27:17,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4238.02637 ± 256.610
2025-05-11 15:27:17,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4666.7446, 4542.3213, 4245.4385, 4206.3994, 3980.7085, 3697.014, 4195.0957, 4339.4634, 4184.5503, 4322.524]
2025-05-11 15:27:17,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:27:17,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 56 seconds)
2025-05-11 15:30:08,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:30:22,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3995.77271 ± 734.534
2025-05-11 15:30:22,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4298.355, 2189.92, 4605.1377, 4258.428, 3929.6797, 4116.3354, 3109.8977, 4711.242, 4431.4404, 4307.291]
2025-05-11 15:30:22,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:30:23,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 48 seconds)
2025-05-11 15:33:13,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:33:27,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4413.71875 ± 349.039
2025-05-11 15:33:27,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4303.7876, 4753.3857, 4359.065, 4796.549, 3651.4834, 4621.9287, 4359.9663, 4856.469, 4360.152, 4074.4004]
2025-05-11 15:33:27,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:33:27,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4413.72) for latency MM1Queue_a033_s075
2025-05-11 15:33:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 15:33:27,931 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 15:33:27,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 24 minutes, 41 seconds)
2025-05-11 15:36:19,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:36:33,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4478.63916 ± 193.821
2025-05-11 15:36:33,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4678.7534, 4601.134, 4418.257, 4410.6826, 4204.24, 4239.078, 4641.3315, 4840.0283, 4405.6523, 4347.2354]
2025-05-11 15:36:33,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:36:33,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4478.64) for latency MM1Queue_a033_s075
2025-05-11 15:36:33,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 15:36:33,179 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem4/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-11 15:36:33,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 21 minutes, 36 seconds)
2025-05-11 15:39:24,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:39:38,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4354.74072 ± 261.701
2025-05-11 15:39:38,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4394.7534, 4241.8906, 3859.547, 4551.1714, 4675.724, 4300.249, 4726.9375, 4201.25, 4061.0947, 4534.787]
2025-05-11 15:39:38,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:39:38,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 30 seconds)
2025-05-11 15:42:29,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:42:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4297.95801 ± 211.578
2025-05-11 15:42:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4378.6733, 4219.178, 3858.201, 4276.2656, 4518.459, 4000.9592, 4511.267, 4369.0977, 4528.7183, 4318.7627]
2025-05-11 15:42:43,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:42:43,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 25 seconds)
2025-05-11 15:45:37,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:45:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4246.19189 ± 286.864
2025-05-11 15:45:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3931.6702, 4533.238, 4478.317, 4049.989, 4011.081, 3962.3623, 4383.6035, 4492.535, 3905.9648, 4713.155]
2025-05-11 15:45:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:45:51,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 23 seconds)
2025-05-11 15:48:48,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:49:03,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4310.98584 ± 295.599
2025-05-11 15:49:03,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4252.1055, 4308.943, 4282.187, 4375.692, 4804.342, 4508.469, 4674.263, 3720.0703, 4145.49, 4038.2961]
2025-05-11 15:49:03,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:49:03,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 21 seconds)
2025-05-11 15:51:58,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:52:12,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4289.52832 ± 344.159
2025-05-11 15:52:12,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4051.5337, 4502.84, 4647.899, 4315.8457, 4543.7646, 4301.387, 4306.595, 3399.4365, 4593.781, 4232.1987]
2025-05-11 15:52:12,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:52:12,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 15 seconds)
2025-05-11 15:55:03,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:55:17,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3511.29932 ± 1150.867
2025-05-11 15:55:17,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4040.2712, 1106.8986, 1724.9406, 4538.3354, 4347.4526, 4019.0103, 3822.1917, 2861.2786, 4067.9104, 4584.7046]
2025-05-11 15:55:17,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:55:18,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 7 seconds)
2025-05-11 15:58:09,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-11 15:58:23,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4127.35791 ± 908.324
2025-05-11 15:58:23,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4621.4, 4053.4622, 4660.3193, 4577.176, 4462.385, 1471.6597, 4672.8945, 4271.636, 4190.2373, 4292.405]
2025-05-11 15:58:23,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:58:23,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1251 [DEBUG]: Training session finished
