2025-05-09 14:41:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-05-09 14:41:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16
2025-05-09 14:41:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x751bf7bc8c70>}
2025-05-09 14:41:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1111 [DEBUG]: using device: cpu
2025-05-09 14:41:13,628 baseline-bpql-noisy-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-05-09 14:41:13,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-09 14:41:13,634 baseline-bpql-noisy-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=113, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-09 14:41:13,634 baseline-bpql-noisy-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
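Aside (annotation, not part of the log): the printed `in_features` values can be cross-checked against the environment dimensions. Assuming the standard HalfCheetah spaces (17-dim observation, 6-dim action) and that the `mem16` suffix means a buffer of the 16 most recent actions is appended to the observation, the sizes work out as follows:

```python
# Sanity check of the printed layer sizes (an assumption, not stated in the log):
# HalfCheetah has a 17-dim observation and 6-dim action space, and "mem16"
# presumably appends the 16 most recent actions to the observation.
OBS_DIM, ACT_DIM, ACTION_BUFFER = 17, 6, 16

pi_in = OBS_DIM + ACTION_BUFFER * ACT_DIM  # policy input after Flatten
q_in = OBS_DIM + ACT_DIM                   # critic input: state ++ action

assert pi_in == 113  # matches Linear(in_features=113, ...) in the pi network
assert q_in == 23    # matches Linear(in_features=23, ...) in the q network
```

This is consistent with the critic (the BPQL "beta" critic) conditioning on an undelayed state-action pair while the policy conditions on the action-augmented state.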
2025-05-09 14:41:13,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-09 14:41:13,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-09 14:43:50,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:44:08,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -498.28998 ± 84.487
2025-05-09 14:44:08,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-497.14627, -399.49185, -569.6995, -576.9049, -359.34265, -522.1052, -585.259, -367.8975, -549.6737, -555.3789]
2025-05-09 14:44:08,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:44:08,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (-498.29) for latency MM1Queue_a033_s075
2025-05-09 14:44:08,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:44:08,423 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:44:08,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 48 minutes, 3 seconds)
2025-05-09 14:47:02,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:47:19,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -207.15295 ± 59.995
2025-05-09 14:47:19,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-268.50156, -209.311, -190.40417, -224.74016, -288.88126, -247.38751, -56.032642, -202.57545, -189.51959, -194.17636]
2025-05-09 14:47:19,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:47:19,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (-207.15) for latency MM1Queue_a033_s075
2025-05-09 14:47:19,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:47:19,659 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:47:19,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 58 minutes, 44 seconds)
2025-05-09 14:50:07,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:50:24,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -52.50746 ± 75.024
2025-05-09 14:50:24,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-92.49939, -2.365019, 8.519308, -165.72032, 40.139874, -104.45604, 59.40088, -45.31247, -160.58311, -62.198334]
2025-05-09 14:50:24,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:50:24,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (-52.51) for latency MM1Queue_a033_s075
2025-05-09 14:50:24,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:50:24,869 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:50:24,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 56 minutes, 56 seconds)
2025-05-09 14:53:12,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:53:29,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 66.99983 ± 159.142
2025-05-09 14:53:29,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-107.15287, 405.20908, 225.197, 26.883701, -60.900234, -45.737877, 23.948658, 183.58842, -114.57722, 133.53958]
2025-05-09 14:53:29,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:53:29,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (67.00) for latency MM1Queue_a033_s075
2025-05-09 14:53:29,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:53:29,931 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:53:29,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 54 minutes, 26 seconds)
2025-05-09 14:56:17,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:56:34,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 266.64014 ± 182.078
2025-05-09 14:56:34,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [373.2388, -59.709705, 193.59592, 387.14682, 311.7951, 270.84518, 200.83347, 295.05283, 52.052277, 641.55054]
2025-05-09 14:56:34,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:56:34,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (266.64) for latency MM1Queue_a033_s075
2025-05-09 14:56:34,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:56:34,846 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:56:34,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 51 minutes, 39 seconds)
2025-05-09 14:59:22,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:59:39,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 456.95425 ± 280.742
2025-05-09 14:59:39,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [70.25264, 624.3068, 883.7054, 752.1331, 280.78214, 133.29892, 86.43553, 687.9206, 472.4766, 578.2307]
2025-05-09 14:59:39,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 14:59:39,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (456.95) for latency MM1Queue_a033_s075
2025-05-09 14:59:39,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 14:59:39,753 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:59:39,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 51 minutes, 49 seconds)
2025-05-09 15:02:27,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:02:45,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 572.00989 ± 362.203
2025-05-09 15:02:45,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [385.6228, 1092.1106, 235.3504, 234.61313, 304.17957, 215.72932, 684.7788, 397.42004, 1010.97156, 1159.3221]
2025-05-09 15:02:45,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:02:45,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (572.01) for latency MM1Queue_a033_s075
2025-05-09 15:02:45,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:02:45,303 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:02:45,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 46 minutes, 56 seconds)
2025-05-09 15:05:33,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:05:50,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 690.32721 ± 219.453
2025-05-09 15:05:50,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [524.4594, 924.63086, 606.5904, 1136.6534, 955.2597, 516.4546, 482.85736, 587.52155, 675.25604, 493.58832]
2025-05-09 15:05:50,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:05:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (690.33) for latency MM1Queue_a033_s075
2025-05-09 15:05:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:05:50,846 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:05:50,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 43 minutes, 58 seconds)
2025-05-09 15:08:39,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:08:56,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1012.67297 ± 197.867
2025-05-09 15:08:56,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [932.8129, 900.66144, 1183.8423, 1497.3114, 1139.0234, 954.40106, 882.46375, 864.4643, 980.6928, 791.0567]
2025-05-09 15:08:56,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:08:56,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1012.67) for latency MM1Queue_a033_s075
2025-05-09 15:08:56,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:08:56,679 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:08:56,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 41 minutes, 6 seconds)
2025-05-09 15:11:44,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:12:02,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1114.61377 ± 197.161
2025-05-09 15:12:02,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1207.6503, 1496.7865, 936.03754, 985.27673, 953.03723, 973.1526, 1093.8975, 1057.2568, 1458.9009, 984.14197]
2025-05-09 15:12:02,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:12:02,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1114.61) for latency MM1Queue_a033_s075
2025-05-09 15:12:02,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:12:02,434 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:12:02,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 38 minutes, 16 seconds)
2025-05-09 15:14:50,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:15:08,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1231.85718 ± 281.289
2025-05-09 15:15:08,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1576.2172, 946.09705, 1157.321, 1609.5842, 893.025, 1454.3942, 1087.4271, 1071.5737, 1600.675, 922.2572]
2025-05-09 15:15:08,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:15:08,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1231.86) for latency MM1Queue_a033_s075
2025-05-09 15:15:08,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:15:08,142 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:15:08,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 35 minutes, 25 seconds)
2025-05-09 15:17:56,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:18:13,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1238.70703 ± 238.815
2025-05-09 15:18:13,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1302.5583, 1673.3215, 1589.3623, 1224.2397, 1122.931, 830.81, 1306.7035, 1142.1732, 1198.9149, 996.05615]
2025-05-09 15:18:13,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:18:13,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1238.71) for latency MM1Queue_a033_s075
2025-05-09 15:18:13,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:18:13,945 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:18:13,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 32 minutes, 24 seconds)
2025-05-09 15:21:02,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:21:19,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1144.67102 ± 81.407
2025-05-09 15:21:19,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1126.256, 1306.532, 1274.1113, 1095.0192, 1106.1748, 1063.9076, 1099.243, 1063.9569, 1198.6707, 1112.8381]
2025-05-09 15:21:19,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:21:19,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 29 minutes, 21 seconds)
2025-05-09 15:24:08,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:24:26,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1329.44336 ± 307.621
2025-05-09 15:24:26,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [999.14935, 1889.9518, 1014.36383, 1269.5145, 1197.6321, 1445.2662, 1851.0988, 1037.2804, 1173.5469, 1416.6294]
2025-05-09 15:24:26,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:24:26,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1329.44) for latency MM1Queue_a033_s075
2025-05-09 15:24:26,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:24:26,370 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:24:26,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 26 minutes, 30 seconds)
2025-05-09 15:27:14,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:27:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1393.51794 ± 302.971
2025-05-09 15:27:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1692.442, 1692.3673, 1232.4365, 1312.3757, 2082.9846, 1239.6326, 1110.5487, 1145.955, 1187.7448, 1238.6927]
2025-05-09 15:27:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:27:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1393.52) for latency MM1Queue_a033_s075
2025-05-09 15:27:32,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:27:32,124 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:27:32,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 23 minutes, 24 seconds)
2025-05-09 15:30:20,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:30:38,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1476.14172 ± 379.763
2025-05-09 15:30:38,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1604.8262, 2000.8239, 1025.0491, 1023.40155, 1283.8466, 1269.1609, 1807.0621, 1488.2526, 2132.179, 1126.8156]
2025-05-09 15:30:38,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:30:38,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1476.14) for latency MM1Queue_a033_s075
2025-05-09 15:30:38,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:30:38,407 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:30:38,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 20 minutes, 28 seconds)
2025-05-09 15:33:27,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:33:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1537.49072 ± 349.471
2025-05-09 15:33:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1131.7599, 1411.2836, 1276.4553, 1912.428, 1103.0677, 2136.3765, 1238.6027, 1914.9673, 1761.9491, 1488.0165]
2025-05-09 15:33:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:33:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1537.49) for latency MM1Queue_a033_s075
2025-05-09 15:33:44,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:33:44,403 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:33:44,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 17 minutes, 25 seconds)
2025-05-09 15:36:33,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:36:50,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2006.84155 ± 563.571
2025-05-09 15:36:50,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1805.7549, 1797.1558, 2813.3489, 1402.7228, 1628.0236, 1486.3733, 2023.7487, 2873.4675, 1453.529, 2784.2917]
2025-05-09 15:36:50,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:36:50,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2006.84) for latency MM1Queue_a033_s075
2025-05-09 15:36:50,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:36:50,489 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:36:50,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 14 minutes, 25 seconds)
2025-05-09 15:39:40,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:39:57,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1646.25122 ± 290.986
2025-05-09 15:39:57,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1389.3251, 1909.8516, 1712.6968, 1250.7936, 1646.7701, 1502.7561, 1321.4119, 1589.8076, 1892.3906, 2246.707]
2025-05-09 15:39:57,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:39:57,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 11 minutes, 24 seconds)
2025-05-09 15:42:45,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:43:03,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1873.24646 ± 501.751
2025-05-09 15:43:03,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1419.1576, 2854.8086, 2004.7891, 2032.9137, 1802.2592, 1675.7444, 1228.8638, 2394.0232, 1168.2145, 2151.6907]
2025-05-09 15:43:03,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:43:03,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 8 minutes, 16 seconds)
2025-05-09 15:45:45,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:46:03,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2018.01038 ± 463.460
2025-05-09 15:46:03,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2028.9392, 2056.2847, 2606.906, 1300.0206, 2432.175, 2300.3848, 1694.9843, 2223.0918, 2385.9238, 1151.3945]
2025-05-09 15:46:03,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:46:03,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2018.01) for latency MM1Queue_a033_s075
2025-05-09 15:46:03,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 15:46:03,250 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:46:03,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 3 minutes, 32 seconds)
2025-05-09 15:48:43,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:49:00,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1778.69409 ± 524.632
2025-05-09 15:49:00,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1888.5388, 1370.1708, 2649.3262, 1809.9908, 2162.1013, 1459.6729, 1027.7495, 2621.037, 1353.6548, 1444.7001]
2025-05-09 15:49:00,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:49:00,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 58 minutes, 13 seconds)
2025-05-09 15:51:40,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:51:58,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1785.43494 ± 503.982
2025-05-09 15:51:58,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2129.877, 1172.9681, 1403.6343, 1173.0321, 2750.949, 1755.2596, 1971.4645, 1356.0684, 1745.9366, 2395.1604]
2025-05-09 15:51:58,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:51:58,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 52 minutes, 59 seconds)
2025-05-09 15:54:38,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:54:55,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1579.71790 ± 506.370
2025-05-09 15:54:55,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1703.0062, 1218.2866, 1314.5638, 1355.8127, 1263.505, 3023.4639, 1285.4729, 1620.2131, 1421.357, 1591.4967]
2025-05-09 15:54:55,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:54:55,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 47 minutes, 31 seconds)
2025-05-09 15:57:35,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:57:53,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1824.63708 ± 379.499
2025-05-09 15:57:53,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1926.6355, 1779.4314, 1829.5983, 2667.587, 1901.605, 1381.3381, 1318.334, 2092.2683, 1408.4796, 1941.0957]
2025-05-09 15:57:53,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 15:57:53,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 42 minutes, 28 seconds)
2025-05-09 16:00:32,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:00:50,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1578.75525 ± 261.924
2025-05-09 16:00:50,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1298.1462, 1828.8824, 1559.0156, 1291.3689, 1878.3527, 1533.6165, 1363.7462, 1456.8398, 2120.9346, 1456.6494]
2025-05-09 16:00:50,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:00:50,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 38 minutes, 44 seconds)
2025-05-09 16:03:29,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:03:46,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1498.85925 ± 403.708
2025-05-09 16:03:46,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2361.7742, 1232.4536, 1323.9768, 1148.989, 1240.5061, 1644.5707, 1216.7118, 2152.4817, 1256.8673, 1410.263]
2025-05-09 16:03:46,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:03:46,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 35 minutes, 39 seconds)
2025-05-09 16:06:26,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:06:44,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2095.04932 ± 527.204
2025-05-09 16:06:44,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2760.8176, 2083.3079, 1502.1426, 2855.4438, 1548.161, 2057.7246, 1861.3873, 1967.528, 1440.2975, 2873.685]
2025-05-09 16:06:44,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:06:44,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2095.05) for latency MM1Queue_a033_s075
2025-05-09 16:06:44,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 16:06:44,102 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:06:44,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 32 minutes, 36 seconds)
2025-05-09 16:09:24,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:09:41,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1763.16675 ± 479.082
2025-05-09 16:09:41,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1319.8608, 1172.6311, 2086.1113, 1787.638, 1253.2788, 1851.3354, 2371.454, 2685.8208, 1734.266, 1369.2725]
2025-05-09 16:09:41,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:09:41,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 29 minutes, 39 seconds)
2025-05-09 16:12:21,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:12:38,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1845.99548 ± 537.642
2025-05-09 16:12:38,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1291.5248, 1773.692, 1444.2213, 2679.0667, 2384.403, 2170.579, 1275.8762, 1284.2155, 1540.2281, 2616.148]
2025-05-09 16:12:38,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:12:38,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 26 minutes, 40 seconds)
2025-05-09 16:15:18,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:15:36,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1690.56506 ± 349.517
2025-05-09 16:15:36,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2182.5815, 1267.2898, 2237.0352, 1918.7853, 1893.9453, 1308.0925, 1780.9688, 1643.207, 1310.3602, 1363.3849]
2025-05-09 16:15:36,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:15:36,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 23 minutes, 49 seconds)
2025-05-09 16:18:16,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:18:33,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1700.59058 ± 502.298
2025-05-09 16:18:33,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1507.7223, 1245.0085, 1511.4528, 1462.5492, 1593.019, 1206.5563, 1356.671, 2862.9724, 2344.5093, 1915.4459]
2025-05-09 16:18:33,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:18:33,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 20 minutes, 56 seconds)
2025-05-09 16:21:14,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:21:31,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1791.27759 ± 575.042
2025-05-09 16:21:31,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2969.9456, 1465.0048, 1164.1172, 1280.9047, 2308.7483, 1466.0398, 1210.7002, 1678.4958, 1934.6503, 2434.1672]
2025-05-09 16:21:31,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:21:31,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 18 minutes, 11 seconds)
2025-05-09 16:24:11,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:24:28,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2385.64331 ± 617.218
2025-05-09 16:24:28,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1916.4877, 1883.5038, 2543.6345, 3161.262, 1780.5155, 2796.3845, 1188.0924, 2950.5508, 2974.1663, 2661.8384]
2025-05-09 16:24:28,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:24:28,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2385.64) for latency MM1Queue_a033_s075
2025-05-09 16:24:28,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 16:24:28,729 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:24:28,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 15 minutes, 11 seconds)
2025-05-09 16:27:08,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:27:25,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1844.79333 ± 527.708
2025-05-09 16:27:25,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1723.3794, 2179.6643, 1226.5903, 2854.2434, 1493.9633, 2485.407, 2128.3833, 1774.5529, 1400.9576, 1180.7933]
2025-05-09 16:27:25,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:27:25,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 12 minutes, 13 seconds)
2025-05-09 16:30:05,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:30:23,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2526.14941 ± 872.602
2025-05-09 16:30:23,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3138.1812, 3519.7502, 2694.2993, 1276.2831, 2920.2112, 3457.9453, 3474.9688, 1772.2296, 1304.3685, 1703.2582]
2025-05-09 16:30:23,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:30:23,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2526.15) for latency MM1Queue_a033_s075
2025-05-09 16:30:23,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 16:30:23,123 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:30:23,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 9 minutes, 12 seconds)
2025-05-09 16:33:02,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:33:20,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1775.90076 ± 550.597
2025-05-09 16:33:20,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2531.4194, 1247.1693, 2376.6577, 2158.3481, 2525.6423, 1877.3396, 1309.0212, 1152.8325, 1399.381, 1181.1953]
2025-05-09 16:33:20,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:33:20,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 6 minutes, 14 seconds)
2025-05-09 16:36:00,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:36:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1797.39087 ± 422.643
2025-05-09 16:36:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1363.6895, 1444.0941, 1974.4335, 1282.7583, 2386.8828, 2055.6743, 1654.3657, 2509.612, 1961.435, 1340.9623]
2025-05-09 16:36:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:36:17,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 3 minutes, 7 seconds)
2025-05-09 16:38:57,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:39:14,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1957.93945 ± 748.449
2025-05-09 16:39:14,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2041.0343, 3195.8806, 1060.8926, 2593.8608, 1524.1653, 1408.9415, 3000.34, 1051.7544, 1385.8735, 2316.6516]
2025-05-09 16:39:14,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:39:14,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 10 seconds)
2025-05-09 16:41:54,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:42:12,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1791.34961 ± 388.701
2025-05-09 16:42:12,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1610.0035, 1589.2805, 1463.5507, 2533.2053, 1868.6127, 2275.2944, 1277.6575, 1456.978, 2175.0781, 1663.8352]
2025-05-09 16:42:12,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:42:12,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 57 minutes, 14 seconds)
2025-05-09 16:44:52,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:45:09,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2303.74023 ± 624.873
2025-05-09 16:45:09,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2711.3503, 1813.962, 3023.7026, 1489.1268, 2477.794, 3054.058, 1508.8811, 3154.6428, 1752.4441, 2051.4397]
2025-05-09 16:45:09,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:45:09,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 54 minutes, 19 seconds)
2025-05-09 16:47:49,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:48:06,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2471.75342 ± 671.483
2025-05-09 16:48:06,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2333.457, 3301.5857, 1773.4808, 3272.662, 2932.4946, 2180.0752, 2216.08, 1367.9819, 1958.3192, 3381.399]
2025-05-09 16:48:06,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:48:06,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 51 minutes, 21 seconds)
2025-05-09 16:50:46,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:51:03,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2195.48535 ± 562.835
2025-05-09 16:51:03,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3103.322, 2880.3362, 2067.9714, 2462.6833, 1690.9761, 1927.0227, 1839.6375, 1844.1515, 1301.7429, 2837.0093]
2025-05-09 16:51:03,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:51:03,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 48 minutes, 22 seconds)
2025-05-09 16:53:43,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:54:00,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2468.99780 ± 743.766
2025-05-09 16:54:00,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2379.0605, 3252.5544, 1312.1077, 3631.5527, 1830.0979, 2988.716, 1588.2798, 1893.633, 2746.9495, 3067.027]
2025-05-09 16:54:00,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:54:00,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 45 minutes, 23 seconds)
2025-05-09 16:56:40,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:56:58,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2364.11865 ± 833.814
2025-05-09 16:56:58,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3465.3186, 3081.7, 3532.7554, 1909.4478, 3249.0696, 1264.4103, 1406.7117, 2183.837, 1858.4645, 1689.4723]
2025-05-09 16:56:58,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:56:58,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 42 minutes, 28 seconds)
2025-05-09 16:59:38,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:59:55,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2463.51245 ± 925.732
2025-05-09 16:59:55,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3376.3225, 3280.2737, 3695.8574, 1882.4972, 1596.7067, 1261.6931, 3256.7317, 3175.8333, 1892.75, 1216.4598]
2025-05-09 16:59:55,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 16:59:55,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 39 minutes, 32 seconds)
2025-05-09 17:02:35,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:02:52,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2235.19287 ± 693.419
2025-05-09 17:02:52,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1482.4674, 3108.4617, 1987.6058, 3269.0977, 2176.5925, 1605.1533, 1554.114, 3160.681, 1493.4567, 2514.2961]
2025-05-09 17:02:52,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:02:52,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 36 minutes, 35 seconds)
2025-05-09 17:05:32,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:05:50,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2449.39600 ± 852.610
2025-05-09 17:05:50,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2447.9587, 2066.1294, 1187.1731, 1285.6284, 1530.1403, 2754.5085, 3145.334, 3273.4045, 3761.866, 3041.8176]
2025-05-09 17:05:50,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:05:50,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 33 minutes, 38 seconds)
2025-05-09 17:08:29,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:08:47,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2685.79980 ± 1067.017
2025-05-09 17:08:47,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1283.7316, 3742.6345, 1862.735, 3762.4094, 1479.47, 1752.6703, 3938.208, 3436.69, 3795.1526, 1804.2972]
2025-05-09 17:08:47,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:08:47,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2685.80) for latency MM1Queue_a033_s075
2025-05-09 17:08:47,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 17:08:47,197 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:08:47,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 30 minutes, 40 seconds)
2025-05-09 17:11:26,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:11:44,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2157.31763 ± 640.663
2025-05-09 17:11:44,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1214.3102, 2537.367, 2048.0127, 1796.9978, 1798.9978, 2681.197, 2787.858, 2139.2275, 3308.2876, 1260.9207]
2025-05-09 17:11:44,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:11:44,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 27 minutes, 38 seconds)
2025-05-09 17:14:23,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:14:41,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2938.77612 ± 1171.172
2025-05-09 17:14:41,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2625.3462, 1342.7643, 4531.779, 4369.062, 2462.3022, 4020.4773, 2664.2627, 1311.1062, 1946.5779, 4114.084]
2025-05-09 17:14:41,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:14:41,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2938.78) for latency MM1Queue_a033_s075
2025-05-09 17:14:41,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 17:14:41,025 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:14:41,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 24 minutes, 35 seconds)
2025-05-09 17:17:20,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:17:37,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1928.54041 ± 661.250
2025-05-09 17:17:37,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1550.4814, 3641.09, 2349.867, 1933.5442, 1801.087, 2219.9026, 1490.4178, 1512.8564, 1193.1993, 1592.9581]
2025-05-09 17:17:37,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:17:37,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 21 minutes, 34 seconds)
2025-05-09 17:20:17,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:20:34,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3012.39453 ± 1059.357
2025-05-09 17:20:34,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2051.6462, 3372.901, 4248.356, 3838.9683, 2151.3997, 1894.4202, 3914.1577, 1225.5216, 3013.7207, 4412.8525]
2025-05-09 17:20:34,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:20:34,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3012.39) for latency MM1Queue_a033_s075
2025-05-09 17:20:34,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 17:20:34,537 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:20:34,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 18 minutes, 33 seconds)
2025-05-09 17:23:13,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:23:31,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1886.82349 ± 376.733
2025-05-09 17:23:31,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2696.4739, 2062.094, 1829.2616, 1764.7655, 1894.2567, 1685.5842, 1942.5314, 1547.1813, 2224.1187, 1221.967]
2025-05-09 17:23:31,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:23:31,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 15 minutes, 34 seconds)
2025-05-09 17:26:10,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:26:28,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2510.62622 ± 1017.396
2025-05-09 17:26:28,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [659.64514, 3309.9795, 2171.7363, 1674.6499, 1669.605, 2729.7693, 2575.2168, 4112.783, 3952.2068, 2250.6704]
2025-05-09 17:26:28,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:26:28,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 12 minutes, 38 seconds)
2025-05-09 17:29:07,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:29:25,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2647.65649 ± 1076.024
2025-05-09 17:29:25,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1809.9805, 1699.2062, 3969.7622, 1498.2467, 2826.1094, 2047.6262, 4347.355, 1270.677, 3139.7825, 3867.818]
2025-05-09 17:29:25,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:29:25,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 9 minutes, 41 seconds)
2025-05-09 17:32:04,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:32:22,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3141.79004 ± 1272.685
2025-05-09 17:32:22,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2247.1123, 4343.9546, 2041.1042, 2757.7502, 4134.996, 4304.5977, 4472.165, 1267.1768, 1363.1808, 4485.861]
2025-05-09 17:32:22,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:32:22,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3141.79) for latency MM1Queue_a033_s075
2025-05-09 17:32:22,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 17:32:22,188 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:32:22,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 6 minutes, 45 seconds)
2025-05-09 17:35:01,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:35:19,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2607.56592 ± 1004.888
2025-05-09 17:35:19,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3098.2566, 1523.2045, 1596.3257, 1339.146, 1995.5219, 3843.6514, 3056.4854, 3881.8667, 3915.5024, 1825.7006]
2025-05-09 17:35:19,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:35:19,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 3 minutes, 50 seconds)
2025-05-09 17:37:58,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:38:16,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3106.14136 ± 954.612
2025-05-09 17:38:16,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2855.5454, 3452.4187, 3919.0493, 2121.731, 1791.6207, 2308.8896, 4320.5083, 3359.3164, 4718.8945, 2213.4407]
2025-05-09 17:38:16,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:38:16,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 54 seconds)
2025-05-09 17:40:55,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:41:12,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2184.13721 ± 525.737
2025-05-09 17:41:12,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1647.844, 2540.7024, 1488.3015, 1900.9081, 2509.063, 1857.8759, 1620.1031, 2841.5476, 3073.5393, 2361.4885]
2025-05-09 17:41:12,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:41:12,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 57 minutes, 56 seconds)
2025-05-09 17:43:52,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:44:09,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3339.70386 ± 709.565
2025-05-09 17:44:09,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4152.339, 4059.8652, 2342.257, 3608.4375, 2220.7244, 4099.787, 2752.8132, 2802.5798, 3897.6501, 3460.5845]
2025-05-09 17:44:09,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:44:09,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3339.70) for latency MM1Queue_a033_s075
2025-05-09 17:44:09,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 17:44:09,878 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:44:09,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 54 minutes, 59 seconds)
2025-05-09 17:46:49,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:47:06,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2835.82959 ± 1260.924
2025-05-09 17:47:06,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1912.6088, 3110.409, 1839.9406, 4832.9014, 1452.6759, 1819.3743, 3269.5867, 4574.9233, 4155.8125, 1390.0653]
2025-05-09 17:47:06,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:47:06,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 52 minutes, 2 seconds)
2025-05-09 17:49:46,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:50:03,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3092.42822 ± 1073.709
2025-05-09 17:50:03,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4011.8672, 4053.0308, 1511.3448, 3932.8796, 1203.6842, 2018.664, 3513.6746, 3944.782, 3991.2566, 2743.0999]
2025-05-09 17:50:03,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:50:03,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 49 minutes, 4 seconds)
2025-05-09 17:52:43,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:53:00,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2962.93359 ± 1123.438
2025-05-09 17:53:00,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3430.4573, 3796.8674, 2592.8774, 3321.5815, 4785.9756, 1448.9595, 4326.985, 1786.7887, 1367.5461, 2771.2961]
2025-05-09 17:53:00,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:53:00,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 46 minutes, 9 seconds)
2025-05-09 17:55:40,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:55:57,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2064.87524 ± 894.489
2025-05-09 17:55:57,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3165.6565, 4011.1516, 1765.3708, 1324.9701, 1671.8085, 2850.4792, 1255.5698, 1314.6793, 1541.0786, 1747.9879]
2025-05-09 17:55:57,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:55:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 43 minutes, 14 seconds)
2025-05-09 17:58:37,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:58:54,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2760.04004 ± 733.911
2025-05-09 17:58:54,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1927.058, 3497.825, 3242.3948, 2239.2393, 3989.8823, 1403.9126, 3205.295, 2524.2158, 2922.0986, 2648.4792]
2025-05-09 17:58:54,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 17:58:54,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 40 minutes, 18 seconds)
2025-05-09 18:01:34,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:01:52,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2104.36255 ± 701.542
2025-05-09 18:01:52,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2695.9436, 1556.4613, 1611.619, 1806.4895, 1840.696, 3309.932, 2585.857, 3037.7063, 1203.7849, 1395.135]
2025-05-09 18:01:52,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:01:52,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 37 minutes, 23 seconds)
2025-05-09 18:04:32,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:04:49,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3097.97827 ± 1512.042
2025-05-09 18:04:49,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4585.084, 1320.1339, 1441.0441, 4283.4883, 2096.5347, 1323.6769, 4766.8604, 4722.7983, 1853.7771, 4586.387]
2025-05-09 18:04:49,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:04:49,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 34 minutes, 29 seconds)
2025-05-09 18:07:29,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:07:46,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3268.49487 ± 1156.121
2025-05-09 18:07:46,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1917.9398, 4396.0815, 1988.8395, 4307.299, 4503.6553, 1746.5685, 3942.5784, 4306.3906, 1860.859, 3714.739]
2025-05-09 18:07:46,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:07:46,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 31 minutes, 32 seconds)
2025-05-09 18:10:26,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:10:43,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3172.76709 ± 1302.702
2025-05-09 18:10:43,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4936.27, 1349.2346, 4815.5615, 4436.203, 2426.5195, 2229.77, 3136.276, 2058.6772, 4502.1313, 1837.0277]
2025-05-09 18:10:43,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:10:43,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 28 minutes, 36 seconds)
2025-05-09 18:13:24,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:13:41,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2616.08057 ± 1230.516
2025-05-09 18:13:41,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2098.7314, 2695.5684, 4416.495, 1591.962, 1682.6769, 2927.079, 1126.5967, 4804.947, 1287.3319, 3529.418]
2025-05-09 18:13:41,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:13:41,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 25 minutes, 43 seconds)
2025-05-09 18:16:21,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:16:38,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2345.89087 ± 1102.842
2025-05-09 18:16:38,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1260.4188, 1995.7471, 2250.3923, 2281.4229, 4242.761, 4536.183, 2516.6494, 1604.3142, 1506.3875, 1264.6305]
2025-05-09 18:16:38,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:16:38,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 22 minutes, 44 seconds)
2025-05-09 18:19:18,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:19:35,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3545.35547 ± 1081.610
2025-05-09 18:19:35,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3906.0361, 4441.0396, 1593.8407, 2042.2095, 4424.603, 4414.387, 4512.1694, 2778.3992, 4569.3433, 2771.5264]
2025-05-09 18:19:35,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:19:35,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3545.36) for latency MM1Queue_a033_s075
2025-05-09 18:19:35,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 18:19:35,044 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:19:35,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 19 minutes, 42 seconds)
2025-05-09 18:22:14,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:22:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4437.22803 ± 616.023
2025-05-09 18:22:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4740.7837, 2649.8066, 4762.3477, 4599.976, 4727.588, 4701.1353, 4682.901, 4660.0684, 4187.926, 4659.7515]
2025-05-09 18:22:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:22:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4437.23) for latency MM1Queue_a033_s075
2025-05-09 18:22:31,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 18:22:31,339 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:22:31,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 16 minutes, 40 seconds)
2025-05-09 18:25:10,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:25:27,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3562.32886 ± 1184.786
2025-05-09 18:25:27,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4236.4565, 4455.002, 4921.678, 2032.7268, 2936.294, 3889.515, 2695.9539, 4612.583, 4568.4336, 1274.6456]
2025-05-09 18:25:27,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:25:27,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 13 minutes, 38 seconds)
2025-05-09 18:28:07,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:28:23,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3601.93115 ± 971.703
2025-05-09 18:28:23,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4286.3906, 4799.479, 2042.3103, 4386.7407, 4594.766, 3934.8809, 4144.318, 2925.4014, 2439.6018, 2465.4243]
2025-05-09 18:28:23,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:28:23,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 10 minutes, 34 seconds)
2025-05-09 18:31:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:31:20,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2738.21997 ± 1037.439
2025-05-09 18:31:20,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3822.4556, 1176.4716, 3286.5603, 1977.3083, 1349.3977, 3244.4326, 2237.3813, 2874.4885, 4691.11, 2722.5933]
2025-05-09 18:31:20,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:31:20,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 7 minutes, 34 seconds)
2025-05-09 18:33:59,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:34:15,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3944.37061 ± 1353.724
2025-05-09 18:34:15,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1959.7423, 4144.4536, 5044.391, 4811.7236, 1907.819, 4984.6855, 4764.6895, 1880.1722, 5201.808, 4744.222]
2025-05-09 18:34:15,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:34:15,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 4 minutes, 36 seconds)
2025-05-09 18:36:55,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:37:12,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3518.19482 ± 1211.419
2025-05-09 18:37:12,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1596.6841, 2476.5356, 1741.5103, 2599.2043, 4933.578, 4497.4707, 4426.328, 4656.1274, 3936.014, 4318.4966]
2025-05-09 18:37:12,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:37:12,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 1 minute, 39 seconds)
2025-05-09 18:39:51,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:40:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3272.26636 ± 1191.697
2025-05-09 18:40:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2475.983, 2531.2534, 4594.575, 1221.0983, 4397.3936, 3574.703, 4006.751, 4914.169, 1701.075, 3305.6648]
2025-05-09 18:40:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:40:08,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 58 minutes, 43 seconds)
2025-05-09 18:42:47,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:43:04,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4102.69629 ± 796.231
2025-05-09 18:43:04,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4773.087, 2231.7336, 4651.5312, 4571.128, 4325.841, 3022.9016, 4719.222, 4582.3267, 4219.1284, 3930.062]
2025-05-09 18:43:04,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:43:04,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 55 minutes, 47 seconds)
2025-05-09 18:45:44,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:46:01,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3685.61011 ± 1407.486
2025-05-09 18:46:01,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1218.5011, 3830.3518, 5050.3335, 4848.635, 2488.2598, 4817.7153, 4842.409, 1447.1241, 3456.412, 4856.3574]
2025-05-09 18:46:01,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:46:01,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 52 minutes, 51 seconds)
2025-05-09 18:48:40,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:48:57,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3970.08472 ± 508.666
2025-05-09 18:48:57,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4576.9683, 3758.3164, 3680.438, 4408.177, 3806.4841, 3270.4832, 4625.299, 3675.4082, 4607.38, 3291.8936]
2025-05-09 18:48:57,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:48:57,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 49 minutes, 56 seconds)
2025-05-09 18:51:36,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:51:53,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4093.23755 ± 1154.512
2025-05-09 18:51:53,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4878.8354, 4738.9834, 4844.924, 1552.3395, 2581.944, 4172.7227, 4961.8164, 3266.9443, 5065.474, 4868.391]
2025-05-09 18:51:53,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:51:53,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 47 minutes)
2025-05-09 18:54:33,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:54:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3994.34814 ± 1267.911
2025-05-09 18:54:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4942.843, 4771.0073, 4893.487, 2116.723, 4331.8394, 1217.1665, 4759.424, 4808.142, 4812.4927, 3290.3547]
2025-05-09 18:54:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:54:50,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 44 minutes, 6 seconds)
2025-05-09 18:57:30,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:57:46,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4660.92871 ± 448.104
2025-05-09 18:57:46,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4180.1743, 5134.2534, 4790.7354, 5094.334, 4745.654, 3590.4143, 4453.52, 4881.631, 4959.799, 4778.7715]
2025-05-09 18:57:46,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 18:57:46,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4660.93) for latency MM1Queue_a033_s075
2025-05-09 18:57:46,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-09 18:57:46,721 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc8/noisy-halfcheetah/MM1Queue_a033_s075-bpql-mem16/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:57:46,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 41 minutes, 9 seconds)
2025-05-09 19:00:26,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:00:43,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3797.12891 ± 1419.606
2025-05-09 19:00:43,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1252.0444, 1382.3489, 5104.5024, 4514.286, 4818.1416, 2529.838, 4812.095, 5015.595, 4208.7114, 4333.7266]
2025-05-09 19:00:43,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:00:43,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 38 minutes, 13 seconds)
2025-05-09 19:03:22,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:03:39,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3419.38818 ± 1055.775
2025-05-09 19:03:39,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4670.742, 2002.3368, 3012.7761, 3305.2664, 2303.3672, 4545.9644, 4466.974, 4263.6924, 3890.7397, 1732.0232]
2025-05-09 19:03:39,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:03:39,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 35 minutes, 17 seconds)
2025-05-09 19:06:20,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:06:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3264.23096 ± 1485.610
2025-05-09 19:06:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2081.62, 2845.0244, 3787.6523, 5210.2427, 1353.7719, 5002.9517, 2008.5004, 3881.9797, 5210.379, 1260.189]
2025-05-09 19:06:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:06:37,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 32 minutes, 23 seconds)
2025-05-09 19:09:17,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:09:34,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4272.45508 ± 677.573
2025-05-09 19:09:34,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4895.0293, 4582.566, 4233.71, 4597.876, 2546.015, 4218.963, 4405.556, 4893.4297, 3630.55, 4720.857]
2025-05-09 19:09:34,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:09:34,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 29 minutes, 28 seconds)
2025-05-09 19:12:14,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:12:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3881.12305 ± 1255.472
2025-05-09 19:12:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4646.5273, 3200.7393, 5145.4243, 1373.2528, 4586.0527, 4794.983, 1953.119, 4737.121, 3532.6604, 4841.3516]
2025-05-09 19:12:31,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:12:31,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 26 minutes, 32 seconds)
2025-05-09 19:15:11,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:15:28,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3772.92578 ± 1237.409
2025-05-09 19:15:28,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2674.734, 4032.985, 4415.81, 5259.907, 4374.5044, 5169.8057, 3212.486, 2258.229, 1469.2251, 4861.569]
2025-05-09 19:15:28,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:15:28,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 23 minutes, 36 seconds)
2025-05-09 19:18:09,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:18:25,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4249.43896 ± 1077.764
2025-05-09 19:18:25,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4949.4614, 4285.925, 4732.535, 4770.695, 4957.0747, 1379.4769, 4932.681, 4246.333, 4951.246, 3288.9622]
2025-05-09 19:18:25,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:18:25,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 20 minutes, 40 seconds)
2025-05-09 19:21:06,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:21:22,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4169.09229 ± 1031.030
2025-05-09 19:21:22,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5015.078, 3874.4666, 4916.56, 3369.3904, 4741.4287, 4977.8975, 2987.3596, 5176.788, 1941.1655, 4690.7886]
2025-05-09 19:21:22,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:21:23,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 17 minutes, 43 seconds)
2025-05-09 19:24:03,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:24:19,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3774.62451 ± 981.639
2025-05-09 19:24:19,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4488.3975, 3144.214, 4820.7534, 4923.6807, 4513.856, 2531.7107, 4229.7886, 3022.729, 1949.6726, 4121.4463]
2025-05-09 19:24:19,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:24:19,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 14 minutes, 45 seconds)
2025-05-09 19:27:00,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:27:17,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3780.81519 ± 1383.853
2025-05-09 19:27:17,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1228.8921, 4649.611, 4908.8267, 4894.6353, 2141.632, 4642.9375, 4217.615, 4439.9966, 4911.9956, 1772.0099]
2025-05-09 19:27:17,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:27:17,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 11 minutes, 48 seconds)
2025-05-09 19:29:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:30:14,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4366.00537 ± 807.043
2025-05-09 19:30:14,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4810.6323, 4629.8203, 4972.1772, 4243.6504, 4715.8877, 4847.8066, 4792.7173, 3193.764, 2474.694, 4978.9014]
2025-05-09 19:30:14,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:30:14,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 8 minutes, 51 seconds)
2025-05-09 19:32:55,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:33:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4384.81543 ± 1044.076
2025-05-09 19:33:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5194.669, 4054.116, 4779.244, 3367.9663, 5178.85, 1721.7345, 4516.009, 5052.2534, 4881.041, 5102.272]
2025-05-09 19:33:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:33:11,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 54 seconds)
2025-05-09 19:35:53,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:36:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4274.54541 ± 1180.853
2025-05-09 19:36:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5071.292, 2788.44, 5145.5845, 4723.557, 4861.855, 3825.087, 4700.8594, 5141.733, 1436.0679, 5050.9795]
2025-05-09 19:36:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:36:10,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 57 seconds)
2025-05-09 19:38:51,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:39:08,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3704.13477 ± 991.241
2025-05-09 19:39:08,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4126.5737, 4009.376, 1628.5116, 3279.8198, 4545.624, 2796.8977, 4874.8184, 5084.3203, 3297.5579, 3397.8462]
2025-05-09 19:39:08,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 19:39:08,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1251 [DEBUG]: Training session finished
