2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_9
2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_9
2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x7f7997423da0>}
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,628 baseline-bpql-noisepromille75-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=71, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,628 baseline-bpql-noisepromille75-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:45:34,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:45:40,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -402.79828 ± 44.673
2025-09-14 08:45:40,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-329.4656), np.float32(-409.12115), np.float32(-350.45572), np.float32(-475.04352), np.float32(-454.60773), np.float32(-374.8716), np.float32(-425.67413), np.float32(-367.49945), np.float32(-401.76578), np.float32(-439.478)]
2025-09-14 08:45:40,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:45:40,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-402.80) for latency 9
2025-09-14 08:45:40,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 18 minutes, 35 seconds)
2025-09-14 08:48:12,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:48:19,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -244.64372 ± 46.065
2025-09-14 08:48:19,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-250.71262), np.float32(-238.40279), np.float32(-327.10504), np.float32(-243.16379), np.float32(-266.46045), np.float32(-239.28049), np.float32(-189.88574), np.float32(-298.80594), np.float32(-156.10448), np.float32(-236.5158)]
2025-09-14 08:48:19,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:48:19,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-244.64) for latency 9
2025-09-14 08:48:19,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 17 minutes, 41 seconds)
2025-09-14 08:51:00,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:51:06,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -131.31361 ± 76.921
2025-09-14 08:51:06,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-137.88045), np.float32(-106.51088), np.float32(48.150074), np.float32(-111.43138), np.float32(-238.18593), np.float32(-220.91135), np.float32(-187.2002), np.float32(-93.158905), np.float32(-163.60484), np.float32(-102.40225)]
2025-09-14 08:51:06,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:51:06,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-131.31) for latency 9
2025-09-14 08:51:06,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 20 minutes, 31 seconds)
2025-09-14 08:53:47,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:53:54,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 182.44327 ± 189.112
2025-09-14 08:53:54,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(209.41234), np.float32(113.75819), np.float32(306.71576), np.float32(65.596176), np.float32(431.81143), np.float32(527.35223), np.float32(-17.605726), np.float32(254.85562), np.float32(6.3776712), np.float32(-73.84095)]
2025-09-14 08:53:54,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:53:54,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (182.44) for latency 9
2025-09-14 08:53:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 20 minutes, 19 seconds)
2025-09-14 08:56:41,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:56:49,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 355.24182 ± 198.792
2025-09-14 08:56:49,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(722.11487), np.float32(211.05565), np.float32(56.936012), np.float32(553.9553), np.float32(211.51656), np.float32(263.0184), np.float32(445.25043), np.float32(215.4569), np.float32(292.86182), np.float32(580.2522)]
2025-09-14 08:56:49,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:56:49,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (355.24) for latency 9
2025-09-14 08:56:49,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 21 minutes, 35 seconds)
2025-09-14 09:00:01,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:00:09,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 981.87549 ± 515.784
2025-09-14 09:00:09,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1555.4662), np.float32(1407.3225), np.float32(1312.5604), np.float32(608.9777), np.float32(1260.354), np.float32(1102.026), np.float32(247.4426), np.float32(369.9749), np.float32(1646.6719), np.float32(307.9589)]
2025-09-14 09:00:09,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:00:09,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (981.88) for latency 9
2025-09-14 09:00:09,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 32 minutes, 28 seconds)
2025-09-14 09:03:25,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:03:33,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1431.12769 ± 337.008
2025-09-14 09:03:33,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1887.1461), np.float32(1240.1747), np.float32(986.98865), np.float32(1800.9608), np.float32(769.9059), np.float32(1370.5194), np.float32(1441.2135), np.float32(1551.2366), np.float32(1527.8282), np.float32(1735.3025)]
2025-09-14 09:03:33,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:03:33,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1431.13) for latency 9
2025-09-14 09:03:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 43 minutes, 35 seconds)
2025-09-14 09:06:44,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:06:52,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1365.81482 ± 330.127
2025-09-14 09:06:52,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1340.4231), np.float32(1491.6309), np.float32(1102.1493), np.float32(1135.423), np.float32(1854.3861), np.float32(1507.2521), np.float32(1032.4209), np.float32(1741.9844), np.float32(778.7809), np.float32(1673.697)]
2025-09-14 09:06:52,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:06:52,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 49 minutes, 59 seconds)
2025-09-14 09:10:01,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:10:09,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1028.93188 ± 277.586
2025-09-14 09:10:09,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(847.28375), np.float32(1038.4988), np.float32(1196.6151), np.float32(704.5333), np.float32(648.96875), np.float32(993.87177), np.float32(872.27155), np.float32(1352.3837), np.float32(1032.0922), np.float32(1602.8003)]
2025-09-14 09:10:09,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:10:09,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 55 minutes, 52 seconds)
2025-09-14 09:13:17,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:13:25,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1307.58228 ± 540.496
2025-09-14 09:13:25,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(735.7593), np.float32(934.1691), np.float32(1184.8137), np.float32(2021.4232), np.float32(1266.8146), np.float32(1675.5309), np.float32(794.2692), np.float32(818.4055), np.float32(1205.6515), np.float32(2438.9856)]
2025-09-14 09:13:25,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:13:25,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 58 minutes, 55 seconds)
2025-09-14 09:16:34,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:16:42,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1281.24072 ± 384.767
2025-09-14 09:16:42,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1690.4587), np.float32(2009.5687), np.float32(830.43756), np.float32(777.0495), np.float32(830.6419), np.float32(1323.1058), np.float32(1620.1956), np.float32(1224.9583), np.float32(1234.8439), np.float32(1271.1472)]
2025-09-14 09:16:42,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:16:42,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 54 minutes, 34 seconds)
2025-09-14 09:19:51,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:20:00,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1723.53650 ± 599.617
2025-09-14 09:20:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1139.0005), np.float32(2482.3066), np.float32(2356.8), np.float32(1366.1166), np.float32(1045.4983), np.float32(1469.3904), np.float32(2489.7083), np.float32(1635.1592), np.float32(2336.6487), np.float32(914.73645)]
2025-09-14 09:20:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:20:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1723.54) for latency 9
2025-09-14 09:20:00,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 49 minutes, 25 seconds)
2025-09-14 09:23:21,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:23:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1602.02747 ± 496.341
2025-09-14 09:23:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(979.41187), np.float32(2294.7773), np.float32(1979.716), np.float32(1034.369), np.float32(1247.8396), np.float32(1553.2913), np.float32(1309.5254), np.float32(1493.5134), np.float32(1574.388), np.float32(2553.4424)]
2025-09-14 09:23:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:23:31,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 49 minutes, 33 seconds)
2025-09-14 09:26:52,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:27:01,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1357.15479 ± 745.941
2025-09-14 09:27:01,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1987.3394), np.float32(1509.4607), np.float32(1680.9993), np.float32(1808.801), np.float32(2782.5461), np.float32(1042.8708), np.float32(969.98816), np.float32(905.0209), np.float32(1023.71906), np.float32(-139.19771)]
2025-09-14 09:27:01,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:27:01,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 50 minutes, 3 seconds)
2025-09-14 09:30:21,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:30:30,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2068.59961 ± 597.273
2025-09-14 09:30:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1813.4985), np.float32(1218.28), np.float32(1553.2716), np.float32(2644.0376), np.float32(1211.9152), np.float32(2393.8691), np.float32(1860.6538), np.float32(2290.8418), np.float32(2728.1636), np.float32(2971.4656)]
2025-09-14 09:30:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:30:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2068.60) for latency 9
2025-09-14 09:30:30,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 50 minutes, 24 seconds)
2025-09-14 09:33:41,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:33:48,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1263.01392 ± 250.927
2025-09-14 09:33:48,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1381.8242), np.float32(1197.9171), np.float32(981.9003), np.float32(1736.0283), np.float32(1102.9136), np.float32(1014.77686), np.float32(1228.6975), np.float32(1134.9028), np.float32(1150.9813), np.float32(1700.1963)]
2025-09-14 09:33:48,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:33:48,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 47 minutes, 20 seconds)
2025-09-14 09:36:36,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:36:42,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1748.61389 ± 692.256
2025-09-14 09:36:42,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2990.1077), np.float32(1665.198), np.float32(1249.1763), np.float32(1451.9877), np.float32(2902.2742), np.float32(1181.2532), np.float32(2333.084), np.float32(1063.0514), np.float32(1528.931), np.float32(1121.0756)]
2025-09-14 09:36:42,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:36:42,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 37 minutes, 20 seconds)
2025-09-14 09:39:15,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:39:21,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1972.43298 ± 483.486
2025-09-14 09:39:21,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1863.021), np.float32(1632.5535), np.float32(2956.5679), np.float32(1852.7603), np.float32(2061.1067), np.float32(1633.2802), np.float32(1164.996), np.float32(1754.8496), np.float32(2534.8687), np.float32(2270.3257)]
2025-09-14 09:39:21,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:39:21,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 19 minutes, 49 seconds)
2025-09-14 09:41:53,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:42:00,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1840.13867 ± 704.440
2025-09-14 09:42:00,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3054.3086), np.float32(2845.3823), np.float32(1062.7728), np.float32(2687.211), np.float32(1329.0115), np.float32(1240.0024), np.float32(1473.4641), np.float32(1416.5161), np.float32(1376.747), np.float32(1915.9713)]
2025-09-14 09:42:00,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:42:00,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 2 minutes, 38 seconds)
2025-09-14 09:44:49,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:44:58,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1966.23047 ± 628.162
2025-09-14 09:44:58,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1882.8254), np.float32(1779.4156), np.float32(2914.9065), np.float32(2047.7728), np.float32(1506.9264), np.float32(2750.737), np.float32(2830.818), np.float32(1187.4001), np.float32(1124.5107), np.float32(1636.9932)]
2025-09-14 09:44:58,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:44:58,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 51 minutes, 21 seconds)
2025-09-14 09:48:21,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:48:30,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1445.80542 ± 460.944
2025-09-14 09:48:30,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1020.3609), np.float32(1932.64), np.float32(1536.0968), np.float32(1202.3176), np.float32(552.8537), np.float32(1205.9312), np.float32(1364.0883), np.float32(2209.3145), np.float32(1563.3057), np.float32(1871.1443)]
2025-09-14 09:48:30,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:48:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 52 minutes, 13 seconds)
2025-09-14 09:51:54,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:52:04,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1994.60083 ± 656.461
2025-09-14 09:52:04,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3161.5522), np.float32(1692.6123), np.float32(1887.2582), np.float32(1123.6064), np.float32(3105.988), np.float32(1443.6459), np.float32(2270.8315), np.float32(1529.9352), np.float32(1532.207), np.float32(2198.3704)]
2025-09-14 09:52:04,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:52:04,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 59 minutes, 31 seconds)
2025-09-14 09:55:27,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:55:37,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1946.19360 ± 544.419
2025-09-14 09:55:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1500.8718), np.float32(1078.9291), np.float32(2669.1223), np.float32(2021.837), np.float32(2044.9753), np.float32(2905.1348), np.float32(1838.2505), np.float32(2366.0703), np.float32(1488.5306), np.float32(1548.2136)]
2025-09-14 09:55:37,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:55:37,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 10 minutes, 24 seconds)
2025-09-14 09:59:01,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:59:10,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1945.35486 ± 670.194
2025-09-14 09:59:10,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1293.8776), np.float32(2885.8152), np.float32(1524.6388), np.float32(2838.9978), np.float32(2216.2063), np.float32(1311.3618), np.float32(1940.641), np.float32(1460.3639), np.float32(1126.9838), np.float32(2854.662)]
2025-09-14 09:59:10,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:59:10,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 21 minutes, 3 seconds)
2025-09-14 10:02:35,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:02:44,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1908.36450 ± 637.855
2025-09-14 10:02:44,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1669.014), np.float32(1417.6687), np.float32(1482.2643), np.float32(1303.8434), np.float32(2069.781), np.float32(1408.018), np.float32(2686.9438), np.float32(3363.678), np.float32(1514.1278), np.float32(2168.3064)]
2025-09-14 10:02:44,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:02:44,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 26 minutes, 33 seconds)
2025-09-14 10:06:08,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:06:18,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1782.22266 ± 562.319
2025-09-14 10:06:18,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1516.499), np.float32(1998.0012), np.float32(1174.8601), np.float32(1127.7205), np.float32(2353.4734), np.float32(1903.0747), np.float32(1883.6343), np.float32(1314.5532), np.float32(1503.3242), np.float32(3047.085)]
2025-09-14 10:06:18,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:06:18,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 23 minutes, 16 seconds)
2025-09-14 10:09:42,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:09:51,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1765.45996 ± 682.880
2025-09-14 10:09:51,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2215.4133), np.float32(3077.962), np.float32(1142.1436), np.float32(1194.1039), np.float32(1139.0994), np.float32(1442.1759), np.float32(1504.6174), np.float32(1216.7279), np.float32(2834.1567), np.float32(1888.2001)]
2025-09-14 10:09:51,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:09:51,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 19 minutes, 43 seconds)
2025-09-14 10:13:15,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:13:24,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1943.16016 ± 711.064
2025-09-14 10:13:24,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2778.3218), np.float32(1195.4904), np.float32(1562.8896), np.float32(1737.4275), np.float32(1514.5583), np.float32(1835.4945), np.float32(1177.9845), np.float32(3004.035), np.float32(3159.2605), np.float32(1466.1412)]
2025-09-14 10:13:24,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:13:24,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 16 minutes, 14 seconds)
2025-09-14 10:16:49,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:16:59,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1746.11487 ± 533.626
2025-09-14 10:16:59,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2992.3071), np.float32(1495.1804), np.float32(1356.115), np.float32(2143.5605), np.float32(1974.8843), np.float32(1225.1133), np.float32(1294.9641), np.float32(1600.3907), np.float32(1261.2188), np.float32(2117.415)]
2025-09-14 10:16:59,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:16:59,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 12 minutes, 49 seconds)
2025-09-14 10:20:23,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:20:32,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1806.62622 ± 697.872
2025-09-14 10:20:32,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1484.3804), np.float32(2124.1042), np.float32(1420.303), np.float32(3386.0203), np.float32(2723.1902), np.float32(1209.8694), np.float32(1662.0708), np.float32(1098.4601), np.float32(1268.563), np.float32(1689.3018)]
2025-09-14 10:20:32,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:20:32,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 9 minutes, 8 seconds)
2025-09-14 10:23:56,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:24:06,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1821.30273 ± 767.201
2025-09-14 10:24:06,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2384.4973), np.float32(1269.1582), np.float32(3403.2214), np.float32(1917.9193), np.float32(313.81607), np.float32(2132.4478), np.float32(1963.95), np.float32(1744.1027), np.float32(1225.3976), np.float32(1858.5178)]
2025-09-14 10:24:06,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:24:06,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 5 minutes, 38 seconds)
2025-09-14 10:27:29,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:27:38,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2387.89917 ± 808.481
2025-09-14 10:27:38,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1858.993), np.float32(2075.4026), np.float32(3354.603), np.float32(2433.9917), np.float32(3320.8833), np.float32(1848.0741), np.float32(3250.3267), np.float32(3224.8638), np.float32(1419.4641), np.float32(1092.3885)]
2025-09-14 10:27:38,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:27:38,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2387.90) for latency 9
2025-09-14 10:27:38,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 1 minute, 52 seconds)
2025-09-14 10:31:01,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:31:11,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2776.71606 ± 707.934
2025-09-14 10:31:11,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3527.796), np.float32(2664.5793), np.float32(1501.1018), np.float32(2588.0522), np.float32(3412.0957), np.float32(2928.8674), np.float32(1993.6438), np.float32(3658.6438), np.float32(2056.081), np.float32(3436.3008)]
2025-09-14 10:31:11,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:31:11,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2776.72) for latency 9
2025-09-14 10:31:11,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 58 minutes, 6 seconds)
2025-09-14 10:34:35,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:34:44,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2334.32861 ± 756.005
2025-09-14 10:34:44,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3591.9448), np.float32(3563.0515), np.float32(1888.4398), np.float32(2167.7017), np.float32(2007.7894), np.float32(2102.6577), np.float32(3092.0837), np.float32(1252.4672), np.float32(1815.7968), np.float32(1861.3518)]
2025-09-14 10:34:44,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:34:44,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 54 minutes, 28 seconds)
2025-09-14 10:38:09,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:38:18,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2569.06128 ± 863.459
2025-09-14 10:38:18,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1738.7208), np.float32(3365.204), np.float32(1898.5616), np.float32(3116.523), np.float32(1302.8636), np.float32(3312.097), np.float32(2746.9902), np.float32(3222.9377), np.float32(3667.452), np.float32(1319.263)]
2025-09-14 10:38:18,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:38:18,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 50 minutes, 59 seconds)
2025-09-14 10:41:42,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:41:51,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1769.60547 ± 600.992
2025-09-14 10:41:51,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3080.1648), np.float32(2783.2659), np.float32(1432.3364), np.float32(1477.9136), np.float32(1202.8027), np.float32(1363.54), np.float32(1511.1381), np.float32(1675.6787), np.float32(1722.0299), np.float32(1447.1846)]
2025-09-14 10:41:51,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:41:51,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 47 minutes, 21 seconds)
2025-09-14 10:45:15,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:45:24,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2302.37646 ± 911.886
2025-09-14 10:45:24,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3390.9146), np.float32(1437.4653), np.float32(1683.6238), np.float32(1776.4481), np.float32(3422.7083), np.float32(3413.6013), np.float32(1648.1552), np.float32(3400.9978), np.float32(1595.4232), np.float32(1254.4282)]
2025-09-14 10:45:24,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:45:24,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 43 minutes, 52 seconds)
2025-09-14 10:48:47,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:48:56,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2283.38965 ± 676.737
2025-09-14 10:48:56,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1456.6848), np.float32(3026.244), np.float32(1320.2524), np.float32(2996.3406), np.float32(2468.201), np.float32(1724.5846), np.float32(1641.7335), np.float32(3055.7131), np.float32(2132.057), np.float32(3012.0857)]
2025-09-14 10:48:56,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:48:56,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 40 minutes, 9 seconds)
2025-09-14 10:52:18,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:52:28,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2079.80518 ± 924.037
2025-09-14 10:52:28,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1750.2035), np.float32(1147.2074), np.float32(3756.9546), np.float32(3789.0115), np.float32(2037.0243), np.float32(1451.3091), np.float32(2163.5645), np.float32(2201.7986), np.float32(1124.821), np.float32(1376.1556)]
2025-09-14 10:52:28,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:52:28,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 36 minutes, 12 seconds)
2025-09-14 10:55:46,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:55:55,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2313.19019 ± 1004.606
2025-09-14 10:55:55,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4052.403), np.float32(1079.8701), np.float32(1730.5453), np.float32(1760.3986), np.float32(1400.5035), np.float32(3681.6338), np.float32(2605.2188), np.float32(3443.9043), np.float32(1527.5167), np.float32(1849.9092)]
2025-09-14 10:55:55,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:55:55,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 31 minutes, 22 seconds)
2025-09-14 10:59:13,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:59:22,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2489.73975 ± 644.799
2025-09-14 10:59:22,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3750.7073), np.float32(1428.9768), np.float32(2701.1333), np.float32(2628.5552), np.float32(3339.3523), np.float32(2051.1057), np.float32(2059.6602), np.float32(2632.9324), np.float32(2027.4839), np.float32(2277.4878)]
2025-09-14 10:59:22,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:59:22,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 26 minutes, 38 seconds)
2025-09-14 11:02:28,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:02:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2733.86304 ± 667.313
2025-09-14 11:02:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2133.5186), np.float32(2399.2327), np.float32(3370.6653), np.float32(3560.592), np.float32(2695.6567), np.float32(3369.997), np.float32(1920.9829), np.float32(1919.0038), np.float32(2265.0645), np.float32(3703.9175)]
2025-09-14 11:02:36,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:02:37,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 19 minutes, 34 seconds)
2025-09-14 11:05:40,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:05:49,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3245.88867 ± 573.403
2025-09-14 11:05:49,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3609.901), np.float32(3791.5925), np.float32(2008.6012), np.float32(3462.2214), np.float32(3544.8354), np.float32(3486.471), np.float32(2721.4658), np.float32(3694.2732), np.float32(3615.0034), np.float32(2524.52)]
2025-09-14 11:05:49,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:05:49,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3245.89) for latency 9
2025-09-14 11:05:49,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 12 minutes, 28 seconds)
2025-09-14 11:08:42,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:08:49,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2939.71362 ± 790.308
2025-09-14 11:08:49,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2711.2954), np.float32(1315.671), np.float32(3593.4688), np.float32(1746.6744), np.float32(2905.5713), np.float32(3854.8328), np.float32(3761.0032), np.float32(3019.6116), np.float32(3236.364), np.float32(3252.6448)]
2025-09-14 11:08:49,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:08:49,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 3 minutes, 15 seconds)
2025-09-14 11:11:24,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:11:31,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2845.20093 ± 765.108
2025-09-14 11:11:31,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2489.7114), np.float32(2938.2725), np.float32(2416.6775), np.float32(1145.6395), np.float32(2945.7034), np.float32(3609.2646), np.float32(3429.8027), np.float32(2198.659), np.float32(3766.3772), np.float32(3511.9016)]
2025-09-14 11:11:31,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:11:31,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 51 minutes, 31 seconds)
2025-09-14 11:14:00,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:14:07,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2522.31445 ± 814.713
2025-09-14 11:14:07,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1762.9493), np.float32(3314.7952), np.float32(1748.4553), np.float32(1362.6827), np.float32(2226.3606), np.float32(3495.6255), np.float32(3400.0327), np.float32(3018.093), np.float32(1602.4204), np.float32(3291.7285)]
2025-09-14 11:14:07,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:14:07,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 39 minutes, 14 seconds)
2025-09-14 11:16:37,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:16:44,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2941.71143 ± 959.225
2025-09-14 11:16:44,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3785.9893), np.float32(3466.5164), np.float32(2199.9644), np.float32(2615.4185), np.float32(1196.8276), np.float32(3952.0984), np.float32(3574.7605), np.float32(3553.5237), np.float32(3628.3657), np.float32(1443.6521)]
2025-09-14 11:16:44,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:16:44,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 29 minutes, 38 seconds)
2025-09-14 11:19:13,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:19:20,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2827.24634 ± 759.767
2025-09-14 11:19:20,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1347.9174), np.float32(2272.2031), np.float32(2658.0383), np.float32(2073.8254), np.float32(3884.903), np.float32(3208.4192), np.float32(3416.9104), np.float32(2530.8445), np.float32(3778.826), np.float32(3100.5745)]
2025-09-14 11:19:20,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:19:20,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 20 minutes, 35 seconds)
2025-09-14 11:21:50,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:21:56,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2714.18921 ± 964.771
2025-09-14 11:21:56,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2408.365), np.float32(1841.133), np.float32(1425.492), np.float32(1273.1725), np.float32(3582.3843), np.float32(3642.1204), np.float32(3431.3025), np.float32(2045.0182), np.float32(3804.8335), np.float32(3688.0706)]
2025-09-14 11:21:56,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:21:56,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 13 minutes, 47 seconds)
2025-09-14 11:24:26,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:24:33,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3156.89893 ± 790.846
2025-09-14 11:24:33,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3400.4456), np.float32(3815.7542), np.float32(3214.3335), np.float32(3747.3562), np.float32(2021.7773), np.float32(1626.7494), np.float32(3841.4495), np.float32(3862.9834), np.float32(3641.1697), np.float32(2396.972)]
2025-09-14 11:24:33,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:24:33,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 10 minutes, 22 seconds)
2025-09-14 11:27:02,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:27:09,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2861.25269 ± 717.317
2025-09-14 11:27:09,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1537.9567), np.float32(3640.8289), np.float32(3836.0342), np.float32(2018.364), np.float32(2251.5984), np.float32(3241.364), np.float32(2697.0364), np.float32(3018.0886), np.float32(3602.616), np.float32(2768.6416)]
2025-09-14 11:27:09,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:27:09,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 7 minutes, 45 seconds)
2025-09-14 11:29:38,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:29:45,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2329.82886 ± 891.671
2025-09-14 11:29:45,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2269.781), np.float32(1722.6501), np.float32(1233.1805), np.float32(1578.6328), np.float32(2683.2996), np.float32(3659.0847), np.float32(3137.556), np.float32(2039.4926), np.float32(3738.9802), np.float32(1235.6299)]
2025-09-14 11:29:45,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:29:45,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 5 minutes, 3 seconds)
2025-09-14 11:32:15,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:32:21,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2964.91650 ± 957.835
2025-09-14 11:32:21,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3825.2986), np.float32(2494.5874), np.float32(1293.9745), np.float32(2449.9243), np.float32(1338.5679), np.float32(3558.2188), np.float32(3437.39), np.float32(3743.0774), np.float32(3548.5503), np.float32(3959.5745)]
2025-09-14 11:32:21,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:32:21,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 2 minutes, 24 seconds)
2025-09-14 11:34:51,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:34:58,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3272.34521 ± 887.587
2025-09-14 11:34:58,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3462.996), np.float32(3744.5137), np.float32(1304.7024), np.float32(3737.5688), np.float32(3805.0842), np.float32(3799.8896), np.float32(3802.5496), np.float32(3618.57), np.float32(3715.2097), np.float32(1732.368)]
2025-09-14 11:34:58,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:34:58,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3272.35) for latency 9
2025-09-14 11:34:58,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 59 minutes, 51 seconds)
2025-09-14 11:37:28,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:37:35,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3272.91235 ± 933.436
2025-09-14 11:37:35,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3357.746), np.float32(3995.9695), np.float32(3896.473), np.float32(3672.6343), np.float32(3572.2026), np.float32(3754.4775), np.float32(1526.0112), np.float32(1361.8531), np.float32(3619.6174), np.float32(3972.1382)]
2025-09-14 11:37:35,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:37:35,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3272.91) for latency 9
2025-09-14 11:37:35,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 57 minutes, 19 seconds)
2025-09-14 11:40:05,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:40:12,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2580.98560 ± 1203.138
2025-09-14 11:40:12,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2346.5754), np.float32(127.277596), np.float32(1416.7157), np.float32(2639.7476), np.float32(4203.8125), np.float32(3986.716), np.float32(3792.5732), np.float32(3047.4067), np.float32(1718.9749), np.float32(2530.0574)]
2025-09-14 11:40:12,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:40:12,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 54 minutes, 47 seconds)
2025-09-14 11:42:41,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:42:48,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2955.70215 ± 817.650
2025-09-14 11:42:48,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3757.0237), np.float32(3566.6409), np.float32(2152.1843), np.float32(2705.13), np.float32(3113.9517), np.float32(1760.8026), np.float32(1811.306), np.float32(2733.019), np.float32(4148.048), np.float32(3808.9163)]
2025-09-14 11:42:48,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:42:48,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 52 minutes, 12 seconds)
2025-09-14 11:45:18,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:45:24,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3487.35303 ± 1005.996
2025-09-14 11:45:24,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4079.6738), np.float32(2872.0132), np.float32(3901.6785), np.float32(4201.5503), np.float32(1255.7627), np.float32(4205.9067), np.float32(3657.3103), np.float32(2120.012), np.float32(4175.0293), np.float32(4404.5947)]
2025-09-14 11:45:24,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:45:24,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3487.35) for latency 9
2025-09-14 11:45:24,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 49 minutes, 37 seconds)
2025-09-14 11:47:54,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:48:01,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2778.72144 ± 815.308
2025-09-14 11:48:01,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1648.7561), np.float32(3575.3064), np.float32(3678.929), np.float32(1872.9181), np.float32(3352.388), np.float32(3561.1868), np.float32(1810.5072), np.float32(2228.4636), np.float32(2397.7861), np.float32(3660.9724)]
2025-09-14 11:48:01,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:48:01,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 47 minutes)
2025-09-14 11:50:31,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:50:38,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3208.81201 ± 876.050
2025-09-14 11:50:38,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2888.6038), np.float32(3921.6118), np.float32(1563.6377), np.float32(2635.2295), np.float32(3722.6912), np.float32(4428.0254), np.float32(2224.3452), np.float32(2875.0747), np.float32(4231.833), np.float32(3597.0684)]
2025-09-14 11:50:38,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:50:38,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 44 minutes, 24 seconds)
2025-09-14 11:53:08,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:53:15,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3262.35425 ± 725.889
2025-09-14 11:53:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3896.8533), np.float32(2996.2095), np.float32(2487.8025), np.float32(4282.369), np.float32(2742.5356), np.float32(4139.4707), np.float32(4056.0117), np.float32(3047.2146), np.float32(2141.8787), np.float32(2833.1982)]
2025-09-14 11:53:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:53:15,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 41 minutes, 46 seconds)
2025-09-14 11:55:44,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:55:51,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3385.00049 ± 1034.991
2025-09-14 11:55:51,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4251.0156), np.float32(4499.591), np.float32(1408.3823), np.float32(2364.4749), np.float32(4088.4827), np.float32(4284.5664), np.float32(2077.148), np.float32(3894.8665), np.float32(3003.7224), np.float32(3977.7554)]
2025-09-14 11:55:51,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:55:51,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 39 minutes, 10 seconds)
2025-09-14 11:58:20,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:58:27,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3566.55811 ± 895.639
2025-09-14 11:58:27,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4302.6855), np.float32(1688.6937), np.float32(3960.6726), np.float32(4022.9973), np.float32(4399.7627), np.float32(3635.8328), np.float32(4099.4277), np.float32(2260.647), np.float32(4313.8765), np.float32(2980.9854)]
2025-09-14 11:58:27,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:58:27,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3566.56) for latency 9
2025-09-14 11:58:27,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 36 minutes, 31 seconds)
2025-09-14 12:00:56,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:01:03,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3347.20264 ± 834.457
2025-09-14 12:01:03,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2092.5886), np.float32(4324.498), np.float32(2295.2366), np.float32(2866.9475), np.float32(4409.6636), np.float32(2543.3325), np.float32(3690.1638), np.float32(3655.1062), np.float32(4390.6284), np.float32(3203.8606)]
2025-09-14 12:01:03,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:01:03,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 33 minutes, 50 seconds)
2025-09-14 12:03:33,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:03:39,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3617.21948 ± 1112.945
2025-09-14 12:03:39,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2111.834), np.float32(2487.8745), np.float32(1349.2489), np.float32(4140.338), np.float32(4432.5684), np.float32(3927.523), np.float32(4322.4414), np.float32(4465.032), np.float32(4511.1562), np.float32(4424.1787)]
2025-09-14 12:03:39,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:03:39,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3617.22) for latency 9
2025-09-14 12:03:39,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 31 minutes, 10 seconds)
2025-09-14 12:06:09,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:06:16,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3931.17432 ± 663.477
2025-09-14 12:06:16,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3615.5774), np.float32(4476.2925), np.float32(4373.8433), np.float32(2403.7808), np.float32(4429.5576), np.float32(4342.165), np.float32(4457.2046), np.float32(3866.0027), np.float32(3105.5493), np.float32(4241.768)]
2025-09-14 12:06:16,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:06:16,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3931.17) for latency 9
2025-09-14 12:06:16,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 28 minutes, 30 seconds)
2025-09-14 12:08:45,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:08:52,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3864.81787 ± 594.460
2025-09-14 12:08:52,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4121.1724), np.float32(3944.1433), np.float32(4093.4531), np.float32(4029.2175), np.float32(4295.002), np.float32(4596.2246), np.float32(2278.7483), np.float32(4030.5178), np.float32(3696.9585), np.float32(3562.7441)]
2025-09-14 12:08:52,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:08:52,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 25 minutes, 54 seconds)
2025-09-14 12:11:22,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:11:28,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3621.39771 ± 1318.173
2025-09-14 12:11:28,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2047.8296), np.float32(4248.687), np.float32(4493.079), np.float32(4355.144), np.float32(4603.617), np.float32(1388.2897), np.float32(4558.1724), np.float32(4591.4653), np.float32(4475.983), np.float32(1451.7059)]
2025-09-14 12:11:28,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:11:28,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 23 minutes, 19 seconds)
2025-09-14 12:13:58,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:14:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3984.35693 ± 657.154
2025-09-14 12:14:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4342.698), np.float32(3878.3547), np.float32(4627.399), np.float32(4097.311), np.float32(2176.844), np.float32(4007.0774), np.float32(3969.1064), np.float32(4416.8774), np.float32(4505.3994), np.float32(3822.5051)]
2025-09-14 12:14:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:14:04,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3984.36) for latency 9
2025-09-14 12:14:04,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 20 minutes, 43 seconds)
2025-09-14 12:16:34,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:16:41,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3799.93115 ± 1018.656
2025-09-14 12:16:41,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1534.8164), np.float32(4061.383), np.float32(4450.4834), np.float32(4676.3105), np.float32(2192.913), np.float32(3627.7861), np.float32(4447.868), np.float32(4313.1274), np.float32(4133.252), np.float32(4561.375)]
2025-09-14 12:16:41,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:16:41,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 18 minutes, 7 seconds)
2025-09-14 12:19:10,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:19:17,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3965.48560 ± 703.280
2025-09-14 12:19:17,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3165.738), np.float32(4700.697), np.float32(4725.69), np.float32(3383.0413), np.float32(4544.941), np.float32(4271.541), np.float32(4700.9146), np.float32(2686.2466), np.float32(3474.8535), np.float32(4001.1912)]
2025-09-14 12:19:17,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:19:17,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 15 minutes, 33 seconds)
2025-09-14 12:21:47,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:21:54,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3630.87378 ± 1317.590
2025-09-14 12:21:54,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1281.5747), np.float32(4747.394), np.float32(2077.5242), np.float32(4693.165), np.float32(4278.269), np.float32(4753.399), np.float32(4376.9146), np.float32(4426.586), np.float32(4046.0288), np.float32(1627.8839)]
2025-09-14 12:21:54,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:21:54,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 12 minutes, 59 seconds)
2025-09-14 12:24:24,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:24:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3726.10864 ± 1156.346
2025-09-14 12:24:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4403.2197), np.float32(4460.3096), np.float32(3726.9592), np.float32(1564.6759), np.float32(3532.2212), np.float32(4490.0923), np.float32(4360.618), np.float32(4587.8867), np.float32(4659.425), np.float32(1475.6803)]
2025-09-14 12:24:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:24:30,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 10 minutes, 23 seconds)
2025-09-14 12:27:00,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:27:07,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4316.94141 ± 705.818
2025-09-14 12:27:07,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4751.437), np.float32(4418.5107), np.float32(4590.7705), np.float32(4587.3643), np.float32(4603.1665), np.float32(4400.924), np.float32(4328.38), np.float32(2231.9312), np.float32(4583.1294), np.float32(4673.8)]
2025-09-14 12:27:07,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:27:07,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4316.94) for latency 9
2025-09-14 12:27:07,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 7 minutes, 51 seconds)
2025-09-14 12:29:37,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:29:43,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3880.59106 ± 875.020
2025-09-14 12:29:43,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4204.4624), np.float32(4429.5513), np.float32(4300.343), np.float32(2072.9524), np.float32(3975.4749), np.float32(2236.6123), np.float32(4491.193), np.float32(4453.6235), np.float32(4375.985), np.float32(4265.7134)]
2025-09-14 12:29:43,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:29:43,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 5 minutes, 13 seconds)
2025-09-14 12:32:13,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:32:20,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3449.59692 ± 1125.837
2025-09-14 12:32:20,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3795.8496), np.float32(2229.1746), np.float32(4186.573), np.float32(4306.0073), np.float32(4504.0894), np.float32(4567.205), np.float32(3309.5027), np.float32(1442.2609), np.float32(4323.139), np.float32(1832.1658)]
2025-09-14 12:32:20,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:32:20,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 2 minutes, 36 seconds)
2025-09-14 12:34:50,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:34:56,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3692.06201 ± 1071.712
2025-09-14 12:34:56,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3093.0623), np.float32(4719.416), np.float32(2915.4922), np.float32(4264.5615), np.float32(4404.733), np.float32(4502.2114), np.float32(4343.19), np.float32(1645.3728), np.float32(2231.4448), np.float32(4801.1357)]
2025-09-14 12:34:56,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:34:56,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 59 minutes, 58 seconds)
2025-09-14 12:37:26,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:37:33,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4254.21777 ± 987.602
2025-09-14 12:37:33,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4454.1387), np.float32(4786.1045), np.float32(4348.324), np.float32(4077.6062), np.float32(4707.324), np.float32(4606.959), np.float32(4682.151), np.float32(4781.071), np.float32(4736.8965), np.float32(1361.5977)]
2025-09-14 12:37:33,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:37:33,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 57 minutes, 21 seconds)
2025-09-14 12:40:03,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:40:09,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4226.66895 ± 437.991
2025-09-14 12:40:09,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4665.419), np.float32(4728.9775), np.float32(4333.1895), np.float32(4758.5166), np.float32(3476.1208), np.float32(3656.9094), np.float32(4560.9434), np.float32(3924.9329), np.float32(3929.0635), np.float32(4232.6187)]
2025-09-14 12:40:09,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:40:09,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 54 minutes, 44 seconds)
2025-09-14 12:42:39,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:42:46,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3841.90234 ± 1129.616
2025-09-14 12:42:46,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4510.2197), np.float32(4511.973), np.float32(4563.0557), np.float32(2294.4314), np.float32(1501.1113), np.float32(4703.1787), np.float32(4692.2773), np.float32(4294.1304), np.float32(4608.412), np.float32(2740.233)]
2025-09-14 12:42:46,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:42:46,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 52 minutes, 9 seconds)
2025-09-14 12:45:16,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:45:22,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3733.39331 ± 921.910
2025-09-14 12:45:22,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1910.2488), np.float32(4191.1846), np.float32(4454.657), np.float32(2906.7915), np.float32(4479.1772), np.float32(2733.7288), np.float32(3130.3774), np.float32(4358.443), np.float32(4624.09), np.float32(4545.233)]
2025-09-14 12:45:22,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:45:22,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 49 minutes, 32 seconds)
2025-09-14 12:47:52,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:47:58,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4032.29492 ± 830.496
2025-09-14 12:47:58,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4708.0728), np.float32(2584.0212), np.float32(4223.1333), np.float32(4457.617), np.float32(4732.948), np.float32(2460.6797), np.float32(4574.6836), np.float32(4307.262), np.float32(4758.963), np.float32(3515.565)]
2025-09-14 12:47:58,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:47:58,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 46 minutes, 55 seconds)
2025-09-14 12:50:28,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:50:34,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4412.19482 ± 251.881
2025-09-14 12:50:34,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4490.269), np.float32(4281.527), np.float32(4612.4033), np.float32(4679.6147), np.float32(4533.891), np.float32(3824.0598), np.float32(4252.6763), np.float32(4257.887), np.float32(4487.5596), np.float32(4702.062)]
2025-09-14 12:50:34,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:50:34,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4412.19) for latency 9
2025-09-14 12:50:34,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 44 minutes, 17 seconds)
2025-09-14 12:53:04,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:53:10,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4539.75781 ± 113.104
2025-09-14 12:53:10,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4658.346), np.float32(4473.1587), np.float32(4618.645), np.float32(4657.709), np.float32(4367.5156), np.float32(4598.3306), np.float32(4529.2383), np.float32(4370.1094), np.float32(4676.911), np.float32(4447.613)]
2025-09-14 12:53:10,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:53:10,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4539.76) for latency 9
2025-09-14 12:53:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 41 minutes, 39 seconds)
2025-09-14 12:55:39,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:55:46,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3641.55591 ± 873.708
2025-09-14 12:55:46,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4724.5825), np.float32(3338.3567), np.float32(3375.0962), np.float32(3263.171), np.float32(4656.4214), np.float32(3130.3386), np.float32(4094.2178), np.float32(2063.571), np.float32(4897.3076), np.float32(2872.4978)]
2025-09-14 12:55:46,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:55:46,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 39 minutes, 1 second)
2025-09-14 12:58:16,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:58:22,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4074.21997 ± 1081.663
2025-09-14 12:58:22,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4543.0317), np.float32(4534.274), np.float32(4733.6353), np.float32(3916.955), np.float32(4765.236), np.float32(4680.3755), np.float32(3329.1853), np.float32(4512.329), np.float32(1090.7738), np.float32(4636.4053)]
2025-09-14 12:58:22,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:58:22,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 36 minutes, 24 seconds)
2025-09-14 13:00:52,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:00:59,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4014.89404 ± 1079.128
2025-09-14 13:00:59,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1991.5637), np.float32(4661.163), np.float32(4397.3535), np.float32(4821.133), np.float32(3964.2903), np.float32(4814.855), np.float32(4573.016), np.float32(1822.3685), np.float32(4545.365), np.float32(4557.832)]
2025-09-14 13:00:59,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:00:59,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 33 minutes, 49 seconds)
2025-09-14 13:03:29,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:03:36,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4307.90283 ± 920.189
2025-09-14 13:03:36,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4736.451), np.float32(4499.312), np.float32(4747.4077), np.float32(4647.516), np.float32(4496.4805), np.float32(4523.8047), np.float32(4614.8726), np.float32(1558.8853), np.float32(4601.119), np.float32(4653.1772)]
2025-09-14 13:03:36,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:03:36,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 31 minutes, 15 seconds)
2025-09-14 13:06:06,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:06:12,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3703.74023 ± 1206.003
2025-09-14 13:06:12,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4646.547), np.float32(2510.386), np.float32(2064.2231), np.float32(4312.6685), np.float32(4781.662), np.float32(1815.6862), np.float32(4592.2983), np.float32(4838.7036), np.float32(4825.2725), np.float32(2649.9558)]
2025-09-14 13:06:12,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:06:12,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 28 minutes, 40 seconds)
2025-09-14 13:08:42,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:08:49,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4399.33154 ± 408.910
2025-09-14 13:08:49,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4165.0425), np.float32(4680.397), np.float32(4439.1016), np.float32(4690.3413), np.float32(3247.5444), np.float32(4571.914), np.float32(4605.447), np.float32(4555.3447), np.float32(4521.13), np.float32(4517.054)]
2025-09-14 13:08:49,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:08:49,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes, 5 seconds)
2025-09-14 13:11:19,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:11:25,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4665.25391 ± 159.320
2025-09-14 13:11:25,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4811.897), np.float32(4757.0225), np.float32(4590.2617), np.float32(4639.8647), np.float32(4422.0654), np.float32(4863.133), np.float32(4651.2905), np.float32(4420.0293), np.float32(4898.7407), np.float32(4598.229)]
2025-09-14 13:11:25,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:11:25,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4665.25) for latency 9
2025-09-14 13:11:25,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 23 minutes, 29 seconds)
2025-09-14 13:13:55,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:14:02,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3949.49756 ± 1109.721
2025-09-14 13:14:02,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4825.84), np.float32(4428.426), np.float32(4347.0513), np.float32(4625.9), np.float32(4417.5576), np.float32(4094.7007), np.float32(1826.5533), np.float32(1716.9087), np.float32(4838.7593), np.float32(4373.283)]
2025-09-14 13:14:02,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:14:02,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 52 seconds)
2025-09-14 13:16:31,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:16:38,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4062.67114 ± 1015.188
2025-09-14 13:16:38,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1962.9667), np.float32(4660.1367), np.float32(4721.5713), np.float32(4747.502), np.float32(3098.6912), np.float32(4871.311), np.float32(4900.937), np.float32(2677.809), np.float32(4314.9395), np.float32(4670.844)]
2025-09-14 13:16:38,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:16:38,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 15 seconds)
2025-09-14 13:19:08,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:19:14,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4564.78076 ± 141.113
2025-09-14 13:19:14,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4685.9565), np.float32(4579.3896), np.float32(4738.4995), np.float32(4391.3384), np.float32(4413.8013), np.float32(4347.7437), np.float32(4565.6855), np.float32(4503.5947), np.float32(4649.1562), np.float32(4772.645)]
2025-09-14 13:19:14,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:19:14,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 38 seconds)
2025-09-14 13:21:44,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:21:51,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3849.73242 ± 1114.779
2025-09-14 13:21:51,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2637.6934), np.float32(3133.4146), np.float32(4467.6797), np.float32(1450.197), np.float32(4797.562), np.float32(3222.59), np.float32(4749.4424), np.float32(4401.3003), np.float32(4713.1997), np.float32(4924.246)]
2025-09-14 13:21:51,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:21:51,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 2 seconds)
2025-09-14 13:24:21,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:24:28,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3714.88721 ± 1107.259
2025-09-14 13:24:28,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4691.323), np.float32(3123.2244), np.float32(4528.8604), np.float32(3180.9878), np.float32(4112.5747), np.float32(4298.864), np.float32(1561.58), np.float32(4601.567), np.float32(4941.78), np.float32(2108.1082)]
2025-09-14 13:24:28,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:24:28,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 25 seconds)
2025-09-14 13:26:57,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:27:04,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4333.27539 ± 737.583
2025-09-14 13:27:04,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4499.414), np.float32(2290.5632), np.float32(4562.234), np.float32(4599.6733), np.float32(4633.5474), np.float32(4720.577), np.float32(4682.8506), np.float32(4875.3223), np.float32(3764.7957), np.float32(4703.7764)]
2025-09-14 13:27:04,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:27:04,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 49 seconds)
2025-09-14 13:29:34,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:29:41,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4060.90234 ± 1142.611
2025-09-14 13:29:41,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4669.29), np.float32(4724.0557), np.float32(2719.0405), np.float32(4761.615), np.float32(4212.0884), np.float32(4804.7354), np.float32(4660.885), np.float32(1117.1636), np.float32(4376.4404), np.float32(4563.707)]
2025-09-14 13:29:41,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:29:41,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 13 seconds)
2025-09-14 13:32:11,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:32:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4047.48511 ± 1174.984
2025-09-14 13:32:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4646.096), np.float32(1673.8546), np.float32(4777.3774), np.float32(4512.371), np.float32(4788.5913), np.float32(4621.2686), np.float32(1742.157), np.float32(4698.475), np.float32(4618.7817), np.float32(4395.881)]
2025-09-14 13:32:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:32:18,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 36 seconds)
2025-09-14 13:34:45,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:34:51,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4162.60840 ± 864.846
2025-09-14 13:34:51,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1747.421), np.float32(4548.9), np.float32(4640.5205), np.float32(4566.6904), np.float32(4757.3257), np.float32(4659.065), np.float32(4024.1213), np.float32(4281.539), np.float32(4684.6196), np.float32(3715.8792)]
2025-09-14 13:34:51,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:34:51,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1251 [DEBUG]: Training session finished
