2025-09-14 08:43:01,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_6
2025-09-14 08:43:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_6
2025-09-14 08:43:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x7fe28ad7bbf0>}
2025-09-14 08:43:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,626 baseline-bpql-noisepromille200-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=53, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,626 baseline-bpql-noisepromille200-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 09:05:15,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:05:23,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -403.50946 ± 60.887
2025-09-14 09:05:23,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-401.8765), np.float32(-391.34225), np.float32(-527.3246), np.float32(-364.6218), np.float32(-388.24124), np.float32(-336.7163), np.float32(-510.06204), np.float32(-376.44113), np.float32(-391.01196), np.float32(-347.4567)]
2025-09-14 09:05:23,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:05:23,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-403.51) for latency 6
2025-09-14 09:05:23,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 36 hours, 51 minutes, 27 seconds)
2025-09-14 09:25:15,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:25:23,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -192.47043 ± 51.811
2025-09-14 09:25:23,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-105.52709), np.float32(-184.71017), np.float32(-194.6557), np.float32(-245.81813), np.float32(-253.80164), np.float32(-173.35222), np.float32(-176.74625), np.float32(-278.32144), np.float32(-124.63982), np.float32(-187.13202)]
2025-09-14 09:25:23,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:25:23,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-192.47) for latency 6
2025-09-14 09:25:23,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 34 hours, 34 minutes, 23 seconds)
2025-09-14 09:44:50,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:44:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -67.14751 ± 80.152
2025-09-14 09:44:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-51.960533), np.float32(57.017635), np.float32(-208.27792), np.float32(-84.274284), np.float32(27.984623), np.float32(-59.9347), np.float32(-51.448353), np.float32(-151.58246), np.float32(2.3083372), np.float32(-151.30745)]
2025-09-14 09:44:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:44:58,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-67.15) for latency 6
2025-09-14 09:44:58,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 33 hours, 22 minutes, 16 seconds)
2025-09-14 10:06:44,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:06:52,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 116.89197 ± 172.590
2025-09-14 10:06:52,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-192.53275), np.float32(192.66574), np.float32(325.69363), np.float32(403.45834), np.float32(-55.436115), np.float32(-8.679055), np.float32(227.77328), np.float32(72.75322), np.float32(29.171883), np.float32(174.05153)]
2025-09-14 10:06:52,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:06:52,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (116.89) for latency 6
2025-09-14 10:06:52,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 33 hours, 31 minutes, 44 seconds)
2025-09-14 10:28:28,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:28:38,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 279.05713 ± 238.722
2025-09-14 10:28:38,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(182.35625), np.float32(466.19583), np.float32(544.6931), np.float32(662.0746), np.float32(77.702354), np.float32(-86.85292), np.float32(0.9268343), np.float32(476.59186), np.float32(314.91296), np.float32(151.97025)]
2025-09-14 10:28:38,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:28:38,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (279.06) for latency 6
2025-09-14 10:28:38,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 33 hours, 26 minutes, 16 seconds)
2025-09-14 10:48:19,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:48:27,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 370.81982 ± 249.085
2025-09-14 10:48:27,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(548.1347), np.float32(260.8535), np.float32(177.46005), np.float32(83.99529), np.float32(630.2678), np.float32(295.01367), np.float32(648.11084), np.float32(630.10205), np.float32(-93.37463), np.float32(527.6349)]
2025-09-14 10:48:27,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:48:27,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (370.82) for latency 6
2025-09-14 10:48:27,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 32 hours, 17 minutes, 43 seconds)
2025-09-14 11:04:52,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:05:01,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 745.55554 ± 324.220
2025-09-14 11:05:01,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(904.49255), np.float32(113.02058), np.float32(894.675), np.float32(206.35992), np.float32(767.02466), np.float32(960.6042), np.float32(727.6756), np.float32(669.92346), np.float32(1120.4811), np.float32(1091.2986)]
2025-09-14 11:05:01,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:05:01,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (745.56) for latency 6
2025-09-14 11:05:01,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 30 hours, 53 minutes, 4 seconds)
2025-09-14 11:08:31,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:08:39,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 748.87433 ± 363.424
2025-09-14 11:08:39,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(772.4105), np.float32(779.1896), np.float32(774.35754), np.float32(438.1414), np.float32(497.27188), np.float32(395.7319), np.float32(775.03345), np.float32(1584.5472), np.float32(326.93738), np.float32(1145.1221)]
2025-09-14 11:08:39,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:08:39,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (748.87) for latency 6
2025-09-14 11:08:39,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 25 hours, 39 minutes, 47 seconds)
2025-09-14 11:13:20,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:13:29,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1062.43994 ± 369.502
2025-09-14 11:13:29,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1173.9293), np.float32(1122.1438), np.float32(686.44366), np.float32(768.94965), np.float32(1388.3706), np.float32(587.09796), np.float32(1083.018), np.float32(1637.1417), np.float32(1566.9581), np.float32(610.34674)]
2025-09-14 11:13:29,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:13:29,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1062.44) for latency 6
2025-09-14 11:13:29,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 20 hours, 12 minutes, 13 seconds)
2025-09-14 11:16:57,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:17:05,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1486.36768 ± 579.380
2025-09-14 11:17:05,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1718.5225), np.float32(2088.7861), np.float32(2049.8481), np.float32(770.46533), np.float32(772.7157), np.float32(2254.3923), np.float32(1540.45), np.float32(537.9037), np.float32(1784.2197), np.float32(1346.3727)]
2025-09-14 11:17:05,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:17:05,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1486.37) for latency 6
2025-09-14 11:17:05,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 14 hours, 32 minutes, 2 seconds)
2025-09-14 11:21:43,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:21:51,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1589.92615 ± 599.879
2025-09-14 11:21:51,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1946.209), np.float32(762.3574), np.float32(1839.3922), np.float32(2168.2302), np.float32(2158.4595), np.float32(698.97754), np.float32(1815.5139), np.float32(1721.6608), np.float32(643.15936), np.float32(2145.3013)]
2025-09-14 11:21:51,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:21:51,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1589.93) for latency 6
2025-09-14 11:21:51,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 9 hours, 54 minutes, 27 seconds)
2025-09-14 11:26:37,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:26:46,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1414.31860 ± 457.238
2025-09-14 11:26:46,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1397.1042), np.float32(1297.8193), np.float32(785.4365), np.float32(1753.8341), np.float32(1960.098), np.float32(1210.2871), np.float32(2088.9038), np.float32(1866.8934), np.float32(990.8165), np.float32(791.9918)]
2025-09-14 11:26:46,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:26:46,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 22 minutes, 50 seconds)
2025-09-14 11:33:28,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:33:36,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2049.65942 ± 509.665
2025-09-14 11:33:36,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2482.5547), np.float32(1720.0015), np.float32(2704.9478), np.float32(2096.069), np.float32(2475.3787), np.float32(1260.3698), np.float32(2345.785), np.float32(1211.9902), np.float32(2468.0854), np.float32(1731.4114)]
2025-09-14 11:33:36,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:33:36,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2049.66) for latency 6
2025-09-14 11:33:36,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 7 hours, 14 minutes, 8 seconds)
2025-09-14 11:38:04,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:38:13,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2271.79639 ± 351.328
2025-09-14 11:38:13,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1900.675), np.float32(2413.7961), np.float32(2530.516), np.float32(2723.5652), np.float32(2337.5752), np.float32(2196.4265), np.float32(1438.1915), np.float32(2509.6636), np.float32(2468.348), np.float32(2199.2078)]
2025-09-14 11:38:13,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:38:13,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2271.80) for latency 6
2025-09-14 11:38:13,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 7 hours, 5 minutes, 40 seconds)
2025-09-14 11:43:21,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:43:29,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1993.77417 ± 514.037
2025-09-14 11:43:29,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2551.109), np.float32(1493.6327), np.float32(1629.7737), np.float32(1524.361), np.float32(2437.4297), np.float32(2339.8933), np.float32(2769.667), np.float32(2068.8335), np.float32(1097.4214), np.float32(2025.6198)]
2025-09-14 11:43:29,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:43:29,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 7 hours, 28 minutes, 51 seconds)
2025-09-14 11:48:15,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:48:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2389.95972 ± 547.065
2025-09-14 11:48:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2703.4019), np.float32(2360.7378), np.float32(2948.1675), np.float32(1790.1908), np.float32(2449.4226), np.float32(1764.5596), np.float32(1312.722), np.float32(2897.5095), np.float32(2910.544), np.float32(2762.3442)]
2025-09-14 11:48:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:48:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2389.96) for latency 6
2025-09-14 11:48:23,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 7 hours, 25 minutes, 49 seconds)
2025-09-14 11:54:07,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:54:15,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2141.42163 ± 662.472
2025-09-14 11:54:15,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2881.1555), np.float32(2819.9797), np.float32(2429.9353), np.float32(1694.5865), np.float32(1487.0455), np.float32(2214.4634), np.float32(1145.9812), np.float32(1216.7498), np.float32(2917.161), np.float32(2607.1594)]
2025-09-14 11:54:15,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:54:15,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 7 hours, 36 minutes, 18 seconds)
2025-09-14 11:59:07,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:59:16,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2424.94189 ± 597.810
2025-09-14 11:59:16,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2925.004), np.float32(2753.9927), np.float32(2397.118), np.float32(2798.1787), np.float32(1841.3057), np.float32(2716.584), np.float32(873.7441), np.float32(2623.6392), np.float32(2443.6726), np.float32(2876.1812)]
2025-09-14 11:59:16,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:59:16,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2424.94) for latency 6
2025-09-14 11:59:16,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 7 hours, 43 seconds)
2025-09-14 12:03:35,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:03:43,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2765.17163 ± 357.006
2025-09-14 12:03:43,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3085.0845), np.float32(3424.1858), np.float32(2553.0544), np.float32(2713.0986), np.float32(2815.7446), np.float32(2009.7096), np.float32(2865.0793), np.float32(2488.2927), np.float32(2938.986), np.float32(2758.4805)]
2025-09-14 12:03:43,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:03:43,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2765.17) for latency 6
2025-09-14 12:03:43,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 53 minutes, 4 seconds)
2025-09-14 12:08:15,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:08:23,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2689.22876 ± 545.010
2025-09-14 12:08:23,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2983.196), np.float32(2662.7764), np.float32(2811.8372), np.float32(2353.8945), np.float32(3065.6313), np.float32(1170.597), np.float32(2970.3828), np.float32(3026.5374), np.float32(2873.4822), np.float32(2973.9539)]
2025-09-14 12:08:23,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:08:23,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 6 hours, 38 minutes, 21 seconds)
2025-09-14 12:13:40,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:13:48,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2679.74341 ± 568.896
2025-09-14 12:13:48,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2878.334), np.float32(1154.2943), np.float32(2846.096), np.float32(2978.6792), np.float32(2715.2612), np.float32(3037.8916), np.float32(3119.0122), np.float32(3103.7573), np.float32(2188.3567), np.float32(2775.7502)]
2025-09-14 12:13:48,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:13:48,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 6 hours, 41 minutes, 28 seconds)
2025-09-14 12:18:42,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:18:51,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2625.49097 ± 666.984
2025-09-14 12:18:51,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3054.123), np.float32(928.51025), np.float32(2999.5764), np.float32(2510.5261), np.float32(2894.2117), np.float32(3125.0798), np.float32(2512.3657), np.float32(3315.9954), np.float32(2030.6163), np.float32(2883.904)]
2025-09-14 12:18:51,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:18:51,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 6 hours, 23 minutes, 39 seconds)
2025-09-14 12:23:10,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:23:18,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2723.39893 ± 459.874
2025-09-14 12:23:18,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3092.9482), np.float32(2562.485), np.float32(2298.6064), np.float32(3149.9578), np.float32(3110.1455), np.float32(1634.0189), np.float32(2864.8142), np.float32(3153.579), np.float32(2834.4678), np.float32(2532.9668)]
2025-09-14 12:23:18,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:23:18,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 6 hours, 10 minutes, 7 seconds)
2025-09-14 12:27:29,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:27:37,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2952.00000 ± 123.843
2025-09-14 12:27:37,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2868.9048), np.float32(2820.6987), np.float32(3078.7844), np.float32(2982.0464), np.float32(2908.1843), np.float32(2686.0237), np.float32(3006.2673), np.float32(3011.977), np.float32(3101.028), np.float32(3056.0867)]
2025-09-14 12:27:37,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:27:37,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2952.00) for latency 6
2025-09-14 12:27:37,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 6 hours, 3 minutes, 6 seconds)
2025-09-14 12:32:37,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:32:45,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2886.80518 ± 613.993
2025-09-14 12:32:45,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3021.2253), np.float32(3185.2664), np.float32(3239.3245), np.float32(3277.4827), np.float32(3320.1414), np.float32(2540.6833), np.float32(3211.36), np.float32(3103.8757), np.float32(1179.0735), np.float32(2789.6174)]
2025-09-14 12:32:45,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:32:45,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 6 hours, 5 minutes, 29 seconds)
2025-09-14 12:39:28,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:39:36,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2771.28662 ± 734.320
2025-09-14 12:39:36,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3176.196), np.float32(992.86383), np.float32(2936.1611), np.float32(3006.6333), np.float32(2977.5095), np.float32(2963.0986), np.float32(3337.9163), np.float32(3226.0698), np.float32(3347.0225), np.float32(1749.393)]
2025-09-14 12:39:36,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:39:36,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 6 hours, 21 minutes, 50 seconds)
2025-09-14 12:44:14,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:44:22,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2442.60229 ± 644.741
2025-09-14 12:44:22,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2696.4568), np.float32(1016.8454), np.float32(2519.9282), np.float32(2751.2002), np.float32(3103.4395), np.float32(2776.7083), np.float32(2947.5718), np.float32(2593.0745), np.float32(1394.3995), np.float32(2626.398)]
2025-09-14 12:44:22,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:44:22,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 6 hours, 12 minutes, 39 seconds)
2025-09-14 12:49:00,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:49:08,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2941.95239 ± 570.187
2025-09-14 12:49:08,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3354.8474), np.float32(1331.0211), np.float32(3377.0637), np.float32(2705.0), np.float32(3173.861), np.float32(3060.0547), np.float32(3272.2178), np.float32(3163.1104), np.float32(2922.179), np.float32(3060.1682)]
2025-09-14 12:49:08,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:49:08,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 6 hours, 12 minutes, 4 seconds)
2025-09-14 12:53:48,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:53:56,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2436.80054 ± 843.705
2025-09-14 12:53:56,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2778.6865), np.float32(3297.5654), np.float32(3071.9045), np.float32(3125.6304), np.float32(3031.0442), np.float32(1353.3768), np.float32(2772.1267), np.float32(857.425), np.float32(1356.6644), np.float32(2723.5813)]
2025-09-14 12:53:56,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:53:56,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 6 hours, 13 minutes, 45 seconds)
2025-09-14 12:59:07,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:59:15,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2727.02466 ± 583.711
2025-09-14 12:59:15,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2920.983), np.float32(2642.1604), np.float32(2461.1794), np.float32(2956.518), np.float32(3146.1797), np.float32(1083.8503), np.float32(2925.482), np.float32(3026.0188), np.float32(2951.9846), np.float32(3155.89)]
2025-09-14 12:59:15,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:59:15,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 6 hours, 10 minutes, 50 seconds)
2025-09-14 13:04:00,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:04:08,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2779.19849 ± 620.451
2025-09-14 13:04:08,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2948.196), np.float32(3225.9785), np.float32(2930.8826), np.float32(3037.2302), np.float32(2203.8584), np.float32(2948.3967), np.float32(2489.7273), np.float32(3442.1143), np.float32(1244.5065), np.float32(3321.0964)]
2025-09-14 13:04:08,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:04:08,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 38 minutes, 35 seconds)
2025-09-14 13:08:28,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:08:36,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3012.33813 ± 159.519
2025-09-14 13:08:36,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3202.7664), np.float32(3260.2202), np.float32(2838.099), np.float32(2842.0261), np.float32(3211.6143), np.float32(2883.1184), np.float32(2858.4917), np.float32(3079.6565), np.float32(2903.7197), np.float32(3043.6692)]
2025-09-14 13:08:36,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:08:36,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (3012.34) for latency 6
2025-09-14 13:08:36,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 29 minutes, 35 seconds)
2025-09-14 13:12:34,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:12:42,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2690.87280 ± 747.092
2025-09-14 13:12:42,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1592.7632), np.float32(934.6151), np.float32(3181.0347), np.float32(2752.0415), np.float32(3113.3425), np.float32(2847.5881), np.float32(3021.994), np.float32(3332.3445), np.float32(2923.636), np.float32(3209.3665)]
2025-09-14 13:12:42,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:12:42,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 15 minutes, 52 seconds)
2025-09-14 13:16:46,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:16:54,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2862.99072 ± 527.183
2025-09-14 13:16:54,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3147.969), np.float32(2803.0588), np.float32(2785.5684), np.float32(2890.6807), np.float32(3155.4648), np.float32(3141.8984), np.float32(3282.4001), np.float32(3106.6672), np.float32(1353.3318), np.float32(2962.8699)]
2025-09-14 13:16:54,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:16:54,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 3 minutes, 9 seconds)
2025-09-14 13:21:32,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:21:39,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2053.64209 ± 680.287
2025-09-14 13:21:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1871.4392), np.float32(2776.559), np.float32(1498.621), np.float32(1696.0608), np.float32(1733.6013), np.float32(2797.559), np.float32(2755.7996), np.float32(1051.7913), np.float32(1325.1631), np.float32(3029.8254)]
2025-09-14 13:21:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:21:39,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 51 minutes, 22 seconds)
2025-09-14 13:26:06,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:26:14,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2941.47729 ± 404.148
2025-09-14 13:26:14,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3160.6184), np.float32(3177.6875), np.float32(3164.0276), np.float32(2736.192), np.float32(3270.7026), np.float32(2854.1433), np.float32(3098.0383), np.float32(3163.941), np.float32(1824.6486), np.float32(2964.7742)]
2025-09-14 13:26:14,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:26:14,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 42 minutes, 50 seconds)
2025-09-14 13:30:26,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:30:34,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2694.42480 ± 889.954
2025-09-14 13:30:34,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(923.5607), np.float32(3442.32), np.float32(2821.4775), np.float32(952.6425), np.float32(3225.364), np.float32(3119.2507), np.float32(3084.2737), np.float32(3073.914), np.float32(3142.2903), np.float32(3159.1519)]
2025-09-14 13:30:34,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:30:34,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 36 minutes, 48 seconds)
2025-09-14 13:34:48,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:34:56,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2781.43115 ± 432.098
2025-09-14 13:34:56,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2080.7869), np.float32(3137.698), np.float32(2222.51), np.float32(2974.4304), np.float32(2178.3962), np.float32(3059.7517), np.float32(2847.9736), np.float32(3321.8235), np.float32(3183.1167), np.float32(2807.8215)]
2025-09-14 13:34:56,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:34:56,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 35 minutes, 37 seconds)
2025-09-14 13:38:52,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:39:00,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2861.47192 ± 312.286
2025-09-14 13:39:00,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2918.5518), np.float32(2524.0505), np.float32(2910.77), np.float32(3031.7324), np.float32(3019.2178), np.float32(2843.9702), np.float32(2081.3728), np.float32(3234.8247), np.float32(3024.559), np.float32(3025.6707)]
2025-09-14 13:39:00,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:39:00,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 29 minutes, 40 seconds)
2025-09-14 13:43:14,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:43:22,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2784.85059 ± 652.846
2025-09-14 13:43:22,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2213.6472), np.float32(3284.2869), np.float32(3232.1125), np.float32(3234.2153), np.float32(3144.432), np.float32(3282.4546), np.float32(2803.0063), np.float32(3251.527), np.float32(2114.752), np.float32(1288.0729)]
2025-09-14 13:43:22,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:43:22,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 20 minutes, 28 seconds)
2025-09-14 13:48:12,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:48:20,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2697.70190 ± 591.098
2025-09-14 13:48:20,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2865.9575), np.float32(3095.642), np.float32(1205.7369), np.float32(2966.3657), np.float32(2933.9924), np.float32(1942.5486), np.float32(2872.5706), np.float32(3131.6199), np.float32(3022.2756), np.float32(2940.3108)]
2025-09-14 13:48:20,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:48:21,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 20 minutes, 53 seconds)
2025-09-14 13:52:03,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:52:11,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3004.90479 ± 240.056
2025-09-14 13:52:11,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2663.2273), np.float32(3248.5193), np.float32(2879.3188), np.float32(2599.4617), np.float32(3320.5005), np.float32(3041.985), np.float32(3039.751), np.float32(3238.3494), np.float32(3186.8718), np.float32(2831.0618)]
2025-09-14 13:52:11,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:52:11,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 10 minutes, 38 seconds)
2025-09-14 13:56:23,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:56:31,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2942.47021 ± 424.492
2025-09-14 13:56:31,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3290.1958), np.float32(3342.201), np.float32(3158.5425), np.float32(3298.9006), np.float32(1822.4747), np.float32(2887.139), np.float32(2690.3323), np.float32(2904.9639), np.float32(3100.3), np.float32(2929.6506)]
2025-09-14 13:56:31,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:56:31,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 6 minutes, 2 seconds)
2025-09-14 14:00:49,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:00:57,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3058.88672 ± 116.826
2025-09-14 14:00:57,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3188.828), np.float32(3203.441), np.float32(3046.5369), np.float32(3178.2498), np.float32(2988.0354), np.float32(2917.395), np.float32(2861.2751), np.float32(3032.522), np.float32(3181.198), np.float32(2991.3884)]
2025-09-14 14:00:57,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:00:57,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (3058.89) for latency 6
2025-09-14 14:00:57,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 5 minutes, 47 seconds)
2025-09-14 14:04:39,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:04:46,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2965.91455 ± 525.708
2025-09-14 14:04:46,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3128.5771), np.float32(3310.7554), np.float32(1417.972), np.float32(3305.582), np.float32(3028.5706), np.float32(3124.2344), np.float32(3189.2937), np.float32(3089.766), np.float32(2988.6265), np.float32(3075.7678)]
2025-09-14 14:04:46,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:04:46,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 55 minutes, 30 seconds)
2025-09-14 14:09:02,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:09:10,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2898.90503 ± 504.048
2025-09-14 14:09:10,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2457.2358), np.float32(3148.6626), np.float32(3269.345), np.float32(2886.1294), np.float32(3117.785), np.float32(3337.9539), np.float32(1637.0197), np.float32(3219.4373), np.float32(3276.4177), np.float32(2639.0654)]
2025-09-14 14:09:10,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:09:10,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 44 minutes, 56 seconds)
2025-09-14 14:14:47,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:14:54,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2639.55249 ± 735.335
2025-09-14 14:14:54,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3157.8113), np.float32(3087.8054), np.float32(2764.3083), np.float32(1192.6006), np.float32(3024.1858), np.float32(1257.6548), np.float32(3171.7122), np.float32(3109.5205), np.float32(3135.301), np.float32(2494.627)]
2025-09-14 14:14:54,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:14:54,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 4 hours, 52 seconds)
2025-09-14 14:19:38,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:19:45,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3033.43945 ± 331.644
2025-09-14 14:19:45,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3181.9312), np.float32(2090.4692), np.float32(3179.2844), np.float32(3039.2478), np.float32(3109.6914), np.float32(3244.8945), np.float32(2921.2783), np.float32(3186.2715), np.float32(3065.3135), np.float32(3316.0112)]
2025-09-14 14:19:45,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:19:45,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 4 hours, 1 minute, 34 seconds)
2025-09-14 14:23:56,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:24:03,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2988.54639 ± 356.774
2025-09-14 14:24:03,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2983.293), np.float32(3147.754), np.float32(3091.25), np.float32(2947.385), np.float32(3091.813), np.float32(3173.8892), np.float32(1962.3555), np.float32(3267.221), np.float32(3232.5083), np.float32(2987.9937)]
2025-09-14 14:24:03,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:24:03,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 55 minutes, 42 seconds)
2025-09-14 14:28:39,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:28:48,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2577.27832 ± 738.378
2025-09-14 14:28:48,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1034.7935), np.float32(2616.0713), np.float32(1472.5647), np.float32(3150.375), np.float32(3352.8838), np.float32(2744.0034), np.float32(3189.735), np.float32(2879.5154), np.float32(2213.2537), np.float32(3119.5886)]
2025-09-14 14:28:48,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:28:48,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 4 hours, 11 seconds)
2025-09-14 14:32:58,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:33:06,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2981.33472 ± 110.365
2025-09-14 14:33:06,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2988.9348), np.float32(3077.175), np.float32(3069.8008), np.float32(3062.5225), np.float32(2807.888), np.float32(3050.696), np.float32(3081.286), np.float32(2850.785), np.float32(3027.5364), np.float32(2796.7236)]
2025-09-14 14:33:06,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:33:06,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 54 minutes, 29 seconds)
2025-09-14 14:37:52,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:37:59,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2834.44287 ± 683.633
2025-09-14 14:37:59,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3170.425), np.float32(1957.0496), np.float32(3318.587), np.float32(3214.01), np.float32(1178.2104), np.float32(3337.4778), np.float32(3290.4214), np.float32(3202.9988), np.float32(3027.9585), np.float32(2647.2903)]
2025-09-14 14:37:59,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:37:59,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 41 minutes, 38 seconds)
2025-09-14 14:41:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:41:43,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2951.79565 ± 450.720
2025-09-14 14:41:43,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3281.0479), np.float32(3095.4019), np.float32(3381.9465), np.float32(3128.9495), np.float32(2938.819), np.float32(1790.4453), np.float32(2844.3003), np.float32(3453.9653), np.float32(2920.4014), np.float32(2682.6802)]
2025-09-14 14:41:43,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:41:43,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 26 minutes, 29 seconds)
2025-09-14 14:46:05,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:46:13,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2974.92529 ± 573.497
2025-09-14 14:46:13,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3102.1821), np.float32(3218.8057), np.float32(1318.3137), np.float32(3189.7324), np.float32(2885.3193), np.float32(3231.3547), np.float32(2965.8323), np.float32(3458.4868), np.float32(3082.5283), np.float32(3296.7)]
2025-09-14 14:46:13,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:46:13,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 23 minutes, 51 seconds)
2025-09-14 14:50:19,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:50:27,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3137.60791 ± 248.934
2025-09-14 14:50:27,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3009.8274), np.float32(3182.271), np.float32(3301.6997), np.float32(2831.094), np.float32(3306.0696), np.float32(3276.6946), np.float32(2553.968), np.float32(3250.2512), np.float32(3375.569), np.float32(3288.6355)]
2025-09-14 14:50:27,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:50:27,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (3137.61) for latency 6
2025-09-14 14:50:27,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 14 minutes, 58 seconds)
2025-09-14 14:54:38,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:54:46,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2617.22095 ± 878.604
2025-09-14 14:54:46,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3272.4133), np.float32(1074.5332), np.float32(2948.4048), np.float32(2913.7463), np.float32(3298.488), np.float32(3222.711), np.float32(1895.476), np.float32(1025.9088), np.float32(3289.692), np.float32(3230.8354)]
2025-09-14 14:54:46,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:54:46,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 10 minutes, 40 seconds)
2025-09-14 14:58:47,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:58:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2873.11865 ± 509.318
2025-09-14 14:58:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3001.1194), np.float32(2886.1406), np.float32(3004.5605), np.float32(3045.497), np.float32(3164.479), np.float32(3067.0146), np.float32(1388.6359), np.float32(2815.1335), np.float32(3261.3762), np.float32(3097.229)]
2025-09-14 14:58:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:58:55,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 1 second)
2025-09-14 15:02:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:02:47,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2894.20874 ± 669.692
2025-09-14 15:02:47,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(936.6292), np.float32(2787.1526), np.float32(3199.4822), np.float32(3256.9287), np.float32(3250.1042), np.float32(3179.153), np.float32(2906.5098), np.float32(3065.23), np.float32(3092.663), np.float32(3268.2322)]
2025-09-14 15:02:47,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:02:47,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 56 minutes, 57 seconds)
2025-09-14 15:06:45,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:06:53,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2716.38477 ± 720.162
2025-09-14 15:06:53,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3026.8481), np.float32(3237.904), np.float32(3433.7634), np.float32(1681.3047), np.float32(3143.7595), np.float32(3078.797), np.float32(1132.0792), np.float32(3094.2961), np.float32(3028.334), np.float32(2306.7607)]
2025-09-14 15:06:53,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:06:53,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 49 minutes, 29 seconds)
2025-09-14 15:11:04,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:11:12,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3192.01929 ± 142.298
2025-09-14 15:11:12,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2961.135), np.float32(3055.4949), np.float32(3154.953), np.float32(3381.289), np.float32(3335.1543), np.float32(3168.7803), np.float32(3373.3062), np.float32(3320.3533), np.float32(3087.818), np.float32(3081.9102)]
2025-09-14 15:11:12,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:11:12,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (3192.02) for latency 6
2025-09-14 15:11:12,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 45 minutes, 59 seconds)
2025-09-14 15:15:49,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:15:58,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2919.46362 ± 456.459
2025-09-14 15:15:58,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3132.595), np.float32(3211.9062), np.float32(3228.589), np.float32(2975.5127), np.float32(3262.6033), np.float32(1770.9845), np.float32(2451.0435), np.float32(2749.233), np.float32(3253.2546), np.float32(3158.9167)]
2025-09-14 15:15:58,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:15:58,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 45 minutes, 18 seconds)
2025-09-14 15:20:30,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:20:38,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2957.92603 ± 436.824
2025-09-14 15:20:38,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3424.4585), np.float32(3143.8506), np.float32(2951.059), np.float32(3057.272), np.float32(1986.8341), np.float32(3047.735), np.float32(3347.8306), np.float32(2294.168), np.float32(3258.417), np.float32(3067.6345)]
2025-09-14 15:20:38,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:20:38,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 45 minutes)
2025-09-14 15:24:51,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:24:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3096.41431 ± 134.714
2025-09-14 15:24:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3170.1355), np.float32(3188.4663), np.float32(3123.0818), np.float32(3211.2998), np.float32(3232.3616), np.float32(3172.0547), np.float32(2961.2834), np.float32(2886.5593), np.float32(3170.7456), np.float32(2848.1536)]
2025-09-14 15:24:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:24:59,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 44 minutes, 16 seconds)
2025-09-14 15:29:49,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:29:57,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2573.81445 ± 772.216
2025-09-14 15:29:57,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1421.1508), np.float32(3090.424), np.float32(2890.489), np.float32(2960.5876), np.float32(3068.3774), np.float32(3291.9346), np.float32(2973.1348), np.float32(1649.1226), np.float32(3210.9053), np.float32(1182.0205)]
2025-09-14 15:29:57,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:29:57,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 46 minutes, 6 seconds)
2025-09-14 15:36:00,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:36:08,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3057.70581 ± 434.280
2025-09-14 15:36:08,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3326.9753), np.float32(1782.8878), np.float32(3120.5474), np.float32(3161.4248), np.float32(3292.933), np.float32(3204.8105), np.float32(3285.172), np.float32(3176.8613), np.float32(3001.6924), np.float32(3223.7534)]
2025-09-14 15:36:08,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:36:08,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 54 minutes, 29 seconds)
2025-09-14 15:39:59,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:40:07,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3034.47705 ± 329.771
2025-09-14 15:40:07,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3266.3318), np.float32(2919.4077), np.float32(3057.988), np.float32(3099.6946), np.float32(3133.8965), np.float32(3241.8296), np.float32(3216.375), np.float32(3433.497), np.float32(2781.9456), np.float32(2193.8054)]
2025-09-14 15:40:07,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:40:07,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 44 minutes, 14 seconds)
2025-09-14 15:45:15,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:45:23,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3154.95825 ± 145.163
2025-09-14 15:45:23,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2950.5344), np.float32(3214.2815), np.float32(3028.1035), np.float32(3075.2876), np.float32(3256.7224), np.float32(3122.6604), np.float32(2954.9053), np.float32(3219.2969), np.float32(3356.548), np.float32(3371.2412)]
2025-09-14 15:45:23,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:45:23,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 43 minutes, 19 seconds)
2025-09-14 15:49:40,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:49:48,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3129.60693 ± 120.226
2025-09-14 15:49:48,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3176.4568), np.float32(3170.6404), np.float32(2988.0928), np.float32(3251.8164), np.float32(3249.2495), np.float32(3140.1672), np.float32(3266.7888), np.float32(2951.8918), np.float32(2931.4172), np.float32(3169.5474)]
2025-09-14 15:49:48,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:49:48,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 38 minutes, 53 seconds)
2025-09-14 15:53:42,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:53:50,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2992.57031 ± 506.343
2025-09-14 15:53:50,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3131.1765), np.float32(1617.7749), np.float32(2795.267), np.float32(3262.9424), np.float32(3370.4084), np.float32(3382.2305), np.float32(3411.5728), np.float32(3145.3716), np.float32(2775.0623), np.float32(3033.899)]
2025-09-14 15:53:50,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:53:50,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 28 minutes, 2 seconds)
2025-09-14 15:57:32,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 15:57:40,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2985.99341 ± 491.299
2025-09-14 15:57:40,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3054.6267), np.float32(2125.0425), np.float32(3304.383), np.float32(3361.4385), np.float32(3308.541), np.float32(3148.678), np.float32(3213.0156), np.float32(2945.7805), np.float32(3433.1528), np.float32(1965.2769)]
2025-09-14 15:57:40,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:57:40,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2025-09-14 16:01:45,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:01:53,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3025.92920 ± 562.444
2025-09-14 16:01:53,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1453.6958), np.float32(3151.2273), np.float32(3117.8286), np.float32(2982.2), np.float32(3077.3445), np.float32(3026.956), np.float32(2971.2078), np.float32(3521.5127), np.float32(3432.3542), np.float32(3524.9648)]
2025-09-14 16:01:53,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:01:53,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 6 minutes, 17 seconds)
2025-09-14 16:05:29,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:05:37,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3071.23193 ± 242.338
2025-09-14 16:05:37,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3324.3762), np.float32(3398.6792), np.float32(3129.4495), np.float32(3181.041), np.float32(3247.0947), np.float32(2772.2695), np.float32(2581.3838), np.float32(2915.1562), np.float32(2992.8313), np.float32(3170.038)]
2025-09-14 16:05:37,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:05:37,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 53 minutes, 17 seconds)
2025-09-14 16:09:07,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:09:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2992.52100 ± 454.456
2025-09-14 16:09:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2878.768), np.float32(3240.9707), np.float32(3346.0576), np.float32(3312.9084), np.float32(2910.6013), np.float32(3246.2527), np.float32(3245.4678), np.float32(1719.3411), np.float32(2956.9978), np.float32(3067.8447)]
2025-09-14 16:09:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:09:15,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 45 minutes)
2025-09-14 16:13:44,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:13:52,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2877.84814 ± 618.915
2025-09-14 16:13:52,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3235.1365), np.float32(3020.1567), np.float32(2994.352), np.float32(2928.4329), np.float32(3055.0002), np.float32(3135.2725), np.float32(3246.6003), np.float32(2701.8745), np.float32(3363.2278), np.float32(1098.4287)]
2025-09-14 16:13:52,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:13:52,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 44 minutes, 11 seconds)
2025-09-14 16:19:18,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:19:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2972.20190 ± 260.107
2025-09-14 16:19:26,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2348.73), np.float32(2715.045), np.float32(3372.7664), np.float32(3040.4858), np.float32(3096.0413), np.float32(2953.8662), np.float32(2968.172), np.float32(3026.7876), np.float32(3151.202), np.float32(3048.923)]
2025-09-14 16:19:26,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:19:26,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 48 minutes, 50 seconds)
2025-09-14 16:23:16,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:23:24,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3147.62842 ± 143.758
2025-09-14 16:23:24,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3159.1025), np.float32(3208.844), np.float32(2963.4019), np.float32(3086.1384), np.float32(2987.1208), np.float32(3430.6328), np.float32(3151.3992), np.float32(2993.435), np.float32(3335.365), np.float32(3160.8438)]
2025-09-14 16:23:24,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:23:24,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 43 minutes, 16 seconds)
2025-09-14 16:27:18,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:27:26,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2896.06641 ± 524.911
2025-09-14 16:27:26,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3339.2373), np.float32(3201.5686), np.float32(3102.7415), np.float32(3103.8364), np.float32(2858.1062), np.float32(3209.618), np.float32(1624.3315), np.float32(3240.149), np.float32(2185.7585), np.float32(3095.3154)]
2025-09-14 16:27:26,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:27:26,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 40 minutes, 21 seconds)
2025-09-14 16:31:02,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:31:11,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3188.75049 ± 116.294
2025-09-14 16:31:11,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3134.4822), np.float32(3128.856), np.float32(3416.7346), np.float32(3120.7227), np.float32(3105.3694), np.float32(3418.5518), np.float32(3114.2314), np.float32(3167.37), np.float32(3169.7156), np.float32(3111.4739)]
2025-09-14 16:31:11,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:31:11,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 36 minutes, 28 seconds)
2025-09-14 16:34:38,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:34:46,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3066.53003 ± 363.851
2025-09-14 16:34:46,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3209.0952), np.float32(3231.7314), np.float32(3354.019), np.float32(2962.0835), np.float32(3139.2415), np.float32(3226.6743), np.float32(3278.9385), np.float32(3215.6013), np.float32(3021.7803), np.float32(2026.1321)]
2025-09-14 16:34:46,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:34:46,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 27 minutes, 46 seconds)
2025-09-14 16:38:36,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:38:44,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2925.08716 ± 376.597
2025-09-14 16:38:44,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3023.8706), np.float32(2755.5232), np.float32(3007.581), np.float32(3283.0266), np.float32(3185.181), np.float32(2921.9307), np.float32(3163.706), np.float32(1903.9956), np.float32(3183.921), np.float32(2822.136)]
2025-09-14 16:38:44,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:38:44,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 17 minutes, 13 seconds)
2025-09-14 16:42:38,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:42:46,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3179.78198 ± 151.158
2025-09-14 16:42:46,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2991.328), np.float32(3091.714), np.float32(3462.0798), np.float32(3132.4468), np.float32(3087.353), np.float32(3045.0889), np.float32(3328.7598), np.float32(3061.6118), np.float32(3376.6516), np.float32(3220.7847)]
2025-09-14 16:42:46,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:42:46,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 13 minutes, 36 seconds)
2025-09-14 16:48:13,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:48:21,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3025.47241 ± 570.414
2025-09-14 16:48:21,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1347.2922), np.float32(3144.963), np.float32(3115.828), np.float32(3177.408), np.float32(2999.8933), np.float32(3350.9556), np.float32(3366.2654), np.float32(3146.66), np.float32(3321.9937), np.float32(3283.4675)]
2025-09-14 16:48:21,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:48:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 15 minutes, 19 seconds)
2025-09-14 16:53:39,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:53:47,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3134.06958 ± 83.808
2025-09-14 16:53:47,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3136.643), np.float32(3130.755), np.float32(3227.8657), np.float32(3183.5964), np.float32(3057.5244), np.float32(3125.7915), np.float32(3275.1543), np.float32(3125.274), np.float32(2952.4412), np.float32(3125.6511)]
2025-09-14 16:53:47,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:53:47,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 16 minutes, 52 seconds)
2025-09-14 16:57:46,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 16:57:54,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2944.91968 ± 590.298
2025-09-14 16:57:54,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3175.2336), np.float32(3099.0732), np.float32(3102.258), np.float32(3118.3171), np.float32(3217.1333), np.float32(3231.2305), np.float32(2975.8594), np.float32(3257.2957), np.float32(1190.124), np.float32(3082.6692)]
2025-09-14 16:57:54,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:57:54,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 14 minutes, 2 seconds)
2025-09-14 17:02:14,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:02:22,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3071.23291 ± 191.375
2025-09-14 17:02:22,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3255.1128), np.float32(2992.3806), np.float32(3240.6848), np.float32(3036.9863), np.float32(3286.466), np.float32(3086.5383), np.float32(2986.085), np.float32(3022.1855), np.float32(3205.8232), np.float32(2600.0664)]
2025-09-14 17:02:22,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:02:22,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 10 minutes, 53 seconds)
2025-09-14 17:06:15,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:06:22,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3117.58252 ± 187.541
2025-09-14 17:06:22,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3029.809), np.float32(3014.4402), np.float32(3138.7812), np.float32(2641.8083), np.float32(3124.447), np.float32(3274.956), np.float32(3316.9092), np.float32(3287.2173), np.float32(3114.8262), np.float32(3232.6335)]
2025-09-14 17:06:22,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:06:22,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 6 minutes, 5 seconds)
2025-09-14 17:10:41,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:10:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3071.73389 ± 294.833
2025-09-14 17:10:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3189.7239), np.float32(3074.9475), np.float32(3205.281), np.float32(2978.0068), np.float32(3248.1494), np.float32(2230.7554), np.float32(3138.4194), np.float32(3338.0679), np.float32(3171.558), np.float32(3142.4297)]
2025-09-14 17:10:50,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:10:50,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 58 minutes, 25 seconds)
2025-09-14 17:15:08,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:15:16,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3165.25635 ± 147.515
2025-09-14 17:15:16,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3207.6633), np.float32(3155.2512), np.float32(3205.477), np.float32(3308.015), np.float32(3237.0662), np.float32(2909.38), np.float32(2993.7126), np.float32(3036.5576), np.float32(3442.7778), np.float32(3156.6624)]
2025-09-14 17:15:16,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:15:16,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 51 minutes, 33 seconds)
2025-09-14 17:19:17,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:19:25,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3134.91797 ± 124.352
2025-09-14 17:19:25,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3209.7476), np.float32(2933.9382), np.float32(3191.653), np.float32(2983.8005), np.float32(3315.8894), np.float32(3261.484), np.float32(3130.2031), np.float32(2982.5068), np.float32(3106.3113), np.float32(3233.6467)]
2025-09-14 17:19:25,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:19:25,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 47 minutes, 18 seconds)
2025-09-14 17:22:57,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:23:05,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3147.01758 ± 145.118
2025-09-14 17:23:05,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3209.2766), np.float32(3027.6047), np.float32(3303.9338), np.float32(3144.2346), np.float32(3347.444), np.float32(3066.723), np.float32(2924.533), np.float32(3363.2644), np.float32(3004.7505), np.float32(3078.4111)]
2025-09-14 17:23:05,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:23:05,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 41 minutes, 25 seconds)
2025-09-14 17:26:57,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:27:05,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3167.56104 ± 159.506
2025-09-14 17:27:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3199.2795), np.float32(3299.44), np.float32(3155.4485), np.float32(3278.3228), np.float32(3413.9646), np.float32(2808.525), np.float32(3240.8035), np.float32(3005.9414), np.float32(3119.2275), np.float32(3154.6553)]
2025-09-14 17:27:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:27:05,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 37 minutes, 16 seconds)
2025-09-14 17:32:04,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:32:12,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3020.03223 ± 451.547
2025-09-14 17:32:12,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1700.4259), np.float32(3182.9329), np.float32(3048.7236), np.float32(3219.4868), np.float32(3251.812), np.float32(3336.7563), np.float32(3004.4883), np.float32(3208.9397), np.float32(3221.1575), np.float32(3025.5974)]
2025-09-14 17:32:12,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:32:12,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 34 minutes, 11 seconds)
2025-09-14 17:36:41,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:36:49,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3084.06543 ± 501.779
2025-09-14 17:36:49,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3162.3008), np.float32(3265.6973), np.float32(3404.6892), np.float32(3222.2944), np.float32(1639.3806), np.float32(3370.4229), np.float32(2902.786), np.float32(3208.5403), np.float32(3242.5076), np.float32(3422.0354)]
2025-09-14 17:36:49,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:36:49,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 30 minutes, 9 seconds)
2025-09-14 17:41:56,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:42:04,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3132.05762 ± 421.784
2025-09-14 17:42:04,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3255.413), np.float32(3153.742), np.float32(3336.6108), np.float32(3223.9917), np.float32(3194.7373), np.float32(3065.6052), np.float32(3361.0093), np.float32(1910.7504), np.float32(3350.6396), np.float32(3468.0754)]
2025-09-14 17:42:04,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:42:04,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 27 minutes, 10 seconds)
2025-09-14 17:46:04,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:46:12,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3106.54419 ± 166.737
2025-09-14 17:46:12,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2874.6487), np.float32(2878.9724), np.float32(2952.487), np.float32(3062.953), np.float32(3244.9229), np.float32(3127.7366), np.float32(3220.141), np.float32(3203.7698), np.float32(3428.447), np.float32(3071.363)]
2025-09-14 17:46:12,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:46:12,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 23 minutes, 6 seconds)
2025-09-14 17:49:53,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:50:01,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3180.30640 ± 96.512
2025-09-14 17:50:01,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3244.418), np.float32(3168.9817), np.float32(3182.5198), np.float32(3138.7886), np.float32(3355.0488), np.float32(3096.6177), np.float32(3092.4429), np.float32(3339.1084), np.float32(3118.1323), np.float32(3067.0063)]
2025-09-14 17:50:01,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:50:01,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 18 minutes, 21 seconds)
2025-09-14 17:54:22,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:54:30,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3054.42236 ± 402.276
2025-09-14 17:54:30,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2955.636), np.float32(2863.1262), np.float32(3287.5535), np.float32(1941.3293), np.float32(3204.1792), np.float32(3224.749), np.float32(3284.098), np.float32(3247.7542), np.float32(3419.3445), np.float32(3116.4553)]
2025-09-14 17:54:30,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:54:30,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 22 seconds)
2025-09-14 17:58:43,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 17:58:51,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3047.97144 ± 609.260
2025-09-14 17:58:51,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3132.1646), np.float32(3145.7144), np.float32(3021.7615), np.float32(3304.7102), np.float32(3104.5652), np.float32(3464.7153), np.float32(3308.1), np.float32(3334.8474), np.float32(1264.9336), np.float32(3398.2034)]
2025-09-14 17:58:51,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:58:51,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 49 seconds)
2025-09-14 18:02:59,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 18:03:07,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 3182.11670 ± 174.451
2025-09-14 18:03:07,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3116.7773), np.float32(3205.65), np.float32(3452.226), np.float32(3392.4592), np.float32(3004.9478), np.float32(2904.9263), np.float32(3355.5933), np.float32(3138.1885), np.float32(3258.5771), np.float32(2991.8184)]
2025-09-14 18:03:07,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:03:07,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 12 seconds)
2025-09-14 18:07:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 18:07:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2386.64624 ± 1028.577
2025-09-14 18:07:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3254.8193), np.float32(1225.0199), np.float32(3188.989), np.float32(3198.4421), np.float32(1170.5363), np.float32(917.74146), np.float32(3165.5332), np.float32(1213.499), np.float32(3329.0664), np.float32(3202.8171)]
2025-09-14 18:07:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:07:11,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1251 [DEBUG]: Training session finished
